Abstract
Turn-taking interactions are foundational to the development of social, communicative, and cognitive skills. In infants, vocal turn-taking experience predicts socioemotional and language development. However, different forms of turn-taking interaction may have different effects on infant vocalizing. It is currently unknown how caregiver vocal, non-vocal, and multimodal responses to infant vocalizations compare in extending caregiver-infant vocal turn-taking bouts. In bouts that begin with an infant vocalization, responses that maintain versus change the communicative modality may differentially affect the likelihood of further infant vocalizing. No studies have examined how caregiver responses that either match or differ from the infant's acoustic (vocal) modality might affect the temporal structure of vocal turn-taking beyond the initial serve-and-return exchange. We video-recorded free-play sessions of 51 caregivers with their 9-month-old infants. Caregivers most often responded to babbling with vocalizations. In turn, caregiver vocal responses were significantly more likely to elicit subsequent infant babbling. Bouts following an initial caregiver vocal response contained significantly more turns than those following a non-vocal response, and marginally more than those following a multimodal response. Thus, prelinguistic turn-taking is sensitive to the modality of caregivers’ responses. Future research should investigate whether this sensitivity is grounded in attentional constraints, which may shape the structure of turn-taking interactions.
Keywords: turn-taking, caregiver-infant interactions, multimodal communication, prelinguistic vocalizations, communicative development
Turn-taking interactions are viewed as the foundational infrastructure of human communication (Levinson, 2016). Across industrialized and indigenous cultures, there is a universal tendency to minimize both overlap and silence in vocal turn-taking (Stivers et al., 2009; Sacks et al., 1978). Interactants tend not to speak at the same time, and interruptions in conversation are experienced as aversive (Farley et al., 2010). Interactants also do not wait long before starting to talk (Templeton et al., 2022). The average gap between turns across cultures is about 250 milliseconds (Stivers et al., 2009), which leaves little time for interactants to comprehend what others say and to plan their own speech. Perceiving and producing speech concurrently within a vocal turn-taking context therefore poses a significant challenge. To understand others’ speech quickly while attending to incoming words, interactants predict upcoming input at semantic, syntactic, lexical, and phonological levels (DeLong et al., 2005; Van Berkum et al., 2005), and refine these predictions as the conversation unfolds. While listening to and comprehending others’ speech, interactants must also plan their upcoming turn and anticipate the moment at which the speaker’s turn will end (De Ruiter et al., 2006; Bögels & Torreira, 2015). How the structure of turn-taking emerges in early development is not well understood. Patterns of caregiver responses to immature infant behavior (e.g., babbling) may afford opportunities for turn-taking to occur. Here we focus on one aspect of caregiver behavior, response modality, that may facilitate turn-taking interactions with infants.
Caregiver-infant turn-taking interactions
Turn-taking is ubiquitous in caregiver-infant interaction. As early as two days after birth, infants and their mothers exchange turns during breast-feeding. While infants suck on pacifiers or nipples, mothers are unlikely to move or jiggle them until infants pause sucking; in turn, while mothers jiggle, infants are unlikely to start sucking (Kaye, 1977). Over the first year of life, infants take rapid, adult-like vocal turns with caregivers, despite their limited understanding of language and immature cognitive capacities (Bateson, 1975; Snow, 1977; Hilbrink et al., 2015). By the end of the first year, infants understand around 77 common words but produce only 1–6 words (Frank et al., 2017). Despite this immaturity in language comprehension and production, infants take turns quickly even at three to five months, initiating their vocalizations around 600 milliseconds after caregivers’ turns end. From three to six months, they also interrupt caregivers less often (Hilbrink et al., 2015). These vocal turn-taking interactions are critical to the development of infant language comprehension and production.
For 9-month-old infants, the degree to which their vocalizations overlap with those of their caregivers predicts later language delays. Infants who are language delayed in the third year are also more likely to have exhibited altered turn-taking dynamics with their caregivers, with longer lags between vocal turns than their non-delayed peers (Northrup & Iverson, 2015). The number of conversational turns infants experience predicts their vocabulary growth and language comprehension (Donnelly & Kidd, 2021; Zimmerman et al., 2009) and facilitates the maturation of their vocalizations (Bloom et al., 1987). Beyond the first year, the number of conversational turns children experience at 18 months contributes significantly to their emotional communication, secure attachment, and emotional regulation scores at 30 months (Gómez & Strasser, 2021). Conversational turn counts at 24 months also account for 14–27% of the variance in children’s IQ, verbal comprehension, and vocabulary size ten years later (Gilkerson et al., 2018). Turn-taking interactions are thus foundational to communicative, linguistic, and socioemotional development.
Infant communicative development is embedded in social interaction (Tamis-LeMonda et al., 2001; Tamis-LeMonda et al., 2014). Most infant vocalizations occur in the context of caregiver-infant vocal turn-taking (Gratier et al., 2015; but see Long et al., 2020). The vocal feedback loop formed by caregiver-infant interaction facilitates vocal development and learning. For example, when 9-month-old infants babbled, caregivers were more likely to respond to the more speechlike vocalizations (Albert et al., 2018; Gros-Louis et al., 2006). In turn, infants (both typically developing infants and those later diagnosed with autism) were more likely to produce another speechlike vocalization if they had received a caregiver response (Warlaumont et al., 2014). Caregivers and infants responded to each other at similar rates (Gratier et al., 2015), and this continual social-vocal feedback loop reinforced infants’ production of mature babbling, a precursor to spoken language (Goldstein & Schwade, 2010; Warlaumont et al., 2014; Masek et al., 2021). Furthermore, 9-month-old infants increased the proportion of mature babbling, such as fully resonant vowels and adult-like canonical syllables, after caregivers responded vocally and contingently to their babbling (Goldstein & Schwade, 2008). However, these studies generally did not look beyond the initial serve-and-return exchange between caregivers and infants. In addition, studies of early turn-taking have typically focused on a single modality: vocal behavior.
The benefits and demands of multimodal interactions
Besides speaking, caregivers connect and interact with infants in multiple modalities (Suarez-Rivera et al., 2022). They point to or shake toys to engage infants in play, touch infants to encourage or comfort them, and sometimes interact with infants in several modalities at once. Caregivers’ touch helps establish and sustain caregiver-infant synchrony in behavioral and neural activity (Carozza & Leong, 2021). Experimenter-provided touches served as social cues that helped 4-month-olds find word boundaries in continuous speech and 7-month-olds learn auditory patterns (Seidl et al., 2015; Lew-Williams et al., 2019). Infants learned words better when caregivers’ speech was accompanied by dynamic, synchronous gestures (de Villiers Rader & Zukow-Goldring, 2012). When reading books to 5-month-old infants, mothers exaggerated touch and speech when using both modalities at the same time, and they tended to touch infants’ body parts while naming them (Abu-Zhaya et al., 2017). Additionally, 9-month-old infants were sensitive to the temporal alignment of multimodal cues: they looked significantly longer when a gesture and an accented syllable were misaligned in time (Esteve-Gibert et al., 2015). Caregiver non-vocal responses that were contingent on babbling facilitated infants’ production of more complex and mature babbling (Goldstein et al., 2003). Caregiver non-vocal responses to babbling may therefore elicit speechlike vocalizations and promote a multimodal social feedback loop in the same way that caregiver vocal responses do. It remains an open question whether caregiver vocal, non-vocal, and multimodal responses are equally effective in facilitating vocal turn-taking with prelinguistic infants.
While multimodal interaction has clear benefits, processing information from multiple modalities also poses challenges. For example, infants were unlikely to babble after caregivers smiled in multimodal turn-taking interactions (Rohlfing et al., 2019; Zhang et al., 2024). This may reflect an attentional switching cost induced by a change of interaction modality during a vocal turn-taking bout. Perceiving and processing changes in a conversational partner’s behavior during a synchronous turn-taking interaction is cognitively demanding. In adults, changes in communicative signals during turn-taking are associated with attentional switching costs. For example, during adult conversational turn-taking, performance on a word recognition task decreased when the talker’s voice switched (Lin & Carlile, 2019). Speech recall performance also decreased when conversation partners’ voices came from different locations (Lin & Carlile, 2015). Infants may incur similar attentional switching costs when their interaction partner changes response modality.
Additional attentional demands affect performance on a wide variety of motor tasks. Babbling, “a repetition or combination of articulatory movements with interrupted or continued phonation” (Stelt & Koopmans-van Beinum, 1986), is fundamentally a motor task. Increased attentional load prompts adults and children to inhibit somatic movements such as breathing, spontaneous eye movements, and blinks (Obrist et al., 1969; Boiten et al., 1994; Denot-Ledunois et al., 1998). In the frequency-altered feedback paradigm, in which participants hear their own voice played back with altered pitch, increased attentional demands interfered with motor control and degraded participants’ ability to produce compensatory vocal responses to the altered auditory feedback (Tumber et al., 2014). In a real-world task, training in attentional control (resistance to distraction) improved tennis performance under stress (Ducrocq et al., 2016). Because infants have limited attentional capacity (Ruff & Capozzoli, 2003), switching costs arising from a change of communicative modality during interaction may interfere with the motor organization of vocalizing, which may in turn reduce babbling. However, most studies of the effects of caregiver responses have investigated only the initial two (Goldstein & Schwade, 2008) or three (Warlaumont et al., 2014) alternations between caregivers and infants. The dynamics of extended turn-taking bouts following an infant vocalization have received less attention. As a result, we know little about how caregiver responses in different modalities shape the patterning of extended caregiver-infant turn-taking sequences.
The current study
To investigate the efficacy of caregiver vocal, non-vocal, and multimodal responses in facilitating caregiver-infant vocal turn-taking, we video-recorded naturalistic free-play sessions between caregivers and their infants. We annotated the timing and modality of caregiver responses to infant babbling. We then decomposed caregiver-infant turn-taking interactions turn by turn, analyzing which caregiver responses facilitated further infant babbling and turn-taking. We measured facilitation in two ways: 1) the likelihood that infants babbled again after a caregiver response, and 2) the number of vocal turns in a turn-taking bout.
We hypothesized that caregiver vocal responses to babbling would be more effective in facilitating vocal turn-taking with infants, because non-vocal or multimodal responses might incur a cross-modal switching cost. Specifically, based on previous findings (Miller & Lossia, 2013; Kärtner et al., 2008), we predicted that caregivers would respond to infant babbling more frequently with vocal than with non-vocal or multimodal behavior. We predicted that, in turn, caregiver vocal responses would be more likely to elicit infant babbling, prolonging vocal turn-taking interactions. Finally, consistent with previous findings (Hilbrink et al., 2015), we predicted that caregiver-infant turn-taking alternations would be rapid, with short gaps between turns.
Method
Participants
Fifty-one caregivers and their 9-month-old infants participated in the study (28 female infants; mean infant age 9 months 21 days; range 8 months 19 days to 10 months 15 days; SD = 14.90 days). Participants were recruited through birth announcements in the local newspaper and compensated with an infant T-shirt or a book. The data were drawn from a larger dataset used in a previously published study (Albert et al., 2018). The study was conducted according to the guidelines of the Declaration of Helsinki, and written informed consent was obtained from a parent or guardian for each child before any assessment or data collection. All procedures involving human participants were approved by the Institutional Review Board for Human Participant Research (IRB) Office at Cornell University.
Procedure
Caregivers were asked to play with their infants for 15 minutes, as they normally would at home, in a large (3.65 m × 4.57 m) playroom with toys. Dyads were video-recorded by three remote-controlled, disguised cameras mounted in corners of the room. Caregivers and infants wore wireless microphones with transmitters so that we could obtain high-quality recordings of infant babbling and caregiver speech.
Data Coding and Analysis
Infant Babbling and Caregiver Responses
We coded infant babbling according to Oller’s infraphonological acoustic classification system (Oller, 2000). Babbling consists of prelinguistic vocalizations that contain vowels or consonant-vowel syllables. These vocalizations are considered precursors to mature speech (Buder et al., 2013; Oller et al., 2001). Babbling does not include vegetative sounds such as crying, coughing, or laughing, because these sounds are not considered speech precursors (Buder et al., 2013; Oller et al., 2001). Caregiver responses to infant babbling were defined as behaviors that occurred within 2 seconds of the offset of an infant babble (Elmlinger et al., 2019; Van Egeren et al., 2001). Responses were categorized by modality as vocal, non-vocal, or multimodal (Table 1). Caregiver vocal responses were further categorized as related, unrelated, or other, according to the current attentional focus of the dyadic interaction (Table 2). Thus, the relatedness of a response was judged against the dyad’s current focus of attention, not against the infant vocalization alone.
Table 1.
Definitions of Caregiver Response Categories
| Category | Definition | Example |
|---|---|---|
| Vocal Response | Any sounds made with the mouth | Speech |
| Non-vocal Response | Physical actions | Touching a toy |
| Multimodal Response | Vocal and Non-vocal co-occurring | Labeling a toy while picking it up |
Note. Caregiver vocal responses included speech and non-speech sounds such as raspberries and laughing, but did not include vegetative sounds (e.g., coughing, clearing the throat). A caregiver multimodal response was coded if there was any overlap or co-occurrence between a vocal and a non-vocal response. The vocal and non-vocal responses did not have to overlap perfectly, and it did not matter which one came first.
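The modality rules in Table 1 and the 2-second response window can be sketched as a small classifier. This is a minimal Python illustration under stated assumptions, not the coding procedure used in the study: the `Event` record and `classify_response` helper are hypothetical, a behavior is assumed to count as a response when its onset falls within 2 s after babble offset, and when vocal and non-vocal behaviors in the window do not overlap, the earliest one is assumed to set the modality.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

RESPONSE_WINDOW_S = 2.0  # caregiver behavior must occur within 2 s of babble offset

@dataclass(frozen=True)
class Event:
    """One coded caregiver behavior; kind is "vocal" or "nonvocal" (times in seconds)."""
    kind: str
    onset: float
    offset: float

def overlaps(a: Event, b: Event) -> bool:
    """True if the two behaviors share any time, regardless of which began first."""
    return a.onset < b.offset and b.onset < a.offset

def classify_response(babble_offset: float, events: Sequence[Event]) -> Optional[str]:
    """Return "vocal", "nonvocal", "multimodal", or None (no response) for one babble."""
    # Assumption: a behavior is a response if its onset lies in (offset, offset + 2 s].
    window = [e for e in events
              if babble_offset < e.onset <= babble_offset + RESPONSE_WINDOW_S]
    if not window:
        return None
    vocal = [e for e in window if e.kind == "vocal"]
    nonvocal = [e for e in window if e.kind == "nonvocal"]
    # Any overlap between a vocal and a non-vocal behavior counts as multimodal.
    if any(overlaps(v, n) for v in vocal for n in nonvocal):
        return "multimodal"
    # Assumption: otherwise the earliest behavior in the window sets the modality.
    return min(window, key=lambda e: e.onset).kind
```

For example, labeling a toy (vocal, 10.5 to 11.5 s) while picking it up (non-vocal, 11.0 to 12.0 s) after a babble ending at 10.0 s is classified as multimodal, matching the Table 1 example.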
Table 2.
Caregiver Vocal Response Subcategories
| Subcategory | Definition | Example | Proportion |
|---|---|---|---|
| Vocally Related | Vocal responses directly related to the focus of a dyadic interaction | Talking about a toy while the infant is holding it, like “this is a ball” | 29.82% |
| Vocally Unrelated | Vocal responses that directly bring attention to a different interaction or object | Pointing out a new toy, like “look at that guy over there! What’s that?” | 2.30% |
| Other: Narrative | Statements related to the baby’s state or actions | “You are so big!” | 24.37% |
| Other: Imitation | Duplications of the baby’s sound | Infant: [ba]; Caregiver: [ba] | 12.23% |
| Other: Affirmation | Conversational turns that do not provide new information | “Uh-huh,” “Yeah” | 30.49% |
| Other: Non-sequitur | Statements not associated with the infant or the current context of the infant’s environment | “What should we have for dinner?” | 0.79% |
Note. Vocally related or unrelated responses either focus or redirect infant attention to a clear referent, while affirmation or imitation responses do not have a clear referent.
We followed the coding scheme developed in a previous paper (Albert et al., 2018), but replaced the categories “sensitive/redirective” with “related/unrelated”, because the relatedness of caregiver behavior to the ongoing interaction was the focus of the current study (see Table 2). For all data coding, a primary coder coded all play sessions, and a second coder coded 20% of the sessions. Cohen’s kappa (Cohen, 1968) was 0.83 for infant babbling, 0.72 for caregiver response modality, and 0.93 for caregiver response relatedness. We computed Cohen’s kappa by comparing the two coders’ codes at each video frame (videos were recorded at about 30 frames per second), so that the reliability measure was sensitive to timing differences between codes.
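Frame-level agreement can be illustrated by expanding each coder’s intervals into one label per video frame and computing Cohen’s kappa over the aligned frames, κ = (p_o − p_e)/(1 − p_e), where p_o is observed agreement and p_e is chance agreement from each coder’s label frequencies. The sketch below is not the study’s reliability script: it assumes a fixed 30 frames per second, a single `"none"` label for uncoded frames, the unweighted form of kappa, and hypothetical helper names (`to_frames`, `cohens_kappa`).

```python
from collections import Counter
from typing import List, Sequence, Tuple

FPS = 30  # assumption: fixed frame rate, matching the ~30 fps reported above

Interval = Tuple[float, float, str]  # (onset_s, offset_s, code)

def to_frames(intervals: Sequence[Interval], duration_s: float,
              fps: int = FPS, background: str = "none") -> List[str]:
    """Expand coded intervals into one label per video frame."""
    n = int(round(duration_s * fps))
    frames = [background] * n
    for onset, offset, code in intervals:
        start = max(0, int(onset * fps))
        stop = min(n, int(offset * fps))
        for i in range(start, stop):
            frames[i] = code
    return frames

def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if p_exp == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_obs - p_exp) / (1 - p_exp)
```

Because disagreements are counted frame by frame, two coders who mark the same babble but disagree about its onset by 0.5 s lose 15 frames of agreement, which is how this kind of measure penalizes timing differences.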
Turn-taking Decomposition
To identify the caregiver responses that influenced vocal turn-taking with infants, we decomposed caregiver-infant turn-taking interactions into bouts and individual turns. We counted the number of turns per bout and measured the duration of individual turns. A bout started with an infant babble that occurred more than 2 seconds (Elmlinger et al., 2019) after any previous caregiver behavior. Thus, bouts were separated by pauses of more than 2 seconds before and after any turns (Figure 1). First, we analyzed caregiver responses to infant babbles, coding each as vocal, non-vocal, multimodal, or none (no response). We then computed the average number of infant babbles and caregiver responses across dyads.
Figure 1. Illustration of Turn-Taking Bouts.
Note. A) Turn-taking bouts were separated by pauses of two seconds. The initial serve-and-return consisted of one infant babble and one caregiver vocal, non-vocal, or multimodal response. Following the initial serve-and-return, all vocalizations by either partner were counted as part of the same bout as long as they occurred within two seconds of each other. For example, the multiple consecutive babbles in bout 1 and the multiple consecutive caregiver utterances in bout 2 were all included in their respective bouts. Bout 1 contains six turns in total, and bout 2 contains five. We included both infant and caregiver vocalizations when counting turns (e.g., Donnelly & Kidd, 2021; Zimmerman et al., 2009; Gómez & Strasser, 2021; Gilkerson et al., 2018; Romeo et al., 2018). B) In bout 3, the caregiver responded to an infant babble in the same (vocal) modality. In bout 4, the caregiver responded to an infant babble in a different (non-vocal) modality.
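The segmentation rules described above can be sketched in Python. This is a hypothetical illustration, not the study’s pipeline. It assumes that turns are already coded as time-stamped events (the initial caregiver response of any modality plus subsequent vocalizations), that a bout can start only with an infant babble preceded by more than 2 s without any behavior, that a turn joins the open bout if it begins within 2 s of the latest turn offset, and that every vocalization counts as one turn. `Turn` and `segment_bouts` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Sequence

GAP_S = 2.0  # pause (seconds) that separates bouts

@dataclass(frozen=True)
class Turn:
    speaker: str   # "infant" or "caregiver"
    onset: float   # seconds from session start
    offset: float  # seconds from session start

def segment_bouts(turns: Sequence[Turn], gap_s: float = GAP_S) -> List[List[Turn]]:
    """Group time-ordered turns into bouts separated by pauses longer than gap_s.

    A bout can only start with an infant babble that follows more than gap_s of
    silence; turns that can neither open nor join a bout are skipped.
    """
    bouts: List[List[Turn]] = []
    current: List[Turn] = []
    last_end = float("-inf")  # latest offset of any turn seen so far
    for t in sorted(turns, key=lambda x: x.onset):
        pause = t.onset - last_end
        if current and pause <= gap_s:
            current.append(t)  # continues the open bout
        else:
            if current:
                bouts.append(current)  # the pause closed the previous bout
            # Open a new bout only with an infant babble after a long enough pause.
            current = [t] if t.speaker == "infant" and pause > gap_s else []
        last_end = max(last_end, t.offset)
    if current:
        bouts.append(current)
    return bouts
```

Under these assumptions the number of turns per bout is simply `len(bout)`. Merging consecutive same-speaker vocalizations into a single turn would be an alternative convention that yields smaller counts.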
Next, we analyzed the relative efficacy of caregiver responses in eliciting the next infant babble. We calculated the babbling elicitation rate of each type of caregiver response across time. A following infant babble was counted as elicited if it began after the onset of the caregiver response and within 2 seconds before or after the offset of the caregiver response. We divided the analysis window (from 2.5 seconds before to 2 seconds after caregiver response offset) into nine time bins of 0.5 seconds each. For each dyad, the babbling elicitation rate in each time bin was the number of babbles elicited in that bin divided by the total number of caregiver responses of that type. We tested the effect of caregiver response modality on babbling elicitation rates across time bins using linear mixed-effects regression models in R (lme4; Bates et al., 2015). Following previous research that used linear mixed-effects regression to analyze caregiver-infant turn-taking (Hilbrink et al., 2015; Lammertink et al., 2015), we included participant as a random effect. We then applied Type III sum of squares analyses to the regression model to assess the main effects and the interaction (“anova” function in R), and obtained post-hoc pairwise comparisons with Bonferroni adjustment (“emmeans” function in R).
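A per-dyad elicitation rate for one response modality can be sketched as below. The sketch assumes that the elicited babble is the first babble beginning after the response onset, that lags are measured from response offset (negative values mean the babble overlaps the response), and that each 0.5-s bin is labeled by its upper edge, as in the Figure 3 note (bin 1.5 covers 1 s < lag ≤ 1.5 s). The function names and data layout are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Dict, Sequence, Tuple

BIN_S = 0.5
BIN_LABELS = [k * BIN_S for k in range(-4, 5)]  # upper edges: -2.0, -1.5, ..., 2.0

def bin_label(lag_s: float) -> float:
    """Upper edge of the 0.5-s bin containing lag_s (e.g., 1.2 s -> 1.5)."""
    return math.ceil(lag_s / BIN_S) * BIN_S

def elicitation_rates(responses: Sequence[Tuple[float, float]],
                      babble_onsets: Sequence[float]) -> Dict[float, float]:
    """Per-bin elicitation rate for one dyad and one response modality.

    responses: (onset_s, offset_s) for each caregiver response of this modality.
    babble_onsets: onset times of all infant babbles in the session.
    """
    onsets = sorted(babble_onsets)
    counts = {label: 0 for label in BIN_LABELS}
    for r_on, r_off in responses:
        i = bisect_right(onsets, r_on)   # first babble beginning after response onset
        if i == len(onsets):
            continue                     # no later babble in the session
        lag = onsets[i] - r_off          # negative values: babble overlaps the response
        if -2.5 < lag <= 2.0:
            counts[bin_label(lag)] += 1
    n = len(responses)
    return {label: (c / n if n else float("nan")) for label, c in counts.items()}
```

With two vocal responses, one followed by a babble 0.3 s after its offset and one followed by no babble within the window, the rate is 0.5 in the 0.5 bin and 0 elsewhere. Dividing by all responses of the modality, rather than by elicited babbles, keeps the rates comparable across modalities that differ in frequency.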
Lastly, we focused on the initial serve-and-return to analyze how effectively the initial caregiver response extended further vocal turn-taking with infants. We counted the number of vocal turns, including both infant babbling and caregiver vocalizations (e.g., Romeo et al., 2018; Donnelly & Kidd, 2021), in turn-taking bouts following each type of caregiver response. In this way, we evaluated the influence of different caregiver responses on turn-taking extension at the dyadic level (Figure 1). We examined the effect of caregiver response modality on the total number of turns in each turn-taking bout with a linear mixed-effects model. We analyzed the data at the event level (i.e., each turn-taking bout was an event) and included participant as a random effect. Type III sum of squares was applied to test the main effect (“anova” function in R), and post-hoc t-tests were used to obtain pairwise comparisons (p-values were Bonferroni-corrected; “emmeans” function in R).
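For concreteness, the bout-level model can be written as a random-intercept regression. This is a sketch under assumptions not stated in the text: a random intercept for each participant (the random-effect structure is described only as “participant as a random effect”) and treatment coding with vocal responses as the reference level. With hypothetical variable names, it corresponds roughly to the lme4 call `lmer(turns ~ modality + (1 | participant))`.

$$
\mathrm{Turns}_{ij} = \beta_0 + \beta_1\,\mathrm{NonVocal}_{ij} + \beta_2\,\mathrm{Multimodal}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Here i indexes turn-taking bouts and j indexes participants; NonVocal and Multimodal are indicators for the modality of the initial caregiver response, so β1 and β2 are the expected differences in turns per bout relative to bouts that followed a vocal response, and u_j absorbs stable differences between dyads.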
Determining Chance Levels of Vocal Turn-Taking
Babbling elicitation rate is likely to be influenced by the different frequencies of caregiver response types. For example, if caregivers usually reacted to babbling by talking, then by chance alone it would be more likely for the following babble to occur immediately after a caregiver vocal response. To determine and control for chance levels of babbling elicitation, we designed a random sampling analysis to obtain chance-level babbling elicitation rates of vocal, non-vocal and multimodal responses. We compared the chance-level rate with the babbling elicitation rate calculated from observed experimental data. Significant differences between the two would indicate that the effects of caregiver responses in eliciting infant babbling were not due to chance alone.
In the random sampling analysis, we kept the number and timing of infant babbles the same as in the observed data and kept the number of caregiver responses unchanged, but assigned the caregiver responses to random timestamps within the 15-minute interaction. We then calculated the babbling elicitation rate from the randomly sampled dataset, and averaged this chance elicitation rate over 1000 repetitions of randomized caregiver responses. For each caregiver response modality, we compared the randomized with the observed elicitation rates and tested their difference with a one-sample t-test at each time bin. Each time bin was 0.5 s in duration, and the bins were labeled from 2 seconds before to 2 seconds after caregiver response offset (Elmlinger et al., 2019). A Bonferroni correction for multiple comparisons was applied to the p-values (corrected p = raw p × 9 tests; Jafari & Ansari-Pour, 2019; Chen et al., 2017). Significant differences between randomized and observed elicitation rates indicated that the efficacy of a given caregiver response modality in eliciting infant babbling differed from chance in that time bin.
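The random sampling procedure can be sketched in Python as follows. This is a hypothetical illustration with illustrative function names: babble onsets stay fixed, each caregiver response is moved to a uniformly random start time within the 15-minute session while keeping its duration (the text specifies only that the number of responses was kept), the per-bin elicitation rate is recomputed, and the result is averaged over 1000 repetitions. The t statistic assumes the one-sample test is applied to per-dyad differences between observed and chance rates; p-values would require a t distribution (for example from SciPy) and are omitted.

```python
import math
import random
from bisect import bisect_right
from statistics import mean, stdev
from typing import List, Sequence, Tuple

SESSION_S = 15 * 60.0   # 15-minute play session, in seconds
BIN_S = 0.5             # bin width
N_BINS = 9              # bins labeled by upper edge: -2.0, -1.5, ..., 2.0

Response = Tuple[float, float]  # (onset_s, offset_s)

def binned_rates(responses: Sequence[Response], babble_onsets: Sequence[float]) -> List[float]:
    """Per-bin elicitation rate; index 0 is the bin (-2.5, -2.0] s."""
    onsets = sorted(babble_onsets)
    counts = [0] * N_BINS
    for r_on, r_off in responses:
        i = bisect_right(onsets, r_on)          # first babble starting after response onset
        if i < len(onsets):
            lag = onsets[i] - r_off             # seconds relative to response offset
            if -2.5 < lag <= 2.0:
                counts[math.ceil(lag / BIN_S) + 4] += 1
    n = len(responses)
    return [c / n for c in counts] if n else [0.0] * N_BINS

def chance_rates(responses: Sequence[Response], babble_onsets: Sequence[float],
                 n_reps: int = 1000, seed: int = 0) -> List[float]:
    """Mean per-bin rate after placing each response at a uniformly random time."""
    rng = random.Random(seed)
    totals = [0.0] * N_BINS
    for _ in range(n_reps):
        placed = []
        for on, off in responses:
            dur = off - on                      # assumption: keep each response's duration
            start = rng.uniform(0.0, SESSION_S - dur)
            placed.append((start, start + dur))
        for k, rate in enumerate(binned_rates(placed, babble_onsets)):
            totals[k] += rate
    return [t / n_reps for t in totals]

def one_sample_t(diffs: Sequence[float]) -> float:
    """t statistic for H0: mean(observed - chance) == 0 across dyads."""
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

A fixed seed makes the 1000-repetition average reproducible; the same chance rates can then be subtracted from each dyad’s observed rates bin by bin before testing.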
A potential issue with the random sampling method is that babbling is not uniformly distributed over time but is usually produced in temporal bursts (Abney et al., 2017). A completely random sampling of timepoints might therefore place caregiver responses in a temporal distribution that does not match the bursty pattern of infant vocalizing. To account for this, we also assigned caregiver responses only to timepoints at which caregiver responses had actually occurred in the interaction; essentially, we shuffled the timestamps of the caregiver responses. In this way, we avoided assigning caregiver responses to timepoints at which they were realistically unlikely to occur during the play session. We again averaged the chance elicitation rate over 1000 repetitions of shuffled caregiver responses. The chance-level elicitation rates from this shuffled sampling analysis were indistinguishable from those of the random sampling analysis, so the shuffled sampling results are not shown in the graphs.
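The shuffled variant differs from random sampling only in where responses may land. One way to read “shuffled the timestamps” is to permute the modality labels across the observed caregiver response times: every modality keeps its count, but each shuffled response sits where some caregiver response actually occurred. The sketch below follows that interpretation; the `(modality, onset_s, offset_s)` layout and all function names are hypothetical, and `rate_fn` stands for any function that maps a list of `(onset, offset)` pairs to per-bin rates.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Response = Tuple[str, float, float]  # (modality, onset_s, offset_s)

def shuffle_timestamps(responses: Sequence[Response], rng: random.Random) -> List[Response]:
    """Permute modality labels across the observed response times.

    Each modality keeps its count, and every shuffled response occupies a time
    at which some caregiver response actually occurred.
    """
    labels = [m for m, _, _ in responses]
    rng.shuffle(labels)
    return [(m, on, off) for m, (_, on, off) in zip(labels, responses)]

def by_modality(responses: Sequence[Response]) -> Dict[str, List[Tuple[float, float]]]:
    """Split responses into (onset, offset) lists keyed by modality."""
    out: Dict[str, List[Tuple[float, float]]] = {}
    for m, on, off in responses:
        out.setdefault(m, []).append((on, off))
    return out

def shuffled_chance(responses: Sequence[Response],
                    rate_fn: Callable[[List[Tuple[float, float]]], List[float]],
                    n_reps: int = 1000, seed: int = 0) -> Dict[str, List[float]]:
    """Average rate_fn over n_reps shuffles, separately for each modality."""
    rng = random.Random(seed)
    sums: Dict[str, List[float]] = {}
    for _ in range(n_reps):
        for m, resp in by_modality(shuffle_timestamps(responses, rng)).items():
            rates = rate_fn(resp)
            acc = sums.setdefault(m, [0.0] * len(rates))
            for k, r in enumerate(rates):
                acc[k] += r
    return {m: [s / n_reps for s in acc] for m, acc in sums.items()}
```

Because the set of response times is unchanged, this null model preserves the bursty timing of caregiver activity while breaking the link between a response’s modality and the babbling that follows it.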
Results
Caregiver Response Modality
Infants babbled on average 68.49 times (SD = 38.14) over the 15-minute session. Caregivers responded to an average of 61.40% of babbles (SD = 36.85%). Specifically, caregivers responded to 47.34% of infant babbles with vocal responses (SD = 30.39%), 5.52% with non-vocal responses (SD = 6.35%), and 8.54% with multimodal responses (SD = 9.16%; Figure 2). For each dyad, we calculated these percentages by dividing the number of babbles that received a given response by the total number of infant babbles, and then calculated means and standard deviations across dyads.
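The per-dyad-then-across-dyads computation can be sketched as below. The dictionary layout and helper names are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used. Because each babble receives at most one response category, the three modality percentages add up to the overall response percentage, as 47.34% + 5.52% + 8.54% = 61.40% does above.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple

MODALITIES = ("vocal", "nonvocal", "multimodal")

def dyad_percentages(n_babbles: int, responded: Dict[str, int]) -> Dict[str, float]:
    """Percent of one dyad's babbles that received each response modality."""
    pct = {m: 100.0 * responded.get(m, 0) / n_babbles for m in MODALITIES}
    # Each babble gets at most one response category, so the parts add up.
    pct["any"] = sum(pct[m] for m in MODALITIES)
    return pct

def summarize(dyads: Sequence[Tuple[int, Dict[str, int]]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample SD, across dyads, of each per-dyad percentage."""
    per_dyad = [dyad_percentages(n, counts) for n, counts in dyads]
    summary = {}
    for key in (*MODALITIES, "any"):
        values = [d[key] for d in per_dyad]
        summary[key] = (mean(values), stdev(values))
    return summary
```

Averaging per-dyad percentages weights every dyad equally; pooling all babbles across dyads first would instead give more weight to infants who babbled more.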
Figure 2. Overview of Findings.
Note. Caregivers responded to infant babbling most often with vocal responses. Caregiver vocal responses were more likely than non-vocal or multimodal responses to elicit a subsequent infant babble, whereas the latter two were more likely to end a vocal turn-taking interaction. Bold lines indicate significance (α = 0.05).
Caregiver vocal responses had a higher babbling elicitation rate than non-vocal or multimodal responses (Figure 3A). Type III sum of squares analyses revealed a significant interaction between response modality and time bin on babbling elicitation rates (F(16, 1200) = 14.11, p < 0.001). Post-hoc pairwise t-tests with Bonferroni correction showed that elicitation rates differed significantly between vocal and non-vocal responses, and between vocal and multimodal responses, at multiple time bins (Table 3). There was no significant difference between non-vocal and multimodal responses at any time bin. Infant babbles elicited by caregiver vocal responses occurred most frequently 0–0.5 s after caregiver response offset, whereas those elicited by non-vocal or multimodal responses were distributed more uniformly, without a peak in timing (Figure 3A). This effect held when we compared the results with chance-level rates: the babbling elicitation rate of caregiver vocal responses differed significantly from chance at multiple time bins, whereas those of non-vocal and multimodal responses mostly did not (Figure 3B–D; Table 3).
Figure 3. Babbling Elicitation Rates of Caregiver Vocal, Non-Vocal and Multimodal Responses, Compared to Chance, Two Seconds Before and After Caregiver Responses.
Note. The x-axis is time (seconds) after caregiver response offset, zero indicating caregivers’ response offset. Negative values along the x-axis indicate overlap between elicited babbling and caregiver responses, while positive values indicate silent gaps between turns. Each time bin is 0.5 seconds in duration. For example, 1.5 on the x-axis represents 1 to 1.5 seconds after the caregiver response offset (1 s < interval ≤ 1.5 s). The y-axis is babbling elicitation rate, calculated as the number of babbles elicited divided by the total number of caregiver responses within each time bin. A) Comparison of babbling elicitation rates of different caregiver responses. Caregiver vocal responses were more likely to elicit infant babbling than non-vocal or multimodal responses. Infant babbling elicited by caregiver vocal responses occurred most frequently around 0–0.5s after caregiver response offset. B-D) Comparisons of babbling elicitation rates of responses to chance levels, with one-sample t-tests for each of the 9 time bins. Bonferroni-corrected results can be found in Table 3. * p < .05, ** p < .01, *** p < .001.
Table 3.
Statistical Results From t-tests Comparing the Babbling Elicitation Rates of Caregiver Vocal, Non-vocal and Multimodal Responses in Observed Experimental Data and Sampling Data
| Comparison of babbling elicitation rates between | Time bin (s) | β | t | p (raw) | Number of comparisons | p (Bonferroni-corrected) |
|---|---|---|---|---|---|---|
| Experimental data of vocal and non-vocal (Figure 3A) | -2 | 0.001 | 0.22 | 0.83 | 3 | 1.00 |
| | -1.5 | 0.002 | 0.40 | 0.69 | 3 | 1.00 |
| | -1 | 0.01 | 2.90 | 0.004 | 3 | 0.01 |
| | -0.5 | 0.02 | 6.26 | < 0.0001 | 3 | < 0.001 |
| | 0 | 0.05 | 10.20 | < 0.0001 | 3 | < 0.001 |
| | 0.5 | 0.06 | 13.26 | < 0.0001 | 3 | < 0.001 |
| | 1 | 0.05 | 10.07 | < 0.0001 | 3 | < 0.001 |
| | 1.5 | 0.03 | 7.47 | < 0.0001 | 3 | < 0.001 |
| | 2 | 0.02 | 4.75 | < 0.0001 | 3 | < 0.001 |
| Experimental data of vocal and multimodal (Figure 3A) | -2 | 0.002 | 0.33 | 0.74 | 3 | 1.00 |
| | -1.5 | 0.004 | 0.85 | 0.39 | 3 | 1.00 |
| | -1 | 0.01 | 2.56 | 0.01 | 3 | 0.03 |
| | -0.5 | 0.03 | 7.15 | < 0.0001 | 3 | < 0.001 |
| | 0 | 0.04 | 9.61 | < 0.0001 | 3 | < 0.001 |
| | 0.5 | 0.06 | 13.14 | < 0.0001 | 3 | < 0.001 |
| | 1 | 0.05 | 10.27 | < 0.0001 | 3 | < 0.001 |
| | 1.5 | 0.03 | 7.38 | < 0.0001 | 3 | < 0.001 |
| | 2 | 0.02 | 4.77 | < 0.0001 | 3 | < 0.001 |
| Experimental data of non-vocal and multimodal (Figure 3A) | -2 | 0.0005 | 0.11 | 0.91 | 3 | 1.00 |
| | -1.5 | 0.002 | 0.45 | 0.65 | 3 | 1.00 |
| | -1 | 0.002 | 0.34 | 0.74 | 3 | 1.00 |
| | -0.5 | 0.004 | 0.89 | 0.38 | 3 | 1.00 |
| | 0 | 0.003 | 0.59 | 0.56 | 3 | 1.00 |
| | 0.5 | 0.0005 | 0.12 | 0.91 | 3 | 1.00 |
| | 1 | 0.001 | 0.21 | 0.84 | 3 | 1.00 |
| | 1.5 | 0.0004 | 0.09 | 0.93 | 3 | 1.00 |
| | 2 | 0.00007 | 0.02 | 0.99 | 3 | 1.00 |
| Experimental data and random sampling data of vocal (Figure 3B) | -2 | 0.03 | 32.63 | < 2.2e-16 | 9 | < 0.001 |
| | -1.5 | 0.02 | 12.94 | < 2.2e-16 | 9 | < 0.001 |
| | -1 | 0.01 | 3.47 | 0.001 | 9 | 0.009 |
| | -0.5 | 0.01 | 1.66 | 0.100 | 9 | 0.90 |
| | 0 | 0.03 | 3.72 | 0.0005 | 9 | 0.005 |
| | 0.5 | 0.04 | 6.34 | 8.859e-8 | 9 | < 0.001 |
| | 1 | 0.03 | 3.37 | 0.001 | 9 | 0.01 |
| | 1.5 | 0.01 | 2.50 | 0.016 | 9 | 0.14 |
| | 2 | 0.0003 | 0.77 | 0.568 | 9 | 1.00 |
| Experimental data and random sampling data of non-vocal (Figure 3C) | -2 | 0.002 | 4.18 | 9.578e-5 | 9 | 0.0009 |
| | -1.5 | 0.0004 | 0.25 | 0.785 | 9 | 1.00 |
| | -1 | 0.001 | 1.37 | 0.133 | 9 | 1.00 |
| | -0.5 | 0.005 | 1.66 | 0.100 | 9 | 0.90 |
| | 0 | 0.003 | 1.66 | 0.100 | 9 | 0.90 |
| | 0.5 | 0.002 | 0.77 | 0.611 | 9 | 1.00 |
| | 1 | 0.002 | 0.98 | 0.624 | 9 | 1.00 |
| | 1.5 | 0.0007 | 0.46 | 0.710 | 9 | 1.00 |
| | 2 | 0.0005 | 0.34 | 0.721 | 9 | 1.00 |
| Experimental data and random sampling data of multimodal (Figure 3D) | -2 | 0.004 | 16.01 | < 2.2e-16 | 9 | < 0.001 |
| | -1.5 | 0.004 | 7.32 | 2.55e-13 | 9 | < 0.001 |
| | -1 | 0.001 | 0.86 | 0.542 | 9 | 1.00 |
| | -0.5 | 0.001 | 0.66 | 0.561 | 9 | 1.00 |
| | 0 | 0.004 | 1.63 | 0.213 | 9 | 1.00 |
| | 0.5 | 0.001 | 0.70 | 0.789 | 9 | 1.00 |
| | 1 | 0.0002 | 0.17 | 0.890 | 9 | 1.00 |
| | 1.5 | 0.0005 | 0.42 | 0.752 | 9 | 1.00 |
| | 2 | 0.001 | 0.54 | 0.766 | 9 | 1.00 |
Note. Experimental data comparisons: results of post-hoc pairwise t-tests comparing the babbling elicitation rates of caregiver vocal, non-vocal and multimodal responses in the observed experimental data. Bonferroni corrections were applied with the “emmeans” package in R (adjust = "bonferroni") to obtain the corrected p-values. Experimental versus random sampling data comparisons: results of one-sample t-tests comparing, at each time bin, the babbling elicitation rates of caregiver vocal, non-vocal or multimodal responses in the observed experimental data with those in the random sampling data. For these tests, we applied Bonferroni corrections by multiplying the raw p-values by the number of tests, capping the result at 1, to compute the corrected p-values.
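The multiply-and-cap correction described in this note is simple enough to state as code. The following is a minimal sketch in Python (our choice of language for illustration; the analyses themselves were run in R), applied to the raw p-values from the non-vocal rows of the table above. The function name `bonferroni` is hypothetical; R's `p.adjust(p, method = "bonferroni")` applies the same rule.

```python
def bonferroni(p_values):
    """Bonferroni-adjust raw p-values: multiply each by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Raw p-values for the nine time bins (-2 s to 2 s) of the non-vocal comparison above.
raw = [9.578e-5, 0.785, 0.133, 0.100, 0.100, 0.611, 0.624, 0.710, 0.721]
adjusted = bonferroni(raw)
```

Rounded to four decimals, `adjusted` reproduces the corrected column for those rows (0.0009, then 1.00 or 0.90). Because the published raw p-values are themselves rounded, recomputed values for other rows may differ slightly in the last digit.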
To validate this result, we analyzed how each caregiver response modality ended turn-taking by stopping infant babbling after the first turn. As expected, infants were more likely to stop babbling after caregivers’ non-vocal or multimodal responses. A one-way repeated-measures ANOVA revealed a main effect of caregiver response modality on the proportion of bouts that extinguished after a caregiver response (F (2, 69) = 7.955, p < 0.01; Figure 4A). Post-hoc paired t-tests with Bonferroni correction showed that significantly higher proportions of bouts extinguished after caregiver non-vocal responses than after vocal responses (t (30) = 3.66, p-corrected < 0.01), and after multimodal responses than after vocal responses (t (39) = 3.16, p-corrected = 0.03). There was no difference between non-vocal and multimodal responses (t (26) = -0.697, p-corrected = 0.91). These t-tests were conducted at the subject level because each subject contributed a proportion of bouts that extinguished after caregiver vocal, non-vocal and multimodal responses.
Figure 4. Ending and Extension of Caregiver-Infant Vocal Turn-Taking, after Caregiver Vocal, Non-Vocal and Multimodal Responses.
Note. A) The proportion of bouts that extinguished at the caregiver response turn. Significantly more bouts ended after caregiver non-vocal or multimodal responses than after caregiver vocal responses. B) The total number of turns per bout, calculated as the sum of the initial two turns (infant babble and caregiver response) and the number of subsequent vocal turns. Bouts that followed caregiver vocal responses contained significantly more turns than those that followed non-vocal responses, and marginally more turns than those that followed multimodal responses (indicated with ^). C) The number of turns per bout contributed by infants and by caregivers in turn-taking bouts following caregiver vocal, non-vocal and multimodal responses. Bouts following caregiver vocal responses contained significantly more infant turns than bouts following multimodal responses. * p < .05, ** p < .01, *** p < .001.
Looking beyond the initial serve-and-return interaction, turn-taking bouts following caregiver vocal, non-vocal and multimodal responses contained significantly different numbers of turns (F (2, 807.75) = 4.908, p = 0.008; Figure 4B). Post-hoc t-tests with Bonferroni correction showed that bouts following vocal responses (M = 4.88, SD = 1.56) comprised significantly more turns than those following non-vocal responses (M = 3.86, SD = 3.37; β = 1.17, t = 2.42, p-corrected = 0.047), and marginally more turns than those following multimodal responses (M = 3.65, SD = 2.04; β = 0.99, t = 2.26, p-corrected = 0.072). Bouts following non-vocal and multimodal responses did not differ significantly (β = 0.18, t = 0.29, p-corrected = 1.00). Thus, caregiver vocal responses were more likely to extend vocal turn-taking.
Did caregivers and infants contribute differently to the extension of turn-taking bouts? We first examined the proportions of turns taken by infants and caregivers separately. Overall, infants contributed a significantly higher proportion of turns than caregivers. Specifically, infants took a higher proportion of turns than caregivers in bouts following caregiver vocal and non-vocal responses (Table 4). Next, we assessed the average number of caregiver and infant turns per bout following caregiver vocal, non-vocal and multimodal responses, using a 3 (modality: vocal, non-vocal, multimodal) × 2 (contributor: caregivers, infants) ANOVA on the number of turns per bout. We found a main effect of modality (F (2, 213.06) = 7.14, p = 0.001) and a main effect of contributor (F (1, 195.11) = 4.70, p = 0.03; infants contributed significantly more turns than caregivers; Figure 4C). There was no significant interaction (F (2, 195.11) = 1.20, p = 0.30). Post-hoc t-tests with Bonferroni correction showed significantly more infant turns in bouts following caregiver vocal responses than in bouts following multimodal responses (β = 0.81, t = 3.43, p-corrected = 0.002). Infant turns did not differ significantly between bouts following non-vocal and multimodal responses (β = 0.42, t = 1.57, p-corrected = 0.36), or between bouts following non-vocal and vocal responses (β = 0.39, t = 1.51, p-corrected = 0.40). The number of caregiver turns did not differ across response modalities (all p-corrected > 0.16). Furthermore, bouts following caregiver vocal and non-vocal responses contained marginally more infant turns than caregiver turns (vocal: β = 0.41, t = 1.87, p-corrected = 0.063; non-vocal: β = 0.52, t = 1.85, p-corrected = 0.066). These results suggest that infants’ turns drove the extension of turn-taking bouts following caregiver vocal responses.
Table 4.
Proportions of Caregivers’ and Infants’ Turns in Turn-taking Bouts
| Turn-taking bouts following: | Caregiver turn proportion (M;SD) | Infant turn proportion (M;SD) | Comparison of proportions |
|---|---|---|---|
| Caregiver vocal responses | 47.02% (8.44%) | 52.98% (8.44%) | W = 726.5, p = 0.02 |
| Caregiver non-vocal responses | 45.40% (9.79%) | 54.60% (9.79%) | W = 113, p = 0.02 |
| Caregiver multimodal responses | 50.58% (10.46%) | 49.42% (10.46%) | W = 159, p = 0.68 |
| All responses | 46.6% (8.10%) | 53.4% (8.10%) | W = 2484.5, p = 0.02 |
Note. Infant turn proportion was calculated as the number of infant turns in a given bout divided by the total number of turns in that bout; caregiver turn proportion is its complement. Means and standard deviations are computed across subjects. We tested whether the difference between caregiver and infant turn proportions differed from 0 using a Wilcoxon signed-rank test. Overall, this difference was significantly different from 0.
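A minimal Python sketch of the computation described in this note, under stated assumptions: each bout is represented as a hypothetical list of speaker codes ("I" for infant, "C" for caregiver), proportions are averaged across each subject's bouts, and the signed-rank statistic is the textbook sum of ranks of positive differences (zeros dropped, ties given average ranks). The function names and data are illustrative only. Software packages differ in which statistic they report, so this sketch is not expected to reproduce the W values in Table 4; p-values would come from a statistics package (for example `wilcox.test` in R or `scipy.stats.wilcoxon` in Python).

```python
from statistics import mean

def infant_turn_proportion(bout):
    """Share of the turns in one bout that were taken by the infant ("I")."""
    return sum(1 for speaker in bout if speaker == "I") / len(bout)

def subject_mean_proportions(bouts):
    """Mean infant and caregiver turn proportions across one subject's bouts."""
    infant = mean(infant_turn_proportion(bout) for bout in bouts)
    return infant, 1.0 - infant

def signed_rank_statistic(differences):
    """Sum of the ranks of positive differences (Wilcoxon signed-rank statistic).

    Zero differences are dropped and tied absolute values share their average rank.
    """
    nonzero = [d for d in differences if d != 0]
    ordered = sorted(abs(d) for d in nonzero)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + j) / 2 + 1  # 1-based average rank for the tie group
        i = j + 1
    return sum(rank_of[abs(d)] for d in nonzero if d > 0)

# Hypothetical data: each bout is the sequence of speakers in that bout.
subjects = {
    "s1": [["I", "C", "I"], ["I", "C"]],
    "s2": [["I", "C", "I", "C", "I"]],
    "s3": [["I", "C"], ["I", "C", "I", "I"]],
    "s4": [["I", "C", "C"]],
}
differences = []
for bouts in subjects.values():
    infant, caregiver = subject_mean_proportions(bouts)
    differences.append(infant - caregiver)  # positive when infants took more turns
statistic = signed_rank_statistic(differences)
```

In this toy data three of the four subjects have positive infant-minus-caregiver differences, giving a statistic of 6.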
To summarize, caregivers responded to infant babbling most often vocally, and their vocal responses were more likely to elicit infant babbling and extend vocal turn-taking interactions with infants. Infants were active contributors to the extended turn-taking bouts.
Caregiver Response Relatedness
To understand what made caregiver vocal responses effective in eliciting and extending vocal turn-taking with infants, we examined caregiver response relatedness by comparing vocally related and vocally unrelated caregiver responses, using the same set of analyses we applied to compare caregiver vocal, non-vocal and multimodal responses. Out of an average of 33.94 caregiver vocal responses, 29.82% (M = 10.12, SD = 8.42 responses) were vocally related, 2.30% (M = 0.78, SD = 1.09 responses) were vocally unrelated, and the remaining 67.88% were Other (M = 23.04, SD = 15.42 responses; see Table 2). Caregiver vocally related responses had a higher babbling elicitation rate than vocally unrelated responses (Figure 5), though this effect was likely due to chance (see the random sampling comparison below). Type III sum of squares analyses revealed a significant interaction between response relatedness and time bin on babbling elicitation rates (F (8, 800) = 5.69, p < 0.01). Babbles elicited by caregiver vocally related responses occurred most frequently 0–0.5 s after caregiver response offset, whereas those elicited by vocally unrelated responses were distributed more uniformly, with no peak in timing.
Figure 5. Babbling Elicitation Rates of Different Caregiver Vocal Response Subcategories, Two Seconds Before and After Caregiver Responses.
Note. Babbling elicitation rates of different caregiver vocal responses, including related, unrelated and the four “Other” categories (narrative, imitation, affirmation, non-sequitur). The x-axis is time (seconds) after caregiver response offset, zero indicating caregivers’ response offset. Negative values along the x-axis indicate overlap between elicited babbling and caregiver responses, while positive values indicate silent gaps between turns. Each time bin is 0.5 seconds in duration. The y-axis is babbling elicitation rate, calculated as the number of babbles elicited divided by the total number of caregiver responses within each time bin.
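The elicitation rate defined in this note can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in the source: each caregiver response contributes at most one elicited babble, onsets are measured in seconds relative to the response offset (negative values meaning overlap), and each 0.5 s bin is labeled by its left edge, so bin t covers [t, t + 0.5). The function name and example onsets are hypothetical.

```python
import math

BIN_WIDTH = 0.5  # seconds, per the Figure 5 note
BIN_LABELS = [-2.0 + BIN_WIDTH * k for k in range(9)]  # -2.0, -1.5, ..., 2.0

def elicitation_rates(onsets):
    """Babbling elicitation rate in each 0.5 s time bin.

    onsets: one entry per caregiver response, giving the onset (s) of the babble it
    elicited relative to the response offset, or None if no babble followed.
    Assumes bin t covers [t, t + 0.5); onsets outside [-2.0, 2.5) are not counted.
    """
    counts = [0] * len(BIN_LABELS)
    for onset in onsets:
        if onset is None:
            continue
        k = math.floor((onset - BIN_LABELS[0]) / BIN_WIDTH)
        if 0 <= k < len(BIN_LABELS):
            counts[k] += 1
    # Rate = babbles in the bin divided by the total number of caregiver responses.
    return {label: count / len(onsets) for label, count in zip(BIN_LABELS, counts)}

# Hypothetical onsets for eight caregiver responses of one type.
rates = elicitation_rates([0.2, 0.3, -0.4, None, 1.1, None, 3.0, None])
```

Here two of the eight responses were followed by a babble in the 0 to 0.5 s bin, so `rates[0.0]` is 0.25; the babble at 3.0 s falls outside the plotted window and is not counted. A centered-bin convention would only change the `floor` expression.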
Using our random sampling procedure for determining chance levels of vocal turn-taking, we obtained chance-level babbling elicitation rates and compared them with the observed rates. Observed rates were consistent with what we would expect by chance alone. Because vocally related responses were more frequent than vocally unrelated responses (t (51.68) = 7.77, p < 0.001), an infant babble was more likely to follow a vocally related response by chance alone. In the time bins where the elicitation rates of vocally related and vocally unrelated responses differed, neither rate differed significantly from its chance level. Similarly, vocally related and vocally unrelated responses did not differ in ending turn-taking interactions: the proportions of bouts that extinguished after caregiver vocally related versus vocally unrelated responses did not differ significantly (t (1) = −0.731, p = 0.48).
We counted the number of turns per turn-taking bout and found no difference between bouts following caregiver vocally related responses (M = 4.82, SD = 2.51) and vocally unrelated responses (M = 4.12, SD = 2.95; β = 0.31, t = 0.298, p = 0.766). In sum, caregiver response relatedness did not account for the greater effectiveness of caregiver vocal responses in eliciting infant babbling and extending vocal turn-taking interactions.
Caregiver Response Durations
Rather than response relatedness, the durations of caregiver responses may have made caregiver vocal responses more effective in eliciting and extending caregiver-infant turn-taking interactions. To investigate this possibility, we compared the durations of caregiver responses that did and did not elicit infant babbling, and tested whether this difference interacted with caregiver response modality. Type III sum of squares analyses revealed main effects of babbling elicitation (F (1, 2203.4) = 6.12, p = 0.01) and response modality (F (2, 2211.6) = 96.50, p < 0.001) on caregiver response durations, but no significant interaction between the two factors (F (2, 2193.4) = 0.74, p = 0.48). Caregiver responses that elicited infant babbling were significantly shorter than those that did not (Figure 6A). To decompose the main effect of response modality, we conducted post-hoc pairwise t-tests with Bonferroni correction (“emmeans” function in R). Caregiver vocal responses were significantly shorter than non-vocal (β = 0.52, t = 12.70, p-corrected < 0.001) and multimodal responses (β = 0.30, t = 8.64, p-corrected < 0.001; Figure 6B), and caregiver multimodal responses were significantly shorter than non-vocal responses (β = 0.22, t = 4.36, p-corrected < 0.001; Figure 6B). In sum, caregiver vocal responses were shorter than non-vocal and multimodal responses.
Figure 6. Durations of Caregiver Responses.
Note. A) Durations of caregiver responses (including all modalities) that elicited babbling and those that did not. Caregiver responses that elicited babbling were significantly shorter than those that did not. B) Mean durations of caregiver vocal, non-vocal and multimodal responses in seconds. Vocal responses were significantly shorter than non-vocal and multimodal. Multimodal responses were significantly shorter than non-vocal. * p < .05, ** p < .01, *** p < .001.
Discussion
Turn-taking interactions are crucial for infants’ social, cognitive, and communicative development, and infants’ early turn-taking experiences occur primarily within dyadic interactions with caregivers. In this study, we explored which aspects of caregiver responses to babbling might facilitate extended vocal turn-taking with infants. Caregivers responded to infant babbling more often with vocal responses than with non-vocal or multimodal responses. Caregiver vocal responses were the most likely to elicit a subsequent infant babble, whereas non-vocal and multimodal responses were less likely to do so. The elicitation effect was robust across different types of caregiver speech: related and unrelated vocal responses did not differentially affect babbling. Thus, the facilitation of vocal turn-taking may be independent of the relatedness of caregivers’ contingent speech. In summary, caregiver vocal responses were more effective in eliciting infant babbling and extending vocal turn-taking with infants.
We also examined the contributions of caregivers and infants separately and found that, overall, infants contributed significantly higher proportions of turns than caregivers (Table 4). Furthermore, bouts following caregiver vocal responses contained significantly more infant turns than bouts following multimodal responses. In contrast, the number of caregiver turns did not differ across caregiver response modalities (Figure 4C). In sum, infants actively contributed to the extension of turn-taking following caregiver vocal responses.
Caregiver-infant turn-taking interactions
Overall, turn-taking interactions were common between caregivers and infants, and infants’ babbling was more often responded to than not: during free-play, caregivers responded to 61.40% of infant babbling. This response rate was higher than those reported in studies that considered only caregiver vocal responses (e.g., Fagan & Doveikis, 2017), but comparable to those of several studies across multiple ages and cultures that included both vocal and non-vocal responses. For example, mothers of eight-month-old infants responded to an average of 76% of infant babbling (Hong & Gros-Louis, 2017). In a study of six-month-old infants, mothers responded to 62% of mother-directed babbling and 52% of object-directed babbling (Gros-Louis et al., 2014). Caregivers of three-month-old infants in six sociocultural contexts (Berlin, Los Angeles, Beijing, Delhi, urban Nso and rural Nso) responded to 63.92% of babbling when observed at home (range: 57.54% - 71.80%; Kärtner et al., 2008). Caregiver responses to babbling in our study were predominantly vocal: caregivers responded vocally to 47.34% of infant babbling. This result is consistent with a previously reported overall caregiver vocal response rate of 48.13% across six sociocultural contexts (range: 45.10% - 56.71%; Kärtner et al., 2008).
Attentional constraints organize caregiver-infant turn-taking
As we predicted, when caregivers changed the modality of the interaction, infants were less likely to babble again. This effect may reflect interference from the increased attentional demands of processing a change in turn-taking modality. Increased attentional load interferes with motor performance in children and adults, ranging from somatic movements such as breathing (Denot-Ledunois et al., 1998) and producing vocalizations (Tumber et al., 2014) to real-world dyadic tasks such as playing tennis (Ducrocq et al., 2016). Like tennis, conversational turn-taking involves back-and-forth exchanges between two interactants. In adults, processing changes in a conversation partner’s voice or voice location during vocal turn-taking increases attentional load (Lin & Carlile, 2015, 2019). Similarly, when infants vocalized but caregivers responded non-vocally or multimodally, infants’ attentional demands may have increased, interfering with their motor coordination during babbling. Although hypotheses about attentional constraints are relevant to our comparisons of response modalities, we did not directly measure infant attention. Future research should experimentally manipulate real-time changes in caregiver response modality and examine their influence on both infant attention and turn-taking behavior.
Multimodal responses bring new potential pathways of learning to interactions with infants. Children rely on non-vocal behaviors such as gaze, pointing, and body orientation to learn the referents of novel words (Baldwin et al., 1996; Grassmann & Tomasello, 2010; Kory Westlund et al., 2017). Caregivers’ touching of and pointing to objects, and their smiling at infants, predict infants’ social and vocabulary outcomes (Ruddy & Bornstein, 1982; Pearson et al., 2011). Indeed, infants’ attention is particularly guided by caregivers’ manual actions and object handling (Yu et al., 2009; Yu & Smith, 2012; Deák et al., 2014). By creating joint-attention interactions between caregivers and infants, non-vocal and multimodal responses afford unique opportunities for infants to learn object-naming nouns (Tomasello & Farrar, 1986).
Despite these learning opportunities, we found that non-vocal and multimodal responses were significantly less likely to elicit infant babbling. Although many of the non-vocal and multimodal interactions were triadic (caregiver-infant-object), there is little evidence that triadic interactions by themselves reduce infant babbling. For example, Goldstein et al. (2010) counted the vocalizations infants produced while holding and looking at objects and found a babbling rate of 5.16 babbles/minute, comparable to the babbling rate in our study (4.57 babbles/minute). Infants therefore do not appear to reduce or stop vocalizing simply because they are looking at, holding, or playing with objects. We suggest instead that the lower likelihood of babbling after caregiver non-vocal or multimodal responses was due to the increased attentional demands of processing changes in turn-taking modality.
Infants seem less flexible than adults in their ability to change modalities during social interaction. Mature conversationalists must flexibly shift the target of their attention across multiple modalities while maintaining a coherent, continuous conversation with their social partner (Kolodny & Edelman, 2015). In contrast, contingent caregiver responses tend to match the modality of infants’ behavior, so that infants are not required to change modalities flexibly to maintain turn-taking. During free-play with 10-month-old infants, caregivers are more likely to provide contingent multimodal responses to infants’ multimodal behaviors, and non-vocal responses to infants’ gestures (van der Klis et al., 2023). Likewise, caregivers are more likely to smile (a non-vocal behavior) following infants’ gaze-at-face (another non-vocal behavior; Rohlfing et al., 2019). Through 9 months of daily interactions with caregivers, infants may have developed an expectation to receive and produce responses within modalities rather than across modalities. Our results suggest that multimodal flexibility during early turn-taking may be a learned capacity, grounded in attention and working memory constraints, that changes over development.
In addition to the effect of caregiver response modality, we found an effect of response duration. Caregiver responses that elicited infant babbling were significantly shorter than those that did not (Figure 6A), and caregiver vocal responses were significantly shorter than non-vocal and multimodal responses (Figure 6B). The effect of caregiver response duration may be grounded in infants’ processing constraints early in development. Newman & Simpson (2023) demonstrated working memory constraints in 11-month-old infants, who could remember two, but not four, syllables of 350 milliseconds each, regardless of whether the syllables were linguistic or non-linguistic. This finding suggests that 11-month-olds’ auditory working memory capacity lies between 0.7 seconds (two syllables) and 1.4 seconds (four syllables). In the present study, the mean duration of caregiver vocal responses was 1.02 seconds, below this 1.4-second upper limit. In contrast, the mean durations of caregiver non-vocal (1.51 seconds) and multimodal (1.33 seconds) responses were significantly longer than that of vocal responses, with non-vocal responses exceeding the upper limit. It is thus possible that caregiver vocal responses elicited more infant babbling because they were shorter, more likely to be remembered by infants, and more likely to hold their attention. In contrast, the longer durations of caregivers’ non-vocal and multimodal responses may have exceeded the working memory capacity of a nine-month-old.
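The bounds used in this argument follow from simple arithmetic on the syllable durations reported by Newman & Simpson (2023). The worked comparison below is a sketch that assumes, as our simplification, that capacity can be expressed as the total duration of the syllables retained; $C_{11\,\text{mo}}$ and $\bar{d}$ are notation introduced here for illustration.

```latex
\begin{aligned}
2 \times 0.35\ \text{s} &= 0.70\ \text{s} && \text{(two syllables: remembered)}\\
4 \times 0.35\ \text{s} &= 1.40\ \text{s} && \text{(four syllables: not remembered)}\\
\Rightarrow\quad 0.70\ \text{s} &\le C_{11\,\text{mo}} < 1.40\ \text{s}\\[4pt]
\bar{d}_{\text{vocal}} = 1.02\ \text{s} &< 1.40\ \text{s}, \qquad
\bar{d}_{\text{multimodal}} = 1.33\ \text{s} < 1.40\ \text{s}, \qquad
\bar{d}_{\text{non-vocal}} = 1.51\ \text{s} > 1.40\ \text{s}
\end{aligned}
```

All three mean durations exceed the 0.70 s that 11-month-olds were shown to retain, and only non-vocal responses exceed the 1.40 s bound. The interval alone therefore does not separate vocal from multimodal responses; the argument assumes that shorter responses are more likely to fall within a nine-month-old's capacity, which would be no larger than an 11-month-old's if capacity grows with age.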
In addition to working memory capacity, the effect of caregiver response duration may arise from infants’ expectation of rapid turn-taking. Mature vocal turn-taking consists of fast exchanges with tight timing (Stivers et al., 2009; Templeton et al., 2022), and adult utterances are shorter in conversational turn-taking than in monologue (Marklund et al., 2015). By 5 months, infants expect social responses to their vocalizations (Elmlinger et al., 2022). Our results suggest that, through experience with caregivers, 9-month-old infants may have developed an expectation of rapid responses to their babbling. Responses that deviate from this expectation might impose an attentional cost that interferes with vocal production. This interpretation is consistent with previous studies showing that short caregiver speech responses gave infants more opportunities to vocalize, and that caregivers’ frequent pauses in turn-taking correlated with infants’ vocal participation (Elmlinger et al., 2019; Kiepura et al., 2021). However, these results were all obtained during naturalistic free-play. Future studies should experimentally manipulate response duration to assess the sensitivity of vocal turn-taking to the temporal patterns of social behavior.
The development of adaptive skills emerges from learning opportunities embedded in real-time interaction (Smith & Karmazyn-Raz, 2022). Many studies of caregiver-infant turn-taking, however, have relied on day-long audio recordings to relate turn-taking experience to developmental outcomes (e.g., Donnelly & Kidd, 2021; Zimmerman et al., 2009; Gilkerson et al., 2018). As we have shown, examining the multimodal structure of interaction as it unfolds over short timescales can reveal learning opportunities that extend beyond the initial serve-and-return that initiates an interaction (Goldstein & Schwade, 2008, 2010; Warlaumont et al., 2014).
Limitations
Our focus on turn-taking as a dyadic-level phenomenon introduced some potential limitations. Rather than focusing on individual pairs of exchanges, our measures of turn-taking bouts included extended sequences of caregiver and infant vocal turns. We defined a turn-taking bout as starting with an exchange between caregiver and infant and including all subsequent vocal turns by both partners. Consecutive turns by the same speaker (following the initial serve-and-return) were each counted when measuring the number of turns in a bout. Thus, we examined the influence of caregiver response modality on turn-taking extension at the dyadic level, rather than on individual caregiver-infant exchanges.
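The counting rule described above, which also underlies the turn totals in Figure 4B, can be illustrated with a small Python sketch. It assumes a hypothetical representation in which each bout is a list of (speaker, modality) events; the function name, the event labels, and the choice to ignore non-vocal events after the caregiver response are our assumptions, not the coding scheme used in the study.

```python
from collections import Counter

def summarize_bout(bout):
    """Summarize one turn-taking bout given as a list of (speaker, modality) events.

    The bout must open with an infant babble and a caregiver response in any modality.
    Every later vocal turn counts, including consecutive turns by the same speaker;
    later events not tagged "vocal" are ignored (our assumption).
    """
    if len(bout) < 2 or bout[0] != ("infant", "vocal") or bout[1][0] != "caregiver":
        raise ValueError("bout must start with an infant babble and a caregiver response")
    counted = bout[:2] + [event for event in bout[2:] if event[1] == "vocal"]
    turns = Counter(speaker for speaker, _ in counted)
    return {
        "response_modality": bout[1][1],
        "total_turns": len(counted),
        "infant_turns": turns["infant"],
        "caregiver_turns": turns["caregiver"],
        # No vocal turn after the caregiver response: the bout extinguished there.
        "extinguished": len(counted) == 2,
    }

# Hypothetical bouts: one extended after a vocal response, one ending after a non-vocal one.
extended = summarize_bout([("infant", "vocal"), ("caregiver", "vocal"),
                           ("infant", "vocal"), ("infant", "vocal"), ("caregiver", "vocal")])
ended = summarize_bout([("infant", "vocal"), ("caregiver", "non-vocal")])
```

The first bout counts five turns (three infant, two caregiver), including the two consecutive infant turns; the second ends at the caregiver response and would contribute to the extinguished proportion for non-vocal responses.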
Conclusion
Our study examined the proximal effect of caregiver response modality on extended caregiver-infant turn-taking interactions. Caregiver vocal responses were significantly more likely to elicit additional infant babbling and extend vocal turn-taking bouts. Caregiver non-vocal and multimodal responses were less likely to do so. Future studies should examine whether the effects of response modality arise from working memory and attentional constraints of early development, and how the effects of caregiver responsiveness may change longitudinally as infants develop and learn from accumulating turn-taking experiences.
Acknowledgements
This research was supported by grants from the National Science Foundation (0844015) and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (5R03HD61524-2). We thank undergraduate members of the Eleanor J. Gibson Laboratory for Developmental Psychology for their assistance with data collection and coding. We thank all families who participated in this research. The authors declare no conflicts of interest with regard to the funding source for this study.
References
- Abney DH, Warlaumont AS, Oller DK, Wallot S, & Kello CT (2017). Multiple coordination patterns in infant and adult vocalizations. Infancy, 22(4), 514–539.
- Abu-Zhaya R, Seidl A, & Cristia A. (2017). Multimodal infant-directed communication: How caregivers combine tactile and linguistic cues. Journal of Child Language, 44(5), 1088–1116.
- Albert RR, Schwade JA, & Goldstein MH (2018). The social functions of babbling: acoustic and contextual characteristics that facilitate maternal responsiveness. Developmental Science, 21(5), e12641.
- Baldwin DA, Markman EM, Bill B, Desjardins RN, Irwin JM, & Tidball G. (1996). Infants’ reliance on a social criterion for establishing word-object relations. Child Development, 67(6), 3135–3153.
- Bates D, Mächler M, Bolker B, & Walker S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
- Bateson MC (1975). Mother-infant exchanges: the epigenesis of conversational interaction. Annals of the New York Academy of Sciences, 263(1), 101–113.
- Bloom K, Russell A, & Wassenberg K. (1987). Turn taking affects the quality of infant vocalizations. Journal of Child Language, 14(2), 211–227.
- Bögels S, & Torreira F. (2015). Listeners use intonational phrase boundaries to project turn ends in spoken interaction. Journal of Phonetics, 52, 46–57.
- Boiten FA, Frijda NH, & Wientjes CJ (1994). Emotions and respiratory patterns: review and critical analysis. International Journal of Psychophysiology, 17(2), 103–128.
- Buder EH, Warlaumont AS, Oller DK, Peter B, & MacLeod A. (2013). An acoustic phonetic catalog of prespeech vocalizations from a developmental perspective. Comprehensive perspectives on child speech development and disorders: Pathways from linguistic theory to clinical practice, 4, 103–34.
- Carozza S, & Leong V. (2021). The role of affectionate caregiver touch in early neurodevelopment and parent-infant interactional synchrony. Frontiers in Neuroscience, 14(613378), 1–11.
- Chen SY, Feng Z, & Yi X. (2017). A general introduction to adjustment for multiple comparisons. Journal of Thoracic Disease, 9(6), 1725–1729.
- Cohen J. (1968). Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213.
- Deák GO, Krasno A, Triesch J, Lewis J, & Sepeda L. (2014). Watch the hands: Human infants can learn gaze-following by watching their parents handle objects. Developmental Science, 17(2), 270–281.
- DeLong KA, Urbach TP, & Kutas M. (2005). Probabilistic word pre-activation during language comprehension inferred from electrical brain activity. Nature Neuroscience, 8(8), 1117–1121.
- Denot-Ledunois S, Vardon G, Perruchet P, & Gallego J. (1998). The effect of attentional load on the breathing pattern in children. International Journal of Psychophysiology, 29(1), 13–21.
- De Ruiter JP, Mitterer H, & Enfield NJ (2006). Projecting the end of a speaker’s turn: A cognitive cornerstone of conversation. Language, 82(3), 515–535.
- de Villiers Rader N, & Zukow-Goldring P. (2012). Caregivers’ gestures direct infant attention during early word learning: the importance of dynamic synchrony. Language Sciences, 34(5), 559–568.
- Donnelly S, & Kidd E. (2021). The longitudinal relationship between conversational turn-taking and vocabulary growth in early language development. Child Development, 92(2), 609–625.
- Ducrocq E, Wilson M, Vine S, & Derakshan N. (2016). Training attentional control improves cognitive and motor task performance. Journal of Sport and Exercise Psychology, 38(5), 521–533.
- Elmlinger SL, Schwade JA, & Goldstein MH (2019). The ecology of prelinguistic vocal learning: Parents simplify the structure of their speech in response to babbling. Journal of Child Language, 46(5), 998–1011.
- Elmlinger SL, Schwade JA, Vollmer L, & Goldstein MH (2022). Learning how to learn from social feedback: The origins of early vocal development. Developmental Science, e13296.
- Esteve-Gibert N, Prieto P, & Pons F. (2015). Nine-month-old infants are sensitive to the temporal alignment of prosodic and gesture prominences. Infant Behavior and Development, 38, 126–129.
- Fagan MK, & Doveikis KN (2017). Ordinary interactions challenge proposals that maternal verbal responses shape infant vocal development. Journal of Speech, Language, and Hearing Research, 60(10), 2819–2827.
- Farley SD, Ashcraft AM, Stasson MF, & Nusbaum RL (2010). Nonverbal reactions to conversational interruption: A test of complementarity theory and the status/gender parallel. Journal of Nonverbal Behavior, 34(4), 193–206.
- Frank MC, Braginsky M, Yurovsky D, & Marchman VA (2017). Wordbank: An open repository for developmental vocabulary data. Journal of Child Language, 44(3), 677–694.
- Gilkerson J, Richards JA, Warren SF, Oller DK, Russo R, & Vohr B. (2018). Language experience in the second year of life and language outcomes in late childhood. Pediatrics, 142(4).
- Goldstein MH, King AP, & West MJ (2003). Social interaction shapes babbling: Testing parallels between birdsong and speech. Proceedings of the National Academy of Sciences, 100(13), 8030–8035.
- Goldstein MH, & Schwade JA (2008). Social feedback to infants’ babbling facilitates rapid phonological learning. Psychological Science, 19(5), 515–523.
- Goldstein MH, & Schwade JA (2010). From birds to words: Perception of structure in social interactions guides vocal development and language learning. In Blumberg MS, Freeman JH, & Robinson SR (Eds.), Oxford handbook of developmental behavioral neuroscience (pp. 708–729). Oxford University Press.
- Goldstein MH, Schwade J, Briesch J, & Syal S. (2010). Learning while babbling: Prelinguistic object-directed vocalizations indicate a readiness to learn. Infancy, 15(4), 362–391.
- Gómez E, & Strasser K. (2021). Language and socioemotional development in early childhood: The role of conversational turns. Developmental Science, 24(5), e13109.
- Grassmann S, & Tomasello M. (2010). Young children follow pointing over words in interpreting acts of reference. Developmental Science, 13(1), 252–263.
- Gratier M, Devouche E, Guellai B, Infanti R, Yilmaz E, & Parlato-Oliveira E. (2015). Early development of turn-taking in vocal interaction between mothers and infants. Frontiers in Psychology, 6(1167), 236–245.
- Gros-Louis J, West MJ, Goldstein MH, & King AP (2006). Mothers provide differential feedback to infants’ prelinguistic sounds. International Journal of Behavioral Development, 30(6), 509–516.
- Gros-Louis J, West MJ, & King AP (2014). Maternal responsiveness and the development of directed vocalizing in social interactions. Infancy, 19(4), 385–408.
- Hilbrink EE, Gattis M, & Levinson SC (2015). Early developmental changes in the timing of turn-taking: a longitudinal study of mother–infant interaction. Frontiers in Psychology, 6(1492), 246–257.
- Hong Y, & Gros-Louis J. (2017). Parental verbal responsiveness during prelinguistic vocal development: Variability and association with language outcomes (Doctoral dissertation, University of Iowa).
- Jafari M, & Ansari-Pour N. (2019). Why, when and how to adjust your P values? Cell Journal, 20(4), 604–607.
- Kärtner J, Keller H, Lamm B, Abels M, Yovsi RD, Chaudhary N, & Su Y. (2008). Similarities and differences in contingency experiences of 3-month-olds across sociocultural contexts. Infant Behavior and Development, 31(3), 488–500.
- Kaye K. (1977). Toward the origin of dialogue. In Schaffer HR (Ed.), Studies in mother-infant interaction (pp. 89–117). London: Academic Press.
- Kiepura E, Niedźwiecka A, & Kmita G. (2021). Silence matters: The role of pauses during dyadic maternal and paternal vocal interactions with preterm and full-term infants. Journal of Child Language, 49(3), 451–468.
- Kolodny O, & Edelman S. (2015). The problem of multimodal concurrent serial order in behavior. Neuroscience & Biobehavioral Reviews, 56, 252–265.
- Kory Westlund JM, Dickens L, Jeong S, Harris PL, DeSteno D, & Breazeal CL (2017). Children use non-verbal cues to learn new words from robots as well as people. International Journal of Child-Computer Interaction, 13, 1–9.
- Lammertink I, Casillas M, Benders T, Post B, & Fikkert P. (2015). Dutch and English toddlers’ use of linguistic cues in predicting upcoming turn transitions. Frontiers in Psychology, 6(495), 274–291.
- Levinson SC (2016). Turn-taking in human communication–origins and implications for language processing. Trends in Cognitive Sciences, 20(1), 6–14.
- Lew-Williams C, Ferguson B, Abu-Zhaya R, & Seidl A. (2019). Social touch interacts with infants’ learning of auditory patterns. Developmental Cognitive Neuroscience, 35, 66–74.
- Lin G, & Carlile S. (2015). Costs of switching auditory spatial attention in following conversational turn-taking. Frontiers in Neuroscience, 9(124), 1–11.
- Lin G, & Carlile S. (2019). The effects of switching non-spatial attention during conversational turn taking. Scientific Reports, 9(1), 1–9.
- Long HL, Bowman DD, Yoo H, Burkhardt-Reed MM, Bene ER, & Oller DK (2020). Social and endogenous infant vocalizations. PLoS One, 15(8), e0224956.
- Marklund U, Marklund E, Lacerda F, & Schwarz IC (2015). Pause and utterance duration in child-directed speech in relation to child vocabulary size. Journal of Child Language, 42(5), 1158–1171.
- Masek LR, McMillan BT, Paterson SJ, Tamis-LeMonda CS, Golinkoff RM, & Hirsh-Pasek K. (2021). Where language meets attention: How contingent interactions promote learning. Developmental Review, 60, 100961.
- Miller JL, & Lossia AK (2013). Prelinguistic infants’ communicative system: Role of caregiver social feedback. First Language, 33(5), 524–544.
- Newman RS, & Simpson VM (2023). Infants’ short-term memory for consonant–vowel syllables. Journal of Experimental Child Psychology, 226, 105567.
- Northrup JB, & Iverson JM (2015). Vocal coordination during early parent–infant interactions predicts language outcome in infant siblings of children with autism spectrum disorder. Infancy, 20(5), 523–547.
- Oller DK (2000). The emergence of the speech capacity. Mahwah, NJ: Lawrence Erlbaum Associates.
- Oller DK, Eilers RE, & Basinger D. (2001). Intuitive identification of infant vocal sounds by parents. Developmental Science, 4(1), 49–60.
- Obrist PA, Webb RA, & Sutterer JR (1969). Heart rate and somatic changes during aversive conditioning and a simple reaction time task. Psychophysiology, 5(6), 696–723.
- Pearson RM, Heron J, Melotti R, Joinson C, Stein A, Ramchandani PG, & Evans J. (2011). The association between observed non-verbal maternal responses at 12 months and later infant development at 18 months and IQ at 4 years: A longitudinal study. Infant Behavior & Development, 34(4), 525–533.
- Rohlfing KJ, Leonardi G, Nomikou I, Rączaszek-Leonardi J, & Hüllermeier E. (2019). Multimodal turn-taking: motivations, methodological challenges, and novel approaches. IEEE Transactions on Cognitive and Developmental Systems, 12(2), 260–271.
- Romeo RR, Leonard JA, Robinson ST, West MR, Mackey AP, Rowe ML, & Gabrieli JD (2018). Beyond the 30-million-word gap: Children’s conversational exposure is associated with language-related brain function. Psychological Science, 29(5), 700–710.
- Ruddy MG, & Bornstein MH (1982). Cognitive correlates of infant attention and maternal stimulation over the first year of life. Child Development, 53(1), 183–188.
- Ruff HA, & Capozzoli MC (2003). Development of attention and distractibility in the first 4 years of life. Developmental Psychology, 39(5), 877.
- Sacks H, Schegloff EA, & Jefferson G. (1978). A simplest systematics for the organization of turn taking for conversation. In Studies in the organization of conversational interaction (pp. 7–55). Academic Press.
- Seidl A, Tincoff R, Baker C, & Cristia A. (2015). Why the body comes first: Effects of experimenter touch on infants’ word finding. Developmental Science, 18(1), 155–164.
- Smith LB, & Karmazyn-Raz H. (2022). Episodes of experience and generative intelligence. Trends in Cognitive Sciences, 26(12), 1064–1065.
- Snow CE (1977). Mothers’ speech research: From input to interaction. In Snow CE & Ferguson CA (Eds.), Talking to children: Language input and acquisition (pp. 31–49). London: Cambridge University Press.
- Stelt JM, & Koopmans-van Beinum FJ (1986). The onset of babbling related to gross motor development. In Precursors of early speech (pp. 163–173). Palgrave Macmillan, London.
- Stivers T, Enfield NJ, Brown P, Englert C, Hayashi M, Heinemann T, Hoymann G, Rossano F, de Ruiter JP, Yoon K, & Levinson SC (2009). Universals and cultural variation in turn-taking in conversation. Proceedings of the National Academy of Sciences, 106(26), 10587–10592.
- Suarez-Rivera C, Schatz JL, Herzberg O, & Tamis-LeMonda CS (2022). Joint engagement in the home environment is frequent, multimodal, timely, and structured. Infancy, 27(2), 232–254.
- Tamis-LeMonda CS, Bornstein MH, & Baumwell L. (2001). Maternal responsiveness and children’s achievement of language milestones. Child Development, 72(3), 748–767.
- Tamis-LeMonda CS, Kuchirko Y, & Song L. (2014). Why is infant language learning facilitated by parental responsiveness? Current Directions in Psychological Science, 23(2), 121–126.
- Templeton EM, Chang LJ, Reynolds EA, LeBeaumont MDC, & Wheatley T. (2022). Fast response times signal social connection in conversation. Proceedings of the National Academy of Sciences, 119(4).
- Tomasello M, & Farrar MJ (1986). Joint attention and early language. Child Development, 57(6), 1454–1463.
- Tumber AK, Scheerer NE, & Jones JA (2014). Attentional demands influence vocal compensations to pitch errors heard in auditory feedback. PLoS One, 9(10), e109968.
- Van Berkum JJ, Brown CM, Zwitserlood P, Kooijman V, & Hagoort P. (2005). Anticipating upcoming words in discourse: evidence from ERPs and reading times. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(3), 443.
- van der Klis A, Adriaans F, & Kager R. (2023). Infants’ behaviours elicit different verbal, nonverbal, and multimodal responses from caregivers during early play. Infant Behavior and Development, 71(101828), 1–17.
- Van Egeren LA, Barratt MS, & Roach MA (2001). Mother–infant responsiveness: Timing, mutual regulation, and interactional context. Developmental Psychology, 37(5), 684–697.
- Warlaumont AS, Richards JA, Gilkerson J, & Oller DK (2014). A social feedback loop for speech development and its reduction in autism. Psychological Science, 25(7), 1314–1324.
- Yu C, & Smith LB (2012). Embodied attention and word learning by toddlers. Cognition, 125(2), 244–262.
- Yu C, Smith LB, Shen H, Pereira AF, & Smith T. (2009). Active information selection: Visual attention through the hands. IEEE Transactions on Autonomous Mental Development, 1(2), 141–151.
- Zhang VH, Elmlinger SL, & Goldstein MH (2024). Developmental cascades of vocal turn-taking connect prelinguistic vocalizing with early language. Infant Behavior and Development, 75(101945), 1–17.
- Zimmerman FJ, Gilkerson J, Richards JA, Christakis DA, Xu D, Gray S, & Yapanel U. (2009). Teaching by listening: The importance of adult-child conversations to language development. Pediatrics, 124(1), 342–349.






