Abstract
This study compared vocal development in Korean- and English-learning infants and examined ambient-language effects focusing on predominant utterance shapes. Vocalization samples were obtained from 14 Korean-learning children and 14 English-learning children, who ranged in age from 9 to 21 months, in monolingual environments using day-long audio recordings. The analyzers, who were blind to participants’ demographic information, identified utterance shapes to determine functional vocal repertoires through naturalistic listening simulating the caregiver’s natural mode of listening. The results showed no cross-linguistic differences in the amount of vocal output or the proportion of canonical syllables. However, the infants from the two language backgrounds showed differences regarding the predominant canonical utterance shapes. The percentage of VCV utterances in Korean-learning children was higher than in English-learning children while CV syllables predominated in the English-learning children. We speculate that the difference between the predominant utterance shapes of Korean- and English-learning children could be associated with differences in early lexical items typically acquired in the two language groups.
Keywords: Infant, Utterance shapes, Canonical babbling, Ambient language, Cross-linguistic
1. Introduction
Many studies have investigated possible ambient language effects in babbling, considering that the auditory experience of language is essential in vocal development (e.g., Engstrand et al., 2003; Rvachew et al., 2008; Whalen et al., 2007). The literature has produced a common opinion that the auditory experience of ambient language has effects on both infant speech production and speech perception in the first years of life (e.g., de Boysson-Bardies & Vihman, 1991; Edwards & Beckman, 2008; Eimas et al., 1971; Rivera-Gaxiola et al., 2005; Werker & Tees, 1984). However, there is controversy about whether such an ambient language effect occurs in “pure babbling,” sometimes referred to as “prelinguistic vocalizations,” as opposed to occurring in “meaningful speech.” This is a key distinction: Which utterances of infancy and early childhood are attempts at producing words, and which are babbling activities independent of words? A substantial number of studies have focused on phonetic or prosodic elements in pure babbling. The work has often implied that babbling utterances have been influenced by the phonology of the ambient language only, not by its lexicon.
Nevertheless, several other studies have found no discernible ambient language effects on babbling (e.g., Engstrand et al., 2003; Lee et al., 2017; Thevenin et al., 1985). For example, using a methodology designed to control possible bias effects, Lee et al. (2017) did not find significant overall language identification differences for English- and Chinese-learning infants at 8, 10, and 12 months. However, they did detect ambient-language effects in items that contained canonical syllables and in canonical syllable sequences identified by listeners as words. They suggested that the earliest ambient-language effects may be found in utterances influenced by language-specific features of lexical items. Based on these findings, and those of Engstrand et al. (2003), Lee et al. suggested that vocalizations may be influenced by infants’ targeting of language-specific lexical items rather than by language-specific phonology per se.
Infants learn to recognize words and word shapes of their ambient language and demonstrate a preference for well-formed word shapes during their first year (Aslin et al., 1998; Bergelson & Swingley, 2012; Jusczyk, 2002; Saffran et al., 1999; Tincoff & Jusczyk, 1999). Such capability and preference of infants in the first year might have an impact on their own babbling sounds and, furthermore, might be a key foundation for emerging words. In particular, as infants produce canonical syllables, it makes sense that they would begin to recognize similarities between their own canonical syllables and salient adult words in the ambient environment as they produce and listen to particular sounds and syllable sequences. This recognition process could contribute to infants forming speech-like motor production routines, sometimes referred to as a repertoire of “vocal motor schemes” (McCune & Vihman, 1987, 2001). In addition, infants’ learning process, especially with regard to emerging words, might be reinforced by adult feedback. Caregivers’ contingent responses such as imitation and modeling are believed to cause an increase in infant vocalizations (Dunst et al., 2010; Goldstein & Schwade, 2008; Gros-Louis et al., 2014; Gros-Louis & Miller, 2018). Although caregivers tend to respond to all kinds of infant sounds, including vegetative noises, they become more selective in responding to speech-like vocalizations as infants grow and produce a greater variety of sounds (Snow, 1977). In particular, caregivers usually show more active and enthusiastic responses to utterances that include lexical items, especially those referring to mommy or daddy (Stoel-Gammon, 2011). Caregivers’ selective feedback thus appears to encourage the use of particular sounds and syllable sequences, thereby helping infants to form speech-like motor production routines and to begin working out the relationship between their own babbling sounds and salient adult words.
In this way, the infant’s ability to recognize and produce well-formed syllables as well as the adult’s input and feedback support the emergence of words. These presumed patterns of learning suggest that we might be able to identify the effect of the ambient language from specific babbling shapes or syllable types that infants prefer to produce in early utterances, especially from utterance shapes that correspond to words that tend to enter infant repertoires early.
It has been reported that the combination of a consonant (C) and a vowel (V), in that order, is the most frequent syllable shape used by English-learning infants during the canonical babbling stage (Kent & Bauer, 1985; Mitchell & Kent, 1990; von Hapsburg & Davis, 2006). Kent and Bauer (1985) found that the CV shape in English-learning infants was predominant over VCV, VC, and CVC shapes, although singleton vowels (though not deemed “canonical”) were still the syllables most abundantly produced by infants.
Nevertheless, the CV utterance shape might not be universally predominant across infants with different language backgrounds. Analyses of both spontaneous speech samples and parent reports using the Korean version of the MacArthur-Bates Communicative Development Inventories (Pae & Kwak, 2011) indicate that Korean children younger than 24 months most frequently produce the VCV word shape. For example, Ha and Pi (2018) phonemically transcribed lists of words acquired by 50%–75% of children aged 12–30 months and found that younger children (i.e., younger than 24 months) produced the VCV word shape most frequently, owing to the common acquisition by Korean infants of first words such as [Λma] for mommy, [ap*a] for daddy (the * indicates tenseness of the [p]), and [ani] for no. The percentage of words starting with consonants in these data increased with age. Given the possibility that babbling utterances may be influenced by the lexicon of the ambient language, it seems likely that Korean-learning children may produce VCV shapes more frequently than CV shapes at some point after the onset of canonical babbling.
No cross-linguistic study has explored the relative occurrence of utterance or babbling shapes, although studies focusing on babbling shapes could well provide important information about possible effects of ambient language on early vocal development. Prior research reporting ambient language effects in babbling has focused instead on frequency of occurrence of individual segments, consonant-like or vowel-like elements, or individual CV syllable types (e.g., de Boysson-Bardies & Vihman, 1991; Kern et al., 2009; Lee et al., 2017; Lee et al., 2010; Rvachew et al., 2008).
In addition, it is important to consider certain methodological limitations of studies that have reported effects of ambient language on babbling. Studies have often used forced-choice judgments by human listeners of the language-background of recorded infant utterances (e.g., Atkinson et al., 1968; Engstrand et al., 2003; Thevenin et al., 1985) or relied on phonetic transcription or other auditory descriptions of features, including syllabic stress or intonation (e.g., de Boysson-Bardies & Vihman, 1991; Edwards & Beckman, 2008; Lee et al., 2010). Instrumental acoustic analyses addressing segmental and suprasegmental features have also been used to study possible ambient-language effects (e.g., Chung et al., 2012; de Boysson-Bardies, Halle, Sagart & Durand, 1989; Rvachew et al., 2008; Whalen et al., 2007). The three common methods in the literature have tended to include confounds or other methodological flaws that limit interpretation of possible ambient language effects. Some studies have suffered from possible biased utterance selection owing to external voices in the background of infant utterances during the utterance selection process. In addition, several published studies have failed to clearly indicate that judges were blind to the voices of other speakers in the background of infant utterances, thereby leading to possible bias in forced-choice listeners, transcribers, and acoustic analysts (e.g., de Boysson-Bardies et al., 1989; Kern et al., 2009). In addition, many studies have analyzed recordings collected from different settings and with differences in noise levels and recording equipment across the ambient languages compared (e.g., Engstrand et al., 2003; Rvachew et al., 2008). Furthermore, studies of speech acquisition often involve short recordings, which are insufficient to ensure representative sampling of vocalizations in natural environments and to capture vocal development in detail (e.g., Ha et al., 2014; Nathani et al., 2006). 
Importantly, it is now possible to conduct all-day recordings in order to obtain much more representative samples of infant utterances than have been possible in past research.
The purpose of this study was to conduct a cross-linguistic comparison of vocal development in Korean- and English-learning children both quantitatively and qualitatively, based on all-day recording samples of vocalizations from each infant, collected in the natural environment of the home using identical procedures for both languages. We focused especially on utterance shapes and examined whether Korean-learning children’s utterance shapes would resemble those of the presumed ambient lexicon, in which VCV patterns are known to be fairly common and which may differ from the patterns found in English-learning children’s environments.
2. Methods
2.1. Participants
Research flyers were posted on social network sites and at local daycare communities, and parents interested in participating in the study voluntarily contacted the research team. Fourteen Korean-learning children and 14 English-learning children in monolingual environments participated in this study. All the Korean-learning children resided in South Korea whereas all the English-learning children lived in Urbana-Champaign, Illinois, USA, at the time of data collection. The two language groups were age- and gender-matched. Information, including parents’ occupation status and education level as well as the consistency of involvement in caregiving by the mother and father figure, was obtained to match socioeconomic status between the two language groups. All parents in the two language groups were categorized as middle-class or above, college-educated, and professionally or semiprofessionally employed persons in two-parent families.
The participating children ranged in age from 9 months to 21 months, and the average age of both groups was 15.43 months at the time of data collection; each language group consisted of 6 boys and 8 girls. None of the children had significant physical, cognitive, developmental, or hearing problems according to their parents’ reports. One all-day recording was obtained from each infant. This study was approved by the Hallym University Institutional Review Board as well as the University of Illinois at Urbana-Champaign Institutional Review Board. Signed informed consent forms were obtained from the participants’ caregivers.
2.2. Data collection
The Language ENvironment Analysis (LENA) recorder was used to collect one day of home recording from each infant. The recording device is worn by a child in a chest pocket of a specially designed vest or other clothing. The parents were asked to record a typical day in the home, which enabled the researchers to collect a representative sampling of infant vocalizations in their natural environments. The device records for up to 16 h before switching off automatically. Therefore, audio recordings provided approximately 16 h of sampling for each infant.
2.3. Data analysis
Complete all-day home recordings for each child were first analyzed using the LENA automated analysis software program (LENA-pro, LENA Research Foundation, Boulder, CO, USA). The LENA automated analysis generates estimates related to children’s auditory environment, including the number of adult words spoken, the number of conversational turns between the child and adults or other children, the number of child vocalizations, the duration of exposure to TV/radio and electronic sounds or background noise, and the duration of silence. Although there were certainly individual differences in children’s auditory environments, children tended to show clusters of vocalizations across the day; that is, volubility was concentrated in a few time blocks throughout the day. The LENA estimates indicated that children usually produced clusters of vocalizations for approximately two hours in the morning and again in the afternoon or evening. They also mainly produced vocalizations in the context of turn-taking with adults.
For each all-day home recording, the twenty 5-minute segments with the child’s highest vocalization rate, as determined by the LENA automated analysis software program, were selected for this study. This approach ensured that segments where the child was asleep or vocally inactive were not included. The recording segments were analyzed based on human coding to determine utterance shapes. Human coding was performed using the Action Analysis Coding and Training (AACT) software (Delgado, Buder & Oller, 2010), which was developed specifically for coding and analysis of infant vocalizations. The system has been used extensively in prior research (see, e.g., Oller et al., 2020).
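The segment-selection step can be illustrated with a minimal sketch. It assumes that the automated analysis has already been reduced to a list of child vocalization counts, one per consecutive 5-minute segment; this input format, the function name `top_segments`, and the example counts are hypothetical and are not taken from the LENA software or from the study data.

```python
from typing import List, Tuple


def top_segments(vocalization_counts: List[int],
                 n_segments: int = 20) -> List[Tuple[int, int]]:
    """Return (segment_index, count) for the n most voluble segments.

    Ties are broken in favor of the earlier segment, and the result is
    returned in chronological order.
    """
    indexed = list(enumerate(vocalization_counts))
    # Rank by count (descending), then by segment index (ascending).
    ranked = sorted(indexed, key=lambda pair: (-pair[1], pair[0]))
    return sorted(ranked[:n_segments])


# Hypothetical counts per 5-minute segment (not study data).
example_counts = [0, 12, 3, 25, 25, 7, 0, 18]
selected = top_segments(example_counts, n_segments=3)
print(selected)
```

A 16-hour recording contains 192 such segments, so selecting 20 of them retains roughly the most voluble tenth of the day.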
Utterances for analysis were first selected from the twenty 5-minute segments, based on a breath-group criterion, that is, one utterance per breath group (Lynch et al., 1995a, 1995b). Vegetative sounds, cries, and laughs were excluded from the analysis. During this selection we did not determine whether utterances consisted of meaningful words; therefore, the data include prelinguistic vocalizations as well as words. Efforts were made to limit possible bias by following a strict criterion of selecting only utterances free of any possible cues to the ambient language (especially other voices in the background) other than cues that might be found in the infant’s voice. The analyzers who selected the utterances did not participate in the coding or subsequent analysis process. Blind analyses were conducted in order to limit the possibility of bias in the data analysis; that is, coders were not provided with any information about the data, the purpose of the study, or the research questions. Coders received intensive training on the definitions of measures and data analysis until agreement between coders reached 80 %. All the analyzers and coders involved in the data analysis were native speakers of Korean.
2.4. Measures
The total number of utterances was determined in accord with the breath-group criterion, and the type of each utterance was also determined. Each utterance was broadly classified as consisting of either noncanonical or canonical syllables. Noncanonical utterances consisted of protophones such as quasivowels, full vowels (e.g., [a], [oa]), and marginal babbling. Canonical utterances were required to include at least one canonical syllable (e.g., [ba], [ada]). If an utterance was classified as canonical, its utterance shape was also determined to compile each child’s inventory. The length and structure of canonical syllable shapes were identified to examine the extent to which each syllable shape was produced in the two language groups.
A syllable is the minimal rhythmic unit of an utterance (see Oller, 2000, especially Chapters 4 and 9 for details). The coders counted the number of syllables from both canonical and noncanonical utterances by listening to perceptible beats in the rhythm of vocalization. When we hear syllables, we can count them and even perceive long-short alternating rhythms (Fowler et al., 1986). Some researchers have attempted to account for perception of stress beats in speech by investigating the phenomenon of P-centers (Perceptual centers) (for more information, see Fowler et al., 1986; and Morton et al., 1976). Both canonical and precanonical babbling (containing or not containing well-formed syllables, respectively) include these differentiable rhythmic structures resulting from rises and falls in amplitude or pitch, and from changes in spectral patterns sometimes (but not always) perceived as transitions between margins (consonant-like sounds) and nuclei (vowel-like sounds). “A series of such syllables, if well-formed, often shows a timing pattern that corresponds roughly to the timing patterns of other rhythmic repetitive actions such as toe tapping or hand clapping” (Oller, 2000, p. 62). Based on these definitions/concepts of syllables and beats, we trained coders how to count the number of syllables and to identify canonical and non-canonical syllables.
The coders could also supplement their judgments by observing changes in amplitudes of the waveform in the acoustic display that accompanies the AACT software (i.e., TF32 computer software, from Milenkovic, 2001). Ultimately, all the syllables were classified as either noncanonical or canonical, and then the percentage of canonical babbling was calculated as the number of canonical syllables divided by the total number of syllables times 100.
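The percentage described above can be written as a short function. This is only an illustrative sketch; the function name is hypothetical, and the example values are the syllable counts reported for Korean-learning child 1 in Table 2.

```python
def canonical_babbling_ratio(canonical_syllables: int,
                             total_syllables: int) -> float:
    """Canonical syllables as a percentage of all syllables."""
    if total_syllables <= 0:
        raise ValueError("total_syllables must be positive")
    if not 0 <= canonical_syllables <= total_syllables:
        raise ValueError("canonical_syllables must lie in [0, total_syllables]")
    return 100.0 * canonical_syllables / total_syllables


# Korean-learning child 1 in Table 2: 78 canonical syllables out of 615.
cbr = canonical_babbling_ratio(78, 615)
print(round(cbr, 2))  # 12.68
```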
2.5. Statistical analysis
Multivariate analysis of variance (MANOVA) was conducted to compare vocalizations of the English-learning and Korean-learning children. Dependent variables included the total number of utterances, percentage of canonical utterances, total number of syllables, and percentage of the canonical syllables. Also, the types of canonical syllable shapes that occurred in more than 10 % of the canonical utterances were identified. The frequency of occurrence of these predominant shapes was compared between the two language groups. A chi-square test of independence was performed to determine whether the preference toward these predominant syllable shapes differed significantly between the language groups. For all statistical analyses, the accepted level of significance was p < .05 (two-tailed).
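For readers unfamiliar with the chi-square test of independence, the statistic can be sketched in plain Python as follows. The function name and the 2 × 2 counts are hypothetical and serve only as an illustration; they are not the study data, and the analysis software used in the study is not specified here. In practice a statistics package (for example, `scipy.stats.chi2_contingency`) would also supply the p value.

```python
from typing import List, Tuple


def chi_square_independence(
    observed: List[List[float]],
) -> Tuple[float, int, List[List[float]]]:
    """Pearson chi-square statistic, degrees of freedom, and expected counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Hypothetical counts: rows are two utterance shapes, columns are two groups.
toy_counts = [[30, 10], [10, 30]]
statistic, dof, expected = chi_square_independence(toy_counts)
print(statistic, dof)  # 20.0 1
```

With five utterance shapes and two language groups, the same function would return (5 − 1)(2 − 1) = 4 degrees of freedom.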
3. Results
3.1. Volubility
In total, 18,798 utterances were selected and coded across all infants. Table 1 displays the total number of utterances, total number of canonical utterances, percentage of utterances with canonical syllables, total number of syllables, total number of canonical syllables, and percentage of canonical syllables for the Korean- and English-learning children. Table 2 displays the same measures for each infant in the two language groups. First, the total number of utterances (volubility) was compared between the two language groups to confirm that the samples were quantitatively similar. Recall that only utterances free of any possible cues to the ambient language were selected. The total number of utterances was not significantly different between the two groups (F = 0.228, p > 0.05). Consequently, the subsequent cross-linguistic comparisons were presumably not influenced by differences in the rate of vocalization by the two groups.
Table 1.
Total number of utterances, total number of canonical utterances, percentage of canonical utterances, total number of syllables, total number of canonical syllables, and the canonical babbling ratio (CBR, canonical syllables as a percentage of all syllables) in the two language groups.
| Measures | Korean-learning children, M (SD) | English-learning children, M (SD) | F | p | η² |
|---|---|---|---|---|---|
| Total # of utterances | 648.93 (229.18) | 693.79 (266.69) | .228 | .637 | .009 |
| Total # of canonical utterances | 341.79 (250.55) | 342.93 (235.48) | .000 | .990 | .000 |
| % of canonical utterances | 48.91 (20.16) | 46.23 (16.35) | .149 | .703 | .006 |
| Total # of syllables | 1219.71 (562.66) | 1365.36 (663.87) | .392 | .537 | .015 |
| Total # of canonical syllables | 589.07 (518.42) | 627.71 (540.39) | .037 | .848 | .001 |
| CBR | 41.95 (18.86) | 41.15 (14.71) | .016 | .901 | .001 |
Table 2.
Coding results for each infant in the two language groups.
| Language group | Child No. | Age (months) | Total # of utterances | Total # of canonical utterances | % of canonical utterances | Total # of syllables | Total # of canonical syllables | CBR |
|---|---|---|---|---|---|---|---|---|
| Korean-learning children | 1 | 9 | 642 | 94 | 14.64 | 615 | 78 | 12.68 |
| | 2 | 10 | 518 | 139 | 26.83 | 891 | 166 | 18.63 |
| | 3 | 11 | 487 | 198 | 40.66 | 1237 | 409 | 33.06 |
| | 4 | 12 | 536 | 259 | 48.32 | 1070 | 440 | 41.12 |
| | 5 | 13 | 629 | 384 | 61.05 | 1333 | 629 | 47.19 |
| | 6 | 14 | 386 | 102 | 26.42 | 678 | 130 | 19.17 |
| | 7 | 15 | 810 | 314 | 38.77 | 1403 | 462 | 32.93 |
| | 8 | 16 | 371 | 196 | 52.83 | 705 | 341 | 48.37 |
| | 9 | 17 | 516 | 369 | 71.51 | 948 | 507 | 53.48 |
| | 10 | 18 | 759 | 335 | 44.14 | 1089 | 408 | 37.47 |
| | 11 | 19 | 1184 | 983 | 83.02 | 2551 | 1879 | 73.66 |
| | 12 | 20 | 487 | 181 | 37.17 | 775 | 276 | 35.61 |
| | 13 | 21 | 929 | 708 | 76.21 | 2054 | 1309 | 63.73 |
| | 14 | 21 | 831 | 523 | 63.18 | 1727 | 1213 | 70.24 |
| English-learning children | 1 | 9 | 403 | 106 | 26.30 | 886 | 187 | 21.11 |
| | 2 | 10 | 497 | 173 | 34.81 | 942 | 246 | 26.11 |
| | 3 | 11 | 671 | 166 | 24.74 | 1067 | 256 | 23.99 |
| | 4 | 12 | 648 | 247 | 38.12 | 1403 | 529 | 37.70 |
| | 5 | 13 | 1254 | 796 | 63.48 | 3230 | 2139 | 66.22 |
| | 6 | 14 | 584 | 184 | 31.51 | 939 | 278 | 29.61 |
| | 7 | 16 | 493 | 199 | 40.37 | 937 | 340 | 36.29 |
| | 8 | 16 | 706 | 198 | 28.05 | 1224 | 331 | 27.04 |
| | 9 | 16 | 580 | 374 | 64.48 | 990 | 568 | 57.37 |
| | 10 | 19 | 1114 | 743 | 66.70 | 2136 | 1273 | 59.60 |
| | 11 | 20 | 996 | 708 | 71.08 | 1966 | 1120 | 56.97 |
| | 12 | 20 | 707 | 377 | 53.32 | 1353 | 614 | 45.38 |
| | 13 | 20 | 751 | 353 | 47.00 | 1153 | 517 | 44.84 |
| | 14 | 20 | 309 | 177 | 57.28 | 889 | 390 | 43.87 |
Each utterance was categorized as either noncanonical or canonical, and the percentage of canonical utterances (F = .149, p > 0.05) was not significantly different between the Korean- and English-learning children. The results of measurements based on syllable units were consistent with those based on utterances, showing that neither the total number of syllables (F = 0.392, p > 0.05) nor the percentage of canonical syllables (F = 0.016, p > 0.05) was significantly different between the Korean- and English-learning children. Except for one Korean-learning child at 9 months, all the children in the two language groups produced canonical syllables at a rate greater than 15 % of all syllables (see Table 2).
3.2. Canonical utterance shapes
As indicated above, an inventory of canonical shapes was compiled to determine which canonical syllable shapes were predominant. The frequency distribution of the length of canonical utterances (in syllables) for each language group can be found in Fig. 1. Children in both groups most frequently produced two-syllable canonical utterances, followed by one-syllable and three-syllable canonical utterances.
Fig. 1.
The percentage of canonical utterances with different numbers of syllables for the two language groups.
Table 3 displays the five predominant utterance shapes and the percentage of each for the Korean- and English-learning children. The three types of canonical utterances that were most frequently produced (i.e., each constituting more than 10 % of the canonical utterance shapes) were the same in the two language groups (see Fig. 2), namely VCV, CV, and CVCV. Table 4 displays the percentage of each of the five predominant utterance shapes for each infant in the two language groups. For English-learning infants, CV was the predominant structure, accounting for approximately 23 % of the inventory. However, for Korean-learning infants, VCV was the predominant structure, accounting for approximately 28 % of the inventory. A chi-square test of independence was performed to determine whether the five structures occurred at different rates across the language groups. The test showed a significant relation between utterance shape and language group (χ²(4) = 147.486, p < .001).
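The 10 % criterion for a predominant shape can be expressed as a simple filter. The sketch below applies it to the Korean-learning group means reported in Table 3; the function name is hypothetical, and applying the threshold to group means rather than to pooled counts is an assumption made for illustration.

```python
from typing import Dict, List


def predominant_shapes(percentages: Dict[str, float],
                       threshold: float = 10.0) -> List[str]:
    """Return the shapes whose percentage exceeds the threshold."""
    return [shape for shape, pct in percentages.items() if pct > threshold]


# Korean-learning group means (%) from Table 3.
korean_means = {"VCV": 27.72, "CV": 18.91, "CVCV": 13.28,
                "CVV": 6.32, "VCVCV": 3.71}
print(predominant_shapes(korean_means))  # ['VCV', 'CV', 'CVCV']
```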
Table 3.
Percentage of the five predominant canonical utterance shapes and the Chi-square test results in the two language groups.
| Syllable structure | Korean-learning children, M | Korean-learning children, SD | English-learning children, M | English-learning children, SD | χ²(1)ᵇ | p |
|---|---|---|---|---|---|---|
| VCV | 27.72 | 13.80 | 17.13 | 8.65 | | |
| CV | 18.91 | 10.03 | 22.52 | 11.52 | 120.01 | < .00001 |
| CVCV | 13.28 | 8.69 | 14.32 | 8.88 | 6.72 | .0095 |
| CVVᵃ | 6.32 | 3.91 | 6.24 | 4.82 | 27.46 | < .00001 |
| VCVCV | 3.71 | 1.78 | 5.71 | 3.17 | 44.58 | < .00001 |
Note.
ᵃ The utterance shape CVV is a two-syllable shape (e.g., [pai], [mΛa]).
ᵇ Chi-square values for each of the four paired comparisons with the VCV structure type.
Fig. 2.
Percentage of occurrence of the three most frequently occurring utterance shapes by Korean- and English-learning children.
Table 4.
Percentage of utterance shape for each infant in the two language groups.
| Language group | Child No. | Age (months) | VCV (%) | CV (%) | CVCV (%) | CVV (%) | VCVCV (%) |
|---|---|---|---|---|---|---|---|
| Korean-learning children | 1 | 9 | 58.51 | 15.96 | 0.00 | 1.06 | 3.19 |
| | 2 | 10 | 35.25 | 12.23 | 2.88 | 5.76 | 1.44 |
| | 3 | 11 | 10.61 | 17.68 | 9.60 | 6.06 | 4.55 |
| | 4 | 12 | 23.55 | 14.67 | 11.20 | 10.81 | 5.41 |
| | 5 | 13 | 32.03 | 8.59 | 17.19 | 10.42 | 5.21 |
| | 6 | 14 | 50.00 | 11.76 | 4.90 | 0.98 | 0.98 |
| | 7 | 15 | 20.06 | 42.36 | 12.42 | 4.78 | 2.55 |
| | 8 | 16 | 17.86 | 15.82 | 11.73 | 14.29 | 2.55 |
| | 9 | 17 | 21.68 | 29.81 | 8.94 | 5.42 | 3.52 |
| | 10 | 18 | 22.39 | 36.42 | 9.55 | 9.55 | 3.58 |
| | 11 | 19 | 20.55 | 17.40 | 29.91 | 3.26 | 7.22 |
| | 12 | 20 | 34.25 | 16.02 | 25.97 | 3.87 | 6.08 |
| | 13 | 21 | 9.89 | 10.45 | 22.74 | 8.62 | 2.97 |
| | 14 | 21 | 31.43 | 15.62 | 18.86 | 3.62 | 2.67 |
| English-learning children | 1 | 9 | 15.09 | 16.04 | 4.72 | 1.89 | 7.55 |
| | 2 | 10 | 10.98 | 35.84 | 8.09 | 9.25 | 1.73 |
| | 3 | 11 | 3.01 | 28.31 | 12.05 | 13.86 | 1.81 |
| | 4 | 12 | 9.72 | 18.22 | 12.96 | 2.83 | 5.26 |
| | 5 | 13 | 8.67 | 32.41 | 10.93 | 5.90 | 4.90 |
| | 6 | 14 | 27.17 | 6.52 | 9.24 | 10.33 | 4.89 |
| | 7 | 16 | 23.62 | 5.53 | 39.70 | 4.02 | 4.52 |
| | 8 | 16 | 29.29 | 26.26 | 7.58 | 3.03 | 8.08 |
| | 9 | 16 | 20.05 | 28.07 | 22.99 | 1.60 | 9.89 |
| | 10 | 19 | 14.67 | 37.95 | 9.42 | 6.46 | 7.67 |
| | 11 | 20 | 8.33 | 19.49 | 16.38 | 17.23 | 2.12 |
| | 12 | 20 | 31.03 | 16.18 | 17.51 | 1.06 | 11.94 |
| | 13 | 20 | 22.38 | 37.68 | 19.26 | 4.82 | 2.27 |
| | 14 | 20 | 15.82 | 6.78 | 9.60 | 5.08 | 7.34 |
Considering the omnibus nature of chi-square tests (e.g., Thompson, 1988), we conducted follow-up tests. First, we computed raw and standard residuals for each cell, since cell-by-cell comparison of observed and expected values allows a better understanding of the contribution of each cell to the results (Agresti, 2007; Delucchi, 1993). CV showed the largest raw and standard residuals, 179.6 and 13.4, respectively. VCV was identified as the cell with the second largest raw and standard residuals, 167 and 12.9, respectively. These results indicate that English-learning infants produced more CV and fewer VCV shapes than would be expected by chance. Conversely, Korean-learning infants produced more VCV and fewer CV shapes than would be expected by chance.
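As an illustration of the cell-by-cell follow-up, the sketch below computes a raw residual (observed minus expected) and a standardized residual for each cell. It assumes that the standardized residual is the Pearson residual, (O − E)/√E; the exact definition used in the analysis is not stated, so this is an assumption, and the counts are hypothetical rather than study data.

```python
import math
from typing import List, Tuple

Table = List[List[float]]


def cell_residuals(observed: Table) -> Tuple[Table, Table]:
    """Return (raw, Pearson-standardized) residuals for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    raw: Table = []
    standardized: Table = []
    for r_total, obs_row in zip(row_totals, observed):
        raw_row, std_row = [], []
        for c_total, o in zip(col_totals, obs_row):
            # Expected count under independence.
            expected = r_total * c_total / grand_total
            raw_row.append(o - expected)
            std_row.append((o - expected) / math.sqrt(expected))
        raw.append(raw_row)
        standardized.append(std_row)
    return raw, standardized


# Hypothetical counts: rows are shapes (CV, VCV), columns are language groups.
raw, standardized = cell_residuals([[30, 10], [10, 30]])
print(raw[0][0], round(standardized[0][0], 3))  # 10.0 2.236
```

A positive residual marks a cell with more observations than expected under independence, and a negative residual marks a cell with fewer.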
Subsequently, chi-square tests were conducted in paired comparisons among the five utterance shapes to test relations between the utterance shapes and the language groups. Table 3 shows the chi-square test results for each of the four paired comparisons involving the VCV utterance shape. Each comparison was two-by-two, with the two language groups in the columns and the two utterance shapes (e.g., VCV vs. CV) in the rows. Specifically, chi-squares for 8 of the 10 possible paired comparisons were statistically significant at p < .01 (CV vs. VCVCV and CVV vs. VCVCV were not significant). Five of the 10 comparisons were statistically significant with Bonferroni adjustment (p < .005): VCV vs. CV; VCV vs. CVV; VCV vs. VCVCV; CV vs. CVCV; and CVCV vs. VCVCV. Notably, VCV vs. CV showed the largest standard residual value and the largest chi-square statistic, indicating a major discrepancy between the observed and expected values; that is, the strongest single trend was that the Korean-learning infants produced more VCV shapes while the English-learning infants produced more CV shapes.
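The number of paired comparisons and the Bonferroni-adjusted threshold follow directly from the five shapes, as the short sketch below shows. The variable names are illustrative; the p value checked at the end is the one reported in Table 3 for VCV vs. CVCV.

```python
from itertools import combinations

shapes = ["VCV", "CV", "CVCV", "CVV", "VCVCV"]

# Every unordered pair of shapes yields one 2 x 2 comparison.
pairs = list(combinations(shapes, 2))
alpha = 0.05
adjusted_alpha = alpha / len(pairs)  # Bonferroni: divide by the number of tests

print(len(pairs))                # 10
print(round(adjusted_alpha, 4))  # 0.005

# VCV vs. CVCV (p = .0095 in Table 3) passes p < .01 but not the adjusted level.
p_vcv_cvcv = 0.0095
print(p_vcv_cvcv < 0.01, p_vcv_cvcv < adjusted_alpha)  # True False
```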
4. Discussion
This study conducted cross-linguistic comparisons of vocal development in Korean- and English-learning children in terms of volubility and syllabic shapes of canonical utterances. The vocalization samples were acquired from Korean- and English-learning children using the LENA system in the child’s home setting over a full day-long period. In this way, the research was based on a large and representative sampling of vocalizations collected in a natural environment using identical procedures for both language groups.
No cross-linguistic differences were found in volubility or in the proportion of advanced vocalization types; that is, canonical utterances accounted for a similar percentage of all utterances for the two language groups. In both cases the transcriptions were made free of any possible cues to the ambient language. The percentages of canonical syllables were also not significantly different between children of the two gender- and age-matched language groups. Both language groups showed high percentages of canonical syllables, as should be expected for children in this age range (between 9 and 21 months). Except for one Korean-learning child at 9 months (whose rate of producing canonical syllables was 13 %), the percentages of canonical syllables in all the children in both language groups were greater than 15 %. Typically developing children generally have been reported to produce canonical babbling at greater than 15 % by no later than 10 months of age (Lynch et al., 1995a, 1995b) based on laboratory samples selected for high volubility. The one Korean-learning child whose percentage of canonical syllables was below 15 % should not, we think, be considered atypical in speech development considering his age and the different sampling method of the current study compared to that of Lynch et al. (1995a, 1995b). A recent study suggests notably lower percentages of canonical syllables in samples randomly selected from home recordings than percentages determined from laboratory samples in prior studies (Oller et al., 2020).
The present study focused on utterance shapes, hypothesizing that cross-linguistic differences, as an effect of the ambient language, would be manifest in different preferred utterance shapes. The results revealed a language-specific preference as well as similarities in infant vocalization. Both the Korean- and English-learning children most frequently produced two-syllable canonical utterances, followed by one-syllable and three-syllable canonical utterances. Three types of canonical shapes, VCV, CV, and CVCV, each accounted for more than 10 % of canonical utterances in both groups. The striking difference between the Korean- and English-learning children was seen in these predominant shapes. English-learning children produced CV most frequently, whereas Korean-learning children produced VCV most frequently. These patterns differed significantly and represented the largest differences between the groups in terms of utterance shapes.
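The shape comparison described above rests on tallying shape labels and expressing each as a percentage of all canonical utterances. The following is a minimal sketch of such a tally, assuming each canonical utterance has already been labeled with a shape string such as "CV" or "VCV". The function names are hypothetical and the toy labels are invented for illustration. They are not data from this study:

```python
from collections import Counter


def shape_percentages(shapes):
    """Map each utterance shape to its percentage of all canonical utterances."""
    counts = Counter(shapes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {shape: 100.0 * n / total for shape, n in counts.items()}


def predominant_shape(shapes):
    """Return the most frequent shape, or None for an empty sample."""
    counts = Counter(shapes)
    return counts.most_common(1)[0][0] if counts else None


def shapes_above(shapes, threshold=10.0):
    """Return, in sorted order, the shapes exceeding the threshold percentage."""
    return sorted(s for s, p in shape_percentages(shapes).items() if p > threshold)


# Invented toy labels, for illustration only.
toy_shapes = ["VCV"] * 4 + ["CV"] * 3 + ["CVCV"] * 2 + ["VC"] * 1
print(shape_percentages(toy_shapes))
print(predominant_shape(toy_shapes))
print(shapes_above(toy_shapes))
```

Running the same tally separately for each language group would give the per-group percentages that a cross-linguistic comparison of predominant shapes requires.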
The results from English-learning children are consistent with those of Kent and Bauer (1985), who examined the most frequent syllable shapes and vocalization types during the babbling period. That study reported that CV shapes predominated over VCV, VC, and CVC shapes in English-acquiring children. In addition, literature that has investigated vocal development in Korean-learning children has consistently reported that they produced VCV and VC syllable structures more frequently than CV between 9 and 18 months of age (Ha, 2017; Ha et al., 2014). For example, Ha et al. (2014) investigated changes in vocal development among 5- to 20-month-old children acquiring Korean using the Korean-translated version of the Stark Assessment of Early Vocal Development-Revised (SAEVD-R; Nathani et al., 2006). The vocalizations in the SAEVD-R were classified into five hierarchical levels, of which Levels 4 and 5 are characterized by the emergence of consonants and vowels in canonical syllable forms and more complex syllable structures. In particular, Level 5 indicates complex syllables other than CV, namely VC, VCV, and VCVCV. Ha et al. (2014) found that Korean-learning children produced a relatively higher proportion of Level 5 than Level 4 syllable structures between 9 and 15 months of age. The authors assumed that this result differed from that for the English-learning children in the study of Nathani et al. (2006), although direct comparisons were not possible. Nevertheless, it was clear that Korean-learning children produced vocalizations beginning with vowels more frequently than expected based on the pattern of English-learning infants. The present study supports the assumption that Korean- and English-learning children show different preferences for utterance shapes.
Utterances that sound like words appear to engender positive responses from caregivers in social contexts. Such positive contingent responses have been reported to yield increased production of such utterance shapes (Goldstein et al., 2003). In particular, caregivers tend to show more enthusiastic and selective responses to utterances that sound like mommy and daddy; in the Korean language, this would be [ʌma] for mommy and [ap*a] for daddy. Goldstein and Schwade (2008) showed that infants given contingent feedback by caregivers rapidly restructured their babbling, incorporating phonological patterns from caregivers’ speech. The infants in contingent feedback conditions did not produce the same phonetic elements their mothers modeled, but rather produced utterances with more resonant vowels or CV syllables. The researchers reasoned that the new vocalizations of the infants in the contingent condition shared infraphonological form, but not phonetic content, with their caregivers’ speech. They speculated that infants may have imitated at a more abstract level of speech, such as the category “fully resonant vowel” or “CV combination,” rather than targeting the phonemes of their mothers’ contingent utterances. Infants’ utterances and caregivers’ contingent responses and imitation of babbling appear to lead to vocal learning, create linkages between meaning and babbling, and ultimately help infants develop words.
Prior literature shows that most words in Korean children’s first word lists begin with a vowel and involve a VCV disyllable (Ha & Pi, 2016, 2018). Target words with a VCV shape account for approximately 67 % of the lexicons of Korean-learning children between the ages of 12 and 17 months (Ha & Pi, 2018). These phonological characteristics of Korean children’s early vocabulary are strikingly different from those of American children: According to Stoel-Gammon (1998), vowel-initial words account for only approximately 12 % of the lexicons of English-learning children between the ages of 11 and 19 months. Indeed, Stoel-Gammon (1998) reports that the predominant word shapes reported by parents on the MacArthur Communicative Development Inventories (Fenson et al., 1993) for English-learning children aged 11–30 months are CVC and CVCV. Therefore, we speculate that the predominance of the VCV utterance shape in Korean-learning children is evidence of an ambient-language effect based on “lexically influenced vocalizations” (Lee et al., 2017). The results of the present study are also consistent with the possibility that ambient-language effects occur only after the onset of canonical babbling, and especially in emergent words. However, further investigation is needed to confirm this speculation, and we are currently pursuing it. In all these efforts there is a need to distinguish as well as possible between pure babbling and words. It is important to acknowledge that such a distinction cannot be absolute: there is always the possibility that an infant’s utterances are influenced by lexical learning even if listeners do not recognize that the utterances are targeted at words. However, as in the present work, we can at least ensure that identifiable words are counted as such.
This is our first effort to directly compare vocalizations in English- and Korean-learning infants to develop a better understanding of the role of perceptual input from the ambient language in early vocal patterns. To confirm that the VCV utterance shape in Korean-learning children is lexically influenced, longitudinal studies are needed to track the trajectory of predominant babbling shapes, which may change as children adapt to communication, and specifically to the ambient lexicon. Such longitudinal studies would help shed light on whether infants’ babbling is directed toward productions that resemble words rather than proceeding independently of the perceived lexicon. We also plan to compare child–mother interactions across different languages and cultures in order to explore whether mothers provide differential responses to infants’ babbling, and whether those differential responses are related to commonly occurring very early words that are known to have different shapes across languages (e.g., infant utterances that sound like mommy or daddy in English, [ʌma] or [ap*a] in Korean). In addition, further cross-linguistic comparisons of phonetic and suprasegmental characteristics are warranted to understand the mechanisms of developmental change in vocalizations during the first two years of life and to examine universal and language-specific characteristics of vocalizations.
Acknowledgments
This research was supported by the National Research Foundation of Korea Grant funded by the Korean Government (No. NRF-2016S1A2911363).
Footnotes
CRediT authorship contribution statement
Seunghee Ha: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Cynthia J. Johnson: Data curation, Investigation, Methodology, Resources, Validation, Writing - review & editing. Kimbrough D. Oller: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Writing - review & editing. Hyunjoo Yoo: Formal analysis, Methodology, Validation, Writing - review & editing.
References
- Agresti A. (2007). An introduction to categorical data analysis. Hoboken, NJ: Wiley.
- Aslin RN, Saffran JR, & Newport EL (1998). Computation of conditional probability statistics by 8-month-old infants. Psychological Science, 9(4), 321–324.
- Atkinson K, MacWhinney B, & Stoel C. (1968). An experiment on the recognition of babbling. ERIC Clearinghouse.
- Bergelson E, & Swingley D. (2012). At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences, 109(9), 3253–3258.
- Chung H, Kong EJ, Edwards J, Weismer G, Fourakis M, & Hwang Y. (2012). Cross-linguistic studies of children’s and adults’ vowel spaces. The Journal of the Acoustical Society of America, 131(1), 442–454.
- de Boysson-Bardies B, & Vihman MM (1991). Adaptation to language: Evidence from babbling and first words in four languages. Language, 67(2), 297–319.
- de Boysson-Bardies B, Hallé P, Sagart L, & Durand C. (1989). A cross-linguistic investigation of vowel formants in babbling. Journal of Child Language, 16(1), 1–17.
- Delgado RE, Buder EH, & Oller DK (2010). Action analysis coding and training (AACT). Miami, FL: Intelligent Hearing Systems.
- Delucchi KL (1993). On the use and misuse of chi-square. In Keren G, & Lewis C (Eds.), A handbook for data analysis in the behavioral sciences (pp. 294–319). Hillsdale, NJ: Lawrence Erlbaum.
- Dunst CJ, Gorman E, & Hamby DW (2010). Effects of adult verbal and vocal contingent responsiveness on increases in infant vocalizations. Cell Reviews, 3, 1–11.
- Edwards J, & Beckman ME (2008). Some cross-linguistic evidence for modulation of implicational universals by language-specific frequency effects in phonological development. Language Learning and Development, 4(2), 122–156.
- Eimas PD, Siqueland ER, Jusczyk P, & Vigorito J. (1971). Speech perception in infants. Science, 171(3968), 303–306.
- Engstrand O, Williams K, & Lacerda F. (2003). Does babbling sound native? Listener responses to vocalizations produced by Swedish and American 12- and 18-month-olds. Phonetica, 60(1), 17–44.
- Fenson L, Dale P, Reznick J, Thal D, Bates E, Hartung J, Pethick S, & Reilly J. (1993). MacArthur Communicative Development Inventories: User’s Guide and Technical Manual. Baltimore, MD: Paul H. Brookes Publishing Co.
- Fowler CA, Smith MR, & Tassinary LG (1986). Perception of syllable timing by prebabbling infants. The Journal of the Acoustical Society of America, 79(3), 814–825.
- Goldstein MH, & Schwade JA (2008). Social feedback to infants’ babbling facilitates rapid phonological learning. Psychological Science, 19(5), 515–523.
- Goldstein MH, King AP, & West MJ (2003). Social interaction shapes babbling: Testing parallels between birdsong and speech. Proceedings of the National Academy of Sciences, 100(13), 8030–8035.
- Gros-Louis J, & Miller J. (2018). From “ah” to “bah”: Social feedback loops for speech sounds at key points of developmental transition. Journal of Child Language, 45, 807–825.
- Gros-Louis J, West M, & King A. (2014). Maternal responsiveness and the development of directed vocalizing in social interactions. Infancy, 19(4), 1–24.
- Ha S. (2017). Longitudinal study of vocal development in 9- to 18-month-old children acquiring Korean. Communication Sciences & Disorders, 22, 435–444.
- Ha S, & Pi M. (2016). Consonant frequency and phonological characteristics of Eojeols in spontaneous speech samples in 18 to 30-month-old Korean children. Communication Sciences & Disorders, 21, 567–579.
- Ha S, & Pi M. (2018). Phonological characteristics of early lexicon in Korean-acquiring children. Communication Sciences & Disorders, 23, 829–844.
- Ha S, Seol A, & Pae S. (2014). Vocal development of typically developing infants. Journal of the Korean Society of the Speech Sciences, 6, 161–169.
- Jusczyk PW (2002). How infants adapt speech-processing capacities to native-language structure. Current Directions in Psychological Science, 11(1), 15–18.
- Kent RD, & Bauer HR (1985). Vocalizations of one-year-olds. Journal of Child Language, 12(3), 491–526.
- Kern S, Davis BL, MacNeilage PF, Koçbas D, Kuntay A, & Zink I. (2009). Crosslinguistic similarities and differences in babbling: Phylogenetic implications. In Hombert JM (Ed.), Towards the origins of language and languages. New York: Academic Press.
- Lee CC, Jhang Y, Chen LM, Relyea G, & Oller DK (2017). Subtlety of ambient-language effects in babbling: A study of English- and Chinese-learning infants at 8, 10, and 12 months. Language Learning and Development, 13(1), 100–126.
- Lee SAS, Davis B, & MacNeilage P. (2010). Universal production patterns and ambient language influences in babbling: A cross-linguistic study of Korean- and English-learning infants. Journal of Child Language, 37(2), 293–318.
- Lynch MP, Oller DK, Steffens ML, Levine SL, Basinger DL, & Umbel V. (1995a). Onset of speech-like vocalizations in infants with Down syndrome. American Journal on Mental Retardation, 100, 68–86.
- Lynch MP, Oller DK, Steffens ML, & Buder EH (1995b). Phrasing in prelinguistic vocalizations. Developmental Psychobiology, 28, 3–23.
- McCune L, & Vihman MM (1987). Vocal motor schemes. Papers and Reports on Child Language Development, 26, 72–79.
- McCune L, & Vihman MM (2001). Early phonetic and lexical development. Journal of Speech Language and Hearing Research, 44(3), 670–684.
- Milenkovic P. (2001). TF32 [Computer software]. Madison, WI: University of Wisconsin-Madison.
- Mitchell PR, & Kent RD (1990). Phonetic variation in multisyllable babbling. Journal of Child Language, 17(2), 247–265.
- Morton J, Marcus S, & Frankish C. (1976). Perceptual centers (P-centers). Psychological Review, 83(5), 405.
- Nathani S, Ertmer DJ, & Stark RE (2006). Assessing vocal development in infants and toddlers. Clinical Linguistics & Phonetics, 20, 351–369.
- Oller DK (2000). The emergence of the speech capacity. Mahwah, NJ: Erlbaum.
- Oller DK, Griebel U, Bowman DD, Bene ER, Long HL, Yoo H, & Ramsay G. (2020). Infant boys found to be more vocal than infant girls. Current Biology, 30(10), R426–R427.
- Pae S, & k KC (2011). Korean MacArthur-Bates Communicative Development Inventories (K M-B CDI). Seoul: Mindpress.
- Rivera-Gaxiola M, Silva-Pereyra J, & Kuhl PK (2005). Brain potentials to native and non-native speech contrasts in 7- and 11-month-old American infants. Developmental Science, 8(2), 162–172.
- Rvachew S, Alhaidary A, Mattock K, & Polka L. (2008). Emergence of the corner vowels in the babble produced by infants exposed to Canadian English or Canadian French. Journal of Phonetics, 36(4), 564–577.
- Saffran JR, Johnson EK, Aslin RN, & Newport EL (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1), 27–52.
- Snow CE (1977). The development of conversation between mothers and babies. Journal of Child Language, 4, 1–22.
- Stoel-Gammon C. (2011). Relationships between lexical and phonological development in young children. Journal of Child Language, 38, 1–34.
- Stoel-Gammon C. (1998). Sounds and words in early language acquisition: The relationship between lexical and phonological development. In Paul R. (Ed.), Exploring the speech-language connection (pp. 25–52). Baltimore, MD: Paul H. Brookes Publishing Co.
- Thevenin DM, Eilers RE, Oller DK, & Lavoie L. (1985). Where’s the drift in babbling drift? A cross-linguistic study. Applied Psycholinguistics, 6(1), 3–15.
- Thompson B. (1988). Misuse of chi-square contingency table test statistics. Educational and Psychological Research, 8, 39–49.
- Tincoff R, & Jusczyk PW (1999). Some beginnings of word comprehension in 6-month-olds. Psychological Science, 10(2), 172–175.
- Von Hapsburg D, & Davis BL (2006). Auditory sensitivity and the prelinguistic vocalizations of early-amplified infants. Journal of Speech Language and Hearing Research, 49, 809–822.
- Werker JF, & Tees RC (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior & Development, 7(1), 49–63.
- Whalen DH, Levitt AG, & Goldstein LM (2007). VOT in the babbling of French- and English-learning infants. Journal of Phonetics, 35(3), 341–352.


