Author manuscript; available in PMC: 2013 Mar 13.
Published in final edited form as: Lang Learn Dev. 2011 Jul 18;7(3):202–225. doi: 10.1080/15475441.2011.564569

An Articulatory Phonology Account of Preferred Consonant-Vowel Combinations

Sara Giulivi 1,2, D H Whalen 1, Louis M Goldstein 1,3, Hosung Nam 1, Andrea G Levitt 1,4
PMCID: PMC3596049  NIHMSID: NIHMS441604  PMID: 23505343

Abstract

Certain consonant/vowel combinations (labial/central, coronal/front, velar/back) are more frequent in babbling as well as, to a lesser extent, in adult language, than chance would dictate. The “Frame then Content” (F/C) hypothesis (Davis & MacNeilage, 1994) attributes this pattern to biomechanical vocal-tract biases that change as infants mature. Articulatory Phonology (AP; Browman and Goldstein 1989) attributes preferences to demands placed on shared articulators. F/C implies that preferences will diminish as articulatory control increases, while AP does not. Here, babbling from children at 6, 9 and 12 months in English, French and Mandarin environments was examined. There was no developmental trend in CV preferences, although older ages exhibited greater articulatory control. A perception test showed no evidence of bias toward hearing the preferred combinations. Modeling using articulatory synthesis found limited support for F/C but more for AP, including data not originally encompassed in F/C. AP thus provides an alternative biomechanical explanation.

1. Introduction

Babbling has been a source of evidence about language acquisition, but it is also a source of many puzzles. The issues concerning the nature of babbling and the way it relates to speech are still intriguing and open to discussion. Is babbling guided by innate linguistic factors or entirely by biomechanical factors? It is clear that biomechanical factors must be involved, given that the vocal tract is a biomechanical system. So, is it also possible to attribute language-like features to babbling? Further, are the biomechanical influences on babbling related to the production of speech per se, or do they derive from movement patterns shared with other functions, such as feeding?

Jakobson (1968) proposed that babbling represents a period in which the entire repertoire of spoken language is randomly explored, in a sort of “purposeless egocentric soliloquy of the child” (Jakobson, 1968: 22). In his view, babbling has nothing in common with early meaningful speech, which instead reflects universal markedness relations in the progressive differentiation of a series of phonological oppositions. In conflict with Jakobson, Locke (1983) proposed the first explicitly biological approach to phonological acquisition, according to which the articulatory gestures produced by the child exhibit innate universal characteristics, regardless of whether they belong to babbling or first words. The babbling repertoire is seen as a universal set of sounds from which first words can emerge. Thus, while Jakobson assumes that universality reflects the workings of distinctive features, Locke (1993) posits a greater influence of biomechanical limitations. Others find that the evidence is too limited to make a firm decision (e.g., Oller, 2000). Even whether the “segments” of babbling have any theoretical standing is debatable. Transcribing babbling with segmental equivalents does not imply that infants have segmental targets. Adult listeners, whether parents or researchers, react to babbling as being similar to language and thus transcribable. However, what the child is controlling in the babbled utterances is still an open question.

Other evidence has shown target language influences on babbling in more narrow analyses (e.g., Oller, Wieman, Doyle, & Ross, 1976). In one study, adult speakers of French were able to identify whether 8-month-old babblers were learning French or not (Boysson-Bardies, Sagart, & Durand, 1984). The vowel space, as defined acoustically, that is used in babbling is influenced by the space of the ambient language (Boysson-Bardies, Hallé, Sagart, & Durand, 1989). One study of four languages revealed differences in the segmental inventories (Boysson-Bardies & Vihman, 1991), despite the earlier reports of a lack of effect. Intonational differences can be found as early as six months of age (Whalen, Levitt, & Wang, 1991). In all these cases, adaptation to the target language could be couched in terms of featural adaptation or sensorimotor resonance.

Common tendencies for the relative frequency of certain syllable structures have also been found in babbling and dictionary entries. Most babbling includes CV syllables, while consonant clusters and final consonants are rare (e.g., Oller, 2000). Here as well, the patterns could be explained in either phonological or biomechanical/dynamical frameworks. The unmarked nature of CV syllables across languages is quite clear (despite the possibly exceptional case of the Australian language known as Arrernte; Breen & Pensalfini, 1999). However, being consistent with a posited linguistic universal does not ensure that the phonological account is correct; other constraints may also play a role. Nam et al. (2009) show that the CV preference emerges in a simple Hebbian learning model, due to the intrinsic accessibility of the in-phase mode of coupling (C and V have been shown to be coupled in-phase in CV and anti-phase in VC; Goldstein et al., 2006). Linguistic universals are generally theory dependent and difficult to evaluate (e.g., Newmeyer, 2005). The internal structural constraints of linguistic systems could be the cause (e.g., Hayes, 2004), but the origin could be biomechanical, either as a pattern unaffected by linguistic organization or as the source for the linguistic constraint itself. That is, if the source is biomechanical, there may be no need to posit a linguistic universal to explain the pattern.

Certain combinations of consonants and vowels have been discovered by MacNeilage and Davis to be preferred in both babbling and in dictionary counts of word types (Davis & MacNeilage, 1994, 1995; MacNeilage, 1998, 2008). Those authors observed that in early speech, labial consonants preferentially co-occur with central vowels, coronal consonants with front vowels, and dorsal consonants with back vowels. Rather than seeking an explanation in the markedness of the features, they proposed that the emergence of such preferred patterns is due to general physiological and anatomical principles. They claimed that the neuromuscular activity related to chewing forms the basis of the preference for the CV combinations mentioned above. The alternation between mouth opening and closing that takes place during mastication was assumed to be phylogenetically and ontogenetically linked to the rhythmic oscillatory movement of the mandible. This oscillation, superimposed on phonation, generates cycles (‘Frames’) that are perceived by the listener as syllables. During the early stages of language development, the infant exerts independent control only on the jaw, while other articulators (lips, tongue, and soft palate) have limited ability, if any, to actively vary their position in the brief span of a syllable. During jaw oscillation, if the tongue is in its resting position, the elevating movement of the mandible will make the lips form a passive constriction and produce a labial consonant, while jaw lowering will produce a central vowel; the authors assume that the tongue must be as distant from the alveolar ridge and the velar region as possible in order for the constriction to occur at the lips. Such a shape would result in a central vowel (MacNeilage & Davis, 1993: 345–346). 
If the tongue is, by chance, in an advanced posture, jaw elevation will produce a coronal constriction, since the front part of the tongue will make contact with the palate, while jaw lowering will produce a front vowel. In case of a (random) retraction of the tongue, jaw elevation will produce a velar constriction, and jaw lowering a back vowel (MacNeilage & Davis, 2000a: 287). The ability to exert control on articulators for the production of consonants and vowels (‘Content’), instead, is claimed to be acquired only in later stages of language acquisition, when motor control skills develop and the infant acquires a greater ability to master fine local movements of articulators within the time scale of a segment. The achievement of greater motor control allows inter-frame variation. According to MacNeilage & Davis, a first step from canonical to variegated babbling is represented by a particular intersyllabic pattern that seems to be present in infants’ first words, as well as in modern languages. It consists of a sequence made up of labial stop consonant - vowel - coronal stop consonant. The preference for this kind of pattern is called the ‘Labial-Coronal effect’ (‘LC effect’) and is considered to be the first step toward intersyllabic serial complexity (MacNeilage, Davis, Kinney, & Matyear, 1999, 2000). The LC pattern seems to be favored not only in babbling and early words, but also in adult speech (MacNeilage, et al., 1999). This may be due to a preference for easier articulatory combinations even after content has been mastered.

A number of studies have also shown that the three co-occurrence patterns deriving from ‘frame dominance’ are present in adult languages. Maddieson and Precoda (1992), while finding no overall evidence of “preference for articulatory convenience” (p. 55) do show “vowel deviation scores” that are consistent with the F/C predictions. More extensively, MacNeilage, Davis, Kinney, and Matyear (1999) analyzed data from dictionaries of 10 languages, namely English, Estonian, French, German, Hebrew, Japanese, New Zealand Maori, Quichua, Spanish and Swahili. They report that the ratios of observed to expected CV co-occurrence frequencies are greater than 1.0 only for the CV types favored in babbling and early words. Within those combinations, the labial-central combination is favored in 7 languages out of 10, the coronal-front in 7 languages out of 10 and the dorsal-back vowel in 8 languages out of 10. Our own study of Mandarin dictionary counts showed partial support, with diagonals of 0.93, 0.94 and 1.62 (Whalen, et al., submitted). The numbers for Mandarin are difficult to interpret given the affrication of coronals before high front vowels in the phonology. This not only removes a large number of potentially F/C compatible syllables, it changes the proportions in the other cells as well. With that in mind, the majority of reported adult language “type” counts show combinations that are in the preferred categories.

Dictionaries yield “type” counts, that is, frequencies of individual items regardless of their frequency of usage. Spoken corpora of adult speech are more similar to babbling by being “token” counts, that is, the frequency of occurrence of CV combinations is influenced by the relative frequency of particular lexical items. Babbling does not, as far as we can determine, contain types, so a token count is more comparable. Further, the token count is likely to reflect the infant’s experience of the language as well. In an analysis of two English corpora, two from French and one from Mandarin, Whalen et al. (submitted) found that the token counts gave very different results for the three languages. English had two out of three diagonals greater than one; French had all three greater; and Mandarin had one out of three. Further, the English and Mandarin patterns were exactly opposite of each other. In an analysis of all nine cells (rather than just the diagonals), the English results were significantly negatively correlated with the babbling data, showing the universal tendency overshadowing the properties of the ambient language. The CV preferences posited by the F/C account thus can be modified in token counts by language-specific word frequency patterns, so any preference for it in babbling must be strong, because that pattern is not consistently represented in the spoken input.

According to the F/C account, the relatively high frequency of the three CV patterns in adult languages, as well as in babbling and early speech, suggests that those patterns have been present since the origin of speech in hominids (Davis, MacNeilage, & Matyear, 2002; MacNeilage, et al., 2000). This suggestion seems to be strengthened by a preference for the same CV patterns in a corpus of protolanguage words (MacNeilage & Davis, 2000b). The presence in adult languages of CV co-occurrences involving shared positions of the tongue, like the coronal-front and the velar-back combinations, is attributed to a basic common motor constraint favoring articulatory ease, namely a constraint on the amount of lingual movement that needs to be made between the production of a consonant and the production of the following vowel (MacNeilage & Davis, 1993, 2000a). The presence in languages of the three basic patterns is also claimed to constitute for the infant “a ready-made initial access to the ambient language” (MacNeilage, et al., 2000: 161). The English data in Whalen et al. (submitted) suggests that the pattern may provide access to language in general but not necessarily the ambient language.

The literature on mandibular control is divided on the question of whether speech and mastication use similar control mechanisms or not. Some studies find a great deal of similarity (Kent, Mitchell, & Sancier, 1991), some find small differences (Ostry & Flanagan, 1989) while others find fundamental differences (Moore & Ruark, 1996; Moore, Smith, & Ringel, 1988; Ostry, Vatikiotis-Bateson, & Gribble, 1997). In a recent study by Steeve, Moore, Green, Reilly and McMurtrey (2008), an electromyographic analysis of the oromandibular activity during sucking, chewing and babble was conducted on fifteen 9-month-old normally developing children. The results were compared to those obtained from 12- and 15-month olds and from adults. It was observed that in 9-month-old children the coordinative patterns observed during sucking were the same as in older toddlers, and that the patterns observed for chewing and babble were similar to those observed in 15-month-olds and in adults (with a progressive refinement, rescaling and increase in stability with age). A comparison across behaviors showed that distinct, identifiable coordinative patterns were observable respectively for sucking, chewing and babble. The authors take this result as suggesting that the motor organization at the basis of babbling and speech may not be related to the coordinative organization of nutrition-related tasks, such as sucking or chewing. Given that a lack of difference is often difficult to interpret, the positive evidence of differences in organization must be taken seriously.

Articulatory Phonology (AP; Goldstein, Byrd, & Saltzman, 2006) provides an alternative, but still physiologically motivated, explanation for the preference for certain kinds of CV associations in early speech. That is, in AP, the physiological basis is posited to be in the interaction of the articulators during babbling, rather than in jaw motion alone (as in the F/C account). According to AP, the activity of the vocal tract can be analyzed in terms of ‘articulatory gestures’, namely actions of constriction and release, by the different vocal organs, organized into temporally overlapping structures (Browman & Goldstein, 1989; Goldstein & Fowler, 2003). Individual gestures can be coordinated, or ‘phased’, by means of planning oscillators associated with the gestures that are coupled in intrinsically stable ways, corresponding to the stable modes exemplified by many simple dynamical systems, and in control and coordination of rhythmic movement (Haken, Kelso, & Bunz, 1985; Turvey, 1990). Planning oscillators associated with consonant gestures and vowel gestures can be coupled either in-phase or anti-phase. In CV syllables, the two gestures are hypothesized to be coupled in-phase (the most stable coupling mode). Since gestures are assumed to be triggered at a fixed phase of their associated oscillators, this means that they are triggered synchronously, and begin to control movements of the vocal tract articulators at the same time within the limits of articulatory compatibility. The gestures of two consecutive consonants, by contrast, are coupled in anti-phase mode, and so are triggered sequentially. This is due to the fact that, if the onset consonant cluster of, say, ‘spoon’ were produced synchronously, the tongue tip gesture for /s/ would be hidden by the lip gesture for /p/ and would not be perceptually recoverable. As Goldstein et al. (2006) point out, “a minimal requirement for a gestural molecule to emerge as a stable, shared structure in a speech community is that the gestures produced are generally recoverable by other members of the community” (p. 232). Consonant gestures and vowel gestures can be produced synchronously and still be recoverable due to the intrinsically different properties that characterize them. Vowel gestures are less constricted than consonant gestures, so that a consonant and a vowel gesture can be produced at the same time without the vowel interfering with the acoustic properties of the consonant. Furthermore, vowel gestures last longer than consonant gestures, so that in isolated CV syllables, there is a period without overlap between consonant and vowel gestures.
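The in-phase and anti-phase modes discussed above are often illustrated with the relative-phase model of Haken, Kelso, and Bunz (1985). The equations below are the standard textbook form of that model, given here only as a sketch; they are not stated in this article, and the symbols \phi (relative phase of the two oscillators) and a, b (positive coupling coefficients) are introduced here for illustration.

```latex
V(\phi) = -a\cos\phi - b\cos 2\phi ,
\qquad
\dot{\phi} = -\frac{dV}{d\phi} = -a\sin\phi - 2b\sin 2\phi ,
\qquad a, b > 0 .
```

Under these assumptions, \phi = 0 (in-phase) is always a stable minimum, whereas \phi = \pi (anti-phase) is stable only when b/a > 1/4. Since V(0) = -a - b is lower than V(\pi) = a - b, the in-phase mode is the more stable of the two, which is the property the AP account relies on for CV syllables.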

Goldstein et al. (2006) have argued that the favored CV combinations result from inherent articulatory compatibility, so that coordinated movements to produce the C and V constrictions can begin synchronously, without any adjustments to their in-phase coupling. Articulations are compatible if they are either mechanically independent (as with the lips and tongue in /ba/) or constricted at the same locations (as with /di/ or /gu/). Non-preferred CV combinations involve articulatory incompatibilities between the C and V, so that their movements cannot begin completely synchronously without mutual interference (as with /du/ or /gi/). The Articulatory Phonology account is thus also based in the biomechanics of speech, but it makes different assumptions about the relevant control structures.

Both the F/C and the AP accounts of the preferred patterns—indeed, the existence of the pattern itself—depend on the division of the babbled utterances into regions of the vocal tract that are later to be related to linguistic features. Although it is not necessary to assume that the infant has adult-like categories in order to apply these labels in transcription, it is necessary to acknowledge that the transcription is an imposition and not a reflection of the infant’s intention. It is likely that intention only becomes relevant when the infant begins using an articulatory routine repeatedly, but the nature of the earlier productions remains an interesting issue (Oller, 2000). There is enough agreement in transcriptions to give substance to the analyses, and enough of a relationship to biomechanics and to the ambient language to lend credence to the transcriptions. We can expect that new techniques for measuring speech production, such as ultrasound, will provide further refinement of our understanding of the substance of babbling, and brain imaging techniques such as near-infra-red spectroscopy (NIRS) may even provide evidence about intent, but transcriptions continue to provide our best current evidence of the development of infant speech. All results based on transcription, such as our own and MacNeilage and Davis’s, are subject to numerous pitfalls, such as native language bias and the bias toward hearing consonants and vowels in the first place. The current results must be placed within those limitations.

In sum, preference for certain CV co-occurrence patterns appears in babbling, first words and, to a lesser extent, adult language. The F/C account posits that the early ‘frame dominance’ in speech production gives way to greater articulatory control, as the infant “escapes frame dominance” (Davis & MacNeilage, 1995:1208). At later stages, content elements are claimed to emerge in the form of actively independent movements, allowing the production of different vowels and consonants. According to AP, certain CV combinations are preferred (more frequent) both in babbling and adult speech as a consequence of the greater level of synergy (and consequent greater ease in syllable articulation) between the articulator gestures involved in the production of those combinations.

If the preference for certain CV syllables were to be attributed only to babblers’ initial lack of articulatory control (as in the F/C perspective), then a decrease in the relative frequencies of those combinations should be expected with age, as the infant gains articulatory control.

In order to test whether successive stages of language acquisition are accompanied by a developmental trend in either direction in CV frequency, babbling data from three different ages were analyzed. According to the present view, as babblers mature, they gain more control over their articulators. Such increases in control should, on the F/C approach, lead to reduced (more adult-like) preponderance of the preferred CV combinations. We further explored this question by modeling the F/C and AP accounts via articulatory synthesis.

2. Searching for developmental trends in the frequencies of CV associations

2.1 Method

We analyzed data from the Haskins babbling database (Levitt & Wang, 1991; Whalen, Levitt, Hsiao, & Smorodinsky, 1995; Whalen, et al., 1991), containing longitudinal recordings and transcriptions of 17 babblers from three different linguistic backgrounds: French (6 babblers), English (6 babblers) and Mandarin (5 babblers). Children, all normally developing, were recorded biweekly between the ages of 5 or 6 months and 9 to 13 months. The transcriptions were done by a native speaker of Mandarin, who was also a speaker of English and Russian.

The procedure followed by Davis and MacNeilage (1995) was replicated on our data, in order to test the F/C hypothesis. In particular, three categories of consonants and vowels were taken into consideration. Consonants included only stops, nasals and glides (i.e., excluding fricatives), and were grouped, according to place of articulation, into labials (p, b, m, w), coronals (t, d, n, j) and velars (k, g, ŋ). We note that /w/ has, in adult language, both a labial and a velar component, but we follow MacNeilage and Davis in counting it only as a labial; its frequency is low enough that it would not change the results, had we instead (or additionally) counted it as velar. Vowels were grouped with reference to the front-back dimension of the vocal tract, into central (ʌ, ə, a, ɵ, ɞ, ʉ, ɨ, ɜ), front (i, ɪ, e, æ, œ, y, ɶ, ø) and back (u, ʊ, ɯ, o, ɔ, ɑ, ɒ, ɐ, ɤ).

We searched the database for all the CV combinations whose consonants and vowels were among the sounds listed above. Given three consonant categories and three vowel categories, we searched, on the whole, for nine different types of CV combinations, namely labial-central (B-A), labial-front (B-I), labial-back (B-U), coronal-central (D-A), coronal-front (D-I), coronal-back (D-U), velar-central (G-A), velar-front (G-I) and velar-back (G-U). Henceforth, we will use capital ‘B’, ‘D’ and ‘G’ respectively for labial, coronal and velar consonants; and capital ‘A’, ‘I’, ‘U’ respectively for central, front and back vowels. Note that many of the sounds included, especially the vowels, are not found in the inventory of one or more of the target languages.

During the selection process, syllable position in an utterance was not taken into consideration, and diacritic marks and syllable boundaries (if marked in the transcriptions) were ignored. Segments connected by a ligature were categorized as the first element, and the second was ignored; for example, if a syllable was transcribed as [b ə ⏝ æ] it would be categorized as labial-central.
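The selection rules above can be sketched in code. This is a minimal illustration, not the authors' actual procedure: the input format (a list of segment strings per syllable), the `LIGATURE` token, and the function names are all assumptions made here. Vowels are written with standard IPA symbols, and diacritics are assumed to have been stripped already.

```python
# Place categories as listed in the text:
# B = labial, D = coronal, G = velar; A = central, I = front, U = back.
CONSONANT_PLACE = {
    "B": ["p", "b", "m", "w"],
    "D": ["t", "d", "n", "j"],
    "G": ["k", "g", "ŋ"],
}
VOWEL_PLACE = {
    "A": ["ʌ", "ə", "a", "ɵ", "ɞ", "ʉ", "ɨ", "ɜ"],
    "I": ["i", "ɪ", "e", "æ", "œ", "y", "ɶ", "ø"],
    "U": ["u", "ʊ", "ɯ", "o", "ɔ", "ɑ", "ɒ", "ɐ", "ɤ"],
}
LIGATURE = "⏝"  # hypothetical token marking two connected segments


def _lookup(segment, table):
    """Return the category label whose member list contains the segment."""
    for label, members in table.items():
        if segment in members:
            return label
    return None


def classify_cv(segments):
    """Return a label such as 'B-A' for one transcribed syllable, or None."""
    kept = []
    skip_next = False
    for seg in segments:
        if seg == LIGATURE:
            # The element after a ligature is ignored; the first one is kept.
            skip_next = True
            continue
        if skip_next:
            skip_next = False
            continue
        kept.append(seg)
    if len(kept) < 2:
        return None
    consonant = _lookup(kept[0], CONSONANT_PLACE)
    vowel = _lookup(kept[1], VOWEL_PLACE)
    if consonant is None or vowel is None:
        return None  # e.g. fricatives, which are excluded from the analysis
    return f"{consonant}-{vowel}"


print(classify_cv(["b", "ə", "⏝", "æ"]))  # the ligature example from the text
```

Syllables whose consonant or vowel falls outside the listed sets return `None` and would simply be left out of the counts.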

Utterances that could be classified as words were not excluded from the analysis. However, the number of such utterances in the database is insignificant. During data collection, parents were explicitly asked to avoid any interaction with their children; therefore no word elicitation took place in the sessions. Parent report at the end of the study period for each infant yielded a varying number of words that the parents felt the infant had mastered, but most of the infants had no words reportedly under control.

As a second step, ratios of observed to expected frequencies for the nine CV combinations were generated for each babbler at each age. Next, in order to investigate whether the frequencies of B-A, D-I and G-U combinations would show any developmental trend in the course of language development, we compared:

  1. the ratios of observed to expected frequencies for the favored CV combinations, generated from the English, French and Mandarin babbling data from 6-month-old infants, with those generated from the data of the infants at 9 and 12 months of age. An inventory of the sounds produced by each babbler at each age was also made, in order to find out whether the variety of phones would increase over time, as would be expected if articulatory control were becoming greater.

  2. the ratios generated for the three babbling ages with ratios generated from dictionary data, as representatives of each target language. In particular, for French and English we referred to dictionary data collected by MacNeilage, Davis, Kinney & Matyear (1999), and for Mandarin we analyzed data from an on-line Chinese-English Dictionary - (CEDICT) (www.mandarintools.com/cedict.html). We are aware that a comparison between babbling and dictionary data is between data that are different in nature: Babbling data are tokens, whereas dictionary data are types; a separate study discusses the differences in the ratios based on token counts (Whalen, et al., submitted).

  3. the English ratios generated from the Haskins babbling database with ratios generated for babbling and first words by the authors of the F/C account. Specifically, we make reference to Davis and MacNeilage (1995) for babbling, and to Davis, MacNeilage, and Matyear (2002) for first words.

2.2 Transcription reliability

The use of phonetic transcription in the study of infant vocalizations has often proved problematic. The high variability in the degree of well-formedness characterizing infants’ sounds makes transcription intrinsically complicated and challenges inter-transcriber agreement. The need to describe infant sound productions, however, has led to developing different systems of sound classification, as well as new tools aimed at assessing transcription reliability.

Oller and Ramsdell (2006) propose a measure of transcription agreement that takes into account the different ‘weights’ of phonetic differences between segments, on the basis of accepted principles of phonological theory. They analyze transcriptions of 30 infant utterances. After transcription pairing and alignment, they measure the proportion of slots for which transcribers agree on the presence of a segment, and the proportion of phonetic features that are present in the transcriptions of segments belonging to the same slot. They calculate the overall agreement of transcriptions by multiplying the above two measures. On a 0 to 1 scale, they obtain average agreement of .60, whereas with a traditional, ‘unweighted’, measure they obtain agreement of about .40.

Ramsdell et al. (2007) stress the importance of evaluating the “canonicity” of infants’ productions and transcribers’ confidence as factors that can be predictive of transcription agreement levels. Syllables that are deemed by transcribers to represent firm control over the articulation (meeting “infrastructural requirements of well-formedness;” p. 794) are canonical and thus amenable to transcription. Pre-canonical utterances, on the other hand, are deemed too unstable to justify transcription: Agreement levels are low, as are transcriber confidence ratings. Non-canonical CV transitions are judged auditorily to be too slow to constitute adult-like consonant-vowel transitions (Oller, 2000). However, agreement levels on canonicity are also low (Ramsdell, et al., 2007:828). In the study reported here, we assume that a consonant transcription is sufficient to place our stimuli in the “canonical” category, and thus amenable to transcription.

Koopmans-van Beinum et al. (2001) propose a transcription system (AMSTIVOC) that analyzes separately the phonatory and articulatory aspects of infants’ productions. They identify 5 different types of phonation (according to whether phonation is continuous or interrupted), and 3 different types of articulation (depending on the number of articulatory movements enacted within one respiratory cycle). For consonants, they consider 3 places of articulation (front, central, back) and 5 manners (fricative and trill; stop, glide, nasal, lateral). Vowels are analyzed with respect to the position of tongue body (front, central, back) and the degree of jaw opening (close, close-mid, open-mid, open). According to the authors, the system allows detailed investigation of infant vocalizations and permits the effective identification of differences in the productions of deaf and hearing children.

Stockman et al. (1981) analyze transcription agreement in the productions of 4 children recorded from 7 to 21 months and coded by three different transcribers. They calculate agreement on identical segment matching and on stop feature matching. They find that inter-transcriber agreement in the identical segment analysis ranges from 13% to 61%. In the stop feature analysis they observe an overall mean agreement increase of 23%. Furthermore, the authors observe that, in general, agreement tends to increase with the child’s age.

In order to investigate transcription reliability in the present study, 10% of the transcribed utterances of the Haskins babbling database were retranscribed by two listeners (one was an adult native speaker of American English and the other, of both Cantonese and American English), both trained in phonetic transcription but unaware either of the language environment of the babblers or of the preferred CV combinations. In line with the purposes of the present analysis, we calculated agreement for those CV syllables in which the consonants were stops, nasals or glides. Agreement between transcribers was calculated on the basis of place of articulation matching. Consonants were classified as labial, dental and alveolar, and vowels as front, central, back. We calculated agreement for consonants and vowels separately, but we also looked at whole syllable matching. We obtained 63% agreement for consonants and 62% for vowels. These rates are lower than those found by MacNeilage and Davis for transcriptions in which the transcribers compared their results and attempted to resolve discrepancies. They are similar to the rates found by Stockman et al. (1981) for babblers of the same age and with independent transcriptions.1
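Agreement by place-of-articulation matching can be sketched as a simple percentage of matching labels. This is only an illustration under stated assumptions: the source does not say how items were aligned or which denominator was used, the function name `place_agreement` is hypothetical, and the labels below are made up rather than taken from the study.

```python
def place_agreement(labels_a, labels_b):
    """Percentage of aligned items given the same place label by both transcribers."""
    if len(labels_a) != len(labels_b):
        raise ValueError("transcriptions must be aligned item by item")
    if not labels_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Hypothetical, made-up consonant labels for five aligned syllables
# (illustrative only, not data from the study).
consonants_t1 = ["B", "D", "D", "G", "B"]
consonants_t2 = ["B", "D", "G", "G", "D"]
print(place_agreement(consonants_t1, consonants_t2))  # 3 of 5 match -> 60.0
```

The same function would cover whole-syllable matching if each item were a combined label such as "B-A" instead of a single consonant or vowel category.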

2.3 Results

Our first step was to count the number of CV co-occurrences present in the Haskins babbling database, across the three categories of consonants and vowels detailed earlier. The counts for each of the three ages (6, 9 and 12 months) are given in Table 1.

Table 1.

Number of occurrences of consonant and vowel places of articulation produced at 6, 9 and 12 months by the French-, English-, and Mandarin-learning babblers.

FRENCH
6 months 9 months 12 months
Cons B D G Tot B D G Tot B D G Tot Grand total
Vow
A 76 34 25 135 20 72 11 103 66 142 24 232 470
I 117 147 70 334 55 218 61 334 54 279 98 431 1099
U 28 14 12 54 3 26 5 34 11 17 10 38 126
Grand total 221 195 107 523 78 316 77 471 131 438 132 701 1695
ENGLISH
6 months 9 months 12 months
Cons B D G Tot B D G Tot B D G Tot Grand total
Vow
A 18 76 45 139 82 147 87 316 114 263 142 519 974
I 93 533 255 881 202 621 254 1077 165 539 316 1020 2978
U 7 22 15 44 25 31 31 87 24 23 34 80 211
Grand total 118 631 315 1064 309 799 372 1480 303 825 492 1619 4163
MANDARIN
6 months 9 months 12 months
Cons B D G Tot B D G Tot B D G Tot Grand total
Vow
A 23 12 17 52 37 11 15 63 76 118 85 279 394
I 42 95 68 205 36 135 91 262 151 913 475 1539 2006
U 1 5 2 8 20 2 2 24 7 76 16 99 131
Grand total 66 112 87 265 93 148 108 349 234 1107 576 1917 2531

From the counts reported in Table 1, ratios of observed to expected frequencies for the nine CV combinations were generated (see Table 2). The numbers in bold face in the diagonals correspond to the ratios of observed to expected frequencies for the three types of CV combinations predicted as favored by the F/C account, namely labial-central, coronal-front and velar-back.

Table 2.

Ratios of observed to expected frequencies obtained from the data of the Haskins babbling database.

FRENCH
6 months 9 months 12 months
Cons B D G B D G B D G
Vow
A 1.33 0.68 0.91 1.17 1.04 0.65 1.52 0.98 0.55
I 0.83 1.18 1.02 0.99 0.97 1.12 0.67 1.04 1.21
U 1.23 0.70 1.09 0.53 1.14 0.90 1.55 0.72 1.40
ENGLISH
6 months 9 months 12 months
Cons B D G B D G B D G
Vow
A 1.17 0.92 1.09 1.24 0.86 1.09 1.17 0.99 0.90
I 0.95 1.02 0.98 0.90 1.07 0.94 0.86 1.04 1.02
U 1.43 0.84 1.15 1.38 0.66 1.42 1.60 0.56 1.40
MANDARIN
6 months 9 months 12 months
Cons B D G B D G B D G
Vow
A 1.78 0.55 1.00 2.20 0.41 0.77 2.23 0.73 1.01
I 0.82 1.10 1.01 0.52 1.22 1.12 0.80 1.03 1.03
U 0.50 1.48 0.76 3.13 0.20 0.27 0.58 1.33 0.54
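The ratios in Table 2 follow from the counts in Table 1 under the assumption that the expected count of a cell is the product of its row and column totals divided by the grand total. The sketch below illustrates this for the French 6-month counts; the function and variable names are our own illustrative choices.

```python
def observed_to_expected(counts):
    """Return observed/expected ratios for a table of co-occurrence counts.

    counts[i][j] is the number of syllables with vowel class i (rows A, I, U)
    and consonant class j (columns B, D, G). Expected counts assume that
    consonants and vowels combine at random given the marginal totals.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    ratios = []
    for i, row in enumerate(counts):
        ratio_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            ratio_row.append(observed / expected)
        ratios.append(ratio_row)
    return ratios


# French, 6 months, copied from Table 1 (rows A, I, U; columns B, D, G)
FRENCH_6M = [
    [76, 34, 25],
    [117, 147, 70],
    [28, 14, 12],
]

for vowel, row in zip("AIU", observed_to_expected(FRENCH_6M)):
    print(vowel, [round(r, 2) for r in row])
```

Rounded to two decimals, the output matches the French 6-month panel of Table 2.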

In the French data, at 6 months, ratios along the diagonal are greater than 1.0 for all the favored CV combinations. At 9 months only one on-diagonal ratio exceeds 1.0, whereas at 12 months they increase, with labial-central and velar-back combinations reaching values greater than those recorded at 6 months.

For English, the three combinations predicted as favored show ratios >1.0 at all ages. Values record an increase from 6 to 9 months and then a decrease from 9 to 12 months.

For the Mandarin data, only the labial-central and the coronal-front co-occurrences are greater than 1.0. In general, the data show particularly high ratios for the labial-central combination and particularly low ratios for the velar-back.

In general, as the results show, the babbling in the three language environments presents ratios of observed to expected frequencies greater than 1.0 for the three CV combinations predicted as favored by the F/C account. However, there are CV combinations other than the favored ones (off-diagonals in the table) that also present relatively high frequencies. According to AP, this is to be interpreted as due to some articulatorily synergetic factor that is at work for both the preferred and the non-preferred syllables from the very early stages of language development. If, as in the F/C account, the preference for the diagonal combinations were to be attributed only to jaw oscillation (lack of control over other articulators), we would expect to see much higher ratios in the diagonal cells and much lower ratios in the off-diagonals. Instead, as the table shows, coronal consonants combine frequently with back vowels in the French babbling at 9 months and in the Mandarin babbling at 6 and 12 months. A similar result is found for French in a study conducted by Boysson-Bardies (1993) on CV associations in disyllabic productions of 10- to 12-month-old learners of French, American English, Swedish and Yoruba. In the same study, a high frequency of labial-back combinations is found for Swedish (even if only in second syllable position). In our data, labial consonants combine frequently with back vowels in the babbles of the French infants at 6 and 12 months, of the English infants at 6, 9 and 12 months, and of the Mandarin infants at 9 months. The labial-back combination has also been found to be preferred in Mandarin in a study by Chen & Kent (2005), in which the authors analyze the productions of Mandarin-learning infants recorded from 7 to 18 months.

Velar consonants combine with front vowels more often than expected in our French and Mandarin data at all ages and in the English data at 12 months. A high frequency of velar-front associations is found in the Yoruba data of Boysson-Bardies (1993). In the same study, the American and Swedish data present a high frequency of velar-central combinations. A similar result is found in our data for English (at 6 and 9 months) and Mandarin (at 6 and 12 months), even if the values are generally smaller.

2.4 Testing for developmental trends

To be as comparable as possible, our first statistical test replicates the chi-square test of Davis and MacNeilage (1995). Combining the three languages in tests at each age, the test showed that the differences between the observed and the expected frequencies for the predicted CV combinations were significant at all ages (6, 9 and 12 months), where the expected frequencies assume that all consonants combine at random with all vowels. Seven out of 9 predictions (3 predictions for each of the 3 ages) were confirmed. The two cases in which the expected preference did not occur involved the velar-back combination at 6 and 12 months.
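For readers who wish to see the general form of such a test, the sketch below computes a Pearson chi-square statistic of independence for a single 3 × 3 table. It is a generic illustration and not necessarily the exact procedure of Davis and MacNeilage (1995) or of the present study, which combined the three languages at each age; here only the French 6-month counts from Table 1 are used. The function name is hypothetical, and the critical value is the standard tabled value for alpha = .05 with 4 degrees of freedom.

```python
def chi_square_independence(counts):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(counts) - 1) * (len(counts[0]) - 1)
    return chi2, df


# French, 6 months, copied from Table 1 (rows A, I, U; columns B, D, G)
FRENCH_6M = [
    [76, 34, 25],
    [117, 147, 70],
    [28, 14, 12],
]

# Tabled chi-square critical value for alpha = .05 and df = 4
CRITICAL_05_DF4 = 9.488

chi2, df = chi_square_independence(FRENCH_6M)
print(round(chi2, 2), df, chi2 > CRITICAL_05_DF4)
```

A statistic above the critical value indicates that consonant and vowel place are not combined at random in that table; it does not by itself show which cells carry the effect.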

A more stringent test was run in which babblers were taken as individual cases in a repeated measures Analysis of Variance (ANOVA), with the within factors of Syllable Type (BA, DI, GU) and Age (6, 9, and 12) and the between factor of Language (English, French, Mandarin) for the dependent measure of the ratio. The two missing values at the 12-month level were replaced with the overall mean for the remaining 12-month data. The one missing value at 9 months was similarly replaced with the mean for all 9-month data. Ratios were decreased by 1, so that the test of the overall mean would indicate whether the ratios were significantly greater than 1 or not.
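The two preprocessing steps described above (mean replacement of missing values and subtraction of 1) can be sketched as follows. This is only an illustration, assuming that missing cells are represented as None and that the values passed in belong to a single age level; the function name and the example values are hypothetical and are not taken from the study.

```python
def impute_and_center(ratios):
    """Replace missing ratios with the mean of the available ones, then subtract 1.

    After centering, a value of 0 corresponds to an observed/expected ratio of 1,
    so a test of the overall mean against 0 asks whether ratios exceed 1.
    """
    present = [r for r in ratios if r is not None]
    if not present:
        raise ValueError("at least one ratio must be present")
    mean = sum(present) / len(present)
    return [(r if r is not None else mean) - 1.0 for r in ratios]


# Hypothetical ratios for one age level, with one missing babbler
print(impute_and_center([1.5, None, 0.5]))
```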

Results for the three main factors and all interactions were nonsignificant, with F values ranging from 0.39 to 2.08 (all p’s > .05). The overall mean of the shifted ratios, however, was significantly greater than 0 (F(1,14) = 5.59, p < .05), corresponding to an average ratio of 1.24. The overall tendency to have diagonals greater than 1 was supported, which is in line with both F/C and AP reasoning. The lack of a difference across language environments is also consistent with both accounts. However, the lack of any indication of change by age is inconsistent with the F/C account but consistent with the AP account.

In order to investigate whether it was possible to observe a developmental trend in the frequencies of the CV combinations that are claimed to be favored as a consequence of ‘frame dominance,’ we plotted in Figure 1 the ratios of observed to expected frequencies obtained for each language at each age.

Figure 1.

Ratios of observed to expected frequencies of the three favored CV co-occurrences, for the French-, English- and Mandarin-learning infants.

As already mentioned, the F/C account claims that, as language develops, the role of jaw oscillation (‘frame’) in shaping speech production becomes less dominant and children acquire greater control over other articulators. Therefore, we expected to observe, between 6 and 12 months, a developmental decrease in the frequencies of the predicted combinations. Instead, as the figure shows, no consistent developmental trend was observable in the ratios of any of the three languages. This would imply that jaw oscillation is still playing a major role in the children’s productions and that no gains in articulatory control took place in the time span of 6 months. However, a count of the different phones produced by each babbler at each age seems to contradict this conclusion. Table 3 reports the number of different phones produced by each infant at 6, 9 and 12 months. The average number of different phones produced by infant learners of each language at each age is also given at the bottom of the table.

Table 3.

Number of different phones produced by each infant in the Haskins database at 6, 9 and 12 months of age.

6 months 9 months 12 months
Babbler. Lang. N° of sounds N° of sounds N° of sounds
AB English 27 25 28
CR English 15 27 23
MA English 18 22 35
MM English 20 23 27
NG English 25 28 31
VB English 36 34 31
EC French 25 21 34
MB French 20 21 n/a
MS French 37 35 34
NM French 11 26 23
YC French 25 29 36
BX Mandarin 29 14 22
EW Mandarin 15 16 29
FL Mandarin 18 22 n/a
TZ Mandarin 25 27 35
YL Mandarin 13 24 41
Average English 23.5 26.5 29.1
French 23.6 26.4 31.75
Mandarin 20 20.6 31.75

Only five of the six French participants were included in the count. The one we left out (JZ) had been recorded at 6 months only. The two infants who were recorded at 6 and 9 months, but not at 12 (MB and FL), were retained.

The table shows that the most common tendency is for the number of sounds produced by each babbler to increase progressively from 6 to 12 months. This happens, for example, in the babbles of infants MA, MM and NG for English, MB and YC for French, and EW, FL, TZ and YL for Mandarin. In a few cases, the number of phones decreases at 9 months and increases again at 12 months. Looking at the transcriptions, we noticed that this pattern is observed when the number of utterances at 9 months is very limited. Therefore, the reduction of the number of phones can be attributed to the limited amount of data. This is the case for babblers AB, EC and BX, for English, French and Mandarin, respectively. In some other cases, the number of different phones tends to diminish from 6 to 9 and from 9 to 12 months, for example for the English babbler VB and the French babbler MS. Again, looking at the transcriptions, we noticed that this happens only when the inventory of sounds is already very large at 6 months, so that the 9- and 12-month counts may under-represent the true inventory. Finally, in English CR and French NM, the number of distinct sounds increases from 6 to 9 months and then decreases from 9 to 12 months. Overall, there is evidence of increased control over the speech articulators during this six-month period.

2.5 Comparison with adult language

In order to further probe the issue of the existence of a developmental trend, the babbling ratios of observed to expected frequencies generated from the Haskins babbling database were compared to those obtained from French, English and Mandarin dictionary data. The aim was to see whether it was possible to observe any developmental trend, from babbling to adult languages, in the relative frequencies of the CV combinations predicted as favored by the F/C account. The French ratios for the B-A, D-I and G-U syllable types were 1.2, 1.6 and 1.6, respectively. For English the ratios were 1.2, 1.0 and 1.5, and for Mandarin, 1.2, 1.2 and 1.9.

There is no consistent developmental decrease in the ratios between babbling and the dictionary values, i.e., toward the target language. Indeed, some of the dictionary values are larger than those of the babbling.

In Table 4, we report the Haskins babbling ratios generated for English, and the average ratios obtained for babbling, first words and dictionary data in Davis and MacNeilage (1995), Davis, MacNeilage, and Matyear (2002), and MacNeilage, Davis, Kinney, and Matyear (2000), respectively.

Table 4.

Ratios of observed to expected frequencies of the favored CV combinations generated for English babbling, first words and dictionary data.

ENGLISH
Age (months) B-A D-I G-U
Babbling (Haskins) 6 1.17 1.02 1.15
9 1.24 1.07 1.42
12 1.17 1.04 1.40
Babbling M&D 6.5–12 1.39 1.41 1.34
First words M&D 12–18 1.30 1.43 1.38
Dictionary 1.18 1.0 1.5

No systematic evidence of differences among babbling, first words and adult languages is observable in the table. First word ratios are higher than the babbling ratios for two out of three CV combinations, namely coronal-front and velar-back. The F/C account attributes this lack of change to the increased “functional load” that word learning involves (Davis et al., 2002). It is possible that the perceptual reorganization that accompanies the acquisition of first words (Werker, 1989) plays a role, but no direct mechanism has been proposed. Nonetheless, within the age range of babbling, no increase in functional load is evident, and so the F/C account should lead us to expect a progression toward adult values as articulatory control increases. According to AP, instead, the favored CV combinations emerge in babbling as a consequence of the greater ease of articulation involved in their production, and are eventually retained also in later stages of language development.

2.6 Discussion

The analysis described so far has focused on testing the co-occurrence of consonants and vowels in babbling for preferences according to place of articulation. Overall, as predicted by both the F/C and the AP account, the observed-to-expected ratios for these syllables were significantly greater than 1 (1.24), indicating that the preference was present in our data. Some details were not consistent with this overall effect: only the English data show ratios greater than 1.0 for all three favored combinations at all ages; the French data give ratios less than 1.0 for the coronal-front and velar-back combinations at 9 months, and in the Mandarin data the velar-back combination is less than 1.0 at all ages.

As already mentioned, according to the F/C account, the regularity of mandibular oscillation provides the basis for what the listener perceives as a syllable-like input. In canonical babbling, no other neuromuscular activity is said to be associated with the movement of the mandible, and therefore no subsyllabic organization (Content) should be present during the jaw cycle (Frame). As a consequence of this kind of organization, the account predicts the three favored CV co-occurrence patterns mentioned earlier. Such co-occurrences are claimed to be due only to the anatomical and acoustic consequences of jaw movement, with lack of lingual independence. There is no combination (of C and V) taking place. However, if babbling were really dominated by the movement of the mandible, without any independent action of other articulators, we would expect much higher ratios than those obtained both in our data and in those of MacNeilage and Davis. It is true that MacNeilage and Davis’ English babbling results give ratios of observed to expected frequencies always greater than 1.0, and the same can be observed in our data (even if values are generally smaller). However, the average ratios reported in Davis and MacNeilage (1995) are less than 1.5; the underlying counts (provided by the authors) indicate that 51.6% of the productions fall into these preferred categories. (The figure for the results in Davis, MacNeilage & Matyear (2002) is 49.9%.)

MacNeilage and Davis’s data show ratios greater than 1.0 for many CV combinations other than the favored ones. This is true also for our French and Mandarin babbling data as well as for some Italian babbling data found in the literature (Zmarich & Miotti, 2003). Since, according to the account, jaw oscillation is responsible only for the three favored CV combinations, the high frequencies of the non-favored ones must be attributed to something different from ‘frame dominance’. As we will see in the following section, according to AP, the (slight) preference for the favored CV combinations is due to the more synergetic articulatory actions involved in the production of consonants and vowels.

Furthermore, the fact that the inventory of sounds produced by the majority of infants in the Haskins database tends to increase from 6 to 12 months suggests that children are acquiring greater control over articulators and freeing themselves from the ‘frame’ bias. Indeed, some preliminary findings by Serkhane et al. (2007) seem to suggest an increase in articulatory exploration already between 4 and 7 months of age2. The authors compare acoustic data from the vocalizations of 4- to 7-month-old infants to the productions of an articulatory-acoustic model of the growing vocal tract. The discrepancies between the data obtained from the simulations and the infants’ data suggest that the latter could have been the result of greater complexity of movements rather than of jaw oscillation alone.

Indeed, if jaw oscillation were the only factor responsible for the preferred CV combinations, as in the F/C account, the achievement of greater articulatory control should lead to a developmental decrease in the frequencies of the three CV combinations predicted to be favored as a consequence of ‘frame dominance’. This means that smaller ratios of observed to expected frequencies should be observable for those combinations. The ratios generated from our babbling data do not show any such developmental trend. Nor is it possible to find a consistent decrease in the ratios reported by MacNeilage and Davis for babbling, first words, and adult data. The lack of frequency decrease is in line with the explanations offered by AP for the preferred CV combinations.

In general, for all of our adult data (both spoken language and dictionary data), as well as for those of MacNeilage and Davis, we obtained ratios of observed to expected frequencies greater than or very close to 1.0. This means that the CV combinations that are claimed to be a consequence of the sole oscillatory movement of the mandible also have relatively high frequencies in adults. As mentioned in the introduction, MacNeilage and Davis explain this residue as a slight preference for greater ease of articulation.

As already mentioned, Articulatory Phonology offers a different explanation for these results, as will be described in the following section.

3. Modeling of Frame then Content and Articulatory Phonology dynamics

Articulatory Phonology proposes an alternative explanation for the relatively high frequencies of these preferred CV combinations in child and adult speech (Goldstein et al., 2006). In CV syllables, the gesture of the onset consonant and the gesture of the following vowel are hypothesized to be initiated synchronously (in-phase). However, not all CV combinations can allow articulatory synchrony, due to anatomical limits of the articulators involved and to the demands on shared articulators. For example, one of the CV combinations predicted as disfavored by the F/C account is a coronal consonant with a back vowel, e.g., [du]. Producing the tongue tip gesture for [d] at the same time as the back vowel gesture for [u] may not be as easy to achieve as producing the same consonant gesture at the same time as the gesture for a front vowel such as [i], because in the coronal-back combination the tongue body needs to be fronted for the achievement of the tongue tip constriction, but the production of the back vowel requires the tongue body to be retracted. Therefore, the consonant and vowel gestures may not be produced synchronously. In other words, the vowel gesture may need to be delayed.

The three CV combinations predicted as favored by the F/C account do not present such difficulties. In the coronal consonant-front vowel combination ([di]), for example, the tongue body needs to be fronted for producing both the tongue tip constriction and the front vowel. Therefore, the two gestures can be synchronized. The same can be said for the labial consonant-central vowel combination: during jaw raising, the tongue needs only to remain in its resting position, so no undesired constriction is produced. With the velar consonant-back vowel combination ([gu]), the tongue body needs to be retracted for the production of both the velar constriction and the back vowel, so that synchronization of the two gestures can easily be achieved.

Some recent evidence showing coordination differences between articulatorily compatible C-V sequences and incompatible ones has been found by Gao (2008). She measured tongue, lip and jaw motion via electromagnetometry (Perkell et al., 1992) for a set of 36 syllables produced by 7 speakers of Mandarin Chinese. The onsets for C and V gestures in labial-central vowel sequences were more nearly synchronous than those in the coronal-central vowel sequences.

According to AP, therefore, certain CV combinations are preferred over others because they can be produced in-phase. This means that their production is characterized by the fact that articulatory synchrony matches synchrony in gestural triggering. For this reason, those combinations can emerge and be produced more easily, and this could explain their relatively high frequencies both in babbling and in adult languages. Goldstein et al. (2006) point out that adults may have maintained this preference for those CV combinations that can be synchronized. The oscillatory movement of the mandible could conceivably still play a role in favoring those combinations, but “the existence of these CV preferences cannot constitute evidence for a jaw-only motor control strategy, since the preferences exist in adult languages but the jaw-only strategy does not” (Goldstein et al., 2006:238–239).

The preferred combinations that are apparent in the data outlined by MacNeilage and Davis and in our own work seem amenable to a biomechanical explanation, but there is presently no set of articulatory data that can test the question. Not only is it difficult to study the articulation of infants, it is not completely clear even for adults what would constitute the correct line of evidence showing biomechanical preferences. Here, we will present a summary of efforts to model both accounts.

The modeling makes use of a stylized but plausible articulatory synthesizer, the Configurable Articulatory SYnthesizer (CASY) (Iskarous, Goldstein, Whalen, Tiede, & Rubin, 2003) that can be set to perform kinematically determined movements or be driven by a task-dynamical model of speech (“TADA”) (Nam, Goldstein, Saltzman, & Byrd, 2004). Here, we will briefly describe how we used each mode to model the F/C account (section 3.1) and the AP account (section 3.2). Other technical issues are described more fully in a companion paper (Nam, Giulivi, Goldstein, Levitt, & Whalen, submitted).

3.1 Frame then Content predictions for jaw-only movement

As summarized earlier, the F/C model assumes that the supralaryngeal articulators begin at a random configuration that is then closed and opened by jaw motion alone (Davis & MacNeilage, 2002: 139). Because CASY allows changes in overall vocal tract length, we modeled both the default adult setting and a version scaled to match the MRI images of a 7-month-old infant (Vorperian, Kent, & Lindstrom, 2005). The jaw is a separate articulator in CASY (uncorrelated with other articulators), so we were able to reduce movement to this one parameter. Tongue location was varied systematically such that the length and angle of the tongue body relative to the jaw, and of the tongue tip relative to the tongue body, were randomly selected within physiologically reasonable bounds. The jaw was then raised until some portion of the vocal tract was obstructed; this point of contact was classified as labial (lips made contact), alveolar (tongue tip made contact) or velar (tongue body made contact). The jaw was then lowered by a randomly varying amount, simulating a range of vowel heights. Vowels were classified as front, central or back by comparing the resulting shapes to MRI images of static vowels and finding the nearest match. Ratios were then computed as for the transcriptional data (see Table 5).

Table 5.

Ratios generated by the simulation of the frame/content account. A) Adult vocal tract, varying the location of consonant closure. B) Vocal tract scaled to a 7-month-old’s MRI images.

A.
B D G
A 1.01 1.34 0.63
I 1.06 0.96 0.82
U 0.92 0 2.24
B.
B D G
A 0.86 1.24 0.73
I 1.61 1.08 0.04
U 0 0.31 3.67

Diagonal ratios in the adult model were 1.01, 0.96 and 2.24 for labial/central, coronal/front and velar/back respectively. Diagonal ratios in the infant model were 0.86, 1.08 and 3.67 for labial/central, coronal/front and velar/back respectively. In each model, two out of three of these ratios are above one, as predicted by the F/C account. However, the 1.01 and 1.08 ratios are barely above 1, so that these models provide little support for a jaw-only source of the CV preferences. The off-diagonals are also variable, ranging from 0 to 1.61. Thus there is no real support for the assumptions of the F/C account from either of the models run here.

3.2 Using Articulatory Phonology synergies to predict preferred co-occurrences

To test the AP account, we reasoned that if a CV combination is highly synergistic (and therefore affords synchronization of C and V gestures), the tongue movements exhibited by the synchronized activation of those two gestures (C and V gestures) will be similar to each other. The task-dynamic implementation of AP (TADA) was used to simulate this. We produced bare consonant and vowel trajectories and compared their final configurations of the tongue body between the gestures. The smaller the difference in the end positions between C and V gestures is, the more synergistic that CV combination is. The parameters that are needed for this model have never been computed for infant vocal tracts, so it was impossible to model them. The similarity of the infant and adult models for the F/C account indicates that it is reasonable to expect similar results if the TADA model were to incorporate such radically different vocal tracts.

To generate ratios from these synergies, we first found the magnitude of the difference of the path for the C and for the V; smaller differences are equivalent to greater synergy. Eleven values were used for the C: a single value for labial, and five equally spaced locations along the palate for each of alveolar and velar place. The vowel space was sampled by distributing 300 target values across all physiologically possible locations of the center of the tongue. The values for all 3300 CV combinations (300 × 11) were then normalized to range from 0 to 1; this value was then subtracted from 1 to give a likelihood estimate (given that small values are more likely). The sum of these likelihoods for each of the nine CV cells was then computed, with the ratios of obtained to expected calculated over the total sum.
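The conversion from path differences to ratios described above can be sketched as follows. This is a simplified illustration and not the TADA-based implementation: it assumes that the normalization is a min-max rescaling over all sampled combinations and that expected values are derived from the row and column sums of the summed likelihoods. The function name and the four toy samples are hypothetical.

```python
from collections import defaultdict


def synergy_ratios(samples):
    """Turn (consonant place, vowel place, path difference) samples into ratios.

    Differences are rescaled to [0, 1] (assumed min-max), converted to
    likelihoods as 1 - rescaled value, summed per CV cell, and compared to
    the values expected from the consonant and vowel marginal sums.
    Assumes every consonant and vowel class has a nonzero summed likelihood.
    """
    differences = [d for _, _, d in samples]
    lowest, highest = min(differences), max(differences)
    span = highest - lowest
    cell_sums = defaultdict(float)
    for consonant, vowel, d in samples:
        rescaled = (d - lowest) / span if span else 0.0
        cell_sums[(consonant, vowel)] += 1.0 - rescaled
    total = sum(cell_sums.values())
    consonant_sums = defaultdict(float)
    vowel_sums = defaultdict(float)
    for (consonant, vowel), value in cell_sums.items():
        consonant_sums[consonant] += value
        vowel_sums[vowel] += value
    return {
        (c, v): value * total / (consonant_sums[c] * vowel_sums[v])
        for (c, v), value in cell_sums.items()
    }


# Hypothetical path differences for a 2 x 2 toy example (not model output)
TOY_SAMPLES = [
    ("B", "A", 0.0),
    ("B", "I", 1.0),
    ("D", "A", 2.0),
    ("D", "I", 0.0),
]

for cell, ratio in sorted(synergy_ratios(TOY_SAMPLES).items()):
    print(cell, round(ratio, 2))
```

In the toy example the two combinations with the smallest path differences receive ratios above 1 and the remaining two fall below 1, which is the qualitative pattern the procedure is designed to expose.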

The results are shown in Table 6. All the diagonals have the highest values, as found in the transcription data. The values for the off-diagonals were also well predicted.

Table 6.

Ratios from relative likelihood of co-occurrence of CV places of articulation from the articulatory synthesis implementation of Articulatory Phonology.

B D G
A 1.29 0.79 1.14
I 0.73 1.32 0.74
U 1.20 0.45 1.50

3.3 Comparison of the two models

The F/C models, adult and infant, generated only two out of three diagonals greater than one, while the AP model predicted all three. The models were compared on the off-diagonals as well, even though the original explanation for the F/C account did not encompass the off-diagonals. The magnitude of the differences between the transcriptional data and the model is much smaller for the AP model. For the three diagonals, the average difference between the model and the data is 0.15 for the AP account and 1.09 for the F/C. For the six off-diagonals, the values are 0.23 and 0.71. Thus, even though the original F/C explanation did not make any predictions for the off-diagonals, its off-diagonal values are more accurate than its diagonals. On the diagonals, the error for the AP account is less than a seventh that of the F/C account; on the off-diagonals, it is about a third. Therefore, while both accounts provide a physiological rationale for the preferred combinations, the AP account provides a better fit with the data.

Two other models have also failed to show substantial support for the F/C account. Serkhane et al. (2007) used a different articulatory synthesizer (Maeda, 1990) and found that some of the predicted patterns were confirmed, but some categories were greatly different from predictions. Even this limited success must be viewed with some caution, given that the “jaw” component in the Maeda model incorporates some elements of tongue motion, since it is derived from Principal Components Analysis of x-rays of adult speech (see Nam et al., submitted, for further details). A second approach is found in Lindblom (2008), who calculated degree of compatibility as the amount of difference between the tongue shapes for consonants and for vowels. He found that alveolars were most compatible with front vowels, but velars were most compatible with front/central vowels and became increasingly incompatible with back vowels. Thus neither of these other modeling approaches supports the F/C account.

Several features of the AP model differ from the F/C. First, the consonant constriction is taken as a primary goal. Whether or not infants have a particular goal when they babble is currently not addressable. Nonetheless, the (variably placed) consonant constriction is assumed to be controlled in this account. Similarly, the vowel is not assumed to be fully determined by jaw opening but by some combination of jaw opening and a further specification of a tongue target. As previously noted (section 2.5), the F/C account would generally predict that all utterances would be on the diagonal in the place of articulation tables. That is, the reason that the F/C account predicts the preference for the diagonal cells is that random starting positions will primarily result in the predicted CV types. However, none of the models of jaw-only control of articulation results in the patterns found in babbling (Vilain et al. 1999, Serkhane et al. 2007, Nam et al., submitted). In the Nam et al. study, the F/C simulation did result in about half of the syllables being on the diagonal (as in the babbling data), but there was no correlation with the ratios found in the babbling data. It is therefore unclear what the off-diagonal combinations entail for the F/C account, and whether the original rationale of the model can be said to be viable. For the AP account, there is relatively good correspondence between the ratios both on and off the diagonals (Nam et al., submitted). Articulatory synergy, therefore, seems to be greatest on the diagonals, as originally observed by MacNeilage and Davis, but is also a factor in the off-diagonals.

Furthermore, the AP account also explains another frequently observed characteristic of babbling: a (near-)universal preference for CV syllables over VC. To our knowledge, the F/C account does not provide a ready explanation for this effect. If the articulators are carried by the jaw, it would seem that just as many accidental closures would appear to be VC as CV. If, on the other hand, the infants are combining C and V actions that they (at least partially) control, they would be expected to employ only the simplest coupling mode (in-phase), and only CV syllables would result. The consonant of VC syllables is in an anti-phase relationship to the vowel, so VC syllables are intrinsically more difficult to produce, and would be expected to develop later. Nam, Goldstein & Saltzman (2009) show in simulations of infant development that the earlier production of CV syllables can result from the greater intrinsic accessibility and strength of in-phase coupling.

4. General discussion

There is a pattern of favored co-occurrences of consonants and vowels in babbling and, to a certain extent, in adult language (Davis & MacNeilage, 1995). We explored an implication of the F/C account, that the proportion of such favored pairing should decrease as articulatory control increases. Data from the babbling of children from French, English and Mandarin language environments were used. As an overall replication of the main result, the favored patterns occurred more frequently than chance would predict, for the most part, as seen in ratios greater than one. There were some instances for the French and Mandarin data for which these ratios were less than 1, contradicting the favored pattern. For all of the cases, the magnitude of the ratios was usually small enough that it represented only a plurality of such CV combinations in the data. By direct measurement, these combinations accounted for approximately 50% of the total in both our data and that of MacNeilage and Davis.

There was no developmental trend in the ratios of observed to expected frequencies of the CV combinations predicted to be favored. The F/C account leads to an expectation of a progressive decrease over development in the frequencies of the syllable types predicted as favored as a consequence of ‘frame dominance’. This should occur because the account claims that the effect of jaw oscillation becomes smaller as articulatory control increases. The Haskins babbling database covers a time span of 6 months with the utterances we analyzed having been produced at 6, 9 and 12 months of age. The data in the database did not exhibit any consistent developmental decrease in the frequencies of the favored CV combinations, nor did a comparison between the babbling data and target language data, coming from French, English and Mandarin dictionaries.

Articulatory Phonology offers an alternative explanation: the relatively high frequencies of the three CV combinations are due to the relative ease with which independent (BA) or nearly identical (DI and GU) gestures can be produced synchronously. This happens, for example, in CV syllables made up of a coronal consonant and a front vowel, where both the consonant gesture and the vowel gesture involve an advancement of the tongue body. In other combinations, by contrast, the consonant gesture and the vowel gesture may involve motions of the same articulator in opposite directions and may therefore compete to reach their respective targets. For example, syllables made up of a coronal consonant and a back vowel require an advancement of the tongue body for the consonant and a retraction of the tongue body for the vowel.

Matyear (2007) suggested that there are shallower transitions for babbled utterances than for adult speech. Indeed, for the babbled utterances, there was virtually no evidence of formant transitions at all. This would, in fact, show a lack of control over the articulators if the measurements are accurate, but there are two reasons for thinking that they are not. First, the high fundamental frequency of the voice source in infant speech makes it very difficult to track anything other than the most intense harmonic of a formant. Thus even if the formant is moving, it will look as if it is not, since only one harmonic is likely to be measured. Second, transcribers do, in fact, hear consonants at different places of articulation (fully 50% of the “diagonals” in both our data and that of MacNeilage and Davis). If there were no formant transitions to base this judgment on, then we would be hard pressed to explain this consistent transcription. Therefore, it seems unlikely that these tantalizing results will hold up. The acoustic evidence, then, is more in line with a lack of change during maturation.

The slightly greater than expected co-occurrence of the “favored” combinations does, as MacNeilage and Davis originally pointed out, call for a physiological explanation. Given the lack of evidence for a developmental trend, the persistence of the pattern in adult type counts, and the inconsistency of the pattern across languages in adult speech, it appears that the pattern is not due to cyclicity of the jaw. Instead, the greater ease with which some consonant and vowel gestures combine with each other, as predicted by Articulatory Phonology, explains the existence of CV combination preferences, their recurrence in different circumstances, and their variable influence in adult speech.

Acknowledgments

This work was supported by NIH grants DC-000403 and DC-002717 to Haskins Laboratories. Portions of this work appeared in the unpublished Ph.D. dissertation (Giulivi, 2007). We thank Carol A. Fowler, Michael Studdert-Kennedy, Julia Irwin, Aude Noiray, D. Kimbrough Oller and two anonymous reviewers for helpful comments. Data for calculating percentages of preferred response from previous studies were kindly provided by Barbara L. Davis and Peter MacNeilage.

Footnotes

1

Further validation of transcriptions was provided through a two-alternative forced-choice identification test, conducted to determine whether listeners from two language backgrounds would agree not only with the transcriptions used in the present study, but also with acoustic indications of consonant and vowel identity. The aim was to understand whether there is an intrinsic bias towards hearing the favored CV combinations or not. Participants were two groups of native American-English speakers and two groups of native Italian speakers, who were asked to listen to four different kinds of CV syllables (labial-central, coronal-front, labial-front and labial-central), and judge the place of articulation of consonants and vowels.

A repeated measures ANOVA was performed on the results of both the consonant and the vowel test. The between factor was Language (English or Italian) and the within factors were Consonant and Vowel. The dependent variable was the percent of responses that agreed with the transcriber’s judgment.

Overall the test showed good agreement both with the transcriptions considered for the present study and with the acoustic indications of sound identity (even if vowels were not as accurately identified as the consonants). No noticeable tendency towards hearing the preferred CV combinations was observed.

2

The acoustic results of Serkhane et al. (2007) show some correspondences with those of Sussman et al. (1999). In the latter study, the authors recorded a child from 7 to 40 months of age. They found that CV coarticulation, as indicated by locus equations, grew increasingly adult-like between 7 and 13 months.

References

  1. Boysson-Bardies Bd. Ontogeny of language-specific syllabic productions. In: Boysson-Bardies Bd, Schonen Sd, Jusczyk P, MacNeilage PF, Morton J., editors. Developmental neurocognition: Speech and face processing in the first year of life. Dordrecht: Kluwer Academic Publishers; 1993. pp. 353–363.
  2. Boysson-Bardies Bd, Hallé PA, Sagart L, Durand C. A crosslinguistic investigation of vowel formants in babbling. Journal of Child Language. 1989;16:1–17. doi: 10.1017/s0305000900013404.
  3. Boysson-Bardies Bd, Sagart L, Durand C. Discernible differences in the babbling of infants according to target-language. Journal of Child Language. 1984;11:1–15. doi: 10.1017/s0305000900005559.
  4. Boysson-Bardies Bd, Vihman MM. Adaptation to language: Evidence from babbling and early words in four languages. Language. 1991;61:297–319.
  5. Breen G, Pensalfini R. Arrernte: A language with no syllable onsets. Linguistic Inquiry. 1999;30(1)
  6. Browman CP, Goldstein LM. Articulatory gestures as phonological units. Phonology. 1989;6:151–206.
  7. Chen LM, Kent RD. Consonant-vowel co-occurrence patterns in Mandarin-learning infants. Journal of Child Language. 2005;32:507–534. doi: 10.1017/s0305000905006896.
  8. Davis BL, MacNeilage PF. Organization of babbling: A case study. Language and Speech. 1994;37:341–355. doi: 10.1177/002383099403700401.
  9. Davis BL, MacNeilage PF. The articulatory basis of babbling. Journal of Speech and Hearing Research. 1995;38:1199–1211. doi: 10.1044/jshr.3806.1199.
  10. Davis BL, MacNeilage PF. The internal structure of the syllable: An ontogenetic perspective on origins. In: Givón T, Malle BF, editors. The evolution of language out of pre-language. Amsterdam: Benjamins; 2002. pp. 135–153.
  11. Davis BL, MacNeilage PF, Matyear CL. Acquisition of serial complexity in speech production. A comparison of phonetic and phonological approaches to first word production. Phonetica. 2002;59:75–107. doi: 10.1159/000066065.
  12. Gao M. Unpublished PhD dissertation. Yale University; New Haven: 2008. Mandarin tones: An Articulatory Phonology account.
  13. Giulivi S. Unpublished PhD dissertation. University of Florence; 2007. Vowels and consonants favored co-occurrences in language development.
  14. Goldstein LM, Byrd D, Saltzman E. The role of vocal tract gestural action units in understanding the evolution of phonology. In: Arbib M, editor. Action to language via the mirror neuron system. Cambridge: Cambridge University Press; 2006. pp. 215–249.
  15. Goldstein LM, Fowler CA. Articulatory phonology: A phonology for public language use. In: Schiller N, Meyer A, editors. Phonetics and phonology in language comprehension and production: Differences and similarities. Berlin: Mouton de Gruyter; 2003. pp. 159–207.
  16. Haken H, Kelso JAS, Bunz H. A theoretical model of phase transitions in human hand movements. Biological Cybernetics. 1985;51:347–356. doi: 10.1007/BF00336922.
  17. Hayes B. Phonological acquisition in Optimality Theory: the early stages. In: Kager R, Pater J, Zonneveld W, editors. Constraints in phonological acquisition. 2004. pp. 158–203.
  18. Iskarous K, Goldstein LM, Whalen DH, Tiede MK, Rubin PE. CASY: The Haskins Configurable Articulatory speech synthesizer. In: Recasens D, Solé M-J, Romero J, editors. Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona: Universitat Autonoma de Barcelona; 2003. pp. 185–188.
  19. Jakobson R. Child language, aphasia and phonological universals. The Hague: Mouton; 1968.
  20. Kent RD, Mitchell PR, Sancier M. Evidence and role of rhythmic organization in early vocal development in human infants. In: Fagard J, Wolff PH, editors. The development of timing control and temporal organization in coordinated action. New York: Elsevier; 1991. pp. 135–139.
  21. Koopmans-van Beinum FJ, Clement CJ, Dikkenberg-Pot Ivd. AMSTIVOC (AMsterdam System for Transcription of Infant VOCalizations) applied to utterances of deaf and normally hearing infants. Paper presented at the Eurospeech; 2001; Aalborg, Denmark. 2001.
  22. Levitt AG, Wang Q. Evidence for language-specific rhythmic influences in the reduplicative babbling of French- and English-learning infants. Language and Speech. 1991;34:235–249. doi: 10.1177/002383099103400302.
  23. Lindblom BE. The target hypothesis, dynamic specification and segmental independence. In: Davis BL, Zajdó K, editors. The syllable in speech production. New York: Lawrence Erlbaum Associates; 2008. pp. 327–353.
  24. Locke JL. The child’s path to spoken language. Cambridge, MA: Harvard University Press; 1993.
  25. MacNeilage PF. The frame/content theory of evolution of speech production. Behavioral and Brain Sciences. 1998;21:499–546. doi: 10.1017/s0140525x98001265.
  26. MacNeilage PF. The origin of speech. Oxford: Oxford University Press; 2008.
  27. MacNeilage PF, Davis BL. Motor explanation of babbling and early speech patterns. In: Boysson-Bardies Bd, Schonen Sd, Jusczyk P, MacNeilage PF, Morton J., editors. Developmental neurocognition: Speech and face processing in the first year of life. Dordrecht: Kluwer Academic Publishers; 1993. pp. 341–352.
  28. MacNeilage PF, Davis BL. Deriving speech from nonspeech: A view from ontogeny. Phonetica. 2000a;57:284–296. doi: 10.1159/000028481.
  29. MacNeilage PF, Davis BL. On the origin of internal structure of word forms. Science. 2000b;288:527–531. doi: 10.1126/science.288.5465.527.
  30. MacNeilage PF, Davis BL, Kinney A, Matyear CL. Origin of serial output complexity in speech. Psychological Science. 1999;10:459–460.
  31. MacNeilage PF, Davis BL, Kinney A, Matyear CL. The motor core of speech: A comparison of serial organization patterns in infants and languages. Child Development. 2000;71:153–163. doi: 10.1111/1467-8624.00129.
  32. Maddieson I, Precoda K. Syllable structure and phonetic models. Phonology. 1992;9:45–60.
  33. Maeda S. Compensatory articulation during speech: Evidence from the analysis and synthesis of vocal-tract shapes using an articulatory model. In: Hardcastle WJ, Marchal A, editors. Speech production and modelling. Dordrecht: Kluwer Academic Publishers; 1990. pp. 131–149.
  34. Matyear CL. Lingual co-occurrence constraints in babbling: An acoustical study. Paper presented at the International Conference of Phonetic Sciences. 2007.
  35. Moore CA, Ruark JL. Does speech emerge from earlier appearing oral motor behaviors? Journal of Speech and Hearing Research. 1996;39:1034–1047. doi: 10.1044/jshr.3905.1034.
  36. Moore CA, Smith A, Ringel RL. Task-specific organization of activity in human jaw muscles. Journal of Speech and Hearing Research. 1988;31:670–680. doi: 10.1044/jshr.3104.670.
  37. Nam H, Giulivi S, Goldstein LM, Levitt AG, Whalen DH. Computational simulation of CV combination preferences in babbling. doi: 10.1016/j.wocn.2012.11.002. submitted.
  38. Nam H, Goldstein LM, Saltzman E. Self-organization of syllable structure: A coupled oscillator model. In: Pellegrino F, Marisco E, Chitoran I, editors. Approaches to phonological complexity. Berlin: Mouton de Gruyter; 2009. pp. 299–328.
  39. Nam H, Goldstein LM, Saltzman E, Byrd D. TADA: An enhanced, portable Task Dynamics model in MATLAB. Journal of the Acoustical Society of America. 2004;115:2430–2430.
  40. Newmeyer F. Possible and probable languages: A generative perspective on linguistic typology. Oxford: Oxford University Press; 2005.
  41. Oller DK. The emergence of the speech capacity. Mahwah, N.J: Lawrence Erlbaum Associates; 2000.
  42. Oller DK, Ramsdell HL. A weighted reliability measure for phonetic transcription. Journal of Speech, Language, and Hearing Research. 2006;29:1391–1411. doi: 10.1044/1092-4388(2006/100).
  43. Oller DK, Wieman LA, Doyle WJ, Ross C. Infant babbling and speech. Journal of Child Language. 1976;3:1–11.
  44. Ostry DJ, Flanagan JR. Human jaw movement in mastication and speech. Archives of Oral Biology. 1989;34:685–693. doi: 10.1016/0003-9969(89)90074-5.
  45. Ostry DJ, Vatikiotis-Bateson E, Gribble PL. An examination of the degrees of freedom of human jaw motion in speech and mastication. Journal of Speech, Language, and Hearing Research. 1997;40:1341–1351. doi: 10.1044/jslhr.4006.1341.
  46. Perkell JS, Cohen MH, Svirsky MA, Matthies ML, Garabieta I, Jackson MTT. Electromagnetic midsagittal articulometer systems for transducing speech articulatory movements. Journal of the Acoustical Society of America. 1992;92:3078–3096. doi: 10.1121/1.404204.
  47. Ramsdell HL, Oller DK, Ethington CA. Predicting phonetic transcription agreement: Insights from research in infant vocalizations. Clinical Linguistics and Phonetics. 2007;21:793–831. doi: 10.1080/02699200701547869.
  48. Serkhane JE, Schwartz JL, Boë LJ, Davis BL, Matyear CL. Infants’ vocalizations analyzed with an articulatory model: A preliminary report. Journal of Phonetics. 2007;35:321–340.
  49. Steeve RW, Moore CA, Green JR, Reilly KJ, McMurtrey JR. Babbling, chewing, and sucking: Oromandibular coordination at 9 months. Journal of Speech, Language, and Hearing Research. 2008;51:1390–1404. doi: 10.1044/1092-4388(2008/07-0046).
  50. Stockman IJ, Woods DR, Tishman A. Listener agreement on phonetic segments in early infant vocalizations. Journal of Psycholinguistic Research. 1981;10:593–617. doi: 10.1007/BF01067296.
  51. Sussman HM, Duder C, Dalston E, Cacciatore A. An acoustic analysis of the development of CV coarticulation: A case study. Journal of Speech, Language, and Hearing Research. 1999;42:1080–1096. doi: 10.1044/jslhr.4205.1080.
  52. Turvey MT. Coordination. American Psychologist. 1990;45:938–953. doi: 10.1037//0003-066x.45.8.938.
  53. Vorperian HK, Kent RD, Lindstrom MJ. Development of vocal tract length during early childhood: A magnetic resonance imaging study. Journal of the Acoustical Society of America. 2005;117:338–350. doi: 10.1121/1.1835958.
  54. Werker JF. Becoming a native listener. American Scientist. 1989;77:54–59.
  55. Whalen DH, Giulivi S, Nam H, Levitt AG, Hallé PA, Goldstein LM. Biomechanically preferred consonant-vowel combinations occur in adult lexicons but not in spoken language submitted.
  56. Whalen DH, Levitt AG, Hsiao PL, Smorodinsky I. Intrinsic F0 of vowels in the babbling of 6-, 9- and 12-month-old French- and English-learning infants. Journal of the Acoustical Society of America. 1995;97:2533–2539. doi: 10.1121/1.411973.
  57. Whalen DH, Levitt AG, Wang Q. Intonational differences between the reduplicative babbling of French- and English-learning infants. Journal of Child Language. 1991;18:501–516. doi: 10.1017/s0305000900011223.
  58. Zmarich C, Miotti R. The frequency of consonants and vowels and their co-occurrences in the babbling and early speech of Italian children. Proceedings of the 15th International Congress of Phonetic Sciences; Barcelona. 2003. pp. 1947–1950.
