The development of acoustic cues to coda contrasts in young children learning American English

Jae Yung Song; Katherine Demuth; Stefanie Shattuck-Hufnagel

doi:10.1121/1.3687467

2012 Apr;131(4):3036–3050. doi: 10.1121/1.3687467

PMCID: PMC3339504 PMID: 22501078

Abstract

Research on children’s speech perception and production suggests that consonant voicing and place contrasts may be acquired early in life, at least in word-onset position. However, little is known about the development of the acoustic correlates of later-acquired, word-final coda contrasts. This is of particular interest in languages like English where many grammatical morphemes are realized as codas. This study therefore examined how various non-spectral acoustic cues vary as a function of stop coda voicing (voiced vs. voiceless) and place (alveolar vs. velar) in the spontaneous speech of 6 American-English-speaking mother-child dyads. The results indicate that children as young as 1;6 exhibited many adult-like acoustic cues to voicing and place contrasts, including longer vowels and more frequent use of voice bar with voiced codas, and a greater number of bursts and longer post-release noise for velar codas. However, 1;6-year-olds overall exhibited longer durations and more frequent occurrence of these cues compared to mothers, with decreasing values by 2;6. Thus, English-speaking 1;6-year-olds already exhibit adult-like use of some of the cues to coda voicing and place, though implementation is not yet fully adult-like. Physiological and contextual correlates of these findings are discussed.

INTRODUCTION

Researchers have long noted that children’s early word productions are variable in form, and different from those of adults (e.g., Smith, 1973). This has given rise to speculation regarding the nature of children’s early phonological representations. Although much of what is known about phonological development comes from phonemic transcriptions of child speech, interpretation of transcription data can be challenging, because children sometimes make systematic acoustic contrasts that are not perceived as contrastive by adults. For example, Macken and Barton (1980) showed that some children below the age of 2 went through a period during which they made voicing distinctions for stops in word onsets, but this distinction relied on a non-adult-like voice onset time (VOT) difference that was entirely within the range for adult voiced stop onsets. Thus, children showed a contrast between voiced and voiceless onset stops that was not always perceptible for adults. Children also produce contextually governed variation at an early age, even when they do not always overtly produce the conditioning context. For example, Weismer et al. (1981) found that language-delayed children produced longer vowels before voiced compared to voiceless target codas, even when they did not produce the target consonant closure and release. Song and Demuth (2008) explored these issues further with typically developing 1-2-year-olds. They found that children systematically lengthened the preceding vowel in tokens where they did not produce an audible coda consonant (e.g., dog [dɔg] → [dɔː]), providing evidence for a coda consonant representation. The presence of such acoustic “covert contrasts” (e.g., Scobbie et al., 2000) raises questions about children’s emerging phonological representations, and when and how their speech productions begin to assume the same acoustic-phonetic realizations as those produced by adults.

In the present study, we were interested in how children’s phonological representations develop over time, and the extent to which detailed acoustic analyses can help to reveal what they know, as well as how adult-like their phonetic implementations are. In particular, we focused on the development of voicing (voiced vs. voiceless) and place of articulation (PoA) (alveolar vs. velar) contrasts in coda stops. It is reported that coda consonants are typically acquired later than onset consonants, and less is known about the acoustics of codas. Coda consonants are of particular interest because many inflectional morphemes in English, such as the past tense morphemes -t/d, appear in coda position, sometimes creating a complex coda cluster at the end of a word (e.g., kicked /kɪkt/). Thus, a study of the acoustics of monomorphemic codas will provide a baseline for future exploration of how and when morphemic coda consonants are acquired.

Studies of adult speech production have suggested that there are many potential cues to stop coda voicing, including the duration of the preceding vowel, presence and duration of a voice bar (i.e., low-frequency periodicity indicating continued vocal fold vibration after oral closure), and the presence and amplitude of aspiration noise produced at the vocal folds after the oral release (Cole et al., 2007; Lisker and Abramson, 1964; Repp, 1979; Wright, 2004). It has also been shown that there are various acoustic correlates to PoA in adult speech, including spectral cues during vowel formant transitions and the spectrum and number of release bursts (Blumstein and Stevens, 1979; Olive et al., 1993). However, only limited information is available about the acoustic correlates of voicing and PoA contrasts for stops in children’s speech, and what research there is has primarily focused on stops in word-initial, onset position. Results of these studies are reviewed below, before turning to the question of children’s stops in word-final, coda position.

VOT is one of the primary cues to voicing contrasts in onset stops. A number of studies have described children younger than 3–4 years undergoing at least three developmental stages before they produce adult-like VOT (Bond and Wilson, 1980; Kewley-Port and Preston, 1974; Macken and Barton, 1980; Zlatin and Koenigsknecht, 1976): the first stage, where children’s undifferentiated VOT values fall within the short lag region that adults use for voiced stops in English; the second stage, where children make VOT distinctions between voiced and voiceless onsets, but both sets of VOT values generally fall within the adult perceptual boundaries for English voiced stops; and the final stage, where children’s VOT values become more adult-like, separating into a short lag region for voiced stops and a long lag region for voiceless ones. More recently, Imbrie (2005) showed that children between the ages of 2;6–3;3 had a significantly longer VOT-lag for voiced stops than adults, and that this decreased to more adult-like values over the next 6 months. These findings provide evidence that 2–3-year-olds are still developing appropriate timing and glottal adjustments for onset voicing distinctions.

On the other hand, children appear to produce another cue to the voicing contrast, the vowel duration difference associated with coda voicing, earlier in life. In adult English, the effect of final consonant voicing on the duration of the preceding vowel is strong, with vowels preceding voiced consonants being almost twice as long as vowels preceding voiceless consonants (House, 1961), at least in utterance-final position (Crystal and House, 1988). English-learning children are known to produce the vowel duration cue to coda voicing before the age of three (Buder and Stoel-Gammon, 2002; Krause, 1982). Recent research looking at spontaneous child speech has found this distinction to be in place even before the age of 2 (Ko, 2007).

In a study exploring additional acoustic cues to coda voicing, Shattuck-Hufnagel et al. (2011) examined CVC (consonant-vowel-consonant) productions from two children from the Imbrie Corpus (Imbrie, 2005) who were aged 2;5 and 3;2 at the beginning of the study. The target words analyzed contained either velar codas (bug vs. duck) or bilabial codas (tub vs. cup), and the data were coded for the presence vs. absence of five acoustic cues that have been associated with the voicing contrast in coda stop consonants in adults. The results showed that both children exhibited systematic cues to coda voicing contrasts. That is, a voice bar during stop closure and an epenthetic vowel after the release appeared more frequently for voiced codas, whereas noise at the end of the vowel and noise after the coda release were produced more frequently for voiceless codas. The fifth cue, glottalization at the end of the vowel, which is more common before voiceless codas in adult speech, did not occur often enough to reveal a difference with respect to the voicing feature of the coda in this study, perhaps because their target words did not include coda /t/, which is the most common context for such coda-related glottalization in adult speech. This fine-grained acoustic analysis based on the feature-cue approach to the signaling of phonological contrasts (Stevens, 2002; Keyser and Stevens, 2006) provided detailed information about the acoustic correlates of the voicing feature in child speech. However, since the study only looked at children, it was not possible to determine precisely how these cue values differed from those of adults.

In contrast to the acoustic studies of voicing discussed above, research on children’s development of PoA contrasts has been largely perception- and transcription-based. There is therefore limited understanding of how PoA features are realized acoustically in child speech. For example, Irwin (1947) examined the early emergence of consonants with different PoA using phonological transcriptions; during the first months of life, it was glottals (i.e., /h/) that appeared most frequently, followed by velars. By the age of 2;5, the frequency of glottal consonants gradually dropped, and the frequency of consonants produced at other PoAs, such as alveolars and labials, increased. In one of the few acoustic studies, Imbrie (2005) examined various cues to PoA in children’s onset stops, including VOT, burst duration, and number of bursts.

Transcriptional studies have noted that some phonological processes commonly observed in early speech, such as velar fronting (e.g., go [go] → [do]) (Inkelas and Rose, 2007; McAllister, 2009) and consonant harmony (e.g., cat [kæt] → [kæg]) (Pater and Werle, 2003), involve changes in consonant PoA. Since children often make such PoA “errors,” one might assume that their representation of PoA features is incomplete or non-adult-like. However, an extensive body of psycholinguistic literature on categorical perception shows that infants below the age of 5 months are sensitive to acoustic variations that define various phonological features, such as voicing, manner, and PoA (Eimas, 1974; Eimas and Miller, 1980; Eimas et al., 1971). By the end of the first year of life, infant sensitivity to such acoustic cues has been refined to reflect the phonological structure of the ambient language (Kuhl et al., 1992; Werker and Tees, 1984). Although infants below the age of 2 have been shown to be less successful in demonstrating this sensitivity when the tasks of phonetic discrimination involve word processing or word learning that require attention to meaning (Stager and Werker, 1997), more recent studies demonstrate that infants as young as 19 months show linearly graded sensitivity to the number of feature mismatches when appropriate tasks were used (White and Morgan, 2008). That is, the time that infants spent looking at a picture of the target (e.g., shoe) decreased as the degree of mismatch in features increased [e.g., shoe (correct pronunciation) > foo (change in PoA) > voo (change in PoA and voicing) > goo (change in PoA, voicing, and manner)].

In sum, the previous literature suggests that children below the age of 2 may have sophisticated phonological knowledge about voicing and PoA contrasts in words, even before they can reliably produce them in an adult-like way. One source of information about what a child knows is the pattern of acoustic cues that the child produces; that is, the child may indicate knowledge of a phonemic contrast by producing some of the cues that an adult uses, before developing the ability to produce the full pattern of adult-like cues. However, much is still to be learned about the development of acoustic cues to voicing and PoA contrasts in early child speech, especially with respect to codas.

The primary goal of the present study was, therefore, to provide a more comprehensive understanding of how and when certain acoustic correlates of voicing and PoA contrasts are acquired for stops in word-final coda position. Specifically, we wanted to know (a) which of the possible cues children produce early in acquisition, (b) whether children’s production of the cues initially differs from that of adults, and if so, how and when it becomes more adult-like, and (c) what the implications of these findings are for the development of phonological representations more generally.

To address these issues, we conducted an acoustic analysis of spontaneous speech productions collected from 6 mother-child dyads speaking American English in a longitudinal study. In particular, we investigated a selected set of individual non-spectral acoustic cues to the voicing and PoA features in stop codas, rather than simply noting whether an adult listener heard the target segment or not. This approach grew out of Stevens’ (2002) model of speech production and perception based on individual acoustic cues to distinctive features. This model proposes that a given feature contrast may be signaled by a number of different acoustic cues, and that the precise set of cues that a speaker employs may vary depending on the other features in the feature bundle, as well as on the segmental and structural context in which the feature occurs. Performing analyses at the level of acoustic cues therefore provides a much richer and more systematic constellation of observations than performing analyses at the level of the segment (or even the feature) alone. Specifically, it has the potential to capture critical details about phonetic variation patterns in children’s productions, how these develop over time, and how these may differ from those of adults.

Study 1 examined cues to the voicing contrast, and Study 2 examined cues to PoA contrasts. The values for these cues were averaged across participating adults (mothers) and children. For some cues, we examined the average duration or number of cues (e.g., number of release bursts) as a function of voicing and PoA (e.g., the average duration of the vowel before voiced vs. voiceless codas in adult and child speech), and for other cues, we examined the frequency of the presence of a cue as a function of coda voicing and PoA (e.g., the frequency of occurrence of the voice bar during the closure of voiced vs. voiceless codas in adult and child speech).

Although we hypothesized that these cues would vary systematically with voicing and PoA in both mothers’ and children’s speech, we also expected some differences between the two populations. For example, Imbrie (2005) examined the development of the acoustic patterns of onset stop consonants by taking various durational, amplitude, spectral, formant, and harmonic measurements on 1049 utterances produced by ten 2;6–3;3-year-old children. The acoustic analysis revealed high variability in many of the measures including VOT, F0, intensity, and F2 at vowel center, suggesting that children were overall less consistent than adults in controlling and coordinating their articulatory gestures, vocal fold stiffness, and respiration. In addition, she argued that their smaller articulator size and high subglottal pressure were probably responsible for the high incidence of multiple release bursts for stops. In that study, some aspects of the children’s speech became more adult-like over the 6 month observation period, but their gestures were still far from producing adult-like acoustic patterns at the end of the study, when the children were 3;0–3;9. In the present study, we examined the speech of younger 1;6–2;6-year-olds. Since coda consonants are typically acquired later than onsets, we predicted that the codas of these younger children would exhibit even more inconsistent use of some of the cues when compared to the productions of adults. In addition, we expected, based on Imbrie’s (2005) observations (e.g., multiple release bursts), that some cues would have exaggerated values, but that the children would move toward more adult-like use of these cues by the end of the study period.

STUDY 1: THE DEVELOPMENT OF NON-SPECTRAL ACOUSTIC CUES TO VOICING CONTRASTS IN STOP CODAS

Method

Subjects and database

The data examined in this study came from the Providence Corpus (Demuth et al., 2006), a collection of spontaneous speech interactions between 6 mother-child pairs from the New England area.¹ All 6 children (3 boys, 3 girls) were typically developing, monolingual speakers of American English. The parents of 2 children spoke the dialect typical of Southern New England, which is often characterized by the omission of postvocalic /r/. The parental input for the other 4 children more closely resembled Standard American English.

Digital audio/video recordings were collected in the children’s homes, approximately 1 h every 2 weeks for 2 years. Recording started between the ages of 0;11–1;4, depending on when each child started producing words. During the recording, both mother and child wore a wireless Azden WLT/PRO VHF lavalier microphone pinned to their collar as they engaged in everyday activities. The recordings were made using a Panasonic PV-DV601D-K mini digital video recorder. The audio from the video was later extracted and digitized at a sampling rate of 44.1 kHz. Both the mothers’ and children’s speech were orthographically transcribed using Codes for the Human Analysis of Transcripts (CHAT) conventions (MacWhinney, 2000). The children’s utterances were also transcribed by trained coders using International Phonetic Alphabet (IPA) transcription, showing the phonemic representations of words and the position of stressed syllables. Ten percent of the child data from each recording session were re-transcribed by a second transcriber. Transcription reliability of overall coded segments ranged from 80% to 97% across files in terms of presence/absence of segments and place/manner of articulation. Voicing is difficult to reliably code in young children’s speech, and was therefore not assessed in these reliability measures.

Data

We first extracted the highest frequency monosyllabic CVC content words in the Providence Corpus, selecting those ending in alveolar (/t/, /d/) and velar (/k/, /g/) stops. Words starting with glides or liquids (e.g., red) were excluded, due to the difficulty of identifying the beginning of vowels in such utterances for duration measures. The final set of target words contained either voiceless codas (bat, cat, hot, boat, feet, hat, back, book, duck) or voiced codas (bed, food, good, head, side, big, dog, pig). We examined both child and adult productions of these words when the child was 1;5–1;7, 1;11–2;1, and 2;5–2;7, i.e., during time periods centered on 1;6, 2;0, and 2;6 years of age, respectively. This provided a reasonable number of tokens for each speaker and allowed us to explore developmental effects, both in the child speech and in that of their mothers during the same time periods.

We then extracted all the audio files of the sentences containing these target words. For individual mothers and children at each age, we coded the first 10 acoustically clean tokens of a target word in utterance-final position (e.g., It’s big) and the first 5 acoustically clean tokens in each of four utterance-medial contexts: before consonant-initial words (e.g., big spoon), before glide-initial words (e.g., big wagon), before words beginning with a stressed vowel (e.g., big apple), and before words beginning with an unstressed vowel (e.g., big as). This provided some controlled variability for the medial context, and a maximum of 30 tokens per target word per speaker per age. Unusable tokens included those with poor acoustic quality, often because of overlap with other speakers’ vocalizations or background noise. The final set of data included 2928 tokens.
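The token selection scheme can be illustrated with a short sketch. It is not the procedure used to build the data set, only one possible reading of the quotas stated above; the record fields (`speaker`, `age`, `word`, `context`, `clean`) and the context labels are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Quotas as described in the text: 10 utterance-final tokens plus 5 tokens in
# each of four utterance-medial contexts (at most 30 per word/speaker/age).
# The context labels themselves are hypothetical.
QUOTAS = {
    "final": 10,
    "medial_consonant": 5,
    "medial_glide": 5,
    "medial_stressed_vowel": 5,
    "medial_unstressed_vowel": 5,
}


def select_tokens(tokens):
    """Keep the first acoustically clean tokens up to each context quota.

    `tokens` is an iterable of dicts in chronological order with the
    hypothetical keys: speaker, age, word, context, clean.
    """
    counts = defaultdict(int)
    selected = []
    for tok in tokens:
        if not tok["clean"]:
            continue  # e.g., overlap with another speaker or background noise
        quota = QUOTAS.get(tok["context"])
        if quota is None:
            continue  # context not part of the sampling scheme
        key = (tok["speaker"], tok["age"], tok["word"], tok["context"])
        if counts[key] < quota:
            counts[key] += 1
            selected.append(tok)
    return selected
```

Because the quotas are applied per word, speaker, age, and context, frequent words in dense recordings cannot dominate the sample, while sparse cells simply contribute fewer tokens.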

As shown in Table I, the tokens analyzed were relatively evenly distributed across voiceless vs. voiced codas, ages, and positions within the utterance. The children had more utterance-medial tokens at later ages, probably due to the fact that utterance length increases with age. For both mothers and children, more than half of the utterance-medial tokens were followed by consonant-initial words (Mothers: 56%, Children: 61%). About a quarter of the utterance-medial words were followed by words beginning with an unstressed vowel (Mothers: 24%, Children: 18%). Target words followed by glide-initial words (Mothers: 11%, Children: 11%) and by words beginning with a stressed vowel (Mothers: 9%, Children: 10%) were the least common utterance-medial contexts. Importantly, the mothers and children showed similar distributions of utterance-medial words.

TABLE I.

Number of tokens analyzed in Study 1.

	Utterance-final position						Utterance-medial position
	Mothers			Children			Mothers			Children
	1;6	2;0	2;6	1;6	2;0	2;6	1;6	2;0	2;6	1;6	2;0	2;6	Total
Voiceless	191	120	134	107	62	91	171	188	187	43	40	105	1439
Voiced	141	108	110	60	48	84	215	242	233	51	72	125	1489
Total	332	228	244	167	110	175	386	430	420	94	112	230	2928

Table II further shows a breakdown of tokens contributed by each subject at different points in time. Overall, the numbers were similar among mothers and children respectively, suggesting that the amount of each participant’s contribution to analyses was comparable. At the same time, one of the mother-child dyads (Mother 4 and Child 4) had a greater number of tokens compared to other pairs; this is due to the fact that Mother 4 and Child 4 had denser corpora, with weekly recordings during the time of investigation. Child 1 was slower in language development and did not produce many words in the 1;6 and 2;0 samples.

TABLE II.

Number of tokens contributed by each subject at each age.

	Utterance-final position			Utterance-medial position
	1;6	2;0	2;6	1;6	2;0	2;6	Total
Mother 1	25	32	56	35	41	64	253
Mother 2	63	36	52	66	39	91	347
Mother 3	67	43	24	83	101	55	373
Mother 4	97	87	53	112	171	96	616
Mother 5	45	10	25	45	27	54	206
Mother 6	35	20	34	45	51	60	245
Total	332	228	244	386	430	420	2040

	1;6	2;0	2;6	1;6	2;0	2;6	Total
Child 1	0	6	65	0	0	33	104
Child 2	46	8	22	19	10	38	143
Child 3	20	30	21	1	33	24	129
Child 4	64	51	48	67	63	71	364
Child 5	12	7	9	2	4	36	70
Child 6	25	8	10	5	2	28	78
Total	167	110	175	94	112	230	888

Acoustic coding

We then examined the acoustics of each token, coding for individual cues to voicing. To this end, we developed a set of coding conventions using both visual information from the spectrogram and waveform and auditory information. Acoustic coding was carried out by several trained coders using Praat (Boersma and Weenink, 2005) to display and label the speech tokens. To evaluate inter-coder reliability, 10% of the tokens (293) were randomly chosen at age 2;0 and re-labeled by a second coder. The average difference in vowel duration and duration of post-release noise was 8 ms (SD = 12) and 16 ms (SD = 23), respectively. Agreement for the presence or absence of each of the other 3 cues was good (voice bar: 91%; glottalization at the end of a vowel: 93%; post-release noise: 91%). Pearson r correlations between the measurements of the original and recoded data were over 0.74 for all measures. All correlations were significant (p < 0.001), suggesting high inter-coder reliability.
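The three kinds of reliability statistic reported here can be sketched as follows. This is a minimal illustration under stated assumptions: the "average difference" is taken to be a mean absolute difference, agreement is taken to be simple percent agreement, and the function names are hypothetical. The text does not specify the exact computations that were used.

```python
import math


def mean_abs_difference(coder_a, coder_b):
    """Mean absolute difference between paired duration measurements (ms)."""
    return sum(abs(x - y) for x, y in zip(coder_a, coder_b)) / len(coder_a)


def percent_agreement(coder_a, coder_b):
    """Percentage of tokens on which two coders gave the same binary label."""
    return 100.0 * sum(x == y for x, y in zip(coder_a, coder_b)) / len(coder_a)


def pearson_r(coder_a, coder_b):
    """Pearson correlation between two coders' paired measurements."""
    n = len(coder_a)
    mean_a = sum(coder_a) / n
    mean_b = sum(coder_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(coder_a, coder_b))
    var_a = sum((x - mean_a) ** 2 for x in coder_a)
    var_b = sum((y - mean_b) ** 2 for y in coder_b)
    return cov / math.sqrt(var_a * var_b)
```

Each function expects two equal-length sequences in which position i holds the two coders' values for the same token.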

Measures

We investigated five non-spectral acoustic cues to coda stop voicing (Fig. 1), all of which have been observed in adult speech (House, 1961; Fant and Lindblom, 1961; Redi and Shattuck-Hufnagel, 2001; Repp, 1979; Wright, 2004). Our coding criteria were defined as follows: (1) Vowel duration: the interval between the onset and offset of clear F2 energy in the spectrogram. (2) Voice bar: continued periodicity after the abrupt drop in amplitude (particularly in F2) that signaled closure for the stop coda. This was often characterized by a low frequency, low amplitude signal with a simpler waveform, reflecting continued vocal fold vibration during the stop closure without radiation of higher-frequency components of the source. (3) Glottalization at the end of a vowel: irregular, creaky-sounding pitch periods at the end of the vowel. This can result from various mechanisms, including strong adduction of the vocal folds. (4) Post-release noise: substantial noise following the coda stop release. The post-release noise sometimes showed the characteristics of frication followed by those of aspiration, but often this distinction was not clear, and sometimes only one of these possibilities was observed. Since our goal was to determine the presence of post-release noise, not its source, we simply determined whether there was substantial post-release noise or not. (5) Duration of post-release noise: the interval between the onset and offset of noticeable post-release noise. Note that some of these measures involve duration (1, 5), whereas others involve presence or absence of a cue (2, 3, 4).
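The split between durational and presence/absence measures can be made concrete with a small data-structure sketch. The class and field names are hypothetical, and treating the duration of post-release noise as missing when no noise is present is an assumption rather than something stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodedToken:
    """One coded token; all field names are hypothetical."""
    word: str
    voiced_coda: bool
    vowel_duration_ms: float            # cue 1 (durational)
    voice_bar: bool                     # cue 2 (present/absent)
    glottalization: bool                # cue 3 (present/absent)
    post_release_noise: bool            # cue 4 (present/absent)
    noise_duration_ms: Optional[float]  # cue 5; assumed None when cue 4 is absent


def mean_vowel_duration(tokens, voiced):
    """Average vowel duration (ms) for voiced or voiceless codas."""
    values = [t.vowel_duration_ms for t in tokens if t.voiced_coda == voiced]
    return sum(values) / len(values) if values else None


def percent_with_cue(tokens, cue, voiced):
    """Percentage of tokens in a voicing group that show a binary cue."""
    group = [t for t in tokens if t.voiced_coda == voiced]
    if not group:
        return None
    return 100.0 * sum(bool(getattr(t, cue)) for t in group) / len(group)
```

For example, `percent_with_cue(tokens, "voice_bar", voiced=True)` would give the percentage of voiced-coda tokens coded with a voice bar.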

FIG. 1. Representative waveform and spectrogram for the word *dog* produced by a mother, illustrating the five acoustic cues examined: (1) vowel duration, (2) voice bar, (3) glottalization at the end of a vowel, (4) post-release noise, and (5) duration of post-release noise.

Predictions

Previous studies have shown that preceding vowels are longer and voice bars occur more frequently in the context of tautosyllabic voiced stops than for voiceless stops (e.g., House, 1961; Fant and Lindblom, 1961). Furthermore, in American English vowel glottalization is known to occur more often before voiceless stops (especially /t/) (Redi and Shattuck-Hufnagel, 2001) due to adjustment of the vocal folds to interrupt regular vibration. In contrast to the vocal fold separation that accompanies voiceless stops in onset position, the adjustment for codas is hypothesized to involve strong approximation of the vocal folds, sometimes inducing irregular vibration (glottalization) at the end of the vowel. Past studies have also demonstrated that intraoral air pressure is greater during the production of voiceless stops than voiced stops (Bernthal and Beukelman, 1978; Lisker, 1970), which may contribute to more frequent and longer duration post-release noise.

We therefore predicted that mothers would exhibit (1) longer vowel duration before voiced codas, (2) more frequent occurrence of voice bar during the closure of voiced codas, (3) more frequent glottalization at the end of vowels before voiceless codas, and (4) more frequent and (5) longer post-release noise for voiceless codas. Children were predicted to produce many of the same cues as the mothers, but to exhibit more frequent and longer post-release noise, since they have been reported to generate greater intraoral air pressure than adults during the production of voiceless stops (Bernthal and Beukelman, 1978). We also expected to find some developmental changes, with children approximating more adult-like patterns with age.

Results

In order to examine how the five acoustic cues varied with the voicing of the coda, we employed a mixed-effects regression analysis, which incorporates both random and fixed effects. This analysis was particularly appropriate for our spontaneous speech corpus data, because of the flexibility with which it is capable of handling missing values as well as unbalanced numbers of word tokens in individual speakers. Furthermore, mixed-effects models offer the advantage of providing insights into the full structure of the data by examining fixed- and random-effects factors simultaneously (for further information, see Baayen, 2008; Baayen et al., 2008; Quené and van den Bergh, 2004). In the present study, we wanted to capture idiosyncratic differences between speakers by treating them as a random-effect factor. We examined the mothers’ and children’s data separately, and the mothers and children in each analysis were treated as random effects.

In each of the mixed-effects regression models, the dependent variable was each of the five acoustic cues. Analyses were carried out using the R statistical computing software (R Development Core Team, 2011). We used mixed-effects logistic regression models for the binary cues (i.e., presence or absence), and generalized linear mixed-effects models for the durational cues. Individual tokens from the speakers were used as individual data points to compute the frequency of occurrence or the average duration of each acoustic cue. This is possible because the mixed-effects regression models take into account the within- and between-speaker variances in the data. For the independent variables, the models included one random-effect factor, speakers (mothers or children), and three fixed-effect factors: voicing (voiced vs. voiceless), position of the target word within the utterance (utterance-medial vs. utterance-final), and the interaction between the two factors (voicing × position). Earlier reports that individual cues can vary with the position of the target word in its utterance (e.g., Oller, 1973) motivated the consideration of utterance position.
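One way to write down the two model types is sketched below. The notation is introduced here for illustration only, and the sketch assumes a by-speaker random intercept with no random slopes and an identity link for the durational cues, neither of which is stated in the text. Here $y_{ij}$ is a durational cue and $c_{ij}$ a binary cue for token $i$ of speaker $j$, $V_{ij}$ codes coda voicing, and $P_{ij}$ codes utterance position.

```latex
% Assumption: by-speaker random intercept u_j only; symbols are illustrative.
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 V_{ij} + \beta_2 P_{ij}
            + \beta_3 V_{ij} P_{ij} + u_j + \varepsilon_{ij}, \\
  \operatorname{logit} \Pr(c_{ij} = 1) &= \beta_0 + \beta_1 V_{ij}
            + \beta_2 P_{ij} + \beta_3 V_{ij} P_{ij} + u_j, \\
  u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Under this reading, $\beta_1$ and $\beta_2$ are the effects of voicing and position, $\beta_3$ is the voicing × position interaction, and $u_j$ absorbs idiosyncratic differences between speakers.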

In Secs. 1–5 below, we discuss detailed results of the models for each of the five cues to voicing. Table III shows the results of mixed-effects regression analyses for mothers and children at each age, where the effect of the random factor (speakers) was controlled. As regards the vowel-final glottalization cue in child speech, only the main effects of voicing and position were examined, because at each age there was one voicing × position combination with no observation of glottalization, causing the odds ratio for the interaction to be undefined.
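The reason an empty cell blocks the interaction term can be seen in a simplified sketch that works directly on cell counts rather than on a fitted mixed-effects model. The function names and the counts used with them are hypothetical.

```python
import math


def log_odds(present, total):
    """Log-odds that a cue is present; None when a zero count makes it infinite."""
    absent = total - present
    if present == 0 or absent == 0:
        return None
    return math.log(present / absent)


def interaction_log_odds_ratio(cells):
    """Voicing x position interaction on the log-odds scale.

    `cells` maps (voicing, position) to (tokens with the cue, total tokens).
    Returns None when any cell has an undefined log-odds.
    """
    lo = {key: log_odds(*counts) for key, counts in cells.items()}
    if any(value is None for value in lo.values()):
        return None  # the interaction cannot be estimated from these counts
    medial_effect = lo[("voiceless", "medial")] - lo[("voiced", "medial")]
    final_effect = lo[("voiceless", "final")] - lo[("voiced", "final")]
    return medial_effect - final_effect
```

If, say, no voiced utterance-medial token showed glottalization, the log-odds for that cell would be negative infinity, so the difference of differences has no finite value, whereas the main effects can still be estimated by pooling over the other factor.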

TABLE III.

Number of tokens from 6 mothers and 6 children, t-values, and the significance of the effect of each fixed factor (Note: ° = marginally insignificant effect (p = .05 or .06), * = p < .05, ** = p < .01, *** = p < .001). For voicing, the reference group was voiced codas; for position, the reference group was final codas.

	Mothers			Children
	1;6	2;0	2;6	1;6	2;0	2;6
Number of tokens	718	658	664	261	222	405
(1) Duration of preceding vowels
Voicing	−3.92***	−7.45***	−6.25***	−7.77***	−5.36***	−9.11***
Position	−8.35***	−15.95***	−12.92***	−5.72***	−8.42***	−11.17***
Voicing × Position	1.62	6.04***	4.97***	4.57***	3.36***	5.90***
(2) Presence of voice bar
Voicing	−10.50***	−8.83***	−10.05***	−6.23***	−5.46***	−6.05***
Position	1.01	0.71	−1.45	−1.45	0.80	0.14
Voicing × Position	0.65	0.53	−0.67	0.36	2.08*	2.03*
(3) Presence of glottalization at end of vowel
Voicing	4.29***	4.22***	4.20***	4.29***	1.13	1.06
Position	−3.67***	−2.95**	−2.21*	−3.79***	−3.74***	−5.15***
Voicing × Position	0.43	−0.41	0.20	n/a	n/a	n/a
(4) Presence of post-release noise
Voicing	−0.06	2.14*	2.56*	0.38	1.70	2.56*
Position	−10.45***	−8.94***	−7.70***	−5.16***	−5.67***	−6.02***
Voicing × Position	2.32*	2.68**	0.79	1.91°	1.34	−0.28
(5) Duration of post-release noise
Voicing	5.06***	4.01***	4.84***	−0.28	2.06*	3.23**
Position	−10.33***	−10.57***	−9.23***	−6.44***	−5.39***	−5.57***
Voicing × Position	−3.66***	−2.53*	−3.50***	2.26*	−0.89	−2.21*

Preceding vowel duration

Figure 2 shows the average vowel duration (in ms) before voiceless and voiced stop codas in utterance-medial (dotted lines) and utterance-final (solid lines) positions. The thick lines for the children are above the thin lines for the mothers in most conditions, showing that the children overall exhibited vowel duration several tens of milliseconds longer than the mothers (utterance-medially: children: 175 ms, mothers: 120 ms; utterance-finally: children: 255 ms, mothers: 220 ms). As shown in Table III, vowel duration, on average, varied as expected with voicing: it was greater before voiced than before voiceless codas for both mothers and children, but this main effect of voicing was caused by a significant interaction between voicing and utterance-position. That is, although vowel duration was reliably longer before voiced codas utterance-finally (solid lines), there was no such difference utterance-medially (dotted lines). This is consistent with previous findings in adult speech corpora showing a reliable effect of voicing on vowel duration only in utterance-final position (Crystal and House, 1988). Finally, the main effect of position was significant in both mothers and even the youngest children, with longer vowel durations utterance-finally compared to utterance-medially.

(Color online) Average vowel duration (ms) before voiceless and voiced stop codas in utterance-medial (M) and utterance-final (F) position at each age. Error bars represent standard error.
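The group and position differences quoted above for vowel duration can be checked with simple arithmetic. The following is a minimal sketch that uses only the four mean values given in the text; the dictionary layout and function names are illustrative assumptions, and the results are raw differences of means, not the model estimates reported in Table III.

```python
# Mean vowel durations (ms) quoted in the text. The data layout and the
# helper names below are illustrative, not part of the original analysis.
means = {
    "children": {"medial": 175, "final": 255},
    "mothers": {"medial": 120, "final": 220},
}


def group_gap(position):
    """Child minus mother mean vowel duration (ms) at one utterance position."""
    return means["children"][position] - means["mothers"][position]


def final_lengthening(group):
    """Utterance-final minus utterance-medial mean vowel duration (ms)."""
    return means[group]["final"] - means[group]["medial"]


for position in ("medial", "final"):
    print(f"child - mother, {position}: {group_gap(position)} ms")
for group in ("children", "mothers"):
    print(f"final - medial, {group}: {final_lengthening(group)} ms")
```

On these means, the children's vowels are 55 ms longer than the mothers' utterance-medially and 35 ms longer utterance-finally, and both groups lengthen vowels utterance-finally (by 80 ms for children and 100 ms for mothers).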

Presence of voice bar

As predicted, for both mothers and children there was a significant main effect of voicing on the presence of voice bar, with more frequent occurrence of voice bars during the closure of voiced codas (Fig. 3). The probability of a voice bar was not significantly affected by the utterance position for either population. However, there were some significant interactions for the children at 2;0 and 2;6, with a greater difference in the percent voice bar between voiced and voiceless codas utterance-finally compared to utterance-medially. In sum, mothers and children overall showed similar patterns in using the voice bar to cue voicing contrasts, but differed in the effect of voicing × position interaction.

(Color online) The percent of voice bar before voiceless and voiced stop codas in utterance-medial (M) and utterance-final (F) position.

Presence of glottalization at the end of the vowel

For mothers, the presence of vowel-final glottalization varied systematically as a function of voicing and position (Fig. 4); vowel-final glottalization was, on average, more frequent before voiceless than before voiced stops, and in utterance-final position compared to medially. The voicing × position interaction, however, was not significant. Although the effect of voicing on this cue was significant for children at 1;6, surprisingly it was not significant at 2;0 and 2;6. Children produced vowel-final glottalization significantly more often utterance-finally than utterance-medially from 1;6, possibly because of a tendency toward final creak. As noted earlier, the voicing × position interaction could not be examined for children because the data in utterance-medial position were too sparse. In sum, glottalization varied systematically with position, but less systematically with voicing in early speech.

(Color online) Percent glottalization at the end of the vowel before voiceless and voiced stop codas in utterance-medial (M) and utterance-final (F) position.

Presence of post-release noise

For mothers, there was a significant main effect of voicing on post-release noise when the children were 2;0 and 2;6, with more frequent occurrences for voiceless stops (Fig. 5). The lack of voicing effect at 1;6 appears to be due to the high incidence of post-release noise for both voiceless and voiced stops, especially in utterance-final position. This is supported by a significant voicing × position interaction at 1;6 and 2;0, showing that the difference in percent post-release noise between voiceless and voiced tokens was larger in utterance-medial position than utterance-final position. Finally, the main effect of position was significant at all three ages, with more frequent occurrences of post-release noise utterance-finally compared to medially.

(Color online) Percent post-release noise for voiceless and voiced stop codas in utterance-medial (M) and utterance-final (F) position.

For children, the main effect of voicing was not significant until 2;6 (Fig. 5). The null effect at 1;6 and 2;0 appears to be due to the small difference between voiceless and voiced codas in utterance-final position. However, in medial position, post-release noise was more frequent for voiceless stops than for voiced stops from 1;6, as suggested by a voicing × position interaction that approached significance at 1;6 (t = 1.91; Table III). Children also showed a significant effect of position on this cue at all ages, with more frequent post-release noise in utterance-final position than medially. By 2;6 children decreased their use of post-release noise to near-adult levels in both utterance positions. In sum, mothers and children exhibited more frequent use of post-release noise for voiceless codas, but only in utterance-medial position when the children were at earlier ages. The lack of an overall voicing effect at those ages appears to be due to the frequent occurrence of post-release noise for both voiceless and voiced codas in utterance-final position.

Duration of post-release noise

For mothers, all three factors (voicing, position, voicing × position) significantly affected the duration of post-release noise (Fig. 6). The significant main effect of voicing indicated that post-release noise was, on average, longer for voiceless codas than for voiced codas, but this effect was caused by a significant voicing × position interaction. That is, the overall effect of voicing was due primarily to differences found in utterance-final position, and less to variation in utterance-medial position. This pattern is consistent with results described above for the duration of the preceding vowel, where voicing-related effects were found mainly in utterance-final position. Lastly, the main effect of position was also significant, with longer post-release noise in utterance-final position than medially.

Children generally showed similar patterns to those of adults, although the main effect of voicing was not yet significant at 1;6; voiced (78 ms) and voiceless (75 ms) codas had equally long durations of post-release noise in utterance-final position (Fig. 6). As was also the case for presence of post-release noise, the duration of post-release noise for voiced codas decreased over time in utterance-final position, revealing a distinction between voiceless and voiced codas from 2;0. In utterance-medial position, voiceless codas had a particularly long post-release noise duration at 1;6 (42 ms), but this decreased by 2;6 (10 ms) to narrow the gap between voiceless and voiced codas to insignificance.

In sum, unlike mothers, who showed longer post-release noise for voiceless than for voiced codas in utterance-final position and less variation utterance-medially, 1;6-year-olds exhibited equally long post-release-noise durations for voiceless and voiced codas in utterance-final position, and longer durations for voiceless than for voiced codas medially. By 2;6, post-release noise duration for voiced codas in utterance-final position and that for voiceless codas in utterance-medial position decreased to resemble adult values. Thus, although children showed a significant effect of voicing × position interaction at 1;6 and 2;6, the interactions had opposite signs at the two ages. This suggests that the post-release noise duration cue to voicing is still developing in early child speech, but that children achieve more adult-like abilities by 2;6.
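The sign reversal described above can be read directly off Table III. The following is a minimal sketch, assuming that the last three columns of the voicing × position row for duration of post-release noise correspond to the children at 1;6, 2;0, and 2;6, and that an asterisk marks a significant term; the variable and function names are illustrative.

```python
# t-values for the voicing x position term (duration of post-release noise)
# in the children's speech, copied from Table III. Assumption: the last
# three columns of that row are the children at 1;6, 2;0 and 2;6.
# The boolean records whether the table marks the value with an asterisk.
child_interaction = {
    "1;6": (2.26, True),
    "2;0": (-0.89, False),
    "2;6": (-2.21, True),
}


def significant_signs(rows):
    """Return the sign (+1 or -1) of each interaction marked significant."""
    return {
        age: (1 if t > 0 else -1)
        for age, (t, starred) in rows.items()
        if starred
    }


signs = significant_signs(child_interaction)
sign_flip = signs["1;6"] != signs["2;6"]
print(signs)      # significant interactions and their direction
print(sign_flip)  # whether the direction differs between 1;6 and 2;6
```

Under these assumptions, the interaction is significant at 1;6 and 2;6 but not at 2;0, and its direction at 2;6 is the reverse of that at 1;6.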

Summary of Study 1

Mothers’ speech consistently showed the predicted difference for all five cues to the coda voicing contrast, and children showed similar patterns for the vowel duration and voice bar cues from 1;6. On the other hand, the presence and duration of post-release noise varied less systematically as a function of coda voicing in the children’s speech at earlier ages, primarily because there was no systematic variation with voicing in utterance-final position. Finally, the effect of voicing on vowel-final glottalization was not reliable in child speech: it was significant at 1;6 but not at 2;0 or 2;6.

During our examination of the data in Study 1, we noticed that some of the cues to voicing also seemed to vary systematically with the PoA of the stop coda. Thus, although non-spectral cues are not traditionally thought of as cues to PoA (more often considered to involve formant transitions and release-burst spectra), their pattern of occurrence may contain useful information about PoA as well. Therefore, in Study 2, we examined the effect of PoA on three of the cues in Study 1 (presence of vowel-final glottalization, presence and duration of post-release noise) and two additional coda-release-related cues that have been suggested in the adult literature to vary systematically with PoA: the presence and average number of release bursts.