Abstract
Several global and specific rhythm metrics and speech rate were used to characterize differences in the rhythms of 5- and 8-year-olds’ spoken English. The results were that only speech rate and the rate-normalized Pairwise Variability Index (nPVI) differentiated between 5- and 8-year-olds’ speech. A further result was that the variance in nPVI values was better explained by a specific measure devised to capture patterns of supralexical accenting than by the factor of age expressed in months. These results are taken to suggest that the protracted acquisition of English rhythm may be due in part to the slow rate at which children acquire prosodically conditioned vowel reduction.
Keywords: rate, rhythm, stress timing, speech acquisition
1. INTRODUCTION
A number of recent studies have measured the acquisition of speech rhythm and found that children take significantly longer to acquire the stress-timed pattern of English compared to the syllable-timed pattern of French or Spanish [7], [2]. For example, Post and Payne [9] have reported that English speaking 6-year-olds show less overall variability of vocalic interval durations than English speaking adults and more overall variability of consonantal interval durations. The current study was motivated by two questions that arise from these findings. First, are the global rhythm metrics based on interval durations sensitive enough to also distinguish between younger and older school-aged children’s speech? Second, what accounts for the protracted acquisition of stress timing described in previous studies?
Possible explanations for why stress timing takes longer to acquire than syllable timing are derived from explanations for the perception of distinct language rhythms. For example, one explanation follows from the idea that stress and syllable timing emerge from phonotactics and the presence/absence of vowel reduction [3],[10]. Specifically, the idea is that stress timing takes longer to acquire than syllable timing because the complex syllabic structures and vowel reduction of stress-timed languages take longer to acquire than the simpler syllable structures and full vowels of syllable-timed languages (see e.g., [7], [9]).
While phonotactics and vowel reduction likely provide part of the explanation for the protracted acquisition of stress timing [9], White and Mattys [15] point out that:
Language-specific prosodic timing processes, such as accentual lengthening, word-initial lengthening and phrase-final lengthening, should clearly also be considered in a full model of influences on vocalic and consonantal interval durations (p. 518).
Whereas English lexical stress patterns may be acquired quite early (see e.g., [1]), at least some of the phrase-level prosodic patterns that White and Mattys refer to are not [1], [14]. Since phrase-level prosody disproportionately affects vowel/rhyme durations [13], the slow acquisition of these higher-level patterns might explain why English speaking 6-year-olds show less overall variability of vocalic interval durations than adults: children may be less able to create maximal contrasts between accented and unaccented syllables.
To this list of explanations for the protracted development of stress timing, we might add a third—speech rate.
Children’s average rate of speech differs substantially from that of adults, and speech rate changes slowly across childhood. For example, Sabin and colleagues [11] reported only two significant rate increases between kindergarten (5- or 6-year-olds) and college, with the first significant increase occurring between kindergarten and 2nd grade (7- or 8-year-olds). This finding is important to the present discussion because speech rate also distinguishes between stress- and syllable-timed languages [5]. And, like other global metrics, it interacts with word- and phrase-level prosody. Vowel durations in unstressed syllables and in function words are compressed more at faster rates of speech than those in stressed syllables and in content words [6]. Thus, adult speech may be characterized by greater variability in the sequential duration of vowels than child speech because adults have acquired the ability to disproportionately, but appropriately, reduce unstressed and unaccented vowels in order to gain articulatory speed.
2. METHODS
2.1. Participants
Ten typically developing 5-year-olds (M = 5;5, SD = 4 months) and 10 typically developing 8-year-olds (M = 8;4, SD = 6 months) participated in the study. All children were native American-English speakers from English dominant households. The children had no speech/language or hearing problems as determined by parental report and a pure-tone hearing screening.
2.2. Procedure and Materials
Children were audio recorded while completing the Recalling Sentences subtest from the Clinical Evaluation of Language Fundamentals (CELF-4) [12]. The subtest uses delayed imitation of sentences to evaluate a child’s linguistic development. The initial sentences in the test are shorter and syntactically simpler than later sentences, and the test always proceeds from the simpler sentences to the more complex ones. We administered the test in the normal way, except that we always began with the first sentence in the test regardless of a child’s age and we used prerecorded stimuli to control for sentence prosody and other phonological and phonetic characteristics of the model sentences.
The first 6 sentences (out of 32) from the Recalling Sentences subtest were almost always reproduced without error and were always produced readily and fluently by the children in the study. These sentences, which had a mean length of 9 syllables (SD = 1.9), provided the basis for rhythm measurement.
2.3. Acoustic Segmentation
Each of the 120 sentences was displayed as an oscillogram and spectrogram and hand segmented into consonantal and vocalic intervals. Only intervals that showed clear evidence of oral closure were considered consonantal. Those portions of sonorant consonants that could not be differentiated from the vowel by a drop in overall amplitude and/or the diminishment of higher formants were considered vocalic. Thus, glides and liquids very frequently contributed to the duration of vocalic intervals.
2.4. Rhythm Metrics
The consonantal and vocalic interval durations were extracted and used to calculate several global rhythm metrics, speech rate, and several specific rhythm metrics. The global rhythm metrics we chose were the rate-normalized measures of overall variability in vocalic and consonantal durations, nPVI [8] and varcoC [4], and the overall percentage of vocalic duration in an utterance, %V [10]. These were calculated for each sentence and then averaged to provide a single nPVI, varcoC, or %V value for each child.
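The three global metrics can be sketched in a few lines of Python. This is an illustration only, not the scripts used in the study: the function names are hypothetical, nPVI follows the standard pairwise formula from [8], Varco is taken as 100 × SD / mean, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev

def npvi(durations):
    """Normalized Pairwise Variability Index over successive interval durations."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    # Each term is the absolute difference of a successive pair,
    # normalized by that pair's mean duration.
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * mean(terms)

def varco(durations):
    """Rate-normalized variability: 100 * SD / mean (population SD assumed)."""
    return 100 * pstdev(durations) / mean(durations)

def percent_v(vocalic, consonantal):
    """Percentage of total utterance duration that is vocalic."""
    total_v = sum(vocalic)
    return 100 * total_v / (total_v + sum(consonantal))

def child_score(metric, sentences):
    """Average a single-argument per-sentence metric over one child's sentences."""
    return mean(metric(s) for s in sentences)
```

For example, `child_score(npvi, vocalic_by_sentence)` would give one nPVI value per child, where `vocalic_by_sentence` is a hypothetical list holding the vocalic interval durations of each of that child's sentences.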
The specific rhythm metrics we devised were as follows: the average duration ratio of strong to weak vowels (S2W) from two phrase-initial and -medial trochaically stressed words (tractor and rabbit); the average duration ratio of a content word vowel to a function word vowel (C2F) in 4 phrase-initial and 2 phrase-medial determiner noun phrases with monosyllabic nouns; and the average duration ratio of the ultimate and penultimate vowels (U2P) in 2 sentences with phrase-final determiner noun phrases (DP) that had monosyllabic nouns. The specific metric S2W was meant to capture the aspect of rhythm due to trochaic stress within a word, C2F was meant to capture the aspect due to accenting within supra-lexical prosodic units, and U2P that due to phrase-final lengthening.
Speech rate for each child was calculated as the average number of vocalic intervals per second of speech across the 6 sentences.
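The specific metrics and speech rate reduce to simple averages, sketched below. The function names and input layout are hypothetical. The sketch also assumes that speech rate is the mean of per-sentence rates and that each ratio is computed per token before averaging; the text is compatible with other readings (for example, total intervals divided by total duration).

```python
from statistics import mean

def speech_rate(sentences):
    """Mean vocalic intervals per second.

    `sentences` is a list of (n_vocalic_intervals, duration_in_seconds) pairs.
    """
    return mean(n / dur for n, dur in sentences)

def mean_duration_ratio(pairs):
    """Average ratio of two vowel durations across tokens.

    `pairs` is a list of (numerator_duration, denominator_duration) tuples,
    e.g. (strong, weak) for S2W, (content, function) for C2F,
    or (ultimate, penultimate) for U2P.
    """
    return mean(num / den for num, den in pairs)
```

Under these assumptions, a child who produced 9 vocalic intervals in 3.0 s and 8 in 2.0 s would have a rate of 3.5 intervals per second.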
3. RESULTS
Global rhythm measures and speech rate were uncorrelated with one another, but the rate-normalized global measures correlated with a subset of the specific rhythm measures. Of the various metrics, only nPVI and speech rate distinguished between 5-year-old and 8-year-old speech. A regression analysis indicated that a significant proportion of nPVI variance was explained by DP accenting. These results are given in greater detail below.
3.1. Correlations Between Metrics
Table 1 shows the intercorrelations between the global rhythm measures (nPVI, varcoC, %V), the specific rhythm measures (S2W, C2F, U2P), and speech rate.
Table 1.
| | varcoC | %V | S2W | C2F | U2P | Rate |
|---|---|---|---|---|---|---|
| nPVI | .076 | −.046 | .343 | .677 | .517 | .387 |
| varcoC | | −.357 | .727 | .353 | .527 | .077 |
| %V | | | −.218 | −.285 | −.146 | −.281 |
| S2W | | | | .430 | .377 | .219 |
| C2F | | | | | .527 | .340 |
| U2P | | | | | | .448 |
None of the global metrics correlated significantly with each other or with speech rate, but the metrics based on vocalic and consonantal interval durations correlated significantly with several of the specific rhythm metrics. nPVI was significantly correlated with DP accenting (C2F) and with final lengthening (U2P). VarcoC was significantly correlated with trochaic stress (S2W) and with final lengthening. Final lengthening was also significantly correlated with DP accenting.
3.2. Rhythmic Differences by Age Group
Given the multiple correlations between the measures, we chose to assess the effect of age group on rhythm in a MANOVA with Speaker added as a covariate.
The omnibus analysis showed that 5-year-olds’ rhythm patterns differed significantly from 8-year-olds’ rhythm patterns [F(7,11) = 5.52, p = .006]. Speaker had no effect on the dependent measures. The direction of difference between the groups is evident in Tables 2 and 3 below, which display the group means and standard deviations for each of the measures by age group.
Table 2.
| Age | | nPVI | varcoC | %V | Rate |
|---|---|---|---|---|---|
| 5 | M | 48.33 | 61.24 | 50.69 | 3.46 |
| | SD | 5.4 | 5.67 | 2.91 | 0.21 |
| 8 | M | 55.00 | 62.41 | 49.01 | 3.99 |
| | SD | 5.81 | 7.32 | 2.72 | 0.19 |
Table 3.
| Age | | S2W | C2F | U2P |
|---|---|---|---|---|
| 5 yo | M | 1.62 | 3.13 | 3.69 |
| | SD | 0.38 | 0.86 | 1.14 |
| 8 yo | M | 1.84 | 3.55 | 4.72 |
| | SD | 0.44 | 0.60 | 1.39 |
It should also be evident from Tables 2 and 3 that 5-year-olds and 8-year-olds were best distinguished by measures of overall variance in vocalic interval durations (nPVI) and by speech rate: nPVIs were smaller and speech rates were slower in 5-year-olds’ speech compared to 8-year-olds’. These differences were found to be significant in Type III tests of the group effect [nPVI, F(1,17) = 8.47, p = .010; rate, F(1,17) = 30.82, p < .001].
3.3. Sequential Vocalic Interval Durations
The nPVI primarily reflects the summed differences of sequential vowel durations across an utterance. As noted in the introduction, multiple factors must contribute to this index of rhythm, which differentiates younger and older children’s speech. The significant intercorrelations between nPVI, C2F, and U2P suggest 2 possible sources for the vowel-to-vowel duration variability measured by the nPVI; namely, DP accenting and final lengthening. We used a regression analysis to determine whether one or both of these factors independently predicted nPVI when age in months was also entered as a predictor variable.
The model with 3 predictors explained 58% of the variance in nPVI values. Final lengthening (U2P) contributed very little towards this result, and without it the model explained 52% of the variance in nPVI values. DP accenting (C2F) was the best predictor of nPVI values [b = 4.48, t(16) = 2.74, p = .014], surpassing age-in-months, which was only a significant predictor of nPVI values when U2P was removed [b = .13, t(17) = 2.16, p = .046]. By itself, DP accenting explained 46% of the variance in nPVI values. This strong relationship between vowel-to-vowel duration differences and patterns of supralexical accenting is evident in Figure 1, which shows the relationship between nPVI and C2F values.
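As a rough consistency check, and assuming the single-predictor model is an ordinary least-squares regression of nPVI on C2F alone, the variance explained should equal the squared correlation between the two measures reported in Table 1:

```latex
R^2 = r_{\mathrm{nPVI},\,\mathrm{C2F}}^{2} = (.677)^2 \approx .458 \approx 46\%
```

This agrees with the 46% reported for DP accenting by itself.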
4. GENERAL DISCUSSION
The current study sought to understand the relationship between global and specific measurements of speech rhythm, how these measurements might combine to distinguish younger and older children’s speech, and whether such combinations might help to answer the question of why rhythm acquisition is so protracted in English. The principal findings were that only speech rate and the nPVI differentiated between 5- and 8-year-olds’ speech and that variance in nPVI values was better explained by a specific measure devised to capture patterns of supralexical accenting than by age.
Overall, we find the results to be consistent with the argument that rhythm emerges not only from the lexical stress patterns and phonotactics of a language, but also from patterns of accenting across a phrase. And from this explanation for language rhythm, we also find an explanation for the protracted acquisition of stress timing: it requires the acquisition of most aspects of English prosody.
Stress timing also requires the mastery of vowel reduction and vowel lengthening. The finding that younger and older English speaking children do not differ significantly on measures of phrase-final lengthening suggests that lengthening is mastered fairly early. Young children’s slower speech rates may indicate that they have yet to master vowel reduction.
ACKNOWLEDGEMENTS
We are grateful to the 2009–2011 testing team for their help with data collection. This work was supported by grant number 5R01HD061458.
Contributor Information
Hema Sirsa, Email: hsirsa@uoregon.edu.
Melissa A. Redford, Email: redford@uoregon.edu.
REFERENCES
1. Allen G, Hawkins S. The development of phonological rhythm. In: Bell A, Hooper J, editors. Syllables and segments. North-Holland: Amsterdam; 1978. pp. 173–185.
2. Bunta F, Ingram D. The acquisition of speech rhythm by bilingual Spanish- and English-speaking four- and five-year-old children. Journal of Speech, Language and Hearing Research. 2007;50:999–1014. doi: 10.1044/1092-4388(2007/070).
3. Dauer R. Stress-timing and syllable-timing reanalyzed. Journal of Phonetics. 1983;11:51–62.
4. Dellwo V. Rhythm and speech rate: A variation coefficient for deltaC. In: Karnowski P, Szigeti I, editors. Language and language processing: Proceedings of the 38th linguistic colloquium, Piliscsaba 2003. Peter Lang: Frankfurt; 2006. pp. 231–241.
5. Dellwo V. The role of speech rate in perceiving speech rhythm. Speech Prosody; Campinas, Brazil. 2008. isca-speech.org.
6. Gay T. Effect of speaking rate on vowel formant movements. Journal of the Acoustical Society of America. 1978;63:223–230. doi: 10.1121/1.381717.
7. Grabe E, Gut U, Post B, Watson I. The acquisition of rhythm in English, French, and German. In: Barriere I, Morgan G, Chiat S, Woll B, editors. Current research in language and communication: Proceedings of the child language seminar. London: City University; 1999. pp. 157–163.
8. Grabe E, Low E. Durational variability in speech and the rhythm class hypothesis. In: Papers in Laboratory Phonology 7. Cambridge: Cambridge University Press; 2002.
9. Post B, Payne E. Phonological factors in rhythmic development: A cross-linguistic study. Workshop on Prosodic Development; Barcelona, Spain. 2010.
10. Ramus F, Nespor M, Mehler J. Correlates of linguistic rhythm in the speech signal. Cognition. 1999;73:265–292. doi: 10.1016/s0010-0277(99)00058-x.
11. Sabin E, Clemmer E, O’Connell D, Kowal S. A pausological approach to speech development. In: Siegman A, Feldstein S, editors. Of speech and time: temporal speech patterns in interpersonal contexts. Hillsdale, NJ: Lawrence Erlbaum; 1979. pp. 35–55.
12. Semel E, Wiig EH, Secord WA. Clinical evaluation of language fundamentals, fourth edition (CELF-4). San Antonio, TX: The Psychological Corporation/A Harcourt Assessment Company; 2003.
13. Turk AE, White L. Structural influences on accentual lengthening in English. Journal of Phonetics. 1999;27:171–206.
14. Wells B, Peppe S, Goulandris N. Intonation development from five to thirteen. Journal of Child Language. 2004;31:749–778. doi: 10.1017/s030500090400652x.
15. White L, Mattys SL. Calibrating rhythm: First language and second language studies. Journal of Phonetics. 2007;35:501–522.