Journal of Speech, Language, and Hearing Research : JSLHR. 2022 Oct 19;65(11):4025–4046. doi: 10.1044/2022_JSLHR-22-00124

Prosodic Development During the Early School-Age Years

Jeffrey E Kallay a, Laura Dilley b, Melissa A Redford a
PMCID: PMC9940891  PMID: 36260352

Abstract

Purpose:

This study used a cross-sequential design to identify developmental changes in narrative speech rhythm and intonation. The aim was to provide a robust, clinically relevant characterization of normative changes in speech prosody across the early school-age years.

Method:

Structured spontaneous narratives were elicited annually from 60 children over a 3-year period. Children were aged 5–7 years at study outset and then were aged 7–9 years at study offset. Articulation rate, prominence spacing, and intonational phrase length and duration were calculated for each narrative to index speech rhythm; measures of pitch variability and pitch range indexed intonation. Linear mixed-effects (LME) models tested for cohort-based and within-subject longitudinal change on the prosodic measures; linear regression was used to test for the simple effect of age-in-months within year on the measures.

Results:

The LME analyses indicated systematic longitudinal changes in speech rhythm across all measures except phrase duration; there were no longitudinal changes in pitch variability or pitch range across the school-age years. Linear regression results showed an increase in articulation rate with age; there were no systematic differences between age cohorts across years in the study.

Conclusions:

The results indicate that speech rhythm continues to develop during the school-age years. The results also underscore the very strong relationship between the rate and rhythm characteristics of speech and so suggest an important influence of speech motor skills on rhythm production. Finally, the results on pitch variability and pitch range are interpreted to suggest that these are inadequate measures of typical intonation development during the school-age years.


Prosody is the integrated rhythm and intonation patterns of spoken language. It is impacted in speech produced by children with a wide range of developmental disorders (Paul et al., 2020): from dysarthria and childhood apraxia of speech (Patel et al., 2012; Peter & Stoel-Gammon, 2005) to autism spectrum disorder and Williams syndrome (Catterall et al., 2006; Ito & Marten, 2017; Martinez-Castilla et al., 2011; Peppé, 2009, 2018). The breadth of impact reflects the extended domain of the prosodic system. Prosody emerges from stress patterns at the syllable and word level and from lengthening, accenting, and pausing patterns at the phrase and discourse level. The extended domain of prosody makes it one of the most salient features of any individual's speech and so also a salient marker of difference from one's peers. We know that atypical rhythm and intonation patterns increase the perception of disorder (Olejarczuk & Redford, 2013; Redford et al., 2018; Shriberg et al., 2001) and that perception of disorder adversely affects a child's access to positive peer interactions (e.g., McCabe & Meller, 2004; Redford et al., 2018; Rice et al., 1991). Isolation from one's peers undermines a child's social–emotional development and cognitive functioning (Rubin et al., 2015). Speech intelligibility also suffers with atypical prosody (Klopfenstein, 2009), an observation that has been made many times and across many different populations with disordered speech prosody, including in children with Down syndrome (Kent & Vorperian, 2013; Stojanovik, 2011; Wilson et al., 2019), autism spectrum disorder (ASD; Redford et al., 2018), childhood apraxia of speech (Connaghan & Patel, 2012), dysarthria (Patel et al., 2012), and hearing impairment (Chin et al., 2012; Parkhurst & Levitt, 1978).
Given the breadth of impact of prosody on communication, there is a recognized need to equip speech-language pathologists with tools to assess and treat prosodic disorder in order to improve children's access to positive social interaction and speech intelligibility (Hawthorne & Fischer, 2020; Kalathottukaren et al., 2015; Peppé, 2009).

Tools and strategies for prosodic assessment must reference normative development to be useful and effective (see Diehl & Paul, 2009). However, the empirical description of prosody in typically developing children's speech is incomplete. Extant research has mostly focused on very early acquisition (e.g., Arciuli & Colombo, 2016; Jusczyk et al., 1993; Kehoe, 2000; Kehoe et al., 1995; Payne et al., 2011; Schwartz et al., 1996; Snow, 1994, 1997; Vihman et al., 2006). There is a relative dearth of information on the later stages of development. This study seeks to address this deficit. The aim is to provide a clinically useful description of prosodic development during the school-age years. To do this, we focus on narrative speech rather than on isolated words and sentences, as well as on global measures of rhythm and intonation rather than on the linguistic and phonetic detail of these prosodic attributes.

Much of the prior research on prosody in children's speech addresses its acquisition as part of the linguistic grammar. Our specific interest is in the emergence of speech prosody at the intersection of language and speech motor skills. Both sets of skills show protracted developmental trajectories: Higher level syntax and discourse skills continue to change throughout the school-age years (Nippold, 2016); speech motor skills show significant change during the early school-age years and continue to be refined until at least 14 years of age (Smith & Zelaznik, 2004). The influence of developing speech motor skills on prosody production is likely to be especially evident in narrative speech where the task demands are high relative to single-word or single-sentence elicitations. This suggestion is based on the effect of language complexity and task difficulty on speech motor performance in children and adults: articulatory timing becomes more variable with increasing linguistic complexity and speech task difficulty (Maner et al., 2000; Saletta et al., 2018). Under the assumption that prosody emerges at the intersection of language and speech motor skills, the expectation is that global speech prosody will change in measurable ways across the school-age years.

Background

Speech Rhythm Development

Most of what we currently know of the typical development of speech rhythm is based on words and phrases elicited from very young children (e.g., Arciuli & Colombo, 2016; Jusczyk et al., 1993; Kehoe, 2000; Kehoe et al., 1995; Schwartz et al., 1996; Snow, 1994, 1997; Vihman et al., 2006). For example, Kehoe (2000) elicited multisyllabic words from 22- to 34-month-old children to investigate the developmental trajectory of weak syllable deletion. Earlier studies had established the development of acoustic correlates of lexical stress in single-word productions elicited from children aged 18–30 months (Kehoe et al., 1995; Schwartz et al., 1996). Vihman (1996) used perceptual measures to investigate the development of adultlike rhythm patterns in disyllabic words produced by even younger infants (11–17 months old). Other studies have investigated rhythm development in multiword phrases elicited from older children but still mostly in the preschool age range (e.g., Grabe et al., 1999; Lleo et al., 2007; Payne et al., 2011). These studies generally suggest that children acquire a stress-timed language rhythm more slowly than they acquire a syllable-timed language rhythm. English is considered a stress-timed language.

The focus of prior studies on very young children's speech might be taken to suggest that rhythm acquisition is complete before the school-age years. However, a handful of studies with older children strongly suggest that this is not the case for children who are acquiring English. More specifically, study findings indicate the protracted development of phonetic detail associated with lexical stress production (Arciuli & Ballard, 2017; Ballard et al., 2012), function word reduction (Redford, 2018), and rhythm-related temporal patterning (Polyanskaya & Ordin, 2015; Sirsa & Redford, 2011). Like most studies with young children, the studies with older children are based on highly controlled speech materials and use cross-sectional study designs to infer patterns of development. For example, Arciuli and Ballard (2017) used a picture naming task to elicit four multisyllabic words with contrastive stress patterns from 77 Australian English–speaking children who were between 8 and 11 years old at the time of study. They found that, as a group, older children did not modulate amplitude in weak–strong sequences in the same way as adults. Redford (2018) elicited simple subject–verb–object sentences from 24 school-age children and 12 adults to study function word reduction. Using measures of relative amplitude and duration, she found that even 8-year-old children did not reduce unstressed function words to the same extent as adults.

The aforementioned differences between school-age children's and adults' stressed syllable production contribute to measured differences in the temporal patterning of child and adult speech (e.g., Sirsa & Redford, 2011). Setting aside questions about the link between temporal patterning and speech rhythm perception (Arvaniti, 2012), interval-based rhythm metrics clearly indicate change across the school-age years. For example, Polyanskaya and Ordin (2015) elicited a set of sentences from 10 British English–speaking adults and 42 British English–speaking children, who ranged in age from 4 to 11 years old. The sentences were segmented into consonantal and vocalic intervals; mean segment duration, mean syllable duration, and rhythm metrics were calculated based on these intervals (e.g., VarcoC, %V; see the study of White & Mattys, 2007). Metrics that captured interval variability (e.g., VarcoV and VarcoC) varied systematically with age and distinguished younger children's speech from older children's speech; mean segment and syllable durations distinguished between every age group, including between older children's speech and adults' speech.

Polyanskaya and Ordin's (2015) findings suggest a relationship between articulation rate and rhythm. The same relationship has been identified in adult language. For example, Arvaniti (2012) found significant and consistent differences within a language when interval-based rhythm metrics were calculated for spontaneous speech versus connected read speech versus sentence reading, presumably because of differences in articulation rate. More relevant to this study, Dellwo (2010) showed that articulation rate is as effective as different interval-based rhythm metrics for distinguishing between language rhythm classes, such as stress-timed versus syllable-timed languages (German vs. French). Moreover, he found that listeners rely on rate more than on variability in segment durations to discriminate timing patterns. We take this latter finding to suggest that articulation rate can be used to index the temporal patterning in speech associated with language rhythm.

The finding that articulation rate differentiates rhythm classes is consistent with the hypothesis that prosody emerges at the intersection of language and speech motor skills. After all, increasing articulation rate indexes better motor skills in children's speech (Lee et al., 1999; Mahr et al., 2021; Redford, 2014), and articulation rate is positively correlated with more mature rhythm-related temporal patterning in children's speech (Polyanskaya & Ordin, 2015). The hypothesis also helps make sense of why rhythm (and rate) are impacted in developmental disorders that are classified as neuromotor speech disorders rather than language disorders (e.g., childhood apraxia of speech and dysarthria). The hypothesis also predicts that other global rhythm measures besides articulation rate will change across the school-age years with the development of language and speech motor skills.

Development of Intonation

Similar to the literature on speech rhythm development, the literature on intonational development is mainly focused on words and phrases elicited from very young children's speech (for relevant reviews, see the studies of Chen et al., 2020; Frota & Butler, 2018). However, unlike the developmental literature on speech rhythm, there is less evidence for the protracted development of intonation patterns. In fact, Frota and Butler argue that intonation develops “independent(ly) from the onset of combinatorial speech (p. 156).” More specifically, they argue that the acquisition of language-specific accentual patterns and boundary tones is largely complete by 2 years of age. Yet, other work suggests that speech motor control over fundamental frequency (f o), the acoustic correlate of intonation, has a protracted developmental trajectory.

Lee et al. (1999) demonstrated that pitch variability, measured as the standard deviation in f o, declined from 5 years of age to around 14 years of age before leveling out; the decline between 5 and 8 years of age was found to be especially dramatic. The results were based on single words elicited from a very large sample of typically developing children. Consistent with a motor interpretation of the results, Lee et al. reported similar age-related declines in the temporal and spectral variability with which the same children produced consonants and vowels in the elicited words. The motor interpretation of decreasing variability in measures of pitch also suggests a particular interpretation of the finding that pitch variability and pitch range are greater in speech produced by children with ASD compared to speech produced by their typically developing peers (Bone et al., 2014; Bonneh et al., 2011; Fosnot & Jun, 1999). In particular, it suggests a pattern of prosodic delay rather than disorder in children with ASD (Diehl et al., 2009). However, the main point is that, even in typical development, the intonational aspect of speech prosody may continue to change into the school-age years due to the protracted development of f o control.

Another reason to suspect that intonation may continue to develop into the school-age years is because some of its linguistic aspects only develop with the acquisition of higher language abilities (Chen et al., 2020; Wells et al., 2004). For example, Wells et al. (2004) used a battery of tasks designed to assess intonation comprehension and production in 120 British English–speaking children between 5 and 13 years of age. They found that children were at ceiling on most production tasks (e.g., contrastive focus marking), but younger school-age children had trouble marking familiarity versus surprise when repeating back a word that was either familiar (e.g., carrot) or unfamiliar (e.g., pargle); that is, the children had trouble with a specific semantic–pragmatic function of intonation. Relatedly, Chen and colleagues have shown the delayed acquisition of certain semantic–pragmatic functions of intonation (e.g., Chen, 2009, 2011; Romøren & Chen, 2014). For example, Chen (2011) found that, whereas the overall intonation pattern of different types of phrases had been acquired by 5 years of age, Dutch-speaking children did not systematically mark topic versus focus correctly until 8 years of age. The protracted development of this aspect of intonation likely depends on cognitive development and the acquisition of higher level language skills (Chen et al., 2020). Given that Dutch is like English in that it is an intonational language, these findings reinforce the expectation that the intonational aspects of speech prosody will continue to change during the school-age years.

This Study

The typical development of prosody during the school-age years has not been studied systematically. There is a particular gap in the description of prosody in children's spontaneous speech. Gaps in the description of typical prosodic development have negative consequences for the assessment and treatment of prosodic disorder in children (Diehl & Paul, 2009; Kalathottukaren et al., 2015). This study seeks to address these gaps. To do this, we calculated global measures of rhythm and intonation from narrative speech samples that were collected once a year for 3 years from English-speaking school-age children who were between 5 and 7 years old at the start of the 3-year period. The description of typical prosody in school-age children's narrative speech is likely to be especially useful for clinical purposes. After all, this type of speech is frequently elicited during assessment to characterize the global patterns of a child's spontaneous speech (Heilmann et al., 2010). In addition, prosody is no different from other aspects of language: Longer speech samples provide a better foundation for its description than shorter speech samples. Importantly, some aspects of prosody, such as rhythmic phrasing, only emerge in longer speech samples. Narrative speech also facilitates the study of prosody production at the intersection of developing speech motor and language skills: Narratives are associated with more complex language than conversational speech (Nippold et al., 2014); a narrative task is also more challenging than other commonly used speech elicitation tasks and so is more likely than these other tasks to tax the speech motor system (see Saletta et al., 2018), rendering speech motor effects on prosody production more evident than they might otherwise be.

Study Predictions

The global rhythm measures calculated from the children's narrative speech samples were articulation rate and perceptually based measures of stress timing and rhythmic phrasing. The global intonation measures calculated were pitch variability and pitch range. The predictions were that the global measures of rhythm would increase over developmental time with age-related development in motor speech and language skills, whereas the global measures of intonation would decrease with the development of these skills. In what follows, we give the motivation for each measure and for its predicted increase or decrease.

Since language rhythm cannot be reduced to temporal patterning, it cannot be reduced to articulation rate. In linguistics, rhythm is understood to emerge from the spacing of prominences and from the timing of prosodic boundaries (Arvaniti, 2012; Ladd, 2008). Prominence is a perceptual construct (Ladd, 2008). The perception of prominence is cued by a combination of syllable lengthening (= syllable-to-syllable differences in acoustic duration), syllable strengthening (= syllable-to-syllable differences in acoustic intensity), and accenting (= phrase-internal f o minima or maxima). Prosodic boundaries are also multiply cued, including by syllable lengthening, boundary tones (e.g., a final fall or rise in f o), and pausing. Accordingly, in this study, prominence spacing and the timing of prosodic boundaries were calculated based on perceptual judgments, according to the Rhythm and Pitch (RaP) labeling system (Dilley & Brown, 2005; Breen et al., 2012).

  • Prominence spacing. This measure captures the alternation in prominences that is typical of a stress-timed language like English. It has long been reported that younger English-speaking children produce speech rhythms that are more equally stressed (or, syllable-timed) than the speech rhythms produced by older children and adults (Allen & Hawkins, 1980; Grabe et al., 1999). This perception is likely tied to the slow development of function word reduction (Allen & Hawkins, 1980; Sirsa & Redford, 2011), which itself is likely tied to speech motor skill development (Redford, 2018). The prediction was therefore that prominence spacing in narrative speech would increase with children's age from a situation where nearly every syllable is perceived as prominent to one where prominent syllables occur (at most) every other syllable.

  • Phrasing. The specific measures of phrasing in this study were phrase length, phrase duration, and variability in phrase length. Like the other global measures of rhythm, the prediction was that these would increase with developmental time. However, unlike with the other measures, this prediction was based on the assumption that developing cognitive and language skills are the most important factor driving changes in the rhythmic phrasing of children's speech. Prosodic phrase boundaries are, after all, usually aligned with strong syntactic boundaries (Shattuck-Hufnagel & Turk, 1996) and so their timing will depend on the length of a clause and/or communication unit. Along these lines, we have previously found a positive relationship between syntactic complexity and the mean length of pause delimited utterances in children's narrative speech (see, e.g., the study of Kallay & Redford, 2021). This relationship predicts a developmental increase in phrase length and phrase duration with developing language skills. In addition, as syntactic complexity increases, more diverse expressions are possible. A greater diversity of expression should result in a greater range of phrase lengths. The prediction is, therefore, that phrase lengths will also become more variable with development because language complexity increases with age.

  • Pitch variability and pitch range. The global intonation measures of pitch variability and pitch range have been used extensively in studies designed to characterize speech differences between children with typical and atypical prosodic development (e.g., Bone et al., 2014; Bonneh et al., 2011; Diehl et al., 2009; Fosnot & Jun, 1999). Testing for developmental changes in these measures is thus particularly well aligned with the goal of providing a clinically useful description of typical prosodic development across the early school years. Pitch variability and pitch range were predicted to decline with development under the assumption that these measures are tied to speech motor skills in extended speech samples. If the measures are instead more closely tied to the semantic–pragmatic function of intonation, they will likely increase with developmental time as children become increasingly sophisticated in their marking of semantics and discourse structure in their narratives.

The core study predictions of a developmental increase in global rhythm measures and a developmental decrease in global intonation measures were tested in narratives elicited from a cross-sectional sample of children across a 3-year period. In other words, the study used a cross-sequential design. This design provides the independent advantages of cross-sectional and longitudinal designs (see Galbraith et al., 2017) while minimizing the drawbacks of both. Stritch (2017, p. 222) notes that “researchers cannot use static data to directly test dynamic theories” because cohort effects may mask developmental ones or because they may be inappropriately interpreted as developmental effects. A longitudinal design is the most appropriate design for describing developmental trajectories. However, longitudinal studies are time consuming. The cross-sequential design allows for the assessment of development across a larger range of ages in a shorter amount of time, with the advantages of increased sample size and a reduction in the dropout rate typically associated with more traditional longitudinal designs (see also Verbeke & Lesaffre, 1999). Moreover, the results are comparable to those obtained from a longitudinal study design (see, e.g., the study of Duncan et al., 1996). Here, the cross-sequential design allows us to describe the development of prosody across most of the primary school years; specifically, we provide data that capture prosody production in narrative speech from 5 to 10 years of age.

Method

Speech Material

The speech samples analyzed in this study were drawn from the Eugene Children's Story Corpus (ECSC; Kallay & Redford, 2021), which is publicly available through the CHILDES database (MacWhinney, 2000). The corpus includes 367 audio recordings and transcriptions of structured spontaneous narratives elicited from a total of 188 typically developing, English-speaking school-age children and 26 adults (i.e., accompanying caregivers). Narratives were elicited using the well-known frog story picture books (Mayer, 1967, 1969, 1973; Mayer & Mayer, 1975). The narrative storytelling sessions began with the experimenter presenting the four books to the child and a caregiver. Each then selected a book to narrate. A small number of the children (N = 4) chose the same book across all three study years. A far larger number (N = 20) chose a different book to narrate in each of the study years. The majority (N = 36) happened to choose the same book in two of the 3 years. The experimenter assisted the child in conceptualizing a coherent narrative around the pictures in their chosen book by asking questions about predetermined events on the pages. The child and caregiver then each told their story twice to the other in the presence of the experimenter, alternating as storyteller. The tellings themselves were free from prompts. The second telling was different from the first, but the telling itself benefited from practice (e.g., pause durations were shorter; Redford, 2013). It is the second telling of each story that has been made publicly available, and so it is also the second telling upon which the current description of prosody is based.

Participants

The ECSC speakers were all recruited from the Eugene–Springfield area in Oregon, United States, via word of mouth. A subset of 71 children (32 boys) was followed longitudinally over a 3-year period. Those children returned to the laboratory at 12-month intervals within 2 weeks of the date of their previous visit. The study was approved by the institutional review board at the University of Oregon, and informed consent was obtained from all speakers prior to participation. This study is based on the 60 youngest children (25 boys, 35 girls) from the subset of children in the longitudinal study who were in kindergarten through second grade in the first study year. The sample of 60 children allowed for an even split of the data into three nonoverlapping starting age cohorts: youngest (Y), in-between (B), and oldest (O). The children in the Y cohort (12 boys, eight girls) ranged in age from 5;2 to 6;3 (years;months) in the first study year (M = 5;8, SD = 4 months); those in the B cohort (six boys, 14 girls) ranged in age from 6;4 to 7;3 in the first study year (M = 6;9, SD = 3 months); and those in the O cohort (seven boys, 13 girls) ranged in age from 7;3 to 8;1 in the first study year (M = 7;7, SD = 3 months). There were two children in the sample who were aged 7;3 in the first study year; one was randomly assigned to the B cohort and one to the O cohort for the analyses.

The majority of the 60 children were identified by caregivers as White only (N = 45). The remaining children were identified as Asian (N = 2), Pacific Islander (N = 2), Hispanic (N = 2), or mixed (N = 9). Typical language development was reported by all caregivers and was confirmed using the Peabody Picture Vocabulary Test–Fourth Edition (PPVT-4; Dunn & Dunn, 2007) and two subtests from the Clinical Evaluation of Language Fundamentals (Semel et al., 2003): Recalling Sentences (RS; language production) and Sentence Structure (SS; language comprehension). All of the children's scores on these tests were no more than 1 SD from the mean standard score (PPVT: M = 118.83, SD = 11.45; RS: M = 11.83, SD = 2.72; SS: M = 11.91, SD = 2.19).

Measurement

Rhythm Measures

The 180 audio recordings from the 60 children in the longitudinal study were suitable for calculating articulation rate and for prosodic transcription. The stories were first segmented into pause-delimited utterances following the criteria laid out in the study of Redford (2013) and orthographically transcribed using Praat software (Boersma & Weenink, 2018). Articulation rate was defined as the number of syllables per second in speech without pauses. This measure was extracted with a Praat script for each narrative based on the pause-delimited transcriptions. Specifically, it was calculated as the total number of syllables produced divided by the total speaking time in seconds, minus pauses.
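The articulation rate calculation described above can be illustrated with a minimal Python sketch. The study itself used a Praat script that is not reproduced here; the `Utterance` structure, its field names, and the example values below are hypothetical and serve only to show the arithmetic (total syllables divided by total speaking time, with pauses excluded because only pause-delimited utterances contribute time).

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One pause-delimited utterance (hypothetical structure, not from the study)."""
    start_s: float      # utterance onset in seconds
    end_s: float        # utterance offset in seconds
    n_syllables: int    # syllables counted from the orthographic transcription


def articulation_rate(utterances):
    """Return syllables per second of speaking time, excluding pauses.

    Pauses are excluded implicitly: only the time inside each
    pause-delimited utterance is summed.
    """
    total_syllables = sum(u.n_syllables for u in utterances)
    speaking_time = sum(u.end_s - u.start_s for u in utterances)
    if speaking_time <= 0:
        raise ValueError("speaking time must be positive")
    return total_syllables / speaking_time


# Illustrative values only: 20 syllables over about 5 s of speech.
example = [
    Utterance(0.0, 2.0, 8),
    Utterance(2.6, 4.1, 5),
    Utterance(5.0, 6.5, 7),
]
rate = articulation_rate(example)
print(f"articulation rate: {rate:.2f} syllables/s")
```

In this example, the silent intervals between utterances (e.g., from 2.0 to 2.6 s) do not enter the denominator, which is what distinguishes articulation rate from overall speaking rate.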

The other rhythm measures were based on perceptual judgments of prominences and boundaries. Judgments were made according to the RaP labeling system (Dilley & Brown, 2005; Breen et al., 2012), which has been validated for judgments of child speech (Dilley et al., 2013). These judgments were made in the context of prosodic coding, which was based on 30-s samples from each narrative. The samples were taken from roughly the middle of each narrative specifically to exclude the stereotyped prosody associated with story beginnings and endings. The samples were also chosen so that they would be representative of the child's fluent speech and free from nonspeech interruptions (e.g., coughs) or from interruptions by an interlocutor (i.e., parent or experimenter). The onsets of the samples were chosen to correspond with a strong phrase boundary, and the offsets were selected to align with the nearest sentence boundary to the 30-s point. The choice to use shorter samples of the full narrative recordings was motivated by the time-consuming nature of hand-coding prosodic features.

A team of trained analysts marked off all syllables in the samples and then used the RaP system (Dilley & Brown, 2005) to label the locations and types of perceptual prominences and prosodic phrase boundaries in the speech samples. The team of analysts was composed of undergraduate and graduate students from the Department of Communicative Sciences and Disorders, Michigan State University (MSU), who received in-lab training in RaP after participating in a seminar on prosody and prosodic transcription. Prominences were marked as strong or weak or absent for each syllable. Phrase boundaries were also marked as strong or weak or absent. In linguistic theory, strong and weak prominences and strong and weak phrase boundaries are often thought to reflect different types of prosodic units (see, e.g., Ladd, 2008). For example, a strong prominence marks cumulative prominence due to phrase- and lexical-level stresses; a strong boundary marks an intonational phrase (IP), and a weak boundary marks an intermediate phrase (ip). In the present analyses, any syllable marked as prominent was treated as a strong syllable; similarly, we made no distinction between weak and strong boundary judgments. In the case of prominences, collapsing judgments into a single “strong” category aligned with our interest in speech rhythm, which is assumed to arise from the alternation of stressed and unstressed syllables at the level of the metrical foot not at the level of the IP. In the case of boundaries, the strong and weak categories were collapsed because there is little agreement on what type of unit the ip might be or on how to define it other than with reference to boundary strength. Additionally, some have suggested that the distinction between strong and weak phrase boundaries is not relevant from a psycholinguistic perspective (Choe & Redford, 2012; Watson & Gibson, 2004).

The kappa (κ) metric was used to assess reliability across all prosodic transcriptions. This metric controls for chance agreement between analysts (McHugh, 2012). There was moderate to substantial interrater agreement across the different prominence types (κ = .60) and boundary types (κ = .62). As an additional check, acoustic measures of duration were taken on a subsample of the data. These measures confirmed a clear acoustic contrast between syllables that were judged prominent versus those that were not prominent, F(1, 625) = 529.5, p < .001; M = 345 ms for prominent syllables versus M = 199 ms for not prominent syllables. This contrast is consistent with the reported correlation between duration and perceived prominence in stress-timed languages like English (Kochanski et al., 2005; Sluijter et al., 1997). Third-party inspection of the acoustics associated with labeled boundary data was also conducted. This inspection indicated that perceived strong boundaries (nearly) always preceded a labeled silent pause. Strong boundaries represented 80% of the total boundaries coded. The additional acoustically based checks on coding were undertaken at the University of Oregon.
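To make the chance-correction idea concrete, the following Python sketch computes Cohen's kappa for two raters who label the same syllables. This is an assumption for illustration: the text does not state which kappa variant was used or how many analysts labeled each sample, and the label sequences below are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each rater's label rates.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Invented prominence labels for ten syllables from two hypothetical analysts.
rater_a = ["strong", "weak", "none", "none", "strong",
           "none", "weak", "none", "strong", "none"]
rater_b = ["strong", "none", "none", "none", "strong",
           "weak", "weak", "none", "weak", "none"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 7 of 10 syllables, but because both use the label "none" half the time, a substantial share of that agreement is expected by chance, and kappa is correspondingly lower than the raw agreement rate.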

Once the samples had been prosodically labeled, the dependent measures of prominence spacing, mean phrase length, mean phrase duration, and phrase length variability were calculated as follows: Prominence spacing was the total number of syllables across the entire speech sample divided by the total number of perceived prominences; mean phrase length was the average number of syllables between IP boundaries; mean phrase duration was the average duration in seconds between boundaries, exclusive of pauses; and variability in phrase length was the standard deviation of the phrase lengths in syllables within the sample.
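The four definitions above can be sketched in code as follows. This is an illustration under stated assumptions rather than the procedure actually used: the data layout (one record per phrase, with a prominence flag per syllable and a pause-free duration) and the function name are hypothetical, and the sample standard deviation is assumed because the text does not say whether the sample or population formula was applied.

```python
from statistics import mean, stdev


def rhythm_measures(phrases):
    """Compute global rhythm measures for one narrative.

    phrases: list of dicts, each with
      'prominent'  - list of bools, one per syllable in the phrase
      'duration_s' - phrase duration in seconds, exclusive of pauses
    """
    lengths = [len(p["prominent"]) for p in phrases]
    total_syllables = sum(lengths)
    total_prominences = sum(sum(p["prominent"]) for p in phrases)
    return {
        # Syllables per perceived prominence across the whole sample.
        "prominence_spacing": total_syllables / total_prominences,
        # Average number of syllables between boundaries.
        "mean_phrase_length": mean(lengths),
        # Average duration between boundaries, pauses excluded.
        "mean_phrase_duration": mean(p["duration_s"] for p in phrases),
        # Standard deviation of phrase lengths (sample SD assumed).
        "phrase_length_variability": stdev(lengths),
    }


# Hypothetical three-phrase narrative (illustration only).
narrative = [
    {"prominent": [True, False, True, False], "duration_s": 1.2},
    {"prominent": [True, False], "duration_s": 0.6},
    {"prominent": [True, False, True, False, True, False], "duration_s": 1.5},
]
measures = rhythm_measures(narrative)
print(measures)
```

Note that prominence spacing is computed over the whole sample, whereas the phrase measures are averaged over phrases.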

Intonation Measures

Due to background noise in some recordings or tangential discussions between experimenter and child or between caregiver and child, a reduced sample of materials was used for calculating mean f o variation and range from across entire narratives (see below). In order to maintain the benefits of a cross-sequential design, we included only those children with recordings that were of consistently high quality across all three study years. These strict inclusion criteria resulted in a sample of 108 narratives from 36 children (16 boys, 20 girls) who ranged in age from 5;2 to 7;9 in the first study year. As in the larger sample, the majority of these children were also identified by caregivers as being White only (N = 25). The children were evenly split into cohorts of 12, with children in the Y cohort (six boys, six girls) ranging in age from 5;2 to 6;2 in the first study year (M = 5;8, SD = 4 months), children in the B cohort (four boys, eight girls) ranging in age from 6;2 to 6;11 in the first study year (M = 6;7, SD = 3 months), and children in the O cohort (six boys, six girls) ranging in age from 7;1 to 7;9 in the first study year (M = 7;5, SD = 2 months). Two of the children were aged 6;2 in the first study year; one was randomly assigned to the Y cohort and the other to the B cohort.

f o values were extracted automatically over 10-ms intervals across the sonorant-only portions of the utterances in each narrative to avoid spurious f o values associated with changes in voicing and with frication. Sonorant intervals were identified within the utterances using the vuv (voiced/unvoiced) textgrid function in Praat (Boersma & Weenink, 2018). To further avoid spurious values in the automatic extraction of f o, outliers were removed according to the 1.5 IQR rule prior to calculating each of the intonation measures. Once this was done, intonational variability was calculated in two ways: as the standard deviation of f o within narratives and as the coefficient of variation of f o within narratives. The overall standard deviation of f o has been widely used in previous studies to investigate differences in intonation variability between typical and atypical populations (e.g., Diehl et al., 2009; Diehl & Paul, 2012). The coefficient of variation (= standard deviation of f o divided by mean f o) was calculated to control for a developmental- or cohort-based effect of mean f o on the standard deviation in f o. The f o range was also calculated for each narrative. This measure was simply the minimum f o value in the narrative subtracted from the maximum f o value in the same narrative.
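The intonation measures described above can be illustrated with a short sketch. It assumes that the f o samples for a narrative are already available as a list of values in Hz (the Praat extraction step is not reproduced), that quartiles are computed with the default method of Python's statistics.quantiles, and that the sample standard deviation is used; none of these details is specified in the text. The function name and the example values are hypothetical.

```python
from statistics import mean, quantiles, stdev


def intonation_measures(f0_hz):
    """SD, CV, and range of f0 for one narrative after 1.5 IQR outlier removal."""
    q1, _, q3 = quantiles(f0_hz, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Discard values outside the 1.5 IQR fences before computing any measure.
    kept = [v for v in f0_hz if lower <= v <= upper]
    sd = stdev(kept)
    return {
        "sd_hz": sd,
        "cv": sd / mean(kept),               # SD divided by mean f0
        "range_hz": max(kept) - min(kept),   # maximum minus minimum f0
        "n_removed": len(f0_hz) - len(kept),
    }


# Hypothetical f0 samples; 900 Hz stands in for a pitch-tracking error.
samples = [200, 210, 220, 230, 240, 250, 260, 270, 280, 900]
result = intonation_measures(samples)
print(result)
```

Because the range depends on only two values, it is the measure most sensitive to any mistracked samples that survive the outlier filter.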

Statistical Analyses

Cross-Sequential Analyses

Linear mixed-effects (LME) models were constructed for each measure of rhythm and intonation discussed previously to test for cross-sectional and longitudinal effects that would indicate developmental change. The cross-sectional effect was age cohort, a between-subjects factor with three levels (Y, B, and O). The longitudinal effect was study year, a within-subject factor with three levels (YR1, YR2, and YR3). Models were built using the lme4 package in R software (Bates et al., 2015; R Core Team, 2021). Full models included both age cohort and study year as fixed main effects and speaker as a random effect. For each fixed effect, null models were constructed by removing the effect of interest from the full model. The full model was then compared to the null model with analysis of variance to assess significance of the removed fixed effects (see Brown, 2021; Winter, 2013). The χ2-statistic and p value are reported for all significant differences between the models. Pairwise comparisons were also performed within age cohort and study year using the same LME models with Bonferroni corrections.
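The model-comparison logic can be illustrated without refitting the models. The sketch below converts a likelihood-ratio χ2 statistic into a p value and applies a Bonferroni correction. It is not the authors' R code. The degrees of freedom are assumptions (2 when a three-level factor is dropped from the full model, 1 for a pairwise comparison between two levels), because the text does not report them, and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points for the regularized upper incomplete gamma Q(a, y).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_comparisons)


# Reported statistics for articulation rate, with ASSUMED degrees of freedom.
p_cohort = chi2_sf(14.51, df=2)      # main effect of age cohort
p_y2_vs_y3 = chi2_sf(9.18, df=1)     # Year 2 versus Year 3
p_y2_vs_y3_adj = bonferroni(p_y2_vs_y3, n_comparisons=3)
print(p_cohort < 0.001, p_y2_vs_y3_adj < 0.01)
```

Under these assumed degrees of freedom, the computed values fall within the reported bounds (p < .001 and p < .01, respectively). Whether the reported pairwise p values are the corrected or the uncorrected values is not stated in the text.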

Additional Analyses

In addition to the cross-sectional analyses, simple linear regression models were built to assess the continuous effect of age-in-months on the rhythm and intonation measures. Separate models were constructed for each measure within each study year; age-in-months was the lone predictor variable. The t-statistic and p value are reported for all significant effects. LME models were not used because of the strong relationship between age-in-months and speaker in this study, H(59) = 179, p < .001; that is, the fixed effect of age-in-months and the random effect of speaker were virtually indistinguishable.
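As an illustration of the within-year regressions, the sketch below fits a simple linear regression with a single predictor and returns the t-statistic for the slope. The function name and the five data points are hypothetical. With 60 children in a study year, the residual degrees of freedom are n − 2 = 58, which matches the t(58) values reported in the Results.

```python
import math


def slope_t_test(x, y):
    """Ordinary least squares with one predictor; returns slope, intercept, t, df."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and the standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_slope = math.sqrt(sse / df / sxx)
    return slope, intercept, slope / se_slope, df


# Hypothetical ages (months) and articulation rates (syll/s); illustration only.
age_months = [62, 70, 78, 86, 94]
rate = [3.0, 3.2, 3.1, 3.5, 3.6]
slope, intercept, t_stat, df = slope_t_test(age_months, rate)
print(slope, t_stat, df)
```

The slope is the estimated change in the measure per additional month of age; its t-statistic is the slope divided by its standard error.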

Results

The mean and standard deviation values for each prosodic measure are presented by age cohort and study year in Table 1. The patterns evident in the summary data are consistent with the prediction that global rhythm measures would increase with development. The patterns do not clearly support the predicted developmental decrease in global intonation measures. The detailed cross-sequential analyses confirm this general conclusion: articulation rate, prominence spacing, mean phrase length, and variability in phrase length all increase with age; pitch variability and pitch range do not. The detailed results are presented next.

Table 1.

Mean (SD) values for each of the global rhythm and intonation measures by age cohort (the cross-sectional factor) and study year (the longitudinal factor).

| Rhythm | Y, Year 1 | Y, Year 2 | Y, Year 3 | B, Year 1 | B, Year 2 | B, Year 3 | O, Year 1 | O, Year 2 | O, Year 3 |
|---|---|---|---|---|---|---|---|---|---|
| Articulation rate (syll/s) | 3.04 (0.45) | 3.37 (0.43) | 3.41 (0.41) | 3.30 (0.40) | 3.82 (0.45) | 3.96 (0.57) | 3.37 (0.47) | 3.37 (0.52) | 3.97 (0.58) |
| Prominence spacing (syll) | 1.72 (0.23) | 1.83 (0.19) | 1.83 (0.24) | 1.70 (0.16) | 1.87 (0.22) | 1.89 (0.17) | 1.81 (0.23) | 1.80 (0.22) | 1.85 (0.21) |
| Mean phrase length (syll) | 3.29 (0.92) | 3.79 (1.08) | 3.66 (0.89) | 3.37 (0.78) | 4.00 (0.97) | 4.35 (1.30) | 3.89 (1.14) | 4.09 (1.74) | 4.31 (1.51) |
| Mean phrase duration (s) | 1.08 (0.23) | 1.13 (0.32) | 1.07 (0.24) | 1.02 (0.22) | 1.06 (0.26) | 1.09 (0.29) | 1.15 (0.26) | 1.21 (0.45) | 1.08 (0.28) |
| Phrase length variability (syll) | 2.23 (0.90) | 2.30 (0.77) | 2.29 (0.65) | 2.40 (0.73) | 2.62 (0.63) | 2.76 (0.96) | 2.56 (0.69) | 2.76 (0.93) | 3.01 (1.25) |

| Intonation | Y, Year 1 | Y, Year 2 | Y, Year 3 | B, Year 1 | B, Year 2 | B, Year 3 | O, Year 1 | O, Year 2 | O, Year 3 |
|---|---|---|---|---|---|---|---|---|---|
| SD of f o (Hz) | 53.03 (28.26) | 58.22 (30.46) | 52.13 (24.37) | 54.20 (21.43) | 59.93 (16.19) | 59.47 (18.07) | 48.16 (22.91) | 60.47 (20.98) | 44.16 (14.65) |
| CV of f o | 0.18 (0.08) | 0.20 (0.08) | 0.18 (0.07) | 0.21 (0.08) | 0.23 (0.54) | 0.22 (0.06) | 0.18 (0.06) | 0.23 (0.05) | 0.18 (0.05) |
| Range of f o (Hz) | 500 (88.33) | 571.26 (21.10) | 538.36 (79.97) | 503.96 (107.12) | 568.88 (27.74) | 540.38 (78.90) | 482.32 (135.58) | 561.47 (53.86) | 463.58 (153.13) |

Note. Measurement units indicated in parentheses: syll = syllables; s = seconds; SD = standard deviation; CV = coefficient of variation; Hz = Hertz, f o = fundamental frequency. Y = youngest; B = in-between; O = oldest.

Rhythm

Articulation Rate

Results of the full model indicated significant main effects of age cohort, χ2 = 14.51, p < .001, and study year, χ2 = 45.32, p < .001, on mean articulation rate. When the data were split by study year, pairwise comparisons indicated significant differences between each year across age-based cohorts of children: Year 1 versus Year 2, χ2 = 15.71, p < .001; Year 1 versus Year 3, χ2 = 41.41, p < .001; Year 2 versus Year 3, χ2 = 9.18, p < .01, reflecting systematic increases in articulation rates across developmental time. Pairwise comparisons indicated that the longitudinal effect was significant within each of the three age cohorts (Y cohort: χ2 = 9.78, p < .01; B cohort: χ2 = 21.36, p < .001; and O cohort: χ2 = 15.33, p < .001). Figure 1 shows these results.

Figure 1.

Longitudinal effect of study year on articulation rate is shown by the cross-sectional effect of age cohort (Y = youngest, B = in-between, and O = oldest).

Cohort effects indicate that the different starting ages of children had an effect on articulation rate, an effect that was independent of (though presumably compounded by) longitudinal development. Recall that children in the Y cohort were 5 years old in the first year of study, those in the B cohort were 6 years old, and those in the O cohort were 7 years old. The effect of age cohort indicates that this cross-sectional age group difference mattered. Over the three study years, the Y cohort produced narratives with a slower articulation rate than the B cohort, χ2 = 14.74, p < .001; their articulation rate in narrative speech was also slower overall than that produced by the O cohort, χ2 = 7.15, p < .01. Related to this cross-sectional effect on articulation rate, there was a significant linear increase in articulation rate by children's age within a year. The continuous between-subjects effect of age-in-months was significant within Year 1, t(58) = 2.40, p = .02, and Year 3, t(58) = 3.51, p < .001, but not within Year 2 of the study. These results are shown in Figure 2, where the most noticeable increase in articulation rate occurred in children who were between the ages of 86 and 121 months in Year 3 of the study.

Figure 2.

The continuous between-subjects effect of age-in-months on articulation rate is shown by study year.

Prominence Spacing

Results of the mixed-effects model indicated a significant main effect of study year on prominence spacing, χ2 = 13.81, p < .01, but there was no overall effect of age cohort. The longitudinal effect was due to a significant difference between Year 1 and Year 2 values, χ2 = 7.89, p = .01, and between Year 1 and Year 3 values, χ2 = 11.59, p < .001. The difference between Year 2 and Year 3 values was not significant. When the data were split by age cohort, analyses indicated that the effect of study year only reached significance in the B cohort, χ2 = 16.00, p < .001. Still, the same longitudinal trend is evident in the narratives produced by the Y cohort. Figure 3 shows the longitudinal increase in prominence spacing by the cross-sectional age groups.

Figure 3.

The longitudinal effect of study year on prominence spacing is shown by the cross-sectional effect of age cohort (Y = youngest, B = in-between, and O = oldest).

Consistent with the absence of a cross-sectional effect of age group on the development of prominence spacing, the continuous between-subjects effect of age-in-months was not significant either. Figure 4 nonetheless shows the hint of a cross-sectional pattern in Year 1 when children ranged in age from 5 years old (62 months) to 8 years old (97 months).

Figure 4.

The continuous between-subjects effect of age-in-months on prominence spacing is shown by study year.

Phrasing

Results of the mixed-effects model indicated a significant overall main effect of study year on phrase length, χ2 = 13.10, p < .001. Again, there was no effect of age cohort. When the analyses were split by the cross-sectional factor, the longitudinal effect only reached significance for the B cohort, χ2 = 12.58, p < .001. Though not significant, mean phrase length also increased by study year within the Y and O cohorts (see Figure 5). As with prominence spacing, individual variability was such that there were no significant linear effects of age-in-months on mean phrase length within study year (see Figure 6).

Figure 5.

The longitudinal effect of study year on phrase length is shown by the cross-sectional effect of age cohort (Y = youngest, B = in-between, and O = oldest).

Figure 6.

The continuous between-subjects effect of age-in-months on mean phrase length is shown by study year.

Unlike for phrase length, there were no main effects of children's age on phrase duration. Phrases were roughly 1 s in duration (M = 1.1 s, SD = 290 ms) across longitudinal time and cross-sectional age groups. The minimum average phrase duration was 617 ms; the maximum duration was 2.69 s.

In contrast to the results on phrase length and phrase duration, the mixed-effects model on variability in phrase length indicated a significant main effect of age cohort, χ2 = 8.15, p = .02, but no overall effect of study year. Pairwise comparisons between the different cross-sectional age groups indicated that phrase length variability was higher in the B cohort than in the Y cohort, F(1, 116) = 5.02, p = .03; it was also higher in the O cohort than in the Y cohort, F(1, 116) = 9.39, p < .01. The difference between the B and O cohorts was not statistically significant. The results are shown in Figure 7.

Figure 7.

The longitudinal effect of study year on variability in phrase length is shown by age cohort (Y = youngest, B = in-between, and O = oldest).

The cross-sectional effect of age was confirmed in the analysis by age-in-months. There were significant linear increases in variability with age in Year 1, t(58) = 2.11, p = .039, and Year 3, t(58) = 2.02, p = .048. The effect did not reach significance in Year 2. Figure 8 shows phrase length variability by the continuous between-subjects age variable for each study year.

Figure 8.

The continuous between-subjects effect of age-in-months on variability in phrase length is shown by study year.

Intonation

Pitch Variability

Results of the mixed-effects model indicated no main effects of age cohort or study year on pitch variability, as measured by the standard deviation of f o values for each narrative. Figures 9 and 10 show the distribution of the data by the fixed factors. Figure 10 shows a large amount of individual variability in pitch variability across the 60 narratives within each study year. As is evident from that figure, there was no significant effect of age-in-months on pitch variability either.

Figure 9.

The longitudinal effect of study year on pitch variability, measured as the standard deviation in f o values, is shown by the cross-sectional effect of age cohort (Y = youngest, B = in-between, and O = oldest).

Figure 10.

The continuous between-subjects effect of age-in-months on pitch variability, measured as the standard deviation of f o, is shown by study year.

Similar results were obtained when the measure of pitch variability was controlled for mean f o (= CV of f o), despite a significant cross-sectional effect of age cohort on this measure, χ2 = 6.37, p = .04; the effect of study year was not significant. The significant cross-sectional effect was due to higher pitch variability in narratives elicited from children in the B cohort compared to those elicited from children in either the Y or O cohorts. Given that children in the B cohort were older than those in the Y cohort and younger than those in the O cohort, the difference is not what one would expect if the results were age related. The analysis by age-in-months confirms the absence of a developmental effect. The distribution of the CV of f o measure is shown by the fixed factors in Figure 11 and by age-in-months in Figure 12.

Figure 11.

The longitudinal effect of study year on pitch variability, measured as the coefficient of variation in f o values, is shown by the cross-sectional effect of age cohort (Y = youngest, B = in-between, and O = oldest). CV = coefficient of variation.

Figure 12.

The continuous between-subjects effect of age-in-months on pitch variability, measured as the coefficient of variation of f o, is shown by study year. CV = coefficient of variation.

Pitch Range

There were also no significant main effects of age cohort or study year on pitch range. The (null) results are shown in Figure 13.

Figure 13.

The longitudinal effect of study year on pitch range, measured as the average difference between max and min f o, is shown by the cross-sectional effect of age cohort (Y = youngest, B = in-between, and O = oldest).

As expected from the cross-sequential analyses, the effect of age-in-months on pitch range was not significant. However, unlike with the measures of pitch variability, the null effect seems to stem from a general lack of individual variability rather than from excessive variability (see Figure 14).

Figure 14.

The continuous between-subjects effect of age-in-months on pitch range, measured as the difference between max and min f o, is shown by study year.

Discussion

The aim of this study was to provide a clinically useful characterization of the typical age-related changes in speech prosody that occur across the early school years. Our approach was to calculate global measures of rhythm and intonation based on narrative speech samples elicited from a cross-sectional sample of children, who were followed longitudinally for 3 years. The focus on narrative speech allowed us to capture aspects of prosody that have been previously studied, such as the development of temporal patterning, as well as those aspects that are poorly studied, such as the development of rhythmic phrasing. In addition, the narrative speech task was expected to elicit language that was sufficiently complex to make manifest developmental patterns that emerge with the maturation of speech motor skills and the development of higher level language skills. The study predictions were that (a) measures of temporal patterning and prominence spacing (i.e., “stress-timing”) would increase with age due to the development of speech motor skills, (b) measures of pitch variability and pitch range (i.e., global intonation) would decrease with age for the same reason, and (c) measures of rhythmic phrasing would increase with age due to the development of higher level language skills. The cross-sequential analyses confirmed the predictions that temporal patterning and prominence spacing increase with age and that prosodic phrases become longer and more variable. In contrast to the developmental changes in global rhythm, pitch variability and pitch range showed no systematic effects of development. In the following sections, we discuss these results in the context of developing speech motor and language skills—beginning with the absence of developmental effects on the global measures of intonation.

The Meaning of Pitch Variability and Pitch Range

The global measures of intonation used in this study were pitch variability and pitch range. These measures were chosen for their clinical value: They are easy to calculate, especially when compared to a phonological analysis of intonation, and they have been used extensively in previous research on prosody production in clinical populations, not only in studies of children with ASD (e.g., Bone et al., 2014; Bonneh et al., 2011; Fosnot & Jun, 1999) but also in studies of children with stuttering (Fosnot & Jun, 1999), in children who have received cochlear implants (Nakata et al., 2012), and in adults with schizotypal personality disorder (Dickey et al., 2012). The prediction was that pitch variability and pitch range would decrease with developmental time. This prediction was based on a finding from a large-scale acoustic study on typical development of articulatory timing skills, which showed that pitch variability decreases dramatically from 5 to 8 years of age (Lee et al., 1999). It was also based on a particular interpretation of the well-established finding that higher-than-average pitch variability and range index atypical prosody in children with ASD; namely, that this finding is due to prosodic delay rather than disorder (see, e.g., the study of Diehl et al., 2009, p. 398).

Contrary to the prediction, we found no systematic changes in pitch variability or pitch range across study year, age cohort, or by age-in-months within study year. Instead, we found extensive individual differences in pitch variability (see, e.g., Figure 10), with measures ranging from a low of 11-Hz differences across the sonorant portions of the narrative to a high of 66 Hz. We interpret these findings to indicate that the f o-based measures of pitch variability and pitch range are insufficiently sensitive to age-related changes in intonation during the school years. This interpretation is motivated by our firm expectation that all aspects of speech prosody continue to develop through the school-age years with the maturation of speech motor skills and the development of language. This expectation is motivated by the findings of Lee et al. (1999) that indicate the slow maturation of motor skills relevant for f o manipulation and by other studies that document immature intonational patterns for some aspects of discourse prosody in young school-age children's speech (e.g., Chen, 2009, 2011; Romøren & Chen, 2014). The latter study findings are also consistent with the observation that discourse skills themselves continue to develop into adolescence (Nippold, 2016). Relatedly, Zampini et al. (2016) found that while children with Down syndrome produce largely typical phrase-level intonation patterns, they struggle specifically with production of interrogative contours relative to their typically developing peers. Children with Down syndrome also produce shorter and less syntactically complex phrases than their typically developing peers (ibid.), which again suggests an influence of language development on intonation production.

Since the specific data presented herein are meant to provide developmental norms against which other data can be compared, it is worth pointing out that pitch variability values in this study (i.e., SD of f o) were substantially lower on average than pitch variability values reported in the study of Diehl et al. (2009) on narrative speech and substantially higher on average than the values reported for typically developing 5-year-olds in the study of Lee et al. (1999). Our mean SD f o value was 37 Hz; that in the study of Diehl et al. was approximately 48 Hz in narratives elicited from children aged 6–14 years (42 Hz for TD boys and 53 Hz for TD girls; see Tables 2–3, pp. 392–393); Lee et al. report a mean of 23 Hz for 5-year-old speech (see Figure 3, p. 1460). Although it is possible that the participants in the study of Diehl et al. were more animated in their storytelling than our participants, we suspect that the difference in values reflects a difference in the way in which the f o measurements were made. Diehl et al. extracted average f o across 250-ms intervals of their narratives, yielding four measures/s. Pitch variability was the standard deviation of these measures. Since the methods say nothing about pitch tracking correction or a focus on sonorant-only intervals, it is possible that their larger values reflect the inclusion of spurious f o values due to voicing changes at obstruent–sonorant edges or simple mistracking. As spurious values are likely to be randomly distributed across groups, their inclusion does not impact their study results. Still, we are suggesting that the values we report here may more accurately reflect the perceptual experience of pitch variability. Either way, this study's results can be used as yet another baseline against which to compare the higher pitch variability and pitch range values associated with ASD and prosodic disorder in the literature.

Turning now to the difference between the values reported here and those reported in the study of Lee et al. (1999), we expect that our larger SD f o values relative to that study are meaningfully related to task differences. Lee et al. extracted their measurements from words uttered in isolation; we extracted our measurements from across an entire narrative. It is intuitively plausible that pitch variability will be higher in narratives compared to isolated words because speakers deploy the full prosodic system during narrative language. By contrast, isolated words that are monosyllabic only provide information on the extent of f o change during the pitch accenting that might accompany an in-focus production of said monosyllabic word. It is also likely that task differences and sample sizes account for why we find no developmental change in pitch variability, whereas Lee et al. report a significant change. Note that the change from 5-year-old SD f o values to 8-year-old SD f o values in the study of Lee et al. is about 8 Hz. By contrast, our range in SD f o values across ages was on the order of 55 Hz. It is easy to see how this level of individual difference in pitch variability associated with the full prosodic spectrum of storytelling might swamp any evidence of a developmental decrease in pitch variability due to the development of motor control over pitch and to age-related changes in vocal fold physiology. Future normative work on pitch variability in the context of language production (as opposed to single word production) will need to be based on much larger sample sizes if the goal is to detect differences due to changes in motor control and vocal tract physiology.

Age-Related Change in Global Rhythm

Turning now to the results on rhythm, our findings indicate gradual age-related changes in articulation rate, prominence spacing, and phrase length across study years and a gradual age-related change in phrase length variability. These results stand in contrast to the more dramatic nonlinear changes that are frequently reported in the developmental speech-language literature (e.g., articulatory timing: Lee et al., 1999; phonological development: Burchinal & Appelbaum, 1991; vocabulary growth: Ganger & Brent, 2004). We suspect that the more gradual changes in speech rhythm across the early school-age years compared to other types of speech-language change reflect the many different influences on speech rhythm under the assumption that these influences hold to different developmental schedules that vary across children.

Rhythm emerges at the intersection of speech motor and linguistic systems, which is to say that the effects of both are evident in its structure (see “Background” section to this study). Despite the known interaction between the speech motor and linguistic systems, it is reasonable to treat them as quasi-independent systems, with the former intimately tied to perceptual motor processes and the latter to domain-specific and domain-general cognitive processes. Similarly, subsystems within the linguistic system may also be quasi-independent: Some are well described as domain-specific; others clearly depend on domain-general processes. Consider the case of syntax versus pragmatics: Both are bound to semantics, but syntax is much more domain specific than pragmatics. In particular, syntax is the learned, language-specific sequencing of meaningful elements; the sequencing patterns may have their origins in lexical semantics, but once fixed by historical forces, the particular language-specific sequencing patterns are arbitrary from a synchronic perspective. In contrast, despite language-specific (i.e., formal) elements to pragmatics, it is clear from both typical and atypical development that pragmatic learning relies very heavily on the ability to track conceptual relations at a high level and on social cognition. Traces of all of these quasi-independent systems are seen in the production of speech rhythm. We speculate that differences in the rate of development across these systems account for the extensive individual differences, gradual developmental change, and conflicting results obtained in the analyses of global rhythm measures.

Consider, for example, the results on phrasing. At first glance, the systematic longitudinal increases that were observed for phrase length (measured in syllables) would seem to be at odds with the finding that phrase duration (measured in milliseconds) remains stable across developmental time, especially when one considers that phrase length and phrase duration were strongly correlated in the present data, r(179) = .853, p < .001. However, in fact, the conflicting results are consistent with the combined influence of language and speech factors on the emergence of rhythm.

In particular, developmental changes in prosodic phrase length must be at least partially due to the development of more complex language. For instance, a richer vocabulary and the ability to deploy this vocabulary to render more complex thoughts will likely result in longer syntactic phrases on average (e.g., the more frequent use of adjectives and adverbs to modify a noun or verb). Given the relationship between strong prosodic boundaries and strong syntactic boundaries (Shattuck-Hufnagel & Turk, 1996), an increase in clause-internal syntactic phrase lengths will result in longer prosodic phrases. Support for this hypothesis comes from the correlation between mean clause length and pause-delimited utterance length in narrative speech (e.g., Kallay & Redford, 2021).

Of course, the length of a prosodic phrase in syllables should correlate perfectly with its length in duration. However, this is only true if articulation rate is held constant. If instead articulation rate increases with age, then the relationship between phrase length and phrase duration will be weaker. In fact, it is an increase in articulation rate that accounts for the divergent results on phrase length and phrase duration in this study: Phrase length and articulation rate explain 96% of the variance in phrase duration, F(2, 177) = 2321, p < .001, R 2 = .963. The combined effect of phrase length and articulation rate on phrase duration is also consistent with the positive relationship between length and rate that has been reported elsewhere (Mahr et al., 2021). In so far as phrase length indexes language development and mean articulation rate indexes speech motor skill in children's speech, the absence of an effect of development on phrase duration despite the presence of an effect of development on phrase length exemplifies the combined influence of motor and language factors on the emergence of speech rhythm. Again, we also assume that these factors may develop at different rates and that there is extensive individual difference in the relative rate of development of these factors.
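The arithmetic behind this argument can be checked against the cell means in Table 1. If phrase duration is approximately phrase length divided by articulation rate, then dividing each cohort-by-year mean phrase length by the corresponding mean articulation rate should come close to the tabled mean phrase duration. The sketch below performs that check. It is only a rough illustration: a ratio of means is not the same as a mean of per-narrative ratios, and the regression reported above was fit to individual narratives, not to these nine cell means. The variable names are hypothetical.

```python
# Cell means copied from Table 1, ordered Y (Years 1-3), B (Years 1-3), O (Years 1-3).
mean_phrase_length_syll = [3.29, 3.79, 3.66, 3.37, 4.00, 4.35, 3.89, 4.09, 4.31]
articulation_rate_syll_per_s = [3.04, 3.37, 3.41, 3.30, 3.82, 3.96, 3.37, 3.37, 3.97]
mean_phrase_duration_s = [1.08, 1.13, 1.07, 1.02, 1.06, 1.09, 1.15, 1.21, 1.08]

# Predicted duration (s) = length (syll) / rate (syll/s).
predicted_duration_s = [
    length / rate
    for length, rate in zip(mean_phrase_length_syll, articulation_rate_syll_per_s)
]

# Absolute difference between predicted and tabled durations for each cell.
abs_errors = [
    abs(pred - obs)
    for pred, obs in zip(predicted_duration_s, mean_phrase_duration_s)
]
max_error_s = max(abs_errors)
print([round(p, 2) for p in predicted_duration_s], round(max_error_s, 3))
```

In every cell the predicted duration is within 0.02 s of the tabled value, which illustrates how longer phrases produced at a faster rate can leave phrase duration essentially unchanged.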

The interaction between language and speech motor factors on rhythm is noted elsewhere in the literature with reference to the acquisition of different languages. For example, Payne et al. (2011) argued that the delayed acquisition of a stress-timed language, like English, relative to syllable-timed languages, like Spanish and Catalan, has to do with the different phonotactics of the languages. Following Dauer (1983), they adopted the view that a stress-timed rhythm emerges in languages that allow both complex syllable structures and unstressed syllable reduction and that a syllable-timed rhythm emerges in languages that have simpler syllable structures and full vowels. In so far as children with immature speech motor skills are more likely to accurately produce simple syllables than complex syllables, they are also more likely to produce adultlike rhythm earlier in syllable-timed languages, like Spanish, than in stress-timed languages, like English. Relatedly, typically developing children learning Italian, a syllable-timed language, are mostly adultlike in their productions of contrastive stress as early as 3 years of age (Arciuli & Colombo, 2016); similar measures applied to children learning English suggest a much more protracted developmental course (Arciuli & Ballard, 2017). The point is that rhythm emerges in interaction with the phonological patterning of a language. This observation suggests yet another factor to consider when evaluating the longitudinal rhythm structure of a child's language, namely, the interaction with phonological development.

Longitudinal Versus Cross-Sectional Age Effects

Although longitudinal studies have a privileged status in the developmental literature, it is not entirely clear that all longitudinal data should be treated as developmental data. Children may simply get better at certain assessment or experimental tasks with repeated exposure to the task itself. This type of learning is interesting and important, but it does not necessarily provide the developmental insights we seek when studying age-related speech and language changes. Relatedly, it is common knowledge that cross-sectional effects can emerge from uncontrolled (i.e., nondevelopmental) differences in population samples. These differences are known as cohort effects. Of course, the knowledge that cohort effects are always possible in a cross-sectional study rarely impacts the interpretation of cross-sectional age effects. Instead, cross-sectional effects are nearly always interpreted as developmental effects when an interest in development motivates the study. The validity of a developmental interpretation of cross-sectional age effects on speech-language variables is strengthened with replication and with longitudinal data. An important benefit of the cross-sequential design used in this study is that it allows for both of these. Age cohort effects were subject to quasireplication by examining the within-year effect of age-in-months on the variables of interest. A developmental interpretation of one or both of these cross-sectional age effects was further strengthened by the longitudinal aspect of the study. The strongest evidence for development is provided when the cross-sectional (between-subjects) effects and the longitudinal (within-subject) effects align within the data set. With this in mind, we make some additional observations about these study findings.

This study's results provide very strong evidence for developmental changes in the articulation rate of narrative speech across the early school-age years, with the largest increases occurring between the ages of 7 and 9 years (cf. Mahr et al. [2021] who report faster rates of change between the ages of 2 and 5 years and more gradual change thereafter until 9 years of age). The results provide weaker evidence for significant developmental change in prominence spacing. In particular, the longitudinal effect was significant overall and within the in-between age cohort (children who were 6 years old on average at the start of the study), but the between-subjects effects were not significant despite some evidence of age-related differences in the Year 1 data (i.e., between 62 and 97 months in age). A similar pattern of results was observed for phrase length. The longitudinal effect was significant, but the cross-sectional age effect was not and individual variability within each study year was substantial enough to obscure an effect of age-in-months in the Year 1 study data, despite some suggestion in the data of an age-related increase in phrase length. The combined results were stronger for the measure of variability in phrase lengths. Although the overall analyses indicated only an effect of age cohort, the within-age cohort analyses indicated longitudinal increases in variability in the older cohorts. The effect was strongest in children who were 7 years old on average at the start of the 3-year study. In addition, there were linear increases in the variability of phrase lengths within a narrative across age-in-months within the Year 1 and Year 3 data; that is, variability in the timing of prosodic boundaries increased between the ages of 5;2 and 8;1 and then again between the ages of 7;2 and 10;1.
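
The age ranges quoted above follow from the age-in-months range reported for the Year 1 data (62 to 97 months). The following is a minimal sketch of that conversion, not part of the study's analysis; the function name is hypothetical, and the 24-month offset assumes that Year 3 sessions fell exactly two years after Year 1 sessions.

```python
def months_to_age(months: int) -> str:
    """Format an age in months in years;months notation (e.g., 62 -> '5;2')."""
    years, remainder = divmod(months, 12)
    return f"{years};{remainder}"


# Age-in-months range reported for the Year 1 data.
YEAR1_RANGE = (62, 97)

# Assumption for illustration: Year 3 sessions occur exactly 24 months later.
MONTHS_FROM_YEAR1_TO_YEAR3 = 24

year3_range = tuple(m + MONTHS_FROM_YEAR1_TO_YEAR3 for m in YEAR1_RANGE)

print([months_to_age(m) for m in YEAR1_RANGE])  # ['5;2', '8;1']
print([months_to_age(m) for m in year3_range])  # ['7;2', '10;1']
```

Under that assumption, the Year 1 and Year 3 samples overlap between 7;2 and 8;1, which is what allows the within-year age effects to serve as a quasireplication of one another.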

In the context of the preceding discussion regarding the multifactorial nature of speech rhythm, we take the stronger developmental effects on articulation rate and the internally replicated effect of age-related change on variability in phrase length to suggest that these measures may index a narrower range of specific influences on rhythm production than the other measures. For example, the across-the-board effects on articulation rate are consistent with previous findings, many of which argue that this measure provides an index of speech motor skill (Lee et al., 1999; Redford, 2014). Accordingly, we interpret this finding to indicate that articulation rate represents a reasonably pure index of the influence of speech motor development on rhythm production, which is to say that it does not index the influence of language development on rhythm production to the same extent. By contrast, we interpret the significant, but weaker evidence for developmental change in prominence spacing to suggest that this measure indexes both speech motor and language influences on rhythm production in a more equal fashion: A child's ability to reduce weak syllables to the same extent as the adult relies on the development of speech motor skills (Redford, 2014, 2018), but the number of weak syllables that are reduced in a sample also depends on language complexity. In particular, the production of more disyllabic words and more prepositional phrases in a sample will result in more opportunities for weak syllable reduction in that sample. A greater number of weak syllables in the sample will result in a developmental increase in prominence spacing, assuming that the child is able to appropriately reduce said syllables.

By a similar logic, the stronger evidence for a developmental change in phrase length variability compared to phrase length itself suggests that variability in phrase length may be the purer index of language development, again under the assumption that strong prosodic boundaries are usually aligned with strong syntactic boundaries (i.e., clause boundaries) in fluent spontaneous speech. It was suggested that the variable timing of phrase boundaries will increase with the greater diversity of expression that follows from the development of language skills. This suggestion is really a hypothesis. The assumed relationship between phrasing and diversity of expression is understudied. For that matter, given the clinical value of global prosodic measures, the underlying reasons for developmental changes in prosodically related variability of any kind merit further research.

Future Research

Regarding the need to better understand the underlying factors that contribute to developmental changes in measures of prosodic variability, we make the more general observation that global measures of speech prosody provide incomplete information about prosodic development. This is because the measures separate rhythm and intonation from the linguistic and semantic–pragmatic functions they serve. We interpreted the null results on pitch variability and pitch range as an instance of this limitation, but it is also relevant to the discussion of rhythm. For example, we implied that prominence spacing may be more dense in younger children's speech than in older children's speech not simply because younger children are unable to reduce weak syllables to the same extent as older children but also because younger children favor monosyllabic words and simpler syntactic constructions, including, for example, fewer prepositional phrases. This possibility points to the importance of future research that seeks to better understand exactly what global measures of prosody are actually measuring. The clinical implication is also clear: A thorough prosodic assessment must also include an analysis of the language that is being produced. Absent further research into the underlying reasons for developmental increases or decreases in global rhythm and intonation measures, we will lack insight into the origin of deviant rhythm or intonation in development. Relatedly, prosodic profiles are most accurately drawn on the basis of extended speech samples. Although single-word elicitations can provide insight into a child's access to language-specific metrical structures (i.e., the prosodic grammar), spoken language rhythms are influenced by a variety of factors that combine to give rise to the perception of typical or disordered prosody. Extended speech samples are needed to detect the combined influence of these factors on speech. Thus, future research should focus on describing the linguistic and communicative context of global rate and pitch changes in extended speech samples, as well as on the distribution of prosodic boundaries in these types of samples.

Data Availability Statement

The speech samples analyzed in this study were drawn from the Eugene Children's Story Corpus (ECSC; Kallay & Redford, 2021), which is publicly available through the CHILDES database (MacWhinney, 2000). The ECSC includes both the audio files and pause-delimited orthographic transcriptions of the stories. The Rhythm and Pitch (RaP) transcriptions of the stories and raw measurement data extracted from the audio files and from the transcriptions are available upon request made to the corresponding author.

Acknowledgments

This research was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) under Grants R01HD061458 (PI: Redford) and R01HD087452 (Principal Investigator: Melissa A. Redford). The content is solely the authors' responsibility and does not necessarily reflect the views of NICHD. The authors are grateful to the families that participated in the longitudinal study that underpins this study and to a number of research assistants (RAs) who passed through the Speech & Language Lab at the University of Oregon (Principal Investigator: Melissa A. Redford) and the Speech Perception and Production Lab at Michigan State University (Principal Investigator: Laura Dilley). Those RAs who contributed the most to the data collection and transcriptions on which this study was based were Aubrianne Carson, Rachel Cicerrella, Faire Holliday, and Jayme Monroe in the Speech & Language Lab and Jessica Gamache in the Speech Perception & Production Lab. Jessica Fanning helped design the original storytelling task and oversaw the training and administration of speech-language assessments in the Speech & Language Lab.

References

  1. Allen, G. D., & Hawkins, S. (1980). Phonological rhythm: Definition and development. In Yeni-Komshian G. H., Kavanagh J. F., & Ferguson C. A. (Eds.), Child phonology: Volume 1: Production (pp. 227–256). Academic Press. https://doi.org/10.1016/B978-0-12-770601-6.50017-6
  2. Amir, O., & Grinfeld, D. (2011). Articulation rate in childhood and adolescence: Hebrew speakers. Language and Speech, 54(2), 225–240. https://doi.org/10.1177/0023830910397496
  3. Arciuli, J., & Ballard, K. J. (2017). Still not adult-like: Lexical stress contrastivity in word productions of eight- to eleven-year-olds. Journal of Child Language, 44(5), 1274–1288. https://doi.org/10.1017/S0305000916000489
  4. Arciuli, J., & Colombo, L. (2016). An acoustic investigation of the developmental trajectory of lexical stress contrastivity in Italian. Speech Communication, 80, 22–33. https://doi.org/10.1016/j.specom.2016.03.002
  5. Arvaniti, A. (2012). The usefulness of metrics in the quantification of speech rhythm. Journal of Phonetics, 40(3), 351–373. https://doi.org/10.1016/j.wocn.2012.02.003
  6. Ballard, K. J., Djaja, D., Arciuli, J., James, D. G. H., & van Doorn, J. (2012). Developmental trajectory for production of prosody: Lexical stress contrastivity in children ages 3 to 7 years and in adults. Journal of Speech, Language, and Hearing Research, 55(6), 1822–1835. https://doi.org/10.1044/1092-4388(2012/11-0257)
  7. Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
  8. Boersma, P., & Weenink, D. (2018). Praat: Doing phonetics by computer [Computer program]. https://www.fon.hum.uva.nl/praat/
  9. Bone, D., Lee, C.-C., Black, M. P., Williams, M. E., Lee, S., Levitt, P., & Narayanan, S. (2014). The psychologist as an interlocutor in autism spectrum disorder assessment: Insights from a study of spontaneous prosody. Journal of Speech, Language, and Hearing Research, 57(4), 1162–1177. https://doi.org/10.1044/2014_JSLHR-S-13-0062
  10. Bonneh, Y. S., Levanon, Y., Dean-Pardo, O., Lossos, L., & Adini, Y. (2011). Abnormal speech spectrum and increased pitch variability in young autistic children. Frontiers in Human Neuroscience, 4, 1–7. https://doi.org/10.3389/fnhum.2010.00237
  11. Breen, M., Dilley, L. C., Kraemer, J., & Gibson, E. (2012). Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch). Corpus Linguistics and Linguistic Theory, 8(2), 277–312. https://doi.org/10.1515/cllt-2012-0011
  12. Brown, V. A. (2021). An introduction to linear mixed-effects modeling in R. Advances in Methods and Practices in Psychological Science, 4(1), 1–19. https://doi.org/10.1177/2515245920960351
  13. Burchinal, M., & Appelbaum, M. I. (1991). Estimating individual developmental functions: Methods and their assumptions. Child Development, 62(1), 23–43. https://doi.org/10.2307/1130702
  14. Catterall, C., Howard, S., Stojanovik, V., Szczerbinski, M., & Wells, B. (2006). Investigating prosodic ability in Williams syndrome. Clinical Linguistics & Phonetics, 20(7–8), 531–538. https://doi.org/10.1080/02699200500266380
  15. Chen, A. (2009). The phonetics of sentence-initial topic and focus in adult and child Dutch. In Vigario M., Frota S., & Joao Freitas M. (Eds.), Phonetics and phonology: Interactions and interrelations (pp. 91–106). John Benjamins. https://doi.org/10.1075/cilt.306.05che
  16. Chen, A. (2011). Tuning information packaging: Intonational realization of topic and focus in child Dutch. Journal of Child Language, 38(5), 1055–1083. https://doi.org/10.1017/S0305000910000541
  17. Chen, A., Esteve-Gibert, N., Prieto, P., & Redford, M. A. (2020). Development in phrase-level prosody from infancy to late childhood. In Gussenhoven C. & Chen A. (Eds.), The Oxford handbook of language prosody (pp. 553–562). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198832232.013.35
  18. Chin, S. B., Bergeson, T. R., & Phan, J. (2012). Speech intelligibility and prosody production in children with cochlear implants. Journal of Communication Disorders, 45(5), 355–366. https://doi.org/10.1016/j.jcomdis.2012.05.003
  19. Choe, W. K., & Redford, M. A. (2012). The distribution of speech errors in multi-word prosodic units. Laboratory Phonology, 3(1), 5–26. https://doi.org/10.1515/lp-2012-0002
  20. Connaghan, K. P., & Patel, R. (2012). Impact of prosodic strategies on vowel intelligibility in childhood motor speech impairment. Journal of Medical Speech-Language Pathology, 20(4), 133–139.
  21. Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11(1), 51–62. https://doi.org/10.1016/S0095-4470(19)30776-4
  22. Dellwo, V. (2010). Influences of speech rate on the acoustic correlates of speech rhythm: An experimental phonetic study based on acoustic and perceptual evidence [Doctoral dissertation, Bonn University, Germany]. https://bonndoc.ulb.uni-bonn.de/xmlui/handle/20.500.11811/4239
  23. Dickey, C. C., Vu, M. A. T., Voglmaier, M. M., Niznikiewicz, M. A., McCarley, R. W., & Panych, L. P. (2012). Prosodic abnormalities in schizotypal personality disorder. Schizophrenia Research, 142(1–3), 20–30. https://doi.org/10.1016/j.schres.2012.09.006
  24. Diehl, J. J., & Paul, R. (2009). The assessment and treatment of prosodic disorders and neurological theories of prosody. International Journal of Speech-Language Pathology, 11(4), 287–292. https://doi.org/10.1080/17549500902971887
  25. Diehl, J. J., & Paul, R. (2012). Acoustic differences in the imitation of prosodic patterns in children with autism spectrum disorders. Research in Autism Spectrum Disorders, 6(1), 123–134. https://doi.org/10.1016/j.rasd.2011.03.012
  26. Diehl, J. J., Watson, D., Bennetto, L., McDonough, J., & Gunlogson, C. (2009). An acoustic analysis of prosody in high-functioning autism. Applied Psycholinguistics, 30(3), 385–404. https://doi.org/10.1017/S0142716409090201
  27. Dilley, L. C., & Brown, M. (2005). The RaP (Rhythm and Pitch) labeling system [Unpublished manuscript]. http://tedlab.mit.edu/tedlab_website/RaP%20System/RaP_Labeling_Guide_v1.0.pdf
  28. Dilley, L. C., Wieland, E. A., Gamache, J. L., McAuley, J. D., & Redford, M. A. (2013). Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech. Journal of Speech, Language, and Hearing Research, 56(1), 159–177. https://doi.org/10.1044/1092-4388(2012/11-0199)
  29. Duncan, S. C., Duncan, T. E., & Hops, H. (1996). Analysis of longitudinal data within accelerated longitudinal designs. Psychological Methods, 1(3), 236–248. https://doi.org/10.1037/1082-989x.1.3.236
  30. Dunn, L. M., & Dunn, D. M. (2007). Peabody Picture Vocabulary Test–Fourth Edition (PPVT-4). Pearson Assessments.
  31. Fosnot, S. M., & Jun, S.-A. (1999). Prosodic characteristics in children with stuttering or autism during reading and imitation. In Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, CA (pp. 1925–1928).
  32. Frota, S., & Butler, J. (2018). Early development of intonation: Perception and production. In Prieto P. & Esteve-Gibert N. (Eds.), The development of prosody in first language acquisition (pp. 145–165). John Benjamins. https://doi.org/10.1075/tilar.23.08fro
  33. Galbraith, S., Bowden, J., & Mander, A. (2017). Accelerated longitudinal designs: An overview of modelling, power, costs and handling missing data. Statistical Methods in Medical Research, 26(1), 374–398. https://doi.org/10.1177/0962280214547150
  34. Ganger, J., & Brent, M. R. (2004). Reexamining the vocabulary spurt. Developmental Psychology, 40(4), 621–632. https://doi.org/10.1037/0012-1649.40.4.621
  35. Grabe, E., Post, B., & Watson, I. (1999). The acquisition of rhythmic patterns in English and French. In Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, CA (pp. 1201–1204).
  36. Hawthorne, K., & Fischer, S. (2020). Speech-language pathologists and prosody: Clinical practices and barriers. Journal of Communication Disorders, 87, 106024. https://doi.org/10.1016/j.jcomdis.2020.106024
  37. Heilmann, J. J., Miller, J. F., & Nockerts, A. (2010). Using language sample databases. Language, Speech, and Hearing Services in Schools, 41(1), 84–95. https://doi.org/10.1044/0161-1461(2009/08-0075)
  38. Ito, K., & Marten, M. (2017). Contrast-marking prosodic emphasis in Williams syndrome: Results of detailed phonetic analysis. International Journal of Language & Communication Disorders, 52(1), 46–58. https://doi.org/10.1111/1460-6984.12250
  39. Jusczyk, P. W., Cutler, A., & Redanz, N. J. (1993). Infants' preference for the predominant stress patterns of English words. Child Development, 64(3), 675–687. https://doi.org/10.2307/1131210
  40. Kalathottukaren, R. T., Purdy, R., McCormick, S. C., & Ballard, E. (2015). Behavioral measures to evaluate prosodic skills: A review of assessment tools for children and adults. Issues in Communication Science and Disorders, 42, 138–154. https://doi.org/10.1044/cicsd_42_S_138
  41. Kallay, J. E., & Redford, M. A. (2021). Clause-initial AND usage in a cross-sectional and longitudinal corpus of school-age children's narratives. Journal of Child Language, 48(1), 88–109. https://doi.org/10.1017/S0305000920000197
  42. Kehoe, M. M. (2000). Truncation without shape constraints: The latter stages of prosodic acquisition. Language Acquisition, 8(1), 23–67. https://doi.org/10.1207/S15327817LA81_2
  43. Kehoe, M. M., Stoel-Gammon, C., & Buder, E. H. (1995). Acoustic correlates of stress in young children's speech. Journal of Speech and Hearing Research, 38(2), 338–350. https://doi.org/10.1044/jshr.3802.338
  44. Kent, R. D., & Vorperian, H. K. (2013). Speech impairment in Down syndrome: A review. Journal of Speech, Language, and Hearing Research, 56(1), 178–210. https://doi.org/10.1044/1092-4388(2012/12-0148)
  45. Klopfenstein, M. (2009). Interaction between prosody and intelligibility. International Journal of Speech-Language Pathology, 11(4), 326–331. https://doi.org/10.1080/17549500903003094
  46. Kochanski, G., Grabe, E., Coleman, J., & Rosner, B. (2005). Loudness predicts prominence: Fundamental frequency lends little. The Journal of the Acoustical Society of America, 118(2), 1038–1054. https://doi.org/10.1121/1.1923349
  47. Ladd, D. R. (2008). Intonational phonology. Cambridge University Press. https://doi.org/10.1017/CBO9780511808814
  48. Lee, S., Potamianos, A., & Narayanan, S. (1999). Acoustics of children's speech: Developmental changes of temporal and spectral parameters. The Journal of the Acoustical Society of America, 105(3), 1455–1468. https://doi.org/10.1121/1.426686
  49. Lleo, C., Rakow, M., & Kehoe, M. (2007). Acquiring rhythmically different languages in a bilingual context. In Trouvain J. & Barry W. J. (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrucken, Germany (pp. 1545–1548).
  50. MacWhinney, B. (2000). The CHILDES Project: Tools for analyzing talk (3rd ed.). Erlbaum. https://childes.talkbank.org/
  51. Mahr, T. J., Soriano, J. U., Rathouz, P. J., & Hustad, K. C. (2021). Speech development between 30 and 119 months in typical children II: Articulation rate growth curves. Journal of Speech, Language, and Hearing Research, 64(11), 4057–4070. https://doi.org/10.1044/2021_JSLHR-21-00206
  52. Maner, K. J., Smith, A., & Grayson, L. (2000). Influences of utterance length and complexity on speech motor performance in children and adults. Journal of Speech, Language, and Hearing Research, 43(2), 560–573. https://doi.org/10.1044/jslhr.4302.560
  53. Martinez-Castilla, P., Sotillo, M., & Campos, R. (2011). Prosodic abilities of Spanish-speaking adolescents and adults with Williams syndrome. Language and Cognitive Processes, 26(8), 1055–1082. https://doi.org/10.1080/01690965.2010.504058
  54. Mayer, M. (1967). A boy, a dog, and a frog. Dial Press.
  55. Mayer, M. (1969). Frog, where are you? Dial Press.
  56. Mayer, M. (1973). Frog on his own. Dial Press.
  57. Mayer, M., & Mayer, M. (1975). One frog too many. Dial Press.
  58. McCabe, P. C., & Meller, P. J. (2004). The relationship between language and social competence: How language impairment affects social growth. Psychology in the Schools, 41(3), 313–321. https://doi.org/10.1002/pits.10161
  59. McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. https://doi.org/10.11613/bm.2012.031
  60. Nakata, T., Trehub, S. E., & Kanda, Y. (2012). Effect of cochlear implants on children's perception and production of speech prosody. The Journal of the Acoustical Society of America, 131(2), 1307–1314. https://doi.org/10.1121/1.3672697
  61. Nippold, M. A. (2016). Later language development: School-age children, adolescents, and young adults. Pro-Ed.
  62. Nippold, M. A., Frantz-Kaspar, M. W., Cramond, P. M., Kirk, C., Hayward-Mayhew, C., & MacKinnon, M. (2014). Conversational and narrative speaking in adolescents: Examining the use of complex syntax. Journal of Speech, Language, and Hearing Research, 57(3), 876–886. https://doi.org/10.1044/1092-4388(2013/13-0097)
  63. Olejarczuk, P., & Redford, M. A. (2013). The relative contribution of rhythm, intonation and lexical information to the perception of prosodic disorder. Proceedings of Meetings on Acoustics, 19, 060154. https://doi.org/10.1121/1.4800625
  64. Parkhurst, B. G., & Levitt, H. (1978). The effect of selected prosodic errors on the intelligibility of deaf speech. Journal of Communication Disorders, 11(2–3), 249–256. https://doi.org/10.1016/0021-9924(78)90017-5
  65. Patel, R., Hustad, K. C., Connaghan, K. P., & Furr, W. (2012). Relationship between prosody and intelligibility in children with dysarthria. Journal of Medical Speech-Language Pathology, 20, 1–5.
  66. Paul, R., Schoen Simmons, E., & Mahshie, J. (2020). Prosody in children with atypical development. In Gussenhoven C. & Chen A. (Eds.), The Oxford handbook of language prosody (pp. 582–594). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198832232.013.49
  67. Payne, E., Post, B., Astruc, L., Prieto, P., & del Mar Vanrell, M. (2011). Measuring child rhythm. Language and Speech, 55(2), 203–229. https://doi.org/10.1177/0023830911417687
  68. Peppé, S. J. (2009). Why is prosody in speech-language pathology so difficult? International Journal of Speech-Language Pathology, 11(4), 258–271. https://doi.org/10.1080/17549500902906339
  69. Peppé, S. (2018). Prosodic development in atypical populations. In Prieto P. & Esteve-Gibert N. (Eds.), The development of prosody in first language acquisition (pp. 343–362). John Benjamins. https://doi.org/10.1075/tilar.23.17pep
  70. Peter, B., & Stoel-Gammon, C. (2005). Timing errors in two children with suspected childhood apraxia of speech (sCAS) during speech and music-related tasks. Clinical Linguistics & Phonetics, 19(2), 67–87. https://doi.org/10.1080/02699200410001669843
  71. Polyanskaya, L., & Ordin, M. (2015). Acquisition of speech rhythm in first language. The Journal of the Acoustical Society of America, 138(3), EL199–EL204. https://doi.org/10.1121/1.4929616
  72. R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.r-project.org/
  73. Redford, M. A. (2013). A comparative analysis of pausing in child and adult storytelling. Applied Psycholinguistics, 34(3), 569–589. https://doi.org/10.1017/S0142716411000877
  74. Redford, M. A. (2014). The perceived clarity of children's speech varies as a function of their default articulation rate. The Journal of the Acoustical Society of America, 135(5), 2952–2963. https://doi.org/10.1121/1.4869820
  75. Redford, M. A. (2018). Grammatical word production across metrical contexts in school-aged children's and adults' speech. Journal of Speech, Language, and Hearing Research, 61(6), 1339–1354. https://doi.org/10.1044/2018_JSLHR-S-17-0126
  76. Redford, M. A., Kapatsinski, V., & Cornell-Fabiano, J. (2018). Lay listener classification and evaluation of typical and atypical children's speech. Language and Speech, 61(2), 277–302. https://doi.org/10.1177/0023830917717758
  77. Rice, M. L., Sell, M. A., & Hadley, P. A. (1991). Social interactions of speech- and language-impaired children. Journal of Speech and Hearing Research, 34(6), 1299–1307. https://doi.org/10.1044/jshr.3406.1299
  78. Romøren, A. S. H., & Chen, A. (2014). Accentuation, pitch and duration as cues to focus in Dutch 4- to 5-year-olds. In Orman W. & Valleau M. J. (Eds.), BUCLD 38: Proceedings of the 38th Annual Boston University Conference on Language Development. Cascadilla Press. http://www.bu.edu/bucld/files/2014/04/romoren.pdf
  79. Rubin, K. H., Bukowski, W. M., & Bowker, J. C. (2015). Children in peer groups. In Lerner R. M. (Ed.), Handbook of child psychology and developmental science: Vol. 4. Ecological settings and processes (7th ed., pp. 175–222). Wiley. https://doi.org/10.1002/9781118963418.childpsy405
  80. Saletta, M. , Goffman, L. , Ward, C. , & Oleson, J. (2018). Influence of language load on speech motor skill in children with specific language impairment. Journal of Speech, Language, and Hearing Research, 61(3), 675–689. https://doi.org/10.1044/2017_JSLHR-L-17-0066 [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Scholderle, T. , Haas, E. , Baumeister, S. , & Ziegler, W. (2021). Intelligibility, articulation rate, fluency, and communicative efficiency in typically developing children. Journal of Speech, Language, and Hearing Research, 64(7), 2575–2585. https://doi.org/10.1044/2021_JSLHR-20-00640 [DOI] [PubMed] [Google Scholar]
  82. Schwartz, R. G. , Petinou, K. , Goffman, L. , Lazowski, G. , & Cartusciello, C. (1996). Young children's production of syllable stress: An acoustic analysis. The Journal of the Acoustical Society of America, 99(5), 3192–3200. https://doi.org/10.1121/1.414803 [DOI] [PubMed] [Google Scholar]
  83. Semel, E. , Wiig, E. , & Secord, W. (2003). Clinical Evaluation of Language Fundamentals–Fourth Edition. The Psychological Corporation. [Google Scholar]
  84. Shattuck-Hufnagel, S. , & Turk, A. E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, 25(2), 193–247. https://doi.org/10.1007/BF01708572 [DOI] [PubMed] [Google Scholar]
  85. Shriberg, L. D. , Paul, R. , McSweeny, J. L. , Klin, A. , Cohen, D. J. , & Volkmar, F. R. (2001). Speech and prosody characteristics of adolescents and adults with high-functioning autism and Asperger syndrome. Journal of Speech, Language, and Hearing Research, 44(5), 1097–1115. https://doi.org/10.1044/1092-4388(2001/087) [DOI] [PubMed] [Google Scholar]
  86. Sirsa, H. , & Redford, M. A. (2011). Towards understanding the protracted acquisition of English rhythm. In Lee W. S. & Zee E. (Eds.), Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, China (pp. 1862–1865). [PMC free article] [PubMed]
  87. Sluijter, A. M. C., van Heuven, V. J., & Pacilly, J. J. A. (1997). Spectral balance as a cue in the perception of linguistic stress. The Journal of the Acoustical Society of America, 101(1), 503–513. https://doi.org/10.1121/1.417994
  88. Smith, A., & Zelaznik, H. N. (2004). Development of functional synergies for speech motor coordination in childhood and adolescence. Developmental Psychobiology, 45(1), 22–33. https://doi.org/10.1002/dev.20009
  89. Snow, D. (1994). Phrase-final syllable lengthening and intonation in early child speech. Journal of Speech and Hearing Research, 37(4), 831–840. https://doi.org/10.1044/jshr.3704.831
  90. Snow, D. (1997). Children's acquisition of speech timing in English: A comparative study of voice onset time and final syllable vowel lengthening. Journal of Child Language, 24(1), 35–56. https://doi.org/10.1017/S0305000996003029
  91. Stojanovik, V. (2011). Prosodic deficits in children with Down syndrome. Journal of Neurolinguistics, 24(2), 145–155. https://doi.org/10.1016/j.jneuroling.2010.01.004
  92. Stritch, J. M. (2017). Minding the time: A critical look at longitudinal design and data analysis in quantitative public management research. Review of Public Personnel Administration, 37(2), 219–244. https://doi.org/10.1177/0734371X17697117
  93. Verbeke, G., & Lesaffre, E. (1999). The effect of drop-out on the efficiency of longitudinal experiments. Journal of the Royal Statistical Society: Series C (Applied Statistics), 48(3), 363–375. https://doi.org/10.1111/1467-9876.00158
  94. Vihman, M. M. (1996). Phonological development: The origins of language in the child. Blackwell.
  95. Vihman, M. M., Nakai, S., & DePaolis, R. (2006). Getting the rhythm right: A cross-linguistic study of segmental duration in babbling and first words. In L. Goldstein, D. Whalen, & C. T. Best (Eds.), Laboratory phonology 8 (pp. 341–366). Mouton de Gruyter. https://doi.org/10.1515/9783110197211.2.341
  96. Walker, J. F., & Archibald, L. M. D. (2006). Articulation rate in preschool children: A 3-year longitudinal study. International Journal of Language & Communication Disorders, 41(5), 541–565. https://doi.org/10.1080/10428190500343043
  97. Watson, D., & Gibson, E. (2004). The relationship between intonational phrasing and syntactic structure in language production. Language and Cognitive Processes, 19(6), 713–755. https://doi.org/10.1080/01690960444000070
  98. Wells, B., Peppe, S., & Goulandris, N. (2004). Intonation development from five to thirteen. Journal of Child Language, 31(4), 749–778. https://doi.org/10.1017/S030500090400652X
  99. White, L., & Mattys, S. L. (2007). Calibrating rhythm: First language and second language studies. Journal of Phonetics, 35(4), 501–522. https://doi.org/10.1016/j.wocn.2007.02.003
  100. Wilson, E. M., Abbeduto, L., Camarata, S. M., & Shriberg, L. D. (2019). Speech and motor speech disorders and intelligibility in adolescents with Down syndrome. Clinical Linguistics and Phonetics, 33(8), 790–814. https://doi.org/10.1080/02699206.2019.1595736
  101. Winter, B. (2013). Linear models and linear mixed effects models in R with linguistic applications. arXiv, 1–42. https://doi.org/10.48550/arXiv.1308.5499
  102. Zampini, L., Fasolo, M., Spinelli, M., Zanchi, P., Suttora, C., & Salerni, N. (2016). Prosodic skills in children with Down syndrome and in typically developing children. International Journal of Language & Communication Disorders, 51(1), 74–83. https://doi.org/10.1111/1460-6984.12186

Data Availability Statement

The speech samples analyzed in this study were drawn from the Eugene Children's Story Corpus (ECSC; Kallay & Redford, 2021), which is publicly available through the CHILDES database (MacWhinney, 2000). The ECSC includes both the audio files and pause-delimited orthographic transcriptions of the stories. The Rhythm and Pitch (RaP) transcriptions of the stories, together with the raw measurement data extracted from the audio files and the transcriptions, are available from the corresponding author upon request.


Articles from Journal of Speech, Language, and Hearing Research : JSLHR are provided here courtesy of American Speech-Language-Hearing Association