Abstract
Purpose
We investigated perception and production of lexical stress and processing of affective prosody in adolescents with high functioning autism (HFA). We hypothesized preserved processing of lexical and affective prosody, but atypical lexical prosody production.
Method
Sixteen children with HFA and 15 typically developing (TD) peers participated in three experiments: (1) perception of affective prosody, (2) lexical stress perception, and (3) lexical stress production. In Experiment 1, participants labeled sad, happy, and neutral spoken sentences that were low-pass filtered to eliminate verbal content. In Experiment 2, participants disambiguated word meanings based on lexical stress (HOTdog vs. hotDOG). In Experiment 3, participants produced these words in a sentence completion task. Productions were analyzed using acoustic measures.
Results
Accuracy levels showed no group differences. Participants with HFA could determine affect from filtered sentences and disambiguate words based on lexical stress. They produced appropriately differentiated lexical stress patterns but demonstrated atypically long productions indicating reduced ability in natural prosody production.
Conclusions
Children with HFA were as capable as their TD peers in receptive tasks of lexical stress and affective prosody. Prosody productions were atypically long, despite accurate differentiation of lexical stress patterns. Future research should use larger samples and spontaneous vs. elicited productions.
Keywords: Autism, Prosody, Lexical stress, Affective prosody, Perception, Production
Introduction
Prosody is a suprasegmental device that can best be described as the “melody” or “rhythm” of speech. It is a complex vocal signal composed primarily of the pitch, intensity, and duration of an utterance. Despite its complexity, typically developing (TD) children and adults are able to perceive and comprehend this prosody automatically (Shriberg & Kent, 2003) and do so from a very early stage in development (Mehler et al., 1988; Jusczyk, Cutler, & Redanz, 1993). There is evidence to suggest that children accurately perceive and use prosodic cues to inform their acquisition of expressive vocabulary from the first-word stage through the preschool years. Some of the typical errors in early word production shown by toddlers, such as weak syllable omission, can be traced directly to their interpretation of prosodic cues (Gerken & McGregor, 1998).
For individuals with autism spectrum disorders (ASD), however, this non-verbal aspect of communication poses a much greater challenge. Abnormal prosody production has been one of the hallmark characteristics of autism since its first description by Kanner (1943) and continues to be a consistent aspect of the ASD communication profile (Shriberg, et al., 2001, Rapin & Dunn, 1997, Baltaxe & Simmons, 1985, 1992). Many observations describe expressive prosody in this population as flat, monotonous, or abnormally modulated (Baltaxe, 1984, Fay & Schuler, 1980). Atypical prosody production also contributes to the perception of reduced social and communicative competence in individuals with ASD, making it one of the earliest and most salient features of abnormal communication noted by typical listeners (Paul, Shriberg, et al., 2005).
Prosody serves several functions in speech, including, but not limited to, the transmission of affect and the marking of grammatical and lexical constructs (McCann & Peppé, 2003). Affective prosody refers to a speaker varying the pitch and rate of an utterance to indicate his or her emotional state. Grammatical prosody encompasses the use of pitch to indicate the type of statement being made (e.g. a rising intonation at the end of an utterance defines it as a question) and the application of lexical stress, which is the emphasis of one syllable over another in order to convey or disambiguate meaning (e.g. “HOTdog” is a type of food while “hot DOG” is an overheated canine) (Merewether & Alpern, 1990).
Many studies of prosody in ASD tend to focus on single aspects of prosody, for example, looking either at production or perception, or focusing only on affective prosody or only on the use of emphatic stress, and the findings from these studies are often contradictory. Some studies of affective prosody have shown that children with ASD are less able than their TD peers to use prosody to match simple emotions from social scenes with pictures of emotional facial expressions (Lindner & Rosén, 2006) or determine complex states of mind from sentences (Golan, Baron-Cohen, Hill, & Rutherford, 2007). On the other hand, Boucher, Lewis, and Collis (2000) found that children with autism were equal to their TD peers in the ability to label six basic emotions based on the affective prosody of single word utterances (e.g. days of the week), despite being deficient in matching the prosodically expressed emotion to a photograph of an emotional face. The differences in these results may be based on the varied methodologies and stimulus selections, ranging from single words and word lists to complete sentences and using labeling vs. face matching.
Investigations of lexical stress production and perception have also yielded an uneven picture of prosodic skills in ASD. TD children appear to have mastered the ability to disambiguate compound nouns (hotdog) from noun phrases (hot dog) using lexical stress cues by the age of 10 (Vogel & Raimy, 2002; Cruttenden, 1974) or 12 (Atkinson-King, 1973), and can produce appropriate lexical stress even earlier (Cutler & Swinney, 1987). Smith and Robb (2005) asked TD children and children with speech delay aged 5-7 to repeat novel two-syllable words with stress either on the first or second syllable and used the acoustic measure of whole-word duration to capture their productions. Both groups produced second-syllable stress words that were overall longer than those with first syllable stress, indicating that whole-word duration is a valid measure of lexical stress production in children.
Children with ASD appear to share many of those skills, being able to recall stressed words better than unstressed words (Fine et al., 1991) and showing no statistical difference from TD peers in disambiguating verbs (e.g. reCALL) from nouns (e.g. REcall) (Paul, Augustyn, Klin, & Volkmar, 2005). Their main deficits appear to lie in the production of prosody, with several studies noting the inappropriate production of contrastive stress (Baltaxe 1984, Fine et al., 1991, McCann & Peppé, 2003 for a review), inappropriate phrasal stress (Baltaxe & Guthrie 1987, Baltaxe & Simmons, 1985, McCaleb & Prizant, 1985), or excessive stress in spontaneous speech (Shriberg et al., 2001). There are also data to suggest that people with ASD have greater variation in the use of prosodic pitch both within an utterance and across individuals (Baltaxe 1984, Fosnot & Jun, 1999). However, most assessments of inappropriateness have relied on subjective coding, rather than acoustic analyses.
Recently, a few studies have gone a long way toward capturing a more complete picture of prosody perception and production in ASD. Shriberg et al. (2001) used the Prosody-Voice Screening Profile (PVSP, Shriberg et al., 1990), which measures production for a range of prosodic functions and found few areas of deficits, which were concentrated mostly in fluency, phrasing and hypernasality. Paul, Shriberg et al. (2005) expanded on this work and determined that there was a strong correlation between these prosodic differences and an increased perception of communication and socialization deficits in this population, as measured by the Autism Diagnostic Observation Schedule-Generic (ADOS-G, Lord et al. 2000) and the Vineland Adaptive Behavior Scales-Survey Form (Sparrow et al., 1984). Paul, Augustyn, et al. (2005) used a comprehensive set of 12 tasks to assess the production and perception of grammatical and affective prosody in a group of children with ASD. They determined that the ASD group showed significant deficits in the perception and production of emphatic stress and the production of grammatical stress, but not in the processing of grammatical and affective intonation, or grammatical phrasing. The authors attribute this lack of group differences to possible ceiling effects for some of the tasks included in the battery. They also note that objective measures of prosody production would be helpful in furthering our understanding of prosody competence in ASD.
Using a measure of prosody production and perception across affect and grammatical usage, the Profiling Elements of Prosodic Systems in Children (PEPS-C, Peppé & McCann, 2003), Peppé, McCann, Gibbon, O’Hare, and Rutherford (2006) showed that the prosodic deficits of a boy with high functioning autism (HFA) could be captured and differentiated from the standardization sample of typically developing (TD) children. In this case study, they documented a tendency for misplaced stress and less accurate performance on affective prosody, as well as relatively poorer performance on short items (words) over longer items (phrases or word lists). Applying the PEPS-C to a large sample of school-aged children with HFA, McCann, Peppé, Gibbon, O’Hare, and Rutherford (2007) and Peppé, McCann, Gibbon, O’Hare, and Rutherford (2007) described systematic deficits in this population, including reduced accuracy for the perception and production of affective prosody in single words (food items expressed with like or distaste matched with happy and unhappy faces) and the production of contrastive stress. The prosody skills of both the HFA and TD groups were strongly correlated with expressive and receptive language scores, indicating that mastery of prosody is an integral part of language function.
Although the PEPS-C assesses most aspects of prosody, it does not contain a measure of lexical stress, which is a prosodic function more directly related to grammatical processing than either contrastive or emphatic stress, and therefore a potential area of strength for high-functioning individuals with ASD whose language skills are in the normal range. The PEPS-C contains a measure for “chunking,” which refers to the use of pauses to indicate whether a list has three items (e.g. chocolate, cookies, and milk), or two items (e.g. chocolate-cookies and milk). This is however a different process than the use of selective intensity (i.e. stress) to determine the meaning of an ambiguous word, such as hot dog vs. hotdog, which is one of the prosodic structures investigated in this paper. Also, the affective prosody tasks of the PEPS-C are based on productions of single food words, indicating like or dislike, thereby limiting the affective prosody information to a very brief utterance, rather than the wealth of prosodic information typically contained in natural, emotionally laden, sentence-length speech. The aim of this paper is to add to the current knowledge base of prosodic ability in individuals with ASD by presenting tasks of affective prosody using sentence length stimuli, as well as tasks investigating receptive and expressive aspects of lexical stress in two-syllable ambiguous word pairs. We use objective measures of stress production (intensity, pitch, and whole word duration) to reduce reliance on subjective coding to determine group differences.
Based on this population’s documented deficits in affective prosody matching and labeling for single words (McCann et al., 2007; Peppé et al., 2007), but potentially preserved skills for sentences (Paul, Augustyn, et al., 2005), we expect children and adolescents with HFA to be able to use the richness of whole-sentence prosodic information to determine affective content, even when we eliminate all verbal content from the stimuli. Our hypothesis for the affective prosody task in Experiment 1 is that children with HFA will perform at a level equal to their TD peers on a task using sentences that have been filtered to eliminate verbal content but maintain prosodic contours. We expect both groups to show higher accuracy scores for affective sentences with verbal content than for those without.
Experiment 2: Based on the HFA group’s high verbal language scores and the previously demonstrated connection between language skills and prosodic ability (McCann et al., 2007), combined with the highly grammatical nature of the lexical stress stimuli, we expect this group to be able to use lexical stress information to disambiguate compound nouns (e.g. wetsuit) from noun phrases (e.g. wet suit). We therefore hypothesize that the HFA group will perform at a level equal to their TD peers when asked to use lexical stress to determine the meaning of an ambiguous word pair.
Experiment 3: A deficit in prosody production is one of the few consistently reported results in the autism prosody literature, with studies noting both quantifiable and subjective deficits. We therefore expect participants with HFA to show quantifiable deficits in lexical stress production for ambiguous word pairs. We also hypothesize that this expressive prosody difference will be captured in the objective acoustic measure of whole-word duration.
Experiment 1: Perception of Affective Prosody in Filtered and Unfiltered Sentences
To test perception of affective prosody, the lexical content of the stimulus needs to be controlled (Peppé et al., 2006). In this study we used sentences that were filtered to remove all recognizable semantic content, but contained fully preserved prosody. Our aim was to assess whether children and adolescents with HFA are able to make affect judgments based solely on a speaker’s emotional prosody without any assistance of linguistic content. We also wanted to compare accuracy for these filtered sentences to accuracy in a baseline comparison task using unfiltered sentences with semantically biased lexical content.
Method
Participants
Two groups participated in this study: children and adolescents with HFA (n=16) and typically developing controls (n=15). The participants were recruited through advocacy groups for parents of children with autism, local schools, and advertisements placed in local magazines, in newspapers, and on the internet.
Diagnosis of Autism
The participants in the autism group met DSM-IV criteria for autistic disorder, based on expert clinical impression and confirmed by the Autism Diagnostic Interview-Revised (ADI-R; Lord, Rutter, & LeCouteur, 1994) and the Autism Diagnostic Observation Schedule (ADOS; Lord, Rutter, DiLavore, & Risi, 1999), which were administered by trained examiners. Participants with known genetic disorders were excluded. In order to create a group of participants with a narrowly constrained diagnosis, we excluded all participants who scored in the range of ASD, Pervasive Developmental Disorder-Not Otherwise Specified, or Asperger Syndrome and kept only that subgroup of participants who were diagnosed as falling in the “full autism” range of the spectrum.
Standardized Testing
The Kaufman Brief Intelligence Test, Second Edition (K-BIT 2; Kaufman & Kaufman, 2004) was used to assess IQ in all participants and receptive vocabulary was measured by the Peabody Picture Vocabulary Test (PPVT-III, Dunn & Dunn, 1981). Participants were selected on the basis of standardized scores within the normal range to create a more homogeneous, high-functioning autism group. The descriptive characteristics of both groups can be found in Table 1. Using a multivariate ANOVA with group as the independent variable, we verified that the HFA and TD groups did not differ significantly in age, F (1, 30) = .046, p = .83, verbal IQ, F (1,30) = 1.763, p = .195, nonverbal IQ, F (1,30) = .761, p = .39, or receptive vocabulary ability, F (1,30) = .598, p = .445.
Table 1.
Measure | HFA (n=16) M(SD) | TD (n=15) M(SD) |
---|---|---|
Age | 12:4(2:3) Range: 7:6 - 17 | 12:7(3:1) Range: 7:6 - 18 |
Full Scale IQ | 106.7(10.6) Range: 87 - 123 | 108.9(11.3) Range: 87 - 123 |
Verbal IQ | 101.2(14.3) Range: 83 - 127 | 108.1(14.6) Range: 81 - 127 |
Nonverbal IQ | 109.6(19.1) Range: 94 - 127 | 106.7(9.8) Range: 85 - 116 |
PPVT-III | 107.0(15.4) Range: 79 - 138 | 111.3(15.3) Range: 79 - 139 |
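As a rough consistency check, a two-group comparison on a single measure can be approximated from the means, standard deviations, and group sizes in Table 1. With only two groups, the one-way ANOVA F statistic equals the square of the pooled-variance t statistic. The sketch below assumes a univariate test per measure and works from the rounded table values, so small discrepancies from the reported F values are expected; the function name is illustrative.

```python
def f_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-group one-way ANOVA F, computed as the squared pooled-variance t."""
    # Pooled within-group variance, weighted by degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # Squared standard error of the difference between the two means.
    se_squared = pooled_var * (1 / n1 + 1 / n2)
    return (mean1 - mean2) ** 2 / se_squared

# Rounded values taken from Table 1 (HFA first, TD second).
verbal_iq_f = f_from_summary(101.2, 14.3, 16, 108.1, 14.6, 15)
ppvt_f = f_from_summary(107.0, 15.4, 16, 111.3, 15.3, 15)

print(f"Verbal IQ: F = {verbal_iq_f:.2f}")
print(f"PPVT-III:  F = {ppvt_f:.2f}")
```

Under these assumptions the verbal IQ and PPVT-III values come out close to the reported F = 1.763 and F = .598.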
Language Competence in the Autism Group
All participants with HFA had receptive vocabulary and verbal IQ within two standard deviations of the mean as assessed through standardized measures, which placed them in the high functioning range of the spectrum. Their social, pragmatic, communication, and other deficits, however, were significant enough to reach the diagnostic threshold for full autism on the ADOS. It is important to note that most standardized measures of language do not include detailed measures of prosodic or pragmatic competence, meaning that individuals may have normal verbal IQ, but still exhibit great difficulty perceiving and producing nonverbal language, such as prosody or affective facial expressions.
Materials
The stimuli used in the affective prosody task were complete, natural sentences containing commonly used words. We used sentences spoken by a female native English speaker that were previously used by Plesa Skwerer, Schofield, Verbalis, Faja, and Tager-Flusberg (2007). Six different sentences were produced in each of the three emotions (happy, sad, and neutral). In accordance with established features of affective prosody (Bachorowski & Owren, 1995; Banse & Scherer, 1996; Cosmides, 1983; Murray & Arnott, 1993), Plesa Skwerer et al. (2007) selected the 18 sentences that best portrayed their target emotions, verified by acoustic characteristics as measured using Praat (Boersma & Weenink, 2005). Happy prosody samples were verified to contain higher pitch, faster rate, and a complex tone ending (a rise-fall pattern). Sad sentences had lower pitch, slower rate, and a low tone ending, while neutral sentences were portrayed by mid-range pitch, accent on the main topic, and a less complex final tone. In addition to these acoustic measures, the stimuli were rated by 10 typical individuals for accuracy of expressed affect, confirming that the selected sentences clearly and naturally portrayed the target emotion without exaggeration. All sentences contained a final phrase indicating a label of the target emotion, e.g. “When the kids tease Sue, she’s upset,” designed to enable participants to rely heavily on semantic cues for affect determination in the unfiltered task and thereby provide a significant contrast in processing strategy to the filtered sentences, which had no verbal content at all (see complete list of unfiltered stimuli in Appendix A). All 18 sentences were of approximately equal length and followed the same syntactic structure. The sentences were then low-pass filtered to delete speech frequencies above 100-150 Hz. This cut-off point was chosen to ensure that all semantic content was completely removed while fully preserving prosodic contours.
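The effect of low-pass filtering can be illustrated with a minimal sketch. The filter design used for the stimuli is not specified in the text, so the windowed-sinc FIR filter, the 150 Hz cutoff (the upper end of the stated 100-150 Hz range), the 4 kHz sample rate, and the pure-tone test signals below are all illustrative assumptions rather than the actual stimulus preparation procedure.

```python
import math

def lowpass_kernel(cutoff_hz, sample_rate_hz, num_taps=401):
    """Hamming-windowed sinc low-pass kernel, normalized to unit gain at 0 Hz."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def apply_fir(signal, taps):
    """Centered convolution; output has the same length as the input."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):  # treat samples outside the signal as zero
                acc += tap * signal[j]
        out.append(acc)
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def band_gain(freq_hz, cutoff_hz=150.0, sample_rate_hz=4000, duration_s=0.3):
    """Ratio of output to input RMS for a pure tone (hypothetical test signal)."""
    n = int(sample_rate_hz * duration_s)
    tone = [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]
    filtered = apply_fir(tone, lowpass_kernel(cutoff_hz, sample_rate_hz))
    core = slice(n // 4, 3 * n // 4)  # ignore edges affected by zero padding
    return rms(filtered[core]) / rms(tone[core])

if __name__ == "__main__":
    print(f"gain at 100 Hz:  {band_gain(100.0):.3f}")
    print(f"gain at 1000 Hz: {band_gain(1000.0):.4f}")
```

In this sketch a 100 Hz component passes almost unchanged while a 1000 Hz component, in the range that carries much of the segmental information in speech, is strongly attenuated.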
The 18 filtered sentences were pseudo-randomized into three different counterbalanced sequences for the “filtered task” and administration of those sequences was alternated between participants. The 18 unfiltered sentences were also pseudo-randomized, counterbalanced, and used for the “content task.” Additional sentences were used for training. Each participant was given the “filtered task” before the “content task.” All stimuli were presented using a PC with standard speakers. Responses were recorded using a Cedrus button box.
Procedure
All participants completed a short training session containing only filtered sentences, to ensure that they understood the task. Participants were instructed that they would hear “strange-sounding” sentences and were asked to determine how the speaker was feeling, with the available choices being “happy,” “sad,” or “neutral.” They were further explicitly instructed not to try and decipher what the speaker was saying. Responses were made on a button box, with the leftmost button representing “happy,” the second button “sad” and the third button “neutral.” Buttons were marked with line-drawings of faces representing the target emotion. At the beginning of the experimental task, participants were reminded to listen to how the speaker was feeling and not what she was saying. After completing the filtered task each participant was presented with the content task using the same design and procedures. The unfiltered content stimuli were included to provide a control task for participants to demonstrate their ability to follow the task instructions and process meaningful language stimuli. Our aim was to contrast performance on this control task with performance on the filtered task, to determine differences in emotion processing abilities for prosody-only stimuli, as opposed to those carrying full verbal and vocal language information.
Results
Mean accuracy scores in percent correct for the filtered and content tasks are shown in Table 2. Both groups’ accuracy scores were significantly above the 33.3% chance level for the filtered task (t (15) = 9.03, p < .001 in the HFA group, and t (14) = 10.2, p < .001 in the TD group) and the content task (t (15) = 19.7, p < .001 in the HFA group, t (14) = 23.7, p < .001 in the TD group) based on Paired Samples t-tests. We conducted a 2(group) × 2(task) × 3(emotion) repeated measures ANOVA (data were normally distributed) with percent correct as the dependent variable and found a significant main effect for task F (1,29) = 16.6, p < .001, partial η2 =.36, with performance better on the content task than the filtered task in both groups (t (15) = 3.4, p < .001, Cohen’s d = 1.04 for the HFA group and t (14) = 2.4, p = .03, Cohen’s d = .76 for the TD group, based on Paired Samples t-tests). There was also a main effect for emotion, F (2,58) = 20.26, p < .001, partial η2 =.41, with lowest performance on the “neutral” sentences in both tasks for both groups. An error analysis for neutral sentences showed that both groups were more than 68% accurate in categorizing neutral sentences, but when they did make mistakes, they were more likely to incorrectly label neutral expressions as sad than happy (see Fig. 1). There was no main effect for group (F (1,29) = .44, p = .51, partial η2 = .02), or significant group by task (F (1,29) = .35, p = .56, partial η2 = .01) or group by emotion (F (1,29) = 2.3, p = .11, partial η2 = .07) interactions.
Table 2.
Condition | HFA (n=16) M(SD) | TD (n=15) M(SD) |
---|---|---|
Filtered Happy | 72.9(20.9) | 86.7(9.3) |
Content Happy | 89.6(13.4) | 95.6(7.6) |
Filtered Sad | 86.5(18.5) | 82.2(23.1) |
Content Sad | 89.6(16.0) | 95.6(9.9) |
Filtered Neutral | 61.5(24.9) | 65.6(27.1) |
Content Neutral | 83.3(18.3) | 74.4(26.6) |
Filtered Overall | 73.6(17.9) | 78.1(17.1) |
Content Overall | 87.5(11.0) | 88.5(9.0) |
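The comparisons against the 33.3% chance level can be approximated from the overall means and standard deviations in Table 2. The sketch below assumes a one-sample t statistic computed against a fixed chance value; because it uses rounded table entries, it reproduces the reported t values only approximately. The function name is illustrative.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """t statistic for a sample mean tested against a fixed value mu0."""
    return (mean - mu0) / (sd / math.sqrt(n))

CHANCE = 100 / 3  # three response options: happy, sad, neutral

# (label, mean, SD, n) taken from the "Overall" rows of Table 2.
overall_rows = [
    ("HFA filtered", 73.6, 17.9, 16),
    ("TD filtered", 78.1, 17.1, 15),
    ("HFA content", 87.5, 11.0, 16),
    ("TD content", 88.5, 9.0, 15),
]

for label, mean, sd, n in overall_rows:
    print(f"{label}: t({n - 1}) = {one_sample_t(mean, sd, n, CHANCE):.2f}")
```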
We also conducted a 2(group) × 3(emotion) repeated measures ANOVA within each task (content and filtered), which revealed a main effect for emotion in the content task F (2,58) = 7.6, p = .001, partial η2 = .21 and the filtered task F (2,58) = 16.2, p < .001, partial η2 = .36. In both tasks the neutral sentences achieved less accuracy than the sentences containing either happy or sad affective prosody. There was no significant difference between groups (F (1,29) = .52, p = .47, partial η2 = .02 for filtered stimuli and F (2,58) = .01, p = .78, partial η2 = .003 for content stimuli) or group by emotion interaction (F (2,58) = 2.7, p = .07, partial η2 = .09 for filtered stimuli and F (2,58) = 2.24, p = .12, partial η2 = .07 for content stimuli).
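For readers who wish to check the effect sizes, partial η2 can be recovered from an F statistic and its degrees of freedom as (F × df_effect) / (F × df_effect + df_error). The sketch below applies this standard identity to the emotion and task effects reported for Experiment 1; the function and variable names are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (effect, F, df_effect, df_error, reported partial eta squared)
reported_effects = [
    ("task, 2 x 2 x 3 ANOVA", 16.6, 1, 29, 0.36),
    ("emotion, 2 x 2 x 3 ANOVA", 20.26, 2, 58, 0.41),
    ("emotion, content task", 7.6, 2, 58, 0.21),
    ("emotion, filtered task", 16.2, 2, 58, 0.36),
]

for effect, f_value, df1, df2, reported in reported_effects:
    computed = partial_eta_squared(f_value, df1, df2)
    print(f"{effect}: computed {computed:.2f}, reported {reported:.2f}")
```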
Discussion
As expected, both groups were significantly more accurate on the content task than the filtered task. One surprising aspect of this study was that the TD group scored relatively poorly on the filtered task. It is possible that the large age range of our participants (7:6-18 years, mean 12:7) meant that some members of the TD group were too young to successfully perform this task, thereby lowering the group’s accuracy scores. However, both groups’ mean accuracy scores were significantly above chance, indicating that, overall, they understood the task and were able to interpret the stimuli accurately. The filtered sentences are an unnatural presentation of language, providing affect information exclusively in the voice, and therefore much more difficult to interpret than natural sentences with intact verbal content, resulting in lower accuracy scores across groups. However, the results of the filtered speech task do show that both participant groups were equally able to discern happy, sad, and neutral emotions in sentences using only prosodic cues, thereby confirming our main hypothesis. Both groups were also more accurate categorizing stimuli with affective overlay than neutral sentences, paralleling the results found by Plesa Skwerer et al. (2007) using the same stimuli with a group of adolescents and adults with Williams Syndrome, learning- or intellectual disabilities, and TD controls. These data suggest that the absence of affective information may be more difficult to interpret for TD individuals as well as participants with a range of communication and/or cognitive deficits. Further studies are required to determine whether this neutral affect disadvantage is robust across different methodologies and stimulus types.
The data also confirm our secondary hypothesis that the children with HFA would be as able as their TD peers to use the addition of semantic information to perform more accurately on the unfiltered than on the filtered sentences. These data are consistent with results of prior studies showing that individuals with HFA are able to label basic emotions using prosodic cues of word lists (e.g. days of the week) (Boucher, Lewis & Collis, 2000) and full sentences (Paul, Augustyn et al., 2005), but contradict data indicating deficits in single-word affective prosody processing among children with ASD (McCann et al., 2007; Peppé et al., 2007). In contrast to these latter studies, the filtered and unfiltered sentences in this study provided additional prosodic cues sustained over longer time periods than the comparatively short duration of a single spoken word. Our data suggest that the added complexity and prosodic information of sentences may assist participants with HFA with the decoding of affective prosody. It is also important to note that the participants in our studies were chronologically a few years older than the participants in the McCann et al. (2007) and Peppé et al. (2007) studies, whose participants were on average 9:4 and 9:10 years old respectively. Although the mean age of our participants is only three years older, this difference is meaningful because it implies that most of the subjects in our studies were expected to have reached maturity in prosody processing ability. Furthermore, the participants with ASD included here all had verbal IQ and language scores within two standard deviations of the mean, meaning that, in contrast to the McCann et al. (2007), Peppé et al. (2007), and Paul, Augustyn et al. (2005) studies, our results are based on a cohort of adolescents with high-functioning autism who did not at this time exhibit measurable language impairment.
Our data conflict with the results of Golan et al. (2007) who also used full sentences but found deficits in affective prosody recognition among adults with ASD. This difference can be explained by the fact that we only included two basic emotions that contrasted in valence (happy, sad) whereas Golan et al. (2007) used complex emotional states, such as “defensive,” “joking,” “unconcerned,” or “indignant,” and a choice between four, rather than three possible responses. It may well be that individuals with autism have preserved ability to discern basic emotions from affective prosodic cues alone, but fail in their interpretation of affective prosody for more complex and subtle emotions such as those tested by Golan et al. (2007).
Overall, the children and adolescents with HFA in our study labeled happy, sad, and neutral emotions in sentence-length filtered speech stimuli as well as their TD peers. Further studies using more emotion types and larger participant groups are necessary to determine whether there are group differences that were not fully expressed in our study, most likely due to small sample sizes.
Experiment 2: Perception of Lexical Prosody
The purpose of the receptive lexical prosody task was to assess the ability of children and adolescents with HFA to use lexical stress alone to determine the meaning of the same ambiguous word pairs originally designed by Atkinson-King (1973), and adapted by Plesa Skwerer et al. (2007) for use with adolescents and adults with Williams Syndrome, and learning or intellectual disabilities. Our aim was to expand the application of these proven stimuli to assess the comprehension of lexical stress in children with HFA.
Method
Participants
Participants were the same sample of children and adolescents as in Experiment 1.
Materials
The lexical ambiguity task contained 22 two-syllable word pairs, 11 of them with ambiguous meaning, e.g. PICKup (a type of truck) and pick UP (taking an item off the floor) and 11 foil words with only one possible meaning, e.g. T-shirt (see full list of stimuli in Appendix B). The target words were recorded by a female native English speaker at least three times each with stress on the first syllable, stress on the second syllable, and equal stress on both syllables. This last stress pattern was a control condition, since equal stress is not a valid form of lexical stress in American English. The first- and last-syllable stress productions determined the two possible meanings of the ambiguous word pairs (e.g. HOTdog vs. hotDOG). In order to ensure that each word pair was produced correctly, Plesa Skwerer et al. (2007) acoustically analyzed each stimulus recording for duration, mean intensity, and average pitch (F0) using Praat software. In accordance with established guidelines for acoustic properties of stressed syllables (Lieberman, 1960; Klatt, 1976), Plesa Skwerer et al. verified that all samples in the stimulus list adhered to the criteria of longer duration, higher pitch, and greater intensity of the stressed syllable. Acoustic analyses using Praat confirmed that all stimuli displayed this distinction in both the first-syllable and last-syllable stress items but, appropriately, not in the equal-stress items. The acoustically verified stimuli were then presented to 10 raters and items that incurred consistent errors were eliminated from the stimulus list. The 11 foil words were recorded by the same speaker with their correct first-syllable stress pattern as well as an equal stress pattern, to mask the experimental manipulation.
Two picture stimuli were created by an amateur artist for each of the two-syllable words. Every target word was combined with one picture representing the correct meaning and another indicating the meaning resulting from the opposite stress pattern, e.g. hotdog vs. hot dog (see Fig. 2). The foil words were paired with one picture representing the correct meaning of the word and another picture representing a feature of the foil item without matching the meaning of the target stimulus (e.g. “rainbow” was paired with a picture of a rainbow and a picture of a hairbow). The pictures were piloted in an open-ended task to ensure that they conveyed their intended meaning clearly. The experimental task was presented in two segments separated by a break. The words were pseudo-randomized and counterbalanced so that each word pair was presented in all three stress patterns during each half of the experiment for a total of 66 experimental trials. The eleven foil words were presented at least once in each stress pattern for a total of 29 foil trials. We created two pseudo-randomized sequences of the experiment to ensure that the same word-pair did not appear twice in a row, and counterbalanced them across participants. Participant responses were recorded via a Cedrus button box.
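The constraint on trial order can be illustrated with a minimal sketch. It assumes a simple reshuffle-until-valid approach, placeholder labels for the 11 ambiguous word pairs, and arbitrary seeds; it covers only the 66 experimental trials, not the foil trials, and is not the procedure actually used to build the sequences.

```python
import random

STRESS_PATTERNS = ("first", "second", "equal")
WORD_PAIRS = [f"pair{i:02d}" for i in range(1, 12)]  # placeholder labels

def shuffle_half(word_pairs, rng, max_tries=10_000):
    """One half of the experiment: every pair in every stress pattern,
    shuffled until no word pair occurs on two consecutive trials."""
    trials = [(word, stress) for word in word_pairs for stress in STRESS_PATTERNS]
    for _ in range(max_tries):
        rng.shuffle(trials)
        if all(a[0] != b[0] for a, b in zip(trials, trials[1:])):
            return trials
    raise RuntimeError("no valid trial order found")

def build_sequence(word_pairs, seed):
    """Two halves of 33 trials each, 66 experimental trials in total."""
    rng = random.Random(seed)
    first = shuffle_half(word_pairs, rng)
    second = shuffle_half(word_pairs, rng)
    # Also avoid a repeated pair across the boundary between the halves.
    while second[0][0] == first[-1][0]:
        second = shuffle_half(word_pairs, rng)
    return first + second

# Two sequences that could be alternated across participants.
sequence_a = build_sequence(WORD_PAIRS, seed=1)
sequence_b = build_sequence(WORD_PAIRS, seed=2)
print(len(sequence_a), sequence_a[:3])
```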
Procedure
Before beginning the experimental trials, all participants completed a three-item training session to ensure that they understood the task. Participants saw two pictures side by side and heard a sentence via speakers containing the target word (e.g. take a copy of the HANDout). This was followed by a repetition of the two-syllable target (HANDout) in isolation. We then asked participants to select the picture that matched the meaning of the target word by pressing the left or right button on the button box, corresponding to the picture on the left or right of the screen. When participants demonstrated competence on the task in training, they moved on to complete the experimental task. The experimental stimuli consisted of pre-recorded presentations of the ambiguous stimulus words/noun phrases without contextual sentences, in order to ensure that prosodic stress assignment was the only cue participants could use to accurately determine which picture to choose. The prosody of each stimulus presentation clearly stressed one syllable over the other, making the intended meaning of the target word clear, so that the participant’s choice between the two pictures was determined exclusively by attending to the lexical stress pattern of the presented target word. We reminded participants to listen carefully to how the words were spoken and choose the picture accordingly. Corrective feedback was given during training, but not during the experimental task.
Results
All participants were at least 80% accurate on the foil items. We excluded all equal-stress items from the final analysis, since they were not accurate representations of lexical stress in American English, but rather served as a prosodic foil. Mean accuracy levels of experimental trials in percent correct for both stress patterns can be found in Table 3. Both groups’ accuracy levels for first- and last-syllable stress items were significantly above the 50% chance accuracy level (t (15) = 8.1, p < .001 for first syllable samples in the HFA group, t (15) = 2.2, p = .045 for second syllable samples in the HFA group, t (14) = 5.5, p < .001 for first syllable samples in the TD group, t (14) = 2.9, p < .01 for second syllable samples in the TD group, based on one-sample t-tests against the chance level). To compare accuracy for each stress condition we conducted a 2 (group) × 2 (stress) ANOVA with percent correct as the dependent variable and found a main effect for stress, F (1,29) = 21.4, p < .001, partial η2 = .43 (data were normally distributed). Pairwise comparisons showed that, as predicted, both participant groups were more accurate on the first-syllable stress items than the last-syllable stress items (t (15) = 3.6, p = .003, Cohen’s d = 1.2 for the HFA group and t (14) = 3.1, p = .008, Cohen’s d = .8 for the TD group, based on Paired Samples t-tests). There was no significant main effect for group (F (1, 29) = 2.0, p = .17, partial η2 = .07), or group by stress interaction (F (1, 29) = 2.16, p = .15, partial η2 = .07).
Table 3.
Stress pattern | HFA (n=16) M(SD) | TD (n=15) M(SD) |
---|---|---|
First-syllable stress | 79.8(14.77) | 69.1(13.4) |
Last-syllable stress | 59.9(18.2) | 58.8(12.0) |
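As a rough cross-check of the comparisons against chance, a t statistic can be approximated from the summary values in Table 3 alone. The sketch below assumes a test of each group mean against a fixed 50% chance level and uses only the rounded means and standard deviations from the table, so it can only approximate statistics computed from the raw data; the function and variable names are illustrative.

```python
import math


def t_against_chance(mean, sd, n, chance=50.0):
    """t statistic for a sample mean tested against a fixed chance level."""
    standard_error = sd / math.sqrt(n)
    return (mean - chance) / standard_error


# (mean % correct, SD, n) taken from Table 3.
TABLE_3 = {
    ("HFA", "first"): (79.8, 14.77, 16),
    ("HFA", "last"): (59.9, 18.2, 16),
    ("TD", "first"): (69.1, 13.4, 15),
    ("TD", "last"): (58.8, 12.0, 15),
}

t_values = {
    cell: t_against_chance(mean, sd, n)
    for cell, (mean, sd, n) in TABLE_3.items()
}

for (group, stress), t in t_values.items():
    n = TABLE_3[(group, stress)][2]
    print(f"{group} {stress}-syllable stress: t({n - 1}) = {t:.2f}")
```

The resulting values (about 8.1, 2.2, 5.5, and 2.8) are close to the reported statistics; small gaps are to be expected because the table entries are rounded.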
Discussion
There were no significant differences in performance between the two participant groups, indicating that the children and adolescents with HFA had the same competence level for this task as their TD peers. The lack of group differences in this task supports our hypothesis that participants with HFA would be able to use their relatively preserved language skills, as evidenced by normal-level verbal IQ and receptive vocabulary scores, to tap into the grammatical nature of these stimuli and successfully use lexical prosody to disambiguate compound nouns (e.g. greenhouse) from noun phrases (e.g. green house). The results also show that both participant groups were significantly better at categorizing first-syllable stressed items, which is the predominant stress pattern for nouns in American English, rather than the rarely used final-syllable stress pattern (Arciuli & Cupples, 2006; Jusczyk, Cutler, & Redanz, 1993).
An alternative explanation for the equal performance level across the HFA and TD participant groups is that the TD group’s accuracy levels were surprisingly low. This binary forced-choice task of basic lexical stress patterns should not have presented a great obstacle for TD children and adolescents, especially since the mean age for both groups (12:7 for the TD group and 12:4 for the HFA group) was above the age of lexical prosody perception competence established by Cruttenden (1974) and Atkinson-King (1973). However, the age range of our participants was relatively large (7:6-17 in the HFA group, 7:6-18 in the TD group), including children who were well below the age of competence for this type of task. The relatively small size of our groups and specific age distributions did not allow for statistical comparison of age-based subgroups, but it is possible that the younger members of the TD group account for the surprisingly low accuracy levels in this task. An older group of TD individuals (age 12:8-32 years) who participated in the study by Plesa Skwerer et al. (2007) using the same stimuli achieved accuracy scores of 95% on first syllable stress items and 80% on second syllable stress items, indicating that age might have been the biggest contributing factor to the poor accuracy results of our TD group. Further investigations using this type of methodology are required to definitively ascertain whether the low accuracy scores among the TD group were a function of age distribution.
Our results agree with prior data showing that individuals with ASD have preserved competence recalling stressed words (Fine et al., 1991) and are able to dissociate nouns from homophone verbs, which are distinguished only through first- or second-syllable lexical stress (REcall vs. reCALL; Paul, Augustyn, et al., 2005). Overall, both groups are equally able to distinguish the two types of lexical stress, and, as expected, both groups are more accurate at recognizing first-syllable stress patterns, which is the prevalent and primary stress pattern of American English.
Experiment 3: Production of Lexical Prosody
The task used in this experiment was designed to expand on past research of prosody production in individuals with ASD by using objective acoustic measures of lexical stress (pitch, intensity, and duration) rather than a subjective rating. By using these measures, we hoped to discern whether the often perceived inappropriate prosody of individuals with autism (Baltaxe, 1984; Baltaxe & Guthrie, 1987; Fine et al., 1991; Shriberg et al., 2001; Paul, Shriberg, et al., 2005) could be captured in a more objective manner for elicited lexical stress items. The task was designed to test the ability of children and adolescents with HFA to produce lexical stress to disambiguate compound nouns from homophone noun phrases, drawing on the same set of stimuli used in Experiment 2.
Method
Participants
Participants were the same groups of children and adolescents as in the previous tasks.
Materials
We created 22 two-sentence narratives to elicit the two possible meanings of 11 ambiguous word pairs. Each contextual framework was composed of a short (5-9 word) sentence to determine the context, followed by a longer (11-14 word) sentence that concluded with a picture illustrating the target word, paired with the target word in writing, e.g. “Kate calls Tom on his cell phone. When Tom doesn’t answer, Kate wishes he would (pick up),” paired with an illustration of Tom picking up the phone (see full list of stimuli in Appendix C). We kept the same sentence-final position for all stimuli to maintain consistency across target word context. A sentence-final position also ensured that participants obtained all contextual information relevant for defining the target’s stress assignment prior to having to produce it, which made the elicited word/noun phrase more naturally embedded within the sentence. We also created contextual sentences for three foil compound nouns with first-syllable stress (e.g. “DOORmat”) and two noun phrases with second-syllable stress (e.g. “small FISH”). The ambiguous words and foils were a subset of the ones used in Experiment 2, with the addition of two last-syllable stress foils which were used to elicit this less common version of lexical stress. The pictures used to elicit the target words were either taken with a digital camera or obtained through the Internet and were intended to look realistic and minimize apparent similarities between the tasks. Several versions of each contextual framework were recorded by a native English speaker. Two independent judges listened to the sentences and selected the most natural sounding example of each to be used in the task. Two different pseudo-randomized and counterbalanced sequences of the sentences were created and randomized across groups.
Procedure
Participants heard each of the contextual sentences via standard PC speakers attached to a portable CD player. The pictures were presented to the participants in a notebook that was placed in front of them. Before beginning the task all participants were told that they were going to hear a story about Tom and Kate and their summer vacation and that they were to say the missing word(s), which were illustrated and written in the notebook in front of them. They were then shown a picture of a practice word (snowman) and asked to say it into the microphone to ensure that they could be recorded clearly. The participants heard the 27 sentences (11 ambiguous word sentences with first-syllable stress, 11 ambiguous word sentences with last-syllable stress, and 5 foil sentences) with a pause after each sentence to allow them to say the missing word and turn to the page showing the next picture and word.
Participants completed the receptive (Experiment 2) and expressive tasks in two separate testing sessions that were divided by approximately 1.5 hours of other, non-related, activities, with Experiment 2 always preceding Experiment 3.
Results
To determine the accuracy of differential lexical stress production for each participant, we conducted analyses of mean pitch and intensity using PRAAT software to determine whether there were quantifiable group differences in the manner in which stress was assigned and produced. We also measured the whole word duration of every utterance using PRAAT. We chose this measure because longer whole word duration is indicative of a last-syllable stress pattern whereas shorter duration is indicative of a first-syllable stress pattern (Smith & Robb, 2005; Wang, Kent, Duffy, & Thomas, 2005), thereby providing us with a good measure of whether participants were able to produce the targeted stress patterns accurately. We established reliability for the coding of whole word duration by having 15% of the productions rated by a second, independent coder. The whole word duration determined by the second coder was then subtracted from the original measurement of the child’s production and the mean absolute difference between the two measurements was calculated. Reliability was defined as a mean absolute difference in length of less than 5 ms. The mean absolute inter-coder difference in whole-word duration for 15% of the total sample was 4 ms, thereby meeting the criterion for reliability.
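The reliability criterion described above amounts to a mean absolute difference between two coders' duration measurements. A minimal sketch, using made-up durations in milliseconds (the function names and data are hypothetical and do not come from the study):

```python
def mean_absolute_difference(coder_a_ms, coder_b_ms):
    """Mean absolute difference between two coders' duration measurements (ms)."""
    if not coder_a_ms or len(coder_a_ms) != len(coder_b_ms):
        raise ValueError("both coders must measure the same non-empty set of items")
    total = sum(abs(a - b) for a, b in zip(coder_a_ms, coder_b_ms))
    return total / len(coder_a_ms)


def meets_reliability(coder_a_ms, coder_b_ms, criterion_ms=5.0):
    """Reliability criterion: mean absolute difference below the threshold."""
    return mean_absolute_difference(coder_a_ms, coder_b_ms) < criterion_ms


# Hypothetical whole-word durations for the double-coded subset.
first_coder = [812, 655, 940, 701]
second_coder = [808, 660, 943, 697]

mad_ms = mean_absolute_difference(first_coder, second_coder)
reliable = meets_reliability(first_coder, second_coder)
```

Taking absolute values before averaging keeps over- and under-estimates by the second coder from cancelling each other out.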
The sound quality of recordings for five participants with HFA and six TD participants contained too much static to reliably analyze for pitch and intensity, so we conducted a Oneway ANOVA (data were normally distributed) for mean pitch and intensity on the productions of eleven participants with HFA and nine TD controls. There were no significant group differences for intensity on first-syllable stress (F (1,18) = 2.56, p = .13), or second-syllable stress items (F (1,18) = 2.44, p = .14) and no significant group differences for pitch on first-syllable stress (F (1,18) = .54, p = .51), or second-syllable stress items (F (1,18) = .27, p = .61) showing that the mean intensity and pitch levels of the first- and second-syllable stressed productions were comparable across groups. However, we need to point out that these results are based on a very small subset of our overall data and that technical difficulties with our digital recordings may have influenced the ability of PRAAT to extract accurate pitch and intensity values from these productions.
Mean word length results for each group are shown in Table 4. We conducted a 2 (group) × 2 (stress) ANOVA (data were normally distributed) with mean word duration as the dependent variable. The analysis shows a significant main effect for stress F (1, 29) = 33.04, p < .001, partial η2 = .53 with the first-syllable stress words produced significantly shorter than the last-syllable stress items in both groups (t (15) = 3.5, p = .003, Cohen’s d = −.9 for the HFA group and t (14) = 6.0, p < .001, Cohen’s d = −.8 for the TD group, based on Paired Samples t-tests). There was also a significant group effect, F (1, 29) = 5.62, p = .024, partial η2 = .16 with participants in the HFA group having longer overall productions than participants in the control group (F (1,29) = 5.1, p = .032, Cohen’s d = .82 for first-syllable stress items and F (1,29) = 4.4, p = .044, Cohen’s d = .75 for last-syllable stress items, based on a Oneway ANOVA). This difference in overall utterance length is apparent despite the fact that the HFA group followed the correct pattern of American English by producing shorter first-syllable stressed words and longer last-syllable stressed items. There was no stress by group interaction (F (1, 29) = .072, p = .79, partial η2 = .002).
Table 4.
Stress pattern | HFA (n=16) M(SD) | TD (n=15) M(SD) |
---|---|---|
First-syllable stress | .82(.15) | .68(.19) |
Last-syllable stress | .98(.19) | .83(.21) |
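The between-group effect sizes can be approximately recovered from the Table 4 summary values using the pooled-standard-deviation form of Cohen's d. This sketch assumes that the standard deviation for the HFA first-syllable cell is .15 and that durations are in seconds; both are inferences from the surrounding values rather than something the table states explicitly. It may differ slightly from the authors' computation because the table entries are rounded.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for the difference between two group means."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


# Table 4 values; the HFA first-syllable SD of 0.15 is an assumption (see above).
d_first = cohens_d(0.82, 0.15, 16, 0.68, 0.19, 15)
d_last = cohens_d(0.98, 0.19, 16, 0.83, 0.21, 15)

print(f"first-syllable stress: d = {d_first:.2f}")
print(f"last-syllable stress:  d = {d_last:.2f}")
```

Under these assumptions the sketch gives values of roughly .82 and .75, in line with the effect sizes reported for the group comparison.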
Discussion
The lexical prosody production task revealed two interesting results, one contradicting and one confirming our hypotheses. Contrary to our expectation, children and adolescents with HFA were able to appropriately disambiguate word pairs through differentiated production of first-syllable and last-syllable stress patterns. However, as predicted, all of their productions were significantly longer than those of their TD peers, despite showing no significant group differences in pitch or intensity allocation for the two different syllable stress elicitations.
Listening to individual productions by adolescents with HFA, we noticed that participants often produced exaggerated pauses between the syllables, especially for second-syllable stress items. The labored and slow enunciations of the HFA group stood in stark contrast to the recordings of the control group, who produced effortless enunciations that subjectively appeared briefer, less labored, and more fluid in their transitions between syllables. It is possible that the participants with HFA were able to use their largely preserved grammatical skills as a compensatory strategy in this task, creating the grammatical distinction between these two stimulus types (compound noun “PICKup” vs. noun-phrase “pick UP”) through purposeful pausing between the nouns of the two-word phrase, rather than through the production of the corresponding prosodic contours. It is important to point out, however, that the HFA group’s enunciations were longer than those of the TD group for both first- and second-syllable stress items. This indicates that the abnormal prosodic lengthening we recorded cannot be explained solely by the HFA group’s attempt to emphasize the grammatical class of two-word phrases through abnormally long pauses between the nouns, but must also reflect a more generalized expressive prosody deficit, specifically in the rhythm/timing domain. Overall, these results show quantifiable differences in prosody productions between participants with HFA and their TD peers and confirm that whole-word duration is an appropriate measure for capturing this group difference, since syllable-duration would not have been sensitive to a pause between syllables. This quantifiable duration difference may help explain the prosody production deficits found in previous studies (Shriberg et al., 2001; Paul et al., 2005; McCann et al., 2007; Peppé et al., 2007).
The technically accurate production of lexical stress in the HFA group may be explained by the narrowly defined task presented in this study, which only asked for the production of two specific stress patterns in two-syllable word pairs. Wang, Lee, Sigman, and Dipretto (2007) showed that overtly asking participants with ASD to attend to specific aspects of prosody enhanced their performance on a neurophysiology level. This indicates that task instructions can have a significant effect on the performance of this group on tasks of prosody. It is possible that our very narrowly defined task design and specific instructions helped participants with HFA produce accurate lexical stress distinctions. It is also possible that the competence level shown by the children and adolescents with HFA in the lexical stress perception task (Experiment 2) informed their productions of lexical stress for the same or at least similar stimuli.
It is important to reiterate, however, that despite being able to produce two appropriately differentiated stress patterns, the prosody productions of the children and adolescents with HFA were nevertheless acoustically, and thereby objectively, different from the productions of their typically developing peers. This duration difference was consistent across first syllable and last syllable stressed items and amounted to roughly 100 milliseconds. Although this difference may appear small, it is significant in the context of mean word productions being only 750 milliseconds long for first syllable stressed words and 900 milliseconds for second syllable stressed items. The HFA group lengthened these productions by 13% and 11% respectively, constituting a significant change in production and resulting in a significant group difference.
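The percentages quoted above follow from simple arithmetic on the approximate figures in the text. The sketch below reproduces that arithmetic and, for comparison, recomputes the same quantities from the rounded group means in Table 4. The unweighted averaging across groups and the function names are assumptions made for illustration.

```python
def percent_lengthening(extra_ms, baseline_ms):
    """Extra duration expressed as a percentage of a baseline duration."""
    return 100.0 * extra_ms / baseline_ms


# Figures quoted in the text: roughly 100 ms on 750 ms and 900 ms words.
first_pct = percent_lengthening(100, 750)
last_pct = percent_lengthening(100, 900)

# Group means in seconds, as listed in Table 4.
TABLE_4_MEANS_S = {
    "first": {"HFA": 0.82, "TD": 0.68},
    "last": {"HFA": 0.98, "TD": 0.83},
}


def mean_across_groups_ms(stress):
    """Unweighted average of the two group means, in milliseconds."""
    means = TABLE_4_MEANS_S[stress]
    return 1000.0 * (means["HFA"] + means["TD"]) / 2


def group_gap_ms(stress):
    """HFA mean minus TD mean, in milliseconds."""
    means = TABLE_4_MEANS_S[stress]
    return 1000.0 * (means["HFA"] - means["TD"])


print(f"first-syllable stress: {first_pct:.1f}% longer")
print(f"last-syllable stress:  {last_pct:.1f}% longer")
```

The averaged Table 4 means come out at about 750 ms and 905 ms, matching the word durations quoted in the text. The raw gaps between the rounded group means are nearer 140 to 150 ms, so the 100 ms figure and the derived percentages are best read as approximate.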
There is evidence to suggest a developmental component to sentence duration in the productions of very young children, with preschoolers producing slower speech, which may also be more varied in speed, than school-aged children. However, this variability in articulatory lengthening appears to stabilize and developmentally plateau in children over the age of six (Smith & Zelaznik, 2004; Kent & Forner, 1980), thereby making it unlikely that the production lengthening of the HFA group was caused by this aspect of developmental speech rate.
The interesting finding in Experiment 3 is that the participants with HFA were accurate in their productions of lexical stress, but still quantifiably different from their TD peers. These results are supported by evidence from other analyses of non-verbal language use in ASD. Grossman et al. (2008) showed that adolescents with ASD were able to produce prosodic and facial expression modulations that effectively communicated affect, but were still judged to be awkward in their execution. Similarly, in the present study, participants with HFA successfully distinguished between items requiring first syllable and second syllable stress, but both objective acoustic analyses and informal subjective listening were able to perceive the overall lengthening and atypical mid-word pausing of the productions as a distinct group difference. Further studies using more refined acoustic measures, as well as subjective coding of naturalness of production are required to better distinguish between the ability to produce a distinction between different verbal constructs through the accurate use of lexical stress, and the ability to produce overall prosody that is perceived and measured to be within the normal range of expectation for natural conversation.
General Discussion and Conclusion
The aim of our study was to enhance the current knowledge base of prosody competence in high-functioning, linguistically able individuals with autism by investigating the perception and production of lexical stress and the ability to categorize affective prosody in sentence-length stimuli. Our findings indicate that children and adolescents with HFA were equal to their TD peers in the ability to perceive affective prosody in sentences filtered to eliminate semantic content. However, we found surprisingly poor results in the TD group for this task that may explain the lack of a significant group difference. Both participant groups showed the same pattern of being more successful at categorizing sentences containing sad or happy affect than neutral sentences, indicating that their approaches to affective prosody interpretation at the sentence level are similar. The relatively simplistic nature of the task (forced choice, three options only) and the selection of two basic emotions (happy and sad) as stimuli cannot be overlooked. The results clearly indicate that within the confines of such a task and using only basic emotions, adolescents with HFA are not significantly different in their accuracy and error types from their TD peers, but we cannot say that this ability would necessarily translate to more complex emotions or a more open-ended task. Nevertheless, the filtered affect task (Experiment 1) does show that individuals with HFA are able to use sentence-level prosodic cues in isolation, without any semantic content, to determine basic emotions of a speaker. These data clearly indicate that individuals with HFA have at least basic affective prosody interpretation abilities.
The target group was also equal to their typically developing peers in the ability to perceptually disambiguate compound nouns from noun-phrases using only prosodic cues, suggesting a preserved ability to decode grammatical prosody in children and adolescents with HFA, again within the confines of a binary forced choice task. Our results also suggest that children and adolescents with HFA were able to accurately produce stress patterns to disambiguate compound nouns from noun phrases in the context of a sentence completion task, despite abnormally long word productions. These findings on lexical stress suggest that children and adolescents with HFA can use the grammatical and prosodic rules underlying lexical stress to differentiate between compound nouns and noun-phrases as well as their TD peers. The only group difference in this set of experiments was found in the significantly longer whole-word duration of lexical stress productions by the HFA group in Experiment 3. It is important to emphasize that their productions accurately differentiated the two stress patterns, showing a basic awareness of lexical stress rules and their expressive applications. The distinction between the two groups is therefore not tied to an inability of the participants with HFA to perform the lexical prosody production task, but rather to their unique speech patterns captured by quantifiable differences in acoustic measurements of length and a subjectively noticeable awkwardness. Overall, the participants with HFA demonstrated the same levels of competence for all three tasks in both modalities as their TD peers, and were different only in the overall length of their prosodic stress productions.
Although some studies have found impaired perception of prosodic stress in individuals with ASD, Shriberg et al. (2001) point out that most of these deficits were reported on the level of full sentences, such as the placement of stress on inappropriate words within a phrase or sentence, rather than the use of lexical stress within single words. This might explain why the participants with HFA in our study were able to correctly differentiate the meanings of two-syllable words and were also able to produce the appropriate first- and last-syllable stress patterns. Our data appear to be consistent with other studies that found deficits in stress production in participants with HFA when compared to their TD peers despite being able to correctly differentiate stress patterns perceptually (Foreman, 2002, Paul, Augustyn et al., 2005). McCann et al. (2007) described atypical prosody productions in a group of individuals with ASD that could not be explained by a developmental delay, but rather appeared to be symptomatic of a disorder encompassing prosodic skill. This interpretation captures the prosodic productions we recorded in Experiment 3 well. The children and adolescents with HFA did not show deficits, or immaturity, in producing differential stress patterns, but rather an overall atypical pattern of two-syllable word productions that was significantly different from the utterances of their TD peers.
In all three experiments participants had to rely on prosodic information to perform the task accurately. The filtered sentence affect task excluded all semantic information, leaving only prosody as the determining factor. The lexical stress ambiguous word pairs in the receptive task were differentiated exclusively through the perception of prosodic emphasis, which determined their grammatical class as a compound noun or noun-phrase. Accuracy in the lexical stress production task was achieved by correctly assigning prosodic stress to the first- or second syllable of the target, based on the interpretation of contextual cues. It has previously been suggested that individuals with ASD rely overly on verbal content and ignore prosodic cues (Lindner & Rosén, 2006). Our data show that children and adolescents with high functioning autism are able to apply basic knowledge of lexical and affective prosody when it is the main focus of their attention. These results are based on a relatively small sample size and further work comparing different methodological approaches to prosody perception, such as contrasting elicited vs. spontaneous productions, varying the specificity of task instructions, and using more refined objective measures of prosody production would be helpful in determining the underlying prosodic abilities of individuals with autism.
Acknowledgements
The authors wish to thank Chris Connolly, Karen Condouris, Danielle Delosh, and especially Margaret Kjelgaard, for their assistance in stimulus creation, task administration, and data analysis. We also thank the children and families who gave their time to participate in this study.
Funding was provided by NAAR, NIDCD (U19 DC03610; Tager-Flusberg, PI), which is part of the NICHD/NIDCD Collaborative Programs of Excellence in Autism, and by grant M01-RR00533 from the General Clinical Research Ctr. program of the National Center for Research Resources, National Institutes of Health.
Appendix A
List of stimulus sentences for affective discrimination task
Happy
When Mike pets the puppy, it’s wagging its tail.
When the kids invite Mike, he’s excited.
Whenever Sue calls Mike, he’s glad.
When Sue hugs the cat, it’s purring.
When Sue baby sits the kids, they’re cheering.
Whenever the teacher praises Sue, she’s thrilled.
Sad
When Mike hits the puppy, it’s whining.
When Mike sees his mother, she’s crying.
When nobody visits Sue, she’s lonely.
When Sue leaves her car, it’s howling.
When the kids tease Sue, she’s upset.
If Mike doesn’t call Sue, she’s unhappy.
Neutral
When Mike pets the puppy, it’s sleeping.
When Mike calls his mother, she’s in the kitchen.
When Mike buys the candy, it’s chocolate.
Whenever Sue draws a cat, it’s grey.
When Sue leaves the class, it’s noisy.
When Sue closes the store, it’s dark.
Appendix B
Word list for the Lexical Ambiguity Reception Task
Ambiguous stimuli | Foils |
---|---|
Blackboard | Bluebird |
Bulls-eye | Bookcase |
Greenhouse | Doormat |
Highlight | Driveway |
Holdup | Headphones |
Hotdog | Mailbox |
Makeup | Soft drink |
Pickup | Top shelf |
Takeoff | Tree house |
Top hat | T-shirt |
Wetsuit | Woodpile |
Appendix C
List of sentences to elicit utterances for lexical ambiguity production task. Words in parentheses are the correct utterance to be produced.
Tom helps David build a birdhouse. David tells him: “I need three nails and a (black board)”
Tom and Kate visit an old schoolhouse. In the school there are desks and a (blackboard)
Tom and Kate go to a farm. One cow comes so close to them, they can see right into the (bull’s eye)
Kate is learning to use a bow and arrow. When she takes lessons, the teacher tells her to aim for the (bulls-eye)
Tom has sand on his shoes. When he comes home Kate tells him to wipe his feet on the (doormat)
Tom likes learning about plants, so when Kate goes out shopping he decides to visit a (greenhouse)
Tom and Kate will go visit their friend David. David lives on top of a hill in a (green house)
Kate wants to mark her guidebook. Tom gives her a marker and tells her: “Use it to (highlight)”
Kate is walking down Main Street. She looks up at the tall streetlight and says: “That is a (high light)”
David wants to meet Kate on a beach. So that Kate will find him he writes her name on a sign to (hold up)
Tom and Kate go to a movie. In the movie a man says: “Stop where you are. This is a (holdup)”
Kate takes her dog for a walk, so when she comes back he is a (hot dog)
Tom and Kate go to a restaurant. Kate is really hungry, so she orders a hamburger and a (hotdog)
Kate has written some postcards for her friends. She puts a stamp on each card and takes them to a (mailbox)
Kate is going to a movie. Before she leaves she puts on her ring and her (makeup)
Tom is angry that Kate broke his souvenir. Later Kate tells Tom that she’s sorry and they (make up)
Kate calls Tom on his cell phone. When Tom doesn’t answer, Kate wishes he would (pick up)
Tom and Kate rent a truck. While they are driving, Tom says: “Wow, I like this (pickup)”
Kate goes to the aquarium. She says to Tom: “Look at this (small fish)”
Tom wants to send a gift to his mother. He wraps the gift in paper and puts it in a (square box)
Tom and Kate board the airplane. The pilot says: “Fasten your seatbelts, it’s time for (takeoff)”
Because it is hot outside, Tom is glad he wore a jacket that he can (take off)
Kate wants to buy something for Tom. She sees a pile of things on a high shelf and tells the salesperson she wants the (top hat)
Tom and Kate go to the opera. Tom decides that he will wear a suit and a (tophat)
Tom buys an ice cream cone. On the way home he drops his ice cream on his (t-shirt)
Tom decides he wants to go diving. So he puts on his diving mask and his (wetsuit)
Tom didn’t bring his umbrella. When it starts raining, he comes home with a (wet suit)
Bibliography
- Arciuli J, Cupples L. The processing of lexical stress during visual word recognition: Typicality effects and orthographic correlates. The Quarterly Journal of Experimental Psychology. 2006;59(5):920–948. doi: 10.1080/02724980443000782.
- Atkinson-King. Children’s acquisition of phonological stress contrasts. UCLA; 1973. Unpublished doctoral dissertation.
- Bachorowski J-A, Owren MJ. Vocal expression of emotion: Acoustic properties of speech are associated with emotional intensity and context. Psychological Science. 1995;6:219–224.
- Baltaxe CAM. Use of contrastive stress in normal, aphasic, and autistic children. Journal of Speech and Hearing Research. 1984;27:97–105. doi: 10.1044/jshr.2701.97.
- Baltaxe CAM, Guthrie D. The use of primary sentence stress by normal, aphasic and autistic children. Journal of Autism and Developmental Disorders. 1987;17:255–271. doi: 10.1007/BF01495060.
- Baltaxe C, Simmons J. Prosodic development in normal and autistic children. In: Schopler E, Mesibov G, editors. Communication Problems in Autism. Plenum; New York: 1985. pp. 95–125.
- Banse R, Scherer KR. Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology. 1996;70(3):614–636. doi: 10.1037//0022-3514.70.3.614.
- Boersma P, Weenink D. PRAAT: Doing phonetics by computer (Version 4.3.14). 2005.
- Boucher J, Lewis V, Collis GM. Voice processing abilities in children with autism, children with specific language impairments, and young typically developing children. Journal of Child Psychology and Psychiatry. 2000;41:847–857.
- Cosmides L. Invariances in the acoustic expression of emotion during speech. Journal of Experimental Psychology: Human Perception and Performance. 1983;9:864–881. doi: 10.1037//0096-1523.9.6.864.
- Cruttenden A. An experiment involving comprehension in children from 7-10. Journal of Child Language. 1974;1:221–31.
- Cutler A, Swinney D. Prosody and the development of comprehension. Journal of Child Language. 1987;14:145–167. doi: 10.1017/s0305000900012782.
- Dunn LM, Dunn LM. Peabody picture vocabulary test revised. American Guidance Service; Circle Pines, MN: 1981.
- Fay W, Schuler AL. Emerging language in autistic children. University Park Press; Baltimore: 1980.
- Fine J, Bartolucci G, Ginsberg G, Szatmari P. The use of intonation to communicate in pervasive developmental disorders. Journal of Child Psychology and Psychiatry. 1991;32:771–782. doi: 10.1111/j.1469-7610.1991.tb01901.x.
- Fosnot SM, Jun S. Prosodic characteristics in children with stuttering or autism during reading and imitation. Proceedings of the 14th International Congress of Phonetic Sciences. 1999:103–115.
- Gerken L, McGregor KK. An overview of prosody and its role in normal and disordered child language. American Journal of Speech-Language Pathology. 1998;7:38–48.
- Golan O, Baron-Cohen S, Hill JJ, Rutherford MD. The ‘Reading the Mind in the Voice’ Test-Revised: a study of complex emotion recognition in adults with and without autism spectrum conditions. Journal of Autism and Developmental Disorders. 2007;37(6):1096–1106. doi: 10.1007/s10803-006-0252-5.
- Grossman RB, Edelson L, Rubinstein LB, Lomibao J, Borawski S, Tager-Flusberg H. Production of Emotional Prosody and Facial Expressions in Adolescents with Autism; Poster presentation at the International Meeting for Autism Research; London, England. 2008; May, 2008. [Google Scholar]
- Jusczyk PW, Cutler A, Redanz N. Infants’ preference for the predominant stress patterns of English words. Child Development. 1993;64:675–687. [PubMed] [Google Scholar]
- Kanner L. Autistic disturbances of affective contact. Nerv Child. 1943;2:217–50. Acta Paedopsychiatr. 1968;35(4):100–36. Reprint.
- Kaufman A, Kaufman N. Manual for the Kaufman Brief Intelligence Test. Second Edition American Guidance Service; Circle Pines, MN: 2004. [Google Scholar]
- Kent RD, Forner LL. Speech segment durations in sentence recitations by children and adults. Journal of Phonetics. 1980;8:157–168. [Google Scholar]
- Klatt DH. Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. Journal of the Acoustical Society of America. 1976;59:1208–1221. doi: 10.1121/1.380986. [DOI] [PubMed] [Google Scholar]
- Lieberman P. Some acoustic correlates of word stress in American English. Journal of the Acoustical Society of Ameica. 1960;32:451–454. [Google Scholar]
- Lindner JL, Rosén LA. Decoding of Emotion through Facial Expression, Prosody and Verbal Content in Children and Adolescents with Asperger’s Syndrome. Journal of Autism and Developmental Disorders. 2006;36:769–777. doi: 10.1007/s10803-006-0105-2. [DOI] [PubMed] [Google Scholar]
- Lord C, Risi S, Lambrecht L, Cook EH, Jr, Leventhal BL, DiLavore PC, Pickles A, Rutter M. The Autism Diagnostic Schedule – Generic: A standard measures of social and communication deficits associated with the spectrum of autism. Journal of Autism and Developmental Disorders. 2000;30(3):205–223. [PubMed] [Google Scholar]
- Lord C, Rutter M, DiLavore PC, Risi S. Autism Diagnostic Observation Schedule – WPS (ADOS-WPS) Western Psychological Services; Los Angeles, CA: 1999. [Google Scholar]
- Lord C, Rutter M, Le Couteur A. Autism Diagnostic Interview-Revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. Journal of Autism and Developmental Disorders. 1994;24(5):659–85. doi: 10.1007/BF02172145. [DOI] [PubMed] [Google Scholar]
- McCaleb P, Prizant B. Encoding of new versus old information by autistic children. Journal of Speech and Hearing Disorders. 1985;50:226–230. doi: 10.1044/jshd.5003.230. [DOI] [PubMed] [Google Scholar]
- McCann J, Peppé S, Gibbon FE, O’Hare A, Rutherford M. Prosody and its relationship to language in school-aged children with high-functioning autism. International Journal of Language and Communication Disorders. 2007;42(6):682–702. doi: 10.1080/13682820601170102. [DOI] [PubMed] [Google Scholar]
- Mehler J, Jusczyk PW, Lambertz G, Halsted N, Bertoncini J, Amiel-Tison C. A precursor of language acquisition in young infants. Cognition. 1988;29:144–178. doi: 10.1016/0010-0277(88)90035-2. [DOI] [PubMed] [Google Scholar]
- Merewether FC, Alpern M. The components and neuroanatomic bases of prosody. Journal of Communication Disorders. 1990;23(4-5):325–336. doi: 10.1016/0021-9924(90)90007-l. [DOI] [PubMed] [Google Scholar]
- Murray, Arnott Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. Journal of the Acoustical Society of America. 1993;93(2):1097–1108. doi: 10.1121/1.405558. [DOI] [PubMed] [Google Scholar]
- Peppé S, McCann J. Assessing intonation and prosody in children with atypical language development: the PEPS-C test and the revised version. Clinical Linguistics and Phonetics. 2003;17:345–354. doi: 10.1080/0269920031000079994. [DOI] [PubMed] [Google Scholar]
- Peppé S, McCann J, Gibbon F, O’Hare A, Rutherford M. Assessing prosodic and pragmatic ability in children with high-functioning autism. Journal of Pragmatics. 2006;38:1776–1792. [Google Scholar]
- Peppé S, McCann J, Gibbon F, O’Hare A, Rutherford M. Receptive and expressive prosodic ability in children with high-functioning autism. Journal of Speech, Language, and Hearing Research. 2007;50:1015. doi: 10.1044/1092-4388(2007/071). 1-28. [DOI] [PubMed] [Google Scholar]
- Paul R, Augustyn A, Klin A, Volkmar FR. Perception and production of prosody by speakers with autism spectrum disorders. Journal of Autism and Developmental Disorders. 2005;35:205–220. doi: 10.1007/s10803-004-1999-1. [DOI] [PubMed] [Google Scholar]
- Paul R, Shriberg LD, McSweeny J, Cicchetti D, Klin A, Volkmar F. Brief report: Relations between prosodic performance and communication and socialization ratings in high functioning speakers with autism spectrum disorders. Journal of Autism and Developmental Disorders. 2005;35(6):861–869. doi: 10.1007/s10803-005-0031-8. [DOI] [PubMed] [Google Scholar]
- Paul R, Bianchi N, Augustyn A, Klin A, Volkmar F. Production of syllable stress in speakers with autism spectrum disorders. Research in Autism Spectrum Disorders. 2007;2:110–124. doi: 10.1016/j.rasd.2007.04.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Plesa Skwerer D, Schofield C, Verbalis A, Faja S, Tager-Flusberg H. Receptive prosody in adolescents and adults with Williams Syndrome. Language and Cognitive Processes. 2007;22:247–271. [Google Scholar]
- Rapin I, Dunn M. Language disorders in children with autism. Seminars in Pediatric Neurology. 1997;4(2):86–92. doi: 10.1016/s1071-9091(97)80024-1. [DOI] [PubMed] [Google Scholar]
- Shriberg LD, Kwiatkowski J, Rasmussen C. The Prosody-Voice Screening Profile. Communication Skill Builders; Tucson, AZ: 1990. [Google Scholar]
- Shriberg LD, Paul R, McSweeny JL, Klin A, Cohen DJ, Volkmar FR. Speech and prosody characteristics of adolescents and adults with high-functioning autism and Asperger syndrome. Journal of Speech, Language, and Hearing Research. 2001;44:1097–1115. doi: 10.1044/1092-4388(2001/087). [DOI] [PubMed] [Google Scholar]
- Smith A, Robb M. Durational Characteristics of Newly-Learned Trochees and Iambs in Children with and without Speech Delays. Clinical Linguistics & Phonetics. 2005;19:1–14. doi: 10.1080/0269920042000193580. [DOI] [PubMed] [Google Scholar]
- Smith A, Zelaznik HN. Development of functional synergies for speech motor coordination in childhood and adolescence. Developmental Psychobiology. 2004;45:22–33. doi: 10.1002/dev.20009. [DOI] [PubMed] [Google Scholar]
- Sparrow S, Balla D, Cicchetti D. The Vineland Adaptive Behavior Sclaes (Survey Form) American Guidance Services; Circle Pinnes, MN: 1984. [Google Scholar]
- Vogel I, Raimy E. The acquisition of compound vs. phrasal stress: The role of prosodic constituents. Journal of Child Language. 2002;29:225–250. doi: 10.1017/s0305000902005020. [DOI] [PubMed] [Google Scholar]
- Wang AT, Lee SS, Sigman M, Dapretto M. Reading affect in the face and voice. Archives of General Psychiatry. 2007;46:698–708. doi: 10.1001/archpsyc.64.6.698. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang Y-T, Kent RD, Duffy JR, Thomas JE. Dysarthria in traumatic brain injury: A breath group and intonational analysis. Folio Phoniatrica et Logopaedica. 2005;57(2):59–89. doi: 10.1159/000083569. [DOI] [PubMed] [Google Scholar]