Abstract
Purpose
Accessing auditory and written material simultaneously benefits people with aphasia; however, the extent of benefit as well as people's preferences and experiences may vary given different auditory presentation rates. This study's purpose was to determine how 3 text-to-speech rates affect comprehension when adults with aphasia access newspaper articles through combined modalities. Secondary aims included exploring time spent reviewing written texts after speech output cessation, rate preference, preference consistency, and participant rationales for preferences.
Method
Twenty-five adults with aphasia read and listened to passages presented at slow (113 words per minute [wpm]), medium (154 wpm), and fast (200 wpm) rates. Participants answered comprehension questions, selected most and least preferred rates following the 1st and 3rd experimental sessions and after receiving performance feedback, and explained rate preferences and reading and listening strategies.
Results
Comprehension accuracy did not vary significantly across presentation rates, but reviewing time after cessation of auditory content did. Visual data inspection revealed that, in particular, participants with substantial extra reviewing time took longer given fast than medium or slow presentation. Regardless of exposure amount or receipt of performance feedback, participants most preferred the medium rate and least preferred the fast rate; rationales centered on reading and listening synchronization, benefits to comprehension, and perceived normality of speaking rate.
Conclusion
As a group, people with aphasia most preferred and were most efficient given a text-to-speech rate around 150 wpm when processing dual modality content; individual differences existed, however, and mandate attention to personal preferences and processing strengths.
Many individuals with aphasia experience chronic auditory and written processing impairments negatively affecting quality of life. A promising method of minimizing such challenges involves simultaneous presentation of auditory and written content through text-to-speech (TTS) technology (Knollman-Porter et al., 2019; Wallace, Knollman-Porter, Brown, & Hux, 2019). Using TTS systems with computer-generated speech output allows adults with aphasia to access digital texts associated with functional activities of interest such as e-mails, newspaper articles, recipes, and newsletters. Some people regularly use this technology to perform desired activities independently despite persistent aphasia-based reading challenges (Knollman-Porter, Wallace, Hux, Brown, & Long, 2015). TTS systems also have the advantage of affording methods of modifying multiple presentation features (e.g., voice selection, speech rate, highlighting) to match individual preferences and needs; however, researchers have yet to explore systematically the effects on comprehension of varying the majority of these features. Given the potential advantages of TTS systems for people with aphasia and the ever-increasing presence of technology as a means of accessing content in today's society, understanding the comprehension benefits associated with modifying technology-based presentation features is an important clinical concern. However, the extent to which varying TTS characteristics improves or hinders comprehension provides only a portion of the information clinicians need about using TTS systems as supports. Two other critical issues are (a) understanding individual preferences regarding system characteristics and settings and (b) reading and listening behaviors people with aphasia report using given dual modality content.
Rate of Speech Effects on Auditory Comprehension
Speech presentation rate is one modifiable TTS feature that warrants investigation. For people who lack proficient comprehension—such as second language learners and people with aphasia—slowing speech rate can provide a substantial advantage (King & East, 2011). This is particularly important when only one opportunity exists for hearing an utterance, as is typical during conversational interactions. To facilitate comprehension and reduce the need to repeat utterances, a common clinical practice is to recommend speakers slow their speech rate when interacting with a person with aphasia (American Speech-Language-Hearing Association, n.d.; National Aphasia Association, n.d.).
Evidence dating back to the 1970s and 1980s supports the practice of using a slow speaking rate when interacting with people with aphasia, especially when relatively severe comprehension problems exist. For example, Cermak and Moreines (1976) found that people with aphasia identified words more accurately when stimulus list presentation slowed from one word per second to one word every 3 s. At the sentence level, Blumstein, Katz, Goodglass, Shrier, and Dworetsky (1985) documented significantly improved comprehension by people with Wernicke's aphasia when listening to utterances read with pauses at salient syntactic boundaries (e.g., between noun and verb phrases) rather than ones read at “a ‘normal’ speaking rate” (p. 253) of 180 words per minute (wpm). Similarly, at the paragraph level, Pashek and Brookshire (1982) reported better paragraph comprehension by people with aphasia when presentation rate was 120 wpm rather than 150 wpm. Finally, Nicholas and Brookshire (1986) found that people with relatively poor comprehension after acquiring aphasia benefitted significantly from slowed presentation rate when processing both main ideas and details, although repeated exposure to the speech negated this effect.
Dual Modality Processing Behaviors and Speed
Having access to written texts at the same time as hearing the same content may mediate some aphasia-related auditory comprehension problems (Knollman-Porter et al., 2019; Wallace et al., 2019). However, people with aphasia typically exhibit reading and auditory comprehension problems that stem from word decoding, semantic recognition or association, and syntactic processing challenges (Cherney, 2004; DeDe, 2013; Leff & Behrmann, 2008). In combination, these reading difficulties contribute to impaired comprehension and slow reading speed (Webster, Morris, Howard, & Garraffa, 2018). The result is that reliance solely on reading is not feasible for many people with aphasia. Even among people with aphasia who do not report persistent reading challenges, differences from neurotypical adults regarding comprehension accuracy and reading speed are common (Webster et al., 2018). People with aphasia report that reading takes considerably longer after aphasia onset than it did previously, and they may abandon attempts to read lengthy texts because of the time and effort required (Knollman-Porter et al., 2015).
A relation may exist between the time a person spends reading content and his or her comprehension of the material; spending substantial time processing written content may promote better comprehension than spending little time. However, at least for adults with traumatic brain injury or with concomitant aphasia and cognitive impairment, increased reading rate prompted via TTS presentation does not necessarily result in decreased comprehension accuracy (Harvey, Hux, Scott, & Snell, 2013; Harvey, Hux, & Snell, 2013). Insufficient research exists about this topic to draw firm conclusions, however, and that which does exist has included small samples or single cases of people with acquired brain injury from etiologies other than stroke; generalization to people with aphasia may not be appropriate.
Evidence about benefits afforded from simultaneous access to auditory and written content exists in extant research. Specifically, researchers have shown that people with aphasia process short newspaper articles faster and with comparable comprehension given both modalities rather than either modality in isolation (Knollman-Porter et al., 2019). In studies addressing this topic, researchers often select default settings for speech output that yield rates between 135 and 155 wpm (Harvey & Hux, 2015; Knollman-Porter et al., 2019; Wallace et al., 2019). This rate is slower than normal speaking rates occurring in conversational interactions (i.e., 210 wpm or between 250 and 260 syllables per minute; Robb, MacLagan, & Chen, 2004; Tauroza & Allison, 1990) but is consistent with the average rate recommended for auditory book presentation (Williams, 1998). Because of the consistent selection of this rate in extant research, aphasiologists do not know how changing it may affect comprehension when content appears simultaneously in written and auditory modalities. If people with aphasia can manage TTS content presented at faster rates without compromising comprehension, they may increase their efficiency when processing written content.
Another important consideration regarding dual modality processing is a possible discrepancy between the rates at which a person performs best when reading versus listening. Simultaneous auditory and written content presentation through a TTS system creates a situation in which speech output rate may be substantially faster or slower than a person's habitual reading speed or the speed he or she believes will maximize comprehension. Such a discrepancy may prompt a focus on one modality more than another. However, researchers have yet to evaluate the effects such a scenario may have on processing behaviors, even though this knowledge would help in developing strategies to maximize comprehension by people with aphasia.
Strategy Use, Behaviors, and Preference
Understanding how adults with aphasia interact with technology tools can inform development, personalization, and implementation of supports designed to aid comprehension. Individual preferences and recognition of benefits gained can play critical roles in the acceptance of technological supports or application of strategies to enhance processing. As is true for other populations and other assistive technology or augmentative and alternative communication devices (Fager, Hux, Beukelman, & Karantounis, 2006; Pampoulou, 2019), exploring the perceptions of people with aphasia regarding comprehension accuracy and the benefit accrued through specific supports is vital to facilitating acceptance and consistent use.
Soliciting self-evaluation and providing performance feedback when a person does or does not implement a support or strategy can help practitioners understand perceptions about its potential value. Specifically, relations between comprehension accuracy and a person's preferred method of accessing content warrant consideration. Importantly, however, Riensche, Wohlert, and Porch (1983) found that listening rate preferences of people with aphasia did not match the rate maximizing comprehension. This may reflect a discrepancy between a person's perceptions and actual performance accuracy on comprehension tasks. Identifying and discussing such discrepancies through personalized feedback may promote strategy selection and modify opinions about the value of certain supports. Also, because people with aphasia exhibit variable performance across time regardless of impairment chronicity (McNeil & Pratt, 2001), soliciting opinions and collecting data on multiple occasions are critical.
Current Study
Our primary purpose for the current study was to examine the accuracy and preferences of adults with chronic aphasia when simultaneously reading and listening to content presented with a computer-generated voice at three preselected speech rates. As secondary aims, we investigated the time participants reviewed written content following termination of speech output, correlations among aphasia characteristics and dependent variables, participants' rate preferences, consistency of preferences given repeated exposure to experimental stimuli, and rationales provided for preferences and processing behaviors. To ensure applicability to materials people with aphasia may desire to access independently, we used short newspaper articles as experimental stimuli. The following questions served as the basis for investigation:
1. How does the comprehension accuracy of adults with aphasia vary when processing written newspaper articles presented simultaneously with a single auditory rendition of content in the form of computer-generated speech playing at three preselected rates?
2. How does the length of time adults with aphasia spend reviewing written renditions of passages after termination of computer-generated speech output vary given three preselected presentation rates?
3. How do different aphasia types (i.e., fluent and nonfluent) and severities (i.e., mild, moderate, and severe) affect comprehension accuracy and time spent reviewing written content after termination of computer-generated speech output given three preselected presentation rates?
4. What correlations exist among standardized test scores and the dependent variables of comprehension accuracy and time spent reviewing passages after termination of computer-generated speech output?
5. What are the presentation rate preferences of adults with aphasia when processing newspaper articles simultaneously through two modalities, and do these preferences change given repeated exposure or after receiving personalized performance feedback?
6. What rationales do adults with aphasia provide to justify their rate preferences and joint versus isolated reading and listening behaviors when processing written newspaper articles presented simultaneously with computer-generated speech output?
Method
Participants
Participants included 25 adults—16 men and nine women—who were between 10 and 253 months postonset of aphasia (M = 107.40, Mdn = 93, SD = 71.80). Participant E acquired aphasia from nonprogressive encephalopathy, and all others acquired aphasia from strokes. Participants ranged in age from 34 to 78 years (M = 59.80, Mdn = 60, SD = 11.23) and had between 12 and 19 years of education (M = 15.08, Mdn = 16, SD = 2.29). All but one (i.e., F) were right-hand dominant prior to acquiring aphasia. All spoke American English as their primary language. Demographic information about each participant appears in Table 1.
Table 1.
Demographic data for individual participants.
| Participant | Gender | Race | Age (years) | Education (years) | Months postonset | Months since assessment | Currently receiving SLP services | Living status | Employment status |
|---|---|---|---|---|---|---|---|---|---|
| A | Female | White | 60 | 15 | 11 | 3.90 | No | With spouse | Retired |
| B | Female | White | 73 | 16 | 227 | 5.48 | No | With spouse | Retired |
| C | Female | White | 60 | 15 | 11 | 6.23 | No | With spouse | Retired |
| D | Female | White | 73 | 16 | 227 | 5.68 | No | With spouse | Retired |
| E | Female | White | 63 | 14 | 228 | 6.52 | No | With spouse | Retired |
| F | Male | White | 49 | 16 | 124 | 0.06 | No | With spouse | Disability |
| G | Female | White | 73 | 12 | 83 | 6.06 | No | With spouse | Retired |
| H | Male | White | 74 | 12 | 75 | 4.87 | No | With spouse | Retired |
| I | Male | White | 69 | 18 | 28 | 5.13 | No | With spouse | Retired |
| J | Female | White | 34 | 18 | 40 | 5.13 | No | With spouse | Disability |
| K | Female | White | 52 | 16 | 126 | 4.68 | No | With children | Disability |
| L | Male | African American | 56 | 14 | 81 | 5.32 | No | With parent | Disability |
| M | Male | White | 47 | 12 | 108 | 7.94 | No | With parent | Part-time |
| N | Female | African American | 59 | 12 | 73 | 3.39 | No | With spouse | Retired |
| O | Male | White | 70 | 18 | 253 | 3.48 | No | Independent | Retired |
| P | Male | White | 63 | 16 | 10 | 4.58 | Yes | With spouse | Retired |
| Q | Male | White | 48 | 16 | 93 | 4.52 | Yes | With parent | Retired |
| R | Male | White | 46 | 16 | 185 | 3.87 | Yes | Independent | Part-time |
| S | Male | White | 60 | 12 | 183 | 5.19 | Yes | Independent | Part-time |
| T | Male | African American | 78 | 18 | 68 | 5.48 | Yes | With spouse | Retired |
| U | Male | White | 71 | 16 | 196 | 4.74 | No | With spouse | Retired |
| V | Male | White | 55 | 12 | 156 | 2.74 | Yes | With family | Volunteer |
| W | Male | White | 62 | 16 | 44 | 5.90 | Yes | Independent | Part-time |
| X | Male | White | 73 | 15 | 38 | 5.68 | No | With spouse | Retired |
| Y | Male | White | 57 | 16 | 100 | 4.71 | Yes | With spouse | Retired |
Note. SLP = speech-language pathology.
We accessed participants' assessment results from recent speech-language evaluations to determine their language, reading, and cognitive abilities at the time of study completion, or when these results were not available, we administered standardized batteries and subtests. The elapsed time since test administration for each participant appears in Table 1. Test results did not serve as inclusionary or exclusionary criteria for study participation. To provide language information, participants completed the Aphasia Quotient portion of the Western Aphasia Battery–Revised (WAB-R; Kertesz, 2006); the Comprehension of Spoken Sentences, Spoken Paragraphs, and Written Sentences subtests of the Comprehensive Aphasia Test (Swinburn, Porter, & Howard, 2004); and the Paragraph Factual subtest of the Reading Comprehension Battery for Aphasia–Second Edition (LaPointe & Horner, 1998). To provide cognitive information, participants completed the Cognitive Linguistic Quick Test + (CLQT+; Helm-Estabrooks, 2017). Assessment results per participant appear in Table 2.
Table 2.
Participant standardized test scores.
| Participant | Aphasia type as indicated by WAB-R scores | WAB-R: Aphasia Quotient (100) | WAB-R: Aphasia Severity | CLQT+ domain: Attention (215) | CLQT+ domain: Memory (185) | CLQT+ domain: Executive Function (40) | CLQT+ domain: Language (37) | CLQT+ domain: Visuospatial (105) | CLQT+: Composite Severity rating | CAT subtest: Spoken Sentences (32) | CAT subtest: Spoken Paragraphs (4) | CAT subtest: Written Sentences (32) | RCBA-2 subtest: Paragraph Factual (10) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | Anomic | 93.1 | Mild | 166 | 169 | 13 | 31 | 13 | Mild | 22 | 4 | 26 | 10 |
| B | Broca's | 27 | Severe | 136 | 66 | 8 | 1 | 66 | Moderate | 21 | 2 | 22 | 9 |
| C | Conduction | 72.3 | Moderate | 197 | 144 | 25 | 26 | 93 | WNL | 25 | 4 | 20 | 9 |
| D | Conduction | 69.6 | Moderate | 200 | 144 | 32 | 26 | 102 | WNL | 28 | 4 | 26 | 9 |
| E | Transcortical sensory | 67.1 | Moderate | 156 | 106 | 13 | 12 | 72 | Moderate | 22 | 3 | 24 | 7 |
| F | Anomic | 75.2 | Moderate | 149 | 126 | 22 | 22 | 80 | Mild | 23 | 2 | 22 | 10 |
| G | Conduction | 66.2 | Moderate | 63 | 95 | 20 | 19 | 53 | Moderate | 29 | 3 | 23 | 9 |
| H | Anomic | 92.2 | Mild | 203 | 160 | 27 | 27 | 99 | WNL | 23 | 4 | 22 | 10 |
| I | Broca's | 46.94 | Severe | 182 | 108 | 24 | 12 | 95 | Mild | 20 | 1 | 22 | 7 |
| J | Broca's | 67.8 | Moderate | 192 | 146 | 24 | 23 | 94 | Mild | 25 | 3 | 28 | 10 |
| K | Wernicke's | 64.2 | Moderate | 188 | 126 | 25 | 23 | 91 | Mild | 19 | 2 | 20 | 7 |
| L | Broca's | 64.6 | Moderate | 176 | 162 | 22 | 27 | 86 | Mild | 16 | 4 | 24 | 10 |
| M | Anomic | 81.3 | Mild | 176 | 145 | 19 | 27 | 79 | WNL | 26 | 4 | 26 | 9 |
| N | Conduction | 80.6 | Mild | 201 | 158 | 31 | 29.5 | 99 | WNL | 18 | 3 | 24 | 10 |
| O | Broca's | 48.3 | Severe | 184 | 88 | 19 | 11 | 92 | Moderate | 16 | 2 | 14 | 9 |
| P | Broca's | 59.7 | Moderate | 147 | 82 | 17 | 16 | 67 | Moderate | 16 | 3 | 10 | 4 |
| Q | Broca's | 61.5 | Moderate | 192 | 91 | 26 | 15 | 94 | Mild | 15 | 2 | 13 | 8 |
| R | Conduction | 60.7 | Moderate | 124 | 57 | 13 | 13.5 | 51 | Moderate | 15 | 2 | 15 | 5 |
| S | Anomic | 97.2 | No aphasia | 176 | 127 | 21 | 23 | 83 | Mild | 26 | 3 | 18 | 4 |
| T | Conduction | 45 | Severe | 155 | 90 | 22 | 16 | 67 | Moderate | 10 | 3 | 9 | 7 |
| U | Anomic | 97 | No aphasia | 192 | 156 | 33 | 33 | 93 | WNL | 29 | 3 | 29 | 10 |
| V | Broca's | 52.6 | Moderate | 73 | 53 | 20 | 7 | 60 | Moderate | 28 | 4 | 24 | 9 |
| W | Broca's | 34.9 | Severe | 164 | 77 | 16 | 7 | 76 | Moderate | 16 | 2 | 12 | 3 |
| X | Anomic | 96.8 | No aphasia | 95 | 162 | 29 | 29 | 75 | Mild | 24 | 4 | 24 | 10 |
| Y | Broca's | 15.6 | Very severe | 172 | 72 | 17 | 2 | 83 | Moderate | 23 | 4 | 19 | 9 |
Note. WAB-R = Western Aphasia Battery–Revised; CLQT+ = Cognitive Linguistic Quick Test +; CAT = Comprehensive Aphasia Test; RCBA-2 = Reading Comprehension Battery for Aphasia–Second Edition; WNL = within normal limits.
WAB-R Aphasia Quotient (AQ) scores ranged from 15.60 to 97.20 (M = 65.50, SD = 21.70). In accordance with cutoff scores suggested in the WAB-R manual, one participant met the criterion for very severe aphasia (i.e., an AQ score of ≤ 25), five met the criterion for severe aphasia (i.e., AQ scores between 26 and 50), 12 met the criterion for moderate aphasia (i.e., AQ scores between 51 and 75), and four met the criterion for mild aphasia (i.e., AQ scores between 76 and 93.8; Kertesz, 2006); the remaining three participants achieved AQ scores above the cutoff for an aphasia diagnosis per WAB-R standards but had current clinical diagnoses of aphasia from certified speech-language pathologists and displayed behaviors consistent with aphasia on other assessment measures (see Table 2). Further examination of standardized test results revealed that 10 participants had Broca's aphasia (i.e., nonfluent aphasia), seven had anomic aphasia (i.e., fluent aphasia), six had conduction aphasia (i.e., fluent aphasia), and one each had transcortical sensory and Wernicke's aphasia (i.e., fluent aphasia).
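The severity bands just described can be expressed as a simple lookup. The following is a minimal sketch rather than an official scoring routine; the function name `wab_severity` is hypothetical, and the handling of fractional scores that fall between the stated integer bands (e.g., an AQ of 75.2 treated as moderate, as in Table 2) is an assumption.

```python
def wab_severity(aq: float) -> str:
    """Map a WAB-R Aphasia Quotient (0-100) to the severity bands in the text.

    Assumption: fractional scores between stated bands (e.g., 75.2) are
    assigned to the lower-scoring band, which matches the labels in Table 2.
    """
    if not 0 <= aq <= 100:
        raise ValueError("AQ must be between 0 and 100")
    if aq > 93.8:
        return "above cutoff"  # no aphasia per WAB-R standards
    if aq >= 76:
        return "mild"
    if aq > 50:
        return "moderate"
    if aq > 25:
        return "severe"
    return "very severe"


# AQ values taken from Table 2 (Participants A, F, I, Y, and S).
for aq in (93.1, 75.2, 46.94, 15.6, 97.2):
    print(aq, wab_severity(aq))
```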
Participants completed screenings to ensure adequate hearing and vision to perform the experimental tasks. The hearing screening consisted of perceiving 40-dB tones presented via headphones at 1000, 2000, and 4000 Hz in at least one ear. Two participants (i.e., R and V) wore hearing aids and, as such, did not perform the hearing screening. Both had received audiology services within the past year and performed all experimental activities with hearing aids in place. For the visual acuity screening, participants had to identify accurately their first name each of 10 times it appeared in sets of five names printed in 20-point, Times New Roman black font on a laptop computer set at 100% enlargement.
Materials
Study materials included written and auditory presentations of edited newspaper passages, researcher-generated comprehension questions, a laptop computer, E-Prime 3.0 stimulus presentation software, a digital video camera to capture participant behaviors and comments during data collection, and materials for condition preference tasks.
Experimental Passages and Recordings
Experimental passages included excerpts from 39 newspaper articles (i.e., 36 experimental passages and three practice passages). We collected articles from U.S. newspapers accessible via online sources. All articles (a) were not local to the region in which participants resided, (b) did not report national news, (c) did not report general knowledge information, (d) only used acronyms assumed to be familiar to all participants (e.g., United States = U.S.), and (e) contained no more than two quotes. Although we intended to perform data analyses using all 36 experimental passages, an undetected recording error forced elimination of one passage and the associated data at a single presentation rate. All subsequent descriptions and analyses relate to the retained passages.
We edited stimulus articles to make them between 184 and 220 words in length (M = 205.86, SD = 11.75) and to have Flesch–Kincaid grade-level equivalencies between 9.0 and 11.0 (M = 10.22, SD = 1.69; Flesch, 1948). We selected these parameters to provide consistency across passages and to preserve ecological validity regarding reading difficulty; specifically, the edited passages had comparable Flesch–Kincaid grade levels to the unedited articles (range: 7.5–12.6, M = 9.95, SD = 1.28, p = .114), although, as expected, they had significantly fewer words (range: 205–797, M = 426.78, SD = 136.03, p < .001). Editing to meet length and grade-level criteria involved only the removal of complete sentences; we did not change sentence order or modify or delete single words or phrases from retained sentences. Edited passages ranged from eight to 13 sentences in length (M = 10.28, SD = 1.11). As a final step, we separated each passage into three logical paragraphs.
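For readers unfamiliar with the grade-level metric, the standard Flesch–Kincaid grade-level formula can be sketched as follows. This is an illustration only: the source does not state which tool produced the reported values, the function name is hypothetical, and the word, sentence, and syllable counts in the example are invented rather than taken from any study passage.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade-level formula from raw text counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical passage: 206 words, 10 sentences, 320 syllables.
print(round(flesch_kincaid_grade(206, 10, 320), 2))
```

Because the formula depends only on mean sentence length and mean syllables per word, removing whole sentences (as described above) can shift the grade level without altering any retained sentence.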
Digital versions of written texts for experimental and practice passages appeared in black, 24-point, Times New Roman font centered on a white background. We also made digital recordings of each passage using the computer-generated David voice available via PC platforms. We selected the David voice based on past research indicating that people with aphasia comprehended it with relative ease and preferred it to another commonly used computer-generated voice (Hux, Knollman-Porter, Brown, & Wallace, 2017). This was important because extant research shows that computer-generated speech is more cognitively demanding and difficult for people both with and without aphasia to understand than natural speech (Hux et al., 2017; Koul, 2003; Koul & Dembowski, 2010). We used computer-generated speech because it is consistent with that available in TTS applications. We audio-recorded each passage at three preselected rates and saved each recording as a separate file: slow (i.e., Level 4 on a Dell XPS 15 computer, M = 113.43 wpm, SD = 5.53, range: 101.94–121.78 wpm), medium (i.e., Level 10, M = 153.70 wpm, SD = 9.50, range: 137.05–171.04 wpm), and fast (i.e., Level 12, M = 199.98 wpm, SD = 10.35, range: 184.76–216.77 wpm). Labeling rates as slow, medium, and fast was for reference purposes only; the rates are not necessarily ones that people without language impairments would label in this manner. However, the medium rate was the computer default rate and midpoint of available rates. The fast and slow rates each differed by approximately 40 wpm from the default and differed perceptibly from it, yet neither was excessively extreme in our opinion.
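The relation between passage length, recording duration, and rate in words per minute is simple arithmetic, sketched below. The function names are hypothetical, and the 206-word passage is an illustrative value near the reported mean length, not an actual stimulus.

```python
def words_per_minute(word_count: int, duration_s: float) -> float:
    """Speech rate of a recording, given its word count and length in seconds."""
    return word_count / (duration_s / 60.0)


def playback_seconds(word_count: int, wpm: float) -> float:
    """Approximate playback duration of a passage at a given rate."""
    return 60.0 * word_count / wpm


# Mean rates reported for the slow, medium, and fast recordings.
for label, rate in (("slow", 113.43), ("medium", 153.70), ("fast", 199.98)):
    print(label, round(playback_seconds(206, rate), 1), "s")
```

Under these assumptions, a passage of about 206 words plays for roughly 109 s at the slow rate, 80 s at the medium rate, and 62 s at the fast rate.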
Comprehension Questions
We generated six comprehension questions related to the gist and details of each passage. All questions appeared as incomplete sentences, with the final word or phrase omitted and four multiple choice options listed for selection of the appropriate completion. An example passage—one not actually used in the study secondary to copyright issues—appears in the Appendix. A digital version of each comprehension question appeared in isolation in black, 24-point, Times New Roman font on a white background.
E-Prime Stimulus Presentation
We used 17-in. touchscreen PC laptops to present experimental passages to participants. To provide simultaneous auditory and written presentation of passages in different orders and different rate conditions across participants, we created nine stimulus sets using E-Prime software. Each stimulus set contained four passages in each of three rate conditions (i.e., 12 passages per set). The creation of nine stimulus sets ensured that passages heard by some participants at one rate would be heard by other participants at a different rate. Participants performed the experimental task with a different E-Prime stimulus set during each of three sessions, thus allowing for exposure to all stimulus passages across the course of the study.
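One way to obtain nine counterbalanced sets with these properties is to rotate rate assignments within three blocks of 12 passages. The sketch below is a hypothetical scheme, not a description of how the E-Prime sets were actually constructed; the block structure, passage numbering, and function name are all assumptions.

```python
RATES = ("slow", "medium", "fast")


def build_stimulus_sets(passage_ids):
    """Return nine dicts mapping passage id -> rate (hypothetical scheme).

    The 36 passages are split into three blocks of 12 (one block per
    session). Each block is split into three groups of four passages, and
    the rate assigned to each group is rotated to yield three sets per block.
    """
    if len(passage_ids) != 36:
        raise ValueError("expected 36 passages")
    sets = []
    for b in range(3):
        block = passage_ids[b * 12:(b + 1) * 12]
        groups = [block[g * 4:(g + 1) * 4] for g in range(3)]
        for shift in range(3):
            assignment = {}
            for g, group in enumerate(groups):
                for pid in group:
                    assignment[pid] = RATES[(g + shift) % 3]
            sets.append(assignment)
    return sets


sets = build_stimulus_sets(list(range(1, 37)))
print(len(sets), "sets;", len(sets[0]), "passages in the first set")
```

In this scheme, every set contains four passages per rate, and each passage appears at all three rates across the three sets built from its block, so a participant who receives one set from each block encounters all 36 passages once.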
Preference Selection Materials
We used the practice passages to prompt participants about preferences relating to presentation conditions. A set of PowerPoint slides, including one example of each computer-generated speech rate as well as the written words slow, medium, and fast, served to facilitate participants' identification of condition preference.
Procedure
We obtained approval from institutional review boards at both universities at which data collection occurred prior to recruiting participants or collecting data. Whenever possible, we accessed participants' standardized test results available from previous studies or clinical records rather than requiring repeat testing with a given assessment tool; the use of existing assessment findings was appropriate because all participants exhibited chronic aphasia. If assessment results were more than 1 year old or were not available, we administered the WAB-R during the first session and additional testing, as needed, during the first or subsequent sessions. We audio- and video-recorded all experimental sessions for later analysis purposes.
Passage Exposure, Comprehension Questions, and Processing Behaviors
We read aloud written instructions appearing on the laptop monitor prior to beginning each experimental session. Next, a practice passage matching the subsequent experimental condition appeared on the monitor. Practice passages served as opportunities for participants to adjust the volume of computer-generated speech output, rehearse using the touchscreen monitor to answer comprehension questions, and verify comprehension of the experimental task. After completing the practice passage and questions, participants used a Continue icon to advance the stimulus presentation program to the first experimental passage. The written text appeared on the screen with the associated auditory recording beginning 1 s later. For review purposes, the written text remained on the screen for as long as desired, but the auditory recording played only one time. When participants finished reviewing a passage, they selected a Done icon to advance the program to the first comprehension question. We read each question and its corresponding response options aloud unless a participant expressed a desire to read questions independently. Participants selected a response by touching one of four response options appearing on the computer screen; they could change their selection before progressing to the next question but not later than this. We advanced the screen once a participant indicated final response selection. Participants could not refer back to previous questions or the passage when performing the comprehension task. The procedures repeated for four passages in one condition (i.e., slow, medium, or fast auditory presentation rate) before progressing to the second and then third condition in the predetermined order for that session. Participants took breaks as desired.
Participants reported their perceptions and any strategies utilized during task performance following completion of each experimental session. We prompted them with questions about reading and listening simultaneously, only reading passages, or only listening to passages.
Participant Preference
Participants selected their most and least preferred auditory presentation rates following the first experimental session and after having completed three experimental sessions. Participants communicated their preferences either through verbalization or via a pointing response. For pointing responses, participants either pointed to a PowerPoint slide or to a written choice option used when reviewing the three conditions. We verified responses by repeating them aloud and requesting confirmation of our understanding. Participants provided verbal rationales for selections when possible given expressive language limitations. Finally, we gave participants feedback about their comprehension accuracy and average reviewing time for each of the three auditory presentation rates and again asked them to select their most and least preferred rates. Along with participant comments, this provided data about whether knowledge about accuracy and reviewing time influenced rate preferences.
Data Analysis
The E-Prime software recorded participants' responses to comprehension questions and total time spent reviewing each passage. For comprehension accuracy, we computed the percentage of correct responses to comprehension questions in each rate condition. For reviewing time, our interest was in the length of time participants spent reviewing passages after termination of the computer-generated speech output; this allowed us to examine whether auditory presentation rate affected the time people devoted to processing content through the written modality. We termed this variable extra reviewing time to distinguish it from the total time passages appeared on the computer monitor. To compute extra reviewing time, we subtracted the duration of passage playback from a participant's total reviewing time for each passage. Visual inspection of the graphed data allowed for identification of any performance patterns emerging with respect to the three rate conditions. Repeated-measures analysis of variance (ANOVA) computation for each variable provided information about significant differences among the slow, medium, and fast presentation rates. Because computation of Mauchly's test revealed violation of sphericity for the extra reviewing time variable, we applied the Greenhouse–Geisser correction to the ANOVA. We computed generalized eta squared (η²G) as a measure of effect size, with values of .02 corresponding with small, .13 corresponding with medium, and .26 corresponding with large effects (Bakeman, 2005). Finally, we performed post hoc analyses, as appropriate, using the Holm correction to control for familywise error.
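The dependent variables and the effect size measure described above can be sketched in a few lines. This is an illustrative outline, not the analysis code used in the study: the function names are hypothetical, the toy scores are invented, and the ANOVA shown is the uncorrected one-way repeated-measures computation (no Mauchly test or Greenhouse–Geisser adjustment). For a single within-participant factor, generalized eta squared is the effect sum of squares divided by the sum of the effect, participant, and error sums of squares.

```python
def extra_reviewing_time(total_review_s, playback_s):
    """Seconds spent on the written text after speech output ended."""
    return total_review_s - playback_s


def percent_correct(n_correct, n_questions):
    """Comprehension accuracy as a percentage of questions answered correctly."""
    return 100.0 * n_correct / n_questions


def rm_anova_one_way(scores):
    """One-way repeated-measures ANOVA on scores[participant][condition].

    Returns the uncorrected F ratio, its degrees of freedom, and generalized
    eta squared. No sphericity correction is applied in this sketch.
    """
    n = len(scores)      # participants
    k = len(scores[0])   # conditions (e.g., slow, medium, fast)
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    ges = ss_cond / (ss_cond + ss_subj + ss_error)
    return {"F": f_ratio, "df": (df_cond, df_error), "ges": ges}


# Invented toy data: three participants by three rate conditions.
toy = [[1, 2, 3], [2, 4, 3], [3, 3, 6]]
print(rm_anova_one_way(toy))
```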
We divided participants into subgroups based on their WAB-R Aphasia Quotient results indicating fluent or nonfluent aphasia and mild, moderate, or severe aphasia. Subsequent computation of Mann–Whitney U or Kruskal–Wallis statistics, as appropriate, allowed determination of comprehension accuracy or extra reviewing time differences among subgroups with respect to each presentation rate condition. Because we performed three analyses for each dependent variable to correspond with the three rate conditions, we applied a Bonferroni correction and used a p value of .0167. We also computed Pearson product–moment correlations between the accuracy and extra reviewing time variables and scores participants earned on various standardized tests to explore possible influences on performance behaviors and rate preferences. We examined scatter plots to determine patterns evident between variables.
We reported condition preference data descriptively by tallying participants who selected each auditory presentation rate as their most and least preferred at the end of the first experimental session, after having completed three experimental sessions, and after receipt of personal accuracy and total reviewing time feedback. Collection of data at three time points allowed determination of whether repeated exposure to computer-generated speech and/or the receipt of performance feedback affected rate of speech preference. Analysis included examination of the number of people who changed their preference rating over time and/or after receiving performance feedback. We also examined the frequency with which matches occurred between preference and actual performance accuracy.
We transcribed verbatim all participant explanations and verbalizations about presentation rate preferences; transcripts included notations about salient gestures and facial expressions made by participants to convey their intents. Review of the written transcripts yielded information about reasons participants did or did not like specific presentation rates and explanations of strategies participants used to facilitate experimental task performance.
Results
Comprehension Accuracy
Participants achieved the highest average comprehension accuracy score in the slow condition (M = 75.06, SD = 14.98, range: 37.50–97.22) followed by the medium condition (M = 73.70, SD = 16.44, range: 37.50–97.22) and fast condition (M = 71.28, SD = 13.85, range: 41.67–94.44; see Figure 1). Computation of a repeated-measures ANOVA revealed no significant accuracy difference across conditions, F(2, 48) = 2.798, p = .071.
Figure 1.
Percentage of correct responses given slow, medium, and fast presentation rates. Letters reference participant identification codes.
Separation into subgroups resulted in 15 participants with fluent aphasia and 10 participants with nonfluent aphasia as well as seven participants with mild aphasia, 12 with moderate aphasia, and six with severe or very severe aphasia. Computation of Mann–Whitney U statistics revealed no significant accuracy difference between participants with fluent versus nonfluent aphasia regardless of speech presentation rate (slow: W = 112.5, p = .40; medium: W = 92.0, p = .36; fast: W = 90.0, p = .42). In contrast, computation of Kruskal–Wallis statistics revealed significant comprehension differences among people with differing aphasia severities for all rate conditions (slow: $\chi^2$ = 10.01, p = .007; medium: $\chi^2$ = 9.57, p = .008; fast: $\chi^2$ = 8.85, p = .012). In all cases, participants with severe aphasia earned the lowest average accuracy score, and those with mild aphasia earned the highest average accuracy score.
Extra Reviewing Time
Participants took the most time reviewing passages after termination of auditory output in the fast condition (M = 41.67, SD = 43.54, range: 4.34–141.27), followed by the slow condition (M = 35.36, SD = 36.63, range: 4.06–142.23) and medium condition (M = 34.86, SD = 39.10, range: –0.30 to 126.76; see Figure 2). Computation of a repeated-measures ANOVA with Greenhouse–Geisser correction for sphericity violation revealed a significant difference across rate conditions, F(2, 48) = 6.349, p = .0099; however, the result yielded a generalized effect size value not meeting the criterion for a small effect size, $\eta^2_G$ = .006. Pairwise post hoc comparisons with Holm p-value adjustment revealed a significant difference between the medium and fast conditions, p = .0013, but not between the slow and fast conditions, p = .0650, or the slow and medium conditions, p = .7801.
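A significant F statistic alongside a very small generalized eta squared can look contradictory. The sketch below illustrates one way this can arise, assuming the standard one-way within-subjects formulation in which the effect sum of squares is divided by the sum of the effect, subject, and error sums of squares. The data are invented for illustration, the function name is hypothetical, and no Greenhouse–Geisser correction is applied; the text does not describe the authors' software or exact computation.

```python
def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA on a participants x conditions table.

    Returns (F statistic, generalized eta squared) under the assumed
    formulation eta_G^2 = SS_cond / (SS_cond + SS_subj + SS_err).
    """
    n = len(scores)          # participants
    k = len(scores[0])       # conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_err = ss_total - ss_subj - ss_cond

    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_err / df_err)
    eta_g = ss_cond / (ss_cond + ss_subj + ss_err)
    return f_stat, eta_g


# Hypothetical extra reviewing times in seconds: [slow, medium, fast].
# Participants differ hugely from one another, but each is a few
# seconds slower in the fast condition.
hypothetical = [
    [5.0, 4.0, 7.0],
    [20.0, 19.0, 23.0],
    [60.0, 58.0, 64.0],
    [130.0, 127.0, 135.0],
]
f_stat, eta_g = rm_anova_oneway(hypothetical)
print(round(f_stat, 2), round(eta_g, 4))
```

In this made-up data set the condition effect is consistent within each participant, so F is large, yet the spread between participants dominates the denominator and keeps the effect size well below the .02 criterion for a small effect.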
Figure 2.
Length of reviewing time extending beyond termination of auditory output given slow, medium, and fast presentation rates. Letters reference participant identification codes.
Computation of Mann–Whitney statistics revealed no significant difference in extra reviewing time between participants with fluent versus nonfluent aphasia regardless of speech presentation rate (slow: W = 72.0, p = .89; medium: W = 69.0, p = .76; fast: W = 69.0, p = .76). Similarly, computation of Kruskal–Wallis statistics revealed no significant extra reviewing time difference among people with differing severities of aphasia regardless of rate conditions (slow: $\chi^2$ = 6.39, p = .04; medium: $\chi^2$ = 5.69, p = .06; fast: $\chi^2$ = 5.07, p = .08).
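The slow-condition value of p = .04 is reported as nonsignificant because of the Bonferroni-corrected criterion set out in the Data Analysis section (.05 divided across three rate conditions). A brief sketch of that check, using the Kruskal–Wallis p values reported for the severity subgroups; the variable names are illustrative only.

```python
ALPHA = 0.05
N_TESTS = 3  # one analysis per rate condition for each dependent variable
threshold = ALPHA / N_TESTS

# Kruskal-Wallis p values as reported for the aphasia severity subgroups.
reported = {
    ("accuracy", "slow"): 0.007,
    ("accuracy", "medium"): 0.008,
    ("accuracy", "fast"): 0.012,
    ("extra reviewing time", "slow"): 0.04,
    ("extra reviewing time", "medium"): 0.06,
    ("extra reviewing time", "fast"): 0.08,
}

significant = {key: p < threshold for key, p in reported.items()}

print(round(threshold, 4))
for (variable, rate), flag in significant.items():
    print(f"{variable:22s} {rate:7s} significant={flag}")
```

All three accuracy comparisons fall below the corrected criterion, whereas none of the extra reviewing time comparisons do, including the slow condition that would have passed an uncorrected .05 criterion.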
Visual inspection of Figure 2 revealed three participant subsets regarding reviewing time behaviors. Subset 1 included eight participants (i.e., F, G, L, P, R, S, U, and V) who advanced the stimulus presentation program to the first comprehension question within an average of 10 s of cessation of auditory content regardless of presentation rate. In Figure 2, letters representing Subset 1 participants appear clustered near or below the 25th percentile. Subset 2 included 10 participants (i.e., C, D, I, J, K, M, O, Q, T, and Y) who fell between the 25th and 75th percentiles regarding reviewing time, taking between 10 and 50 s on average to review the written material before progressing to comprehension questions. The maximum average reviewing time difference across rate conditions never exceeded 20 s for Subset 2 participants (M = 10.02, SD = 5.75, range: 6.09–19.06). The remaining seven participants (i.e., A, B, E, H, N, W, and X) comprised Subset 3. These participants took substantial time to review passages after cessation of auditory output and prior to advancing to comprehension questions regardless of speech presentation rate; in all but two instances, reviewing time exceeded an average of 50 s regardless of rate condition (exceptions: E and N). The maximum average reviewing time difference across rate conditions for Subset 3 ranged from 11.10 to 45.65 s, yielding a mean reviewing time difference across conditions (M = 22.70, SD = 14.13) that was more than twice that of Subset 2 participants. Five of the seven people in Subset 3 took longest when reviewing passages in the fast condition (exceptions: E and W); three took the least amount of time in the slow condition, and four took the least time in the medium condition.
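The subset description can be approximated with simple cutoffs. The sketch below is a simplification: the authors formed subsets by visual inspection of Figure 2, whereas this code applies the 10 s and 50 s boundaries mechanically to each participant's average. The participant labels and times are hypothetical, as are the function names.

```python
def classify_reviewer(avg_extra_s: float) -> int:
    """Assign a subset number from average extra reviewing time in seconds."""
    if avg_extra_s <= 10.0:
        return 1  # minimal review
    if avg_extra_s <= 50.0:
        return 2  # moderate review
    return 3      # extensive review


def rate_spread(times_by_rate: dict) -> float:
    """Largest difference in average extra reviewing time across rate conditions."""
    values = list(times_by_rate.values())
    return max(values) - min(values)


# Hypothetical participants, not taken from the study data.
hypothetical = {
    "P1": {"slow": 6.0, "medium": 5.0, "fast": 7.5},
    "P2": {"slow": 28.0, "medium": 31.0, "fast": 36.0},
    "P3": {"slow": 60.0, "medium": 55.0, "fast": 80.0},
}

for pid, times in hypothetical.items():
    average = sum(times.values()) / len(times)
    print(pid, classify_reviewer(average), round(rate_spread(times), 2))
```

The spread value corresponds to the "maximum average reviewing time difference across rate conditions" used above to compare Subsets 2 and 3.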
Condition Preferences
Participants ranked speech rate conditions according to preference following completion of the first experimental session (i.e., Postsession 1), after having had three sessions of exposure to the auditory output (i.e., Postsession 3), and after receiving feedback about comprehension accuracy and reviewing time at the end of the final session (i.e., postfeedback).
Table 3 shows the number of participants who selected each rate as their most and least preferred at each evaluation time. Participants as a group indicated greatest preference for the medium rate (n = 16/25, 64%) and least preference for the fast rate (n = 3/25, 12%) after the first experimental session. Preference for the medium rate persisted following the third session (n = 15/25, 60%) and after receipt of performance feedback (n = 11/25, 44%), although less consensus occurred at this time point. Likewise, lack of preference for the fast rate persisted across the three evaluation times; however, the number of people least preferring the slow rate increased such that it nearly equaled that of the fast rate following the third experimental session (n = 11/25, 44%, and n = 12/25, 48%, respectively) and following receipt of performance feedback (n = 10/25, 40%, and n = 13/25, 52%, respectively).
Table 3.
Participants' most and least preferred rates of computer-generated speech output.
| Variable | Most preferred: Slow | Most preferred: Medium | Most preferred: Fast | Least preferred: Slow | Least preferred: Medium | Least preferred: Fast |
|---|---|---|---|---|---|---|
| Postsession 1 choice | 6 | **16** | 3 | 4 | 6 | **15** |
| Postsession 3 choice | 7 | **15** | 3 | 11 | 2 | **12** |
| Postfeedback choice | 8 | **11** | 6 | 10 | 2 | **13** |
Note. Bold font indicates rate preferred by the greatest number of participants at each reporting time.
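The percentages quoted in the text follow directly from the counts in Table 3. As a quick consistency check, the sketch below recomputes them and confirms that each set of choices accounts for all 25 participants; the dictionary layout and helper name are illustrative only.

```python
N_PARTICIPANTS = 25

# Counts copied from Table 3.
most_preferred = {
    "Postsession 1": {"slow": 6, "medium": 16, "fast": 3},
    "Postsession 3": {"slow": 7, "medium": 15, "fast": 3},
    "Postfeedback": {"slow": 8, "medium": 11, "fast": 6},
}
least_preferred = {
    "Postsession 1": {"slow": 4, "medium": 6, "fast": 15},
    "Postsession 3": {"slow": 11, "medium": 2, "fast": 12},
    "Postfeedback": {"slow": 10, "medium": 2, "fast": 13},
}


def percent(count: int, total: int = N_PARTICIPANTS) -> int:
    """Whole-number percentage of participants."""
    return round(100 * count / total)


for label, table in (("most", most_preferred), ("least", least_preferred)):
    for time_point, counts in table.items():
        row_total = sum(counts.values())
        shares = {rate: percent(c) for rate, c in counts.items()}
        print(label, time_point, row_total, shares)
```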
Inherent differences in duration of computer-generated speech output secondary to rate condition influenced total reviewing time such that the shortest time occurred in the fast condition for all but two cases (i.e., Participants A and B were fastest when reviewing passages presented at the medium rate). Because the fast condition was not necessarily the one for which greatest accuracy occurred, participants often had to decide which data were more important when making a final rate preference decision. Only participants K and O were both most accurate and fastest at reviewing passages in the same rate condition (i.e., the fast condition); both participants selected the fast rate condition as their most preferred. Eleven other participants (i.e., B, C, E, G, H, J, L, M, V, X, and Y) preferred the rate at which they performed with greatest accuracy after receiving performance feedback, and five other participants (i.e., A, I, T, U, and X) selected as their most preferred the rate for which they had the shortest total reviewing time. Participants D, F, N, P, Q, R, S, and W preferred a rate for which they performed neither with greatest accuracy nor with shortest reviewing time.
The number of participants who changed their selection after three exposures to experimental stimuli was 15 (i.e., 60%) for most preferred rate and 14 (i.e., 56%) for least preferred rate. Eleven of the 15 changing their most preferred rate selected a slower rate after repeated exposures, and four selected a faster rate. The number of participants shifting their preference after receiving feedback was substantially smaller; only six participants (i.e., 24%) altered their most preferred selection, and three participants (12%) altered their least preferred selection from that chosen just prior to feedback. Two of the six participants shifting their most preferred rate after feedback selected a slower rate, whereas four selected a faster rate.
Participant Rationales Regarding Rate Preferences
Participants provided verbal rationales, when possible given communication limitations, for their rate preference selections. Some participants were better able than others to verbalize a rationale for preferring one rate over another, but all provided some form of verbal or nonverbal communication to justify their preference decision. Of the 25 participants, only B made no verbal comments other than a sound effect indicating a high rate of speed; all other participants produced at least one verbalization with words to convey a rationale. Figure 3 provides a graphic summary of positive and negative opinions about each rate condition.
Figure 3.
Graphic depiction of participant opinions about presentation rate conditions.
Slow Presentation Rate
Differing opinions emerged about whether the slow presentation rate was beneficial or detrimental. Six participants made exclusively positive comments about the slow condition, eight made exclusively negative comments, seven made a combination of positive and negative comments, and four did not comment about the slow rate.
Positive comments centered on the benefit derived from having sufficient time to think and process details (N: Sometimes for the details…I had more time to get it.; J: You have the chance to put it in your mind.… You can kind of think about it again.… It lets you think of certain words or think of the story.) and being able to read and listen simultaneously (V: You can read and listen at the same time to help understand.). In contrast, negative comments were primarily about the slow speech output being boring (O: Come on, hurry, hurry.; S: Too slow. Chugging along.; A: It was boring to go that slow.), that it was slower than a preferred reading rate (Q: Too slow. Reading faster.), and that it allowed for easy distraction (W: Hard to pay attention…. Too slow that I just thought of other things.).
Some comments from participants expressing a mixture of positive and negative views about the slow presentation rate were reflective of changing opinions with repeated exposure. For example, after the first experimental session, Participant N initially expressed a negative opinion (On the slow one, I was getting far away from them [i.e., the spoken words].). However, after additional exposures, he had a more positive opinion, reporting, “[The slow rate was] better than last time. I could slow myself to that [speed] and still understand and learn more.” Other comments expressed dissatisfaction with the voice output speed but expressed recognition that this rate helped promote comprehension with a single presentation of the auditory content (A: It was kinda slow…[but] you got more out of it without having to go back through.). Still, other participants acknowledged the ease of simultaneously reading and listening given the slow presentation rate (U: I was able to keep up…. [The slow rate provided] comfort and comprehension.) but also thought they would struggle to maintain attention (U: There was room for me to become lax…. [It was] hardest for me to stay on task.).
Medium Presentation Rate
Greater consensus appeared among participant comments about the medium presentation rate. Fifteen participants expressed exclusively positive opinions, two expressed exclusively negative opinions, and two expressed mixed opinions about the medium presentation rate; four participants made no comments about the medium presentation rate.
Positive comments referenced the compromise between fast and slow rates (W: Fast, but not too fast.; S: Perfect. Not too fast, [and] not too slow.), the fact that it matched a preferred reading rate and allowed for simultaneous reading and listening (T: I could follow at the same time as reading.; Y: Right speed to read…. [I] read at the same time that the voice was playing.; A: It went right according to what I could keep up with.), that it promoted comprehension (A: I got the most out of it…. I wasn't stressed to get the materials without having to go over it a couple times.; F: It was understood.; N: I felt that I could understand.), and that the voice output speed sounded normal (U: Normal, like everyday life.). Negative comments reflected the rate being too fast for some participants (J: Too fast.; P: Little bit fast…. Still too fast.; V: Sometimes it's too fast.).
Fast Presentation Rate
Only two participants (i.e., 8%) made exclusively positive comments about the fast presentation rate. A larger number (i.e., n = 8/25, 32%) provided a mixture of positive and negative comments, and the largest group (i.e., n = 12/25, 48%) made exclusively negative comments; three participants did not make either positive or negative comments about the fast rate condition.
Participant O reported liking the fast presentation rate because he was a fast reader prior to acquiring aphasia and wanted to return to that speed (I used to read fast…. I want to be fast.). Other comments stemmed from the desire for a challenge (T: I like it to be a challenge. I like it hard.), although Participant U appeared wary of the difficulty associated with processing language at the fast presentation rate (There was no room for [distraction]. I had to stay on line…. It was harder [than the other rates] and made me work harder.). Some participants claimed a match between the fast presentation rate and their preferred reading rate (S: Perfect with looking.; W: [I] could read and listen at the same time.).
Some mixed positive and negative opinions reflected adaptation to the speed with repeated exposure (N: Before it was too fast, but I think over three times of doing this, I learned to pace myself.; X: I could keep up better than the last two days [i.e., sessions].). However, the vast majority of fast presentation rate comments were about the voice output being too rapid (V: I couldn't follow it a little bit.; P: Forget it. Too fast.; I: Wait a minute. Slow down.), not matching the reading rate (V: [It] didn't match [my reading rate] all the time.; X: I was reading it, and I wasn't quite following along with where they were. I wasn't reading fast enough.; E: I tried to listen to it because I can't read it fast enough.), and making comprehension difficult (U: I had to go back and get something that I missed.; X: Harder to know what the story was about…. I needed to go back and re-read more than [at the] medium and slow [rates].; A: [I] usually had to read it an extra time.).
Correlations
Pearson product–moment computations revealed several significant correlations between the comprehension accuracy variable and scores on standardized tests or subtests. In particular, standardized measures reflective of the severity of language impairment (e.g., WAB-R Aphasia Quotient, Reading Comprehension Battery for Aphasia–Second Edition Paragraph Factual subtest, Comprehensive Aphasia Test Written Comprehension of Sentences subtest, CLQT+ Language Domain score) correlated significantly with comprehension accuracy. Also, the Memory Domain score of the CLQT+ correlated significantly with comprehension accuracy scores given the slow, medium, and fast presentation rates. This contrasted with the lack of significant correlations among comprehension accuracy and most nonlanguage variables such as the Attention Domain score, Visuospatial Domain score, and Nonlinguistic Cognition scores of the CLQT+. Table 4 presents the correlation results.
Table 4.
Correlations among comprehension accuracy and extra reviewing time variables and standardized test scores.
| Variable | WAB-R: Aphasia Quotient | CLQT+: Attention | CLQT+: Memory | CLQT+: Executive Functions | CLQT+: Language | CLQT+: Visuospatial Skills | CLQT+: Nonlinguistic Cognition | CLQT+: Linguistic/Aphasia | CAT: Auditory Comprehension of Sentences | CAT: Auditory Comprehension of Paragraphs | CAT: Written Comprehension of Sentences | RCBA-2: Paragraph Factual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Comprehension accuracy | | | | | | | | | | | | |
| Slow | .46* | .14 | .62*** | .38 | .54** | .02 | .30 | .55** | .37 | .48* | .56** | .62*** |
| Medium | .38 | .14 | .57** | .43* | .43* | .13 | .35 | .48* | .57** | .53** | .65*** | .64*** |
| Fast | .44* | .24 | .66*** | .47* | .54** | .13 | .40* | .55** | .37 | .49* | .58** | .63*** |
| Extra review time | | | | | | | | | | | | |
| Slow | –.07 | .01 | .06 | –.25 | –.14 | –.18 | –.13 | –.15 | –.16 | –.10 | –.03 | –.08 |
| Medium | –.03 | –.05 | .09 | –.19 | –.10 | –.17 | –.13 | –.12 | –.16 | –.04 | –.01 | .01 |
| Fast | –.02 | –.06 | .12 | –.20 | –.07 | –.22 | –.13 | –.08 | –.11 | .03 | .03 | .08 |
Note. WAB-R = Western Aphasia Battery–Revised; CLQT+ = Cognitive Linguistic Quick Test +; CAT = Comprehensive Aphasia Test; RCBA-2 = Reading Comprehension Battery for Aphasia–Second Edition.
*p ≤ .05.
**p ≤ .01.
***p ≤ .001.
The finding of significant correlations among comprehension accuracy scores and several standardized test results contrasted with the finding of no significant correlations among extra reviewing time values and standardized measures (see Table 4). Examination of correlation scatter plots revealed an interesting phenomenon despite the lack of significant findings, however. Specifically, people either with mild or severe aphasia as evidenced by scores on measures of language functioning tended to take greater extra reviewing time than people with moderate aphasia. The short length of extra reviewing time taken by people in the middle of the severity continuum contrasted markedly with those at either end of the severity continuum and resulted in a curvilinear rather than a linear relation. Figure 4 provides a scatter plot of the length of extra reviewing time and the WAB-R Aphasia Quotient during the fast presentation condition as an example of this relation.
Figure 4.
Scatter plot of correlation between length of extra reviewing time and Western Aphasia Battery–Revised Aphasia Quotient during fast presentation condition. Letters reference participant identification codes.
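The curvilinear pattern described above explains how extra reviewing time can relate to severity without producing a significant Pearson correlation, because Pearson's r captures only linear association. A minimal sketch with invented values: the severity scores and reviewing times below are hypothetical and chosen to form a symmetric U shape, and the function is a plain textbook implementation rather than the authors' software.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical severity scores (higher = milder) and extra reviewing times (s).
severity = [20, 40, 60, 80, 100]
u_shaped_time = [40, 10, 0, 10, 40]      # long at both extremes, short in the middle
linear_time = [2 * s + 1 for s in severity]

print(pearson_r(severity, u_shaped_time))
print(pearson_r(severity, linear_time))
```

The U-shaped series yields a correlation of essentially zero even though reviewing time depends strongly on severity, which is why inspecting the scatter plots was informative despite the nonsignificant coefficients.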
Discussion
Our goal was to determine how varying the presentation rate of computer-generated speech affected comprehension by people with aphasia when accessing newspaper articles via combined auditory and written modalities (i.e., TTS systems). As a secondary aim, we examined whether different speech presentation rates affected the time spent reviewing written content after having heard the auditory output one time. We determined whether differences in aphasia type or severity affected performance and computed correlations among the dependent variables and standardized language and cognitive measures. Finally, we explored the presentation rate preferences of people with aphasia and analyzed whether they changed their presentation rate preference given repeated exposure to experimental stimuli or given performance feedback. In combination, the findings will help guide clinicians when assisting people with aphasia in implementing TTS technology as a comprehension support for functional activities.
Comprehension Accuracy
A significant accuracy difference across presentation rates did not occur; however, participants showed a trend toward greater accuracy with the slow and medium presentation rates compared to the fast rate. Performing statistical analyses with participants separated into fluent/nonfluent subgroups also revealed no significant accuracy differences across presentation rates. In contrast, separation into aphasia severity subgroups resulted in significant differences at all presentation rates—a result that is not surprising given that people with milder forms of aphasia typically perform language tasks better than those with more severe forms of aphasia.
The lack of a significant comprehension accuracy finding for the full group of study participants appears to contradict the long-standing assumption that people with aphasia comprehend better when presented with speech occurring at slow rates (American Speech-Language-Hearing Association, n.d.; Blumstein et al., 1985; Cermak & Moreines, 1976; National Aphasia Association, n.d.; Nicholas & Brookshire, 1986; Pashek & Brookshire, 1982). However, the recommendation to slow speech rate refers to single modality processing of auditory content rather than dual modality processing of simultaneously presented auditory and written content. The notion that providing extra processing time benefits people who struggle with auditory comprehension does not appear to apply to dual modality processing.
Procedural differences from extant research about the benefits of slow content presentation may explain the lack of significant presentation rate differences. First, we used computer-generated speech as auditory stimuli rather than natural speech (Blumstein et al., 1985; Cermak & Moreines, 1976; Pashek & Brookshire, 1982); other aphasiologists studying speech rate effects on comprehension have used natural speech recordings produced at various preselected rates or altered by elongating vowels through digital manipulation or inserting pauses at salient points. Previous researchers have established that computer-generated speech is more difficult for people with and without disabilities to understand than natural speech (Hux et al., 2017; Koul, 2003; Koul & Dembowski, 2010). The added cognitive challenge associated with processing computer-generated speech could have outweighed any benefit provided by slowing the speech presentation rate.
A second procedural difference was our presentation of content through two rather than one modality and unlimited reviewing time of written content. This may have dampened any detrimental effects associated with a rapid presentation rate. Either factor—that is, combining modalities or allowing unlimited reviewing time—in isolation might improve comprehension accuracy regardless of the rate of speech presentation; both supports in combination make the lack of a significant finding across rate conditions not surprising, especially given that the trend was still for better performance with the slow and medium rates than the fast rate. From a clinical perspective, encouraging the provision of written material for unrestricted review as a supplement to auditory information appears appropriate, especially when the content is critical or presented at a fast rate. However, clinicians also need to consider the possibility that presenting content simultaneously through two modalities may tax the attentional resources of a person with aphasia. As such, individualization of support strategies is paramount.
Extra Reviewing Time
The length of time participants spent reviewing written content after the cessation of auditory output differed significantly across rate conditions, but no significant difference emerged when participants were split in accordance with aphasia type or severity. Hence, regardless of specific language characteristics, people with aphasia are likely to spend longer reviewing content when TTS presentation is fast rather than slower, although only the comparison between the fast and medium rates reached significance in post hoc testing. Tempering this difference, however, was the finding of a minimal effect size across rate conditions. This raises questions about clinical relevance. However, in all likelihood, the length of reviewing time following cessation of auditory output reflected the relative shortness and simplicity of the stimulus passages we presented (i.e., approximately 200 words on average and corresponding to Grade 9.0–11.0 reading level). Longer or more challenging passages would likely yield greater discrepancies in postauditory output reviewing time and a greater resultant effect size. This is important given that many speakers exceed 200 wpm, the average speech rate in the fast condition (Tauroza & Allison, 1990), and the difficulty of much adult reading material exceeds the eleventh grade level. Future researchers may wish to explore these possibilities.
We split the participant group in accordance with the length of extra reviewing time; some people engaged in minimal review (i.e., within 10 s on average) following cessation of auditory output, some engaged in moderate review (i.e., between 10 and 50 s), and some engaged in relatively extensive review (i.e., greater than 50 s). These subsets were not the same as subgroups formed when splitting participants by fluent/nonfluent characteristics or along a severity continuum. Indeed, computation of Mann–Whitney U or Kruskal–Wallis statistics confirmed that differences across fluent/nonfluent or mild/moderate/severe subgroups were not significant for the extra reviewing time variable. However, visual inspection of correlation analyses revealed that people with particularly severe or particularly mild aphasia were most likely to engage in substantial extra reviewing time. Those in the subset with extensive extra reviewing time included two of the three people with the lowest WAB-R Aphasia Quotient scores and three of the five people with the highest WAB-R Aphasia Quotient scores; only one person with moderate aphasia took greater than 50 s of extra reviewing time. Furthermore, those people engaging in substantial extra reviewing time varied the most across rate conditions. Taking these phenomena into consideration, the findings suggest that having a fast speech presentation rate prompts some people with aphasia, especially those with particularly mild or particularly severe language impairments, to extend the length of time they review content through the alternate written modality.
A possible explanation for lengthy reviewing in the fast rate condition is that the auditory presentation was difficult to comprehend, and as a result, people increased reliance on processing the written content. This supposition is supported by comments participants made about the fast rate hampering comprehension. Another possibility is that the varying review times across the three subsets of study participants reflected the extent to which people felt competent processing written texts. Some people with aphasia report reading impairment of such severity that they avoid attempts to read independently (Knollman-Porter et al., 2015). Study participants with this level of impairment may have depended primarily on listening rather than reading to process experimental passages. They may have believed that reviewing the written content following auditory output termination would not benefit their comprehension and, hence, did not bother attempting to do so. Other participants with somewhat less severe reading challenges may have still relied predominantly on listening to process experimental passages but used the available extra viewing time to scan the written text for keywords. The reading required for this would have fulfilled more of a confirmatory than a reading comprehension purpose and may have varied little across presentation conditions. Still, other participants may have felt their reading capabilities were sufficient to assist with text comprehension. If the fast presentation rate negatively affected comprehension more than the slow or medium rates, people retaining this ability level may have relied differentially on reading for comprehension purposes across the three rate conditions. This would explain the varying review times evident. These contentions are purely speculative, however, and will require additional research—perhaps using eye tracking technology—to confirm or refute.
Rate Preference
As a group, study participants were consistent in most preferring the medium auditory presentation rate; least preference for the fast rate was also consistent, but with repeated exposure and/or feedback about performance, the slow rate condition was also unpopular. Important individual differences were evident regarding rate preferences, however. Specifically, two of the 25 participants were most accurate and fastest when reviewing passages in the fast condition and also consistently selected this condition as their most preferred. Because this selection appears appropriate and maximally efficient for these people, their preference should be honored despite its divergence from the group consensus. The same scenario happened regarding comprehension accuracy for one participant given the slow condition, although this was not the condition for which she spent the shortest time reviewing content. Situations such as these reinforce the importance of performing individualized assessments and tailoring support strategies to match personal preferences and abilities.
The majority of participants altered their choice of a most preferred condition over the course of three experimental sessions and receipt of performance feedback. Preference changes were more common following repeated exposure to experimental stimuli than they were in response to performance feedback. This finding reinforces the idea that multiple opportunities to become accustomed to particular patterns or peculiarities of speech production may diminish their salience and prompt preference changes. In combination with Riensche et al.'s (1983) report that listening rate preference does not consistently align with the rate at which a person with aphasia best comprehends, the finding also suggests a factor other than comprehension accuracy may be most salient when selecting a preferred rate. Comments made by current participants suggest a few possibilities about what this factor may be. First, people with aphasia may want speech rate to match roughly that heard in daily conversations, on auditory books, or on television or radio broadcasts; second, the salient factor may be the rate at which people with aphasia believe they can best maintain attention; third, the extent to which an auditory presentation rate matches a person's silent reading speed may be important; and fourth, a desire to use auditory presentation as a means of garnering only the content gist may prompt selection of a faster rate than if concerned about acquiring detailed understanding. Of course, multiple factors may influence rate preference, and no single factor may emerge as consistently most salient.
Most current study participants indicated following printed words to read along while hearing the auditory output of experimental passages. Several people commented about being unsuccessful in doing this given the fast and/or slow presentation rate. This is informative given that the medium presentation rate averaged 154 wpm, a rate consistent with recommendations for auditory books but slower than the 200-wpm rate typical of conversational speech or silent reading by adults (Dwyer & West, 1994; Lewandowski, Codding, Kleinmann, & Tucker, 2003; Robb et al., 2004; Tauroza & Allison, 1990; Williams, 1998). The difficulty participants reported suggests that people with aphasia read silently at a rate close to that of relatively slow spoken language, which is substantially slower than the silent reading rate of unimpaired adults. Such a rate, however, is comparable to the 144 wpm achieved by young adult readers when preparing for a recall task (Meyer & Rice, 1989), an activity consistent with that mandated in the current study. Whether people with aphasia always read at this rate or alter their reading speed based on task demands remains unknown. We also do not know whether factors such as premorbid education or reading efficiency affect the desired or habitual reading rates of people with aphasia.
Limitations
Aspects of the procedures implemented for the current study present limitations to interpreting and generalizing the findings. Perhaps most important was our decision to allow participants to review written information after auditory output termination. Doing this meant that either auditory presentation rate or multiple written content reviews could have influenced comprehension accuracy; hence, we cannot draw firm conclusions about the effect of auditory presentation rate on comprehension accuracy. However, our procedural choices were strategic; we wanted to mimic the situation in which a person can hear spoken output only once, as is usual when people interact informally, but also wanted to allow repeated access to written content because the relative permanency of textual material typically allows this. We considered alternate procedures such as allowing repeated access to auditory output, repeated access both to auditory and written content, or single access to content in both modalities. Regardless of choice, both positive and negative ramifications existed for the interpretation and applicability of findings. Future researchers may wish to investigate the effects of different accessing possibilities on comprehension accuracy, processing efficiency, and participant preference when manipulating auditory presentation speed as well as other aspects of experimental stimuli. Such information will prove helpful to practitioners when instructing people with aphasia to implement comprehension supports associated with TTS technology.
Another procedural limitation was that we modified newspaper articles for length and reading level before using them as experimental stimuli. We did this for experimental control and were careful to delete only full sentences from original articles; hence, the stimuli were short but in other ways roughly comparable to articles people might access in their daily lives. Still, the inclusion criteria imposed on length, reading level, quotations, abbreviations, and acronyms meant that the materials were representative of some, but not all, newspaper articles. Inclusion of written passages with other specifications might have influenced results.
Yet another procedural limitation was that we asked participants to process the stimuli with sufficient attention to detail that they could respond to comprehension questions. This is not necessarily typical of the way adults read or listen to news stories, and considerable evidence suggests that people alter their behaviors when processing information for different purposes (Linderholm, Cong, & Zhao, 2008). Thus, the experimental task was somewhat artificial, which limits the extent to which we can generalize the findings to other activities.
Future Directions
Having information presented simultaneously through two modalities allows people with aphasia the possibility of deriving maximum benefit from retained auditory and reading comprehension skills (Knollman-Porter et al., 2019; Wallace et al., 2019). Linguistic content that a person fails to comprehend through one modality may be processed more successfully through the second modality. Although the current study represents an attempt to examine some of the behaviors people with aphasia exhibit when processing dual modality content, we know little about factors that influence adoption of one comprehension support strategy over another.
An important question regarding the combined presentation of auditory and written content is the extent to which people process information simultaneously versus sequentially through the available modalities. The use of TTS systems creates a situation in which the rate of computer-generated speech may be substantially faster or slower than a person's preferred reading or listening speed or the speed he or she believes will maximize comprehension. Several possibilities exist in such a situation. For example, a person might choose exclusively to listen to spoken output and ignore written content entirely; alternatively, a person might ignore the spoken output and focus solely on reading the material at a comfortable rate; yet another option would be to track visually the words as spoken but to reread the content a second or third time to ensure comprehension. Analyzing the extent to which a person's eye movements while reading synchronize with auditory output would provide a means of exploring this question and might provide guidance for refining and implementing TTS technology in ways most likely to benefit people with aphasia. Exploration of such issues is necessary to provide practitioners with direction about optimal methods of instructing people with aphasia to use comprehension supports.
Conclusion
Identifying methods facilitating efficient decoding and comprehension of written materials to which people with aphasia desire access is clinically relevant. Accumulating evidence shows that providing combined written and auditory modalities via TTS technology is promising as a compensatory comprehension strategy for this purpose (Knollman-Porter et al., 2019; Wallace et al., 2019). With this evidence in mind, we strove to determine the effect of various speech presentation rates on comprehension when a person with aphasia had unlimited access to written content. Although a significant difference did not emerge among the three rate conditions implemented, the trend was for better comprehension with slower presentation rates. Also, the participants with aphasia took significantly longer to review passages when the speech rate was fast versus medium. In combination with participants' overall preference for the medium presentation rate, the findings suggest that approximately 150 wpm is an appropriate rate for clinicians to use when introducing TTS technology to people with aphasia; adjustments to faster or slower rates can then be made to match individual preferences and abilities. Continued exploration of additional modifiable features of TTS systems will be key to further promoting the implementation and use of this technology as a comprehension support.
Acknowledgments
Research reported in the article was supported by National Institute on Deafness and Other Communication Disorders Award R15DC015579 (awarded to Kelly Knollman-Porter, Sarah Wallace, Karen Hux, and Jessica Brown). The content is solely the responsibility of the authors and does not necessarily represent the view of the National Institutes of Health.
Appendix
Coffee Shop Raises Money for Service Dog
When you buy something at Kate's Coffee Shop, you pay for freshly baked pastries and hot, delicious coffee. But lately, customers have been paying for a little more than just that. Alexander and Kate Edwards, the owners, recently decided to put part of their profits toward a cause that is near to their hearts—raising money for a service dog for an elementary school student with a rare form of epilepsy.
Alexander and Kate learned that their son had befriended a fourth grader named John who has multiple seizures every day. After hearing their son talk about his friend, they began researching ways they could help him. They learned about service dogs specifically trained to sense oncoming seizures and alert their owners or other individuals around.
The Edwards brought up the idea of a service dog to John's parents and offered to give 10% of their profits toward raising money required for the dog. His parents were overwhelmed with excitement and gratefulness for the offer. “We just want to help in any way we can, and we're glad that Kate's Coffee Shop allowed us to give back a little to the community that has given us so much,” said Alexander.
References
- American Speech-Language-Hearing Association. (n.d.). Tips for communicating with a person who has aphasia. Retrieved from https://www.asha.org/public/speech/disorders/aphasia/
- Bakeman R. (2005). Recommended effect size statistics for repeated measures designs. Behavior Research Methods, 37, 379–384.
- Blumstein S. E., Katz B., Goodglass H., Shrier R., & Dworetsky B. (1985). The effects of slowed speech on auditory comprehension in aphasia. Brain and Language, 24, 246–265.
- Cermak L. S., & Moreines J. (1976). Verbal retention deficits in aphasic and amnesic patients. Brain and Language, 3, 16–27.
- Cherney L. R. (2004). Aphasia, alexia, and oral reading. Topics in Stroke Rehabilitation, 11(1), 22–36.
- DeDe G. (2013). Reading and listening in people with aphasia: Effects of syntactic complexity. American Journal of Speech-Language Pathology, 22, 579–590.
- Dwyer E. J., & West R. F. (1994). Effects of sustained silent reading on reading rate among college students. Balance in Reading Instruction, 1(2), 1–16.
- Fager S., Hux K., Beukelman D. R., & Karantounis R. (2006). Augmentative and alternative communication use and acceptance by adults with traumatic brain injury. Augmentative and Alternative Communication, 22, 37–47.
- Flesch R. (1948). A new readability yardstick. Journal of Applied Psychology, 32, 221–233.
- Harvey J., & Hux K. (2015). Text-to-speech accommodations for the reading challenges of adults with traumatic brain injury. Brain Injury, 29, 888–897.
- Harvey J., Hux K., Scott N., & Snell J. (2013). Text-to-speech technology effects on reading rate and comprehension by adults with traumatic brain injury. Brain Injury, 27(12), 1388–1394.
- Harvey J., Hux K., & Snell J. (2013). Using text-to-speech reading support for an adult with mild aphasia and cognitive impairment. Communication Disorders Quarterly, 35, 39–43.
- Helm-Estabrooks N. (2017). Cognitive-Linguistic Quick Test–Plus. San Antonio, TX: The Psychological Corporation.
- Hux K., Knollman-Porter K., Brown J., & Wallace S. E. (2017). Comprehension of synthetic speech and digitized natural speech by adults with aphasia. Journal of Communication Disorders, 69, 15–26.
- Kertesz A. (2006). Western Aphasia Battery–Revised. San Antonio, TX: Pearson Education.
- King C., & East M. (2011). Learners' interaction with listening tasks: Is either input repetition or a slower rate of delivery of benefit? New Zealand Studies in Applied Linguistics, 17(1), 70–85.
- Knollman-Porter K., Wallace S. E., Brown J. A., Hux K., Hoagland B. L., & Ruff D. R. (2019). Effects of written, auditory, and combined modalities on comprehension by people with aphasia. American Journal of Speech-Language Pathology, 28, 1206–1221.
- Knollman-Porter K., Wallace S. E., Hux K., Brown J., & Long C. (2015). Reading experiences and use of supports by people with chronic aphasia. Aphasiology, 29, 1448–1472.
- Koul R. (2003). Synthetic speech perception in individuals with and without disabilities. Augmentative and Alternative Communication, 19, 49–58.
- Koul R., & Dembowski J. (2010). Synthetic speech perception in individuals with intellectual and communicative disabilities. In Mullennix L. J. (Ed.), Computer synthesized speech technologies: Tools for aiding impairment (pp. 177–187). Hershey, PA: IGI Global.
- LaPointe L., & Horner J. (1998). RCBA-2: Reading Comprehension Battery for Aphasia–Second Edition. Austin, TX: PRO-ED.
- Leff A. P., & Behrmann M. (2008). Treatment of reading impairment after stroke. Current Opinion in Neurology, 21(6), 644–648.
- Lewandowski L. J., Codding R. S., Kleinmann A. E., & Tucker K. L. (2003). Assessment of reading rate in postsecondary students. Journal of Psychoeducational Assessment, 21, 134–144.
- Linderholm T., Cong X., & Zhao Q. (2008). Differences in low and high working-memory capacity readers' cognitive and metacognitive processing patterns as a function of reading for different purposes. Reading Psychology, 29(1), 61–85.
- McNeil M. R., & Pratt S. R. (2001). Defining aphasia: Some theoretical and clinical implications of operating from a formal definition. Aphasiology, 15(10), 901–911.
- Meyer B. J. F., & Rice G. E. (1989). Prose processing in adulthood: The text, the reader and the task. In Poon L. W., Rubin D. C., & Wilson B. A. (Eds.), Everyday cognition in adult and later life (pp. 157–194). New York, NY: Cambridge University Press.
- National Aphasia Association. (n.d.). Communication strategies: Dos and don'ts. Retrieved from https://www.aphasia.org/aphasia-resources/communication-tips/
- Nicholas L. E., & Brookshire R. H. (1986). Consistency of the effects of rate of speech on brain-damaged adults' comprehension of narrative discourse. Journal of Speech and Hearing Research, 29(4), 462–470.
- Pampoulou E. (2019). Speech and language therapists' views about AAC system acceptance by people with acquired communication disorders. Disability and Rehabilitation: Assistive Technology, 14, 471–478.
- Pashek G. V., & Brookshire R. H. (1982). Effects of rate of speech and linguistic stress on auditory paragraph comprehension of aphasic individuals. Journal of Speech and Hearing Research, 25(3), 377–383.
- Riensche L. L., Wohlert A., & Porch B. E. (1983). Aphasic comprehension and preference of rate-altered speech. British Journal of Disorders of Communication, 18(1), 39–48.
- Robb M. P., MacLagan M. A., & Chen Y. (2004). Speaking rates of American and New Zealand varieties of English. Clinical Linguistics & Phonetics, 18(1), 1–15.
- Swinburn K., Porter G., & Howard D. (2004). Comprehensive Aphasia Test. Hove, United Kingdom: Psychology Press.
- Tauroza S., & Allison D. (1990). Speech rates in British English. Applied Linguistics, 11, 90–105.
- Wallace S. E., Knollman-Porter K., Brown J. A., & Hux K. (2019). Narrative comprehension by people with aphasia given single versus combined modality presentation. Aphasiology, 33, 731–754.
- Webster J., Morris J., Howard D., & Garraffa M. (2018). Reading for meaning: What influences paragraph understanding in aphasia? American Journal of Speech-Language Pathology, 27, 423–437.
- Williams J. R. (1998). Guidelines for the use of multimedia in instruction. Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting, 42, 1447–1451.




