Published in final edited form as: Aphasiology. 2019 Nov 22;35(4):518–538. doi: 10.1080/02687038.2020.1727837

Repeated word production is inconsistent in both aphasia and apraxia of speech

Katarina L Haley 1, Kevin T Cunningham 1, Adam Jacks 1, Jessica D Richardson 2, Tyson Harmon 3, Peter E Turkeltaub 4
PMCID: PMC8681875  NIHMSID: NIHMS1563761  PMID: 34924672

Abstract

Purpose:

There is persistent uncertainty about whether sound error consistency is a valid criterion for differentiating between apraxia of speech (AOS) and aphasia with phonemic paraphasia. The purpose of this study was to determine whether speakers with a profile of aphasia and AOS differ in error consistency from speakers with aphasia who do not have AOS. By accounting for differences in overall severity and using a sample size well over three times that of the largest study on the topic to date, we aimed to resolve the existing controversy.

Method:

We analyzed speech samples from 171 speakers with aphasia and completed error consistency analysis for 137 of them. The experimental task was to repeat four multisyllabic words five times successively. Phonetic transcriptions were coded for four consistency indices (two at the sound level and two at the word level). We then used quantitative metrics to assign participants to four diagnostic groups (one aphasia plus AOS group, one aphasia only group, and two groups with intermediate speech profiles). Potential consistency differences were examined with ANCOVA, with error frequency as a continuous covariate.

Results:

Error frequency was a strong predictor for three of the four consistency metrics. The magnitude of consistency for participants with AOS was either similar to or lower than that for participants with aphasia only. Despite excellent transcription reliability and moderate to excellent coding reliability, three of the four consistency indices showed limited measurement reliability.

Discussion:

People with AOS and people with aphasia often produce inconsistent variants of errors when they are asked to repeat challenging words several times sequentially. The finding that error consistency is similar or lower in aphasia with AOS than in aphasia without AOS is incompatible with recommendations that high error consistency be used as a diagnostic criterion for AOS. At the same time, group differences in the opposite direction are not sufficiently systematic to warrant use for differential diagnosis between aphasia with AOS and aphasia with phonemic paraphasia. Greater attention should be given to error propagation when estimating reliability of derived measurements.

Introduction

Acquired apraxia of speech (AOS) is a conceptually defined disorder of motor programming that almost always coexists with aphasia. Its behavioral definition is multidimensional and based on clinical appraisal of core diagnostic criteria, most of which have been adjusted and altered over time. Contemporary criteria are summarized in book chapters and systematic reviews (Ballard et al., 2015; Duffy, 2013; McNeil, Robin, & Schmidt, 2009; Miller & Wambaugh, 2017). They include slow speech with abnormal temporal prosody—especially in multisyllabic words and discourse—as evidenced by a combination of sound prolongations, pauses, and abnormal stress. Speech output is also characterized by subtle distortion errors, which are recognized by phonetically trained listeners as segmental variations that do not alter the perceived phoneme but are deemed incorrect in the phonetic context. In addition, listeners detect phonemic additions, omissions, and substitutions, occasionally combined with distortion errors within the same consonant or vowel segment (Haley, Jacks, Richardson, & Wambaugh, 2017; Odell, McNeil, Rosenbek, & Hunter, 1990).

The primary value of current diagnostic criteria is that they help differentiate AOS from aphasia with phonemic paraphasia—its closest diagnostic neighbor in the setting of stroke. Similar to those with AOS, people who have aphasia with phonemic paraphasia produce errors that affect phonemic accuracy; the difference is that these errors occur in the context of fewer distortion errors and relatively normal prosody. For our purposes, it is also important to recognize a third category of stroke survivors who have aphasia but who produce no or very few sound errors (Haley, Jacks, & Cunningham, 2013). Although people with minimal speech impairment are not difficult to identify in clinical practice, their presentation profiles complicate diagnostic validation studies when they are included in an “aphasia only” comparison group, because any differences between groups could simply be attributable to inadequate severity matching.

In the present study, our concern was with a diagnostic criterion that has been regarded as primary (McNeil et al., 2009; Wambaugh, Duffy, McNeil, Robin, & Rogers, 2006), but was recently removed from most checklists (Ballard et al., 2015) due to a perception that its validity was controversial. This criterion is that speakers with AOS, in comparison to speakers with aphasia and phonemic paraphasia, make segmental sound errors that are “relatively consistent in terms of type and invariable in terms of location.” There has been debate in the literature as to whether this conclusion is accurate, and the matter has been considered unsettled for years. In a recent study, we sought to reconcile apparent disagreement across studies by demonstrating that error profiles can be simultaneously consistent and inconsistent, depending merely on how consistency is defined (Haley, Cunningham, Eaton, & Jacks, 2018). The purpose of the present study was to address what we identified as remaining empirical uncertainty: whether the phonetic form of errors is relatively consistent in type and location across repeated attempts to say the same target words. Our approach was to apply previously used consistency measures in a large and clinically representative sample of speakers with aphasia with and without AOS. In the following, we consider the relatively limited evidence published to date and two variables that are particularly likely to confound group comparisons: error frequency and strategy for diagnostic classification.

Error consistency as a diagnostic criterion

The most obvious problem with the diagnostic criterion of relative error consistency is that it contradicted descriptions of the syndrome as it had been understood for more than 40 years. Since Darley’s characterization of AOS as a motor speech disorder distinct from dysarthria, diagnostic guidelines had suggested that speech output and sound errors in AOS were inconsistent rather than consistent (Darley, 1964; Darley, Aronson, & Brown, 1975; Wertz, LaPointe, & Rosenbek, 1984). Specifically, these early works described speech in AOS as including different types of errors produced on multiple repetitions of the same word. In contrast, the diagnostic criteria that were published in the late 1990s and early 2000s argued for a complete reversal, with errors in AOS described as relatively consistent in both location and type (McNeil, Robin, & Schmidt, 1997; 2009; Wambaugh et al., 2006). Such a reversal demanded strong empirical and theoretical support, and some questioned whether that evidence had in fact been presented (Haley et al., 2013; Staiger, Finger-Berg, Aichert, & Ziegler, 2012). Moreover, guidelines were unclear on the magnitude of inconsistency that would be required for AOS diagnosis, rendering the criterion functionally unusable for clinicians. To complicate matters, studies began reporting contradictory results based on a number of additional ways to define error consistency (for reviews, see Haley et al., 2018; Shuster & Wambaugh, 2008). The superficially contradictory results added to the confusion and sense of controversy, even though the observed phenomena were mostly compatible (Haley et al., 2018). In reaction, research laboratories varied extensively in the degree to which they adopted the recommended criterion of error consistency as a sign of AOS, and most practicing clinicians ignored it altogether, continuing instead to use limited error consistency as a distinguishing feature for AOS (Molloy & Jagoe, 2019).

To our knowledge, only five small group studies have so far compared consistency on repeated word production in AOS to a control group of people with aphasia and no AOS. In all these studies, error consistency was defined based on repeated productions of a defined set of words that were reasonably challenging to produce. As we discuss the studies, we will use the term error consistency as a single dimension of intraspeaker variance (rather than mixing the terms variability and consistency). We do this to reduce the unnecessary confusion that can occur when synthesizing reports from different laboratories (Haley et al., 2018). Table 1 summarizes basic elements of the five studies and their reported findings. Four of the studies (#1, 3, 4, and 5 in the table) concluded that the two diagnostic groups did not differ on error consistency measures, that speech errors were less consistent in AOS than in aphasia without AOS, or that consistency differed for error location and error type (Bislick, McNeil, Spencer, Yorkston, & Kendall, 2017; Haley et al., 2013; Miller, 1992; Scholl, McCabe, Heard, & Ballard, 2017). Only one of the studies (#2; McNeil, Odell, Miller, & Hunter, 1995) concluded that speech sound errors were more consistent in location and type for speakers with AOS than for speakers with aphasia without AOS.

Table 1.

Results of previous group comparisons of error consistency between participants with AOS and participants without AOS.

| Study | # Participants with AOS : # without AOS | Consistency of error location | Consistency of error type | Consistency of word production |
|---|---|---|---|---|
| 1. Miller, 1992 | 10:6 | NA | NA | No difference |
| 2. McNeil et al., 1995 | 4:5 | AOS more consistent | AOS more consistent | NA |
| 3. Haley et al., 2013 | 9:11 | No difference | No difference | No difference |
| 4. Bislick et al., 2017 | 10:10 | No difference | AOS less consistent | NA |
| 5. Scholl et al., 2017 | 20:21 | AOS more consistent | AOS less consistent | NA |

Note. NA = Not applicable

McNeil and colleagues (1995) compared four participants with a clinical diagnosis of AOS and mild or no aphasia to five participants with a clinical diagnosis of conduction aphasia and phonemic paraphasia. The participants’ task was to repeat ten words, 2–5 syllables in length, three times sequentially (e.g., “butterfly, butterfly, butterfly”). Consistency of error location and type were calculated at the speech sound level based on ratio metrics that were developed by the authors. Though the distributions of scores overlapped between groups for consistency of error type, higher group means on both metrics were taken as evidence of relatively greater consistency for the AOS speakers. No inferential statistics were used because the sample was small. A unique feature of the study was its specific focus on people with minimal or no clinically evident language impairment. It was potentially this relative “purity” that gave the study its consequential impact in reversing the prevailing impression that low error consistency was a sign of AOS.

Other research teams focused on speakers with coexisting AOS and aphasia, the primary motivation being that the vast majority of people with AOS also have some degree of aphasia. Miller (1992) compared a group of ten speakers with aphasia who were diagnosed as having speech dyspraxia to six who were diagnosed as having phonemic paraphasia. Participants repeated five words, 1–3 syllables in length, five times in a row. A word-level consistency analysis (similar to the Consistency of Production for Words metric, CPR-w, that we will introduce in this study) showed limited error consistency in both groups, with no statistically significant magnitude differences and a wide range of individual consistency scores. Two decades later, our team (Haley et al., 2013) compared nine participants with a profile indicative of aphasia with AOS to eleven with a profile indicative of aphasia with phonemic paraphasia. The study also included six participants who made only occasional sound errors, and another six whose presentation displayed features of AOS less clearly. The participants’ task was to repeat five multisyllabic words five times in a row. Results showed that syllable-level error consistency in multisyllabic words was greater for the group who made minimal sound errors than for all other groups, but that the other groups did not differ from each other (see Table 1). In the same study, we included complementary comparisons between participants with the strongest converging evidence for AOS or conduction aphasia with phonemic paraphasia. Results indicated that the AOS speakers were descriptively less, rather than more, consistent in error type for both syllables and words, while there was no difference for consistency of syllable error location.

Finally, two recent studies compared participants with aphasia and AOS to participants with aphasia and no AOS, with the latter group potentially including a combination of participants with aphasia plus phonemic paraphasia and aphasia plus minimal sound production errors. Bislick and colleagues (2017) examined ten participants from each group, with diagnoses based on clinical consensus. The task was to repeat 15 multisyllabic words five times in a row. Results showed that the speakers with AOS had significantly lower consistency of error type than the speakers with aphasia, while there was no group difference for consistency of error location. In a slightly larger study that used a similar diagnostic and grouping strategy, Scholl and colleagues compared 20 participants with aphasia and AOS to 21 with aphasia and no AOS (Scholl et al., 2017). The speech sample consisted of 10 multisyllabic words repeated three times in a row. As in the Bislick study, the speakers with AOS had significantly lower consistency of segment error type but, as in the McNeil study, consistency of error location was greater in speakers with AOS.

Though methodological differences complicate comparisons across laboratories, there is converging evidence that error consistency is at least not pathognomonic of AOS. Instead, there appears to be a relatively broad range of values within this diagnostic group and a similarly broad range for speakers with aphasia and no AOS, including the subset who produce prominent phonemic paraphasic errors. In light of the substantial variance among speakers, a much larger sample size is necessary to determine whether speakers with aphasia and AOS, as a group, are relatively more or less consistent than speakers with aphasia and phonemic paraphasia. In such a comparison, it is important to control for systematic differences in error frequency (severity) and to ensure that the comparison groups are formed validly and transparently.

The complicating problem of error frequency

Error frequency can easily confound comparisons of speech consistency between speaker groups. As mentioned, groups are guaranteed to be unbalanced when participants with minimal speech sound errors are included in the aphasia-without-AOS group while participants in the AOS group produce errors at prominent frequencies. Even considering only those who do produce errors, studies generally report that stroke survivors who have aphasia without AOS display lower overall error frequency than speakers who have AOS (Cunningham, Haley, & Jacks, 2016; Haley et al., 2013; Haley, Jacks, de Riesthal, Abou-Khalil, & Roth, 2012; McNeil et al., 1995). Because averages and variance estimates are interdependent, these systematic differences cannot be ignored.

The specific nature of the relationship between frequency and variance depends on the metric and system under investigation. For some consistency measures in the AOS literature, the relationship between error frequency and error consistency is predictable based on clinical experience. For example, people with the most severe AOS appear to have access to a restricted phonetic output repertoire (Duffy, 2013) and therefore have fewer degrees of freedom to generate sound errors. For other consistency metrics, relationships to error frequency are integral to measurement definitions. Of particular relevance to the present study, a ceiling effect is predictable for metrics expressing consistency of error location, in that complete consistency is guaranteed if all segments in the target word are produced incorrectly (Haley et al., 2013). On the other end of the severity spectrum, speakers who produce very few errors do not generate many consecutive errors and are therefore overall consistent in their output, but have limited opportunity to demonstrate either high or low consistency of their errors in a reliable manner. For all these reasons, any group comparison of error consistency must evaluate the possibility that study results could be explained by error frequency.

Valid and transparent AOS diagnosis

So far, we have addressed basic challenges in defining the comparison group of speakers with aphasia, but we have avoided the more persistent dilemma of diagnosing AOS. It is a fundamental problem of virtually all AOS research that syndrome diagnosis has relied on clinical impression instead of measurement and on mental checklists of criteria that years later were modified or deemed invalid. In the absence of quantified behavior, there is no way of knowing whether participants diagnosed by one research team would be diagnosed similarly by another team. Consequently, integration of results across studies is perilous and, given the very small sample sizes in the research area to date, basic questions about the syndrome remain unanswered. This, of course, includes the question of whether errors are more or less consistent in AOS than in aphasia without AOS.

Because the diagnostic definition of AOS is that of a multidimensional behavioral syndrome, it is most appropriate to base diagnosis on speech features that represent its main dimensions. In recent research that has expressed core diagnostic features of AOS quantitatively and with increasingly large sample sizes, we have learned that profiles are far from dichotomous. Whereas some cases appear to be prototypical examples of the AOS or aphasia with phonemic paraphasia syndromes, others display a mixed pattern (Haley et al., 2017; Haley, Smith, & Wambaugh, 2019). The possibility of clinically relevant subtypes has been raised previously, including a recent differentiation among primarily phonetic, prosodic, and mixed subtypes of progressive AOS (Josephs et al., 2013; Utianski et al., 2018). Future work is needed to better delineate the heterogeneity of speech profiles that also occurs in stroke. In the present study, we used converging quantitative evidence of phonetic distortion and abnormal temporal prosody as the basis for diagnosis. We were able to evaluate a sample size that was unusually large for the area of study, and therefore to analyze separately the most prototypical performance profiles as well as those that were less clear, yet representative of clinical variation.

Purpose

The purpose of the study was to determine whether speakers with aphasia and AOS are more or less consistent than speakers with aphasia and no AOS when they repeat multisyllabic words sequentially. We suspected that error frequency would be an important confounding factor. Therefore, our research question was: Do speakers with aphasia and AOS differ in error consistency from speakers with aphasia and no AOS when comparisons are corrected for differences in overall error frequency?

Method

The study was approved by the institutional review boards of the collaborating universities. All participants provided signed informed consent.

Participants

We analyzed speech recordings from 171 people diagnosed with AOS and/or aphasia after an acquired focal cerebral lesion. The participants were native speakers of American English and reported no history of speech or language impairment or progressive neurologic disease. Etiology was stroke for 166 cases (97.1%), focal traumatic brain injury for three (1.8%), multiple sclerosis for one (0.6%), and radiation necrosis for one (0.6%). Time post onset ranged from one to 540 months (M = 54.7, SD = 67.9) and age ranged from 19 to 88 years (M = 60.0, SD = 12.5). There were more male (n = 108; 63.2%) than female (n = 63; 36.8%) participants. Thirty-four (19.9%) identified as African American, five (2.9%) as Hispanic/Latino, two (1.2%) as Asian American, one (0.6%) as multiple race, one (0.6%) as American Indian/Alaska Native, and one (0.6%) did not report race/ethnicity. The remaining 127 (74.3%) identified as non-Hispanic European American.

Aphasia severity was estimated with the Western Aphasia Battery-Revised (WAB-R; Kertesz, 2006). The Aphasia Quotient ranged from 17.0 to 99.8 (M = 72.7, SD = 21.8) for the full sample. The first and third authors verified perceptually that no speaker presented with significant or primary dysarthria. We did not attempt to specifically rule out mild unilateral upper motor neuron dysarthria, since it is unclear how to complete this differentiation auditorily in the setting of coexisting AOS.

Perceptual and acoustic metrics for group formation

A word repetition task was completed as part of a motor speech evaluation (see Haley et al., 2017 for a listing of the target words). The task was audio-recorded and target words were saved as individual files. Narrow phonetic transcription of this sample was completed by two phonetically trained observers who entered computer-readable phonetic characters (Vitevitch & Luce, 2004) on spreadsheets after listening to each word via circum-aural headphones and simultaneously observing waveform and spectrographic displays in Praat (Boersma & Weenink, 2017). The first production of each word was transcribed, disregarding partial word repetitions of sounds and syllables. This focus is consistent with previous studies (Bislick et al., 2017; Haley et al., 2013; McNeil et al., 1995; Miller, 1992) and helps minimize underestimation of error consistency for the AOS participants (who are most likely to produce additional syllables as a form of part-word repetition). Normal allophonic or dialectal variation was purposely transcribed identically to the target to ensure that it was not counted as an error. When participants self-corrected or repeated their productions, the transcribers coded the first attempt that had the correct number of syllables or, if no attempt had the correct number of syllables, the first attempt that had the closest number of syllables. Diacritic marks were used to indicate up to 11 of the most common distortion types expected in left hemisphere stroke survivors (Haley et al., 2019). The proportion of target speech sounds produced with a phonemic error was calculated as the phonemic edit distance ratio using custom code in R (Smith, Cunningham, & Haley, 2019). We also calculated the proportion of speech sound segments coded with one or more phonetic distortion codes. Inter-observer phonetic transcription reliability was satisfactory (r = .929 for phonemic errors and r = .856 for distortion errors on 32 randomly selected participants).
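The custom R code from Smith et al. (2019) is not reproduced here, but the core computation is a standard one. As an illustration only, the following minimal sketch uses base R’s adist (generalized Levenshtein distance) and assumes each phoneme segment is transcribed as a single computer-readable character; the example strings are hypothetical.

```r
# Phonemic edit distance ratio: insertions, deletions, and substitutions
# needed to turn the produced transcription into the target, divided by
# the number of target segments. Assumes one character per phoneme.
phonemic_edit_ratio <- function(produced, target) {
  drop(adist(produced, target)) / nchar(target)
}

phonemic_edit_ratio("gətæstəfi", "kətæstrəfi")  # 2 edits / 10 targets = 0.2
```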

To quantify temporal prosody, we used the Word Syllable Duration (WSD) metric (Haley et al., 2012), measured in 6–10 multisyllabic words from the motor speech evaluation. Each word was produced a single time, and a response was included only if it consisted of at least three syllables. The acoustic word duration was divided by the number of produced syllables to yield the mean syllable duration for each word, thus including both articulation time and intersyllabic pauses. The WSD for each participant was then expressed as the mean duration across all multisyllabic words produced by that speaker. Inter-observer reliability for WSD, calculated on 21 speakers, was excellent (r = .956).
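In code, the WSD definition reduces to a few lines. This sketch uses hypothetical duration and syllable-count inputs:

```r
# Word Syllable Duration (WSD): acoustic word duration divided by the number
# of syllables the speaker actually produced, averaged across that speaker's
# multisyllabic words. Responses under three syllables are excluded, per the
# inclusion rule above.
wsd <- function(dur_ms, n_syll) {
  keep <- n_syll >= 3
  mean(dur_ms[keep] / n_syll[keep])
}

wsd(dur_ms = c(1250, 980, 1600), n_syll = c(4, 3, 5))  # about 320 ms
```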

Transcription and consistency coding for the repeated word productions

In addition to the single word repetition, each participant produced the words artillery, impossibility, catastrophe, and rhinoceros five times in succession. These productions (20 per participant) were the experimental targets for our consistency analyses. Audio recordings of each participant’s repeated word production were transcribed by a third phonetically trained listener, this time using broad transcription. As before, transcriptions were completed in computer-readable phonetic characters with the acoustic waveform and spectrograms available for reference. A template with target productions was provided, allowing the coder to alter productions only when errors were perceived. To estimate inter-observer reliability, a second transcriber completed an independent transcription for the words produced by 28 (16%) of the speakers. Point-to-point exact agreement for individual consonant and vowel segments was 91.9%.

Segment alignment and coding strategies were identical to those described in detail in our previous work (Haley et al., 2018). They were also completed by the same coder (KTC, the second author of the present study). First, we calculated overall error frequency, then two consistency metrics at the segment level (CEL-s and CET-s) and two at the word level (CET-w and CPR-w). Calculation formulas are provided in Appendix 1, and further detail with coding examples is provided in our 2018 study. The metrics were expressed as ratios from 0.0 to 1.0, with 1.0 indicating perfect consistency.

Error frequency.

An error frequency ratio was calculated with the same edit distance and semi-automated procedure we used for the quantified motor speech evaluation (Smith et al., 2019). We expressed it on a scale from 0.0 to 1.0 to indicate the proportion of phoneme segments that was produced incorrectly.

Consistency of error location (CEL-s).

This measure expresses the degree to which errors occur on the same target segment across trials. As in previous studies, errors were considered consistent if the same sound segment was in error on more than 50% of the trials (Bislick et al., 2017; Haley et al., 2018, 2013; McNeil et al., 1995). The percentage of sound segments with consistent error location was calculated by dividing the number of segments consistently in error by the total number of incorrect segments.

Consistency of error type (CET-s and CET-w).

The sound-level metric CET-s was calculated for the subset of segments previously determined to be consistent in error location. It expressed the degree to which the speaker also made the same error across trials and was defined as one minus the number of different errors divided by the number of errors for a given error location (Bislick et al., 2017; Haley et al., 2018; McNeil et al., 1995). The word-level metric CET-w was defined as one minus the error token variability—a ratio between frequency of erroneous word production variants and erroneous word attempts (Haley et al., 2018, 2013; Marquardt, Jacks, & Davis, 2004).

Consistency of production (CPR-w).

This word-level measure was calculated to express sequential consistency for the entire word by taking into consideration accurate productions as well as erroneous productions. It was defined as one minus the total token variability—a ratio between frequency of word production variants and word attempts (Haley et al., 2018, 2013; Marquardt et al., 2004; Miller, 1992).
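To make the four definitions concrete, the following sketch implements the Appendix 1 formulas in R. It is a simplification under two assumptions: the attempts have already been aligned to the target segment grid (the difficult step, as discussed later), and CET-s is taken as the mean of 1 − (different errors/errors) over the consistently located segments. The example data are hypothetical.

```r
# Word-level metrics operate on whole-word transcriptions (one string per trial).
cpr_w <- function(attempts) {                  # 1 - total token variability
  g <- length(attempts)                        # words produced
  f <- length(unique(attempts))                # word production variants
  1 - (f - 1) / (g - 1)
}

cet_w <- function(attempts, target) {          # 1 - error token variability
  wrong <- attempts[attempts != target]
  e <- length(wrong)                           # incorrect words
  if (e < 2) return(NA)                        # too few errors to evaluate
  d <- length(unique(wrong))                   # incorrect word variants
  1 - (d - 1) / (e - 1)
}

# Segment-level metrics assume attempts are already aligned to the target:
# seg is a trials-by-segments character matrix of produced phones.
cel_s <- function(seg, target) {
  err <- t(t(seg) != target)                   # TRUE where a segment is in error
  in_error   <- colSums(err) > 0
  consistent <- colSums(err) > nrow(seg) / 2   # in error on >50% of trials
  if (!any(in_error)) return(NA)
  sum(consistent) / sum(in_error)
}

cet_s <- function(seg, target) {
  err <- t(t(seg) != target)
  loc <- which(colSums(err) > nrow(seg) / 2)   # consistently located errors only
  if (length(loc) == 0) return(NA)
  mean(sapply(loc, function(j) {
    errs <- seg[err[, j], j]                   # erroneous productions at location j
    1 - length(unique(errs)) / length(errs)    # 1 - (different errors / errors)
  }))
}

attempts <- c("kətæstrəfi", "gətæstrəfi", "gətæstwəfi",
              "kətæstrəfi", "gətæstrəfi")
cpr_w(attempts)                # 3 variants over 5 attempts: 1 - 2/4 = 0.5
cet_w(attempts, "kətæstrəfi")  # 2 error variants over 3 errors: 1 - 1/2 = 0.5

seg    <- do.call(rbind, strsplit(attempts, ""))
target <- strsplit("kətæstrəfi", "")[[1]]
cel_s(seg, target)             # 1 of 2 error locations is consistent: 0.5
cet_s(seg, target)             # /k/ -> /g/ on all 3 errored trials: ~0.67
```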

Four diagnostic groups

In preparation for the error consistency analyses, we trimmed the sample by eliminating speakers who produced no errors on the repeated words task (n = 25) and speakers whose output was deemed too limited or severe to align reliably with the target word (n = 12). The remaining 137 speakers were divided into four comparison groups based on quantitative criteria for temporal prosody and distortion frequency. The median word syllable duration (301.6 ms) and median frequency of sound distortions (8.9%) were used to divide the sample into four quadrants. Two were of particular interest: long syllable duration with high distortion rates, which was considered indicative of apraxia of speech (“AOS”, n = 38), and normal syllable duration with low distortion rates, which we considered indicative of aphasia without AOS (“APH”, n = 37). Participants in the two remaining quadrants were assigned to two “borderline” diagnostic categories: long syllable duration with low distortion rates (“BORD1”, n = 31) and normal syllable duration with high distortion rates (“BORD2”, n = 31).

Average scores for the perceptual and acoustic metrics from the motor speech evaluation and the WAB-R are presented in Table 2 for each of the four diagnostic groups. As ensured by the grouping strategy, the groups differed in WSD and frequency of distortion errors. Underscoring the challenge of inherent severity differences between comparison groups, there were also differences in the frequency of phonemic errors: the AOS group produced 25.4% of target segments incorrectly, compared to only 11.9% for the APH group, with the two borderline groups showing intermediate frequencies of 13.4% and 16.8%.

Table 2.

Demographic information and results of speech and language testing for participants within each of the four diagnostic groups, including those with a profile most consistent with apraxia of speech (AOS), aphasia without AOS (APH), and a borderline diagnostic profile (BORD1 and BORD2). Values are means, with standard deviations in parentheses.

| Variable | AOS (n = 38) | BORD1 (n = 31) | BORD2 (n = 31) | APH (n = 37) | ANOVA |
|---|---|---|---|---|---|
| Sex (M:F) | 19:19 | 20:11 | 19:12 | 31:6 | |
| Age (years) | 60.3 (13.3) | 57.1 (14.6) | 59.4 (11.3) | 61.4 (12.9) | F(3,133) = 0.67, p = .569 |
| Months post-onset | 46.0 (40.8) | 56.4 (62.8) | 40.6 (36.6) | 51.1 (109.7) | F(3,133) = 0.29, p = .830 |
| WAB-R AQ (/100) | 65.6 (19.9) | 73.3 (19.4) | 71.8 (20.3) | 76.5 (18.2) | F(3,133) = 2.04, p = .111 |
| WSD (ms) | 415.6 (105.3) | 375.7 (50.2) | 253.6 (34.1) | 259.1 (29.1) | F(3,133) = 57.06, p < .001 |
| Distortion errors (%) | 17.2 (11.2) | 4.3 (2.5) | 15.1 (12.5) | 4.3 (2.3) | F(3,133) = 22.73, p < .001 |
| Phonemic errors (%) | 25.4 (18.5) | 13.4 (14.7) | 16.8 (19.5) | 11.9 (16.2) | F(3,133) = 4.45, p = .005 |

Note. AOS = Apraxia of Speech (WSD > 301.6 ms, DIST > 8.6%); BORD1 = long syllable duration, low distortion (WSD > 301.6 ms, DIST < 8.6%); BORD2 = normal syllable duration, high distortion (WSD < 301.6 ms, DIST > 8.6%); APH = Aphasia without AOS (WSD < 301.6 ms, DIST < 8.6%); WAB-R AQ = Aphasia Quotient from the Western Aphasia Battery-Revised (Kertesz, 2006); WSD = Word Syllable Duration; DIST = frequency of distortion errors.

Even after eliminating the speakers who produced no errors on the repeated words, error consistency could not be calculated for some speakers and metrics, due to an inadequate number of errors. For this reason, the number of datapoints varied somewhat across the four metrics. Across all 137 participants, it was not possible to calculate CEL-s for one participant, CET-s for 21 participants, and CET-w for 12 participants. It was possible to calculate CPR-w for all participants.

Analysis plan

Due to the anticipated effect of error frequency, differences were evaluated with ANCOVA, using group as a categorical predictor and error frequency as a continuous predictor. For variables with significant group effects, we completed pairwise group comparisons with correction for multiple comparisons using the Šidák method. Analyses were completed in SPSS Statistics 26.0 (IBM Corp., Armonk, NY).
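The analyses were run in SPSS; an equivalent specification in R (a sketch with hypothetical column names consistency, err_freq, and group in a data frame df) might look like the following. Note that aov uses sequential sums of squares with the covariate entered first, whereas SPSS defaults to Type III, so F values can differ slightly.

```r
library(emmeans)  # for covariate-adjusted, Sidak-corrected pairwise contrasts

# df: one row per speaker, with a consistency score, an error frequency
# ratio, and diagnostic group (AOS, BORD1, BORD2, APH) coded as a factor.
fit <- aov(consistency ~ err_freq + group, data = df)
summary(fit)                          # F tests for the covariate and group

emm <- emmeans(fit, ~ group)          # group means adjusted for err_freq
pairs(emm, adjust = "sidak")          # pairwise group comparisons
```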

Results

In all, we examined error frequency and consistency based on 2,740 target word productions (137 speakers × 4 target words × 5 repetitions) and 26,715 target phonemes (8–12 per word). The mean proportion of incorrectly produced phonemes on the experimental targets was .340 for the AOS group, .225 and .249 for the two borderline groups (BORD1 and BORD2, respectively), and .201 for the APH group.

Our prediction that error frequency would correlate with error consistency was verified. As illustrated in Figure 1, the nature of the relationship differed across consistency metrics. Error frequency showed a positive Pearson correlation with consistency of error location for segments (r = .622 for CEL-s), limited correlation with consistency of error type (r = .136 for CET-s), and a negative correlation with both word-level consistency measures (r = −.380 for CET-w; r = −.689 for CPR-w).

Figure 1. Relationship between segmental error frequency ratio and error consistency. Results for analyses at the segment level are on the top (CEL-s = Consistency of error location for segments; CET-s = Consistency of error type for segments) and results for analyses at the word level are on the bottom (CET-w = Consistency of error type for words; CPR-w = Consistency of production for words).

Group comparisons are presented in Figure 2. We report results of the ANCOVA analyses first for the two segment-level analyses (CEL-s and CET-s; top of figure 2) then for the two word-level analyses (CET-w and CPR-w; bottom of figure 2).

Figure 2. Box and whisker plots illustrating differences in consistency across the four comparison groups (AOS = apraxia of speech profile; BORD1 and BORD2 = borderline profiles; APH = aphasia without AOS). Results for analyses at the segment level are on the top (CEL-s = Consistency of error location for segments; CET-s = Consistency of error type for segments) and results for analyses at the word level are on the bottom (CET-w = Consistency of error type for words; CPR-w = Consistency of production for words). Each plot shows the median line, with the box extending from the first to the third quartile. X indicates the mean. The whiskers extend to the lowest and highest data points within 1.5 times the interquartile range.

Segment-level analyses:

Error frequency was a strong predictor of CEL-s, F(1,131) = 87.92, p < .001, with higher error frequencies predicting greater consistency of error location. CEL-s also varied by group, F(3,131) = 2.81, p = .042. Pairwise comparisons showed significantly lower consistency for the AOS group compared to the BORD2 group (p = .038), but no other differences (ps > .73). For consistency of error type, CET-s, there was no effect of error frequency, F(1,111) = 1.13, p = .289, or group, F(3,111) = 1.86, p = .140.

Word-level analyses:

Error frequency was a strong predictor of consistency of error type for words, CET-w, F(1,119) = 16.34, p < .001, but the group factor did not reach significance, F(3,119) = 2.23, p = .088. Error frequency was also a strong predictor of consistency of production for words, CPR-w, F(1,132) = 111.47, p < .001. For this measure, the group effect was also highly significant, F(3,132) = 6.93, p < .001. Pairwise comparisons showed that the AOS and BORD1 groups were less consistent than the APH group (ps = .001); the BORD2 group did not differ significantly from the other groups.

Reliability of the consistency measures:

Having demonstrated strong reliability of the phonetic transcription, we thought it was also important to estimate reliability of the derived consistency metrics, since the two can diverge. First, we compared the metrics derived from the primary phonetic transcriber to those derived from the secondary phonetic transcriber. We did this for the same 28 speakers we used to evaluate inter-observer transcription reliability. This time, we used the separate transcriptions but compared the derived consistency metrics through intraclass correlation (ICC). We used a single-rater, absolute-agreement, two-way mixed-effects model with 95% confidence intervals, calculated in SPSS Statistics 26.0 (IBM Corp., Armonk, NY). Results are presented in Table 3. Based on published recommendations (Koo & Li, 2016), the intraclass correlation indicated excellent reliability for error frequency (ICC = .979). Surprisingly, reliability for the derived consistency measures was poor for both of the sound-level measures (ICC = .194 and .393 for CEL-s and CET-s, respectively) and for consistency of error type at the word level (ICC = .448 for CET-w). Only the whole word production consistency measure indicated good reliability (ICC = .877 for CPR-w).

Table 3.

Measurement reliability of consistency measures derived from independent phonetic transcriptions coded by the same observer on two separate occasions. Reliability is expressed as the intraclass correlation based on the same 28 participants for whom we estimated inter-observer point-to-point agreement at the phoneme level (91.9%).

| Variable | Intraclass correlation | 95% CI lower | 95% CI upper | F (test vs. 0) | df1 | df2 | p |
|---|---|---|---|---|---|---|---|
| Error frequency | .979 | .955 | .990 | 94.23 | 27 | 27 | <.001 |
| CEL-s | .194 | −.236 | .553 | 1.46 | 23 | 23 | .184 |
| CET-s | .393 | −.049 | .702 | 2.24 | 20 | 20 | .039 |
| CET-w | .448 | .069 | .715 | 2.63 | 23 | 23 | .012 |
| CPR-w | .877 | .752 | .941 | 14.80 | 27 | 27 | <.001 |

Note. Intraclass estimations were based on a single-rater, absolute-agreement, 2-way mixed-effects model
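For readers working outside SPSS, the same ICC specification (single rater, absolute agreement, two-way model; McGraw and Wong's ICC(A,1), whose point estimate is the same for the mixed- and random-effects versions) can be obtained in R, for example with the irr package. The per-participant vectors below are hypothetical placeholders.

```r
library(irr)

# ratings: one row per participant, one column per transcription source,
# e.g., CEL-s derived from the primary and the reliability transcription.
ratings <- cbind(primary = cel_primary, secondary = cel_secondary)

icc(ratings, model = "twoway", type = "agreement", unit = "single")
```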

Alignment of production attempts and coding of error location and type are complex processes that require diligence on the part of the coder. This is particularly true for speakers who produce numerous errors, as many of our speakers did. Our coder, KTC, was highly experienced, having completed similar error consistency coding in our two previous studies (Haley et al., 2013; 2018). Nevertheless, some degree of error is to be expected, particularly given the volume of coding that was necessary for this study. To evaluate to what extent the limited agreement might be explained by the coding process itself, KTC realigned and then recalculated each of the consistency metrics based on the transcriptions that were originally coded for the 28 speakers. A comparison between the original and the recoded scores was used to estimate intra-coder reliability. As shown in Table 4, intra-coder reliability was good at the segment level (ICC = .780 for CEL-s and .653 for CET-s) and excellent at the word level (ICC = .902 for CET-w and .987 for CPR-w).

Table 4.

Intra-observer reliability of consistency measures derived from independent coding of the same phonetic transcript on two separate occasions. Reliability is expressed as the intraclass correlation based on the same 28 participants for whom we estimated inter-observer point-to-point agreement at the phoneme level (91.9%) and measurement reliability (Table 3).

| Variable | Intraclass correlation | 95% CI lower | 95% CI upper | F (test vs. 0) | df1 | df2 | p |
|---|---|---|---|---|---|---|---|
| CEL-s | .780 | .556 | .899 | 8.09 | 23 | 23 | <.001 |
| CET-s | .653 | .337 | .836 | 4.76 | 22 | 22 | <.001 |
| CET-w | .902 | .788 | .957 | 19.49 | 23 | 23 | <.001 |
| CPR-w | .987 | .973 | .994 | 155.54 | 27 | 27 | <.001 |

Note. Intraclass estimations were based on a single-rater, absolute-agreement, 2-way mixed-effects model

Complementary comparisons

Our diagnostic groups were formed based on two-dimensional quantitative evidence that indicated the presence or absence of AOS: slow production of multisyllabic words and high frequency of segmental distortion errors. There was likely additional relevant heterogeneity within this classification. For example, the relatively lower error consistency in the quantitatively defined AOS group could potentially be explained if a minority of speakers in the APH comparison group presented with prominent frequencies of phonemic paraphasia and if the correction for error frequency masked qualitative differences between prototypical AOS and prototypical aphasia with phonemic paraphasia. It is also possible that speakers with relatively pure AOS would demonstrate different speech features than the vast majority of speakers with coexisting AOS and aphasia who were the focus of the study. For this reason, we inspected results for two small subgroups within the broader AOS and APH groups and compared each to a comparison group matched on error frequency. Results are presented in Table 5.

Table 5.

Complementary comparisons for five speakers with phonemic paraphasia, all selected from the APH group, and five speakers from the AOS group matched for proportion of phoneme errors on the motor speech evaluation. The bottom two groups are two participants with AOS who scored above the WAB-R cutoff for normal performance and two matched participants with no AOS.

| Group | Profile | WAB-R AQ | Phoneme errors | WSD (ms) | Distortion errors | Error frequency | CEL-s | CET-s | CET-w | CPR-w |
|---|---|---|---|---|---|---|---|---|---|---|
| APP (n = 5) | | | | | | | | | | |
| APH | Wernicke | 76.3 | .54 | 272.9 | .02 | .52 | .93 | .75 | .63 | .63 |
| APH | Conduction | 69.0 | .42 | 287.8 | .04 | .68 | .90 | .77 | .53 | .53 |
| APH | Conduction | 71.2 | .18 | 223.5 | .04 | .62 | .98 | .79 | .75 | .75 |
| APH | Conduction | 59.0 | .14 | 215.3 | .03 | .48 | .88 | .67 | .44 | .44 |
| APH | Wernicke | 36.2 | .14 | 286.7 | .03 | .17 | .91 | .80 | .75 | .88 |
| AOS matched (n = 5) | | | | | | | | | | |
| AOS | Broca’s | 42.5 | .45 | 364.2 | .15 | .66 | .84 | .62 | .36 | .38 |
| AOS | Broca’s | 71.6 | .39 | 463.5 | .33 | .46 | .76 | .49 | .00 | .00 |
| AOS | Broca’s | 70.5 | .39 | 502.7 | .11 | .29 | .75 | .70 | .31 | .31 |
| AOS | Broca’s | 55.1 | .19 | 395.3 | .09 | .33 | .79 | .63 | .20 | .19 |
| AOS | Broca’s | 55.7 | .15 | 388.0 | .16 | .19 | .70 | .67 | .17 | .50 |
| NABW and AOS (n = 2) | | | | | | | | | | |
| AOS | NABW | 97.3 | .16 | 346.06 | .22 | .05 | .00 | NA | NA | .88 |
| AOS | NABW | 96.0 | .15 | 412.99 | .19 | .04 | .00 | NA | .75 | .81 |
| NABW matched (n = 2) | | | | | | | | | | |
| APH | NABW | 94.4 | .10 | 273.7 | .07 | .04 | .86 | .67 | NA | .88 |
| APH | NABW | 95.0 | .07 | 263.0 | .00 | .01 | .00 | NA | NA | .94 |

Note: WAB-R = Western Aphasia Battery-Revised; AQ = Aphasia Quotient; Phoneme errors = proportion of target phonemes produced incorrectly on the motor speech evaluation; WSD = Word Syllable Duration (Haley et al., 2012); Distortion errors = proportion of produced phonemes with a distortion error; Error frequency = proportion of target phonemes produced incorrectly on the repeated word task; CEL-s = Consistency of error location for sound segments; CET-s = Consistency of error type for sound segments; CET-w = Consistency of error type for words; CPR-w = Consistency of production for words; APP = Aphasia with Phonemic Paraphasia; NABW = Not aphasic by the WAB-R (AQ > 93.8); NA = not applicable due to insufficient data to calculate error consistency.

To compare AOS specifically to aphasia with prominent phonemic paraphasia, we extracted a subgroup consisting of APH participants who produced sound errors on 14% or more of the phonemes in the motor speech evaluation target words and displayed a WAB-R profile of either Conduction or Wernicke’s aphasia. There were five of them. We refer to their profile as aphasia with phonemic paraphasia (APP). Next, we matched them on a case-by-case basis to five participants from the AOS group who had a diagnosis of Broca’s aphasia and produced similar frequencies of phonemic errors. As shown, the error consistency scores were either the same or lower for the AOS speakers than for the APP speakers, corroborating the overall study results.

The bottom of Table 5 displays results for two participants from the AOS group who also scored above the 93.8 Aphasia Quotient cutoff for normal performance on the WAB-R (Kertesz, 2006) and produced at least 15% of the target phonemes on the motor speech evaluation in error. After auditory verification of sound files, the first three authors interpreted the profile to be mild AOS in both cases. For comparison purposes, we identified the two speakers from the APH group who produced the greatest proportion of target phonemes incorrectly out of the seven participants in that group who scored above the WAB-R cutoff for aphasia. Because all four speakers produced few errors on the experimental word repetition task (5% or lower), it was only possible to compare CEL-s and CPR-w; on these measures, the participants with minimal aphasia and no AOS were similar to the relatively pure AOS participants.

Discussion

Speech sound errors are inconsistent in both aphasia and AOS

The purpose of our study was to determine whether speakers with aphasia and a quantitatively documented profile of AOS differ in error consistency from speakers with aphasia who do not have AOS. Based on simple inspection of means and medians, the AOS group displayed slightly greater consistency of error location than the APH group and substantially lower consistency of error type at both segment and word levels. However, there was extensive variation among participants and considerable distribution overlap among groups, including between the AOS and APH groups. To determine whether these descriptive differences should be attributed to the diagnostic profiles rather than measurement error or random variation, it is necessary to employ statistical inference with a model that accounts for obvious confounding factors. For our purposes, the most important factor was error frequency, which had a predictable relationship to at least two of the consistency metrics. As is typically the case in studies comparing people with AOS to people with aphasia and no AOS, our groups were not balanced with regard to the severity of their speech output. In comparison to the APH group, the AOS group produced more than twice as many sound errors on the single word motor speech evaluation task and 1.7 times as many errors on the experimental sequential word repetition task. Previous studies that have addressed the topic of error consistency in aphasia and AOS have reported similar imbalance between diagnostic groups, always with greater error frequency in the AOS group than in the comparison group (Bislick et al., 2017; Haley et al., 2013; McNeil et al., 1995; Scholl et al., 2017).

Indeed, ANCOVA showed that error frequency was a significant covariate for three of the four consistency measures (CEL-s, CET-w, and CPR-w). Participants who produced more errors were more consistent in the location of those errors (CEL-s) and less consistent in the word variants they generated (CET-w and CPR-w). After accounting for error frequency, the AOS speakers were less consistent in error location (CEL-s) than the BORD2 group, and the AOS and BORD1 groups were less consistent in overall word production (CPR-w) than the APH group. No other group comparisons reached significance. Therefore, we conclude that there is either a) no consistency difference between aphasia with AOS and aphasia without AOS or b) a difference where errors are relatively less consistent in AOS. The latter conclusion is obviously contrary to the recommendation that diagnosticians consider relatively high error consistency a primary criterion for AOS (McNeil et al., 1997; 2009; Wambaugh et al., 2006). Our complementary inspection of the performance for participants with particularly prominent phonemic paraphasia and no evidence of AOS as well as participants with profiles consistent with AOS without aphasia per the WAB-R yielded the same conclusions: Speakers with AOS show either similar or lower levels of consistency compared to speakers with aphasia and phonemic paraphasia. As we observed, four of the five small group studies that have addressed the question reached the same conclusion (Bislick et al., 2017; Haley et al., 2013; Miller, 1992; Scholl et al., 2017).

The results are also in agreement with other phonetic transcription studies that focused on speakers with AOS and did not include a control group. These reports indicate that people with aphasia and AOS and people with relatively isolated forms of AOS generate varied phonetic output when they are asked to repeat words several times (Haley et al., 2018; LaPointe & Horner, 1976; Mlcoch, Darley, & Noll, 1982; Shuster & Wambaugh, 2008; Staiger et al., 2012). A substantial body of research using physiologic and acoustic methods corroborates the auditory perceptual results by demonstrating token-to-token inconsistency of speech coordination in both time and space (Blumstein, Cooper, Goodglass, Statlender, & Gottlieb, 1980; Haley, Ohde, & Wertz, 2000; Itoh, Sasanuma, Hirose, Yoshioka, & Ushijima, 1980; Katz, Machetanz, Orth, & Schönle, 1990; Liss & Weismer, 1992; Seddoh et al., 1996).

The converging evidence indicates strongly that the recommendation that AOS, in comparison to aphasia without AOS, should be diagnosed on the basis of errors that are “relatively consistent in terms of type and invariable in terms of location” (McNeil et al., 1997; 2009; Wambaugh et al., 2006) was premature and that its removal from diagnostic checklists was appropriate. Instead, it is essential to acknowledge the salient evidence that speech errors are very often inconsistent in speakers with AOS, particularly when whole words are considered. Clinical experience has conveyed this insight from the earliest modern endorsement of AOS as a disorder distinct from aphasia (Darley, 1964) to diagnostic practices by contemporary clinicians (Molloy & Jagoe, 2019). On the other hand, it would be inaccurate to infer that relative inconsistency of errors helps to differentiate between AOS and aphasia with phonemic paraphasia. The feature is instead evident in both profiles and its magnitude can be expected to vary with error frequency and therefore severity.

Derived metrics can be less reliable than the direct measures from which they originate

We were surprised to find that the generation of sound-level consistency measures from different phonetic transcriptions had such limited reliability, when the transcription itself was highly reliable in terms of both point-to-point agreement (91.9%) and error frequency (ICC = .979). As demonstrated by the re-coding of the primary phonetic transcription, some of the limitation can be explained by alignment, coding, and calculation judgments. Whereas intra-observer coding consistency was excellent for the word-level metrics, it was only in the “good” range for the sound-level metrics (ICC = .780 and .653; Koo & Li, 2016). To understand this difference, it is important to remember that the sound-level metrics were originally developed to analyze a small sample of speakers who rarely produced errors that were severe enough to affect listeners’ phonemic perception. McNeil and colleagues (1995) reported that their four speakers with AOS produced 79% of the target words correctly and that the five speakers with conduction aphasia produced 93% of the targets correctly. With such high accuracy, it should not be difficult to align the repeated attempts with each other in such a way that consistency can be determined on a phoneme-by-phoneme basis. The alignment is much more difficult with the higher error frequencies that have characterized our samples of speakers with aphasia and AOS. Consider, for example, the productions /dəkgæstrəfi/ and /dəkæstəfri/ for the target /kətæstrəfi/. It is ambiguous whether the /g/ or /k/ in the first production should be aligned with the consonant onset of the second target syllable. In our previous group study (Haley et al., 2013), we elected to analyze syllables instead of phonemes to reduce these alignment challenges, and for our second report on error consistency (Haley et al., 2018), we developed additional operational definitions to maximize coding reliability at the phoneme level. The same definitions were employed in the present study, but it was evident from the intra-coder reliability results that they did not fully resolve the ambiguity. An additional problem is that the equations are such that even minor differences in alignment decisions could alter the measure substantially. A single phoneme alignment difference could, for example, determine whether phonemes aligned with a particular word position should be considered “consistent in location” and whether it was even possible to calculate consistency of error type.

As indicated by the poor reliability for the sound-level metrics across two very similar phonetic transcriptions, minor differences in phonetic transcription also contributed to measurement differences. For example, the two coders transcribed the first four productions of “impossibility” with exact agreement, but for the fifth the main coder transcribed /ɪmpɑsəbɪli/ and the reliability coder transcribed /ɪmpɑsəbɪləti/. The minor disagreement resulted in CEL-s calculations of 0.74 for the primary coder and 0.22 for the secondary coder. The potential reliability difference between direct and derived measures is often underappreciated in speech-language pathology. In a recent study on prosody, we reported similar limitations for ratios expressing lexical stress based on acoustic measures that, by themselves, were highly reliable (Haley & Jacks, 2019). Error propagation does occur when equations are used to derive new measures (Taylor, 1997), and this measurement complication probably deserves much greater attention in our literature. Similar to our own assumption that direct measurement reliability would be sufficient affirmation of data quality, previous group comparisons of error consistency in AOS and aphasia demonstrated reliability of phonetic transcription but paid little or no attention to the reliability of the derived consistency measures (Bislick et al., 2017; Haley et al., 2018; McNeil et al., 1995).
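The propagation effect is easy to demonstrate by simulation. In the following sketch (illustrative only, with invented parameters), two simulated transcribers agree with the true transcription on roughly 92% of segment judgments, comparable to our observed point-to-point agreement, yet a thresholded ratio in the style of CEL-s typically agrees across transcribers far less well than the direct error frequency does.

```r
set.seed(1)
n_speakers <- 50; n_seg <- 10; n_trials <- 5
slip <- 0.08                                   # ~92% agreement with the truth

cel_like <- function(err) {                    # err: segments x trials, 1 = error
  ever <- rowSums(err) > 0                     # segments in error at least once
  if (!any(ever)) return(NA)
  sum(rowSums(err) > n_trials / 2) / sum(ever) # consistent / total error locations
}

res <- t(sapply(seq_len(n_speakers), function(s) {
  p <- runif(1, 0.1, 0.5)                      # speaker's true segment error rate
  truth <- matrix(rbinom(n_seg * n_trials, 1, p), n_seg)
  obs <- lapply(1:2, function(i)               # two transcribers, independent slips
    abs(truth - matrix(rbinom(n_seg * n_trials, 1, slip), n_seg)))
  c(f1 = mean(obs[[1]]), f2 = mean(obs[[2]]),
    c1 = cel_like(obs[[1]]), c2 = cel_like(obs[[2]]))
}))

cor(res[, "f1"], res[, "f2"])                        # direct measure: high
cor(res[, "c1"], res[, "c2"], use = "complete.obs")  # derived ratio: much lower
```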

It deserves mentioning that one of the consistency measures, CPR-w, displayed excellent coder reliability and good measurement reliability across phonetic transcriptions. In a previous study, we also found that this measure was more strongly correlated with clinicians’ impressions of production consistency than were other word and syllable-level measures (Haley et al., 2013).[1] Whereas consistency of sound error location and type requires transcription and careful coding, consistency of whole word production can be recognized (and potentially quantified) in real time. Of the four measures that have been examined in comparisons between speakers with aphasia and AOS and speakers with aphasia with phonemic paraphasia, CPR-w appears to be the most robust and most informative, even though—as we concluded—it does not facilitate differential diagnosis between the two disorders.

Measurements matter

The study of error inconsistency in aphasia and AOS demonstrates how important it is to use transparent and robust definitions when assessing and diagnosing speech disorders. It has been amply evident, even within the relatively sparse literature on the topic, that vague characterizations promote diverse interpretations and can contribute to an illusion of controversy. As discussed and noted by others, different aspects of speech can be consistent and variable in the same person, simply depending on definition (Haley et al., 2018; Mlcoch et al., 1982; Shuster & Wambaugh, 2008; Staiger et al., 2012), and this in no way implies that results are incompatible. To increase diagnostic precision, clinicians need to operationalize and measure as carefully as possible the constructs they deem to be most meaningful (Darley, 1964).

There is an important difference between listing a criterion on a mental checklist or research report and documenting its presence quantitatively. As we noted at the outset, one of the fundamental problems with the AOS research literature is that behavioral definitions of the syndrome have changed over time. These fluctuations challenge scientific progress, because results are not easily synthesized across studies and the process of replication and extension cannot progress systematically. To increase diagnostic transparency, the objective of research reports should be to disclose the—likely diverse—presentation profiles of study participants, so others can form their own conclusions about grouping decisions and external validity. An all too common scenario is that diagnostic criteria are mentioned but not measured or documented. While this practice is better than not reporting criteria at all, it provides only minimal insight about the diagnostician’s strategy. For example, there is no way to know how criteria like “impaired prosody” or presence of “sound distortions and distorted substitutions” were interpreted by the research team and to what extent such features were actually present in the participants’ speech.

When relative consistency of error type and location is listed as a diagnostic criterion it is possible (even likely) that it was not measured. As we observed, consistency coding at the sound level depends on phonetic transcription, utterance alignment, and calculation of ratios. It is not something clinicians could feasibly accomplish based on impression. The risk is that the criterion was listed because the research team considered it important, but that corresponding features were not present in the actual speech output. As a case in point, participants in the original study by McNeil and colleagues were selected with a diagnostic checklist that was common at the time and included “variability of articulation and prosody on repeated trials for the same utterance” (McNeil et al., 1995; p. 44)—yet this was the one study that found errors to be relatively consistent in both location and type. Because features on checklists for AOS diagnosis are typically vaguely defined and have an undocumented relationship to the speech output, their impact is inherently limited. Though the present study demonstrated that relative consistency of sound error location and type is not a valid diagnostic criterion for AOS, we do not recommend dismissing results of the substantial body of research that, for a period of time, was conducted with this feature as a listed criterion for participant selection. Instead of debating the relative merits of the words on our diagnostic checklists and reconsidering research findings whenever a prevailing list is modified, it is time to invest in more explicit definitions of the evolving criteria and more valid and reliable methods for quantifying them.

Acknowledgments

We appreciate the diligent work of Kathryn Distefano, Jenna Hall, and Michael Smith, whose careful phonetic transcription and coding were essential to the project. We also express gratitude to Elizabeth Lacey and Sarah Grace Dalton, who assisted with participant recruitment and assessment. The project was supported by grants NS092144-01 and DC011881 from the National Institutes of Health and by a Brigham Young University McKay School of Education Research Grant.

Appendix 1.

Formulas for consistency metrics at the segment level (CEL-s and CET-s) and word level (CET-w and CPR-w).

Metric   Formula                      Term definitions
CEL-s    a / b                        a = number of sound errors made consistently; b = number of sound errors
CET-s    1 – (c / a)                  c = number of different error types; a = number of sound errors made consistently
CET-w    1 – ([d – 1] / [e – 1])      d = number of incorrect word variants; e = number of incorrect words
CPR-w    1 – ([f – 1] / [g – 1])      f = number of word variants; g = number of words produced
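
To make the word-level formulas concrete, the following minimal sketch (in Python) shows how CET-w and CPR-w could be computed from phonetic transcriptions of repeated productions of a single target word. The code and the example transcriptions are illustrative assumptions only, not the analysis scripts used in this study; the segment-level metrics (CEL-s and CET-s) are omitted because they additionally require sound-by-sound alignment of each production with the target.

def word_level_consistency(productions, target):
    """Compute CET-w and CPR-w for repeated productions of one target word.

    productions: list of phonetic transcriptions (strings)
    target: correct transcription of the word
    Returns (cet_w, cpr_w); a metric is None when its denominator is zero.
    """
    # CPR-w = 1 - (f - 1) / (g - 1), where f = number of distinct word
    # variants and g = number of words produced.
    g = len(productions)
    f = len(set(productions))
    cpr_w = 1 - (f - 1) / (g - 1) if g > 1 else None

    # CET-w = 1 - (d - 1) / (e - 1), computed over incorrect productions
    # only: d = number of distinct incorrect variants, e = number of
    # incorrect words.
    errors = [p for p in productions if p != target]
    e = len(errors)
    d = len(set(errors))
    cet_w = 1 - (d - 1) / (e - 1) if e > 1 else None

    return cet_w, cpr_w

# Hypothetical example: five repetitions of a multisyllabic target, four
# incorrect, forming three distinct error variants (one error repeats).
target = "kətæstrəfi"
reps = ["kətæstrəfi", "kətræstəfi", "kətræstəfi", "kəstæstrəfi", "təkæstrəfi"]
cet_w, cpr_w = word_level_consistency(reps, target)
print(cet_w)  # 1 - (3 - 1) / (4 - 1) = 0.33 (rounded)
print(cpr_w)  # 1 - (4 - 1) / (5 - 1) = 0.25

Under these formulas, both metrics approach 1 when every incorrect production is the same variant and fall toward 0 as each repetition yields a new variant, which is the sense in which higher values indicate greater consistency.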

Footnotes

1. CPR-w = 1 – TTV, where TTV indicates total token variability as defined by Marquardt and colleagues (2004).

References

  1. Ballard KJ, Wambaugh JL, Duffy JR, Layfield C, Maas E, Mauszycki S, & McNeil MR (2015). Treatment for acquired apraxia of speech: A systematic review of intervention research between 2004 and 2012. American Journal of Speech-Language Pathology, 24(2), 316–337. doi: 10.1044/2015_AJSLP-14-0118
  2. Bislick L, McNeil M, Spencer KA, Yorkston K, & Kendall DL (2017). The nature of error consistency in individuals with acquired apraxia of speech and aphasia. American Journal of Speech-Language Pathology, 26(2S), 611–630. doi: 10.1044/2017_AJSLP-16-0080
  3. Blumstein SE, Cooper WE, Goodglass H, Statlender S, & Gottlieb J (1980). Production deficits in aphasia: A voice-onset time analysis. Brain and Language, 9(2), 153–170. doi: 10.1016/0093-934X(80)90137-6
  4. Boersma P, & Weenink D (2017). Praat: Doing phonetics by computer (Version 6.0.31) [Computer software].
  5. Cunningham KT, Haley KL, & Jacks A (2016). Speech sound distortions in aphasia and apraxia of speech: Reliability and diagnostic significance. Aphasiology, 30(4), 396–413. doi: 10.1080/02687038.2015.1065470
  6. Darley FL (1964). Diagnosis and appraisal of communication disorders. Englewood Cliffs, NJ: Prentice-Hall.
  7. Darley FL, Aronson AE, & Brown JR (1975). Motor speech disorders. Philadelphia, PA: Saunders.
  8. Duffy JR (2013). Motor speech disorders: Substrates, differential diagnosis, and management (3rd ed.). St. Louis, MO: Mosby.
  9. Haley KL, Cunningham KT, Eaton CT, & Jacks A (2018). Error consistency in acquired apraxia of speech with aphasia: Effects of the analysis unit. Journal of Speech, Language, and Hearing Research, 61(2), 210–226. doi: 10.1044/2017_JSLHR-S-16-0381
  10. Haley KL, & Jacks A (2019). Word-level prosodic measures and the differential diagnosis of apraxia of speech. Clinical Linguistics & Phonetics, 33(5), 479–495. doi: 10.1080/02699206.2018.1550813
  11. Haley KL, Jacks A, & Cunningham KT (2013). Error variability and the differentiation between apraxia of speech and aphasia with phonemic paraphasia. Journal of Speech, Language, and Hearing Research, 56(3), 891–905. doi: 10.1044/1092-4388(2012/12-0161)
  12. Haley KL, Jacks A, de Riesthal M, Abou-Khalil R, & Roth HL (2012). Toward a quantitative basis for assessment and diagnosis of apraxia of speech. Journal of Speech, Language, and Hearing Research, 55(5), S1502–S1517. doi: 10.1044/1092-4388(2012/11-0318)
  13. Haley KL, Jacks A, Richardson JD, & Wambaugh JL (2017). Perceptually salient sound distortions and apraxia of speech: A performance continuum. American Journal of Speech-Language Pathology, 26(2S), 631–640. doi: 10.1044/2017_AJSLP-16-0103
  14. Haley KL, Ohde RN, & Wertz RT (2000). Precision of fricative production in aphasia and apraxia of speech: A perceptual and acoustic study. Aphasiology, 14(5–6), 619–634. doi: 10.1080/026870300401351
  15. Haley KL, Smith M, & Wambaugh JL (2019). Sound distortion errors in aphasia with apraxia of speech. American Journal of Speech-Language Pathology, 28(1), 121–135. doi: 10.1044/2018_AJSLP-17-0186
  16. IBM Corp. (2019). IBM SPSS Statistics for Windows (Version 26.0) [Computer software]. Armonk, NY: IBM Corp.
  17. Itoh M, Sasanuma S, Hirose H, Yoshioka H, & Ushijima T (1980). Abnormal articulatory dynamics in a patient with apraxia of speech: X-ray microbeam observation. Brain and Language, 11(1), 66–75. doi: 10.1016/0093-934X(80)90110-8
  18. Josephs KA, Duffy JR, Strand EA, Machulda MM, Senjem ML, Lowe VJ, … Whitwell JL (2013). Syndromes dominated by apraxia of speech show distinct characteristics from agrammatic PPA. Neurology, 81(4), 337–345. doi: 10.1212/WNL.0b013e31829c5ed5
  19. Katz W, Machetanz J, Orth U, & Schönle P (1990). A kinematic analysis of anticipatory coarticulation in the speech of anterior aphasic subjects using electromagnetic articulography. Brain and Language, 38(4), 555–575. doi: 10.1016/0093-934X(90)90137-6
  20. Kertesz A (2006). Western Aphasia Battery-Revised. San Antonio, TX: Pearson.
  21. Koo TK, & Li MY (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. doi: 10.1016/j.jcm.2016.02.012
  22. LaPointe LL, & Horner J (1976). Repeated trials of words by patients with neurogenic phonological selection-sequencing impairment (apraxia of speech). Clinical Aphasiology, 6, 261–277.
  23. Liss JM, & Weismer G (1992). Qualitative acoustic analysis in the study of motor speech disorders. The Journal of the Acoustical Society of America, 92(5), 2984–2987. doi: 10.1121/1.404364
  24. Marquardt TP, Jacks A, & Davis BL (2004). Token-to-token variability in developmental apraxia of speech: Three longitudinal case studies. Clinical Linguistics & Phonetics, 18(2), 127–144. doi: 10.1080/02699200310001615050
  25. McNeil M, Odell KH, Miller SB, & Hunter L (1995). Consistency, variability, and target approximation for successive speech repetitions among apraxic, conduction aphasic, and ataxic dysarthric speakers. Clinical Aphasiology, 23, 39–55.
  26. McNeil MR, Robin DA, & Schmidt RA (1997). Apraxia of speech: Definition, differentiation, and treatment. In McNeil M (Ed.), Clinical management of sensorimotor speech disorders (pp. 311–344). New York: Thieme.
  27. McNeil M, Robin D, & Schmidt R (2009). Apraxia of speech: Definition, differentiation, and treatment. In McNeil MR (Ed.), Clinical management of sensorimotor speech disorders (2nd ed., pp. 249–268). New York: Thieme.
  28. Miller N (1992). Variability in speech dyspraxia. Clinical Linguistics & Phonetics, 6(1–2), 77–85. doi: 10.3109/02699209208985520
  29. Miller N, & Wambaugh J (2017). Acquired apraxia of speech. In Papathanasiou I & Coppens P (Eds.), Aphasia and related neurogenic communication disorders (pp. 493–526). Burlington, MA: Jones & Bartlett Learning.
  30. Mlcoch AG, Darley FL, & Noll JD (1982). Articulatory consistency and variability in apraxia of speech. In Brookshire RH (Ed.), Clinical Aphasiology Conference Proceedings (pp. 235–238). Minneapolis, MN: BRK.
  31. Molloy J, & Jagoe C (2019). Use of diverse diagnostic criteria for acquired apraxia of speech: A scoping review. International Journal of Language & Communication Disorders. doi: 10.1111/1460-6984.12494
  32. Odell K, McNeil MR, Rosenbek JC, & Hunter L (1990). Perceptual characteristics of consonant production by apraxic speakers. Journal of Speech and Hearing Disorders, 55(2), 345–359. doi: 10.1044/jshd.5502.345
  33. Scholl DI, McCabe PJ, Heard R, & Ballard KJ (2017). Segmental and prosodic variability on repeated polysyllabic word production in acquired apraxia of speech plus aphasia. Aphasiology, 32(5), 1–20. doi: 10.1080/02687038.2017.1381876
  34. Seddoh SAK, Robin DA, Sim H-S, Hageman C, Moon JB, & Folkins JW (1996). Speech timing in apraxia of speech versus conduction aphasia. Journal of Speech and Hearing Research, 39(3), 590. doi: 10.1044/jshr.3903.590
  35. Shuster LI, & Wambaugh JL (2008). Token-to-token variability in adult apraxia of speech: A perceptual analysis. Aphasiology, 22(6), 655–669. doi: 10.1080/02687030701632161
  36. Smith M, Cunningham KT, & Haley KL (2019). Automating error frequency analysis via the phonemic edit distance ratio. Journal of Speech, Language, and Hearing Research, 62(6), 1719–1723. doi: 10.1044/2019_JSLHR-S-18-0423
  37. Staiger A, Finger-Berg W, Aichert I, & Ziegler W (2012). Error variability in apraxia of speech: A matter of controversy. Journal of Speech, Language, and Hearing Research, 55(5). doi: 10.1044/1092-4388(2012/11-0319)
  38. Taylor JR (1997). An introduction to error analysis (2nd ed.). Sausalito, CA: University Science Books.
  39. Utianski RL, Duffy JR, Clark HM, Strand EA, Botha H, Schwarz CG, … Josephs KA (2018). Prosodic and phonetic subtypes of primary progressive apraxia of speech. Brain and Language, 184, 54–65. doi: 10.1016/j.bandl.2018.06.004
  40. Vitevitch MS, & Luce PA (2004). A web-based interface to calculate phonotactic probability for words and nonwords in English. Behavior Research Methods, Instruments, & Computers, 36(3), 481–487.
  41. Wambaugh JL, Duffy JR, McNeil MR, Robin DA, & Rogers MA (2006). Treatment guidelines for acquired apraxia of speech: A synthesis and evaluation of the evidence. Journal of Medical Speech-Language Pathology, 14(2), 15–34.
  42. Wertz RT, LaPointe LL, & Rosenbek JC (1984). Apraxia of speech in adults: The disorder and its management. Orlando, FL: Grune & Stratton.
