Abstract
Background:
Discourse analyses yield quantitative measures of functional communication in aphasia; however, they are historically underutilized in clinical settings. Confrontation naming assessments are widely used clinically and have been used to estimate discourse-level production. Such work shows that naming accuracy explains moderately high proportions of variance in discourse measures, but substantial variance remains unexplained. We propose that adding circumlocution to predictive models will account for significantly more of this variance: although circumlocutions produced during naming may not contain the target word, they resemble the content that contributes to discourse informativeness and efficiency. Measuring circumlocution in addition to accuracy may therefore improve our ability to estimate discourse performance and functional communication.
Aim:
This study aimed to test whether, after controlling for naming accuracy, adding a measure of circumlocution to predictive models of discourse-level informativeness and efficiency would account for significantly more variance in these outcomes.
Methods & Procedures:
Naming and discourse data from 43 people with poststroke aphasia were analyzed. Naming data were collected using 120 pictured items and discourse data were collected using two picture description prompts. Data scoring and coding yielded measures of naming accuracy, incorrect response type, communicative informativeness, and efficiency. We used robust hierarchical regression to evaluate study predictions.
Outcomes & Results:
After controlling for naming accuracy, the inclusion of circumlocution in predictive models accounted for significantly more variance in both informativeness and efficiency. The subsequent inclusion of other response types, such as real word and nonword errors, did not account for significantly more variance in either outcome.
Conclusions:
In addition to naming accuracy, the production of circumlocution during naming assessments may correspond with measures of informativeness and efficiency at the discourse level. Accounting for circumlocution may thus improve estimates of patients’ functional communication using tools that are easy to administer and interpret.
Keywords: naming, circumlocution, discourse
Introduction
Language serves many functions in human interaction, including the exchange of information and ideas (Armstrong & Ferguson, 2010). In aphasia, the ability to use speech and language to express information and ideas in everyday situations is often evaluated via discourse sample analysis (Armstrong, 2000). Discourse analyses are designed to quantify the microlinguistic (i.e., lexical processing, grammatical formation) and macrolinguistic processes (i.e., pragmatics, cohesion) that occur during connected speech (Andreetta et al., 2012; Armstrong 2000; Linnik et al., 2016). A shift in the last few decades has led to the development of discourse analysis methods that are designed to capture people with aphasia’s everyday, functional use of language (Armstrong, 2000; Holland, 1982; Linnik et al., 2016). These approaches aim to quantify the person’s ability to successfully convey meaning and information despite their language production difficulties (Linnik et al., 2016). Overall, these measures yield scores thought to estimate people with aphasia’s communicative effectiveness during daily life.
In a review of discourse analysis procedures, Bryant and colleagues (2016) identified 536 unique measures reported in the literature, grouped into three main clusters: measures related to language productivity, information content, and grammatical complexity. Of these, language productivity measures (e.g., word finding behaviors, lexical diversity) and information content measures (e.g., semantic/conceptual content, cohesion) were most frequently reported (Bryant et al., 2016). Scores yielded from these measures factor in a multitude of production types—nouns, verbs, story propositions, cohesive ties—and thus estimate language use across multiple linguistic domains; they can be used to estimate the informativeness of the speaker’s verbal communication in daily life (Armstrong & Ferguson, 2010).
Despite discourse analyses’ ability to capture functional language use, they are more frequently used in research than in clinical settings due to clinicians’ valid concerns with practicality and efficiency (Bryant et al., 2016). Clinician-reported barriers to implementing discourse analyses in regular clinical practice include: time required to collect, analyze, and interpret samples; feelings of not possessing the skills/knowledge necessary to accurately perform analyses; and lack of consensus on which types of discourse elicitation approaches offer the most representative measure of everyday language use (Marini et al., 2007; Togher, 2001). A 2004 survey on outcome measurements used by Speech-Language Pathologists (SLPs) in the U.S. and Canada yielded 336 reported outcome measurements—only 3.8% of which were measures of discourse or narrative speech (Simmons-Mackie et al., 2005). Nearly two decades later, in a 2021 survey of SLPs in the U.S., only 1% of respondents reported using discourse analysis as an outcome measurement tool (Tierney-Hendricks et al., 2022). Ultimately, despite their functional diagnostic value, these measures are generally underutilized in clinical settings.
To address this issue, researchers have sought to estimate the relationship between word retrieval at the confrontation naming-level and performance at the discourse-level, with the intention to use naming performance to predict discourse-level performance (for review, see Richardson et al., 2018). Naming assessments are used widely in clinical practice and require less time to score and interpret. Additionally, naming scores are found to correspond with the overall integrity of the language system, and therefore are thought to offer an optimal measure with which to estimate discourse-level production (Fergadiotis & Wright, 2016; Fridriksson et al., 2010; Herbert et al., 2008). Previous work examining the relationship between naming and discourse has used a variety of tasks to measure naming, including the Western Aphasia Battery – Revised (WAB-R; Kertesz, 2007), Boston Naming Test and its Short Form (BNT-2; Kaplan et al., 2001), Northwestern Assessment of Verbs and Sentences Verb Naming Test (Cho-Reyes & Thompson, 2012), Greek Object and Action Test (Kambanaros, 2010), Test of Adolescent and Adult Word Finding (TAWF; German, 1990), and other non-standardized object and action naming tasks (Bastiaanse & Jonkers, 1998; Fergadiotis & Wright, 2016; Herbert et al., 2008; Kambanaros, 2010; Mayer & Murray, 2003; Pashek & Tompkins, 2002). These studies have also used a variety of tasks to measure discourse, including personal narratives, picture descriptions, story retells, video narration, and conversation (Bastiaanse & Jonkers, 1998; Fergadiotis & Wright, 2016; Herbert et al., 2008; Kambanaros, 2010; Mayer & Murray, 2003; Pashek & Tompkins, 2002; Richardson et al., 2018). Across these studies, discourse outcomes of interest included measures of informativeness and efficiency, such as proportion of Correct Information Units (CIUs; Nicholas & Brookshire, 1993) and number of nouns and verbs (Richardson et al., 2018).
Correlations between naming scores and discourse informativeness are generally positive and often strong, but vary from low to high across studies (Richardson et al., 2018). Therefore, while naming performance explains a proportion of the variance in measures of discourse production, a significant proportion of variance remains unexplained (Fergadiotis & Wright, 2016). This may be due to differences in how these two levels of production are currently measured: naming is scored based on accuracy, whereas discourse is often scored based on informativeness.
For instance, CIU analysis, identified as one of the most frequently applied measures of discourse production (Bryant et al., 2016), quantifies how well a person communicates information to listeners regardless of how closely their productions conform to standardized rules and patterns (Armstrong, 2000; Nicholas & Brookshire, 1993). CIUs are defined as words that are “intelligible in context, accurate in relation to the picture(s) or topic, and relevant to and informative about the content of the picture(s) or the topic” (Nicholas & Brookshire, 1993, p. 348). To calculate informativeness and efficiency, the number of CIUs produced is compared against the total number of words produced and amount of speaking time (Nicholas & Brookshire, 1993). Comparing CIUs to the total number of words yields a score of communicative informativeness (%CIUs) and calculating the average number of CIUs produced per minute yields a score of communicative efficiency (CIUs/min) (Nicholas & Brookshire, 1993). To be counted as a CIU and contribute to these measures, a word need not necessarily conform to a predetermined set of standardized noun and verb tokens. Such criteria allow for a wider array of verbal productions to contribute to calculations of informativeness and efficiency. Ultimately, any productions considered relevant and informative about the picture or topic contribute to %CIUs and CIUs/min, thus capturing people with aphasia’s functional and flexible use of language beyond predetermined targets. This approach has proven effective, as CIUs are found to correspond with listeners’ perceptions of speakers’ informativeness and are therefore thought to provide an ecologically valid measure of communicative informativeness (Jacobs, 2001; Webster & Morris, 2019).
Conversely, when naming performance is quantified, responses are typically measured using a dichotomous scale: correct or incorrect. Generally, if the person’s verbal response matches or contains the target or is deemed an appropriate alternative, the response is counted as correct. Responses that do not match or contain the target word are counted as incorrect. Therefore, naming assessments yield a score based on the number of responses that match the assessment’s intended targets, and do not count other informative responses. This procedure is in contrast with response scoring in a discourse analysis method such as CIU analysis, in which productions that serve to convey meaning—which need not necessarily be in the form of an expected target word—are counted towards the overall score. The discordance between the types of verbal productions that are counted towards discourse-level outcomes versus those counted towards naming-level outcomes may account for some of the aforementioned unexplained variance in discourse outcomes. We propose that an expanded measure that encompasses both correct responses and incorrect responses that convey meaning or information about the target at the naming-level may predict communicative informativeness at the discourse-level and account for more of the variance in these outcomes. In other words, the ability to predict informativeness and efficiency of one’s discourse may be improved by quantifying certain “incorrect” productions at the naming-level which, despite being incorrect, still contain accurate, relevant, and informative information about the target.
Some naming assessments provide coding systems that classify “incorrect” responses, such as semantic, phonemic, and mixed paraphasias, circumlocution, neologisms, stereotypies, and perseveration. These classifications may be used to inform clinicians’ overall impressions of the person’s language profile, but, again, are not factored into the overall score. Interestingly, investigations that have analyzed incorrect responses during naming report that changes in the types of incorrect responses that people with aphasia make over time (e.g., in response to treatment) may indicate changes and/or improvements in linguistic activation and processing (Edmonds & Kiran, 2006; Kendall et al., 2013; Kiran & Thompson, 2003; Minkina et al., 2016). Therefore, despite being incorrect in relation to the target, incorrect responses can inform clinicians’ and researchers’ understandings of their clients’ or participants’ language systems. Of particular relevance to this investigation is circumlocution.
Circumlocution is when a person with aphasia “compensates for a word retrieval failure by telling something about the object, in lieu of naming it” (Goodglass & Wingfield, 1997, p. 15). When a person circumlocutes, they are typically describing or talking around the target (e.g., “it has webbed feet, says ‘quack.’”). Error coding ‘circumlocutions’ or ‘descriptions’ identifies responses that provide a characterization of the target or attempt to explain/describe its function, use, or purpose (Nicholas et al., 1989; Roach et al., 1996). Circumlocution can facilitate self-cued naming (Francis et al., 2002) and, importantly, is shown to assist in listener comprehension in moments where target retrieval fails (Antonucci & MacWilliam, 2015; Falconer & Antonucci, 2012; Tompkins et al., 2006). For example, Tompkins and colleagues (2006) found that when people with aphasia produced descriptions of objects’ use, function, and outward characteristics in the absence of target word production, observers were able to correctly guess the intended targets in as many as 70% of instances. One participant’s naming accuracy was ~10%, yet on the strength of their descriptors alone, observers correctly predicted their intended targets in ~70% of instances (Tompkins et al., 2006). To our knowledge, these same benefits to listener comprehension have not been found for paraphasias, neologisms, or other incorrect response types.
Circumlocutions typically contain words that are accurate, relevant, or provide information about targets. Thus, within the context of discourse-level informativeness and efficiency, circumlocution would yield a greater number of words that are accurate, relevant, and informative to the picture/topic. If a person with aphasia uses circumlocution in a moment of word retrieval difficulty during discourse, logically they are producing more total words and accurate/relevant words (i.e., CIUs). An increase in CIUs resulting from circumlocution would result in an increase in measures of informativeness (i.e., a larger proportion of the words are relevant, informative) and efficiency (i.e., more time is spent producing relevant, informative content). Therefore, the production of circumlocutions at the discourse-level can increase scores of informativeness and efficiency. Generally, the same cannot be said for paraphasias, neologisms, stereotypies, etc. In fact, the presence of such productions in discourse is shown to yield lower listener perceptual ratings of clarity, organization, intelligibility, and even competence (Duffy et al., 1980; Harmon et al., 2016; Kong & Wong, 2018). We propose that circumlocution should be considered independently of other incorrect productions, and should be included in attempts to use naming performance to predict discourse-level informativeness and efficiency (Figure 1).
Figure 1.

This figure depicts how different forms of target word production (correct production, absent production, circumlocution production, semantic paraphasia, phonologic paraphasia, etc.) are generally handled in relation to quantifying naming accuracy and CIU discourse measures.
This study aimed to test the prediction that including circumlocution in predictive models of communicative informativeness (%CIUs) and efficiency (CIUs/min) would account for significantly more variance than models containing naming accuracy alone. First, we hypothesized that including circumlocution in a predictive model would account for significantly more variance in %CIUs, because the production of circumlocutions at the discourse level corresponds to more CIUs. We also hypothesized that including circumlocution would account for significantly more variance in CIUs/min, because producing circumlocutions corresponds with more time spent producing relevant and informative content. We further predicted that the subsequent inclusion of other types of error-coded productions (i.e., real word and nonword errors) would not account for significant additional variance in either outcome.
Materials and Methods
Design
Data used in this study come from the Brain-Based Understanding of Individual Language Differences After Stroke (BUILD) Project, a prospective, observational case-control study that aims to better understand the unique patterns of communication and cognitive strengths and weaknesses present in people following stroke (Clinicaltrials.gov Identifier: NCT04991519). It was approved by the Georgetown University Institutional Review Board (IRB). The current study is a retrospective, cross-sectional analysis of a subset of participant data from the BUILD project.
Participants
Participants were recruited through author P.E.T.’s clinic and consultation service at MedStar National Rehabilitation Hospital (NRH), via referrals from SLPs and other clinicians, the NRH Advanced Recovery Registry, and the Stroke National Capital Area Network for Research Participant Database via e-mail advertisements, physical flyers, hard copies (i.e., mail), word of mouth, and online advertisements. Inclusion criteria for stroke survivor participants included being at least 18 years old, having a stroke in the left hemisphere of the brain, and having learned English at eight years of age or younger. To be included in analyses, participants had to achieve a score that classified them as having aphasia on the WAB-R (Kertesz, 2007). Exclusion criteria included a history of other brain conditions that could impact result interpretation (such as multiple sclerosis, dementia, traumatic brain injury, and right hemisphere stroke), severe psychiatric conditions that would interfere with study participation (such as schizophrenia, a history of psychiatric disease requiring hospitalization, electroconvulsive therapy, or ongoing medication use other than common antidepressants), and history of learning disability that could impact result interpretation. Participants provided informed consent approved by the Georgetown University IRB.
At the time of analysis, there were 73 participants in the BUILD dataset. Thirty participants were excluded from analyses: 22 did not achieve a WAB-R Aphasia Quotient that classified them as having aphasia (i.e., WAB-R AQ > 93.8), one had multiple strokes, and seven had incomplete naming or discourse data at the time of analysis. This left 43 participants who met inclusion criteria and had complete data. Their demographic, stroke, naming, and discourse data are included in Table 1.
Table 1.
Participant demographic data, stroke type and chronicity data, WAB-R results, and summary of data collected from confrontation naming and discourse tasks (n = 43)
| Category | Measure | Statistic | Value |
| --- | --- | --- | --- |
| Basic demographic data | Age | Mean (± SD) | 60.26 years (± 12.45) |
| | | Range | 39.37 – 92.18 |
| | Gender | Women (% of sample) | n = 15 (34.88%) |
| | | Men | n = 28 (65.11%) |
| | Race | African American | n = 18 (41.86%) |
| | | Caucasian | n = 25 (58.13%) |
| | Ethnicity | Hispanic or Latino | n = 1 (2.32%) |
| | | Non-Hispanic or Latino | n = 42 (97.67%) |
| | Handedness | Right | n = 39 (90.69%) |
| | | Left | n = 4 (9.30%) |
| Stroke and aphasia data | Type of Stroke | Ischemic | n = 33 (76.74%) |
| | | Hemorrhagic | n = 7 (16.28%) |
| | | Ischemic + Hemorrhagic | n = 3 (6.98%) |
| | Time Post-stroke | Mean (± SD) | 4.20 years (± 4.45) |
| | | Median | 2.23 |
| | | Range | 0.27 – 16.73 |
| | WAB-R Aphasia Quotient | Mean (± SD) | 67.86 (± 22.73) |
| | | Median | 75.2 |
| | | Range | 13.2 – 93.6 |
| | WAB-R Classification | Broca’s | n = 11 (25.58%) |
| | | Transcortical Motor | n = 3 (6.98%) |
| | | Global | n = 1 (2.33%) |
| | | Wernicke’s | n = 1 (2.33%) |
| | | Transcortical Sensory | n = 1 (2.33%) |
| | | Conduction | n = 5 (11.63%) |
| | | Anomic | n = 21 (48.84%) |
| Confrontation naming performance summary data | Naming Accuracy (%) | Mean (± SD) | 53.23 (± 29.75) |
| | | Median | 59.17 |
| | | Range | 0 – 94.17 |
| | Percent Descriptions (circumlocutions) (%) | Mean (± SD) | 8.03 (± 11.77) |
| | | Median | 3.49 |
| | | Range | 0 – 46.61 |
| | Percent Real Word Errors (%) | Mean (± SD) | 40.69 (± 19.26) |
| | | Median | 42.31 |
| | | Range | 0 – 79.59 |
| | Percent Nonword Errors (%) | Mean (± SD) | 31.42 (± 20.75) |
| | | Median | 33.33 |
| | | Range | 0 – 70 |
| | Percent Stereotypies (%) | Mean (± SD) | 0.58 (± 2.33) |
| | | Median | 0 |
| | | Range | 0 – 13.3 |
| | Percent Perseverations (%) | Mean (± SD) | 5.05 (± 5.93) |
| | | Median | 3.51 |
| | | Range | 0 – 20 |
| Discourse performance summary data | %CIUs (informativeness) | Mean proportion (± SD) | 0.56 (± 0.27) |
| | | Median | 0.61 |
| | | Range | 0 – 0.91 |
| | CIUs/min (efficiency) | Mean (± SD) | 32.50 (± 24.24) |
| | | Median | 32.84 |
| | | Range | 0 – 81.18 |
Data Collection, Sharing, and Coding
Data were collected at Georgetown University and MedStar National Rehabilitation Hospital, with some home visits. A data sharing agreement was established between Georgetown University and the MGH Institute of Health Professions.
Naming Data Collection
Participants completed a 120-item naming assessment comprising the two 30-item Philadelphia Naming Test – Short Forms (PNT; Walker & Schwartz, 2012) and 60 items previously used for studies on Inner Speech reports (Fama et al., 2019; Stimuli available at cognitiverecoverylab.com/researchers). Testers included a research associate, research SLPs, and a trained research assistant. Picture stimuli were presented on a computer screen using E-Prime 3.0 software (Psychology Software Tools, Pittsburgh, PA). For the PNT, participants were told, “Please name the pictures when they appear. Please say only one word. Give the best one-word name for each picture. What really counts is the first thing you say. But if you make a mistake, you can try to fix it.” For the Inner Speech items, participants were told, “You will see a picture on the screen. Please name the picture as soon as you see it. Give the best one-word name for each picture. What really counts is the first thing you say. But if you make a mistake, you can try to fix it.” Participants were given 20 seconds to make a response.
Naming Data Scoring and Coding
Naming assessments were video and audio recorded for transcribing, scoring, and coding. A BUILD naming error coding protocol was used to formalize transcription, scoring, and coding rules and procedures. All study members were trained on this protocol. For each item, the Tester identified the first complete attempt and scored its accuracy. If the response was incorrect, Testers further coded the response by identifying the Type of Attempt (see below) and flagged if it had a semantic or phonological/orthographic relationship with the target, contained perseveration, or contained error detection and self-correction attempts. Rules governing each response type were adapted from the PNT Scoring documentation (Roach et al., 1996). The Tester was the first rater, who scored and coded all responses. To ensure reliability, 10% of the naming data were scored and coded by a second blind rater, and ratings were compared for agreement. Third raters were assigned to resolve any discrepancies that could not be resolved by the first and second raters, or to act as the second rater if the assigned second rater was also the tester. Discrepancies and ambiguous responses were also discussed at consensus meetings. Interrater reliability for naming accuracy (Cohen’s κ = 0.89) and type of response (Cohen’s κ = 0.85) were both strong (McHugh, 2012).
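Interrater agreement of this kind is conventionally summarized with Cohen’s κ, which corrects raw percent agreement for the agreement expected by chance given each rater’s marginal code frequencies. The following is a minimal illustrative sketch (hypothetical ratings and a helper name of our own choosing, not the study’s scoring code):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters judging the same items."""
    n = len(rater1)
    # Observed agreement: proportion of items given identical codes.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement expected from each rater's marginal code frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[code] * c2[code] for code in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical accuracy codes (1 = correct, 0 = incorrect) for ten items.
rater1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater2 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
print(round(cohens_kappa(rater1, rater2), 2))  # 0.78
```

Here 9 of 10 codes agree (90% raw agreement), but κ = 0.78 after removing chance agreement, which is why κ is preferred over raw agreement for reliability reporting.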
Operational Definitions.
A first complete attempt was an utterance containing both a consonant and a vowel, not including schwa. A correct response was any attempt that matched or contained the target, or was deemed an appropriate alternative. Special issues in determining correctness included plural leniency, dialectal variation, and dysarthria and apraxia leniency. If participants responded with a plural version of the target, the response was counted as correct. If participants produced a response consistent with their dialect, the response was counted as correct. Responses containing dysarthric errors were distinguished from aphasic errors by SLPs on the coding team, were coded as correct, and discussed at consensus meetings. Incorrect responses were any responses that fell beyond these bounds.
Incorrect responses were further coded by their Type of Attempt, including: description, real word, nonword, stereotypy, perseveration, or unintelligible. Description responses were responses that provided a characterization of the target, attempted to explain its function/purpose, or identified a personal relation to the target. Descriptions could contain semantic association information (e.g., “ride it,” for bike), categorical semantic information (e.g., “type of animal” for cat), or phonological/orthographic information (e.g., “starts with a D” for dog). These responses were used as our measure of circumlocution. Real word responses were target attempts that were definable words, and nonword responses were target attempts that were not definable words. Stereotypies were responses that contained a word or phrase repetitively used by the individual. Perseverations were responses that were repetitions of earlier responses within the same test, excluding stereotypies.
Narrative Data Collection
Participants were asked to describe the Picnic Scene from the WAB-R (Kertesz, 2007) and the Cookie Theft Picture from the Boston Diagnostic Aphasia Examination – 3rd Edition (BDAE-3; Goodglass et al., 2001). Time limits were not given for narrative production.
Narrative Data Coding
All narrative samples were independently reviewed by two lab members: a Transcriber and a Coder. Transcribers and Coders included research SLPs and trained research assistants. A BUILD narrative analysis protocol was used to formalize transcription and coding rules and procedures. All study members were trained on this protocol. The Transcriber viewed the video recordings and transcribed the two descriptions in Microsoft Word. Transcribers were instructed to transcribe everything said by the tester and participant, including real words, pseudowords, laughter, oral spelling, non-speech (e.g., coughs), self-talk/side-talk (e.g., “Oh, this thing again!”), and fillers. Transcribers also noted timestamps, which included start and end times of speech productions for the participant and the Tester. Tester speaking time was removed from total speech sample time, along with non-speech (e.g., coughs, throat clearing), laughter, and self-talk or side-talk that lasted longer than two seconds. To ensure reliability of narrative transcription, 66% of the transcriptions (57/86) were reviewed and verified by a second person.
Coders watched the video recordings to identify utterance breaks and make any changes to the transcription if there were minor disagreements. Coders used syntax, intonation, and pauses to make judgements about where utterances began and ended (Saffran et al., 1989). Next, Coders identified and coded word types, including nouns, verbs, repeated words, nonwords, morphemes, and false starts. They then identified any further meaningless information, such as qualifiers/modifiers (e.g., “Apparently…”), irrelevant words or tangents, and inaccurate words (e.g., “cat” for “dog”). To ensure reliability of narrative coding, 34% of the samples’ codes (30/86) were reviewed and verified by a second Coder. Second Coders were always research SLPs.
Once a narrative sample was transcribed and coded, it was transferred into an Excel spreadsheet template for scoring. A MATLAB script was then used to generate summary scores for each test administration, including CIUs per minute and Percent of Words that are CIUs. Two members of the research team (AD and CvdS) checked the first several entries of the narrative data, utterance-by-utterance, to ensure accuracy of the MATLAB-generated summary scores—which they confirmed to be accurate. For the remaining narrative samples, the Coder visually scanned the transcription in the MATLAB window to ensure all utterances and content were transferred correctly from the Excel spreadsheet into MATLAB. Consistent with prior work, CIUs were words that were intelligible, accurate, informative, and relevant to the picture (Nicholas & Brookshire, 1993). ‘Words’ were defined as words or nonwords that were transcribable, but did not have to be accurate, relevant, or informative to the picture. The inclusion of nonwords into the total word count differs from the Nicholas and Brookshire CIU protocol, in which nonwords are not counted as words (1993). We made this decision to account for all types of verbal productions within the total word count, such as the jargon that is often produced in fluent aphasia. CIUs per minute (CIU/min) was calculated by dividing the total number of CIUs produced by the number of minutes the participants spoke. Percent of Words that are CIUs (%CIUs) was calculated by dividing the total number of CIUs by the total number of words.
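The two summary scores reduce to simple ratios over the coded transcript. A minimal sketch of the arithmetic (toy counts, not study data; function names are ours):

```python
def percent_cius(n_cius, n_words):
    """Communicative informativeness: proportion of total words that are CIUs.
    Here, unlike the original Nicholas & Brookshire (1993) protocol,
    nonwords count toward n_words."""
    return n_cius / n_words

def cius_per_minute(n_cius, speaking_seconds):
    """Communicative efficiency: CIUs produced per minute of speaking time
    (tester talk, non-speech, and long side-talk already removed)."""
    return n_cius / (speaking_seconds / 60)

# Toy sample: 45 CIUs among 80 transcribed words over 90 s of speaking time.
print(percent_cius(45, 80))     # 0.5625
print(cius_per_minute(45, 90))  # 30.0
```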
Analytic Plan
Prior work finds that naming accuracy accounts for a moderately high amount of the variance in informativeness and efficiency at the discourse-level (R² = .36–.46; Fergadiotis & Wright, 2016; Herbert et al., 2008; Richardson et al., 2018). We used hierarchical regression to test our predictions that adding a measure of circumlocution to predictive models of informativeness and efficiency would account for a significant amount more of the variance in these outcomes. Hierarchical regression allows for model comparison and provides estimates of the unique variance accounted for by each level, i.e., the individual predictors or sets of predictors, within the model (Field et al., 2012).
We compiled participants’ naming assessment results across all 120 stimuli, which yielded continuous numerical values corresponding to Percent Naming Accuracy and the proportions of incorrect responses that were descriptions, real words, nonwords, stereotypies, and perseverations. Then we compiled participants’ narrative discourse results for each picture description, which included two continuous numerical values corresponding to average %CIUs and average CIUs/min.
Statistical analyses were conducted using R (R Core Team, 2021) in RStudio (RStudio Team, 2021). For the first model predicting informativeness, the outcome variable was %CIUs. In Step 1 of the communicative informativeness model (Model 1a), we entered Percent Naming Accuracy, as it is an established predictor of informativeness (Fergadiotis & Wright, 2016). In Step 2 (Model 1b), we entered Percent Descriptions. In Step 3 (Model 1c), we entered the Percent Real Word Errors, Percent Nonword Errors, Percent Stereotypies, and Percent Perseverations. We decided to enter the predictor variables into the model in this order to isolate the unique variance accounted for by each set of predictors, in line with our hypothesis. We examined the R2 at each step and conducted an analysis of variance (ANOVA) to compare Model 1a, Model 1b, and Model 1c (Field et al., 2012). For the second model predicting efficiency, the outcome variable was CIUs/min, and the predictor variables were the same. We entered the predictor variables in the same order as the three steps described above, starting with naming accuracy only (Model 2a), adding descriptions (Model 2b), and then adding all other types of responses (Model 2c). Again, we examined the R2 at each step, and conducted an ANOVA to compare the three models (Model 2a, Model 2b, Model 2c).
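Each step in this procedure amounts to comparing nested ordinary least squares models: the increment in R² at a step is evaluated with an F statistic on the R² change, which is what R’s anova() comparison of lm() fits reports. Below is a minimal pure-Python sketch of that logic with made-up toy data standing in for naming accuracy, percent descriptions, and a discourse outcome (the study’s actual analysis was conducted in R; the helper names here are ours):

```python
def ols_r2(y, predictors):
    """R^2 from an OLS fit of y on the predictor columns (intercept added),
    solving the normal equations by Gaussian elimination."""
    n = len(y)
    A = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(A[0])
    ata = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    aty = [sum(A[i][p] * y[i] for i in range(n)) for p in range(k)]
    for p in range(k):  # forward elimination with partial pivoting
        piv = max(range(p, k), key=lambda r: abs(ata[r][p]))
        ata[p], ata[piv] = ata[piv], ata[p]
        aty[p], aty[piv] = aty[piv], aty[p]
        for r in range(p + 1, k):
            f = ata[r][p] / ata[p][p]
            for c in range(p, k):
                ata[r][c] -= f * ata[p][c]
            aty[r] -= f * aty[p]
    beta = [0.0] * k
    for p in reversed(range(k)):  # back substitution
        beta[p] = (aty[p] - sum(ata[p][c] * beta[c]
                                for c in range(p + 1, k))) / ata[p][p]
    y_hat = [sum(b * a for b, a in zip(beta, A[i])) for i in range(n)]
    y_bar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def f_change(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R^2 change between nested models."""
    return ((r2_full - r2_reduced) / (k_full - k_reduced)) / \
           ((1 - r2_full) / (n - k_full - 1))

# Made-up toy data; none of these are the study's values.
x1 = [10, 20, 30, 40, 50, 60, 70, 80]   # stand-in for naming accuracy
x2 = [3, 1, 4, 1, 5, 9, 2, 6]           # stand-in for percent descriptions
y = [12, 19, 33, 38, 55, 64, 68, 83]    # stand-in for a discourse outcome
r2_step1 = ols_r2(y, [x1])        # Step 1: accuracy alone
r2_step2 = ols_r2(y, [x1, x2])    # Step 2: accuracy + descriptions
print(f_change(r2_step1, r2_step2, n=8, k_reduced=1, k_full=2))
```

Because added predictors can only leave residual variance unchanged or reduce it, R² is non-decreasing across steps; the F test on the change is what determines whether the increment is significant.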
Results
Data Preparation
Before running the models, we assessed for univariate outliers on all outcome and predictor variables by calculating z-scores using base R. Two cases were identified as univariate outliers. We assessed for multivariate outliers by calculating Mahalanobis distances using the mahalanobis function from the ‘stats’ package (R Core Team, 2021). No cases were identified as multivariate outliers. The two univariate outliers were retained at this point, with the intention of assessing for influential cases after running the models. We assessed for multicollinearity with a correlation matrix using the cor function from the ‘stats’ package (R Core Team, 2021). No correlations between predictors exceeded 0.60; thus, the predictors were not considered highly correlated (Figure 2; Field et al., 2012).
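These screening steps can be sketched as follows. This is an illustrative Python sketch: the cut-off values (|z| > 3.29, the χ² .001 critical value for three predictors of about 16.27, and |r| > .60) are conventional choices and are assumptions here, since the text does not restate the exact calculated cut-offs.

```python
import numpy as np

def screen_predictors(X, z_cut=3.29, d2_cut=16.27, corr_cut=0.60):
    """Flag univariate outliers via z-scores and multivariate outliers via
    squared Mahalanobis distance; also check pairwise correlations."""
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    uni = np.where((np.abs(z) > z_cut).any(axis=1))[0]

    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)  # Mahalanobis^2
    multi = np.where(d2 > d2_cut)[0]

    corr = np.corrcoef(X, rowvar=False)
    max_r = np.abs(corr[~np.eye(X.shape[1], dtype=bool)]).max()
    return uni, multi, max_r <= corr_cut

# Synthetic data: 43 cases, 3 predictors, with one planted gross outlier.
rng = np.random.default_rng(7)
X = rng.normal(size=(43, 3))
X_out = X.copy()
X_out[0] = 8.0
uni, multi, corr_ok = screen_predictors(X_out)
```

With the planted outlier, case 0 is flagged by both the univariate and multivariate checks.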
Figure 2.

Correlation matrix depicting correlations between predictor variables. No correlations exceeded 0.6; therefore, all predictors were included in the analyses as planned.
Communicative Informativeness
We conducted a three-step hierarchical regression to evaluate the prediction of %CIUs from Percent Naming Accuracy, Percent Descriptions, Percent Real Word Errors, Percent Nonword Errors, Percent Stereotypies, and Percent Perseverations using the lm( ) function in the ‘stats’ package (R Core Team, 2021), in line with our hypothesis and the steps described above. We then examined model diagnostics to assess whether the final model met assumptions of the general linear model and assessed for influential cases. No variance inflation factor (VIF) values, which index multicollinearity between predictors, exceeded 4 (Salmerón et al., 2018). Diagnostic plots generated using the plot( ) function (R Core Team, 2021) showed several cases with high residuals and standardized residuals, but the standardized residuals vs. standardized fitted values plot looked normal. We assessed for influential cases by extracting leverage and Cook’s distance statistics and identified four cases beyond the calculated cut-offs. Instead of removing these cases, we retained them and used robust regression, which attenuates the influence that outliers and high-leverage cases have on the model (Field & Wilcox, 2017; Susanti et al., 2014).
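Leverage and Cook's distance can be extracted directly from an OLS fit, as sketched below in illustrative Python. The flagging cut-offs used here, leverage above 2(k + 1)/n and Cook's distance above 4/n, are common conventions and are assumptions for this sketch, since the text does not state which cut-offs were calculated.

```python
import numpy as np

def influence_stats(X, y):
    """Return hat values (leverage) and Cook's distances for an OLS fit."""
    Xd = np.column_stack([np.ones(len(y)), X])
    H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T   # hat matrix
    lev = np.diag(H)
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    resid = y - Xd @ beta
    p = Xd.shape[1]
    mse = resid @ resid / (len(y) - p)
    cooks = resid**2 / (p * mse) * lev / (1 - lev) ** 2
    return lev, cooks

# Synthetic data with n = 43 cases and k = 2 predictors.
rng = np.random.default_rng(3)
X = rng.normal(size=(43, 2))
y = X @ np.array([0.5, 0.2]) + rng.normal(0, 1, 43)
lev, cooks = influence_stats(X, y)

n, k = X.shape
flagged = np.where((lev > 2 * (k + 1) / n) | (cooks > 4 / n))[0]
```

Cases in `flagged` would then be inspected rather than automatically removed, mirroring the decision above to retain influential cases and use robust regression.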
We used the lmrob( ) function in the ‘robustbase’ R package, which uses an MM-type estimator (Maechler et al., 2022), to run the robust regression. Percent Naming Accuracy contributed significantly to the model, F(1, 42) = 26.38, p < .001, R2 = .622 (b = 0.008, t = 5.14, p < .001) (Table 2, Model 1a). Introducing Percent Descriptions in Step 2 accounted for a significant change in R2, F(1, 41) = 5.16, p = .02, ΔR2 = .015 (b = 0.005, t = 2.27, p = .03) (Table 2, Model 1b). Introducing Percent Real Word Errors (b = 0.0002, t = 1.11, p = .92), Percent Nonword Errors (b = 0.0004, t = 0.26, p = .80), Percent Stereotypies (b = −0.011, t = −0.90, p = .37), and Percent Perseverations (b = 0.002, t = 0.24, p = .81) in Step 3 did not account for a significant change in R2, F(4, 40) = 2.37, p = .67, ΔR2 = −.017 (Table 2, Model 1c). Therefore, the addition of Percent Descriptions accounted for a significant amount more of the variance in %CIUs than the first model with Percent Naming Accuracy alone, but the subsequent addition of overt errors did not. Table 2 contains a summary of the robust regression results.
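The core idea of robust regression, downweighting observations with large residuals so they cannot dominate the fit, can be sketched with a simple Huber-weighted iteratively reweighted least squares (IRLS) loop. This illustrative Python sketch is an M-estimator only; lmrob's MM-estimator additionally uses a high-breakdown initial fit, so this is a simplification, and all data below are synthetic.

```python
import numpy as np

def huber_irls(X, y, c=1.345, iters=50):
    """Huber-weighted IRLS: observations whose scaled residual |r| exceeds c
    receive weight c/|r| < 1, attenuating the influence of outliers."""
    Xd = np.column_stack([np.ones(len(y)), X])
    w = np.ones(len(y))
    for _ in range(iters):
        WX = Xd * w[:, None]
        beta = np.linalg.solve(Xd.T @ WX, WX.T @ y)
        resid = y - Xd @ beta
        # Robust scale estimate via the median absolute deviation (MAD).
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        r = resid / max(scale, 1e-12)
        w = np.where(np.abs(r) <= c, 1.0, c / np.abs(r))
    return beta

# Synthetic naming/%CIU-like data with three gross outliers.
rng = np.random.default_rng(5)
x = rng.uniform(0, 100, 43)
y = 0.1 + 0.008 * x + rng.normal(0, 0.05, 43)
y[:3] += 1.0
b_robust = huber_irls(x.reshape(-1, 1), y)
```

Because the outliers are downweighted rather than deleted, the fitted line stays close to the bulk of the data, which is the rationale given above for retaining influential cases.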
Table 2.
Robust hierarchical regression results predicting %CIUs
|  | (Model 1a) | (Model 1b) | (Model 1c) |
|---|---|---|---|
| Intercept | 0.167* (−0.009, 0.343) | 0.096 (−0.055, 0.247) | 0.108 (−0.173, 0.390) |
| Percent Naming Accuracy | 0.008*** (0.005, 0.010) | 0.008*** (0.006, 0.010) | 0.008** (0.002, 0.013) |
| Percent Descriptions |  | 0.005** (0.001, 0.009) | 0.004 (−0.001, 0.010) |
| Percent Real Word Errors |  |  | 0.0003 (−0.005, 0.005) |
| Percent Nonword Errors |  |  | 0.0005 (−0.003, 0.004) |
| Percent Stereotypies |  |  | −0.011 (−0.035, 0.013) |
| Percent Perseverations |  |  | 0.002 (−0.011, 0.015) |
| Observations | 43 | 43 | 43 |
| R2 | 0.631 | 0.654 | 0.674 |
| Adjusted R2 | 0.622 | 0.637 | 0.620 |
| Residual std. error | 0.148 (df = 41) | 0.154 (df = 40) | 0.143 (df = 36) |

Note: *p < 0.1; **p < 0.05; ***p < 0.01. 95% CI in parentheses.
Communicative Efficiency
We conducted another three-step hierarchical regression to evaluate the prediction of CIUs/min from Percent Naming Accuracy, Percent Descriptions, Percent Real Word Errors, Percent Nonword Errors, Percent Stereotypies, and Percent Perseverations using the lm( ) function in the ‘stats’ package (R Core Team, 2021), in line with our hypothesis and the steps described above. We then examined model diagnostics. No VIF values exceeded 4 (Salmerón et al., 2018). We assessed for homoscedasticity using the plot( ) function (R Core Team, 2021), which showed several cases with high residuals and standardized residuals, but the standardized residuals vs. standardized fitted values plot looked normal. Two cases had leverage values or Cook’s distance values beyond the calculated cut-offs. Again, we retained these cases and used robust regression.
We again used the lmrob( ) function in the ‘robustbase’ R package (Maechler et al., 2022) to run the robust regression. Percent Naming Accuracy contributed significantly to the model, F(1, 42) = 40.29, p < .001, R2 = .411 (b = 0.50, t = 6.35, p < .001) (Table 3, Model 2a). Introducing Percent Descriptions in Step 2 accounted for a significant change in R2, F(1, 41) = 15.73, p < .001, ΔR2 = .083 (b = 0.49, t = 3.97, p < .001) (Table 3, Model 2b). Introducing Percent Real Word Errors (b = 0.36, t = 0.94, p = .35), Percent Nonword Errors (b = −0.16, t = −0.71, p = .48), Percent Stereotypies (b = 0.48, t = 0.39, p = .70), and Percent Perseverations (b = −0.11, t = −0.11, p = .91) in Step 3 did not account for a significant change in R2, F(4, 40) = 6.26, p = .18, ΔR2 = .032 (Table 3, Model 2c). Therefore, the addition of Percent Descriptions accounted for a significant amount more of the variance in CIUs/min than Percent Naming Accuracy alone, but the subsequent addition of the other incorrect response types did not. Table 3 contains a summary of the robust regression results.
Table 3.
Robust hierarchical regression results predicting CIUs/minute
|  | (Model 2a) | (Model 2b) | (Model 2c) |
|---|---|---|---|
| Intercept | 3.095 (−5.642, 11.831) | −4.986 (−11.605, 1.632) | −8.677 (−36.381, 19.027) |
| Percent Naming Accuracy | 0.502*** (0.347, 0.657) | 0.560*** (0.366, 0.755) | 0.466** (0.059, 0.874) |
| Percent Descriptions |  | 0.490*** (0.248, 0.731) | 0.484** (0.073, 0.895) |
| Percent Real Word Errors |  |  | 0.358 (−0.388, 1.103) |
| Percent Nonword Errors |  |  | −0.156 (−0.585, 0.273) |
| Percent Stereotypies |  |  | 0.479 (−1.907, 2.865) |
| Percent Perseverations |  |  | −0.113 (−2.156, 1.929) |
| Observations | 43 | 43 | 43 |
| R2 | 0.425 | 0.518 | 0.594 |
| Adjusted R2 | 0.411 | 0.494 | 0.526 |
| Residual std. error | 15.668 (df = 41) | 13.202 (df = 40) | 13.268 (df = 36) |

Note: *p < 0.1; **p < 0.05; ***p < 0.01. 95% CI in parentheses.
Discussion
This study aimed to test whether, after controlling for naming accuracy, the addition of a measure of circumlocution into predictive models of communicative informativeness and efficiency would account for a significant amount more of the variance in these discourse-level outcomes. Overall, our findings are consistent with previous studies that have found naming accuracy to be a significant predictor of discourse-level measures (Richardson et al., 2018), and expand upon this work by demonstrating that circumlocution at the single-word, confrontation naming-level is a potentially meaningful factor to consider when attempting to predict discourse informativeness and efficiency.
The addition of circumlocution responses to the informativeness model contributed a small, positive, and significant increase of 1.5% in the variance explained in %CIUs. Although this 1.5% increase is relatively small, it establishes that circumlocution accounts for a significant and unique portion of variance in %CIUs beyond what is accounted for by naming accuracy, and thus may be an additional meaningful behavior to measure when attempting to estimate informativeness. In our sample, a 1% increase in incorrect responses that were circumlocutions was associated with a ~1% increase in %CIUs; that is, circumlocution was positively associated with %CIUs. Our results suggest a significant relationship between a person’s use of circumlocution at the single-word, confrontation naming-level and their informativeness in discourse. Clinically, when interpreting a person with aphasia’s confrontation naming performance and using that performance to make inferences about the overall informativeness of their communication, it may be meaningful for clinicians to consider circumlocution in moments of anomia in addition to naming accuracy.
Of note, similar work investigating the relationship between confrontation naming performance and communicative informativeness found 46% shared variance between naming accuracy and %CIUs in discourse, and concluded that additional factors likely help predict informativeness (Fergadiotis & Wright, 2016). Our results (Model 1a) showed a higher percentage of shared variance, with naming accuracy accounting for 62% of the variance in %CIUs. These results are more similar to findings by Mayer and Murray (2003), who found that naming accuracy on the TAWF (German, 1990) accounted for 61% of the variance in the percent of word retrieval in discourse (%WR-D) and 71% of the variance in the percent of word retrieval in conversation (%WR-C). While our outcome variables differ, we posit that our estimate is closer to theirs because partial words were counted as words in our protocol (if they were transcribable) and were included in the %CIU calculation denominator. Similarly, Mayer and Murray’s discourse outcomes of %WR-D and %WR-C included partial words in their total word count and in their denominator. Our %CIU outcome and Mayer and Murray’s (2003) %WR-D/C outcomes therefore represent the proportion of accurate, relevant, and informative productions within a wider array of possible productions, including correct productions, incorrect productions, and word finding errors. Traditional %CIU calculations were likely used by Fergadiotis and Wright (2016), as their data came from AphasiaBank, whose protocol follows traditional CIU scoring procedures and does not include partial words, or word errors that are unintelligible in context, in the total word count/denominator.
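The denominator point can be made concrete with a toy calculation (the counts below are invented for illustration): counting transcribable partial words as words enlarges the denominator and lowers the resulting %CIU.

```python
# Hypothetical transcript tallies (invented for illustration).
cius = 60             # correct information units produced
full_words = 90       # intelligible full words
partial_words = 10    # transcribable partial words

# Traditional %CIU: partial words excluded from the word count.
pct_ciu_traditional = 100 * cius / full_words
# This study's variant: transcribable partial words count as words,
# enlarging the denominator and lowering the resulting %CIU.
pct_ciu_inclusive = 100 * cius / (full_words + partial_words)

print(round(pct_ciu_traditional, 1), pct_ciu_inclusive)  # 66.7 60.0
```

The inclusive denominator yields a stricter estimate over a wider array of productions, which parallels how naming accuracy is computed over all naming attempts.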
Overall, we propose that our %CIU calculation, which counts more words in the denominator and parallels the naming accuracy calculation, may explain why our R2 is greater than that of prior work examining %CIUs.
The inclusion of circumlocution responses in the efficiency model accounted for an additional 8.3% of the variance in CIUs/min. An 8.3% increase is relatively large and suggests that circumlocution is a valuable behavior to measure when using naming performance to estimate discourse efficiency. In our sample, a 1% increase in incorrect responses that were circumlocutions was associated with an increase of 0.49 CIUs/min. This finding is also clinically meaningful because it suggests that, when using a person with aphasia’s confrontation naming performance to make inferences about the efficiency of their communication, it may be meaningful for clinicians to measure circumlocution in moments of anomia in addition to naming accuracy. To our knowledge, this is the first study to use naming performance to predict CIUs/min. While we therefore cannot compare these findings to prior work, we observed that our models accounted for less of the variance in CIUs/min than in %CIUs. Naming accuracy may account for less of the variance in efficiency because high accuracy does not necessarily entail high efficiency, nor does low accuracy necessarily entail low efficiency. This differs somewhat from the relationship between naming accuracy and informativeness, as a certain degree of accuracy is required to be informative, and a certain degree of informativeness is required to be considered accurate.
Circumlocution at the naming-level may be even more meaningful for predicting discourse informativeness and efficiency than the changes in R2 found here suggest. Given the retrospective nature of this study, circumlocution was not encouraged during the naming assessment; rather, participants were told to respond with one word. These instructions may have led participants to produce fewer circumlocutions than they would have if they had been explicitly encouraged, or at least not discouraged, to circumlocute. Administration procedures in this study may therefore have underestimated participants’ true circumlocution ability. Actively encouraging participants to circumlocute would yield a more representative score of circumlocution ability, and adding such scores to predictive models may have yielded even larger increases in the variance explained, demonstrating an even stronger relationship between circumlocution in naming and discourse informativeness/efficiency. We believe there is still value in estimating the relationship between naming and discourse using current administration instructions, as this is how naming assessments are administered clinically. However, if the goal is to improve our ability to predict discourse production using performance on naming assessments, modifications to administration may be warranted to capture a truer estimate of circumlocution ability.
Interestingly, including circumlocution in the models accounted for a larger additional proportion of the variance in CIUs/min (8.3%) than in %CIUs (1.5%). Logically, this makes sense: producing more CIUs/min at the discourse-level corresponds with more time spent producing intelligible, accurate, and relevant content. Circumlocution at the naming-level corresponds with something similar: production of intelligible, relevant, and informative content about the target item, in lieu of target-word production, within the time provided. A person who produces such content at a higher rate, whether at the single-word or discourse-level, will have higher communicative efficiency than someone who produces informative content at a lower rate. Therefore, measuring circumlocution at the naming-level may be even more relevant when attempting to predict efficiency than informativeness.
Overall, our findings offer preliminary evidence that measuring circumlocution at the confrontation naming-level, in addition to naming accuracy, may improve our ability to predict communicative informativeness and efficiency in the discourse-level communication of people with aphasia. Providing clinicians with a way to estimate functional communication more precisely will allow them to more easily incorporate such estimates into their clinical assessments and impressions. Clinicians already use naming assessments and are highly skilled in scoring for accuracy and coding error types. Calculating the proportion of incorrect responses that are descriptions/circumlocutions is straightforward, would not require a major shift in clinical practice, and would not add significant time or effort to naming assessment administration, scoring, and interpretation. Reducing obstacles to estimating functional communication would also allow clinicians to track changes in their patients’ discourse-level communication more frequently throughout treatment. As previously stated, discourse-level communication is reported as a highly important outcome by people with aphasia and their care partners. Therefore, if clinicians can more easily estimate their patients’ functional communication ability, functional communication will remain at the forefront of measuring progress in therapy. Easier and more frequent estimation of functional communication could also help in evaluating how effectively therapeutic approaches facilitate improvements in functional communication. An additional, speculative trickle-down effect of measuring circumlocution as a meaningful predictor of discourse informativeness and efficiency is a possible perspective shift on circumlocution in general, both for people with aphasia and for aphasia practitioners.
If circumlocution is regarded more widely as an effective adaptive strategy rather than an impairment symptom, the linguistic anxiety of people with aphasia may be reduced by the knowledge that target word production is not the most essential outcome (Cahana-Amitay et al., 2011; Chapey et al., 2000).
Future work aiming to replicate these findings should consider a few important points. First is that of naming assessment administration instructions and implicit task expectations, which we briefly discussed above. Standardized naming assessment instructions often discourage circumlocution, and thus, attempting to measure circumlocution using such administration instructions may not capture the full extent of a person’s circumlocution ability. For example, the PNT instructions ask people to respond with a single word and avoid descriptions or multiple attempts (Roach et al., 1996). As previously mentioned, we used these instructions, and while our participants still circumlocuted, administration instructions explicitly encouraging circumlocution would likely have elicited more circumlocution and thus better captured participants’ ability to convey information about targets. Another element of task administration that may warrant reconsideration is response time limits. Time limits are imposed in many naming assessments (30 seconds for the PNT, 20 seconds for the WAB-R Object Naming subtest and the BNT-2, 10 seconds for the VNT) and may implicitly discourage the use of self-cueing attempts. While 5–10 seconds has been reported as the optimal response time cut-off for picture naming accuracy (Evans et al., 2020), such cut-offs were, of course, not determined with circumlocution productions in mind. Overall, current naming assessment administration procedures may not elicit as much circumlocution as an intentional measure of circumlocution could: one without an imposed time limit and with explicit instructions to describe targets in moments of anomia. Such a measure may account for a greater amount of the variance in discourse-level informativeness.
It may be that other types of discourse elicitation procedures, e.g., dialogues or unstructured conversation, yield different types of discourse samples/productions (Armstrong et al., 2011; Stark, 2019). Picture scenes, such as the WAB-R Picnic Scene and BDAE Cookie Theft Picture, contain a set of identifiable objects and actions and thus, samples are often more object-focused than samples of dialogues, personal narratives, or story retells may be. Importantly, previous work has established correspondence between different types of elicitation procedures in terms of CIUs (Leaman & Edmonds, 2021; Nicholas & Brookshire, 1993). Additionally, CIU analyses have been adapted for use with unstructured conversational speech samples (CIUconv; Leaman & Edmonds, 2019), which demonstrate excellent interrater and test-retest reliability. Overall, future work that aims to use naming accuracy and circumlocution to estimate communicative informativeness may wish to evaluate these predictors in dialogic and conversational samples in addition to monologic samples.
In addition to different discourse elicitation methods, there are also different discourse quantification methods, including quantitative production analysis, which aims to measure grammaticality (Berndt et al., 2000; Gordon, 2008; Saffran et al., 1989); moving-average type–token ratio, which aims to measure lexical diversity (Covington & McFall, 2010; Cunningham & Haley, 2020; Templin, 1957); and main concept analysis, which focuses on expression of knowledge (Nicholas & Brookshire, 1995; Richardson & Dalton, 2016). Depending on what aspect(s) of discourse one wishes to estimate (e.g., informativeness, efficiency, lexical diversity), circumlocution may differentially contribute to predictive models and may be more predictive of certain discourse metrics (e.g., informativeness) than others (e.g., grammaticality). Therefore, future work may wish to explore whether circumlocution is similarly predictive of other discourse metrics.
Future work may also explore whether circumlocution is differentially predictive of discourse-level communication depending on aphasia subtype. For example, lexical retrieval at the single-word level has been found to be more predictive of gist production for people classified as having Broca’s and Wernicke’s aphasia compared to those classified as having Anomic or Conduction aphasia (Richardson et al., 2018). It may also be that there are differential relationships between circumlocution and discourse by aphasia subtype considering that circumlocution involves lexical retrieval which is differentially impacted across subtypes. We did not explore this possibility since aphasia subtypes were not equally represented in our sample, but future work with a larger sample of people with more varied aphasia subtype classifications may wish to explore to what extent circumlocution is differentially predictive of discourse-level communication across subtypes.
Another important consideration is the potentially informative content present in certain productions that are currently regarded as real word errors or semantic paraphasias. For example, many naming assessment scoring criteria dictate that a superordinate category production (e.g., “fruit” for blueberry) is coded as a real word error, while a ‘type response’ (e.g., “type of fruit” for blueberry) is coded as a description. It may be that a person with aphasia who says “fruit” is providing the superordinate category for the picture as a circumlocution, but due to the absence of the words “type of,” their production is categorized as a real word error vs. a description. A direct way to assess intentionality would be for assessors to ask people with aphasia, in instances where they provide a single superordinate category word during a naming assessment, if that was their final/definitive answer. Another way to address this would be to update current practices to allow for other types of productions that have descriptive value or content which may lack the grammaticality required to be considered a description, but contain intonation and pauses (Saffran et al., 1998) that indicate clear word-searching behavior. Investigating superordinate category word productions will be considered in future iterations of this work related to capturing circumlocution.
Finally, no discussion that includes mention of standardized assessments is complete without an acknowledgement of the racial, ethnic, and cultural-linguistic bias inherent in many standardized tools used to assess people with aphasia. In this study, the PNT was chosen because it is known to be less affected by participant factors such as race and socioeconomic status (Walker & Schwartz, 2012), and scoring rules were utilized that do not penalize speakers of dialects other than standardized English (Hudley & Mallinson, 2015). There is ample evidence of performance discrepancies between racial and ethnic groups on the BNT-2 (Baird et al., 2007; Boone et al., 2007; Na & King, 2019; Pedraza et al., 2009), WAB-R (Burns et al., 2019; Ellis & Peach, 2017; Milman et al., 2014; Molrine & Pierce, 2002; Rohde et al., 2018), and BDAE-3 (Molrine & Pierce, 2002), discrepancies thought to arise from differential item functioning (Pedraza et al., 2009), linguistic and format bias (Molrine & Pierce, 2002; Taylor & Payne, 1983), and the use of predominantly white, middle-class validation samples (Boone et al., 2007; Molrine & Pierce, 2002; Na & King, 2019; Rohde et al., 2018). Therefore, results from such assessment tools, when used to assess Black, Hispanic, and Asian people with aphasia, as well as people with aphasia from other racial and ethnic backgrounds or those who do not speak standardized English (Hudley & Mallinson, 2015) as their primary dialect, should be interpreted cautiously through this lens. And while some examiner’s manuals (e.g., the BNT-2) instruct users to interpret scores with respect to cultural-linguistic background, a survey of more than 400 people showed that fewer than half of respondents reported doing so, and only 30–40% reported taking native language background or educational status into consideration during interpretation (Bortnik et al., 2013). A survey of SLPs in the U.S. found low respondent familiarity with, and use of, culturally-based diagnostic modifications of aphasia assessments when assessing African American clients (Bond & Gooch, 2016). Future investigations predicting discourse-level production with confrontation naming scores from tools with known cultural-linguistic bias should attempt to account for this source of measurement error, or should use tools whose scores are less impacted by individual factors such as race and cultural-linguistic background.
Conclusions
This study provides preliminary evidence that measuring circumlocution at the confrontation naming-level (the way it is currently measured), in addition to naming accuracy, accounts for a significant amount more of the variance in measures of communicative informativeness and efficiency at the discourse-level. Circumlocutory responses at the naming-level, while they may not contain the target word, still contribute informative content about targets. Therefore, it makes sense that measuring circumlocution behaviors in addition to target word production at the confrontation naming-level would account for more of the variance in outcomes at the discourse-level, which contain both target words and other productions that are deemed to be informative, accurate, and relevant to the topic/picture. Reducing the obstacles associated with estimating a patient’s functional communication by using clinically-accessible tools that are straightforward in their administration and scoring will increase clinicians’ ability to evaluate functional communication more regularly in clinical practice.
Acknowledgements
Thank you to Elizabeth Lacey, Sachi Paul, Anna Prince, Caroline Fisher, Caitlin McDermott, Katherine Modrall, and Jessica Schwartz for the time and effort you each spent with data acquisition and coding. Thank you to Annie Fox for your statistical consulting, and to Lauren Brock for your time transcribing videos in the preliminary stages of this work.
Funding Acknowledgement
This work was supported by the National Institute on Deafness and Other Communication Disorders (NIDCD) of the National Institutes of Health under award number R01DC014960. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Data Availability Statement
Data available on request from the authors: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
- Andreetta S, Cantagallo A, & Marini A (2012). Narrative discourse in anomic aphasia. Neuropsychologia, 50(8), 1787–1793, 10.1016/j.neuropsychologia.2012.04.003 [DOI] [PubMed] [Google Scholar]
- Antonucci SM, & MacWilliam C (2015). Verbal description of concrete objects: a method for assessing semantic circumlocution in persons with aphasia. American Journal of Speech-Language Pathology, 24(4), S828–S837, 10.1044/2015_AJSLP-14-0154 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Armstrong E (2000). Aphasic discourse analysis: The story so far. Aphasiology, 14:9, 875–892, 10.1080/02687030050127685 [DOI] [Google Scholar]
- Armstrong E, & Ferguson A (2010). Language, meaning, context, and functional communication. Aphasiology, 24(4), 480–496. [Google Scholar]
- Armstrong E, Ciccone N, Godecke E, & Kok B (2011). Monologues and dialogues in aphasia: Some initial comparisons. Aphasiology, 25(11), 1347–1371, 10.1080/02687038.2011.577204 [DOI] [Google Scholar]
- Baird AD, Ford M, & Podell K (2007). Ethnic differences in functional and neuropsychological test performance in older adults. Archives of Clinical Neuropsychology, 22(3), 309–318, 10.1016/j.acn.2007.01.005 [DOI] [PubMed] [Google Scholar]
- Bastiaanse R, & Jonkers R (1998). Verb retrieval in action naming and spontaneous speech in agrammatic and anomic aphasia. Aphasiology, 12(11), 951–969, 10.1080/02687039808249463 [DOI] [Google Scholar]
- Berndt. RS, Wayland S, Rochon E, Saffran E, & Schwartz M (2000). Quantitative production analysis: A training manual for the analysis of aphasic sentence production. Hove, U.K.: Psychology Press. [Google Scholar]
- Bond B, & Gooch J (2016, November). An examination of the prevalence of diagnostic testing modifications to assess communication disorders in African-Americans. Poster presented at the 2016 Annual Convention of the American Speech-Language-Hearing Association, Philadelphia, PA. [Google Scholar]
- Boone KB, Victor TL, Wen J, Razani J, & Pontón M (2007). The association between neuropsychological scores and ethnicity, language, and acculturation variables in a large patient population. Archives of Clinical Neuropsychology, 22(3), 355–365, 10.1016/j.acn.2007.01.010 [DOI] [PubMed] [Google Scholar]
- Bortnik KE, Boone KB, Wen J, Lu P, Mitrushina M, Razani J, & Maury T (2013). Survey results regarding use of the Boston Naming Test: Houston, we have a problem. Journal of Clinical and Experimental Neuropsychology, 35(8), 857–866. https://doi.org/10.1080/13803395.2013.826182
- Bryant L, Ferguson A, & Spencer E (2016). Linguistic analysis of discourse in aphasia: A review of the literature. Clinical Linguistics & Phonetics, 30(7), 489–518. https://doi.org/10.3109/02699206.2016.1145740
- Burns SP, White BM, Magwood G, Ellis C, Logan A, Jones Buie JN, & Adams RJ (2019). Racial and ethnic disparities in stroke outcomes: A scoping review of post-stroke disability assessment tools. Disability and Rehabilitation, 41(15), 1835–1845. https://doi.org/10.1080/09638288.2018.1448467
- Cahana-Amitay D, Albert ML, Pyun SB, Westwood A, Jenkins T, Wolford S, & Finley M (2011). Language as a stressor in aphasia. Aphasiology, 25(5), 593–614. https://doi.org/10.1080/02687038.2010.541469
- Chapey R, Duchan JF, Elman RJ, Garcia LJ, Kagan A, Lyon JG, & Simmons Mackie N (2000). Life participation approach to aphasia: A statement of values for the future. The ASHA Leader, 5(3), 4–6. https://doi.org/10.1044/leader.FTR.05032000.4
- Cho-Reyes S, & Thompson CK (2012). Verb and sentence production and comprehension in aphasia: Northwestern Assessment of Verbs and Sentences (NAVS). Aphasiology, 26(10), 1250–1277. https://doi.org/10.1080/02687038.2012.693584
- Covington MA, & McFall JD (2010). Cutting the Gordian knot: The moving-average type–token ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94–100. https://doi.org/10.1080/09296171003643098
- Cunningham KT, & Haley KL (2020). Measuring lexical diversity for discourse analysis in aphasia: Moving-average type–token ratio and word information measure. Journal of Speech, Language, and Hearing Research, 63(3), 710–721. https://doi.org/10.1044/2019_JSLHR-19-00226
- Duffy JR, Boyle M, & Plattner L (1980). Listener reactions to personal characteristics of fluent and nonfluent aphasic speakers. In Clinical Aphasiology: Proceedings of the Conference 1980 (pp. 117–126). BRK Publishers. http://aphasiology.pitt.edu/id/eprint/574
- Edmonds LA, & Kiran S (2006). Effect of semantic naming treatment on crosslinguistic generalization in bilingual aphasia. Journal of Speech, Language, and Hearing Research, 49, 729–748. https://doi.org/10.1044/1092-4388(2006/053)
- Ellis C, & Peach RK (2017). Racial-ethnic differences in word fluency and auditory comprehension among persons with poststroke aphasia. Archives of Physical Medicine and Rehabilitation, 98(4), 681–686. https://doi.org/10.1016/j.apmr.2016.10.010
- Evans WS, Hula WD, Quique Y, & Starns JJ (2020). How much time do people with aphasia need to respond during picture naming? Estimating optimal response time cutoffs using a multinomial ex-Gaussian approach. Journal of Speech, Language, and Hearing Research, 63(2), 599–614. https://doi.org/10.1044/2019_JSLHR-19-00255
- Falconer C, & Antonucci SM (2012). Use of semantic feature analysis in group discourse treatment for aphasia: Extension and expansion. Aphasiology, 26(1), 64–82. https://doi.org/10.1080/02687038.2011.602390
- Fama ME, Snider SF, Henderson MP, Hayward W, Friedman RB, & Turkeltaub PE (2019). The subjective experience of inner speech in aphasia is a meaningful reflection of lexical retrieval. Journal of Speech, Language, and Hearing Research, 62(1), 106–122. https://doi.org/10.1044/2018_JSLHR-L-18-0222
- Fergadiotis G, & Wright HH (2016). Modelling confrontation naming and discourse performance in aphasia. Aphasiology, 30, 364–380. https://doi.org/10.1080/02687038.2015.1067288
- Field A, Miles J, & Field Z (2012). Discovering statistics using R. Sage Publications.
- Field AP, & Wilcox RR (2017). Robust statistical methods: A primer for clinical psychology and experimental psychopathology researchers. Behaviour Research and Therapy, 98, 19–38.
- Francis DR, Clark N, & Humphreys GW (2002). Circumlocution-induced naming (CIN): A treatment for effecting generalisation in anomia? Aphasiology, 16(3), 243–259. https://doi.org/10.1080/02687040143000564
- Fridriksson J, Bonilha L, Baker JM, Moser D, & Rorden C (2010). Activity in preserved left hemisphere regions predicts anomia severity in aphasia. Cerebral Cortex, 20(5), 1013–1019.
- German DJ (1990). TAWF: Test of Adolescent/Adult Word Finding. Austin, TX: Pro-Ed.
- Goodglass H, Kaplan E, & Weintraub S (2001). BDAE: The Boston Diagnostic Aphasia Examination. Philadelphia, PA: Lippincott Williams & Wilkins.
- Goodglass H, & Wingfield A (1997). Word-finding deficits in aphasia: Brain–behavior relations and clinical symptomatology. In Anomia (pp. 3–27). Academic Press. https://doi.org/10.1016/B978-012289685-9/50002-8
- Gordon JK (2008). Measuring the lexical semantics of picture description in aphasia. Aphasiology, 22(7–8), 839–852. https://doi.org/10.1080/02687030701820063
- Harmon TG, Jacks A, Haley KL, & Faldowski RA (2016). Listener perceptions of simulated fluent speech in nonfluent aphasia. Aphasiology, 30(8), 922–942. https://doi.org/10.1080/02687038.2015.1077925
- Herbert R, Hickin J, Howard D, Osborne F, & Best W (2008). Do picture-naming tests provide a valid assessment of lexical retrieval in conversation in aphasia? Aphasiology, 22(2), 184–203. https://doi.org/10.1080/02687030701262613
- Holland AL (1982). Observing functional communication of aphasic adults. Journal of Speech and Hearing Disorders, 47(1), 50–56. https://doi.org/10.1044/jshd.4701.50
- Hudley AHC, & Mallinson C (2015). Understanding English language variation in US schools. Teachers College Press.
- Jacobs BJ (2001). Social validity of changes in informativeness and efficiency of aphasic discourse following linguistic specific treatment (LST). Brain and Language, 78(1), 115–127. https://doi.org/10.1006/brln.2001.2452
- Kambanaros M (2010). Action and object naming versus verb and noun retrieval in connected speech: Comparisons in late bilingual Greek–English anomic speakers. Aphasiology, 24(2), 210–230. https://doi.org/10.1080/02687030902958332
- Kaplan E, Goodglass H, & Weintraub S (2001). Boston Naming Test. Philadelphia, PA: Lea and Febiger. https://doi.org/10.1037/t27208-000
- Kendall DL, Brookshire CE, Minkina I, & Bislick L (2013). An analysis of aphasic naming errors as an indicator of improved linguistic processing following phonomotor treatment. American Journal of Speech-Language Pathology, 22(2), S240–S249. https://doi.org/10.1044/1058-0360(2012/12-0078)
- Kertesz A (2007). The Western Aphasia Battery–Revised. New York: Grune & Stratton.
- Kiran S, & Thompson CK (2003). The role of semantic complexity in treatment of naming deficits: Training semantic categories in fluent aphasia by controlling exemplar typicality. Journal of Speech, Language & Hearing Research, 46(3). https://doi.org/10.1044/1092-4388(2003/061)
- Kong APH, & Wong CWY (2018). An integrative analysis of spontaneous storytelling discourse in aphasia: Relationship with listeners’ rating and prediction of severity and fluency status of aphasia. American Journal of Speech-Language Pathology, 27(4), 1491–1505. https://doi.org/10.1044/2018_AJSLP-18-0015
- Leaman MC, & Edmonds LA (2019). Revisiting the correct information unit: Measuring informativeness in unstructured conversations in people with aphasia. American Journal of Speech-Language Pathology, 28(3), 1099–1114.
- Leaman MC, & Edmonds LA (2021). Assessing language in unstructured conversation in people with aphasia: Methods, psychometric integrity, normative data, and comparison to a structured narrative task. Journal of Speech, Language, and Hearing Research, 64(11), 4344–4365. https://doi.org/10.1044/2021_JSLHR-20-00641
- Linnik A, Bastiaanse R, & Höhle B (2016). Discourse production in aphasia: A current review of theoretical and methodological challenges. Aphasiology, 30(7), 765–800. https://doi.org/10.1080/02687038.2015.1113489
- Maechler M, Rousseeuw P, Croux C, Todorov V, Ruckstuhl A, Salibian-Barrera M, Verbeke T, Koller M, Conceicao EL, & Anna di Palma M (2022). robustbase: Basic Robust Statistics. R package version 0.95-0. http://robustbase.r-forge.r-project.org/
- Marini A, Caltagirone C, Pasqualetti P, & Carlomagno S (2007). Patterns of language improvement in adults with non-chronic non-fluent aphasia after specific therapies. Aphasiology, 21(2), 164–186. https://doi.org/10.1080/02687030600633799
- Mayer J, & Murray L (2003). Functional measures of naming in aphasia: Word retrieval in confrontation naming versus connected speech. Aphasiology, 17(5), 481–497. https://doi.org/10.1080/02687030344000148
- McHugh ML (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282.
- Milman LH, Faroqi-Shah Y, & Corcoran CD (2014). Normative data for the WAB-R: A comparison of monolingual English speakers, Asian Indian-English bilinguals, and Spanish-English bilinguals. Clinical Aphasiology Conference, St. Simon’s Island, Georgia, USA.
- Minkina I, Oelke M, Bislick LP, Brookshire CE, Hunting Pompon R, Silkes JP, & Kendall DL (2016). An investigation of aphasic naming error evolution following phonomotor treatment. Aphasiology, 30(8), 962–980. https://doi.org/10.1080/02687038.2015.1081139
- Molrine CJ, & Pierce RS (2002). Black and White adults’ expressive language performance on three tests of aphasia. American Journal of Speech-Language Pathology, 11, 139–150. https://doi.org/10.1044/1058-0360(2002/014)
- Na S, & King TZ (2019). Performance discrepancies on the Boston Naming Test in African-American and non-Hispanic White American young adults. Applied Neuropsychology: Adult, 26(3), 236–246. https://doi.org/10.1080/23279095.2017.1393427
- National Institute on Deafness and Other Communication Disorders (NIDCD). (2015, December). Aphasia. U.S. Department of Health & Human Services, National Institutes of Health. https://www.nidcd.nih.gov/health/aphasia
- Nicholas LE, & Brookshire RH (1993). A system for quantifying the informativeness and efficiency of the connected speech of adults with aphasia. Journal of Speech and Hearing Research, 36(2), 338–350. https://doi.org/10.1044/jshr.3602.338
- Nicholas LE, & Brookshire RH (1995). Presence, completeness, and accuracy of main concepts in the connected speech of non-brain-damaged adults and adults with aphasia. Journal of Speech and Hearing Research, 38, 145–156. https://doi.org/10.1044/jshr.3801.145
- Nicholas LE, Brookshire RH, MacLennan DL, Schumacher JG, & Porrazzo SA (1989). The Boston Naming Test: Revised administration and scoring procedures and normative information for non-brain-damaged adults. Clinical Aphasiology, 18, 103–115.
- Pashek GV, & Tompkins CA (2002). Context and word class influences on lexical retrieval in aphasia. Aphasiology, 16(3), 261–286. https://doi.org/10.1080/02687040143000573
- Pedraza O, Graff-Radford NR, Smith GE, Ivnik RJ, Willis FB, Petersen RC, & Lucas JA (2009). Differential item functioning of the Boston Naming Test in cognitively normal African American and Caucasian older adults. Journal of the International Neuropsychological Society, 15(5), 758–768. https://doi.org/10.1017/S1355617709990361
- Psychology Software Tools, Inc. (2016). E-Prime 3.0 [Computer software]. Retrieved from https://support.pstnet.com/
- R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
- RStudio Team (2021). RStudio: Integrated development environment for R. RStudio, PBC, Boston, MA. http://www.rstudio.com/
- Richardson JD, & Dalton SG (2016). Main concepts for three different discourse tasks in a large non-clinical sample. Aphasiology, 30(1), 45–73. https://doi.org/10.1080/02687038.2015.1057891
- Richardson JD, Hudspeth Dalton SG, Fromm D, Forbes M, Holland A, & MacWhinney B (2018). The relationship between confrontation naming and story gist production in aphasia. American Journal of Speech-Language Pathology, 27(1S), 406–422. https://doi.org/10.1044/2017_AJSLP-16-0211
- Roach A, Schwartz MF, Martin N, Grewal RS, & Brecher A (1996). The Philadelphia Naming Test: Scoring and rationale. Clinical Aphasiology, 24, 121–133.
- Rohde A, Worrall L, Godecke E, O’Halloran R, Farrell A, & Massey M (2018). Diagnosis of aphasia in stroke populations: A systematic review of language tests. PLoS ONE, 13(3). https://doi.org/10.1371/journal.pone.0194143
- Saffran EM, Berndt RS, & Schwartz MF (1989). The quantitative analysis of agrammatic production: Procedure and data. Brain and Language, 37(3), 440–479. https://doi.org/10.1016/0093-934X(89)90030-8
- Salmerón R, García CB, & García J (2018). Variance inflation factor and condition number in multiple linear regression. Journal of Statistical Computation and Simulation, 88(12), 2365–2384. https://doi.org/10.1080/00949655.2018.1463376
- Simmons-Mackie N, Threats TT, & Kagan A (2005). Outcome assessment in aphasia: A survey. Journal of Communication Disorders, 38(1), 1–27. https://doi.org/10.1016/j.jcomdis.2004.03.007
- Stark BC (2019). A comparison of three discourse elicitation methods in aphasia and age-matched adults: Implications for language assessment and outcome. American Journal of Speech-Language Pathology, 28(3), 1067–1083. https://doi.org/10.1044/2019_AJSLP-18-0265
- Susanti Y, Pratiwi H, Sulistijowati S, & Liana T (2014). M estimation, S estimation, and MM estimation in robust regression. International Journal of Pure and Applied Mathematics, 91(3), 349–360. https://doi.org/10.12732/ijpam.v91i3.7
- Templin MC (1957). Certain language skills in children; their development and interrelationships. University of Minnesota Press. https://doi.org/10.5749/j.ctttv2st
- Tierney-Hendricks C, Schliep ME, & Vallila-Rohter S (2022). Using an implementation framework to survey outcome measurement and treatment practices in aphasia. American Journal of Speech-Language Pathology, 31(3), 1133–1162. https://doi.org/10.1044/2021_AJSLP-21-00101
- Togher L (2001). Discourse sampling in the 21st century. Journal of Communication Disorders, 34(1–2), 131–150. https://doi.org/10.1016/S0021-9924(00)00045-9
- Tompkins CA, Scharp VL, & Marshall RC (2006). Communicative value of self-cues in aphasia: A re-evaluation. Aphasiology, 20(7), 684–704. https://doi.org/10.1080/02687030500334076
- Walker GM, & Schwartz MF (2012). Short-form Philadelphia Naming Test: Rationale and empirical evaluation. American Journal of Speech-Language Pathology, 21(2), S140–S140. https://doi.org/10.1044/1058-0360(2012/11-0089)
- Webster J, & Morris J (2019). Communicative informativeness in aphasia: Investigating the relationship between linguistic and perceptual measures. American Journal of Speech-Language Pathology, 28(3), 1115–1126. https://doi.org/10.1044/2019_AJSLP-18-0256
Associated Data
Data Availability Statement
Data are available on request from the authors: the datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
