Published in final edited form as: Neuropsychology. 2022 Sep 15;36(8):709–718. doi: 10.1037/neu0000860

Assessing naming errors using an automated machine learning approach

Tatiana T Schnur 1, Chia-Ming Lei 2
PMCID: PMC9970144  NIHMSID: NIHMS1871522  PMID: 36107705

Abstract

Objective:

After left hemisphere stroke, 20–50% of people experience language deficits, including difficulties in naming. Naming errors that are semantically related to the intended target (e.g., producing “violin” for the picture HARP) indicate a potential impairment in accessing knowledge of word forms and their meanings. Understanding the cause of naming impairments is crucial both to better modeling of language production and to tailoring individualized rehabilitation. However, naming errors are typically evaluated via subjective and laborious dichotomous classification. As a result, these evaluations do not capture the degree of semantic similarity and are susceptible to lower inter-rater reliability because of their subjectivity.

Methods:

We investigated whether a computational linguistic measure based on word2vec (Mikolov, Chen, Corrado, & Dean, 2013) addresses these limitations by evaluating errors produced during object naming by a group of patients (N = 105) in the acute stage of a left-hemisphere stroke.

Results:

Pearson correlations demonstrated excellent convergent validity between word2vec’s estimates of semantically related naming errors and independent tests of access to lexical-semantic knowledge (p < .0001). Further, multiple regression analysis showed that word2vec’s semantically related estimates were significantly better than human error classification at predicting performance on tests of lexical-semantic knowledge.

Conclusions:

Useful to both theorists and clinicians, our word2vec-based method provides an automated, continuous, and objective psychometric measure of access to lexical-semantic knowledge during naming.

Keywords: natural language processing, word2vec, language production, stroke

1. Introduction

Naming difficulties are commonly observed among patients with language deficits secondary to left hemisphere stroke. Naming deficits (anomia) may result from impairment at multiple stages of naming (e.g., Caramazza, 1997; Dell, 1986; Dell, Schwartz, Martin, Saffran, & Gagnon, 1997). Classifying the types of naming errors produced is one approach to identifying at what level deficits occur and is thus a principal method for evaluating naming deficits (Budd et al., 2010; Jefferies & Lambon Ralph, 2006; Reilly, Peelle, Antonucci, & Grossman, 2011; Schwartz & Brecher, 2000; Tippett & Hillis, 2015). Naming errors are commonly evaluated subjectively, with trained raters making dichotomous judgments (i.e., related vs. unrelated). Subjective dichotomous classification, however, is labor-intensive and time-consuming, invites reliability and validity concerns, and cannot capture the degree of semantic similarity between a naming error and the intended target, which may provide additional information concerning the loci of naming deficits. Here, we investigated the degree to which an automated computational linguistic measure of the semantic similarity between a target and a response (using word2vec; Mikolov et al., 2013) predicted access to lexical-semantic knowledge in comparison to traditional subjective dichotomous classification. Our findings introduce both theorists and clinicians to a potential assessment tool for objectively quantifying the degree of impairment in accessing lexical-semantic knowledge during word production.

To name an object, we process the visual input, access the corresponding semantic knowledge, select the associated lexical and phonological information, and then execute a specific articulatory program (Dell et al., 1997; Goldrick & Rapp, 2007; Levelt, 1989). When a problem arises in the access to or storage of the appropriate set of semantic features associated with a target response (Warrington & Shallice, 1979; cf. Chapman, Hasan, Schulz, & Martin, 2020; Rapp & Caramazza, 1993), access to an overlapping (but different from the target) set of semantic features may lead to the production of a semantically related, but erroneous, response (Dell et al., 1997; Hillis, Tuffiash, Wityk, & Barker, 2002). In contrast, errors that sound like the target word, phonological errors, indicate a problem accessing the sounds associated with the word to be produced (Tippett & Hillis, 2015). Thus, classifying the type of error produced during naming can help localize at what level of the naming process a deficit occurs.

From a clinical perspective, assessing the nature of naming deficits via the types of naming errors produced is important because these assessments are used to identify patients for targeted cognitive-linguistic treatment. Treatments for word production deficits after stroke may be most effective when tailored to the type of impairment causing the word-finding difficulty (Kristinsson et al., 2021; Maher & Raymer, 2004; Nickels, 2002). For example, naming errors that stem from problems with lexical and/or semantic access may be approached with semantic feature analysis treatment (Boyle, 2010), while problems retrieving the sounds associated with words may be treated with phonological components analysis (Leonard, Rochon, & Laird, 2008; cf. Boyle et al., 2022). Thus, the diagnosis and treatment of naming impairments depend in part on the degree to which patients produce semantically related vs. other types of errors (Drew & Thompson, 1999; Kristinsson et al., 2021; Neumann, 2018; Nickels & Best, 1996; Rose & Douglas, 2008; cf. Boyle, 2010; Coelho, McHugh, & Boyle, 2000).

The most common approach for quantifying semantically related naming errors (e.g., “violin” for HARP) is qualitative, dichotomous classification (related vs. unrelated). This approach has limitations in both its ease of application and its ability to assess semantic similarity (Fergadiotis, Gorman, & Bedrick, 2016; Wang et al., 2018). Making a subjective judgement of the semantic relationship between each target-response pair is time-consuming. Some naming responses present vague or implicit semantic relationships to the target, for example, “eagle” for DEER and “cotton” for THORN. Some approaches apply strict scoring criteria for semantic relatedness which may be difficult to apply consistently, for example, excluding responses that share both a phonological and a semantic relationship with the target (“hog” for PIG) or responses judged to be outside the target’s grammatical class (“swimming” for FISH; cf. Schwartz et al., 2009; Walker et al., 2011). Dichotomous judgments of semantic relationships can also yield inconsistent inter- and intra-rater reliability because of the raters’ subjectivity. For example, Nicholas, Brookshire, MacLennan, Schumacher, and Porrazzo (1989) reported that error-type coding reliability for the Boston Naming Test (Kaplan, Goodglass, & Weintraub, 1983) ranged from 32% to 100%, where reliability was 95% for related responses and 32% for unrelated responses. In fact, because related and unrelated responses were so difficult to differentiate, the authors combined them into one category. Poor and/or labor-intensive assessment of naming errors impairs our ability to assess the nature of naming deficits, and further limits therapeutic interventions (Nickels, 2002) and theoretical inferences (e.g., Foygel & Dell, 2000).

1.1. Automated classification of word relationships

One approach that addresses the limitations of manual coding of naming errors while assessing the semantic similarity between words is to use machine learning techniques. Distributional semantic models (DSMs; Andrews, Vigliocco, & Vinson, 2009; Lenci, 2008) characterize a word by its distribution across the contexts in which it occurs, where words appearing in similar contexts are assumed to have similar meanings (the distributional hypothesis; Firth, 1957; Harris, 1954). The use of algorithms to calculate the degree of similarity between words within their contexts has a long history (e.g., Salton, Wong, & Yang, 1975; cf. Turney & Pantel, 2010). One modern approach is word2vec, a three-layer neural network model of word meanings (Mikolov, Chen, et al., 2013; Mikolov, Sutskever, Corrado, & Dean, 2013; cf. Devlin, Chang, Lee, & Toutanova, 2018; Peters et al., 2018) similar in spirit to the well-known Latent Semantic Analysis (Landauer & Dumais, 1997). The publicly available word2vec model was trained on part of the Google News data set, which includes approximately 100 billion words (Mikolov, Sutskever, et al., 2013). The model predicts the probability of a word appearing in the corpus, based on the context of the preceding and following words, using a continuous bag-of-words architecture, resulting in 300-dimensional vectors for 3 million words and phrases. Each word is represented as a vector embedded in a space such that words that appear in similar distributional contexts are geometrically closer together, and words in similar distributional contexts often (although not always) have similar semantic properties. Word similarity is measured by the angular distance (cosine similarity; Euclidean distance is another option) between word vectors. Thus, DSMs like word2vec calculate the semantic similarity between words based on how frequently the two words co-occur with surrounding words, representing similar global distributional patterns (Günther, Rinaldi, & Marelli, 2019).
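To make the similarity metric concrete, the following minimal Python sketch computes cosine similarity between two word vectors; the toy four-dimensional vectors here are illustrative stand-ins for the model’s 300-dimensional embeddings, not values from the trained model.

import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two word vectors: values near 1 indicate
    # similar distributional contexts; values near 0 indicate unrelated ones.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for 300-dimensional word2vec embeddings.
harp = np.array([0.9, 0.1, 0.3, 0.0])
violin = np.array([0.8, 0.2, 0.4, 0.1])
hat = np.array([0.0, 0.9, 0.0, 0.8])

print(cosine_similarity(harp, violin))  # high: semantically related
print(cosine_similarity(harp, hat))     # low: semantically unrelated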

Word2vec’s computed similarity metric between two words (i.e., cosine similarity) predicts behavioral performance and subjective ratings of lexical-semantic knowledge well. For example, word2vec’s cosine similarity measure outperformed traditional DSMs, such as Latent Semantic Analysis among others, in predicting human judgements of the semantic relatedness between two words (Pereira, Gershman, Ritter, & Botvinick, 2016). Word2vec’s cosine similarity measure also better predicts response times to decide whether a target is a word (lexical decision) and the speed of single-word reading (Mandera, Keuleers, & Brysbaert, 2017). When predicting subjects’ neural activity patterns during judgments of taxonomic and thematic relationships between word pairs, semantic similarity computed by the word2vec model outperformed subjective semantic judgements in a representational similarity analysis fMRI study of the visual word form area (Wang et al., 2018). Fergadiotis et al. (2016) demonstrated that a trained word2vec model, when dichotomously categorizing naming errors as semantically related vs. unrelated (via a pre-identified threshold; cf. McKinney-Bock & Bedrick, 2019) in comparison to human raters, had good sensitivity, correctly labeling an error as semantically related (probability of .77), and high specificity, correctly labeling an error as unrelated (probability of .92). Together, these results suggest that a word2vec-derived similarity score is sensitive to behavior influenced by lexical-semantic knowledge and is on par with human raters in determining the degree of semantic relationship between words.

However, as Fergadiotis et al. (2016) and others point out (cf. Beaty & Johnson, 2021; Wang et al., 2018), human ratings of the semantic relationship between words do not necessarily reflect the ground truth, that is, the degree to which words overlap in semantic space (e.g., via overlap in semantic properties or the frequency with which they co-occur in events). Ratings are subject to problems in variability and reliability due to their inherent subjectivity, the difficulty of detecting subtle relationships, and the fatigue involved in producing them. Thus, it is currently an open question whether the semantic relationship between naming targets and responses is better assessed by machine learning approaches (e.g., word2vec-derived similarity scores) or by human ratings.

To address this question, we adopted a different measure of ground truth. Following general consensus in the literature, we assumed that the degree of semantic relationship between a naming error and its target reflects a problem during naming located between identifying/accessing the correct semantic representation associated with the lexical representation and/or selecting the correctly associated lexical representation (Dell et al., 1997; Goldrick & Rapp, 2007; Hillis et al., 2002). Several assessments of lexical-semantic access during comprehension share aspects of lexical-semantic access with naming. For example, accuracy in deciding whether a semantically related or unrelated word matches a picture (word-to-picture matching; WPM; Breese & Hillis, 2004; Martin & Schnur, 2019) measures access from words to concepts (represented by pictures) or the integrity of the concepts themselves. Because picture naming and word-picture matching require mapping between shared lexical and semantic representations, the representations and their connections are assumed to be the same in comprehension and production (Harvey & Schnur, 2015; Howard, 1995; Levelt, 1999; Wei & Schnur, 2016; cf. Caramazza, 1997). Similarly, deciding whether concepts are semantically associated when accessed through words (synonymy triples test; Nozari, Dell, & Schwartz, 2011; Saffran, Schwartz, Linebarger, Martin, & Bochetto, 1988) or pictures (Pyramids and Palm Trees Test; Howard & Patterson, 1992) also provides an independent measure of the integrity of lexical-semantic access. Thus, the degree to which people produce semantically related responses during naming should correlate with performance on independent measures of lexical-semantic access. As such, we can use these independent measures to test whether the semantic relationship between naming targets and responses is well assessed by machine learning approaches, i.e., word2vec-derived similarity scores, and how these compare with human ratings.

1.2. Current Study

We tested the validity of an automated machine-learning approach, word2vec (Mikolov, Chen, et al., 2013), for measuring the semantic similarity between targets and naming responses produced by a large group of left hemisphere acute stroke patients (N = 105), and compared it to dichotomous semantic error classification by human raters. We adopted the word2vec model not only because it provides an automated continuous measure of semantic similarity, but also because of the ease of applying a pre-trained network of word representations. We analyzed the semantic similarity between each target-response pair as generated by word2vec (automated, continuous) and by the standardly accepted human rater classification (qualitative, dichotomous). We estimated the convergent validity of word2vec-derived similarity scores and human error classification with independent tests of lexical-semantic access from both pictures and words using Pearson correlations. We then used multiple regression to compare the degree to which the two methods accounted for performance on these independent tests of lexical-semantic access. We predicted that our word2vec-based method, by providing a continuous and more stable measure of semantic similarity between targets and responses, would better account for performance on the independent tests of lexical-semantic access in comparison to human error classification, and would thus better reflect impaired integrity of lexical-semantic access, that is, the identification and/or access of correct semantic representations associated with lexical representations and/or the selection of correctly associated lexical representations. The validation of an automated, easier-to-implement, continuous measure of semantically related naming responses will be useful for clinicians interested in pairing specific naming deficits with targeted treatments as well as for theorists who use naming errors to inform theories of language production.

2. Methods

2.1. Participants

We recruited 105 participants (45 female; 13 left-handed) within 72 hours of a left hemisphere stroke from comprehensive stroke centers in the south-central United States. We included participants who were native monolingual English speakers between 18 and 85 years of age and who retained a level of comprehension adequate to follow task instructions and to provide consent or signify that a legal representative could consent for them. Participants did not have a medical history of sensory deficits impeding them from perceiving visual and/or auditory stimuli, nor other neurological diseases (e.g., tumor, dementia). Participants were tested a median of three days after onset of stroke symptoms (M = 4.0, SD = 2.7, range 1–17 days). The mean age of participants was 60.6 years (SD = 12.7, range 20–85), and the mean years of education, available for 88 participants, was 14.1 years (SD = 3.0, range 6–24)¹. We tested participants on a range of tasks as part of an ongoing study and describe the tasks relevant to this study below. We received Institutional Review Board approval of the informed consent procedure.

2.2. Tasks & Materials

2.2.1. Picture naming

Participants named 77 pictures of depictable objects as part of a larger test battery: 47 color photographs from the Bank of Standardized Stimuli (BOSS; Brodeur, Dionne-Dostie, Montreuil, & Lepage, 2010) and 30 object line-drawings from Zingeser and Berndt (1990). Four items were excluded due to erroneous repeated presentation across tasks. See Appendix A for stimuli. Participants were shown each picture on a computer screen and asked to name the object using one word, with a naming deadline of 10 s.

2.2.1.1. Target-response semantic similarity estimates
Error-type coding.

We categorized picture naming responses as correct when they were identical to the target name, matched the target name by at least 50% segment overlap (e.g., “biolin” for VIOLIN), or were an acceptable alternative (e.g., “stove” for OVEN, including plural forms). We categorized one-word incorrect responses as semantically related (SR) to the target (e.g., “lamb” for DEER), phonologically related (PR) to the target without a semantic relation (e.g., “breed” for BROOM), or unrelated (UR), i.e., neither SR nor PR (e.g., “hat” for PIE), although UR responses could include components of the depicted picture (e.g., “wood” for RAFT). Other types of errors included descriptions (D), which were either multiword responses (e.g., “as the world turns” for GLOBE) or responses outside the grammatical class of the target (e.g., “throw” for BALL); nonword (NW) responses; and no responses (NR), including omissions (e.g., failure to name before the 10-s naming deadline) and attempts to spell words, sound effects, mumbled responses, or neutral comments (e.g., “I don’t like it”). Two trained raters independently scored participants’ responses. We semi-randomly selected ~15% of our participants (n = 15; five participants with < 20% overall accuracy and 10 selected randomly) to evaluate inter-rater reliability. Point-to-point percentage reliability was 91% (calculated as agreed-upon cases divided by total cases × 100). Discrepant scores were reconciled after consulting with a speech-language pathologist (CML). Of the 105 subjects, 17 completed 46 of the 73 items and three completed 43 items. We excluded three items due to poor recognizability, as these items had high rates of unrelated naming errors (> 30%) and low overall naming accuracy (< 50%; “office”, “rock”, and “thorn”), and one item because the word did not exist in the pre-trained word2vec model (“axe”), retaining a total of 69 named items. We excluded four subjects because they produced whole-word responses on too few trials (< 50%). After excluding 29 nonwords, 180 no responses, and 17 picture-part naming errors, the analysis included 6,235 naming attempts, an average of 61.7 (range 25–69) per participant across the 101 participants. For each participant, we calculated the proportion of each naming error type as the number of errors of that type divided by the total number of items named.
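As a minimal sketch of the per-participant tallying described above (the response codes below are hypothetical, not study data), the proportion of each error type is the count of that code divided by the number of items named:

from collections import Counter

# Hypothetical coded responses for one participant (per the scheme above).
codes = ["Correct", "SR", "Correct", "SR", "PR", "Correct", "UR", "D"]

counts = Counter(codes)
n_named = len(codes)  # total number of items named by this participant
proportions = {c: counts[c] / n_named for c in ("SR", "PR", "UR", "D")}
print(proportions)  # SR: 2/8 = 0.25; PR, UR, D: 1/8 = 0.125 each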

Word2vec.

The word2vec estimates of semantic similarity were based on continuous word vector representations generated by the word2vec tool’s skip-gram architecture (https://code.google.com/p/word2vec/; Mikolov, Chen, et al., 2013), derived from the Google News data set of approximately 100 billion words. The publicly available word2vec model used the following parameters: window size = 5, sample = 0.001, negative sample number = 5, learning rate = 0.025, vector size = 300. We calculated word similarity for all whole-word picture naming responses as the cosine of the angle between the response and target word vectors. Correct, alternative, and acceptable responses (including plural forms) were assigned a value of 1 (correct). For multiword responses (i.e., description errors), we used the first complete noun response or, if missing, another content-word response (e.g., verb, adjective). All other responses were excluded. Following previous work (Günther, Dudschig, & Kaup, 2016), we converted five negative word2vec-derived similarity scores to zero.

2.2.2. Measures of lexical-semantic access

2.2.2.1. Word-to-picture matching task (Breese & Hillis, 2004)

We presented one of 17 pictures to participants and asked, “Is this a ___?” Participants responded “yes” if the picture matched the spoken word (e.g., CAT/cat) and “no” if it did not. For non-matching pairs, each of the 17 pictures was presented once with a semantically related foil (e.g., CAT/dog) and twice more with different foils (a phonologically related and an unrelated foil). A total of 68 trials was divided into four presentation sets within a larger test battery. For the dependent measure, we used accuracy on matching and semantically related trials to calculate d-prime as an estimate of participants’ ability to discriminate between matching and semantic-foil trials (cf. Martin, Ding, Hamilton, & Schnur, 2021; Martin & Schnur, 2019). D-prime scores were adjusted for extreme accuracy values by replacing 0 with 0.5/n and replacing 1 with (n − 0.5)/n, where n was the number of signal (matching) or noise (semantically related) trials (Stanislaw & Todorov, 1999).
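A minimal sketch of this d-prime computation with the Stanislaw and Todorov (1999) extreme-value adjustment (using scipy’s inverse-normal function; the trial counts below are illustrative, not study data):

from scipy.stats import norm

def adjusted_rate(k, n):
    # Replace extreme rates: 0 -> 0.5/n and 1 -> (n - 0.5)/n
    # (Stanislaw & Todorov, 1999).
    rate = k / n
    if rate == 0:
        return 0.5 / n
    if rate == 1:
        return (n - 0.5) / n
    return rate

def dprime(hits, n_signal, false_alarms, n_noise):
    # d' = z(hit rate) - z(false-alarm rate): hits are correct "yes" responses
    # on matching trials; false alarms are "yes" responses on semantic foils.
    return (norm.ppf(adjusted_rate(hits, n_signal))
            - norm.ppf(adjusted_rate(false_alarms, n_noise)))

# Illustrative counts: 17 matching and 17 semantic-foil trials, as in the task.
print(dprime(hits=17, n_signal=17, false_alarms=3, n_noise=17))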

2.2.2.2. Synonymy triples (Saffran et al., 1988)

We presented three written words on the screen and read them aloud, asking participants which two of the three words were similar in meaning. Half of the trials presented words referring to abstract concepts and half presented words referring to concrete concepts. Participants responded by reading aloud or pointing to the two words of their choice.

2.2.2.3. Pyramids and Palm Trees Test (Howard & Patterson, 1992)

We presented three drawings of objects, one above the other two, and asked participants to decide whether the lower-left or lower-right object was more closely related or associated with the top object (e.g., PYRAMID at top; PALM TREE at bottom left, the correct answer; PINE TREE at bottom right). Participants responded verbally or by pointing, indicating “left” or “right.”

2.3. Statistical Analyses

To establish the convergent validity of word2vec-derived similarity scores with tests of lexical-semantic access during comprehension and with human error classification (proportion of SR naming errors), we used Pearson correlations. For the convergent validity analysis, to compare performance more easily across tasks, within each lexical-semantic comprehension task (PPT and synonymy triples accuracy, WPM SR-trial d-prime) we averaged performance across trials and then scaled scores (z-scores) within the group. We also scaled participant scores for the target-response semantic similarity estimates for the same reasons. For the proportion of SR naming errors, we scaled participant scores relative to the participant group (z-scores) and then multiplied by −1 so that negative values reflected poorer performance and positive values reflected better performance. For word2vec-derived similarity scores, for each participant we averaged scores across picture naming trials and scaled performance (z-scores) within the group². Because performance was significantly correlated across the WPM, synonymy, and PPT tasks (rs > .47, ps < .0001), for each participant we averaged z-scores across tasks to generate a single composite lexical-semantic score. Where participants were unable to complete either synonymy triples (n = 7) or the PPT (n = 17), we averaged z-scores across the remaining two tasks. We then used multiple regression to test the relative contributions of the two semantic similarity measures of naming responses (word2vec-derived similarity scores and proportion of SR errors) in predicting the integrity of lexical-semantic access as measured by the composite lexical-semantic score.
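A minimal sketch of the compositing step (hypothetical scores and column names; pandas), z-scoring each task within the group and averaging across whichever tasks a participant completed:

import pandas as pd

# Hypothetical per-participant scores; NaN marks a task not completed.
df = pd.DataFrame({
    "synonymy": [0.90, 0.60, 0.85, float("nan")],
    "ppt": [0.95, 0.55, 0.80, 0.70],
    "wpm_dprime": [3.1, 1.2, 2.6, 2.0],
})

# Scale each task within the group (z-scores), then average across the
# tasks each participant completed (NaNs are skipped by default).
z = (df - df.mean()) / df.std(ddof=1)
composite = z.mean(axis=1)
print(composite)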

2.4. Transparency and Openness

Ethics approval.

The Baylor College of Medicine Institutional Review Board approved the informed consent (H-39208).

Consent to participate.

All participants gave consent prior to the study. Subjects received financial compensation for participation.

Consent for publication.

All participants gave informed consent for publication.

Preregistration.

The study was not formally preregistered.

Availability of data and materials.

Anonymized data that support the findings of this study are publicly available on Open Science Framework (OSF).

Code availability.

The word2vec model is publicly available at https://code.google.com/p/word2vec/. See Appendix B for the Python code used to extract word2vec-derived similarity scores.

Conflict of interest.

The authors declare no conflicts of interest/competing interests.

3. Results

3.1. Performance across measures

Subjects demonstrated a range in performance across all measures, as seen in Table 1.

Table 1.

Descriptive statistics for measures of naming, lexical-semantic access, and word2vec-derived similarity scores.

                                                 Mean    SD      Range
Accuracy
  Overall Naming                                 0.86    0.11    0.42–1
  WPM Semantically Related Foils                 0.84    0.10    0.53–1
  WPM Matching                                   0.98    0.06    0.53–1
  Synonymy Triples                               0.83    0.19    0.33–1
  PPT                                            0.83    0.14    0.42–1
Proportion Naming Error (# errors / # trials)
  Semantically Related                           0.07    0.05    0–0.23
  Phonologically Related                         0.01    0.02    0–0.09
  Unrelated                                      0.01    0.02    0–0.10
  Descriptions                                   0.003   0.008   0–0.05
  Nonwords                                       0.005   0.02    0–0.07
  No Responses                                   0.03    0.06    0–0.38
word2vec                                         0.93    0.05    0.76–1

SD = standard deviation; WPM = word-picture matching; PPT = Pyramids and Palm Trees test; word2vec = word2vec-derived similarity scores

3.2. Convergent Validity and Multivariate Regression Analyses

The semantic similarity of target-response pairs measured by word2vec and the proportion of SR naming errors were each significantly correlated with performance on every measure of lexical-semantic access, as well as with each other (rs > .41; ps < .0001, corrected for multiple comparisons; see Table 2). The demographic variables of age at testing, years of education, and days between stroke and testing did not significantly correlate with performance across tasks (rs < .29, ps > .01). Thus, our word2vec-based method of quantifying the semantic relationship between naming targets and responses demonstrated high convergent validity with lexical-semantic access during comprehension and with human error classification.

Table 2.

Correlations (r values) between standardized lexical-semantic and naming measures and demographic variables, with p values in parentheses. Relationships significant after correction for multiple comparisons (.05/25 = .002) are shown in bold.

                                  Synonymy Triples  PPT            WPM SR d-prime  Proportion SR naming error  Word2vec

PPT                               .61 (< .0001)
WPM SR d-prime                    .48 (< .0001)     .47 (< .0001)
Proportion SR naming error        .54 (< .0001)     .47 (< .0001)  .41 (< .0001)
Word2vec                          .61 (< .0001)     .47 (< .0001)  .47 (< .0001)   .86 (< .0001)
Age at testing                    −.01 (.89)        .02 (.83)      −.03 (.78)      −.09 (.34)                  −.17 (.09)
Years education                   .21 (.06)         .29 (.01)      .15 (.18)       .12 (.28)                   .07 (.43)
Days between stroke and testing   −.12 (.24)        −.14 (.22)     .08 (.40)       −.03 (.79)                  .02 (.84)

WPM = word-picture matching; SR = semantically related; PPT = Pyramids and Palm Trees test; Word2vec = word2vec-derived similarity scores

We performed a multiple regression to assess whether each naming measure contributed to lexical-semantic access performance independently of the other. Because the predictor variables were highly correlated, which can affect model parameter estimation, we calculated each predictor’s variance inflation factor (VIF). Both the word2vec-derived similarity score and the SR naming error measure had VIFs < 4, well below standardly adopted thresholds for problematic multicollinearity (standard thresholds range from 5 to 10 or greater; cf. O’Brien, 2007). The overall regression model was significant (F(2, 98) = 31.5, p < .0001). The word2vec-derived similarity score was a significant predictor of the composite lexical-semantic measure (beta = .44, t = 3.44, p = .0008) independent of the SR naming error measure, which was not a significant predictor (beta = .09, t = .73, p = .46). As seen in Figure 1, leverage plots show the influence of the word2vec-derived similarity score on the composite lexical-semantic measure (left) and no significant additional influence of the proportion of SR naming errors (right).
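For readers wishing to replicate this analysis pipeline, a minimal sketch (simulated stand-in data; statsmodels) of the VIF check followed by the multiple regression:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated stand-ins for the scaled study variables (n = 101).
rng = np.random.default_rng(0)
w2v = rng.standard_normal(101)                    # word2vec-derived scores
sr = 0.9 * w2v + 0.4 * rng.standard_normal(101)   # correlated SR error measure
composite = 0.5 * w2v + rng.standard_normal(101)  # composite lexical-semantic score

X = sm.add_constant(pd.DataFrame({"word2vec": w2v, "prop_SR": sr}))
# VIF for each predictor (skipping the intercept at column 0).
print([variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])])

print(sm.OLS(composite, X).fit().summary())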

Figure 1.


Multiple regression leverage residual scatterplots demonstrate the independent contribution of scaled word2vec-derived similarity scores and proportion semantically related naming errors to the composite measure of lexical-semantic access during comprehension.

4. Discussion

In this study we examined whether an automated machine learning approach, word2vec (Mikolov, Chen, et al., 2013), used to calculate the semantic relationship between a picture name and a naming response, accounts for individual differences in lexical-semantic access during comprehension in comparison to human error classification in a large group of participants after acute left hemisphere stroke. Our word2vec-based method’s classification of the semantic relatedness of picture naming responses had excellent convergent validity with measures of the integrity of and access to lexical-semantic knowledge during comprehension, as well as with human judgements of naming responses. The word2vec-derived similarity score was a significantly better predictor of lexical-semantic comprehension performance than human judgements. Thus, our word2vec-based method is a validated automated approach to objectively quantify naming responses on a continuous scale of semantic relatedness.

Our word2vec-based method’s advantage over human judgements in measuring lexical-semantic access via picture naming is likely due to its assessment of within-context distributional similarity of word meanings on a continuous (not dichotomous) scale. All three measures of lexical-semantic access (word-to-picture matching of semantically related vs. unrelated words, Breese & Hillis, 2004; synonymy triples, Saffran et al., 1988; and the Pyramids and Palm Trees test, Howard & Patterson, 1992) require meaning comparisons accessed via words, pictures, or both. Further, the synonymy task includes abstract (e.g., courage, bravery, strength) and concrete (e.g., frog, toad, lizard) concepts, while the others, by virtue of their picture presentation format, do not. The word2vec-derived similarity score likely better captured the distantly associated relationships assessed by synonymy triples and the PPT by providing measures of semantic similarity across all whole-word responses. Because the word2vec-derived similarity score reflects the degree of both categorical and associative semantic relationships to a degree that dichotomous ratings cannot, it is a more sensitive, and as shown here, more accurate measure of impairments in lexical-semantic access during naming.

When using our word2vec-based method to assess lexical-semantic access deficits during naming, two considerations arise. First, we recruited subjects following an acute left hemisphere stroke who were able to produce speech, independent of a clinical diagnosis of aphasia. As a result, our cohort included patients with a large range of naming abilities (intact to impaired), with fewer patients having severe naming deficits, as reflected by an average group accuracy of 86%. Thus, our findings regarding our word2vec-based method’s prediction of lexical-semantic deficits may generalize only to those with mild to moderate naming deficits. Second, we used Google’s pre-trained word2vec model (Mikolov, Chen, et al., 2013) which, although easy to use due to its off-the-shelf nature, raises several issues. The pre-trained model fixes the number of words considered before and after a word (the context window size). Context window size affects the generation of word vectors and their cosine distances, which form the basis of the estimates of semantic similarity between words. As a result, adjusting the context window size changes the semantic properties of the word vectors to reflect varying degrees of semantic association and categorical relatedness (Levy & Goldberg, 2014; McKinney-Bock & Bedrick, 2019). Another issue is that because words can be polysemous, the pre-trained model may not best capture the relationship between a target and response (e.g., “cup” for GLASS may not be evaluated as closely related because “cup” also refers to a championship trophy; cf. Salem, Gale, Fergadiotis, & Bedrick, 2022). Although these issues likely affected our analysis, they did not present a large enough confound to dissociate the word2vec-derived similarity scores from the independent measures of lexical-semantic access. That said, future work should investigate how changes in hyperparameters improve machine learning models of naming errors (cf. McKinney-Bock et al., 2021) as well as the use of models beyond word2vec to improve semantic similarity estimates (cf. Salem et al., 2022).

To our knowledge, the work presented here is the first to validate that word2vec-derived similarity scores between targets and naming responses reflect the integrity of lexical-semantic access. As a result, this approach is relevant in work using semantic similarity metrics to identify brain networks which support different representations (e.g., abstract and concrete concepts, taxonomic and thematic concepts, nouns and verbs; cf. Akinina et al., 2019; Kaiser, Jacobs, & Cichy, 2022; Schwartz et al., 2011, 2009; Wang et al., 2018), to categorize neurological disease (e.g., primary progressive aphasia variants, post-stroke aphasia, cf. Budd et al., 2010; Jefferies & Lambon Ralph, 2006; Reilly et al., 2011), to identify the level of naming deficits (e.g., Caramazza, 1997; Dell et al., 1997; Walker, Hickok, & Fridriksson, 2018), and to understand the efficacy of language treatment (e.g., Kristinsson et al., 2021; Leonard et al., 2015; Miceli, Amitrano, Capasso, & Caramazza, 1996; Nickels & Best, 1996). Thus, the adoption of word embedding models like word2vec has implications for understanding the architecture of the language system as well as for clinical approaches to the study of naming deficits across different patient populations.

In conclusion, our study demonstrated that our word2vec-based method is a validated psychometric approach to evaluating access to lexical-semantic knowledge during naming for speakers after acute left hemisphere stroke. The method circumvents the need to evaluate semantically related naming errors using human dichotomous classification judgements, which are laborious, cannot capture the degree of semantic similarity, and are susceptible to lower inter-rater reliability because of subjectivity. Because our word2vec-based method is easier to implement, more accurate, and more consistent, we conclude that it will be a useful tool for both theorists and clinicians to measure lexical-semantic access during naming.

Key Points.

Question:

Does an automated machine learning approach that calculates the semantic relationship between a picture name and a naming response account for individual differences in lexical-semantic access during comprehension in a large group of participants after acute left hemisphere stroke?

Findings:

Using word2vec to classify the semantic relatedness of picture naming responses had excellent convergent validity with measures of lexical-semantic access during comprehension and significantly outperformed human judgements.

Importance:

The validation of our word2vec-based method as an automated, continuous, and objective psychometric measure provides both theorists and clinicians with an easy-to-implement, accurate, and consistent tool to measure lexical-semantic access during naming.

Next Steps:

Future work should extend application beyond individuals with mild to moderate naming deficits after stroke while also exploring how different language models and model hyperparameters improve estimations of the semantic similarity of naming responses.

Acknowledgements/Funding

The authors wish to thank Jolie Anderson, Miranda Brenneman, Cris Hamilton, and Danielle Rossi for data collection. We thank Saketh Katta, Junhua Ding, and Steve Fung for assistance evaluating subject exclusion criteria. We thank Junhua Ding and Iris Gua for assistance implementing word2vec. We thank the clinical neurological intensive care unit teams at the University of Texas Health Sciences Center and Memorial Hermann Hospital, The Houston Methodist Hospital, and the Baylor St. Luke’s Hospital for their assistance in patient recruitment and neurological assessment. We gratefully acknowledge and thank our research subjects and their caregivers for their willingness to participate in this research. This work was presented at the Academy of Aphasia in Macau, China (2019). This work was supported by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under award number R01DC014976 to the Baylor College of Medicine (PI Schnur).

APPENDIX A.

Stimulus list for picture naming. Acceptable alternatives are listed in parentheses. We excluded three items due to poor recognizability (“office”, “rock”, and “thorn”) and one item because the word did not exist in the pre-trained word2vec model (“axe”).

Almonds (nuts), ant, axe, ball, bell, belt, bird, book, boot, bridge, broom, butterfly, cabbage (lettuce), camel, cane, carrot, cat, chain, cheese, clock, coat, coins (change, money), crayon, crib, crutch, deer, doll, duck, feather, fence, fork, fox, ghost, globe, hammer, harp, house, key, leaf, mitten (glove, mitt), mop, mouse, mushroom, nose, office, oven (stove), owl, pear, pickle (cucumber), pie, pig, pill (capsule), rabbit, raft, ribbon, robot, rock (stone), rope, ruler, scarf, seal, ship, skull, staples, sun, sweater, thorn, toast (bread), toothbrush, van, vase, violin, witch.

APPENDIX B.

Python code used to extract word2vec-derived similarity scores.

import csv
import gensim.models.keyedvectors as word2vec

# Load the pre-trained Google News vectors (file path is machine-specific).
model = word2vec.KeyedVectors.load_word2vec_format(
    '/Users/GoogleNews-vectors-negative300.bin', binary=True)

# Accepted alternative names (including plural forms), scored as correct (1).
accepted_forms = {
    "almonds": ["almonds", "nuts", "almond", "nut"], "ant": ["ant", "ants"], "ball": ["ball", "balls"],
    "bell": ["bell", "bells"], "belt": ["belt", "belts"], "book": ["book", "books"], "boot": ["boot", "boots"],
    "bird": ["bird", "birds"], "bridge": ["bridge", "bridges"], "broom": ["broom", "brooms"],
    "butterfly": ["butterfly", "butterflies"], "cabbage": ["cabbage", "lettuce", "cabbages", "lettuces"],
    "camel": ["camel", "camels"], "cane": ["cane", "canes"], "carrot": ["carrot", "carrots"],
    "cat": ["cat", "cats"], "chain": ["chain", "chains"], "cheese": ["cheese", "cheeses"],
    "clock": ["clock", "clocks"], "coat": ["coat", "coats"],
    "coins": ["coins", "change", "pocket change", "money", "coin"], "crayon": ["crayon", "crayons"],
    "crib": ["crib", "cribs"], "crutch": ["crutch", "crutches"], "deer": ["deer", "deers"],
    "doll": ["doll", "dolls"], "duck": ["duck", "ducks"], "feather": ["feather", "feathers"],
    "fence": ["fence", "fences"], "fork": ["fork", "forks"], "fox": ["fox", "foxes"],
    "ghost": ["ghost", "ghosts"], "globe": ["globe", "globes"], "hammer": ["hammer", "hammers"],
    "harp": ["harp", "harps"], "house": ["house", "houses"], "key": ["key", "keys"],
    "leaf": ["leaf", "leaves"], "mitten": ["mitten", "glove", "mitt", "mittens", "gloves", "mitts"],
    "mop": ["mop", "mops"], "mouse": ["mouse", "mice"], "mushroom": ["mushroom", "mushrooms"],
    "nose": ["nose", "noses"], "office": ["office", "offices"],
    "oven": ["oven", "stove", "ovens", "stoves"], "owl": ["owl", "owls"], "pear": ["pear", "pears"],
    "pickle": ["pickle", "cucumber", "pickles", "cucumbers"], "pie": ["pie", "pies"],
    "pig": ["pig", "pigs"], "pill": ["pill", "capsule", "pills", "capsules"],
    "rabbit": ["rabbit", "rabbits"], "raft": ["raft", "rafts"], "ribbon": ["ribbon", "ribbons"],
    "robot": ["robot", "robots"], "rock": ["rock", "stone", "rocks", "stones"], "rope": ["rope", "ropes"],
    "ruler": ["ruler", "rulers"], "scarf": ["scarf", "scarfs", "scarves"], "seal": ["seal", "seals"],
    "ship": ["ship", "ships"], "skull": ["skull", "skulls"], "staples": ["staples", "staple"],
    "sun": ["sun", "suns"], "sweater": ["sweater", "sweaters"], "thorn": ["thorn", "thorns"],
    "toast": ["toast", "bread", "toasts", "breads"], "toothbrush": ["toothbrush", "toothbrushes"],
    "van": ["van", "vans"], "vase": ["vase", "vases"], "violin": ["violin", "violins"],
    "witch": ["witch", "witches"],
}

# Score a target-response pair: accepted forms score 1; otherwise use the
# word2vec cosine similarity, converting negative values to zero (Günther,
# Dudschig, & Kaup, 2016). Returns None if a word is absent from the model.
def score_response(target, response):
    if response in accepted_forms.get(target, [target]):
        return 1.0
    if target not in model or response not in model:
        return None
    return max(0.0, float(model.similarity(target, response)))

# Illustrative input; in the study, (target, response) pairs came from the
# transcribed picture-naming responses.
pairs = [("harp", "violin"), ("deer", "lamb"), ("oven", "stove")]
data = [[target, response, score_response(target, response)]
        for target, response in pairs]

with open('/Users/data.csv', 'w') as csvfile:
    output = csv.writer(csvfile)
    output.writerow(["target", "response", "similarity"])
    for row in data:
        output.writerow(row)

Footnotes

CRediT authorship contribution statement

Tatiana T. Schnur: Conceptualization, Formal analysis, Visualization, Methodology, Writing – original draft, Writing – Review & Editing, Supervision, Funding acquisition.

Chia-Ming Lei: Investigation, Formal analysis, Writing – original draft.

1. Due to clinical demands in the acute care setting, in some cases we were unable to collect years of education from participants.

2. A normal distribution is not essential for using z-scores to place variables measured on different metrics onto a common scale (Rosenthal & Rosnow, 1991).

References

  1. Akinina YS, Dragoy OV, Ivanova MV, Iskra EV, Soloukhina OA, Petryshevsky AG, … Dronkers NF (2019). Grey and white matter substrates of action naming. Neuropsychologia, 131, 249–265.
  2. Andrews M, Vigliocco G, & Vinson D (2009). Integrating experiential and distributional data to learn semantic representations. Psychological Review, 116(3), 463. 10.1037/a0016261
  3. Beaty RE, & Johnson DR (2021). Automating creativity assessment with SemDis: An open platform for computing semantic distance. Behavior Research Methods, 53(2), 757–780. 10.3758/s13428-020-01453-w
  4. Boyle M (2010). Semantic feature analysis treatment for aphasic word retrieval impairments: What’s in a name? Topics in Stroke Rehabilitation, 17(6), 411–422.
  5. Boyle M, Gordon JK, Harnish S, Kiran S, Martin N, Rose ML, & Salis C (2022). Evaluating cognitive-linguistic approaches to interventions for aphasia within the Rehabilitation Treatment Specification System. Archives of Physical Medicine and Rehabilitation, 103(3), 590–598.
  6. Breese EL, & Hillis AE (2004). Auditory comprehension: Is multiple choice really good enough? Brain and Language, 89(1), 3–8. 10.1016/S0093-934X(03)00412-7
  7. Brodeur MB, Dionne-Dostie E, Montreuil T, & Lepage M (2010). The Bank of Standardized Stimuli (BOSS), a new set of 480 normative photos of objects to be used as visual stimuli in cognitive research. PLoS ONE, 5(5). 10.1371/journal.pone.0010773
  8. Budd MA, Kortte K, Cloutman L, Newhart M, Gottesman RF, Davis C, … Hillis AE (2010). The nature of naming errors in primary progressive aphasia versus acute post-stroke aphasia. Neuropsychology, 24(5), 581–589. 10.1037/a0020287
  9. Caramazza A (1997). How many levels of processing are there in lexical access? Cognitive Neuropsychology, 14(1), 177–208. 10.1080/026432997381664
  10. Chapman CA, Hasan O, Schulz PE, & Martin RC (2020). Evaluating the distinction between semantic knowledge and semantic access: Evidence from semantic dementia and comprehension-impaired stroke aphasia. Psychonomic Bulletin and Review, 27(4), 607–639. 10.3758/s13423-019-01706-6
  11. Coelho CA, McHugh RE, & Boyle M (2000). Semantic feature analysis as a treatment for aphasic dysnomia: A replication. Aphasiology, 14(2), 133–142. 10.1080/026870300401513
  12. Dell GS (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3), 283–321. 10.1037/0033-295X.93.3.283
  13. Dell GS, Schwartz MF, Martin N, Saffran EM, & Gagnon DA (1997). Lexical access in aphasic and nonaphasic speakers. Psychological Review, 104(4), 801–838. 10.1037/0033-295X.104.4.801
  14. Devlin J, Chang M-W, Lee K, & Toutanova K (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  15. Drew RL, & Thompson CK (1999). Model-based semantic treatment for naming deficits in aphasia. Journal of Speech, Language, and Hearing Research, 42(4), 972–989.
  16. Fergadiotis G, Gorman K, & Bedrick S (2016). Algorithmic classification of five characteristic types of paraphasias. American Journal of Speech-Language Pathology, 25(4), S776–S787. 10.1044/2016_AJSLP-15-0147
  17. Firth JR (1957). A synopsis of linguistic theory 1930–1955. In Studies in Linguistic Analysis (pp. 1–32).
  18. Foygel D, & Dell GS (2000). Models of impaired lexical access in speech production. Journal of Memory and Language, 43(2), 182–216. 10.1006/jmla.2000.2716
  19. Goldrick M, & Rapp B (2007). Lexical and post-lexical phonological representations in spoken production. Cognition, 102(2), 219–260. 10.1016/j.cognition.2005.12.010
  20. Günther F, Dudschig C, & Kaup B (2016). Latent semantic analysis cosines as a cognitive similarity measure: Evidence from priming studies. Quarterly Journal of Experimental Psychology, 69(4), 626–653. 10.1080/17470218.2015.1038280
  21. Günther F, Rinaldi L, & Marelli M (2019). Vector-space models of semantic representation from a cognitive perspective: A discussion of common misconceptions. Perspectives on Psychological Science, 14(6), 1006–1033. 10.1177/1745691619861372
  22. Harris ZS (1954). Distributional structure. Word, 10(2–3), 146–162.
  23. Harvey DY, & Schnur TT (2015). Distinct loci of lexical and semantic access deficits in aphasia: Evidence from voxel-based lesion-symptom mapping and diffusion tensor imaging. Cortex, 67, 37–58. 10.1016/j.cortex.2015.03.004
  24. Hillis AE, Tuffiash E, Wityk RJ, & Barker PB (2002). Regions of neural dysfunction associated with impaired naming of actions and objects in acute stroke. Cognitive Neuropsychology, 19(6), 523–534. 10.1080/02643290244000077
  25. Howard D (1995). Lexical anomia: Or the case of the missing lexical entries. The Quarterly Journal of Experimental Psychology Section A, 48(4), 999–1023. 10.1080/14640749508401426
  26. Howard D, & Patterson K (1992). Pyramids and Palm Trees: A test of semantic access from pictures and words. Bury St Edmunds, UK: Thames Valley Test Company.
  27. Jefferies E, & Lambon Ralph MA (2006). Semantic impairment in stroke aphasia versus semantic dementia: A case-series comparison. Brain, 129(8), 2132–2147. 10.1093/brain/awl153
  28. Kaiser D, Jacobs AM, & Cichy RM (2022). Modelling brain representations of abstract concepts. PLoS Computational Biology, 18(2), e1009837. 10.1371/journal.pcbi.1009837
  29. Kaplan E, Goodglass H, & Weintraub S (1983). The Boston Naming Test. Philadelphia: Lea & Febiger.
  30. Kristinsson S, Basilakos A, Elm J, Spell LA, Bonilha L, Rorden C, … Fridriksson J (2021). Individualized response to semantic versus phonological aphasia therapies in stroke. Brain Communications, 3(3), fcab174.
  31. Landauer TK, & Dumais ST (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
  32. Lenci A (2008). Distributional semantics in linguistic and cognitive research. Italian Journal of Linguistics, 1, 1–31.
  33. Leonard C, Laird L, Burianova H, Graham S, Grady C, Simic T, & Rochon E (2015). Behavioural and neural changes after a “choice” therapy for naming deficits in aphasia: Preliminary findings. Aphasiology, 29(4), 506–525.
  34. Leonard C, Rochon E, & Laird L (2008). Treating naming impairments in aphasia: Findings from a phonological components analysis treatment. Aphasiology, 22(9), 929–947.
  35. Levelt WJM (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
  36. Levelt WJM (1999). Models of word production. Trends in Cognitive Sciences, 3(6), 223–232. 10.1016/S1364-6613(99)01319-4
  37. Levy O, & Goldberg Y (2014). Dependency-based word embeddings. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 302–308.
  38. Maher L, & Raymer A (2004). Management of anomia. Topics in Stroke Rehabilitation, 11(1), 10–21.
  39. Mandera P, Keuleers E, & Brysbaert M (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92, 57–78. 10.1016/j.jml.2016.04.001
  40. Martin RC, Ding J, Hamilton AC, & Schnur TT (2021). Working memory capacities neurally dissociate: Evidence from acute stroke. Cerebral Cortex Communications, 2(1), tgab005. 10.1093/texcom/tgab005
  41. Martin RC, & Schnur TT (2019). Independent contributions of semantic and phonological working memory to spontaneous speech in acute stroke. Cortex, 112, 58–68. 10.1016/j.cortex.2018.11.017
  42. McKinney-Bock K, & Bedrick S (2019). Classification of semantic paraphasias: Optimization of a word embedding model. Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP, 52–62.
  43. McKinney-Bock K, Cowan B, Li L, Casilio M, Fergadiotis G, & Bedrick S (2021). Improving machine learning models of paraphasia classification with semantic and lexical information. 50th Clinical Aphasiology Conference.
  44. Miceli G, Amitrano A, Capasso R, & Caramazza A (1996). The treatment of anomia resulting from output lexical damage: Analysis of two cases. Brain and Language, 52, 150–174.
  45. Mikolov T, Chen K, Corrado G, & Dean J (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  46. Mikolov T, Sutskever I, Corrado G, & Dean J (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 3111–3119.
  47. Neumann Y (2018). A case series comparison of semantically focused vs. phonologically focused cued naming treatment in aphasia. Clinical Linguistics & Phonetics, 32(1), 1–27.
  48. Nicholas LE, Brookshire RH, MacLennan DL, Schumacher JG, & Porrazzo SA (1989). Revised administration and scoring procedures for the Boston Naming Test and norms for non-brain-damaged adults. Aphasiology, 3(6), 569–580.
  49. Nickels L (2002). Therapy for naming disorders: Revisiting, revising, and reviewing. Aphasiology, 16(10–11), 935–979.
  50. Nickels L, & Best W (1996). Therapy for naming disorders (Part II): Specifics, surprises and suggestions. Aphasiology, 10(2), 109–136. 10.1080/02687039608248401
  51. Nozari N, Dell GS, & Schwartz MF (2011). Is comprehension necessary for error detection? A conflict-based account of monitoring in speech production. Cognitive Psychology, 63(1), 1–33. 10.1016/j.cogpsych.2011.05.001
  52. O’Brien RM (2007). A caution regarding rules of thumb for variance inflation factors. Quality & Quantity, 41, 673–690. 10.1007/s11135-006-9018-6
  53. Pereira F, Gershman S, Ritter S, & Botvinick M (2016). A comparative evaluation of off-the-shelf distributed semantic representations for modelling behavioural data. Cognitive Neuropsychology, 33(3–4), 175–190. 10.1080/02643294.2016.1176907
  54. Peters ME, Neumann M, Gardner M, Clark C, Lee K, & Zettlemoyer L (2018). Deep contextualized word representations. Proceedings of NAACL-HLT 2018, 2227–2237.
  55. Rapp B, & Caramazza A (1993). On the distinction between deficits of access and deficits of storage: A question of theory. Cognitive Neuropsychology, 10(2), 113–141.
  56. Reilly J, Peelle JE, Antonucci SM, & Grossman M (2011). Anomia as a marker of distinct semantic memory impairments in Alzheimer’s disease and semantic dementia. Neuropsychology, 25(4), 413–426.
  57. Rose M, & Douglas J (2008). Treating a semantic word production deficit in aphasia with verbal and gesture methods. Aphasiology, 22(1), 20–41. 10.1080/02687030600742020
  58. Rosenthal R, & Rosnow RL (1991). Essentials of behavioral research (2nd ed.). New York: McGraw-Hill.
  59. Saffran EM, Schwartz MF, Linebarger MC, Martin N, & Bochetto P (1988). The Philadelphia comprehension battery. Unpublished test battery.
  60. Salem A, Gale R, Fergadiotis G, & Bedrick S (2022). Improving automatic semantic similarity classification of the PNT. EasyChair preprint.
  61. Salton G, Wong A, & Yang CS (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
  62. Schwartz MF, & Brecher A (2000). A model-driven analysis of severity, response characteristics, and partial recovery in aphasics’ picture naming. Brain and Language, 73(1), 62–91. 10.1006/brln.2000.2310
  63. Schwartz MF, Kimberg DY, Walker GM, Brecher A, Faseyitan OK, Dell GS, … Coslett HB (2011). Neuroanatomical dissociation for taxonomic and thematic knowledge in the human brain. Proceedings of the National Academy of Sciences, 108(20), 8520–8524. 10.1073/pnas.1014935108
  64. Schwartz MF, Kimberg DY, Walker GM, Faseyitan O, Brecher A, Dell GS, & Coslett HB (2009). Anterior temporal involvement in semantic word retrieval: Voxel-based lesion-symptom mapping evidence from aphasia. Brain, 132(12), 3411–3427. 10.1093/brain/awp284
  65. Stanislaw H, & Todorov N (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31(1), 137–149. 10.3758/BF03207704
  66. Tippett DC, & Hillis AE (2015). The cognitive processes underlying naming. In The Handbook of Adult Language Disorders (pp. 141–150). Psychology Press.
  67. Turney PD, & Pantel P (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188.
  68. Walker GM, Hickok G, & Fridriksson J (2018). A cognitive psychometric model for assessment of picture naming abilities in aphasia. Psychological Assessment, 30(6), 809–826. 10.1037/pas0000529
  69. Walker GM, Schwartz MF, Kimberg DY, Faseyitan O, Brecher A, Dell GS, & Coslett HB (2011). Support for anterior temporal involvement in semantic error production in aphasia: New evidence from VLSM. Brain and Language, 117(3), 110–122. 10.1016/j.bandl.2010.09.008
  70. Wang X, Xu Y, Wang Y, Zeng Y, Zhang J, Ling Z, & Bi Y (2018). Representational similarity analysis reveals task-dependent semantic influence of the visual word form area. Scientific Reports, 8(1), 1–10. 10.1038/s41598-018-21062-0
  71. Warrington EK, & Shallice T (1979). Semantic access dyslexia. Brain, 102(1), 43–63.
  72. Wei T, & Schnur TT (2016). Long-term interference at the semantic level: Evidence from blocked-cyclic picture matching. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(1). 10.1037/xlm0000164
  73. Zingeser LB, & Berndt RS (1990). Retrieval of nouns and verbs in agrammatism and anomia. Brain and Language, 39(1), 14–32. 10.1016/0093-934X(90)90002-X
