Author manuscript; available in PMC: 2024 Oct 1.
Published in final edited form as: Ann Neurol. 2023 Aug 18;94(4):647–657. doi: 10.1002/ana.26744

Measuring Sentence Information via Surprisal: Theoretical and Clinical Implications in Nonfluent Aphasia

Neguine Rezaii 1, James Michaelov 2, Sylvia Josephy-Hernandez 1, Boyu Ren 3, Daisy Hochberg 1, Megan Quimby 1, Bradford C Dickerson 1
PMCID: PMC10543558  NIHMSID: NIHMS1920541  PMID: 37463059

Abstract

Objective

Nonfluent aphasia is characterized by simplified sentence structures and word-level abnormalities, including reduced use of verbs and function words. The predominant belief about the disease mechanism is that a core deficit in syntax processing causes both structural and word-level abnormalities.

Here, we propose an alternative view based on information theory to explain the symptoms of nonfluent aphasia. We hypothesize that the word-level features of nonfluency constitute a distinct compensatory process to augment the information content of sentences to the level of healthy speakers. We refer to this process as lexical condensation.

Methods

We use a computational approach based on Language Models (LMs) to measure sentence information through surprisal, a metric derived from the probability of occurrence of each word in a sentence given its preceding context, averaged over the sentence. We apply this method to the language of patients with the nonfluent variant of primary progressive aphasia (nfvPPA) (n=36) and healthy controls (n=133) as they describe a picture.

Results

We found that nfvPPA patients produced sentences with the same sentence surprisal as healthy controls by using richer words in their structurally impoverished sentences. Furthermore, higher surprisal in nfvPPA sentences correlated with the canonical features of agrammatism: a lower function-to-all-word ratio, a lower verb-to-noun ratio, a higher heavy-to-all-verb ratio, and a higher ratio of verbs in -ing forms.

Interpretation

Using surprisal enables testing an alternative account of nonfluent aphasia that regards its word-level features as adaptive, rather than defective, symptoms, a finding that would call for revisions in the therapeutic approach to nonfluent language production.

Graphical Abstract


INTRODUCTION

Nonfluent aphasia is a language disorder characterized by effortful speech and impaired sentence formation. The sentence impairment can be described at the levels of words as well as structures that determine word relationships.1,2 At the structural level, nonfluent patients have difficulty using complex syntactic rules, such as embedding one clause into another. At the word level, the impairment is manifested by using fewer function words (e.g., pronouns, determiners, and prepositions) and a lower proportion of verbs to nouns than healthy controls. The cause for both levels of impairment is commonly believed to be a core deficit in syntax processing.3,4 Under this agrammatic account, however, several basic features of nonfluency remain unexplained. For example, it is unclear why nonfluent patients tend to use heavier (i.e., semantically richer) verbs,5,6 more verbs in the -ing form, and often have an intact comprehension of verbs and function words.7,8 While often dismissed by the agrammatic account, these unexplained features may point to a distinct language process critical for communication in nonfluent aphasia.

Our recent findings based on information theory and probabilistic linguistics have raised an alternative explanation of the symptoms of nonfluent aphasia. One study showed that when healthy speakers are constrained to produce short sentences consisting of only one to two words, similar word-level features of nonfluency emerge in their language.9 The constrained utterances had increased proportions of content words over function words, nouns over verbs, heavy verbs over light verbs, and more verbs in the -ing form than nonconstrained sentences. The resemblance between the language of healthy speakers under a production constraint and the typical language of nonfluent patients suggests that this style of word selection may not be a defect but rather a response to a bottleneck in language production. Our results also showed that words that are commonly dropped by nonfluent patients have a higher frequency of occurrence. Using an information-theoretic approach, we showed that high-frequency words contribute less to the information content of a sentence than low-frequency words. The work thus concluded that nonfluent patients drop the less informative words of a sentence in favor of more informative ones in response to their deficit in producing complex structures. This conclusion was supported by another study that compared the complexity of words and syntactic structures of sentences in patients with nonfluent aphasia.10 Similar to words, the complexity of syntactic structures can be measured through their frequency. A syntactic structure that is commonly used is easier to access, hence less complex. The work showed that sentences with high-frequency syntactic structures contain lower-frequency words. Similarly, sentences with low-frequency syntactic structures contained higher-frequency words. This trade-off between the frequencies of words and syntactic rules offers yet another clue to a possible compensatory mechanism in which syntactically impoverished sentences are packaged with more informative words, enabling patients to communicate their message.

Here, we aim to test such a compensatory hypothesis in nonfluent aphasia (Fig 1). According to this hypothesis, patients with nonfluent aphasia choose more informative words in their syntactically simple sentences, with the net effect of sustaining sentence information. We refer to this process of increasing sentence information through the choice of words as lexical condensation. If true, this alternative explanation would offer a shift from the prevailing agrammatic account. The information theory-based understanding of the disease process would then require revisions in current therapeutic approaches to nonfluent aphasia. The hypothesis can be tested by measuring the informational content of sentences through a metric that factors in both lexical and syntactic information. The metric will enable us to explore how these two linguistic elements interact to make up the information content of a sentence in health or disease. One such metric is surprisal, a measure of the likelihood of occurrence of words in a sentence, given their preceding context.11 Words with a lower probability of occurrence following a context will be more surprising, and thus have a higher informational content. For example, in the context “the woman is pouring a …”, the word seltzer has higher surprisal than the word drink due to its lower probability of occurrence given the preceding words. Unlike word frequency, which reflects the information content of a word in isolation, surprisal has the critical advantage of being sensitive to its preceding context. Therefore, both lexical and syntactic properties of the context will contribute to the probability of occurrence of the upcoming word,12 because not all words fit all structures or strings of words.13
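To make the relationship between probability and information concrete, consider illustrative (not model-derived) probabilities for the two continuations above: if $P(\text{drink} \mid \text{context}) = 0.10$ and $P(\text{seltzer} \mid \text{context}) = 0.001$, then

$$S(\text{drink}) = -\log_2 0.10 \approx 3.3 \text{ bits}, \qquad S(\text{seltzer}) = -\log_2 0.001 \approx 10 \text{ bits},$$

so the less predictable word carries roughly three times as much information as the more predictable one.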

Figure 1.

The working hypothesis of the study. Each sentence, represented by the black sliding bar, can be made up of a different share of lexical and syntactic information. In a healthy individual, the bar can slide over a wide range of possible combinations of the two sources of information while keeping the sentence information constant. In nfvPPA, the pathological process limits the use of complex syntax, pushing the sliding bar to the left, where more informative words must be selected to convey the intended message. Sentence information is measured by average surprisal, lexical information by average content word frequency, and syntactic information by average syntax frequency.

We calculate surprisal using an automated algorithm, PsychFormers.14 The algorithm is based on Language Models (LMs), which are computational systems designed to assign probabilities to sequences of words.15 The surprisal of a sentence is operationalized as the average surprisal of its words or subwords as recognized by the LM. We will apply this algorithm to language samples collected from patients with the nonfluent variant of primary progressive aphasia (nfvPPA) and healthy controls, as they described a picture of a family at a picnic. First, we examine a large sample of language from healthy individuals to establish that the surprisal of a sentence is indeed a composite metric made up of both lexical and syntactic information. We then measure the information content of words and syntactic rules using their respective frequencies. Next, we compare the surprisal of spoken sentences from patients with nfvPPA with that of healthy controls. Finding similar surprisal in length-matched sentences produced by the two groups provides evidence for the proposal that the syntactically simpler sentences of the nonfluent patients are enriched with the use of more informative words. Lastly, we repeat the analyses using the written samples of nfvPPA patients and healthy controls describing the same picture. This analysis helps delineate whether the lexical condensation strategy is unique to spoken language or also exists in writing. If observed only in speaking, the compensation strategy might be a response to effortful speech. In other words, the cost of articulation is so high that nonfluent patients have to resort to short, lexically rich sentences. If the strategy is also present in writing, then the bottleneck is expected to be at an earlier stage in language production that is common to both speaking and writing.

METHODS

Participants

Patients.

Thirty-six patients with nfvPPA were recruited from an ongoing longitudinal study at the Primary Progressive Aphasia Program in the Frontotemporal Disorders Unit of Massachusetts General Hospital (MGH). Thirty-four patients provided spoken samples, and 30 provided written ones (28 patients were common to both groups). We followed the established consensus clinical and imaging-supported criteria for diagnosing nfvPPA.16 The clinical criteria require the patients to have agrammatism and/or effortful speech in addition to two of the following symptoms: impaired comprehension of complex sentences, spared single-word comprehension, and spared object knowledge. The imaging criteria require the patients to have predominant left posterior fronto-insular atrophy on MRI and/or hypometabolism in the corresponding region. Patients were excluded from this study if nondegenerative nervous system disorders, other medical conditions, or psychiatric diagnoses better accounted for their deficits. To accomplish this diagnostic process, all patients underwent a standard clinical evaluation comprising a structured history obtained from both patient and informant, comprehensive medical, neurological, and psychiatric history and exams, neuropsychological and speech-language assessments, and clinical brain MRI and PET scans. Ratings on our scale called the Progressive Aphasia Severity Scale (PASS) were also included.17 Modeled after the Clinical Dementia Rating Scale (CDR), PASS uses the clinician’s best judgment and integrates information from the patient’s test performance and a companion’s interview. The PASS includes “boxes” for fluency, syntax, word retrieval and expression, repetition, auditory comprehension, single-word comprehension, reading, writing, and functional communication. The PASS Sum-of-Boxes (SoB) is the sum of the box scores. The clinical and demographic information of the patients is shown in Table 1.

Table 1.

The clinical and demographic information of patients with nfvPPA

Mean age (SD): 67.26 (13.58)
Mean years of education (SD): 17.09 (6.61)
% Female: 58.3%
% Right-handed: 88.9%
Mean PASS articulation (SD): 0.97 (0.95)
Mean PASS fluency (SD): 0.81 (0.64)
Mean PASS SoB (SD): 6.50 (4.47)
Mean CDR SoB (SD): 1.60 (1.52)

PASS = Progressive Aphasia Severity Scale

CDR = Clinical Dementia Rating

SoB = Sum of Boxes

Healthy controls.

A total of 133 native English speakers with no reported history of neurological or acquired/developmental language disorders were recruited to provide spoken (n = 49) and written (n = 84) samples. Thirty-six participants of the spoken cohort were recruited from the Speech and Feeding Disorders Laboratory at the MGH Institute of Health Professions. The rest of the healthy participants (13 spoken and 84 written) were recruited from Amazon’s Mechanical Turk (MTurk) and received financial compensation for their participation. MTurk participants filled out the short and validated version of the everyday cognition test with twelve items designed to detect cognitive and functional decline.18 Healthy participants had an average age of 50.5 (SD = 16.4) and average years of education of 15.8 (SD = 1.7). Of all healthy participants, 51.9% were female, and 88.9% were right-handed. For all group comparisons, we used a subsample of healthy controls that matched the age and years of education of patients with nfvPPA.

This study was approved by the Mass General Brigham Healthcare System Institutional Review Boards, which govern human subjects research at MGH, and was conducted in accordance with their guidelines. All healthy controls and patients from the clinic provided written informed consent to take part in this study. The recruitment of healthy controls from Amazon Mechanical Turk was approved by the Brain Resilience in Aging: Integrated Neuroscience Studies (BRAINS) program at MGH.

Language samples.

Participants were asked to look at a drawing of a family at a picnic from the Western Aphasia Battery–Revised19 and describe it using as many complete sentences as possible. Spoken language samples were transcribed into text using the Microsoft Dictate application. Because the automatic transcription of the language of patients with nonfluent aphasia is associated with inaccuracies (ranging from zero to 32% in our samples), a research collaborator blind to the grouping manually checked and corrected all transcripts.

All comparisons were matched based on sentence length to control for a possible nonlinear relationship between sentence length and other language features of a sentence. If there was a significant difference in the sentence length between the two groups of comparison, we ran an algorithm that resulted in an equal distribution of sentence length between the two groups. For each sentence length, the algorithm determined the minimum number of sentences available across the two groups and randomly sampled that number of sentences with the specified length from the pool of sentences from both groups. In addition, we matched the one-to-two-word sentences typical of nfvPPA by including data from a cohort of healthy individuals from MTurk who were asked to describe the same picnic picture using only one-to-two-word sentences. This method allowed us to extend our analyses to the short sentences in nfvPPA. We conducted normalization at the sentence level based on our previous finding,10 where we showed that healthy speakers can shift between the use of complex syntax and complex words from sentence to sentence. For example, they might use simple words and complex syntax in one sentence while adopting the opposite strategy in the next sentence. Therefore, the analysis of language at the individual level that averages the syntax frequency and word frequency of all sentences would be insensitive to the normal sentence-level variability in language production.
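A minimal sketch of this length-matching step is shown below; the function and the whitespace tokenization are our own illustration rather than the authors' implementation.

```python
import random
from collections import defaultdict

def length_match(group_a, group_b, seed=0):
    """Subsample two lists of sentences so that both groups contain the
    same number of sentences at every sentence length (in words)."""
    rng = random.Random(seed)

    def by_length(sentences):
        buckets = defaultdict(list)
        for s in sentences:
            buckets[len(s.split())].append(s)  # whitespace tokenization (illustrative)
        return buckets

    a_buckets, b_buckets = by_length(group_a), by_length(group_b)
    matched_a, matched_b = [], []
    # For each shared length, keep the minimum count available in either group
    # and randomly sample that many sentences from each group.
    for length in set(a_buckets) & set(b_buckets):
        n = min(len(a_buckets[length]), len(b_buckets[length]))
        matched_a.extend(rng.sample(a_buckets[length], n))
        matched_b.extend(rng.sample(b_buckets[length], n))
    return matched_a, matched_b
```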

Measuring surprisal

The surprisal (S) of a word $w_i$ can be measured by the negative logarithm of its probability given its preceding context $w_1, \ldots, w_{i-1}$, as shown in the formula

$$S(w_i) = -\log P(w_i \mid w_1, \ldots, w_{i-1}).$$

To operationalize the measurement of surprisal, we used PsychFormers,14 a tool developed by one of the authors, JM, which calculates the surprisal of a word or sequence of words using a given transformer-based language model. Transformer language models have a neural network LM architecture20 that has been found to outperform recurrent neural networks,21 the previous state-of-the-art architecture, at the standard language modeling task (predicting words from context; see 22 for review), as well as a range of other tasks.23,24 In the present study, we calculate word surprisal using GPT-2,23 a high-quality transformer language model trained on 40 GB of text data to predict words based on their preceding linguistic context. Specifically, we use the 117 million-parameter GPT-2 model made available through the transformers Python library25 to calculate the total surprisal of a sentence, which was then divided by the number of tokens in the sentence to get a metric of surprisal normalized by sentence length. A lexical token is a sequence of characters split up by the language model’s tokenizer, which can be treated as the equivalent of a word in the language model’s vocabulary. In this work, the surprisal of a sentence denotes the normalized sentence surprisal, which is the average surprisal of the tokens in a sentence. For example, sentence (1), “I forget the name of it,” includes high-frequency words and high-frequency syntax, with an average surprisal of 4.9. Sentence (2), “There is a boombox playing music,” contains lower-frequency words and lower-frequency syntax than sentence (1), with an average surprisal of 6.1. Sentence (3), “That is my particular unique problem,” has a similar average word frequency to sentence (2) but a lower average syntax frequency because of the juxtaposition of two adjectives before a noun; its average surprisal is 7.5. These example sentences had the same number of words.
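For readers who want to compute comparable numbers, the following is a minimal sketch of normalized sentence surprisal under the 117 million-parameter GPT-2 using the transformers library. It illustrates the metric rather than reproducing PsychFormers: choices such as skipping the context-free first token and reporting surprisal in bits (base-2 logarithm) are assumptions of this sketch.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # 117M-parameter GPT-2
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_surprisal(sentence: str) -> float:
    """Average per-token surprisal (in bits) of a sentence under GPT-2."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits                   # shape: (1, n_tokens, vocab_size)
    # Position i of the logits predicts token i + 1, so align logits[:-1]
    # with tokens ids[1:]; the first token has no preceding context here.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_log_probs = log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    surprisal_bits = -token_log_probs / math.log(2)  # convert nats to bits
    return surprisal_bits.mean().item()

print(sentence_surprisal("There is a boombox playing music"))
```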

Language analysis

Word frequency, syntax frequency, noun frequency, verb frequency, function-to-all-word ratio, verb-to-noun ratio, and heavy-to-all-verb ratio were automatically extracted using Quantitext, a language toolbox developed in the MGH FTD Unit with the goal of increasing precision while reducing human labor.26 To determine the part of speech of words, the toolbox uses the automated Stanza Lexicalized Parser.27 Nouns, verbs (except be, have, and do), adjectives, and adverbs were considered content words. All other words were classified as function words. We measured the function-to-all-word ratio by dividing the number of function words by the number of all words in a sentence. The verb-to-noun ratio was calculated by dividing the number of verbs by the sum of nouns and verbs in each sentence. The following verbs were classified as light verbs: ‘be’, ‘go’, ‘take’, ‘come’, ‘make’, ‘get’, ‘give’, and ‘have’; auxiliary uses of these verbs were excluded from this classification.6 All other verbs were classified as heavy verbs. Light verbs are relatively nonspecific semantically and often need to be accompanied by a noun or adjective to deliver a more specific meaning, such as “taking a nap”. In contrast, heavy verbs, such as “napping,” carry a specific meaning.28 The heavy-to-all-verb ratio was measured by dividing the number of heavy verbs by the total number of verbs in a sentence.
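Quantitext itself is not shown here; the following is a minimal sketch of how such sentence-level ratios can be approximated with Stanza's universal part-of-speech tags. The tag-to-category mapping (e.g., counting proper nouns as nouns and relying on the AUX tag to exclude auxiliaries) is our assumption, not necessarily the toolbox's exact logic.

```python
import stanza

# stanza.download("en")  # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma")

LIGHT_VERBS = {"be", "go", "take", "come", "make", "get", "give", "have"}

def word_level_features(sentence: str) -> dict:
    """Function-to-all-word, verb-to-noun, and heavy-to-all-verb ratios
    for a single sentence (the input is assumed to be one sentence)."""
    words = nlp(sentence).sentences[0].words
    # Stanza tags auxiliaries as AUX, so UPOS "VERB" approximates main verbs.
    verbs = [w for w in words if w.upos == "VERB"]
    nouns = [w for w in words if w.upos in {"NOUN", "PROPN"}]
    content = [
        w for w in words
        if w.upos in {"NOUN", "PROPN", "ADJ", "ADV"}
        or (w.upos == "VERB" and w.lemma not in {"be", "have", "do"})
    ]
    heavy = [v for v in verbs if v.lemma not in LIGHT_VERBS]
    n_words = len(words)
    return {
        "function_to_all": (n_words - len(content)) / n_words if n_words else None,
        "verb_to_noun": len(verbs) / (len(verbs) + len(nouns)) if verbs or nouns else None,
        "heavy_to_all_verb": len(heavy) / len(verbs) if verbs else None,
    }

print(word_level_features("The man is flying a kite"))
```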

To measure word frequency, we used the Switchboard corpus,29 which consists of spontaneous telephone conversations averaging 6 minutes in length spoken by over 500 speakers of both sexes from a variety of dialects of American English. We use this corpus to estimate word frequency in spoken English, independently of the patient and control sample. The corpus contains 2,345,269 words. The word frequency of each sentence is calculated by taking the average log frequency of content words within that sentence with reference to the Switchboard counts.
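As a sketch of this metric (the exact preprocessing, log base, and handling of unattested words are not specified in the text, so those choices below are assumptions):

```python
import math
from collections import Counter

def average_log_word_frequency(content_words, corpus_counts: Counter) -> float:
    """Average natural-log frequency of a sentence's content words, using
    token counts from a reference corpus such as Switchboard. Words not
    attested in the corpus are skipped in this sketch."""
    logs = [math.log(corpus_counts[w]) for w in content_words if corpus_counts[w] > 0]
    return sum(logs) / len(logs) if logs else float("nan")

# Example with toy counts (illustrative only, not real Switchboard counts):
toy_counts = Counter({"name": 1200, "forget": 300, "boombox": 4, "music": 900})
print(average_log_word_frequency(["boombox", "music"], toy_counts))
```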

To measure syntax frequency,10 we first parsed the sentences of the Switchboard corpus using Stanza to extract headed syntactic rules. A headed syntactic rule is determined by the head and all its dependents in a dependency parse, whether they occur on the left or right. Applying this method to Switchboard yielded 954,616 rules; we then measured the syntax frequency of each participant sentence by calculating the average log frequency of the syntactic rules of that sentence based on the Switchboard counts.
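The sketch below illustrates one way to extract headed rules from a Stanza dependency parse and score a sentence against corpus counts. The exact rule representation used by the authors (e.g., whether dependency relations, part-of-speech tags, or both define a rule) is our assumption.

```python
import math
from collections import Counter
import stanza

nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

def headed_rules(sentence: str):
    """One rule per head word: the head's UPOS plus the ordered dependency
    relations of its left and right dependents."""
    rules = []
    for sent in nlp(sentence).sentences:
        for head in sent.words:
            deps = [w for w in sent.words if w.head == head.id]
            if not deps:
                continue
            left = tuple(w.deprel for w in deps if w.id < head.id)
            right = tuple(w.deprel for w in deps if w.id > head.id)
            rules.append((head.upos, left, right))
    return rules

def average_log_syntax_frequency(sentence: str, rule_counts: Counter) -> float:
    """Average log frequency of a sentence's headed rules against counts
    extracted from a reference corpus (rules unseen in the corpus are skipped)."""
    logs = [math.log(rule_counts[r]) for r in headed_rules(sentence) if rule_counts[r] > 0]
    return sum(logs) / len(logs) if logs else float("nan")
```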

Statistical analyses

For the statistical analyses of this study, we used the R software version 4.1.2. To estimate the smooth but potentially nonlinear relationship between the average surprisal, syntax frequency, and word frequency of a sentence, we used generalized additive models (GAM). A GAM is a generalized linear model in which the mean of the outcome is a sum of unknown smooth univariate functions of continuous predictors.30 The effective degrees of freedom (EDF) estimated by the GAM indicate the degree of curvature of the relationship: an EDF of 1 corresponds to a linear relationship, and values larger than one denote a more complex relationship between the predictor and outcome variables. We used the “gam” function in the “mgcv” package in R to fit the model.31 The model parameters were estimated via the restricted maximum likelihood (REML) method.32 To evaluate the relationship between sentence length, word frequency, and syntax frequency, we combined the language data from spoken and written modalities in healthy controls. Because the sparsity of the syntax frequency and word frequency distributions at the two tails made the GAM prone to noise, we included 99% of the sentences by removing the sparse data at the two tails of both the word frequency and syntax frequency distributions. To compare language features at the sentence level across different groups, we used mixed-effects models with subject-specific random intercepts via the lme4 package in R.33 To evaluate the relationship between surprisal and other lexical properties of a sentence, we used repeated measures correlation to analyze the common intra-individual association for paired repeated measures. Repeated measures correlation (rmcorr) accounts for intra-participant dependence among repeated observations using analysis of covariance (ANCOVA). By removing measured inter-participant variability, rmcorr provides the best linear fit of the intra-participant relationship between a pair of variables, assuming a common slope but varying intercepts across subjects. We used the rmcorr package in R to perform the analysis.34 Bonferroni correction was applied to adjust for multiple comparisons, with alpha set at 0.01.
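The models above were fitted in R (mgcv for the GAMs, lme4 for the mixed-effects models, and rmcorr for the repeated measures correlations). As a rough Python analogue of the random-intercept group comparisons, and not a reproduction of the original R pipeline, one could fit a model such as the following with statsmodels; the file and column names are hypothetical, and the GAM and rmcorr analyses are not reproduced here.

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per sentence with columns: subject, group ("nfvPPA" or "control"),
# length (words per sentence), and surprisal. Column names are illustrative.
df = pd.read_csv("sentence_level_features.csv")  # hypothetical file

# Group comparison with sentence length as a covariate and a
# subject-specific random intercept (analogous to the lme4 models).
model = smf.mixedlm("surprisal ~ group + length", data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```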

Results

1. Testing whether surprisal is a composite metric consisting of lexical and syntactic information

We first evaluated how the surprisal of a sentence is related to the frequency of its content words and syntactic rules in the language samples of healthy individuals, including both written and spoken modalities. We fitted a GAM to the dataset to account for a potential non-linearity of the relationship between the variables of interest. A subject-specific random intercept, which captured the intra-participant correlation of the repeated measures, was included in the model, and the model was used to predict the average surprisal of a sentence from word frequency and syntax frequency.

The surprisal of a sentence could be predicted from both word frequency (EDF = 1, p < 0.001) and syntax frequency (EDF = 2.6, p < 0.001) in the sentence. Figure 2 shows partial effect plots for each smooth term in the GAM that contributes to the overall prediction. The value of the effective degrees of freedom (EDF) for the word frequency term indicates a linear relationship between the surprisal of a sentence and the average frequency of the content words of that sentence. The relationship between the surprisal of a sentence and average syntax frequency is nonlinear, as shown by the pattern in Figure 2.

Figure 2.

Sentence surprisal is predicted by word frequency as well as syntax frequency within each sentence. Partial effect plots for each smooth term–word frequency and syntax frequency–in the GAM illustrate each component of the model, predicting sentence surprisal. Shaded zones show 95% confidence intervals around the mean of the effect. “Adjusted sentence surprisal” represents the sentence surprisal, adjusted for the other frequency variable (i.e., in the plot showing that word frequency is inversely correlated with adjusted sentence surprisal, sentence surprisal is adjusted for syntax frequency).

2. Comparing word frequency, syntax frequency, and surprisal in the spoken and written samples of nfvPPA patients and healthy controls

2.1. Comparing the spoken language of nfvPPA patients and healthy controls

Fitting a mixed-effects model for sentence length with a fixed effect of subject groups (nfvPPA vs. control) and random intercepts for subjects, we found that patients with nfvPPA produce shorter spoken sentences (mean = 5.57, SD = 3.31) than healthy individuals (mean = 8.96, SD = 4.49) (β = −3.228, SE = 0.487, t = −6.625, p < 0.001). In light of the statistically significant difference in the distribution of sentence length in the two subject groups and the potential nonlinear effects of sentence length on other metrics, we performed the analyses on subsamples of language data from each group with matched distributions of the sentence length. For all the comparisons below, a mixed-effects model was used to predict the variable of interest with group and sentence length as predictors and random intercepts for subjects. The distributions for word frequency, syntax frequency, and surprisal of the two groups are shown in (Fig 3A).

Figure 3.

The density graphs of word frequency, syntax frequency, and surprisal at the sentence level to compare spoken and written modalities in nfvPPA patients and healthy controls. * denotes p < 0.05, ** p < 0.01, and NS non-significance.

Word frequency. Compared to word frequency in the spoken language of healthy individuals (mean = 5.58, SD = 1.54), patients with nfvPPA produced sentences with lower content word frequency (mean = 5.21, SD = 1.70) (β = −0.346, SE = 0.149, t = −2.318, p = 0.023). Syntax frequency. Compared to syntax frequency in the spoken language of healthy individuals (mean = 7.63, SD = 1.82), patients with nfvPPA produced syntactic rules of higher frequency (mean = 8.17, SD = 1.74) (β = 0.534, SE = 0.171, t = 3.117, p = 0.003). Surprisal. We found no statistical difference between the surprisal of sentences in the spoken samples of healthy controls (mean = 7.59, SD = 1.92) and patients with nfvPPA (mean = 7.81, SD = 1.76) (β = 0.146, SE = 0.209, t = 0.70, p = 0.487). These results suggest that by using richer words in their grammatically simpler spoken sentences, nfvPPA patients keep sentence surprisal the same as healthy controls.

2.2. Comparing the spoken and written language of healthy controls

To compare the sentence length of the spoken and written language of healthy individuals as they describe the same picture, we fitted a mixed-effects model to predict sentence length from the modality of language production. The sentence length of spoken (mean = 9.31, SD = 4.74) and written (mean = 8.65, SD = 4.02) language did not differ statistically in healthy individuals (β = −0.515, SE = 0.448, t = −1.149, p = 0.253). For all the comparisons below, a mixed-effects model was used to detect the difference in variables of interest between the two modalities of language production, adjusting for the effect of sentence length. A random intercept was added to capture intra-participant correlations of repeated measures. The distributions for word frequency, syntax frequency, and surprisal of the two groups are shown in (Fig 3B).

Word frequency. Compared to spoken language (mean = 5.80, SD = 1.27), the written language of healthy individuals contained sentences with lower content word frequency (mean = 5.41, SD = 1.12) (β = −0.458, SE = 0.135, t = −3.389, p = 0.001). Syntax frequency. Compared to spoken language (mean = 7.62, SD = 1.55), the written language of healthy individuals contained syntactic structures with a higher frequency of occurrence (mean = 8.12, SD = 1.24) (β = 0.625, SE = 0.190, t = 3.292, p = 0.002). Surprisal. The sentence surprisal of the spoken (mean = 6.49, SD = 1.18) and written (mean = 6.87, SD = 1.48) modalities did not differ statistically in healthy individuals (β = 0.188, SE = 0.151, t = 1.244, p = 0.223). In healthy individuals, written sentences contain richer words but simpler syntax, resulting in the same surprisal when compared with spoken language.

2.3. Comparing the spoken and written samples of patients with nfvPPA

We repeated the analyses in section 2.2 for patients with nfvPPA and also found no statistical difference between the sentence length of spoken (mean = 5.57, SD = 3.31) and written (mean = 5.54, SD = 2.84) samples (β = 0.232, SE = 0.250, t = 0.926, p = 0.355). The distributions for word frequency, syntax frequency, and surprisal of the two groups are shown in (Fig 3C).

Word frequency. We found no significant difference in content word frequency between spoken (mean = 5.16, SD = 1.63) and written (mean = 5.03, SD = 1.28) language samples of nfvPPA patients (β = −0.145, SE = 0.141, t = −1.025, p = 0.306). Syntax frequency. There was no significant difference in syntax frequency between spoken (mean = 8.16, SD = 1.72) and written (mean = 8.40, SD = 1.48) language samples in nfvPPA (β = 0.213, SE = 0.155, t = 1.372, p = 0.171). Surprisal. Similarly, we found no significant difference in the sentence surprisal of spoken (mean = 7.90, SD = 1.76) and written (mean = 8.07, SD = 2.15) language in nfvPPA (β = 0.176, SE = 0.141, t = 1.245, p = 0.214). Our findings suggest that the spoken and written language samples of patients with nfvPPA show no differences with respect to word frequency, syntax frequency, and surprisal.

2.4. Comparing the written language of nfvPPA patients and healthy controls

Fitting a mixed-effects model for sentence length with a fixed effect of subject groups (nfvPPA vs. control) and random intercepts for subjects, we found that patients with nfvPPA produce shorter written sentences (mean = 5.54, SD = 2.84) than healthy individuals (mean = 8.64, SD = 4.00) (β = −2.490, SE = 0.553, t = −4.499, p < 0.001). Similar to section 2.1, we performed the following analyses on subsamples of language data from each group with equal distributions of sentence length. The distributions for word frequency, syntax frequency, and surprisal of the two groups are shown in (Fig 3D).

Word frequency. We found no statistical difference in the average word frequency of written sentences between healthy individuals (mean = 5.41, SD = 1.12) and patients with nfvPPA (mean = 5.03, SD = 1.28) (β = −0.257, SE = 0.162, t = −1.592, p = 0.117). Syntax frequency. Similarly, there was no statistical difference in the average syntax frequency of written sentences between healthy individuals (mean = 8.12, SD = 1.24) and patients with nfvPPA (mean = 8.40, SD = 1.48) (β = 0.275, SE = 0.208, t = 1.319, p = 0.19). Surprisal. We found no statistical difference between the surprisal of sentences in the written samples of healthy controls (mean = 6.87, SD = 1.48) and patients with nfvPPA (mean = 8.07, SD = 2.15) (β = 0.097, SE = 0.253, t = 0.383, p = 0.70). In sum, the written language samples of patients with nfvPPA and healthy controls showed no differences with respect to word frequency, syntax frequency, or surprisal.

3. Testing the relationship between surprisal and the word-level features of nonfluency

Here, we determine the relationship between the word-level features and surprisal in the language of patients with nfvPPA at the sentence level after combining spoken and written sentences. We used repeated measures correlation37 for this analysis, as each participant produced more than one sentence. Figure 4 shows the radar chart of the coefficients of the repeated measures correlations (rrm). In the language of patients with nfvPPA, surprisal correlated significantly with the function-to-content word ratio (rrm = −0.46, p < 0.001), verb-to-noun ratio (rrm = −0.12, p = 0.007), light-to-all-verb ratio (rrm = −0.15, p = 0.006), verb frequency (rrm = −0.24, p < 0.001), and noun frequency (rrm = −0.23, p < 0.001). In sum, using fewer function words, fewer verbs, and more heavy verbs is correlated with higher sentence information.

Figure 4.

The radar chart shows the absolute value of the coefficient of the repeated measures correlation (rrm) between surprisal and various word-level features at the sentence level in the written and spoken samples of nfvPPA patients.

Discussion

The multiple symptoms that arise following a brain lesion often pose a challenge in determining the order of causality. Mere co-occurrence could lead to the assumption that all symptoms are deficits directly caused by the lesion. However, given the brain’s tendency to restore lost functions, even partially, some symptoms arise as an adaptive response to the core deficit. Although challenging, it is crucial to differentiate adaptive from defective symptoms to avoid targeting the wrong symptoms in rehabilitative approaches and to better understand the foundation of the disease mechanism. This challenge is especially relevant for the gamut of nonfluent symptoms that arise following a lesion to the dominant hemisphere’s fronto-insular regions. Both structural and word-level abnormalities resulting from such damage are commonly viewed as defects caused by the lesion. Here, we employed advances in computational linguistics within an information-theoretic framework to test an alternative hypothesis that the word-level symptoms of nonfluency are, in fact, an adaptive response to the poor formation of sentence structure. We measured the information content of sentences based on surprisal and found that patients with nfvPPA can produce sentences with the same surprisal as healthy controls. To achieve the same level of surprisal, the syntactically impoverished sentences of nonfluent patients get packaged with more informative words through lexical condensation. Surprisal, as calculated by LMs, correlates well with behavioral measures of processing difficulty, such as reading time35,36 and neural measures, such as the N400.37,38 For our study, surprisal turned out to be a particularly suitable metric as it factors in both lexical and syntactic information, allowing us to probe variations in surprisal as the collective representation of these two sources of sentence information. We also showed that higher sentence surprisal correlates with the canonical word-level features of nonfluency, namely, a lower function-to-all-word ratio, a lower verb-to-noun ratio, and a higher heavy-to-all-verb ratio. In addition to offering an alternative explanatory framework about nonfluent aphasia, our findings indicate the need for revisions in therapeutic approaches to the disease. Current treatment modalities often aim to promote the use of function words and verbs (e.g., see50), an approach that might result in an outcome opposite to the goal of increasing the informativity of language. Critically, our results suggest that the essence of treatment should be centered on empowering nonfluent patients to use more informative words when they have difficulty in making structurally complex sentences.

The evidence for lexical condensation in nonfluent aphasia revives a series of accounts based on compensation from the past century. According to the “economy of effort” proposal,39,40 the cost of articulation is so high that patients with nonfluent aphasia are forced to use only the “important” words. Later work on adaptation theory deemed sentence processing in nonfluent aphasia so slow that the sentence elements would disappear from memory before syntactic operations are complete.41,42 This slowed process would lead nonfluent patients to limit the number of sentence elements that need to be retained in memory by using only the most essential words. Although theoretically plausible, these proposals lacked rigorous methods to test their claims in the pre-computational linguistics era.

In healthy language production, lexical and syntactic processing are intricately intertwined.15,49 The lexical identification and structural scaffolding must work in close coordination to ensure that the intended message gets across. A bottleneck in the flow of information in one domain, whether lexical or syntactic, would instigate efforts to recast the message through adjustments in the other domain. We posit that the interactivity between syntax and the lexicon enables nfvPPA patients to readily adapt to their difficulty in making complex structures. Over time, however, the adjustment to the use of richer words may gradually influence the ways in which these patients formulate thoughts.43

The dynamic balance between lexical and syntactic information in healthy individuals is also evident when speaking versus writing. In the present study, although the picture description task held the intended message constant, we found that written samples contained lower-frequency words than spoken samples. This finding is consistent with the established literature that lexical information is richer in writing, as measured through various metrics such as higher lexical diversity,44,45 more attributive adjectives,45 and higher lexical density (the number of content words over either the total number of words or clauses).46,47 We also showed that the lower-frequency content words in writing were embedded in higher-frequency syntactic structures. This finding suggests that the writing modality, which often operates under lower processing demands than speaking, enables access to lower-frequency words, but at the cost of using simpler structures.48 The net effect of this balance is the same amount of surprisal in written and spoken sentences.

Unlike healthy individuals, nfvPPA patients did not display flexibility in changing the share of lexical and syntactic information across speaking and writing. We found no difference in word frequency, syntax frequency, and surprisal between the two modalities in nfvPPA patients. This rigidity is likely due to the fixed deficit in using syntactically complex sentences that limits the choice of syntactic rules to only simple structures, likely placing language production in nfvPPA patients at the maximum capacity for using low-frequency words.49 Lastly, the comparison between speaking and writing in nfvPPA makes it possible to probe further the functional locus of the bottleneck in language production. Our finding of the same lexical and syntactic properties in the written samples rules out the possibility that the articulatory stage is the limiting factor. Future studies should investigate the functional locus of the bottleneck at an earlier stage of production that is common to both modalities and explore the stages of high-level motor planning, retrieving and assembling sentence elements, or message conceptualization as potential candidates.

Summary for Social Media.

1. If you and/or a co-author has a Twitter handle that you would like to be tagged, please enter it here. (format: @AUTHORSHANDLE).

@NeguineR

@DickersonLabMGH

@SylJosephy

@jamichaelov

2. What is the current knowledge on the topic? (one to two sentences)

Patients with nonfluent aphasia have difficulty producing complex syntactic structures. They also have deficits at the word level, such as reduced use of verbs and function words. Traditionally, both kinds of deficits have been attributed to an impairment in the processing of syntax.

3. What question did this study address? (one to two sentences)

Here, we aim to test whether the word-level features of nonfluent aphasia stem from a compensatory response to patients’ difficulty in generating complex syntax. The goal of this compensation response, which we call lexical condensation, is to optimize the information content of sentences.

4. What does this study add to our knowledge? (one to two sentences)

We adopt a computational approach to evaluate the information content of sentences produced by patients with nonfluent aphasia through surprisal. Contrary to the common belief, we show that the word-level features of nonfluency are not deficits but rather a compensation strategy to sustain sentence information.

5. How might this potentially impact on the practice of neurology? (one to two sentences)

This research offers a foundational shift in how we interpret symptoms of nonfluent aphasia. This work also indicates a need for revisions in the treatment of nonfluent aphasia to place the focus on promoting the use of informative words.

If your paper is accepted, our Social Media Editor may decide to Tweet about it or otherwise promote your work, and we encourage you to do the same. You may also submit a draft Tweet of no more than 180 characters that conveys the essential message of your paper, which the Social Media Editor will consider and possibly post to @ANA_Journals on Twitter.

Patients with nonfluent aphasia have difficulty using complex syntactic structures. They thus choose semantically richer words in their structurally simple sentences to sustain sentence information.

Acknowledgments

This research received support from NIH grants R01 DC014296, R21 DC019567, and R21 AG073744, and from the Tommy Rickles Chair in Primary Progressive Aphasia Research. We thank Jordan Green and Claire Cordella for providing healthy control data from the Speech and Feeding Disorders Laboratory at the MGH Institute of Health Professions. We thank Arash Afraz for his comments on this work, including text and figures.

Footnotes

Potential Conflict of Interest

The authors declare no competing financial or non-financial interests.

Data availability

The code for measuring surprisal is available at https://github.com/jmichaelov/PsychFormers. Anonymized data not published within this article will be made available by request from any qualified investigator.

References

1. Saffran EM, Berndt RS, Schwartz MF. The quantitative analysis of agrammatic production: Procedure and data. Brain Lang. 1989;37(3):440–479. doi: 10.1016/0093-934X(89)90030-8
2. Goodglass H. Agrammatism in aphasiology. Clin Neurosci N Y N. 1997;4(2):51–56.
3. Bradley DC, Garrett MF, Zurif EB. Syntactic deficits in Broca’s aphasia. In: Caplan D, ed. Biological Studies of Mental Processes. MIT Press; 1980:269–286.
4. Miceli G, Silveri MC, Villa G, Caramazza A. On the basis for the agrammatic’s difficulty in producing main verbs. Cortex. 1984;20(2):207–220. doi: 10.1016/s0010-9452(84)80038-6
5. Bencini G, Ronald D. Verb access difficulties in agrammatic aphasic narratives. Paper presented at the 70th Annual Meeting of the Linguistic Society of America; 1996; San Diego, CA.
6. Breedin SD, Saffran EM, Schwartz MF. Semantic factors in verb retrieval: an effect of complexity. Brain Lang. 1998;63(1):1–31. doi: 10.1006/brln.1997.1923
7. Hillis AE, Heidler-Gary J, Newhart M, Chang S, Ken L, Bak TH. Naming and comprehension in primary progressive aphasia: The influence of grammatical word class. Aphasiology. 2006;20(2-4):246–256. doi: 10.1080/02687030500473262
8. Kim M, Thompson CK. Patterns of Comprehension and Production of Nouns and Verbs in Agrammatism: Implications for Lexical Organization. Brain Lang. 2000;74(1):1–25. doi: 10.1006/brln.2000.2315
9. Rezaii N, Ren B, Quimby M, Hochberg D, Dickerson BC. Less is more in language production: an information-theoretic analysis of agrammatism in primary progressive aphasia. Brain Commun. Published online April 25, 2023:fcad136. doi: 10.1093/braincomms/fcad136
10. Rezaii N, Mahowald K, Ryskin R, Dickerson B, Gibson E. A syntax–lexicon trade-off in language production. Proc Natl Acad Sci. 2022;119(25):e2120203119. doi: 10.1073/pnas.2120203119
11. Hale J. A probabilistic Earley parser as a psycholinguistic model. In: Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies (NAACL ’01). Association for Computational Linguistics; 2001:1–8. doi: 10.3115/1073336.1073357
12. Smith NJ, Levy R. The effect of word predictability on reading time is logarithmic. Cognition. 2013;128(3):302–319. doi: 10.1016/j.cognition.2013.02.013
13. Divjak D. Frequency in Language: Memory, Attention and Learning. Cambridge University Press; 2019. doi: 10.1017/9781316084410
14. Michaelov JA, Bergen BK. Do language models make human-like predictions about the coreferents of Italian anaphoric zero pronouns? Published online October 3, 2022. doi: 10.48550/arXiv.2208.14554
15. Jurafsky D, Martin JH. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd edition draft.
16. Gorno-Tempini ML, Hillis AE, Weintraub S, et al. Classification of primary progressive aphasia and its variants. Neurology. 2011;76(11):1006–1014. doi: 10.1212/WNL.0b013e31821103e6
17. Sapolsky D, Domoto-Reilly K, Dickerson BC. Use of the Progressive Aphasia Severity Scale (PASS) in monitoring speech and language status in PPA. Aphasiology. 2014;28(8-9):993–1003. doi: 10.1080/02687038.2014.931563
18. Tomaszewski Farias S, Mungas D, Harvey DJ, Simmons A, Reed BR, Decarli C. The measurement of everyday cognition: development and validation of a short form of the Everyday Cognition scales. Alzheimers Dement J Alzheimers Assoc. 2011;7(6):593–601. doi: 10.1016/j.jalz.2011.02.007
19. Kertesz A, Raven JC, PsychCorp (Firm). WAB-R: Western Aphasia Battery–Revised. PsychCorp; 2007.
20. Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need. Published online December 5, 2017. doi: 10.48550/arXiv.1706.03762
21. Elman JL. Finding structure in time. Cogn Sci. 1990;14(2):179–211. doi: 10.1016/0364-0213(90)90002-E
22. Dai Z, Yang Z, Yang Y, Carbonell J, Le QV, Salakhutdinov R. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Published online June 2, 2019. doi: 10.48550/arXiv.1901.02860
23. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language Models are Unsupervised Multitask Learners. OpenAI; 2019.
24. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics; 2019:4171–4186. doi: 10.18653/v1/N19-1423
25. Wolf T, Debut L, Sanh V, et al. Transformers: State-of-the-Art Natural Language Processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics; 2020:38–45. doi: 10.18653/v1/2020.emnlp-demos.6
26. Rezaii N, Wolff P, Price BH. Natural language processing in psychiatry: the promises and perils of a transformative approach. Br J Psychiatry. 2022;220(5):251–253. doi: 10.1192/bjp.2021.188
27. Qi P, Zhang Y, Zhang Y, Bolton J, Manning CD. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics; 2020:101–108. doi: 10.18653/v1/2020.acl-demos.14
28. Gordon JK, Dell GS. Learning to divide the labor: an account of deficits in light and heavy verb production. Cogn Sci. 2003;27(1):1–40. doi: 10.1207/s15516709cog2701_1
29. Godfrey JJ, Holliman EC, McDaniel J. SWITCHBOARD: telephone speech corpus for research and development. In: Proceedings of the 1992 IEEE International Conference on Acoustics, Speech and Signal Processing - Volume 1 (ICASSP’92). IEEE Computer Society; 1992:517–520.
30. Hastie TJ, Tibshirani RJ. Generalized Additive Models. CRC Press; 1990.
31. Wood SN. Mixed GAM Computation Vehicle with Automatic Smoothness Estimation. Published online 2012.
32. Corbeil RR, Searle SR. Restricted Maximum Likelihood (REML) Estimation of Variance Components in the Mixed Model. Technometrics. 1976;18(1):31–38. doi: 10.1080/00401706.1976.10489397
33. Bates D, Mächler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. J Stat Softw. 2015;67:1–48. doi: 10.18637/jss.v067.i01
34. Bakdash JZ, Marusich LR. Repeated Measures Correlation. Front Psychol. 2017;8:456. doi: 10.3389/fpsyg.2017.00456
35. Levy R. Expectation-based syntactic comprehension. Cognition. 2008;106(3):1126–1177. doi: 10.1016/j.cognition.2007.05.006
36. Boston MF, Hale J, Kliegl R, Patil U, Vasishth S. Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus. J Eye Mov Res. 2008;2(1). doi: 10.16910/jemr.2.1.1
37. Frank SL, Otten LJ, Galli G, Vigliocco G. The ERP response to the amount of information conveyed by words in sentences. Brain Lang. 2015;140:1–11. doi: 10.1016/j.bandl.2014.10.006
38. Michaelov JA, Coulson S, Bergen BK. So Cloze yet so Far: N400 Amplitude is Better Predicted by Distributional Information than Human Predictability Judgements. IEEE Trans Cogn Dev Syst. Published online 2022. doi: 10.1109/TCDS.2022.3176783
39. Isserlin M. Über Agrammatismus. Zeitschrift für Neurologie und Psychiatrie. 1922;75:332–416.
40. Pick A. Die Agrammatischen Sprachstörungen: Studien zur Psychologischen Grundlegung der Aphasielehre. Springer-Verlag; 1913.
41. Kolk H. A Theory of Grammatical Impairment in Aphasia. In: Kempen G, ed. Natural Language Generation: New Results in Artificial Intelligence, Psychology and Linguistics. NATO ASI Series. Springer Netherlands; 1987:377–391. doi: 10.1007/978-94-009-3645-4_24
42. Kolk HH, Heeschen C. Adaptation symptoms and impairment symptoms in Broca’s aphasia. Aphasiology. 1990;4(3):221–231.
43. Eling P. On the function of language. In: Reader in the History of Aphasia: From Franz Gall to Norman Geschwind. John Benjamins Publishing; 1994:242–261.
44. Gibson JW, Gruner CR, Kibler RJ, Kelly FJ. A quantitative examination of differences and similarities in written and spoken messages. Speech Monogr. 1966;33(4):444–451. doi: 10.1080/03637756609375510
45. Drieman GHJ. Differences between written and spoken language: An exploratory study. Published online 1962. doi: 10.1016/0001-6918(62)90006-9
46. Biber D. Variation across Speech and Writing. Cambridge University Press; 1988. doi: 10.1017/CBO9780511621024
47. Castello E. Text Complexity and Reading Comprehension Tests. Peter Lang; 2008.
48. Rezaii N. The syntax-lexicon trade-off in writing. Published online June 24, 2022. doi: 10.48550/arXiv.2206.12485
49. Josephy-Hernandez S, Rezaii N, Jones A, et al. Automated analysis of written language in the three variants of primary progressive aphasia. Published online July 25, 2022:2022.07.24.22277977. doi: 10.1101/2022.07.24.22277977
50. Silagi ML, Ferreira OP, de Almeida IJ, et al. Treatment of agrammatism in oral and written production in patients with Broca’s aphasia: the use of implicit and explicit learning. Dement Neuropsychol. 2020;14(2):103–109. doi: 10.1590/1980-57642020dn14-020002
