Skip to main content

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

bioRxiv logoLink to bioRxiv
[Preprint]. 2026 Apr 8:2026.04.07.717076. [Version 1] doi: 10.64898/2026.04.07.717076

Decoding concept representations in aphasia after stroke

Jerry Tang 1,*, Carly Millanski 1, Allison Chen 1, Lisa D Wauters 1, Jordyn Anders 1, Shilpa Shamapant 2,3, Stephen M Wilson 4, Alexander G Huth 5,6,†,*, Maya L Henry 1,2,†,*
PMCID: PMC13081921  PMID: 41993327

Abstract

Many stroke survivors with aphasia struggle to map their thoughts into words or motor plans. Neuroprostheses that decode concept representations could help these individuals communicate by predicting the words, phrases, or sentences that they are struggling to produce. Here we decoded concept representations measured using functional magnetic resonance imaging (fMRI) from participants with different aphasia profiles. The decoders generated continuous word sequences that could describe the concepts that the participants were hearing about, seeing, or imagining. To forecast how this approach would generalize across the heterogeneity of aphasia profiles, we characterized how stroke affects the anatomical organization and information capacity of conceptual processing. Mapping how concepts are organized across the brain, we found that conceptual tuning during non-linguistic processing was largely consistent between the participants with aphasia and neurologically healthy participants. Comparing information processing between the participants with aphasia and neurologically healthy participants, we found that both groups processed similar amounts of non-linguistic information. Our findings indicate that concept representations can be largely spared in individuals with aphasia and demonstrate how these representations can be decoded to support communication.

Introduction

Aphasia is a communication disorder caused by damage to the language regions of the brain. The most common cause of aphasia is stroke, and it is estimated that nearly a third of stroke survivors experience aphasia1. Strokes can affect multiple brain regions that serve different linguistic functions, so individuals with post-stroke aphasia can exhibit a variety of language production and comprehension impairments24. However, these individuals often perform well on non-linguistic tasks5,6 and report knowing what they want to say but not knowing how to say it7. This suggests that their conceptual knowledge is often spared even when their access to lexical or phonological knowledge is disrupted810. Consequently, neuroprostheses could help these individuals communicate by decoding spared concept representations into intended language output.

Concepts appear to be encoded in a widely distributed network of brain regions known as the semantic system11. To measure concept representations in neurologically healthy participants, we previously recorded brain activity using functional magnetic resonance imaging (fMRI) while each participant listened to many hours of narrative stories12. We then used voxelwise modeling to estimate how the activity in each brain region is related to semantic features of the stimulus words13. This produces an encoding model that can take any word sequence and predict how the participant’s brain would respond. To decode new brain responses, we generated word sequences to make the predicted responses under the encoding model as similar as possible to the actual responses14. This produces a natural language approximation of the concepts that the participant was thinking about.

While this semantic decoding approach can successfully approximate the meaning of perceived and imagined stimuli, it has previously only been demonstrated in neurologically healthy participants. Individuals with aphasia often have comorbidities that make it difficult for them to tolerate the many hours of scanning required to train decoders15. Moreover, these individuals often have language comprehension impairments that could make it difficult to train decoders on brain responses to linguistic stimuli3. To simultaneously address these barriers, we developed an approach for transferring semantic decoders across participants by aligning their brain responses using a small set of common stimuli16,17. Because concept representations are shared across cognitive processes, the common stimuli can be either linguistic or non-linguistic, making this transfer learning approach robust to language comprehension impairments1821.

Here we used these approaches to decode concept representations in participants with post-stroke aphasia. We trained encoding models on brain responses from neurologically healthy reference participants, transferred them to the participants with aphasia, and used the aligned models to decode new brain responses. We found that the decoder predictions could describe the concepts that the participants with aphasia were hearing about, seeing, or imagining. Because semantic decoding requires spared concept representations, we assessed the generality of this approach by characterizing how stroke affects conceptual processing. We found that the anatomical organization and information capacity of conceptual processing were largely spared even in participants with extensive lesions and severe speech and language and impairments. This demonstrates the potential for semantic decoding to support individuals with post-stroke aphasia regardless of their lesion profiles, speech and language impairments, or intended concepts.

Participants with aphasia

We enrolled three participants with post-stroke aphasia (Figure 1). P1 was a right-handed man who suffered a left intraparenchymal hemorrhagic stroke resulting from an arteriovenous malformation 8 years prior to participation in the study. P2 was a right-handed man who suffered a left middle cerebral artery infarction resulting from an embolism 2 years prior to participation. P3 was a right-handed woman who suffered bilateral middle cerebral artery infarctions resulting from bilateral pulmonary emboli 4 years prior to participation. We manually delineated lesions on T1-weighted images using ITK-SNAP. We characterized speech and language processing using the Quick Aphasia Battery22, auditory verbal comprehension using the Peabody Picture Vocabulary Test23, and conceptual processing using the Pyramids and Palm Trees Test24 (Table 1).

Figure 1. Participants with aphasia.

Figure 1.

Top: speech samples from P1 and P2 while they attempted to describe a picture of a boy pushing a girl on a swing. Middle: T1-weighted images from each participant. Bottom: cortical flatmaps with approximate lesion extent for each participant.

Table 1. Demographic information and behavioral assessment scores.

Behavioral assessment was performed using the Quick Aphasia Battery22, Peabody Picture Vocabulary Test23, and Pyramids and Palm Trees–Pictures Test24. P1 had moderate aphasia predominantly affecting word finding and sentence comprehension. P2 had moderate aphasia predominantly affecting grammatical construction and word finding. P3 had severe expressive-receptive aphasia and severe apraxia of speech. Some assessment instructions were provided in writing for P3.

Patient ID P1 P2 P3
Demographic
Age 55 47 46
Gender Male Male Female
Education (years) 17 13 17
Handedness Right Right Right
Speech and language processing
Quick Aphasia Battery: Auditory Word Comprehension (10) 10 10 0.21
Quick Aphasia Battery: Auditory Sentence Comprehension (10) 1.67 6.67 3.75
Quick Aphasia Battery: Written Word Comprehension (10) 9.58 10 10
Quick Aphasia Battery: Word Finding (10) 4.75 4.75 1.75
Quick Aphasia Battery: Grammatical Construction (10) 8.25 1.63 0.50
Quick Aphasia Battery: Writing (10) 6.88 1.88 9.38
Quick Aphasia Battery: Speech Motor Programming (10) 10 10 0
Quick Aphasia Battery: Repetition (10) 9.17 5.83 1.67
Quick Aphasia Battery: Reading Aloud (10) 5.42 5.00 4.17
Quick Aphasia Battery: Overall (10) 6.83 6.30 1.65
Peabody Picture Vocabulary Test (16) 13 14 9
Conceptual semantic processing
Pyramids and Palm Trees–Pictures (52) 45 45 46

Structural imaging indicated that the participants had different lesion profiles. P1 had an extensive left hemisphere lesion impacting lateral and ventral temporal cortex. P2 had an extensive left hemisphere lesion impacting lateral frontal cortex, inferior parietal cortex, lateral temporal cortex, and insular cortex. P3 had an extensive left hemisphere lesion impacting lateral frontal cortex, inferior parietal cortex, lateral temporal cortex, and insular cortex, as well as a less extensive right hemisphere lesion impacting lateral frontal cortex, lateral temporal cortex, and insular cortex.

Speech and language testing indicated that the participants had impairments to different aspects of language production. P1 predominantly struggled with word finding; his spoken output was fluent with frequent pauses and circumlocutions. P2 struggled with grammatical construction and word finding; his spoken output was largely limited to short phrases and written output was largely limited to single words. P3 struggled with speech motor programming and grammatical construction; her spoken output was largely unintelligible due to severe apraxia of speech and written output was largely limited to short phrases. The participants also had different degrees of language comprehension impairment. P1 and P2 could comprehend conversational speech but struggled with more grammatically complex sentences, while P3 struggled to comprehend any spoken language but demonstrated much greater sparing of written language.

Despite their different speech and language impairments, participants showed relative sparing of nonverbal semantic processing on the Pyramids and Palm Trees–Pictures Test, a picture association task (45/52 for P1, 45/52 for P2, 46/52 for P3). The scores were close to the cutoff for neurologically healthy performance (47/52), indicating that non-linguistic conceptual knowledge was relatively spared. All participants demonstrated understanding of task instructions and complied with the behavioral assessments and neuroimaging experiments.

Semantic decoding of perceived and imagined concepts

The goal of semantic decoding is to generate word sequences using only brain responses as input. To model the relationships between word sequences and brain responses, we trained encoding models on fMRI responses collected from four neurologically healthy reference participants while they listened to 10 h of narrative stories12,13,25,26 (Figure 2a). To transfer these encoding models to the participants with aphasia, we trained cross-participant converters to align fMRI responses between the reference participants and the participants with aphasia16,17,27. Converter training data collected from both groups comprised 3 h of narrative stories, as well as 1 h of silent movies to account for comprehension impairments. The participants with aphasia were given the option to listen to the stories with subtitles (Methods). To decode new brain responses from a participant with aphasia, we can generate word sequences using a language model, predict brain responses to the word sequences using the aligned encoding models, and identify the word sequences that are most consistent with the recorded brain responses14 (Figure 2b).

Figure 2. Semantic decoding of perceived and imagined concepts.

Figure 2.

(a) Encoding models were trained to predict brain responses collected from four neurologically healthy reference participants while they listened to 10 h of narrative stories. The encoding models were transferred to three participants with aphasia by aligning brain responses to 3 h of narrative stories and 1 h of silent movies. (b) To decode new brain responses from the participants with aphasia, a language model was used to generate word sequences and the aligned encoding models were used to identify the word sequences that were most consistent with the brain responses. (c) Decoder predictions from brain responses to test stories were compared to transcripts of the stories. (d) Story decoding performance for all participants with aphasia was significantly higher than expected by chance (q(FDR) < 0.05; one-sided non-parametric test) but lower than that of neurologically healthy controls. (e) Decoder predictions from brain responses to silent movies were compared to verbal descriptions of the movies. (f) Movie decoding performance for all participants with aphasia was significantly higher than expected by chance (q(FDR) < 0.05; one-sided non-parametric test) and approached that of neurologically healthy controls. (g) Decoder predictions from brain responses during mental imagery were compared to the prompted topics. (h) The decoder predictions could be used to identify the prompted topics with 67% accuracy (10/12 trials for P1; 8/12 for P2; 6/12 for P3; chance level 4/12).

To evaluate the semantic decoders, we first decoded single-trial brain responses collected while the participants listened to novel test stories. We found that the decoder predictions could successfully approximate the gist of the story events for P1 and P2 (Figure 2c; see Supplementary Table 1 for all participants). We quantified decoding performance by comparing the predictions to the story words using BERTScore, which is a metric designed to measure the similarity of meaning between two word sequences28. Story decoding performance for all participants with aphasia was significantly higher than expected by chance (q(FDR) < 0.05; one-sided non-parametric test) but lower than that of neurologically healthy controls (Figure 2d). In particular, P3 had the lowest decoding performance consistent with her severe language comprehension impairment, which may have prevented her from accessing the conceptual meaning of the stories.

To determine whether the lower story decoding performance reflects conceptual processing impairments or language comprehension impairments, we next decoded single-trial brain responses collected while the participants watched novel test movies that were presented without sound. Our previous work showed that word sequences describing silent movie stimuli can be decoded from neurologically healthy participants using semantic decoders trained only on linguistic stimuli14. Here, we found that the decoder predictions could successfully approximate the gist of the movie events for all participants with aphasia (Figure 2e; see Supplementary Table 2 for all participants). We quantified decoding performance by comparing the predictions to official verbal descriptions of the movies. Movie decoding performance for all participants with aphasia was significantly higher than expected by chance (q(FDR) < 0.05; one-sided non-parametric test) and approached that of neurologically healthy controls, indicating that non-linguistic conceptual processing was relatively spared (Figure 2f).

Language neuroprostheses should be able to decode internally generated concepts, so we decoded single-trial brain responses collected while the participants silently imagined three topics in the MRI scanner. We prompted the participants to imagine traveling to the scanning session, eating breakfast, and getting dressed. The participants then described what they had imagined for each topic outside of the scanner. We found that the decoder predictions often matched the prompted topics for all participants with aphasia (Figure 2g; see Supplementary Table 3 for all participants). We quantified decoding performance by comparing the decoder predictions to the prompted topics and the participant descriptions. Mental imagery decoding performance for all participants with aphasia was significantly higher than expected by chance (q(FDR) < 0.05; one-sided non-parametric test) and the decoder predictions could be used to identify the prompted topics (24/36 trials; chance level 12/36) (Figure 2h). These results demonstrate the potential for language neuroprostheses to predict the concepts that individuals with aphasia are thinking about but struggling to produce.

Anatomical organization of conceptual processing after stroke

The decoding results demonstrate that spared concept representations in the participants with aphasia can be decoded into continuous language. We can forecast how these results would generalize to other concepts from other individuals with aphasia by determining the extent to which concept representations are spared in stroke. We first mapped how stroke affects the anatomical organization of the semantic system. In neurologically healthy participants, concepts are encoded by distributed patterns of brain activity that are largely consistent across individuals13, stimulus modalities20,21,29, and languages30. One possibility is that spared brain regions in individuals with aphasia are selective for the same concepts before and after stroke, which would indicate that the semantic system can be robust to focal damage. Another possibility is that spared brain regions acquire selectivity for new concepts to compensate for damaged regions, which would indicate that the semantic system can reorganize in response to focal damage.

To evaluate these possibilities, we used encoding models to map the conceptual tuning of each brain region. We trained encoding models for the neurologically healthy reference participants using a lexical semantic embedding space derived from word co-occurrence statistics13. We transferred the encoding models to the participants with aphasia by aligning brain responses using either narrative stories or silent movies17. For each participant with aphasia, this produced a story-aligned encoding model that maps linguistic conceptual tuning and a movie-aligned encoding model that maps non-linguistic conceptual tuning. We then compared conceptual tuning between the participants with aphasia and the reference participants at every location on a cortical surface atlas31. Because the encoding models were transferred across participants using only functional correspondences, any anatomical correspondences between the encoding models indicate shared conceptual tuning.

To visualize conceptual tuning, we assigned a color to every location on the cortical surface atlas by projecting the encoding model weights onto the three dimensions of the lexical semantic embedding space that explained the most variance in a separate group of neurologically healthy participants13 (Figure 3a). The story-aligned encoding models showed that P1 shared linguistic tuning with the reference participants outside of damaged regions while P2 and P3 had larger differences in their linguistic tuning. However, the differences could result from impaired linguistic access to the concept representations rather than reorganization of the concept representations themselves. This was confirmed by the movie-aligned encoding models, which showed that all participants with aphasia shared non-linguistic tuning with the reference participants outside of damaged regions. For instance, concrete concepts (e.g. visual, tactile, body part) were represented near visual cortex and somatosensory cortex, while abstract concepts (e.g. social, mental, time) were represented in medial parietal cortex, inferior parietal cortex, and lateral temporal cortex13.

Figure 3. Anatomical organization of conceptual processing after stroke.

Figure 3.

(a) Encoding models were transferred from neurologically healthy reference participants to participants with aphasia by aligning brain responses using narrative stories or silent movies. Concept-selective cortical locations were colored based on their conceptual tuning. During linguistic processing, P1 shared tuning with the reference participants outside of damaged regions. During non-linguistic processing, all participants with aphasia shared tuning with the reference participants outside of damaged regions. (b) Non-linguistic tuning was compared between the participants with aphasia and the reference participants at every location on a cortical surface atlas. Shared tuning was quantified using the rank correlation between simulated brain responses to 10,470 concepts. Tuning was significantly more similar than expected by chance (q(FDR) < 0.05; one-sided permutation test) in all tested anatomical regions and functional networks.

To quantify these results, we operationalized the conceptual tuning at every location on the cortical surface atlas by using the movie-aligned encoding models to simulate brain responses to 10,470 concepts13. We then computed the shared tuning between each participant with aphasia and the reference participants using the rank correlation between the simulated brain responses. High tuning correlation in a brain region indicates that the region is selective for similar concepts in the participant with aphasia and the reference participants. Low tuning correlation indicates that the region is selective for different concepts. We found that tuning correlation was significantly higher than expected by chance (q(FDR) < 0.05; one-sided permutation test) in all tested anatomical regions and functional networks (Figure 3b). These results demonstrate that the anatomical organization of the semantic system can be robust to stroke, which is consistent with previous findings that concepts are redundantly encoded in multiple brain regions13,14. Consequently, damage to a brain region may not prevent us from decoding the concepts encoded in the region as long as the other regions that encode the concepts are spared.

Information capacity of conceptual processing after stroke

We next measured how stroke affects the information capacity of the semantic system. Even if each brain region processes the same concepts after stroke as it did before stroke, the semantic system as a whole might not process information to the same extent (e.g. only processing concepts from certain categories) or granularity (e.g. only processing more superordinate concepts). Because semantic decoders operate on spared concept representations, the information capacity of the semantic system determines the extent and granularity of the concepts that can be decoded.

While we cannot measure the premorbid capacity of the semantic system in a participant with aphasia, we can approximate it using brain responses from neurologically healthy reference participants. We can then quantify the post-stroke capacity in the participant with aphasia by assessing how well their brain responses can predict brain responses from the reference participants. Here we separately trained cross-participant converters on brain responses to narrative stories and silent movies, and evaluated converter performance using the linear correlation between the predicted and actual responses from the reference participants (Figure 4a). High converter performance in a reference participant brain region indicates that the information processed in the region is still processed somewhere in the participant with aphasia32. Low converter performance indicates that the information may no longer be processed anywhere in the participant with aphasia. We obtained a ceiling on converter performance by training cross-participant converters between pairs of reference participants.

Figure 4. Information capacity of conceptual processing after stroke.

Figure 4.

(a) Cross-participant converters were trained to predict brain responses from each neurologically healthy reference participant using brain responses from each participant with aphasia. Converter performance was evaluated using the linear correlation between the predicted and actual brain responses from the reference participant. (b) Converters trained on narrative stories had modest performance outside of auditory and articulatory regions, indicating that some but not all of the linguistic information processed by the reference participants was also processed by the participants with aphasia. Converters trained on silent movies had high performance across cortex, indicating that most of the non-linguistic information processed by the reference participants was also processed by the participants with aphasia.

We visualized converter performance on a cortical surface atlas after averaging across reference participants31 (Figure 4b). Story-based converter performance was modest outside of auditory and articulatory regions, which indicates that some of the conceptual information in the narrative story stimuli was not fully processed by the participants with aphasia, consistent with the language comprehension impairments that we documented. Conversely, movie-based converter performance approached the ceiling in most brain regions, which indicates that most of the conceptual information in the silent movie stimuli was fully processed by the participants with aphasia. These results demonstrate that the information capacity of the semantic system can be largely spared even in participants with extensive strokes and severely impaired language processing. Consequently, semantic decoding may generalize far beyond the concepts and participants that were assessed in this study.

Factors affecting semantic decoding performance

There are several remaining barriers to using semantic decoders as language neuroprostheses. One major barrier is that the decoders are not consistently accurate. Another major barrier is that semantic decoding from neurological patients has only been demonstrated using fMRI, which requires a large, expensive, and non-portable scanner that is unsuitable for most practical applications. To determine potential avenues for translating semantic decoders into practical language neuroprostheses, we assessed the factors that affect mental imagery decoding performance for the participants with aphasia. For these analyses, we obtained a more sensitive measure of decoding performance by assigning probabilities to the three mental imagery topics based on the decoder predictions and taking the probability of the prompted topic (Methods).

To make semantic decoders more accurate, we could increase the amount of training data, use higher quality brain recordings, or develop more powerful algorithms14. While advances in brain recording technologies and decoding algorithms are difficult to forecast, many studies have found that increasing the amount of training data consistently improves decoding performance14,3335. Here we assessed whether increasing the amount of training data improves mental imagery decoding performance for the participants with aphasia (Figure 5a). First we found that mental imagery decoding performance increased with the amount of alignment data used to transfer decoders across participants. Notably, using silent movie data (classification probability c=0.028log2mmovie+0.282) was more effective than using comparable amounts of narrative story data (c=0.013log2mstory+0.311), demonstrating the feasibility of decoding from individuals with language comprehension impairments16. Next we found that mental imagery decoding performance increased with the number of reference participants (c=0.034log2nreference+0.414). It is straightforward to collect large fMRI datasets from neurologically healthy participants, making this another avenue for improving decoding performance12,36.

Figure 5. Factors affecting semantic decoding performance.

Figure 5.

(a) Mental imagery decoding performance increased with the amount of alignment data used to transfer semantic decoders from neurologically healthy reference participants to participants with aphasia. Aligning brain responses using silent movie data was more effective than aligning brain responses using comparable amounts of narrative story data. Mental imagery decoding performance also increased with the number of reference participants. Error bars indicate the standard error of the mean (n = 10 permutations). (b) Semantic decoders were restricted to individual brain regions to identify the potential for decoding from portable technologies that lack the full brain coverage of fMRI. Mental imagery decoding performance from specific brain regions approached decoding performance from the full brain.

To make semantic decoders more portable, we could adapt them for brain recordings made using functional near-infrared spectroscopy37, electroencephalography33, or intracranial electrodes34,35,38. While these more portable technologies lack the full brain coverage of fMRI, previous studies suggest that concepts are redundantly encoded in multiple brain regions13,14. Here we assessed whether decoding from the participants with aphasia requires the full brain coverage of fMRI. We partitioned the cerebral cortex into 68 anatomical regions and only decoded brain responses from each region (Figure 5b). We obtained high decoding performance from bilateral inferior parietal, medial parietal, and superior frontal regions in P1, bilateral medial parietal, right middle temporal, and bilateral inferior parietal regions in P2, and bilateral middle temporal, ventral temporal, and medial parietal regions in P3. By showing that semantic decoding does not require the full brain coverage of fMRI, these results indicate the potential for decoding from more portable technologies. By showing that the brain regions with high decoding performance can differ across individuals, these results also underscore the value of using fMRI to identify recording sites for portable decoders.

Discussion

This study demonstrates that concept representations in individuals with post-stroke aphasia can be decoded into continuous language with a reasonable degree of accuracy. In practice, semantic decoders could be used as language neuroprostheses to predict the words, phrases, or sentences that these individuals are struggling to produce. Existing augmentative and alternative communication (AAC) devices allow individuals with aphasia to retrieve words or phrases linked to images, but they are limited in real-time responsiveness and in the scope and richness of concepts they can convey, falling short of natural communication39. By directly predicting what individuals are thinking about based on their brain activity, language neuroprostheses could be substantially more effective at translating rich internal thoughts into natural language.

Most existing language neuroprostheses decode motor representations to help individuals with dysarthria, who struggle to map motor representations into muscle movements34,35,38. However, these motor decoders are unsuitable for individuals with aphasia, who are often impaired on earlier stages of language production such as lexical access or phonological encoding. Because all stages of language production are thought to be downstream of conceptual processing, semantic decoders that target concept representations should be highly robust to the wide range of linguistic and co-occurring motoric impairments in aphasia810. Our decoding results generalized across participants with different behavioral profiles, demonstrating the potential for semantic decoding to circumvent word finding, grammatical construction, and speech motor programming impairments.

The generality of semantic decoding depends on the extent to which conceptual processing is spared in aphasia. Previous studies have approximated this extent using behavioral assessments4043 and controlled neuroimaging experiments44, but these studies lacked the precision and coverage required to characterize the anatomical organization and information capacity of the semantic system. By using naturalistic neuroimaging experiments to measure thousands of concept representations, we found that conceptual processing can be largely spared in individuals with post-stroke aphasia despite extensive lesions impacting different regions of the semantic system. These results demonstrate that semantic decoding can be robust to the heterogeneity of lesion profiles in aphasia3,45.

Our results point toward several scalable avenues for improving the accuracy of the decoder predictions. It may also be possible to increase the functional utility of the decoder predictions by integrating them with residual language output, displaying them in real-time, and personalizing them for each individual with aphasia. For instance, many individuals with aphasia can produce some words despite struggling with others, so the successfully produced words could be used as linguistic context to inform the decoder predictions46,47. Similarly, many individuals with aphasia can successfully produce words after receiving semantic cues, so it could be helpful to display multiple sets of decoder predictions even if none are completely accurate48.

While this study decoded brain activity measured using fMRI, our results demonstrate that semantic decoding from individuals with aphasia does not require the full brain coverage of fMRI. Consequently, semantic decoding could be adapted for wearable technologies such as functional near-infrared spectroscopy37 or electroencephalography33. Semantic decoding using fMRI could also help plan implant locations for intracranial neuroprostheses targeting conceptual information49,50. Taken together, our results suggest several potential avenues for using semantic decoding to support communication in individuals with aphasia.

Methods

Participants

Behavioral and MRI data were collected from three right-handed participants with aphasia: P1 (male, age 55), P2 (male, age 47), and P3 (female, age 46). MRI data were also collected from four neurologically healthy participants: R1 (male, age 38), R2 (male, age 25), R3 (male, age 22), and R4 (female, age 27).

The experimental protocol was approved by the Institutional Review Board at the University of Texas at Austin. Written informed consent was obtained from all participants. Participants were compensated at a rate of $10 per hour for behavioral assessment and $25 per hour for MRI scanning. No data were excluded from the analyses.

Behavioral assessment

In-person behavioral assessments were conducted at Austin Speech Labs and the University of Texas at Austin. Remote behavioral assessments were conducted over Zoom. All behavioral assessments were performed and scored by a trained speech-language pathologist. Speech and language processing were assessed using the extended version of the Quick Aphasia Battery3,22. Auditory verbal comprehension was assessed using a subset of items from the Peabody Picture Vocabulary Test23. Conceptual processing was assessed using the Pyramids and Palm Trees–Pictures Test24.

MRI data collection

Due to a scanner upgrade at the UT Austin Biomedical Imaging Center, MRI data were collected on a 3T Siemens Skyra scanner, a 3T Siemens Vida scanner, and a 3T Siemens Prisma scanner using a 64-channel Siemens volume coil. Semantic decoders have previously been transferred across these MRI scanners16.

Functional data were collected using gradient echo EPI with repetition time (TR) = 2.00 s, echo time (TE) = 30.8 ms, flip angle = 71°, multi-band factor (simultaneous multi-slice) = 2, voxel size = 2.6mm × 2.6mm × 2.6mm (slice thickness = 2.6mm), matrix size = (84, 84), and field of view = 220 mm. Anatomical data for all participants except R1 were collected using a T1-weighted multi-echo MP-RAGE sequence on the same 3T Siemens Skyra, 3T Siemens Vida, and 3T Siemens Prisma scanners with voxel size = 1mm × 1mm × 1mm following the Freesurfer morphometry protocol. Anatomical data for participant R1 were collected on a 3T Siemens TIM Trio scanner at the UC Berkeley Brain Imaging Center with a 32-channel Siemens volume coil using the same sequence.

Cortical surface reconstruction

Cortical surface meshes were generated from T1-weighted anatomical scans using Freesurfer51. Anatomical surface segmentations were manually checked and corrected before surface reconstruction. Functional images were aligned with the cortical surface using boundary based registration (BBR) implemented in FSL. The alignments were manually checked and corrected as necessary. Cortical surfaces were anatomically aligned with the fsAverage atlas using the get_mri_surf2surf_matrix function in pycortex when comparing or averaging results across participants31,52. Brain lesions were manually drawn in ITK-SNAP based on T1-weighted anatomical scans. Vertices corresponding to lesioned cortex were excluded from surface analyses.

Experimental tasks

Story listening, movie watching, and mental imagery data were collected from the participants with aphasia in 5 scanning sessions. Story listening and movie watching data were collected from the neurologically healthy participants in 11 scanning sessions.

In the story listening experiment, the participants with aphasia listened to 18 narrative stories (6–16 minutes each) from The Moth Radio Hour. Before the experiment, the participants listened to the first 20 seconds of each story outside of the scanner. The participants were given the option to exclude stories that they struggled to understand due to speech rate, speaker accent, or audio quality. The excluded stories were replaced with other stories from The Moth Radio Hour chosen by the participants. The participants were also given the option to listen to the stories with subtitles. The subtitles displayed each word in black when it was spoken in the story and displayed the previous words in gray to the left of the current word. P1 and P3 chose to use the subtitles. Each story was played during a single fMRI scan with 10 seconds of silence before and after the story. The stories were played over Sensimetrics S14 in-ear piezoelectric headphones. 3 of the stories were held out for model testing (‘Where There’s Smoke’ by Jenifer Hixson, ‘Hang Time’ by Brian Gavagan, ‘The Tiniest Bouquet’ by Christine Gentry) and the remaining 15 stories were used for model training. The neurologically healthy participants listened to 54 narrative stories (6–17 minutes each) from The Moth Radio Hour and Modern Love including the stories that the participants with aphasia listened to.

In the movie watching experiment, all participants watched 13 movie clips (3–7 minutes each) from Pixar Animation Studios and The Blender Foundation. The movie clips were self-contained and almost entirely devoid of language. Each movie clip was played without sound during a single fMRI scan with 10 seconds of darkness before and after the movie clip. The original high-definition movie clips were cropped and downsampled to 727 × 409 pixels. For qualitative evaluations, 1 movie was held out for model testing (‘Sintel’) and the remaining 12 movies were used for model training. For quantitative evaluations, 3 of the movies were held out for model testing (‘La Luna’, ‘Partly Cloudy’, ‘Presto’) and the remaining 10 movies were used for model training.

In the mental imagery experiment, the participants with aphasia were prompted to silently imagine three topics—traveling to the scanning session, eating breakfast, and getting dressed. In each trial, a pseudorandom prompt was displayed for 8 seconds. The prompts indicated for the participants to imagine the corresponding topics upon seeing a thought bubble icon. The prompts also contained visual cues (a car icon for traveling to the scanning session, a plate icon for eating breakfast, and a shirt icon for getting dressed) to support comprehension. A thought bubble icon was then displayed for 30 seconds. Each topic was prompted 4 times in a single fMRI scan (7 minutes). For the mental imagery analyses, all 18 stories and all 13 movies were used for model training.

Stimulus pre-processing

The stories and the official audio descriptions of the movies from Pixar Animation Studios were manually transcribed. Certain sounds (for instance, laughter and breathing) were marked to improve the accuracy of the automated alignment. The Penn Phonetics Lab Forced Aligner (P2FA) was used to automatically align the audio to the transcript53. Praat was used to manually check and correct each aligned transcript54.

fMRI data pre-processing

Each functional run was motion-corrected using the FMRIB Linear Image Registration Tool (FLIRT) from FSL55. All volumes in the run were then averaged to obtain a high quality template volume. FLIRT was used to align the template volume for each run to the overall template. For participants with aphasia, the overall template was chosen to be the template for the first movie run. For neurologically healthy participants, the overall template was chosen to be the template for the first story run. The automatic alignments were manually checked.

Low-frequency voxel response drift was subtracted from the signal. Low-frequency voxel response drift in the story and mental imagery scans was identified using a 2nd order Savitsky-Golay filter with a 120 second window. Because the movie scans were shorter, low-frequency voxel response drift was identified using a Legendre polynomial of degree three56. The remaining signal was then z-scored.

Language model

GPT (also known as GPT-1) is a 12-layer transformer neural network trained to predict the next word in a sequence given the previous words57. In order to perform this task, GPT learns to encode the meanings of input sequences in its hidden layers. Given a word sequence S=s1,s2,,sn, the output from GPT was used to model the probability distribution over the next word sn+1 and the hidden layers of GPT were used to model the meanings of the previous words s1,s2,,sn.

Encoding model

Voxelwise encoding models were trained to predict brain responses given word sequences. Quantitative features were extracted for each word-time pair si,ti by providing the word sequence si5,si4,,si1,si to the GPT language model and taking the activations from the ninth hidden layer25,26,58,59. This yields a new list of vector-time pairs Mi,ti where Mi is a 768-dimensional embedding for si. These vectors were then resampled at times corresponding to the fMRI acquisitions using a three-lobe Lanczos filter.

A linearized finite impulse response (FIR) model was fit to every cortical voxel in the participant’s brain. A separate linear temporal filter with four delays (t1, t2, t3, and t4 timepoints) was fit for each of the 768 features, yielding a total of 3,072 features. With a TR of 2 seconds this was accomplished by concatenating the feature vectors from 2, 4, 6, and 8 seconds earlier to predict responses at time t. Each feature channel of the training matrix was z-scored before the regression procedure.

The 3,072 weights for each voxel were estimated using L2-regularized linear regression. The regression procedure has a single free parameter that controls the degree of regularization. This regularization coefficient was found by repeating a regression and cross-validation procedure 50 times. In each iteration, approximately a fifth of the timepoints were removed from the model training dataset and reserved for validation. Then the model weights were estimated on the remaining timepoints for each of 10 possible regularization coefficients (log spaced between 10 and 1,000). These weights were used to predict brain responses for the reserved timepoints, and R2 was computed between the predicted and actual brain responses. The regularization coefficient was chosen as the value that led to the best performance, averaged across bootstraps, voxels, and neurologically healthy participants, on the reserved timepoints. After training, the encoding model can take any word sequence S and predict brain responses R^(S).

Cross-participant transfer

To reduce the amount of data required from the participants with aphasia, encoding models were trained on brain responses from the neurologically healthy reference participants and then transferred to the participants with aphasia. Because the anatomical structure and functional organization of the brain vary across individuals, transferring models across participants requires aligning their brain responses14,60.

To learn this alignment, brain responses Rreference and Raphasia were collected while each reference participant and participant with aphasia were shown a common set of story and movie stimuli. A cross-participant converter Creferenceaphasia was trained to predict the activity in each voxel of the participant with aphasia using the activity in the 10,000 best predicted voxels of the reference participant. The converter weights were estimated using L2-regularized linear regression. The regularization coefficient was found by repeating the regression and cross-validation procedure 50 times for each pair of reference participants. The regularization coefficient was chosen as the value that led to the best performance, averaged across bootstraps, voxels, and reference participant pairs, on the reserved timepoints. After training, the converter can take any encoding model trained on the reference participant and align it with brain responses from the participant with aphasia.

Converters were also used to identify concept-selective voxels in the participants with aphasia. Converters were trained on brain responses to either stories or movies and then evaluated by aligning brain responses to the other modality. Prediction performance was quantified using the linear correlation between the aligned and actual brain responses from the participant with aphasia. Prediction performance was z-scored and averaged across modalities to identify the 10,000 voxels with the highest cross-modality prediction performance.

For some analyses, separate converters Caphasiareference were trained to predict the activity in each voxel of the reference participant using the activity in the concept-selective voxels of the participant with aphasia. Converter performance was used to quantify whether the information processed in each brain region of the reference participant was also processed somewhere in the participant with aphasia.

Word rate decoder

A word rate decoder was estimated for each participant with aphasia to predict when concepts were perceived or imagined. The word rate at each fMRI acquisition was defined as the number of stimulus words that occurred since the previous acquisition.

To identify the voxels that encode word rate, L2-regularized linear regression was used to estimate a set of weights that predict brain responses R from word rate w. A linear temporal filter with four delays (t1, t2, t3, and t4 timepoints) was fit for the word rate feature. With a TR of 2 seconds, this was accomplished by concatenating the word rate from 2, 4, 6, and 8 seconds earlier to predict the brain responses at time t. The 3,000 voxels with the highest cross-validation performance were selected for word rate decoding.

To train the word rate decoder, L2-regularized linear regression was used to estimate a set of weights that predict word rate w from brain responses R. A separate linear temporal filter with four delays (t+1, t+2, t+3, and t+4 timepoints) was fit for each selected voxel. With a TR of 2 seconds, this was accomplished by concatenating the brain responses from 2, 4, 6, and 8 seconds later to predict the word rate at time t. Given new brain responses Raphasia, the model predicts the word rate at each acquisition. The time between consecutive acquisitions (2 seconds) can then be divided by the predicted word rates (rounded to the nearest non-negative integers) to predict word times.

Semantic decoder

The goal of language decoding is to maximize the probability distribution P(SR) over word sequences S given brain responses R. P(SR) can be factorized into the product of a prior distribution P(S) over word sequences and an encoding distribution P(RS) over brain responses given word sequences14,61.

P(S) was estimated using the language model. The language model maps from a word sequence to the probability distribution over the next word. The probability of observing S=s1,s2,,sn in natural language can be modeled by multiplying the probabilities Psis1:i1 of each word conditioned on the previous words.

P(RS) was estimated using the encoding models. Each encoding model maps from semantic features of S to brain responses R^(S). Assuming that BOLD signals are affected by Gaussian additive noise, P(RS) can be modeled as a multivariate Gaussian distribution with mean μ=R^(S) and covariance =(RR^(S))T(RR^(S))62. Because the aligned encoding models from the neurologically healthy reference participants could only explain a small fraction of the variance in the brain responses from the participant with aphasia, was approximated by computing RTR for each training story from the participant with aphasia and averaging the brain response covariance matrices across the training stories.

To decode new brain responses Raphasia, the beam search decoding approach introduced in ref.14 was used to generate word sequences with high values of P(S) and PRaphasiaS. The decoder maintained a beam containing the k most likely word sequences up to each timepoint. Upon receiving new brain images, the word rate model was used to detect new words. Next the language model was used to generate continuations for each candidate S in the beam by predicting the probability distribution Psnsni,,sn1 over the next word sn given the words predicted in the previous 8 seconds sni,,sn1. Each word in the language model nucleus63 was appended to the candidate to form a continuation H. Finally the aligned encoding models from the reference participants were used to rank the continuations based on the likelihood PRaphasiaH of observing the brain responses. The k most likely continuations were retained in the beam for the next timepoint and the process was repeated until the entire scan was decoded.

Previous studies have found that combining models from multiple reference participants can improve decoding performance16,60. As a result, PRaphasiaH was separately computed using the aligned encoding model from each reference participant and the ranks were averaged. When decoding brain responses to stories and movies, the likelihoods PRaphasiaH were also averaged with the likelihoods PRaphasiareferenceH, which were obtained by multiplying the brain responses Raphasia by the cross-participant converters Caphasiareference and evaluating the aligned responses using the native encoding model from each reference participant16. This regularization leverages the fact that the noise covariance can be estimated using many more hours of brain responses from the reference participants. When assessing the benefits of cross-participant modeling, aligned encoding models from the reference participants were compared to native encoding models from the participant with aphasia.

Custom analysis software was written in Python using the NumPy64, SciPy65, PyTorch66, Transformers67, and pycortex52 libraries.

Decoding performance

Decoder predictions from the story and movie experiments were compared to target words from the stimulus transcripts using the BERTScore metric28. BERTScore was computed between the predicted and target words within a 20 second window around each second of the stimulus. BERTScore uses a bidirectional transformer language model to represent the meaning of each predicted and target word as a contextual embedding, and scores each target word based on its maximum cosine similarity across all of the predicted words. These scores are averaged across the target words to quantify the similarity of meaning between the predicted and target word sequences.

Because BERTScore values are not inherently interpretable, they were normalized with respect to a null distribution28. Null sequences were generated by sampling from the language model without using any brain data except to predict word times14. The null model maintained a beam of 10 candidate sequences and generated continuations from the language model nucleus at each predicted word time63. The only difference between the actual decoder and the null model was that, instead of ranking the continuations by the likelihood of the fMRI data, the null model randomly assigned a likelihood to each continuation. For each participant with aphasia, this process was repeated 200 times to generate 200 null sequences. This process was as similar as possible to the actual decoder without using any brain data to select words, so these sequences reflected the null hypothesis that the decoder does not recover meaningful information about the stimulus from the brain data. The null sequences were scored against the target words to produce a null distribution of decoder scores for each participant with aphasia.

Decoder predictions from the mental imagery experiment were compared to the prompted topics (“I traveled here”, “I got dressed”, and “I ate breakfast”) using BERTScore. Because the participants could have imagined idiosyncratic concepts for each topic, they were also instructed to describe what they imagined after the scan. P1 and P2 described what they imagined out loud while P3 wrote what she imagined. The descriptions were manually edited to remove filler words and define proper nouns, and were then concatenated with the prompts to form target words for each topic. To evaluate accuracy, BERTScore was computed between the decoder predictions and the target words for the prompted topic. To evaluate discriminability, BERTScore was computed between the decoder predictions and the target words for all three topics, and the similarities were normalized into probabilities that the participant was imagining each topic. To make the decoding results more stable, the decoding procedure was repeated for 5 sets of 10,000 voxels sampled from the top 12,500 concept-selective voxels and the probabilities were averaged across the samples.

Anatomical regions

Whole brain MRI data were partitioned into 4 anatomical regions using Freesurfer labels: anterior left hemisphere, posterior left hemisphere, anterior right hemisphere, and posterior right hemisphere. Anterior regions were defined using the superiorfrontal, rostralmiddlefrontal, caudalmiddlefrontal, parsopercularis, parstriangularis, parsorbitalis, lateralorbitofrontal, medialorbitofrontal, precentral, paracentral, frontalpole, rostralanteriorcingulate, and caudalanteriorcingulate labels. Posterior regions were defined using the superiorparietal, inferiorparietal, supramarginal, postcentral, precuneus, posteriorcingulate, isthmuscingulate, superiortemporal, middletemporal, inferiortemporal, bankssts, fusiform, transversetemporal, entorhinal, temporalpole, parahippocampal, lateraloccipital, lingual, cuneus, and pericalcarine labels. Whole brain MRI data were also partitioned into 34 anatomical regions in each hemisphere using these labels along with the insula label in order to identify brain regions with high mental imagery decoding performance.

Functional networks

Whole brain MRI data were partitioned into 5 functional networks using a clustering approach26. Encoding models were trained to predict brain responses from the neurologically healthy participants using lexical semantic features of the stimulus words13. Encoding model weights were averaged across the 4 delays to produce a conceptual tuning vector for each voxel, which captures how the voxel responds to the stimulus features. The top 10,000 voxels were identified in each reference participant based on cross-validation performance. The conceptual tuning vectors of the top 10,000 voxels were concatenated across participants and rescaled to have unit norm. Principal components analysis was applied to the conceptual tuning vectors, and 64 dimensions that explained 80% of the variance were chosen.

Spherical k-means clustering was applied to the principal components. To determine the number of clusters, the inertia of the clustering algorithm was computed for a range of clusters between 1 and 20. This was used to identify the point where the inertia changes from an exponential drop to a linear drop in inertia. This point occurred at 5 clusters. Each of the 5 clusters was interpreted by projecting the lexical semantic features of 10,470 words onto the cluster centroid. The clusters were subjectively determined to represent concrete, social, place, temporal, and people concepts.

The conceptual tuning vectors were anatomically aligned with the fsAverage atlas and averaged across neurologically healthy participants. The conceptual tuning vectors were then projected onto the 64 principal components and then assigned to one of the 5 clusters. This process results in a functional network label for every fsAverage vertex.

Statistical testing

Statistical significance of decoder scores was assessed using a one-sided non-parametric test. The decoder scores were averaged across the stimulus windows and the observed mean decoder score was compared to the mean decoder scores for the null sequences. p values were computed as the fraction of null sequences with a mean decoder score greater than or equal to than the observed mean decoder score.

Statistical significance of encoding model performance was assessed using a one-sided permutation test. The encoding models were used to predict brain responses to 3 held-out test stories. Prediction performance for each voxel was computed as the linear correlation between the predicted and actual response time-courses. A null distribution for each voxel was obtained by randomly resampling (with replacement) 10-TR blocks from the voxel’s actual response time-course. Resampling contiguous blocks preserves the auto-correlation structure of the voxel’s responses. The null prediction performance was then computed as the linear correlation between the predicted and permuted response time-courses. Repeating this process for 1,000 trials provided a null distribution of prediction performance for each voxel. p values were computed for each voxel as the fraction of permutations with prediction performance greater than or equal to than the observed prediction performance.

Statistical significance of tuning correlation was assessed using a one-sided permutation test. To compare conceptual tuning between a participant with aphasia and the neurologically healthy participants, lexical semantic encoding models weights were averaged across the 4 delays to produce a conceptual tuning vector for each voxel. The conceptual tuning vectors were anatomically aligned with the fsAverage atlas. The lexical semantic features of 10,470 words were projected onto the conceptual tuning vectors from the participant with aphasia to simulate how each cortical location would respond. This process was repeated for the average conceptual tuning vectors across the neurologically healthy participants. Tuning correlation was computed by taking the rank correlation between the two sets of simulated responses for each fsAverage vertex and then averaging across fsAverage vertices. A null distribution was obtained by permuting the conceptual tuning vectors across fsAverage vertices before simulating brain responses. Repeating this process for 1,000 trials provided a null distribution of tuning correlation. p values were computed as the fraction of permutations with a tuning correlation greater than or equal to than the observed tuning correlation.

Quantitative tuning correlation analyses were restricted to concept-selective vertices in the fsAverage atlas. These vertices were identified by training a lexical semantic encoding model for each neurologically healthy participant. Concept-selective voxels were identified as voxels with an observed prediction performance score that was significantly higher than its null distribution. The binarized voxel masks of conceptual selectivity were anatomically aligned with the fsAverage space and averaged across neurologically healthy participants. Vertices with a mean value greater than 0.5 were considered to be significantly predicted, resulting in 74,370 significantly predicted vertices out of 327,684 total vertices.

Unless otherwise stated, all tests were performed within each participant and then replicated across all participants (n = 3). All tests were corrected for multiple comparisons when necessary using false discovery rate68.

Supplementary Material

Supplement 1

Acknowledgements

We thank the participants and their loved ones for their time, dedication, and generosity.

Funding

This work was supported by the National Institute on Deafness and Other Communication Disorders under awards F32DC022178 (J.T.), R01DC013270 (S.M.W), and R01DC020088 (A.G.H.).

Footnotes

Competing interests

A.G.H. and J.T. are inventors on a pending patent application (the applicant is The University of Texas System Board of Regents) that is directly relevant to the language decoding approach used in this work. All other authors declare no competing interests.

Data availability

All data used in the analyses will be made publicly available at OpenNeuro before publication.

Code availability

All code used in the analyses will be made publicly available at GitHub before publication.

References

  • 1.Pedersen P. M., Jørgensen H. S., Nakayama H., Raaschou H. O. & Olsen T. S. Aphasia in acute stroke: Incidence, determinants, and recovery. Ann. Neurol. 38, 659–666 (1995). [DOI] [PubMed] [Google Scholar]
  • 2.Kertesz A. & McCabe P. Recovery patterns and prognosis in aphasia. Brain 100, 1–18 (1977). [DOI] [PubMed] [Google Scholar]
  • 3.Wilson S. M. et al. Recovery from aphasia in the first year after stroke. Brain 146, 1021–1039 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Geschwind N. The organization of language and the brain. Science 170, 940–944 (1970). [DOI] [PubMed] [Google Scholar]
  • 5.Ivanova A. A. et al. The language network is recruited but not required for nonverbal event semantics. Neurobiol. Lang. 2, 176–201 (2021). [Google Scholar]
  • 6.Varley R. & Siegal M. Evidence for cognition without grammar from causal reasoning and ‘theory of mind’ in an agrammatic aphasic patient. Curr. Biol. 10, 723–726 (2000). [DOI] [PubMed] [Google Scholar]
  • 7.Fama M. E., Hayward W., Snider S. F., Friedman R. B. & Turkeltaub P. E. Subjective experience of inner speech in aphasia: Preliminary behavioral relationships and neural correlates. Brain Lang. 164, 32–42 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Dell G. S. & O’Seaghdha P. G. Stages of lexical access in language production. Cognition 42, 287–314 (1992). [DOI] [PubMed] [Google Scholar]
  • 9.Goldrick M. & Rapp B. Lexical and post-lexical phonological representations in spoken production. Cognition 102, 219–260 (2007). [DOI] [PubMed] [Google Scholar]
  • 10.Levelt W. J., Roelofs A. & Meyer A. S. A theory of lexical access in speech production. Behav. Brain Sci. 22, 1–38 (1999). [DOI] [PubMed] [Google Scholar]
  • 11.Binder J. R. & Desai R. H. The neurobiology of semantic memory. Trends Cogn. Sci. 15, 527–536 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.LeBel A. et al. A natural language fMRI dataset for voxelwise encoding models. Sci Data 10, 555 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Huth A. G., de Heer W. A., Griffiths T. L., Theunissen F. E. & Gallant J. L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, 453–458 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Tang J., LeBel A., Jain S. & Huth A. G. Semantic reconstruction of continuous language from non-invasive brain recordings. Nat. Neurosci. 26, 858–866 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Szameitat A. J., Shen S. & Sterr A. The functional magnetic resonance imaging (fMRI) procedure as experienced by healthy participants and stroke patients–a pilot study. BMC Med. Imaging 9, 14 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Tang J. & Huth A. G. Semantic language decoding across participants and stimulus modalities. Curr. Biol. 35, 1023–1032 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Tang J. & Huth A. G. Rapid cortical mapping with cross-participant encoding models. bioRxiv 2026.03. 26.714320 (2026) doi: 10.64898/2026.03.26.714320. [DOI] [Google Scholar]
  • 18.Fairhall S. L. & Caramazza A. Brain regions that represent amodal conceptual knowledge. J. Neurosci. 33, 10552–10558 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Devereux B. J., Clarke A., Marouchos A. & Tyler L. K. Representational similarity analysis reveals commonalities and differences in the semantic processing of words and objects. J. Neurosci. 33, 18906–18916 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Popham S. F. et al. Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nat. Neurosci. 24, 1628–1636 (2021). [DOI] [PubMed] [Google Scholar]
  • 21.Tang J., Du M., Vo V. A., Lal V. & Huth A. G. Brain encoding models based on multimodal transformers can transfer across language and vision. in Advances in Neural Information Processing Systems 29654–29666 (2023). [PMC free article] [PubMed] [Google Scholar]
  • 22.Wilson S. M., Eriksson D. K., Schneck S. M. & Lucanie J. M. A quick aphasia battery for efficient, reliable, and multidimensional assessment of language function. PLoS One 13, e0192773 (2018). [Google Scholar]
  • 23.Dunn L. M. & Dunn D. M. Peabody Picture Vocabulary Test—Fourth Edition. (2007). [Google Scholar]
  • 24.Howard D. & Patterson K. Pyramids and Palm Trees Test. (1992). [Google Scholar]
  • 25.Jain S. & Huth A. G. Incorporating context into language encoding models for fMRI. in Advances in Neural Information Processing Systems 6629–6638 (2018). [Google Scholar]
  • 26.LeBel A., Jain S. & Huth A. G. Voxelwise encoding models show that cerebellar language representations are highly conceptual. J. Neurosci. 41, 10341–10355 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Yamada K., Miyawaki Y. & Kamitani Y. Inter-subject neural code converter for visual image representation. Neuroimage 113, 289–297 (2015). [DOI] [PubMed] [Google Scholar]
  • 28.Zhang T., Kishore V., Wu F., Weinberger K. Q. & Artzi Y. BERTScore: Evaluating Text Generation with BERT. in 8th International Conference on Learning Representations (2020). [Google Scholar]
  • 29.Deniz F., Nunez-Elizalde A. O., Huth A. G. & Gallant J. L. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. J. Neurosci. 39, 7722–7736 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Chen C. et al. Bilingual language processing relies on shared semantic representations that are modulated by each language. Proc. Natl. Acad. Sci. U. S. A. 123, e2503721123 (2026). [Google Scholar]
  • 31.Fischl B., Sereno M. I., Tootell R. B. & Dale A. M. High-resolution intersubject averaging and a coordinate system for the cortical surface. Hum. Brain Mapp. 8, 272–284 (1999). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Mell M. M., St-Yves G. & Naselaris T. Voxel-to-voxel predictive models reveal unexpected structure in unexplained variance. Neuroimage 238, 118266 (2021). [DOI] [PubMed] [Google Scholar]
  • 33.Défossez A., Caucheteux C., Rapin J., Kabeli O. & King J.-R. Decoding speech perception from non-invasive brain recordings. Nat. Mach. Intell. 5, 1097–1107 (2023). [Google Scholar]
  • 34.Metzger S. L. et al. A high-performance neuroprosthesis for speech decoding and avatar control. Nature 620, 1037–1046 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Willett F. R. et al. A high-performance speech neuroprosthesis. Nature 620, 1031–1036 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Allen E. J. et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 25, 116–126 (2022). [DOI] [PubMed] [Google Scholar]
  • 37.Eggebrecht A. T. et al. A quantitative spatial comparison of high-density diffuse optical tomography and fMRI cortical mapping. Neuroimage 61, 1120–1128 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Wairagkar M. et al. An instantaneous voice-synthesis neuroprosthesis. Nature 1–8 (2025). [Google Scholar]
  • 39.Dietz A., Wallace S. E. & Weissling K. Revisiting the role of augmentative and alternative communication in aphasia rehabilitation. Am. J. Speech. Lang. Pathol. 29, 909–913 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Jefferies E. & Lambon Ralph M. A. Semantic impairment in stroke aphasia versus semantic dementia: A case-series comparison. Brain 129, 2132–2147 (2006). [DOI] [PubMed] [Google Scholar]
  • 41.Chertkow H., Bub D., Deaudon C. & Whitehead V. On the status of object concepts in aphasia. Brain Lang. 58, 203–232 (1997). [DOI] [PubMed] [Google Scholar]
  • 42.Grober E., Perecman E., Kellar L. & Brown J. Lexical knowledge in anterior and posterior aphasics. Brain Lang. 10, 318–330 (1980). [DOI] [PubMed] [Google Scholar]
  • 43.Zurif E. B., Caramazza A., Myerson R. & Galvin J. Semantic feature representations for normal and aphasic language. Brain Lang. 1, 167–187 (1974). [Google Scholar]
  • 44.Perani D. et al. A fMRI study of word retrieval in aphasia. Brain Lang. 85, 357–368 (2003). [DOI] [PubMed] [Google Scholar]
  • 45.Turkeltaub P. E., Messing S., Norise C. & Hamilton R. H. Are networks for residual language function and recovery consistent across aphasic patients? Neurology 76, 1726–1734 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Stark B. C. et al. Neural organization of speech production: A lesion-based study of error patterns in connected speech. Cortex 117, 228–246 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Casilio M. et al. Four dimensions of naturalistic language production in aphasia after stroke. Brain 148, 291–312 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Boyle M. & Coelho C. A. Application of semantic feature analysis as a treatment for aphasic dysnomia. Am. J. Speech. Lang. Pathol. 4, 94–98 (1995). [Google Scholar]
  • 49.Verwoert M. et al. Moving beyond the motor cortex: A brain-wide evaluation of target locations for intracranial speech neuroprostheses. Cell Rep. 44, 116241 (2025). [DOI] [PubMed] [Google Scholar]
  • 50.Jamali M. et al. Semantic encoding during language comprehension at single-cell resolution. Nature 631, 610–616 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Dale A. M., Fischl B. & Sereno M. I. Cortical surface-based analysis: I. Segmentation and surface reconstruction. Neuroimage 9, 179–194 (1999). [DOI] [PubMed] [Google Scholar]
  • 52.Gao J. S., Huth A. G., Lescroart M. D. & Gallant J. L. Pycortex: An interactive surface visualizer for fMRI. Front. Neuroinform. 9, 23 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Gorman K., Howell J. & Wagner M. Prosodylab-aligner: A tool for forced alignment of laboratory speech. Canadian Acoustics 39, 192 (2011). [Google Scholar]
  • 54.Boersma P. & Weenink D. Praat: Doing Phonetics by Computer. (2014). [Google Scholar]
  • 55.Jenkinson M. & Smith S. A global optimisation method for robust affine registration of brain images. Med. Image Anal. 5, 143–156 (2001). [DOI] [PubMed] [Google Scholar]
  • 56.Kay K. N., David S. V., Prenger R. J., Hansen K. A. & Gallant J. L. Modeling low-frequency fluctuation and hemodynamic response timecourse in event-related fMRI. Hum. Brain Mapp. 29, 142–156 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Radford A., Narasimhan K., Salimans T. & Sutskever I. Improving language understanding by generative pre-training. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (2018). [Google Scholar]
  • 58.Toneva M. & Wehbe L. Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain). in Advances in Neural Information Processing Systems 14928–14938 (2019). [Google Scholar]
  • 59.Caucheteux C. & King J.-R. Brains and algorithms partially converge in natural language processing. Commun. Biol. 5, 134 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Ho J. K., Horikawa T., Majima K., Cheng F. & Kamitani Y. Inter-individual deep image reconstruction via hierarchical neural code conversion. Neuroimage 271, 120007 (2023). [DOI] [PubMed] [Google Scholar]
  • 61.Naselaris T., Kay K. N., Nishimoto S. & Gallant J. L. Encoding and decoding in fMRI. Neuroimage 56, 400–410 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Nishimoto S. et al. Reconstructing visual experiences from brain activity evoked by natural movies. Curr. Biol. 21, 1641–1646 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Holtzman A., Buys J., Du L., Forbes M. & Choi Y. The Curious Case of Neural Text Degeneration. in International Conference on Learning Representations (2020). [Google Scholar]
  • 64.Harris C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Virtanen P. et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Paszke A. et al. PyTorch: An imperative style, high-performance deep learning library. in Advances in Neural Information Processing Systems 8024–8035 (2019). [Google Scholar]
  • 67.Wolf T. et al. Transformers: State-of-the-art natural language processing. in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations 38–45 (2020). [Google Scholar]
  • 68.Benjamini Y. & Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995). [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplement 1

Data Availability Statement

All data used in the analyses will be made publicly available at OpenNeuro before publication.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints

RESOURCES