Abstract
Aphasia is a language disorder that often involves speech comprehension impairments affecting communication. In face-to-face settings, speech is accompanied by mouth and facial movements, but little is known about the extent to which they benefit aphasic comprehension. This study investigated the benefit of visual information accompanying speech for word comprehension in people with aphasia (PWA) and the neuroanatomic substrates of any benefit. Thirty-six PWA and 13 neurotypical matched control participants performed a picture-word verification task in which they indicated whether a picture of an animate/inanimate object matched a subsequent word produced by an actress in a video. Stimuli were either audiovisual (with visible mouth and facial movements) or auditory-only (still picture of a silhouette) with audio being clear (unedited) or degraded (6-band noise-vocoding). We found that visual speech information was more beneficial for neurotypical participants than PWA, and more beneficial for both groups when speech was degraded. A multivariate lesion-symptom mapping analysis for the degraded speech condition showed that lesions to superior temporal gyrus, underlying insula, primary and secondary somatosensory cortices, and inferior frontal gyrus were associated with reduced benefit of audiovisual compared to auditory-only speech, suggesting that the integrity of these fronto-temporo-parietal regions may facilitate cross-modal mapping. These findings provide initial insights into our understanding of the impact of audiovisual information on comprehension in aphasia and the brain regions mediating any benefit.
Keywords: Language, Audiovisual Speech, Comprehension, Aphasia
Introduction
Post-stroke aphasia is a language disorder most frequently associated with difficulties with speech production and/or comprehension (Stroke Association UK, 2021). However, face-to-face communication goes beyond speech as it also involves processing a great deal of other communicative information, including mouth and facial movements. We know very little about whether these movements benefit the comprehension of people with aphasia (PWA) and if particular brain regions mediate any benefit. Studies with neurotypical individuals have shown that observing mouth movements facilitates auditory comprehension, particularly when speech is challenging to process due to message complexity (Arnold & Hill, 2001; Reisberg et al., 1987) or additional noise (Krason et al., 2021; Ma et al., 2009; Ross et al., 2007; J.-L. Schwartz et al., 2004; Sumby & Pollack, 1954; Tye-Murray et al., 2007). This benefit is thought to occur because mouth movements support temporal and phonological encoding of the auditory speech information, as well as constrain lexical competition (for a review see Peelle & Sommers, 2015). For instance, during a conversation in a busy restaurant, mouth movements inform the listener about when to attend to others’ speech and complement auditory signals by disambiguating the place of articulation of a consonant (e.g., / bæt / versus / cæt /).
Studies with PWA have primarily investigated audiovisual speech processing using the McGurk and MacDonald paradigm (McGurk & MacDonald, 1976). In this paradigm, simultaneous mismatching information from speech acoustics (e.g., “pa”) and mouth movements (e.g., “ka”) induce an audiovisual illusion in which individuals perceive a fused percept (e.g., “ta”; McGurk & MacDonald, 1976). Despite great individual variability in susceptibility to the McGurk effect (Brown et al., 2018), most neuroanatomically healthy individuals and PWA perceive a fused percept during mismatching presentations, which has been interpreted in terms of audiovisual integration mechanisms (see Alsius et al., 2018 for a review). However, processing mismatching information from mouth and auditory speech is of unknown relevance to word comprehension and may be driven by different cognitive mechanisms (Hickok et al., 2018; Van Engen et al., 2017).
Notably, and of greater potential relevance to comprehension, PWA also benefit from mouth movements when acoustic and visual speech cues match, e.g., when “pa” is produced both auditorily and visually, relative to when “pa” is produced auditorily only (Andersen & Starrfelt, 2015; Baum et al., 2012; Campbell et al., 1990; Hessler et al., 2012; Hickok et al., 2018; Michaelis et al., 2020; but see also Youse et al., 2004). However, in a study assessing the ability of individuals with left hemisphere stroke to extract visual speech information, Schmid and Ziegler (2006) showed that PWA did not benefit from audiovisual relative to auditory-only stimuli and were impaired in matching asynchronous stimuli across auditory and visual modalities. This was particularly the case for individuals with poor verbal repetition skills and apraxia of speech (i.e., a motor speech planning disorder), suggesting that these factors may be important for successful encoding of phonological information from mouth movements and integration with auditory speech. However, as with studies of the McGurk and MacDonald illusion, the relevance of these findings for naturalistic speech comprehension may be limited: the stimuli were nonsense syllables and non-speech sounds (e.g., whistling), and the task required matching asynchronous cross-modal information. Finally, associations between lesion site and behavior were not assessed.
Although lesion information is often not available in behavioral studies with clinical populations, it may strongly influence performance. Functional neuroimaging studies with neurotypical individuals generally, but not exclusively, report that three brain regions play central roles in audiovisual speech processing (for a review see Peelle, 2019). The left posterior superior temporal sulcus/gyrus (STS/STG) displays enhanced activation for audiovisual speech (with visible mouth and facial movements) relative to combined responses to auditory-only and visual-only stimuli (Callan et al., 2003; Calvert et al., 2000; Calvert & Campbell, 2003; Erickson et al., 2014; Nath & Beauchamp, 2012; Sekiyama et al., 2003; Skipper et al., 2005, 2007; Venezia et al., 2017; Wright et al., 2003), suggesting that it contributes to multisensory integration, including cross-modal integration for speech (Amedi et al., 2005; Beauchamp, 2005; Beauchamp et al., 2004; Baum et al., 2012; see also Olson et al., 2002; Hocking & Price, 2008 for contradictory results). Some fMRI studies have also reported increased activation in the auditory cortex, including primary auditory cortex (A1), for visual speech relative to silent non-speech movements (Calvert et al., 1997; Pekkola et al., 2005). Similar findings from electrophysiological experiments show that visual cues modulate oscillations in A1 (Crosse et al., 2015; Luo et al., 2010) and that this modulation starts early, i.e., approximately 100-300ms before speech onset, which is often related to mouth opening/closing (Chandrasekaran et al., 2009). Finally, the left inferior frontal cortex, including ventral premotor cortex (PMv) and inferior frontal gyrus (IFG), has also been associated with audiovisual processing (Calvert & Campbell, 2003; Erickson et al., 2014; Skipper et al., 2007; Watkins et al., 2003). 
These inferior frontal regions have been argued to play a role in mapping articulatory gestures to phoneme representations (Hickok & Poeppel, 2007; Rauschecker & Scott, 2009), with some suggesting that observing mouth movements while listening to speech evokes activity in similar frontal brain regions as during speech production (see Skipper et al., 2017 for a review).
There is very little converging evidence from PWA that those regions are involved in audiovisual processing, and the studies that exist are also focused on perception and not comprehension. Hickok et al. (2018) conducted a large-scale voxel-based lesion-symptom mapping study assessing performance of PWA with McGurk-type stimuli. They found that left posterior superior and middle temporal regions, insula (INS), as well as parts of the occipital cortex, but not the IFG, are associated with audiovisual integration (Hickok et al., 2018). More recently, Michaelis et al. (2020) tested audiovisual integration abilities of PWA using asynchronous auditory and visual signals. Lesions to the left supramarginal gyrus (SMG) and planum temporale of the STG were associated with reduced temporal sensitivity to the asynchronous audiovisual signal, indicating that these regions are important for temporal perception that mediates audiovisual integration. Although these findings provide important initial insights into the mechanisms driving audiovisual processing in PWA, both studies used the McGurk paradigm and are therefore subject to the criticisms raised above, i.e., they investigated syllable perception rather than comprehension.
The Current Study
This study is the first to investigate, using both lesion-symptom mapping and behavioral methods, the benefit of visual speech information for spoken word comprehension in PWA. We assessed 36 PWA and 13 neurotypical controls with a computer-based picture-verification task requiring judgements about whether a spoken word from a video matched a previously seen picture. We manipulated the presence of mouth and facial movements, and speech clarity. As face-to-face interactions are typically embedded in noise (e.g., a conversation on a busy street) and such adverse listening conditions increase reliance on visual speech information in neurotypical individuals (e.g., Ma et al., 2009; Ross et al., 2007; Sumby & Pollack, 1954; Krason et al., 2021), we compared clear speech to 6-band noise-vocoded stimuli. Finally, we assessed the neural regions associated with any benefit of visual speech information during word comprehension using Support-Vector Regression Lesion-Symptom mapping (SVR-LSM, Zhang et al., 2014).
Based on the current literature on the processing of audiovisual speech by neurotypical individuals, we predicted that performance of both groups would improve in the degraded condition when mouth movements were present thanks to the support they provide to phonological encoding of degraded auditory signals (e.g., Ross et al., 2007; Sumby & Pollack, 1954). Given limited studies on audiovisual speech processing beyond syllable level and involving individuals with post-stroke aphasia, it is unclear whether PWA would benefit from observing mouth and facial movements to a larger extent than neurotypical individuals. It is possible that PWA would use visual speech information to overcome noise (similarly to neurotypical individuals), but also to remedy any auditory speech deficits caused by aphasia. It may also be the case, however, that integrating visual and auditory channels is more challenging for PWA than for healthy adults, thus, resulting in a smaller audiovisual benefit. We hypothesized that any benefit from observing mouth and facial movements to PWA would depend on individuals’ lesion location. That is, we predicted that a reduced audiovisual benefit should be observed in patients with lesions to the posterior STS/STG, a key region for multisensory integration in studies with neurotypicals (e.g., Beauchamp et al., 2004). As we considered comprehension of real words with visible mouth and facial movements, other regions including A1 and inferior frontal cortices (PMv and IFG; e.g., Calvert et al., 1997; Watkins et al., 2003) may also contribute to visual speech benefit.
Methods
In the methods section, we report how we determined our sample size, all data exclusions, all inclusion/exclusion criteria, whether inclusion/exclusion criteria were established prior to data analysis, all manipulations, and all measures in the study.
Participants
Forty-nine native speakers of North American English were recruited from the Moss Rehabilitation Research Institute (MRRI) Research Registry (Schwartz et al., 2005) to participate in the study. Participants included (i) 36 individuals at least 6 months post a single left hemispheric cerebrovascular accident who exhibited aphasia and, to ensure that they would be able to understand experimental task instructions, had a score of at least 5 (out of 10) on the auditory comprehension subtest of the WAB (Kertesz, 1982; PWA group; mean age=62, SD=11.55) and (ii) 13 neurotypical subjects (control group; mean age=64, SD=9.13) matched for age (t(47)=0.59, p=0.56) and educational level (t(47)=−0.28, p=0.78) to the PWA group. Control participants were included if they achieved a score of at least 27 on the Mini-Mental State Exam (Folstein et al., 1975). Exclusion criteria for both groups included a history of comorbid neurological disorders, psychosis, and alcohol or drug abuse. Additionally, 33 of the PWA passed a hearing screening at 50, 1000, 2000, and 4000Hz (if they were <65 years old) or at 1000 and 2000Hz (if they were >65 years old) in both ears. To maximize the sample size, 3 PWA were included in the study despite not passing the hearing screening¹. All participants gave informed consent before taking part in the experiment according to the guidelines of the Institutional Review Board of Einstein Healthcare Network and were compensated for their time and travel expenses. The testing sessions took place in the MRRI laboratories in Elkins Park (Pennsylvania, USA). The de-identified data from this study are publicly available on Open Science Framework (OSF) at https://osf.io/fuscq/.
Neuroimaging acquisition
Twenty-nine participants with aphasia received research-quality structural MRI scans (n=26) or, if MRI was medically contraindicated, CT scans (n=3). The MRI scans included whole-brain T1-weighted images acquired on a 3T Siemens Trio (Erlangen, Germany) scanner with a repetition time of 1620 ms, echo time of 3.87 ms, field of view of 192 × 256 mm, and 1 × 1 × 1 mm voxels, using a Siemens eight-channel head coil. The CT scans were obtained without contrast (60 axial slices, 3-5 mm slice thickness) on a 64-slice Siemens SOMATOM Sensation scanner.
Lesions were manually segmented on each patient's high-resolution T1-weighted structural images. Lesioned voxels were assigned a value of 1, and preserved voxels were assigned a value of 0. Both contained grey and white matter. Binarized lesion masks were then registered to an MNI template (Montreal Neurological Institute “Colin27”) using a symmetric diffeomorphic registration algorithm (Avants et al., 2008; www.picsl.upenn.edu/ANTS). First, volumes were registered to an intermediate template of healthy brain images acquired on the same scanner, and they were then mapped onto the “Colin27” template. Lesion maps were subsequently inspected by an experienced neurologist (H.B. Coslett), naive to the behavioral results of the study, to ensure mapping accuracy. For the participants with CT scans, the same neurologist drew the lesions directly onto the “Colin27” template using MRIcron (Rorden & Brett, 2000). To ensure maximum accuracy with high intra- and inter-rater reliability (>0.85%), the pitch of the template was rotated to approximate the slice plane of each participant’s scan (see e.g., Schnur et al., 2009).
Materials
In the experimental picture-word verification task participants indicated whether a spoken stimulus matched a previously seen picture. Experimental materials for the study consisted of 120 words, a corpus of 480 pictures with high name agreement, and 240 video-clips. The list of words and the video-clips are publicly available at https://osf.io/fuscq/. The picture materials could not be publicly archived due to copyright concerns.
Words:
All words were concrete (Mn. 3.5 out of 5 on a concreteness scale; Brysbaert et al., 2014) and referred to common objects and living things. Words were grouped into sets of four (e.g., “cow”, “ear”, “egg”, “pie”) and items within a set were matched on number of syllables and as closely as possible on number of phonemes, lexical frequency (Brysbaert & New, 2009), age of acquisition (AoA; Kuperman et al., 2012), and phonological neighborhood density (Luce & Pisoni, 1998). Each participant saw all 120 words, but the words within a group were presented in different conditions (see below) to different participants. For example, participant 1 heard the word “cow” in the clear condition with visible mouth movements, whereas participant 2 heard the same word in the clear condition, but with no visible facial cues. The sets of four words remained constant across participants and experimental conditions.
Pictures:
Pictures with high name agreement, established through a naming experiment or inter-rater agreement analysis, were drawn from one of three databases (Druks & Masterson, 2000; Hebart et al., 2019; Snodgrass & Vanderwart, 1980). The pictures could refer to: (i) a target word (e.g., “chair”); (ii) a semantically related object (e.g., “table”); (iii) an object with a phonologically related (rhyming) name (e.g., “bear”); or (iv) an unrelated object (e.g., “shoe”). Object names for (ii), (iii), and (iv) had different onset phonemes than the target words on 90% of occasions.
Video-clips:
The video-clips were recorded in a professional, well-lit sound-proof booth at University College London. They depicted a female native speaker of American English with visible head and shoulders uttering target words. The videos were further manipulated in iMovie (version 10.1.12) in the following way. First, we extracted the audio from the video files and combined it with a still image of a female silhouette. As a result, each video had two versions: with (audiovisual) and without (auditory-only) visual cues. This contrast is analogous to real-life scenarios in which interlocutors have face-to-face versus telephone conversations. The decision to use a still image of a silhouette as an auditory-only baseline, rather than a still picture of a speaker or video of a speaker with a blurred lip area, was driven by the concern that the auditory-visual mismatch would create expectancy conflicts that would actively disrupt processing rather than serving as a truly neutral condition. In addition, blurring different parts of the face to control for their role in speech processing is ecologically less valid.
Second, we moderately noise-vocoded the audio in Praat (Boersma & Weenink, 2021) using a 6-band pass filter following Drijvers & Özyürek (2017) and Krason et al. (2021). Noise-vocoded speech is a type of degraded speech in which pitch-related information is manipulated to simulate the listening experience of someone with a cochlear implant (Shannon et al., 1995). Six-band filtering makes the speech challenging, but still intelligible (to a certain degree) and has been previously shown to increase neurotypical individuals’ use of visual cues in word recognition tasks (Drijvers & Özyürek, 2017; Krason et al., 2021). The final set of videos was therefore presented in one of the following conditions: clear audiovisual (clear audio + visible mouth and facial movements), degraded audiovisual (noise-vocoded audio + visible mouth and facial movements), clear auditory-only (clear audio + no visual cues), and degraded auditory-only (noise-vocoded audio + no visual cues). Figure 1 depicts the experimental conditions and trial types used in the study.
Figure 1.
Schematic representation of the experimental conditions and trial types (note: grey speech bubbles represent the noise-vocoded conditions).
The stimuli were displayed on a 24-inch monitor with 1920x1080 resolution. The videos occupied the upper 2/3 of the screen, and the pictures occupied the lower part.
Procedure
The experiment was programmed in Gorilla (https://gorilla.sc/). Participants wore high-quality headphones during the experiment. Participants’ task was to indicate whether a spoken stimulus matched a previously seen picture. Each trial started with a still image of an actress (or a female silhouette in the auditory-only modality) and a fixation cross beneath it. After 500ms, a picture of an object or living thing appeared in place of the fixation cross. After another 1500ms, a 200ms beep tone was played indicating the beginning of a ~1500ms video. The picture remained in view until the end of a video, after which a new screen with a question (“Does the speech match the picture?”) and two response boxes appeared. Participants used their left hand (i.e., the unaffected hand in the PWA) to indicate their responses using “z” and “x” buttons on the keyboard with corresponding colored stickers (“z” [yellow sticker] = “yes”, “x” [blue sticker] = “no”). See Figure 2 for an example of the trial sequence.
Figure 2.
Example of a matching trial sequence in audiovisual modality with clear speech.
Prior to the main task, participants were presented with four practice trials illustrating all possible conditions (i.e., clear audiovisual, degraded audiovisual, clear auditory-only, degraded auditory-only). The practice trials were repeated as necessary to ensure participants understood the task. Both visual and oral feedback was provided during the practice phase. In the main task, participants were exposed to all the target words twice, resulting in 240 trials, with 50% of the trials requiring a “yes” response (matching trials) and the other 50% requiring a “no” response (mismatching trials). None of the mismatching pictures appeared as targets. The second presentation of each word always appeared in a different experimental condition and in the second half of the experiment (after a 10-minute break). The trials were pseudo-randomized into eight blocks of 30 trials. Each session lasted approximately 1.5 hours.
Data Analysis
Behavioral analysis
The lme4 package (Bates et al., 2015) was used to perform a set of mixed-effect analyses in RStudio (RStudio Team, 2015). We carried out generalized logistic mixed-effect regressions (glmer) on accuracy separately for the matching and mismatching trials². The decision to analyze matching and mismatching trials separately was driven by the findings from neurotypical individuals showing that different cognitive processes may be involved when responding yes/no to matching and mismatching picture-word pairs, with matching trials being overall more reliable (see, e.g., Stadthagen-Gonzalez et al., 2009). Specifically, matching trials have been suggested to reflect conceptual (semantic) matching, i.e., individuals access the meaning of both spoken words and pictures (Stadthagen-Gonzalez et al., 2009). In contrast, mismatching trials have been shown to elicit much more variability in how people respond to them, which may be related to the number of additional “checks” one has to perform to decide that a word and a picture mismatch (Krueger, 1978). Potential cognitive mechanisms that may be triggered during mismatching, but less so during matching, trials are cognitive control and priming. Finally, assessing the benefit of congruent visual information is of clinical relevance.
Prior to the analyses, we removed trials with technical difficulties and trials with a phonologically related word “gauge”, because its visual speech information is identical to the visual information of its matching target word “cage” (21 trials in total). We entered the following predictors and up to three-way interactions between them in our models: Speech Clarity (clear, degraded), Modality (audiovisual, auditory-only) and Group (PWA, neurotypicals), as well as Relation Type (semantic, phonological, unrelated) in the mismatching trial analysis. Following a design-driven approach (Barr et al., 2013), we included by-Participant and by-Item random intercepts to account for participant and item variability. We also entered random slopes for Speech Clarity and Modality both by-Participant and by-Item to better control for type I error. Random slopes of Modality were removed from the analysis of the mismatching trials because the model fit was singular. The control variables entered in the models included the Number of Syllables of the target words, Log Frequency (Brysbaert & New, 2009), AoA (Kuperman et al., 2012), and Phonological Neighborhood Density (Luce & Pisoni, 1998). We applied the “bobyqa” algorithm to optimize model convergence and speed of iterations (Powell, 2009). There was no obvious multicollinearity, with Variance Inflation Factors (VIFs) below 2.7 and 4.8 for the matching and mismatching trial analyses, respectively. Finally, the coefficients were used to interpret the size and direction of effects (Jaeger, 2008) and significance values were assessed with Laplace approximation using the lmerTest package (Kuznetsova et al., 2017). Plots were created using the ggplot2 package (Wickham, 2009). The R code for the analyses is available on OSF at https://osf.io/fuscq/. No part of the study procedures or analyses was preregistered.
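As a rough illustration only, the fixed- and random-effects structure described above for the matching trials could be written as follows. The notation is an assumption of this sketch and does not come from the analysis code: p indexes participants, i indexes items, C, M and G stand for the coded levels of Speech Clarity, Modality and Group (the coding scheme is not stated here), x_i collects the four item-level control variables, and u and w are the by-Participant and by-Item random effects.

```latex
\begin{aligned}
\operatorname{logit} P(\text{correct}_{pi} = 1) ={}& \beta_0 + \beta_1 C_{pi} + \beta_2 M_{pi} + \beta_3 G_{p} \\
 &+ \beta_4 C_{pi} M_{pi} + \beta_5 C_{pi} G_{p} + \beta_6 M_{pi} G_{p} + \beta_7 C_{pi} M_{pi} G_{p} \\
 &+ \boldsymbol{\gamma}^{\top} \mathbf{x}_{i} \\
 &+ \left(u_{0p} + u_{1p} C_{pi} + u_{2p} M_{pi}\right) + \left(w_{0i} + w_{1i} C_{pi} + w_{2i} M_{pi}\right)
\end{aligned}
```

Under the same assumptions, the mismatching-trial model would add Relation Type and its interactions to the fixed effects and drop the Modality slopes u_{2p} and w_{2i}.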
Finally, we calculated d’ and c, using the psycho package (Makowski, 2018), to check for task sensitivity and response bias, respectively. d’ was calculated by taking the difference in z-scores between hits (correct responses to “yes” trials) and false alarms (incorrect responses to “no” trials). Larger d’ values indicate better sensitivity to the task, and d’ values closer to 0 signify performance approximating chance level (Stanislaw & Todorov, 1999). c was calculated by looking at the number of standard deviations from the point where neither response is preferred (so-called “neutral point”), with positive values indicating a tendency towards “no” responses and negative values indicating a tendency towards “yes” responses (Stanislaw & Todorov, 1999). The d’ values in our study ranged from 0.62 to 4.35, suggesting that task sensitivity was good and all participants responded above chance level. The c values ranged from −0.72 to 0.66 and fell well within ±3 SD from the neutral point, suggesting that participants were not biased towards “yes” or “no” responses.
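To make the two signal-detection measures concrete, here is a minimal Python sketch of the standard formulas (Stanislaw & Todorov, 1999). It is not the psycho package's implementation. The function name `dprime_and_c`, the example counts, and the log-linear correction that keeps rates away from 0 and 1 are all assumptions made for illustration.

```python
from statistics import NormalDist


def dprime_and_c(hits, misses, false_alarms, correct_rejections):
    """Return (d', c) for one participant from raw response counts."""
    # Log-linear correction (an assumption of this sketch): add 0.5 to each
    # count so that hit and false-alarm rates can never be exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)

    d_prime = z_hit - z_fa             # sensitivity
    criterion = -0.5 * (z_hit + z_fa)  # bias: > 0 leans "no", < 0 leans "yes"
    return d_prime, criterion


# Hypothetical counts, not the study's data.
d_sym, c_sym = dprime_and_c(50, 10, 10, 50)        # accurate, unbiased responder
d_chance, c_chance = dprime_and_c(30, 30, 30, 30)  # responder at chance
d_no, c_no = dprime_and_c(20, 40, 5, 55)           # responder who favors "no"
```

In this sketch a responder with equal hit and false-alarm rates gets d’ = 0, and a responder who rarely says “yes” gets a positive c, matching the interpretation given above.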
Lesion-symptom mapping analysis
Support Vector Regression Lesion-Symptom Mapping (SVR-LSM) was performed in MATLAB using a toolbox developed by DeMarco and Turkeltaub (2018). SVR-LSM is a multivariate machine learning technique that uses a nonlinear function to determine the association between a map of lesioned voxels in the brain (rather than single voxels) and patients’ behavior (Zhang et al., 2014). As compared with its predecessor, voxel based lesion-symptom mapping (VLSM), it offers better specificity and sensitivity (Mah et al., 2014) by controlling for type I and type II errors caused by correlations between neighboring voxels (Pustina et al., 2018) and multiple comparisons (Bennett et al., 2009), respectively. Importantly, SVR-LSM also outperforms VLSM if a particular behavior is associated with multiple brain regions (Herbet et al., 2015; Mah et al., 2014), as may be the case for speech comprehension.
Here, we focused on the lesions associated with reduced audiovisual benefit (as compared to auditory-only speech) in the matching trials (i.e., requiring a “yes” response) because of their clinical relevance. Based on the accuracy data distribution, we looked at the degraded speech condition only. We used residuals of the audiovisual condition with auditory-only scores regressed out as the dependent variable. We excluded any voxels that were lesioned in fewer than three patients (~10% of the total number of patients). We regressed lesion volume from both the individual lesion masks and participants’ behavioral scores to control for total lesion volume following DeMarco and Turkeltaub (2018). We generated a null distribution using 10,000 Monte Carlo permutations to determine voxelwise statistical significance. We cross-validated our model by dividing our sample into five folds, with four subgroups used to create a regression model and the fifth subgroup used to validate it. The resulting map was then thresholded at p<0.05, and any clusters smaller than 500 voxels were excluded, following Garcea et al. (2019), Lacey et al. (2017), and Vigliocco et al. (2020).
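The residual-based dependent variable can be sketched in a few lines of Python. This is only an illustration of regressing auditory-only scores out of audiovisual scores with ordinary least squares; it is not the SVR-LSM toolbox code. The function name `residualize` and the accuracy values are hypothetical.

```python
def residualize(y, x):
    """Residuals of y after an ordinary least-squares fit on x (with intercept)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A positive residual means the score is higher than predicted from x.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical per-participant accuracy in the degraded condition (not study data).
auditory_only = [0.55, 0.60, 0.70, 0.80, 0.65]
audiovisual = [0.70, 0.62, 0.85, 0.90, 0.66]

# Audiovisual accuracy with auditory-only accuracy regressed out.
av_benefit = residualize(audiovisual, auditory_only)
```

By construction the residuals sum to zero and are uncorrelated with the auditory-only scores, so lower values pick out participants who gained less from visual speech than their auditory-only performance would predict.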
Finally, we used the Johns-Hopkins DTI-based probabilistic white matter tractography atlas (Mori et al., 2008) to determine the overlap between significant voxels from the SVR-LSM analysis and major white matter tracts at a 75% probability threshold (Baldo et al., 2012; Schwartz et al., 2012).
Results
Behavioral Results
Matching trials
We found significant main effects of Speech Clarity (β=1.29, SE=0.22, z=5.92, p<0.001) and Group (β=0.50, SE=0.19, z=2.59, p=0.01), with participants performing better on the clear speech relative to degraded speech and the control group performing more accurately than the PWA group. There was also a significant interaction between Speech Clarity and Modality (β=−0.30, SE=0.13, z=−2.38, p=0.02). Pairwise comparisons with Holm’s corrections showed that when the speech was degraded, participants made fewer errors on audiovisual compared to auditory-only presentations (p<0.001). There was no difference between audiovisual and auditory-only modalities when the speech was clear (p>0.05), likely because performance was at ceiling. One control variable was also significant (Number of Syllables: β=1.10, SE=0.29, z=3.76, p<0.001, with participants performing better on longer words). Figure 3 (A) shows mean accuracy scores per group for the matching trials.
Figure 3.
Mean accuracy scores for matching (A) and mismatching (B) trials for the control and PWA groups. Plots (i) and (ii) show Modality on the x-axis, whereas plot (iii) shows Relation Type on the x-axis. Speech Clarity is represented in colors. Error bars are standard errors of the mean.
Mismatching trials
There were significant main effects of Speech Clarity (β=0.56, SE=0.13, z=4.43, p<0.001), with fewer errors for the clear speech; Modality (β=0.38, SE=0.07, z=5.17, p<0.001), with fewer errors for audiovisual presentations; and Relation Type, with fewer errors for unrelated pictures as compared to phonologically (β=−0.58, SE=0.10, z=−5.76, p<0.001) and semantically related pictures (β=−0.64, SE=0.10, z=−6.26, p<0.001). There was also a significant interaction between Modality and Group (β=0.22, SE=0.07, z=3.18, p=0.001). Follow-up pairwise comparisons with Holm’s corrections showed that although both groups performed better in the audiovisual modality compared to auditory-only (p’s<0.04), the difference between conditions was larger for the control group (p=0.05). This effect was further examined in a post-hoc analysis with only the PWA for whom lesion information was available (29) and including a new variable - Lesion Volume (in mm³) - to establish whether lesion size is a significant predictor of smaller benefit from the audiovisual modality. There was no effect of lesion volume on the behavioral results (see Supplementary Materials for full results).
There was also a significant interaction between Speech Clarity and Relation Type (for the phonological type with the unrelated type as a reference: β=−0.59, SE=0.10, z=−6.00, p<0.001). Pairwise comparisons showed significantly better performance for the clear relative to degraded speech for phonological and unrelated pictures (p’s<0.01), but not semantically related pictures (p>0.05). When the speech was clear, participants were also more accurate on the phonological than semantic pictures (p=0.004), but when the speech was degraded, they were more accurate on the semantic trials compared to phonological ones (p<0.001). One control variable (Phonological Neighborhood Density) was also significant (β=−0.02, SE=0.01, z=−2.05, p=0.04, with participants performing better on words with smaller phonological neighborhood density). Figure 3 (B) shows mean accuracy scores per group for the mismatching trials.
Lesion-Symptom Mapping Results
To assess which brain areas, when lesioned, are associated with reduced benefit of visual speech cues, we carried out an SVR-LSM analysis in the 29 PWA who had scans. The overlap map with regions lesioned in at least three participants is depicted in Figure 4. The dependent variable was the residuals of the audiovisual condition with the auditory-only condition regressed out for degraded speech in the matching condition. The SVR-LSM analysis showed several significant clusters, including parts of the temporal pole of the superior temporal gyrus (TPOsup), STG, postcentral gyrus (PoCG), SMG, INS, and IFG (pars triangularis and pars orbitalis). Table 2 and Figure 5 summarize the results. Finally, based on the Johns-Hopkins DTI probabilistic atlas (Mori et al., 2008), we found an overlap between significant clusters and superior longitudinal fasciculus (SLF). The probabilistic location of SLF and the overlap is presented in Figure 6. The percentage overlap between SLF and SVR-LSM results is presented in Table 3.
Figure 4.
Voxelwise lesion overlap for 29 participants. Only voxels lesioned in a minimum of 3 participants are displayed.
Table 2.
SVR-LSM results with X, Y and Z centers of mass associated with reduced benefit from the audiovisual speech relative to auditory-only in the degraded listening condition for the matching trials. Regions with clusters of >500 voxels were identified by Automated Anatomical Labeling (AAL).
| Regions | Abbrev. | Number of Voxels in Damaged Region | Percentage of Voxels in Damaged Region | MNI Center of Mass X | MNI Center of Mass Y | MNI Center of Mass Z |
|---|---|---|---|---|---|---|
| Temporal Pole: Superior Temporal Gyrus | TPOsup | 1630 | 15.94 | −49 | 14 | −3 |
| Superior Temporal Gyrus | STG | 1433 | 7.83 | −54 | −43 | 21 |
| Postcentral Gyrus | PoCG | 1297 | 4.18 | −42 | −17 | 39 |
| Supramarginal Gyrus | SMG | 1038 | 10.48 | −49 | −29 | 31 |
| Insula | INS | 1001 | 6.66 | −30 | 19 | −8 |
| Inferior Frontal Gyrus, Pars Triangularis | IFGtriang | 975 | 4.85 | −55 | 21 | 3 |
| Inferior Frontal Gyrus, Pars Orbitalis | IFGorb | 719 | 5.29 | −51 | 28 | −2 |
Figure 5.
SVR-LSM results depicting significant voxels (in red) which, when lesioned, are associated with reduced benefit from audiovisual relative to auditory-only presentation in the degraded listening condition for the matching trials. Voxelwise threshold set to p<0.05 with 10,000 Monte Carlo permutations and 5-fold cross-validation. Clusters of <500 contiguous 1 mm³ voxels were excluded.
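The cluster-extent criterion in the caption above (excluding clusters of fewer than 500 contiguous voxels) can be illustrated with a small connected-components sketch. It assumes 6-connectivity (voxels sharing a face) and represents significant voxels as integer coordinate tuples; both are assumptions made for illustration, and the toolbox used in the study may define contiguity differently. The function name `clusters_at_least` is hypothetical.

```python
from collections import deque

# Face-sharing neighbours (6-connectivity); an assumption for this sketch.
NEIGHBOUR_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                     (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def clusters_at_least(voxels, min_size):
    """Group voxels into contiguous clusters and keep those with >= min_size voxels."""
    remaining = set(voxels)
    kept = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        # Breadth-first search over unvisited neighbouring voxels.
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOUR_OFFSETS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.add(neighbour)
                    queue.append(neighbour)
        if len(cluster) >= min_size:
            kept.append(cluster)
    return kept


# Toy example: one 3-voxel cluster and one isolated voxel, threshold of 3.
toy_voxels = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 10, 10)]
surviving = clusters_at_least(toy_voxels, min_size=3)
print(len(surviving), [len(c) for c in surviving])
```

With the study's criterion, `min_size` would be 500 and the input would be the set of voxels surviving the voxelwise threshold.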
Figure 6.
Probabilistic location of the white matter tracts based on the JHU white matter atlas overlaid onto SVR-LSM results. The dependent variable was the amount of benefit from audiovisual speech relative to auditory-only speech in the degraded condition for the matching trials. White matter tract probability threshold: 75%.
Table 3.
The overlap percentage between peak voxels in MNI space identified in the SVR-LSM analysis and superior longitudinal fasciculus (SLF), as verified with the Johns Hopkins DTI-based probabilistic white matter tractography atlas.
| Regions | Abbrev. | Number of Voxels in SLF | Number of Voxels Identified in SVR-LSM that Overlap | Percentage of Overlapping Voxels | MNI Center of Mass X | MNI Center of Mass Y | MNI Center of Mass Z |
|---|---|---|---|---|---|---|---|
| Superior Longitudinal Fasciculus | SLF | 690 | 104 | 15.07 | −38 | −18 | 31 |
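The percentage in Table 3 is consistent with dividing the number of overlapping voxels by the number of voxels in the thresholded SLF mask. The short sketch below reproduces that arithmetic from the two voxel counts reported in the table; the function name `overlap_percentage` is hypothetical.

```python
def overlap_percentage(n_overlap, n_tract):
    """Percentage of tract voxels that overlap with the significant clusters."""
    if n_tract <= 0:
        raise ValueError("tract must contain at least one voxel")
    return 100.0 * n_overlap / n_tract


# Voxel counts as reported in Table 3.
slf_voxels = 690
overlapping_voxels = 104
pct = overlap_percentage(overlapping_voxels, slf_voxels)
print(f"{pct:.2f}%")  # prints "15.07%"
```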
Discussion
The current study is the first to investigate the benefit of mouth and facial movements in word comprehension of people with aphasia using both behavioral and lesion-symptom mapping methods. In contrast to previous studies, we used a picture-verification task and manipulated the presence of visual speech information and the clarity of auditory signal to assess the extent to which these factors impact speech comprehension in adults with post-stroke aphasia and a neurotypical control group. We also conducted exploratory SVR-LSM to investigate the neural regions associated with any benefit of visual speech for word comprehension.
In line with previous studies assessing audiovisual comprehension of neurotypical individuals, we found that visual information accompanying speech benefits comprehension in challenging listening conditions and that such benefit is larger for the controls relative to PWA regardless of speech clarity conditions. Our SVR-LSM and tractographic analyses indicated that TPOsup, STG, SMG, PoCG, INS, IFG, and SLF may mediate the benefit of audiovisual information in comprehension.
Benefit of Visual Speech for Aphasic Comprehension
Potential benefits of visual speech information were assessed separately for matching (i.e., speech matched a previously-seen picture) and mismatching (i.e., speech mismatched a previously-seen picture) trials. For the matching trials, we showed that comprehension of degraded speech was easier when the speech was accompanied by mouth and facial movements relative to when the visual information was absent. This result is in line with previous findings with neurotypical adults (Krason et al., 2021; Ma et al., 2009; Ross et al., 2007; Schwartz et al., 2004; Sumby & Pollack, 1954; Tye-Murray et al., 2007), indicating that visual speech information plays a role particularly when phonological processing is more difficult. In such challenging listening conditions, mouth movements are likely to be beneficial because they support temporal predictions of the upcoming auditory speech information and constrain lexical competition (Peelle & Sommers, 2015). Our findings are also consistent with previous research showing similar performance of PWA and neurotypical adults under adverse listening conditions (Kittredge et al., 2006; Healy et al., 2007). The lack of audiovisual benefit for PWA in the clear speech condition is likely to be driven by a ceiling effect; that is, like controls, these individuals with mild-moderate aphasia performed relatively well in the clear condition. For this reason, the present findings may not generalize to individuals with more severe comprehension impairments.
Additionally, we found effects of visual speech for the mismatching trials. Both groups benefited from seeing mouth and facial movements in addition to hearing speech, but the control group showed a larger advantage than the aphasic group, which may be related to the involvement of additional cognitive processes (such as cognitive control that is often impaired in PWA; Brownsett et al., 2014) during mismatching presentations. To our knowledge, only one recent unpublished study investigated audiovisual speech benefit in a sentence repetition task in PWA and found a similar pattern of larger audiovisual advantage for neurotypical individuals in one of their experimental conditions (i.e., during very high noise levels of 0dB SNR; Raymer et al., 2021). Although the reported methods and data analysis are insufficiently detailed to allow strong comparisons to our findings, both our study and the study of Raymer et al. (2021) suggest the possibility that PWA may have difficulty integrating visual and auditory streams of information into a coherent percept, as would be required for mouth movements to be useful in disambiguating speech (Massaro & Jesse, 2007; Schmid & Ziegler, 2006).
Moreover, it is interesting to note that the control group in the present study also showed audiovisual benefit for the mismatching trials when the speech was clear, as well as when it was degraded. This pattern differs from the one we observed in the matching trials; however, in the mismatching trials performance was “off-ceiling” in the auditory-only condition, leaving room for a benefit of visual information. Finally, neurotypical and aphasic individuals also responded more accurately to unrelated trials than to both phonologically and semantically related trials. Performance on the latter two relation types depended on speech clarity: individuals performed equally well on semantic trials whether the auditory signal was clear or degraded, whereas they made more errors on phonological trials when speech was degraded than when it was clear. Altogether, these findings indicate that phonological discriminability is reduced by noise, whereas semantic discriminability is not.
Neural Substrates of Visual Speech Benefit
Our exploratory lesion-symptom mapping analysis identified several clusters in the left hemisphere that appear to be involved in audiovisual speech comprehension. These include perisylvian regions in temporal (TPOsup, STG), insular (INS) and inferior frontal (IFG) cortices, as well as parts of parietal (SMG) and somatosensory cortices (PoCG). Although the SVR-LSM was conducted on a relatively small sample size (see Ivanova et al., 2021) and replication is needed, our results are consistent with previous findings in neurotypical populations suggesting involvement of a large fronto-temporo-parietal network, including STG, STS, INS, superior and inferior frontal cortex, as well as SMG and IPL, in sensorimotor speech interactions (Calvert et al., 2001; Campbell, 2008; Dick et al., 2010; Peelle, 2019; Bernstein et al., 2012). Our findings also indicate that both ventral and dorsal streams may contribute to the benefit of visual speech for word comprehension. Portions of the ventral stream, and in particular, posterior superior and middle temporal cortex, have been associated with sound-to-meaning mapping. In the present study we found a cluster of regions distributed along the lateral and medial surfaces of the STG to be associated with audiovisual speech comprehension. The STG is known for its multifunctionality and heteromodality (Hein & Knight, 2021; Venezia et al., 2017), and previous studies have found that posterior STG/STS play a crucial role in audiovisual and visual speech processing (Callan et al., 2003; Calvert et al., 2000; Calvert & Campbell, 2003; Erickson et al., 2014; Nath & Beauchamp, 2012; Okada & Hickok, 2009; Sekiyama et al., 2003; Skipper et al., 2005, 2007; Venezia et al., 2016; Wright et al., 2003), likely because of its multisensory integration properties (Amedi et al., 2005; Beauchamp, 2005; Beauchamp et al., 2004). Less is known about the involvement of the temporal pole in the processing of visual speech cues. 
The temporal pole has primarily been linked with higher-order cognitive processes, such as naming (e.g., Rice et al., 2015), word retrieval (e.g., Damasio et al., 2004), and semantic processing (Lambon Ralph, 2013; Patterson et al., 2007). A few studies have suggested a role for the anterior STG in audiovisual speech processing (Hertrich et al., 2011; Lee & Noppeney, 2011; Ozker et al., 2017). For example, Hertrich, Dietrich, & Ackermann (2011) showed that relatively anterior parts of STG are linked with the processing of visual speech information (e.g., syllables /pa/ and /ta/) and more posterior STG is associated with cross-modal integration with non-speech stimuli (e.g., moving shapes and tones). Although the stimuli in these studies were not directly relevant to comprehension, it is of interest to note the convergence of our results with these findings.
The dorsal stream, by contrast, including portions of the posterior-frontal and parietal-temporal cortices, has been previously associated with sound-to-articulatory mapping in speech production. Here, we showed that insular regions medial to the superior temporal surface and fronto-parietal regions of the dorsal stream may play a role in visual speech comprehension, in line with previous literature (Callan et al., 2003; Calvert et al., 2001; Hickok et al., 2018; Skipper et al., 2007). For instance, Hickok et al. (2018) found associations between INS and susceptibility to fused percepts with McGurk stimuli, while Callan et al. (2003) reported INS to be involved in mouth movement processing when the auditory signal is degraded or absent. Although the precise role of the insula in audiovisual speech comprehension is still debated, these findings indicate that it may act as a mediator during cross-modal interactions and/or executive demand processing under challenging conditions (Callan et al., 2003; Calvert et al., 2001; Hickok et al., 2018; Skipper et al., 2007). Given that our stimuli consisted of videos of a speaker’s full face rather than solely mouth movements, the involvement of the insular cortex found in the current study may also be related to processing socio-emotional facial cues (Rae et al., 2018).
We also found that primary somatosensory cortex (PoCG) and parietal association areas (SMG) appear to mediate the benefit of visual speech information for comprehension. These regions may be engaged in encoding phonological information from mouth movements (Möttönen et al., 2005; Skipper et al., 2005, 2007) and binding it with the auditory signal (Bernstein et al., 2008; Bernstein & Liebenthal, 2014; Jones & Callan, 2003; Michaelis, 2020). Additionally, we showed that IFG may be associated with the benefit of mouth movements, which is in line with Skipper et al. (2005; 2007) and Watkins et al. (2003), but not other recent studies with PWA (Andersen & Starrfelt, 2015; Hickok et al., 2018). These findings may be discrepant because the involvement of IFG in audiovisual processing is task specific (for a review see Peelle, 2019). For example, when speech encoding is more challenging, IFG may play a compensatory role in supporting the extraction of visual information from the mouth. It is important to note that although our findings are consistent with the prior literature in our identification of multiple fronto-temporal brain regions involved in audiovisual processing, our sample was small for a robust SVR-LSM analysis and future studies may identify additional regions. Another limitation of the present study was that our sample of chronic patients largely consisted of anomic aphasics and lacked individuals with Wernicke’s or transcortical sensory aphasia. Although these aphasia types are less common in the chronic than acute phases of recovery, future research may benefit from a more diverse sample of PWA.
Finally, our results are also in line with a recent study of Zhang & Du (2022), showing involvement of the dorsal stream, including PMv, IFG, SMG and the underlying white matter tracts of the arcuate fasciculus, in phonological encoding from mouth movements during audiovisual speech perception. Their findings are also consistent with our white matter tractographic analysis demonstrating that the SLF is associated with the benefit of visual speech information. In particular, the parts of the SLF connecting superior temporal with inferior frontal regions have been found to be critical for phonological processing (Dick et al., 2014; Glasser & Rilling, 2008). Thus, a disruption to phonological processing caused by lesions to SLF may lead to cross-modal integration failure, which could explain the reduced benefit from audiovisual speech relative to the auditory signal alone. Future studies should investigate how the connectivity between these fronto-temporo-parietal regions, as well as between these regions and the right hemisphere, impacts audiovisual speech comprehension in aphasia.
Conclusions
The current study brings together behavioral and lesion-symptom mapping profiles of people with aphasia to establish the benefit of visual speech information for word comprehension. We have demonstrated that mouth and facial movements are more beneficial for the comprehension of neurotypical individuals than adults with aphasia, and are more beneficial for both groups when listening conditions are challenging. We have also provided preliminary evidence that the integrity of a number of specific inferior frontal, temporal, and parietal regions, as well as fronto-temporal connections via the superior longitudinal fasciculus, may be associated with this benefit, consistent with the previously demonstrated role of these regions in cross-modal mapping. Although studies of spoken word comprehension have typically focused on the auditory modality, our findings suggest that future investigations should consider whether and how visual speech information impacts comprehension in aphasia.
Supplementary Material
Table 1.
Patient demographics
| ID | Aphasia Diagnosis | WAB AQ | WAB Compr. | Sex | Educat. (years) | Age At Testing | Months Since Stroke | Lesion Volume (mm³) |
|---|---|---|---|---|---|---|---|---|
| P01 | Broca's | 43.7 | 7.4 | F | 12 | 55 | 173 | 111,206 |
| P02 | Conduction | 71.9 | 8.0 | M | 16 | 60 | 19 | 189,767 |
| P03 | Transcortical motor | 69.8 | 6.9 | M | 19 | 75 | 19 | 120,820 |
| P04 | Broca's | 66.0 | 9.0 | M | 12 | 69 | 174 | 71,750 |
| P05 | Anomic | 90.8 | 8.6 | M | 12 | 64 | 139 | 47,566 |
| P06 | Anomic | 87.8 | 8.9 | F | 14 | 61 | 135 | 41,502 |
| P07 | Anomic | 86.4 | 9.2 | M | 12 | 62 | 50 | 11,961 |
| P08 | Anomic | 92.6 | 8.5 | M | 18 | 50 | 52 | 96,147 |
| P09 | Anomic | 88.3 | 9.8 | F | 18 | 58 | 168 | 80,532 |
| P10 | Anomic | 89.7 | 9.1 | M | 14 | 57 | 126 | 55,685 |
| P11 | Anomic | 81.3 | 9.4 | M | 12 | 57 | 88 | 126,448 |
| P12 | Anomic | 82.4 | 10.0 | M | 13 | 63 | 168 | 193,421 |
| P13 | Broca's | 39.6 | 7.7 | M | 13 | 56 | 68 | n/a |
| P14 | Anomic | 92.7 | 9.5 | M | 18 | 77 | 36 | 87,120 |
| P15 | Anomic | 88.1 | 9.4 | F | 13 | 57 | 179 | 106,731 |
| P16 | Anomic | 92.8 | 10.0 | F | 13 | 33 | 97 | n/a |
| P17 | Anomic | 95.4 | 9.6 | F | 12 | 52 | 22 | 26,504 |
| P18 | Broca's | 50.3 | 9.4 | F | 16 | 70 | 118 | 109,181 |
| P19 | Conduction | 73.8 | 8.8 | M | 12 | 64 | 15 | n/a |
| P20 | Anomic | 93.9 | 9.4 | F | 12 | 42 | 84 | 27,840 |
| P21 | Anomic | 89.2 | 9.4 | F | 13 | 70 | 23 | n/a |
| P22 | Anomic | 90.1 | 8.6 | F | 14 | 67 | 10 | n/a |
| P23 | Anomic | 93.9 | 9.9 | M | 12 | 66 | 120 | 64,284 |
| P24 | Anomic | 87.4 | 9.5 | F | 16 | 41 | 53 | 181,199 |
| P25 | Broca's | 33.2 | 6.3 | M | 13 | 40 | 96 | 222,352 |
| P26 | Broca's | 32.4 | 7.9 | M | 12 | 84 | 170 | 145,170 |
| P27 | Anomic | 92.3 | 9.9 | M | 16 | 67 | 245 | 99,980 |
| P28 | Anomic | 88.5 | 9.1 | F | 16 | 60 | 232 | 124,678 |
| P29 | Anomic | 92.4 | 8.8 | F | 12 | 83 | 135 | 18,528 |
| P30 | Broca's | 31.9 | 7.8 | M | 16 | 59 | 19 | n/a |
| P31 | Broca's | 68.6 | 8.3 | F | 11 | 70 | 48 | 56,156 |
| P32 | Anomic | 91.2 | 9.3 | M | 19 | 75 | 67 | 32,003 |
| P33 | Anomic | 88.0 | 9.2 | F | 13 | 55 | 134 | 136,576 |
| P34 | Broca's | 61.6 | 6.6 | F | 19 | 73 | 378 | 231,141 |
| P35 | Anomic | 89.5 | 9.3 | F | 13 | 56 | 104 | 48,459 |
| P36 | Anomic | 94.6 | 10.0 | M | 19 | 69 | 96 | n/a |
| | 24 Anomic | M=77.8 | M=8.9 | 17 F | M=14.3 | M=61.6 | M=106.6 | M=98,783 |
| | 9 Broca's | SD=20.0 | SD=1.0 | | SD=2.6 | SD=11.5 | SD=78.4 | SD=61,483 |
| | 2 Conduction, 1 Transcortical motor | R=31.9-95.4 | R=6.3-10 | | R=11-19 | R=33-84 | R=10-378 | R=11,961-231,141 |
Abbreviations: WAB AQ = Western Aphasia Battery Aphasia Quotient; WAB Compr. = Western Aphasia Battery Auditory Comprehension Score; Educat. = Education; M = mean; SD = standard deviation; R = range; n/a = not applicable.
Acknowledgments
We would like to thank Erica Middleton for sharing picture stimuli with us; Rachel Metzgar for helping with recruitment and testing; Frank Garcea for his expertise in lesion-symptom mapping analysis; H. Branch Coslett for help with lesion segmentation; Linda Drijvers for sharing the Praat script; and all participants who took part in our study.
This research was supported by Peer Review Committee funding (PRC FY19-2) awarded to LB and GV. The work was further supported by a European Research Council Advanced Grant (ECOLANG, 743035). While working on this project, GV was supported by a Royal Society Wolfson Research Merit Award (WRM\R3\170016). This research was also supported by the UCL Bogue Research Fellowship awarded to AK.
Footnotes
Author Information
Authors declare no conflict of interest.
CRediT Author Statement
AK: Conceptualization, methodology, validation, formal analysis, investigation, writing original draft, visualization, funding acquisition; GV: Conceptualization, methodology, resources, writing - review & editing, supervision, project administration, funding acquisition; MM: Methodology, writing - review & editing; HS: Software, validation, investigation; RV: Methodology, writing - review & editing, supervision; LB: Conceptualization, methodology, resources, writing - review & editing, supervision, project administration, funding acquisition.
We tested an accuracy model including all predictors of interest (see Data Analysis section) but excluding the 3 participants who did not pass the audiometry screening. The results are consistent with the results from the accuracy model with the full sample, suggesting that the hearing factor did not influence our findings. All results are presented in the Supplementary Materials for comparison.
Reaction time data were unreliable due to a number of responses prior to the response window, i.e., while videos played.
References
- Amedi A, von Kriegstein K, van Atteveldt NM, Beauchamp MS, & Naumer MJ (2005). Functional imaging of human crossmodal identification and object recognition. Experimental Brain Research, 166(3–4), 559–571. 10.1007/s00221-005-2396-5
- Andersen TS, & Starrfelt R (2015). Audiovisual integration of speech in a patient with Broca’s Aphasia. Frontiers in Psychology, 6. 10.3389/fpsyg.2015.00435
- Arnold P, & Hill F (2001). Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact. British Journal of Psychology (London, England: 1953), 92 Part 2, 339–355.
- Avants BB, Epstein CL, Grossman M, & Gee JC (2008). Symmetric Diffeomorphic Image Registration with Cross-Correlation: Evaluating Automated Labeling of Elderly and Neurodegenerative Brain. Medical Image Analysis, 12(1), 26–41. 10.1016/j.media.2007.06.004
- Baldo JV, Katseff S, & Dronkers NF (2012). Brain regions underlying repetition and auditory-verbal short-term memory deficits in aphasia: Evidence from voxel-based lesion symptom mapping. Aphasiology, 26(3–4), 338–354. 10.1080/02687038.2011.602391
- Barr DJ, Levy R, Scheepers C, & Tily HJ (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. 10.1016/j.jml.2012.11.001
- Bates D, Mächler M, Bolker B, & Walker S (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1–48. 10.18637/jss.v067.i01
- Baum SH, Martin RC, Hamilton AC, & Beauchamp MS (2012). Multisensory speech perception without the left superior temporal sulcus. NeuroImage, 62(3), 1825–1832. 10.1016/j.neuroimage.2012.05.034
- Beauchamp MS (2005). See me, hear me, touch me: Multisensory integration in lateral occipital-temporal cortex. Current Opinion in Neurobiology, 15(2), 145–153. 10.1016/j.conb.2005.03.011
- Beauchamp MS, Lee KE, Argall BD, & Martin A (2004). Integration of Auditory and Visual Information about Objects in Superior Temporal Sulcus. Neuron, 41(5), 809–823. 10.1016/S0896-6273(04)00070-4
- Bennett CM, Wolford GL, & Miller MB (2009). The principled control of false positives in neuroimaging. Social Cognitive and Affective Neuroscience, 4(4), 417–422. 10.1093/scan/nsp053
- Bernstein LE, & Liebenthal E (2014). Neural pathways for visual speech perception. Frontiers in Neuroscience, 8, 386. 10.3389/fnins.2014.00386
- Brown VA, Hedayati M, Zanger A, Mayn S, Ray L, Dillman-Hasso N, & Strand JF (2018). What accounts for individual differences in susceptibility to the McGurk effect? PloS One, 13(11), e0207160. 10.1371/journal.pone.0207160
- Brownsett SLE, Warren JE, Geranmayeh F, Woodhead Z, Leech R, & Wise RJS (2014). Cognitive control and its impact on recovery from aphasic stroke. Brain, 137(1), 242–254. 10.1093/brain/awt289
- Brysbaert M, & New B (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990. 10.3758/BRM.41.4.977
- Brysbaert M, Warriner AB, & Kuperman V (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46(3), 904–911. 10.3758/s13428-013-0403-5
- Callan DE, Jones JA, Munhall K, Callan AM, Kroos C, & Vatikiotis-Bateson E (2003). Neural processes underlying perceptual enhancement by visual speech gestures. NeuroReport, 14(17), 2213–2218. 10.1097/00001756-200312020-00016
- Calvert GA, Bullmore ET, Brammer MJ, Campbell R, Williams SC, McGuire PK, Woodruff PW, Iversen SD, & David AS (1997). Activation of auditory cortex during silent lipreading. Science (New York, N.Y.), 276(5312), 593–596.
- Calvert GA, & Campbell R (2003). Reading Speech from Still and Moving Faces: The Neural Substrates of Visible Speech. Journal of Cognitive Neuroscience, 15(1), 57–70. 10.1162/089892903321107828
- Calvert GA, Campbell R, & Brammer MJ (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology: CB, 10(11), 649–657.
- Calvert GA, Hansen PC, Iversen SD, & Brammer MJ (2001). Detection of Audio-Visual Integration Sites in Humans by Application of Electrophysiological Criteria to the BOLD Effect. NeuroImage, 14(2), 427–438. 10.1006/nimg.2001.0812
- Campbell R. (2008). The processing of audio-visual speech: Empirical and neural bases. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1493), 1001–1010. 10.1098/rstb.2007.2155
- Campbell R, Garwood J, Franklin S, Howard D, Landis T, & Regard M (1990). Neuropsychological studies of auditory-visual fusion illusions. Four case studies and their implications. Neuropsychologia, 28(8), 787–802.
- Chandrasekaran C, Trubanova A, Stillittano S, Caplier A, & Ghazanfar AA (2009). The Natural Statistics of Audiovisual Speech. PLoS Computational Biology, 5(7), e1000436. 10.1371/journal.pcbi.1000436
- Crosse MJ, Butler JS, & Lalor EC (2015). Congruent Visual Speech Enhances Cortical Entrainment to Continuous Auditory Speech in Noise-Free Conditions. Journal of Neuroscience, 35(42), 14195–14204. 10.1523/JNEUROSCI.1829-15.2015
- DeMarco AT, & Turkeltaub PE (2018). A multivariate lesion symptom mapping toolbox and examination of lesion-volume biases and correction methods in lesion-symptom mapping. Human Brain Mapping, 39(11), 4169–4182. 10.1002/hbm.24289
- Dick AS, Bernal B, & Tremblay P (2014). The Language Connectome: New Pathways, New Concepts. The Neuroscientist, 20(5), 453–467. 10.1177/1073858413513502
- Dick AS, Solodkin A, & Small SL (2010). Neural development of networks for audiovisual speech comprehension. Brain and Language, 114(2), 101–114. 10.1016/j.bandl.2009.08.005
- Drijvers L, & Özyürek A (2017). Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension. Journal of Speech, Language, and Hearing Research, 60(1), 212–222. 10.1044/2016_JSLHR-H-16-0101
- Druks J, & Masterson J (2000). An Object and Action Naming Battery. Psychology Press.
- Erickson LC, Heeg E, Rauschecker JP, & Turkeltaub PE (2014). An ALE meta-analysis on the audiovisual integration of speech signals. Human Brain Mapping, 35(11), 5587–5605. 10.1002/hbm.22572
- Folstein MF, Folstein SE, & McHugh PR (1975). ‘Mini-mental state’. A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189–198.
- Garcea FE, Stoll H, & Buxbaum LJ (2019). Reduced competition between tool action neighbors in left hemisphere stroke. Cortex, 120, 269–283. 10.1016/j.cortex.2019.05.021
- Glasser MF, & Rilling JK (2008). DTI Tractography of the Human Brain’s Language Pathways. Cerebral Cortex, 18(11), 2471–2482. 10.1093/cercor/bhn011
- Hebart MN, Dickter AH, Kidder A, Kwok WY, Corriveau A, Wicklin CV, & Baker CI (2019). THINGS: A database of 1,854 object concepts and more than 26,000 naturalistic object images. PLOS ONE, 14(10), e0223792. 10.1371/journal.pone.0223792
- Hein G, & Knight RT (2021). Superior Temporal Sulcus—It’s My Area: Or Is It? 20(12), 12. [DOI] [PubMed] [Google Scholar]
- Herbet G, Lafargue G, & Duffau H (2015). Rethinking voxel-wise lesion-deficit analysis: A new challenge for computational neuropsychology. Cortex, 64, 413–416. 10.1016/j.cortex.2014.10.021
- Hertrich I, Dietrich S, & Ackermann H (2011). Cross-modal Interactions during Perception of Audiovisual Speech and Nonspeech Signals: An fMRI Study. Journal of Cognitive Neuroscience, 23(1), 221–237. 10.1162/jocn.2010.21421
- Hessler D, Jonkers R, & Bastiaanse R (2012). Processing of audiovisual stimuli in aphasic and non-brain-damaged listeners. Aphasiology, 26(1), 83–102. 10.1080/02687038.2011.608840
- Hickok G, & Poeppel D (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8(5), 393–402. 10.1038/nrn2113
- Hickok G, Rogalsky C, Matchin W, Basilakos A, Cai J, Pillay S, Ferrill M, Mickelsen S, Anderson SW, Love T, Binder J, & Fridriksson J (2018). Neural networks supporting audiovisual integration for speech: A large-scale lesion study. Cortex, 103, 360–371. 10.1016/j.cortex.2018.03.030
- Ivanova MV, Herron TJ, Dronkers NF, & Baldo JV (2021). An empirical comparison of univariate versus multivariate methods for the analysis of brain–behavior mapping. Human Brain Mapping, 42(4), 1070–1101. 10.1002/hbm.25278
- Jaeger TF (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446. 10.1016/j.jml.2007.11.007
- Jones JA, & Callan DE (2003). Brain activity during audiovisual speech perception: An fMRI study of the McGurk effect. NeuroReport: For Rapid Communication of Neuroscience Research, 14(8), 1129–1133. 10.1097/00001756-200306110-00006
- Kertesz A, Kertesz A, Raven JC, & PsychCorp (Firm). (2007). WAB-R: Western Aphasia Battery-Revised. PsychCorp.
- Kittredge A, Davis L, & Blumstein SE (2006). Effects of nonlinguistic auditory variations on lexical processing in Broca’s aphasics. Brain and Language, 97(1), 25–40. 10.1016/j.bandl.2005.07.012
- Krason A, Fenton R, Varley R, & Vigliocco G (2021). The role of iconic gestures and mouth movements in face-to-face communication. Psychonomic Bulletin & Review. 10.3758/s13423-021-02009-5
- Kuperman V, Stadthagen-Gonzalez H, & Brysbaert M (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978–990. 10.3758/s13428-012-0210-4 [DOI] [PubMed] [Google Scholar]
- Kuznetsova A, Brockhoff PB, & Christensen RHB (2017). lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software, 82(13). 10.18637/jss.v082.i13 [DOI] [Google Scholar]
- Lacey EH, Skipper-Kallal L, Xing S, Fama M, & Turkeltaub P (2017). Mapping common aphasia assessments to underlying cognitive processes and their neural substrates. Neurorehabilitation and Neural Repair, 31(5), 442–450. 10.1177/1545968316688797 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee H, & Noppeney U (2011). Physical and Perceptual Factors Shape the Neural Mechanisms That Integrate Audiovisual Signals in Speech Comprehension. Journal of Neuroscience, 31(31), 11338–11350. 10.1523/JNEUROSCI.6510-10.2011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Luce PA, & Pisoni DB (1998). Recognizing Spoken Words: The Neighborhood Activation Model. Ear and Hearing, 19(1), 1–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Luo H, Liu Z, & Poeppel D (2010). Auditory Cortex Tracks Both Auditory and Visual Stimulus Dynamics Using Low-Frequency Neuronal Phase Modulation. PLOS Biology, 8(8), e1000445. 10.1371/journal.pbio.1000445 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ma WJ, Zhou X, Ross LA, Foxe JJ, & Parra LC (2009). Lip-Reading Aids Word Recognition Most in Moderate Noise: A Bayesian Explanation Using High-Dimensional Feature Space. PLoS ONE, 4(3), e4638. 10.1371/journal.pone.0004638 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mah Y-H, Husain M, Rees G, & Nachev P (2014). Human brain lesion-deficit inference remapped. Brain, 137(9), 2522–2531. 10.1093/brain/awu164 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Massaro DW, & Jesse A (2007). Audiovisual speech perception and word recognition. Oxford University Press. 10.1093/oxfordhb/9780198568971.013.0002 [DOI] [Google Scholar]
- McGurk H, & MacDonald J (1976). Hearing lips and seeing voices. Nature, 264, 746–748. 10.1038/264746a0 [DOI] [PubMed] [Google Scholar]
- Michaelis K. (2020). Effects of age and left hemisphere lesions on audiovisual integration of speech. Brain and Language, 12.
- Mori S, Oishi K, Jiang H, Jiang L, Li X, Akhter K, Hua K, Faria AV, Mahmood A, Woods R, Toga AW, Pike GB, Neto PR, Evans A, Zhang J, Huang H, Miller MI, van Zijl P, & Mazziotta J (2008). Stereotaxic white matter atlas based on diffusion tensor imaging in an ICBM template. NeuroImage, 40(2), 570–582. 10.1016/j.neuroimage.2007.12.035
- Möttönen R, Järveläinen J, Sams M, & Hari R (2005). Viewing speech modulates activity in the left SI mouth cortex. NeuroImage, 24(3), 731–737. 10.1016/j.neuroimage.2004.10.011
- Nath AR, & Beauchamp MS (2012). A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion. NeuroImage, 59(1), 781–787. 10.1016/j.neuroimage.2011.07.024
- Okada K, & Hickok G (2009). Two cortical mechanisms support the integration of visual and auditory speech: A hypothesis and preliminary data. Neuroscience Letters, 452(3), 219–223. 10.1016/j.neulet.2009.01.060
- Okada K, Venezia JH, Matchin W, Saberi K, & Hickok G (2013). An fMRI Study of Audiovisual Speech Perception Reveals Multisensory Interactions in Auditory Cortex. PLoS ONE, 8(6), e68959. 10.1371/journal.pone.0068959
- Ozker M, Schepers IM, Magnotti JF, Yoshor D, & Beauchamp MS (2017). A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography. Journal of Cognitive Neuroscience, 29(6), 1044–1060. 10.1162/jocn_a_01110
- Paulesu E, Perani D, Blasi V, Silani G, Borghese NA, De Giovanni U, Sensolo S, & Fazio F (2003). A Functional-Anatomical Model for Lipreading. Journal of Neurophysiology, 90(3), 2005–2013. 10.1152/jn.00926.2002
- Peelle JE (n.d.). The neural basis for auditory and audiovisual speech perception. 25.
- Peelle JE, & Sommers MS (2015). Prediction and constraint in audiovisual speech perception. Cortex, 68, 169–181. 10.1016/j.cortex.2015.03.006
- Pekkola J, Ojanen V, Autti T, Jääskeläinen IP, Möttönen R, Tarkiainen A, & Sams M (2005). Primary auditory cortex activation by visual speech: An fMRI study at 3 T. Neuroreport, 16(2), 125–128.
- Powell MJD (n.d.). The BOBYQA algorithm for bound constrained optimization without derivatives. 39.
- Pustina D, Avants B, Faseyitan OK, Medaglia JD, & Coslett HB (2018). Improved accuracy of lesion to symptom mapping with multivariate sparse canonical correlations. Neuropsychologia, 115, 154–166. 10.1016/j.neuropsychologia.2017.08.027
- Rae CL, Polyanska L, Gould van Praag CD, Parkinson J, Bouyagoub S, Nagai Y, Seth AK, Harrison NA, Garfinkel SN, & Critchley HD (2018). Face perception enhances insula and motor network reactivity in Tourette syndrome. Brain, 141(11), 3249–3261. 10.1093/brain/awy254
- Rauschecker JP, & Scott SK (2009). Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nature Neuroscience, 12(6), 718–724. 10.1038/nn.2331
- Reisberg D, McLean J, & Goldfield A (1987). Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli. In Hearing by eye: The psychology of lip-reading (pp. 97–113). Lawrence Erlbaum Associates, Inc.
- Rorden C, & Brett M (2000). Stereotaxic display of brain lesions. Behavioural Neurology, 12(4), 191–200. 10.1155/2000/421719
- Ross LA, Saint-Amour D, Leavitt VM, Javitt DC, & Foxe JJ (2007). Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex (New York, N.Y.: 1991), 17(5), 1147–1153. 10.1093/cercor/bhl024
- Schepers IM, Yoshor D, & Beauchamp MS (n.d.). Electrocorticography Reveals Enhanced Visual Cortex Responses to Visual Speech. 8.
- Schnur TT, Schwartz MF, Kimberg DY, Hirshorn E, Coslett HB, & Thompson-Schill SL (2009). Localizing interference during naming: Convergent neuroimaging and neuropsychological evidence for the function of Broca’s area. Proceedings of the National Academy of Sciences, 106(1), 322–327. 10.1073/pnas.0805874106
- Schwartz J-L, Berthommier F, & Savariaux C (2004). Seeing to hear better: Evidence for early audio-visual interactions in speech identification. Cognition, 93(2), B69–B78. 10.1016/j.cognition.2004.01.006
- Schwartz MF, Brecher AR, Whyte J, & Klein MG (2005). A Patient Registry for Cognitive Rehabilitation Research: A Strategy for Balancing Patients’ Privacy Rights With Researchers’ Need for Access. Archives of Physical Medicine and Rehabilitation, 86(9), 1807–1814. 10.1016/j.apmr.2005.03.009
- Schwartz MF, Faseyitan O, Kim J, & Coslett HB (2012). The dorsal stream contribution to phonological retrieval in object naming. Brain, 135(12), 3799–3814. 10.1093/brain/aws300
- Sekiyama K, Kanno I, Miura S, & Sugita Y (2003). Auditory-visual speech perception examined by fMRI and PET. Neuroscience Research, 47(3), 277–287. 10.1016/S0168-0102(03)00214-1
- Shannon RV, Zeng FG, Kamath V, Wygonski J, & Ekelid M (1995). Speech recognition with primarily temporal cues. Science (New York, N.Y.), 270(5234), 303–304.
- Skipper JI, Devlin JT, & Lametti DR (2017). The hearing ear is always found close to the speaking tongue: Review of the role of the motor system in speech perception. Brain and Language, 164, 77–105. 10.1016/j.bandl.2016.10.004
- Skipper JI, Nusbaum HC, & Small SL (2005). Listening to talking faces: Motor cortical activation during speech perception. NeuroImage, 25(1), 76–89. 10.1016/j.neuroimage.2004.11.006
- Skipper JI, van Wassenhove V, Nusbaum HC, & Small SL (2007). Hearing Lips and Seeing Voices: How Cortical Areas Supporting Speech Production Mediate Audiovisual Speech Perception. Cerebral Cortex, 17(10), 2387–2399. 10.1093/cercor/bhl147
- Snodgrass JG, & Vanderwart M (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6(2), 174–215. 10.1037/0278-7393.6.2.174
- Stadthagen-Gonzalez H, Damian MF, Pérez MA, Bowers JS, & Marín J (2009). Name–picture verification as a control measure for object naming: A task analysis and norms for a large set of pictures. Quarterly Journal of Experimental Psychology, 62(8), 1581–1597. 10.1080/17470210802511139
- Stanislaw H, & Todorov N (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31(1), 137–149. 10.3758/BF03207704
- Sumby WH, & Pollack I (1954). Visual Contribution to Speech Intelligibility in Noise. The Journal of the Acoustical Society of America, 26(2), 212–215. 10.1121/1.1907309
- Tye-Murray N, Sommers M, & Spehar B (2007). Auditory and Visual Lexical Neighborhoods in Audiovisual Speech Perception. Trends in Amplification, 11(4), 233–241. 10.1177/1084713807307409
- Van Engen KJ, Xie Z, & Chandrasekaran B (2017). Audiovisual sentence recognition not predicted by susceptibility to the McGurk effect. Attention, Perception, & Psychophysics, 79(2), 396–403. 10.3758/s13414-016-1238-9
- Venezia JH, Fillmore P, Matchin W, Lisette Isenberg A, Hickok G, & Fridriksson J (2016). Perception drives production across sensory modalities: A network for sensorimotor integration of visual speech. NeuroImage, 126, 196–207. 10.1016/j.neuroimage.2015.11.038
- Venezia JH, Vaden KI, Rong F, Maddox D, Saberi K, & Hickok G (2017). Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus. Frontiers in Human Neuroscience, 11. 10.3389/fnhum.2017.00174
- Vigliocco G, Krason A, Stoll H, Monti A, & Buxbaum L (2020). Multimodal Comprehension in Left Hemisphere Stroke Patients [Preprint]. PsyArXiv. 10.31234/osf.io/umgk3
- Watkins KE, Strafella AP, & Paus T (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 41(8), 989–994.
- Wickham H. (2009). ggplot2: Elegant graphics for data analysis / Hadley Wickham. Dordrecht.
- Wright TM, Pelphrey KA, Allison T, McKeown MJ, & McCarthy G (2003). Polysensory Interactions along Lateral Temporal Regions Evoked by Audiovisual Speech. Cerebral Cortex, 13(10), 1034–1043. 10.1093/cercor/13.10.1034
- Youse KM, Cienkowski KM, & Coelho CA (2004). Auditory-visual speech perception in an adult with aphasia. Brain Injury, 18(8), 825–834. 10.1080/02699000410001671784
- Zhang Y, Kimberg DY, Coslett HB, Schwartz MF, & Wang Z (2014). Multivariate lesion-symptom mapping using support vector regression. Human Brain Mapping, 35(12), 5861–5876. 10.1002/hbm.22590