Abstract
Listeners can use prior knowledge to predict the content of noisy speech signals, enhancing perception. However, this process can also elicit misperceptions. For the first time, we employed a prime–probe paradigm and transcranial magnetic stimulation to investigate causal roles for the left and right posterior superior temporal gyri (pSTG) in the perception and misperception of degraded speech. Listeners were presented with spectrotemporally degraded probe sentences preceded by a clear prime. To produce misperceptions, we created partially mismatched pseudo-sentence probes via homophonic nonword transformations (e.g. The little girl was excited to lose her first tooth—Tha fittle girmn wam expited du roos har derst cooth). Compared to a control site (vertex), inhibitory stimulation of the left pSTG selectively disrupted priming of real but not pseudo-sentences. Conversely, inhibitory stimulation of the right pSTG enhanced priming of misperceptions with pseudo-sentences, but did not influence perception of real sentences. These results indicate qualitatively different causal roles for the left and right pSTG in perceiving degraded speech, supporting bilateral models that propose engagement of the right pSTG in sublexical processing.
Keywords: transcranial magnetic stimulation, right hemisphere, speech perception, sublexical processing, degraded speech
Introduction
Speech is a continuous auditory signal that usually occurs in a noisy environment and varies between speakers, yet speech comprehension typically remains robust. It is well established that in order to decode degraded or distorted speech input, listeners utilize knowledge of their native language, yet specific decoding mechanisms remain a matter of debate. Mattys and colleagues (Mattys et al. 2005) proposed three categories of cues that contribute to speech segmentation: lexical (derived from auditory word form knowledge), segmental (acoustic-phonetics and phonotactics) and suprasegmental metrical prosody (stress). When all cues are available, preference is given to whole word form information. However, if there is poor lexical information in the input (e.g. speech is degraded), listeners tend to rely on sublexical segmental cues (for a review, see Mattys et al. 2012). Further, listeners use prior expectations (predictions) about upcoming speech input to aid perception in challenging listening conditions (Davis and Sohoglu 2020). The neurobiological mechanisms supporting perception and misperception of speech are the focus of the present study.
The auditory cortex in the superior temporal plane of humans is organized in terms of three major subdivisions comprising a primary tonotopic “core” in Heschl’s gyrus surrounded by secondary “belt” and tertiary “parabelt” regions (Rauschecker and Scott 2009). Early processing of intelligible speech sounds occurs in the anterior belt and parabelt regions along the lateral superior temporal gyrus and/or sulcus (STG/STS), with mid to posterior parabelt regions (including the planum temporale) important for speech comprehension (Hickok and Poeppel 2007). Whether speech perception relies on lateralized brain mechanisms remains a topic of debate. There is evidence for lateralization of auditory belt cortex with respect to the processing of acoustic information in the STG/STS, with temporal and spectral cues preferentially processed in the left and right hemispheres, respectively (e.g. Zatorre and Belin 2001). Although statistically robust, the effect sizes are relatively small. Most models of speech perception assume that processing of these prelexical spectrotemporal features occurs in parallel. However, some accounts propose left-hemisphere specialization for subsequent sublexical (segmental-level) and lexical–semantic processing stages (e.g. Rauschecker and Scott 2009), while others propose bilateral engagement (Hickok and Poeppel 2000, 2004, 2007). The latter also assume an essential role for the right hemisphere specifically in sublexical (segmental-level) processing (for a review, see Hickok 2012).
Neuroimaging studies have provided mixed evidence for bilateral involvement of the pSTG/pSTS during speech perception. Some studies reported strong left hemisphere dominance for processing of naturalistic speech contrasted with acoustic control conditions (Evans et al. 2016; McGettigan et al. 2012; Narain et al. 2003), while others observed bilateral involvement during processing of intelligible speech (Okada et al. 2010; Rosen et al. 2011; Vaden et al. 2010). Lesion–symptom mapping (LSM) studies have provided some evidence for bilateral engagement during speech perception. Studies in people with post-stroke aphasia with lesions in the left temporal cortex have reported deficits in sublexical (segmental level) as well as lexical processing (Durfee et al. 2021; Ghaleh et al. 2018), although segmental-level impairments in phoneme discrimination tend to largely resolve after the acute stage (e.g. Kim et al. 2019). Deficits in segmental level processing have also been observed with right temporal lobe lesions (Gajardo-Vidal et al. 2018; Rogalsky et al. 2022). In addition, disrupting the function of each hemisphere with the Wada procedure does not result in significant impairments in phoneme perception, indicating that the right hemisphere likely plays a role (e.g. Hickok et al. 2008). The inconsistent findings have been attributed to various paradigmatic differences and the correlational nature of imaging studies (for a review, see Evans and McGettigan 2017; Turkeltaub and Coslett 2010).
Neuromodulation studies employing transcranial magnetic stimulation (TMS) have also explored causal relationships between bilateral temporal lobe areas and various aspects of auditory word form processing in speech perception (Alba-Ferrara et al. 2012; Bestelmeyer et al. 2011; Krieger-Redwood et al. 2013; Luthra et al. 2023), as well as the involvement of homologous regions elsewhere in the brain (Andoh and Paus 2011; Hartwigsen et al. 2010). However, to our knowledge, only two have investigated hemispheric differences in segmental-level processing, with conflicting results: Kennedy-Higgins et al. (2020) demonstrated that online TMS applied to both left and right anterior STG/STS equally disrupted speech perception in noise, whereas Nunez et al. (2020) did not observe an effect of inhibitory TMS applied to either the left posterior or right anterior STG on nonword discrimination. Consequently, further work is needed to investigate causal relationships between bilateral posterior STG/STS regions and segmental-level processing mechanisms during speech perception.
One paradigm used to successfully investigate lexical and sublexical speech perception mechanisms is grounded in the “pop-out” phenomenon that occurs when a previously unintelligible speech stimulus suddenly becomes understandable (also referred to as the “Eureka effect”; Ahissar and Hochstein 2004). The paradigm involves presenting clear speech primes before distorted/degraded probe stimuli, resulting in an “immediate” subjective understanding of the latter when the content is matched (Davis et al. 2005). The priming effect has been interpreted as indicating that listeners are able to use previously acquired segmental-level information to predict upcoming speech when the latter signal is degraded (Blank et al. 2018; Davis et al. 2005; Hervais-Adelman et al. 2008; Sohoglu et al. 2014). This research has shown that misperceptions (i.e. prediction errors) of monosyllabic words can also occur when there is near-identical phonological/phonetic overlap between prime and probe in cohort or rhyme (e.g. pie vs. tie, lisp vs. list) positions (Blank et al. 2018; Sohoglu et al. 2014). Much of this evidence comes from studies employing noise-vocoded speech signals that were spectrally degraded while preserving low-frequency temporal information, permitting relatively intact phoneme recognition (e.g. Shannon et al. 1995). However, some recent studies using the prime–probe paradigm have reported similar perceptual enhancements with matching spectrotemporally degraded sentence stimuli (Holdgraf et al. 2016).
The neural mechanisms responsible for the perceptual enhancement remain underspecified. In a functional magnetic resonance imaging (fMRI) study by Blank et al. (2018), noise-vocoded probe monosyllabic words were preceded by a written prime that was either matched, mismatched or partially mismatched, with participants instructed to make same/different judgments. In addition to the expected priming effect with matching stimuli, the partially mismatched pairs produced misperceptions (prediction errors) in listeners. Reduced activation in the left pSTS was associated with both perception and misperception of degraded speech (i.e. “same” decisions for matched or partially mismatched prime–probe pairs). However, trials on which mismatches were detected (i.e. “different” decisions) were associated with increased activity in the same region. The authors interpreted these findings as indicating that neural representations in the pSTS signal prediction error, consistent with predictive coding accounts (e.g. Blank and Davis 2016; Cope et al. 2023; Davis and Sohoglu 2020; Sohoglu and Davis 2020). However, the vocoding procedure employed by Blank et al. selectively removed spectral cues while preserving low-frequency temporal information, which may have contributed to the left-lateralized activity observed (e.g. Zatorre and Belin 2001). In addition, as Vitevitch (2003) showed, same/different judgments with similar words in speech perception tasks primarily rely on lexical (word form–level) mechanisms. Hence, it is possible that the misperceptions occurred due to a phonological code associated with a whole word form rather than a sublexical (segmental-level) one. More recently, Al-Zubaidi et al. (2022) investigated sublexical processing in an fMRI study by employing spectrotemporally degraded disyllabic nonword probes that were either matched or mismatched with nonword primes, with participants required to verbally report the probe. They observed increased activation bilaterally in the pSTG/STS for the matched condition.
Compared to single-word and nonword stimuli, speech perception of more naturalistic sentence stimuli places greater demands on sublexical processing via segmentation of the temporal order of phonemes and corresponding phonotactic probabilities, i.e. the combination of legal phonological segments and sequences of segments to form words according to a specific language (Vitevitch and Luce 2004). We leveraged TMS and the prime–probe paradigm to investigate causal roles for the right and left pSTG in producing pop-out effects in speech perception and misperception. Specifically, we employed spectrotemporally degraded matched and mismatched sentences and partially mismatched pseudo-sentence probes created via homophonic nonword transformations. We hypothesized that inhibitory TMS applied to the left pSTG would impair priming of both real and pseudo-sentence probes compared to stimulation of a control region (Vertex), consistent with predictive coding accounts. We also hypothesized that TMS applied to the right pSTG would impair the priming effect with both real and pseudo-sentence probes, due to its proposed role in sublexical processing during speech perception.
Materials and methods
Participants
To investigate the effect of inhibitory TMS on the priming effect in speech perception, 21 healthy, right-handed native English speakers were recruited (16 female, mean age = 23, SD = 5.68 yr, range = 18–40). One participant was excluded due to incidental findings on their structural magnetic resonance imaging (MRI), and two participants withdrew. This resulted in a sample size of 18 (14 female, mean age = 22.60, SD = 5.65 yr, range = 18 to 40). We did not conduct a formal power analysis to determine the sample size. Rather, we based this sample size on previous TMS studies of speech perception in noise (e.g. 16 participants; Kennedy-Higgins et al. 2020) and phoneme discrimination (e.g. 14 participants; Krieger-Redwood et al. 2013) that targeted the pSTG/STS. We did not attempt to recruit equal numbers of male and female participants as studies of speech perception in noise (e.g. Talarico et al. 2007) and phoneme discrimination (e.g. Criel et al. 2023) have not revealed significant gender differences in performance. Participants were offered credit toward an undergraduate psychology course (if applicable) and AUD$30 after the MRI session plus an additional $30 at the completion of all three TMS sessions. All participants had normal or corrected-to-normal vision and an absence of any history of neurological or psychiatric disorders, hearing deficits, and contraindications to TMS and MRI. Consent was provided by all participants in accordance with the protocol approved by the Queensland University of Technology Human Research Ethics Committee and the Royal Brisbane Women’s Hospital Ethics Committee.
Design
This study compared the effect of active offline inhibitory TMS over the left pSTG, right pSTG, and Vertex. It entailed a one-way (3 × 1) within-subjects design with the single factor of rTMS target region (left pSTG, right pSTG, and Vertex acting as a control site), with active stimulation delivered at each site. Participants underwent three sessions, each involving inhibitory stimulation of one target region, with the order of regions counterbalanced across participants. Stimulation was applied in one block of active stimulation within each session. Stimulation sessions were conducted at least 1 wk apart to prevent any carryover effects of the rTMS protocol or practice effects with the stimuli. Participants reported any adverse effects (e.g. headache, tingling, itching, skin redness) using the questionnaire from Brunoni et al. (2011), ranging from 1 (absent) to 4 (severe).
Materials
All materials and data are publicly available via the Open Science Framework at https://osf.io/vkdj7/. We selected a set of 60 high cloze-probability real English sentences from two normative databases (Block and Baldwin 2010; Peelle et al. 2020) that were between 9 and 16 syllables long and consisted of both monosyllabic and polysyllabic words (8 to 13 words long). Personal names in the original sentences were changed to matching neutral nouns (men, women, etc.) or pronouns (she, he, they). The sentences were recorded by a female native speaker of Australian English in a soundproof booth using a Blue Yeti 3-Capsule USB microphone. The length of the recorded sentences varied between 2 and 4 s. All stimuli were normalized and edited for noise reduction in Audacity Version 2.4.2 (DC offset was removed, and the peak amplitude was normalized to −1.0 dB, which is just below the maximum amplitude [0 dB] possible without clipping).
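The peak-normalization step (peak amplitude at −1.0 dB below full scale) corresponds to a simple rescaling of the waveform. A minimal numpy sketch of the idea (the function name is ours; this is not the Audacity implementation):

```python
import numpy as np

def normalize_peak(x, target_db=-1.0):
    """Rescale a waveform so its absolute peak sits at target_db dBFS,
    where 0 dBFS is the clipping point (amplitude 1.0)."""
    peak = np.max(np.abs(x))
    target_amp = 10.0 ** (target_db / 20.0)  # -1.0 dB corresponds to ~0.891
    return x * (target_amp / peak)
```

Normalizing every stimulus to the same peak just below 0 dBFS equates maximum levels across recordings while guaranteeing no sample clips.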
To create pseudo-sentence probes, a subset of 30 real English sentences was selected from the original set of high cloze-probability sentences as a reference set. Generation of nonwords for the matching pseudo-sentences consisted of several steps. First, for each word in the real English sentence, the single-phone and biphone phonotactic probabilities were calculated using the Phonological Corpus Tools (PCT) (Hall et al. 2019). Next, each real word was entered into the pseudoword generator Wuggy (Keuleers and Brysbaert 2010) or checked in the ARC nonwords database (Rastle et al. 2002) to create a list of potential nonword candidates. Since the aim of this manipulation was to design phonotactically matched word–nonword pairs, a minimum two out of three subsyllabic segments of the word were set to be shared. All pseudoword candidates conformed to phonotactically legal constraints for English. Next, the phonotactic probabilities (single and biphone) were calculated using PCT software for each pseudoword candidate of the single word. The candidate with (i) the closest phonotactic probabilities and (ii) matching CVC structure was chosen as the best pseudoword pairing for the word in the real sentence. In total, 30 pseudo-sentences were created, and single-phone and biphone phonotactic probabilities were calculated on a word-by-word basis for each sentence pair. Single phone probabilities in pseudo-sentences (M = 0.049, SD = 0.017) were closely matched to the single phone probabilities in real sentences (M = 0.051, SD = 0.017) and did not differ according to both frequentist and Bayesian independent-samples t-tests: t(358) = 0.734, P = 0.463; BF10 = 0.151 (moderate evidence for H0). Similarly, biphone probabilities in pseudo-sentences (M = 0.005, SD = 0.004) were matched to the biphone probabilities in the real sentences (M = 0.006, SD = 0.005) and did not differ: t(358) = 1.143, P = 0.254; BF10 = 0.218 (moderate evidence for H0). 
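The frequentist side of this matching check can be reproduced with an independent-samples t-test. A sketch with simulated probabilities (the draws below are illustrative, not the study's data; 180 words per set yields the df = 358 reported above):

```python
import numpy as np
from scipy import stats

# Illustrative single-phone probabilities for two stimulus sets (180 words each)
rng = np.random.default_rng(2024)
real = rng.normal(0.051, 0.017, 180)    # real-word probabilities
pseudo = rng.normal(0.049, 0.017, 180)  # pseudoword probabilities

t, p = stats.ttest_ind(pseudo, real)
# A nonsignificant result (p > .05) supports treating the two sets as matched;
# the Bayes factors reported above require a separate package (e.g. JASP/Jamovi)
```
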
Pseudo-sentences were recorded by the same female native Australian English speaker, who tried to preserve the rhythm and intonation of the original real sentence similar to prior studies (e.g. by Davis et al. 2005). Postprocessing of the recorded audio samples was identical to the real sentences. The durations in seconds of the resulting audio samples of the pseudo-sentences (M = 3.07, SD = 0.423) were between 2 and 4 s and did not differ significantly to the durations of the matched real sentences (M = 2.91, SD = 0.404): t(58) = −1.46, P = 0.151; BF10 = 0.635 (anecdotal evidence for H0).
All degraded sentences for both real and pseudo-sentence probes were filtered using the protocol described by Elliott and Theunissen (2009). Specifically, we applied a low-pass filter of spectral modulations with a cutoff of 0.5 cycles/kHz and a low-pass filter of temporal modulations with a cutoff of 3 Hz in MATLAB (The MathWorks, Inc. 2019) via the code developed for the modulation transfer function (Elliott and Theunissen 2009), matching the approach used by Holdgraf et al. (2016). The result is a stimulus lacking substantial power in the modulation power spectrum (MPS) both spectrally and temporally: it sounds like speech yet is incomprehensible to the naïve listener, as the filtering drastically reduces phoneme recognition and removes acoustic prosodic features of the speech signal such as intonation and lexical tone.
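The core of this modulation filtering can be illustrated in Python. This is a simplified sketch, not the Elliott and Theunissen code (which operates on log spectrograms with soft cutoff ramps): the spectrogram magnitude is Fourier transformed along both axes, modulation components above the cutoffs are zeroed, and the signal is resynthesized with the original phase. The STFT window and hop parameters here are our own assumptions:

```python
import numpy as np
from scipy.signal import stft, istft

def degrade_speech(x, fs, spec_cut=0.5, temp_cut=3.0):
    """Low-pass filter the modulation power spectrum of a waveform.
    spec_cut: spectral modulation cutoff in cycles/kHz.
    temp_cut: temporal modulation cutoff in Hz."""
    f, t, Z = stft(x, fs, nperseg=512, noverlap=384)
    mag = np.abs(Z)
    # Modulation-frequency axes of the spectrogram magnitude
    wt = np.fft.fftfreq(t.size, d=t[1] - t[0])             # temporal modulations (Hz)
    wf = np.fft.fftfreq(f.size, d=(f[1] - f[0]) / 1000.0)  # spectral modulations (cyc/kHz)
    keep = (np.abs(wf)[:, None] <= spec_cut) & (np.abs(wt)[None, :] <= temp_cut)
    # Zero out fast modulations in the 2D Fourier domain of the spectrogram
    mag_lp = np.real(np.fft.ifft2(np.fft.fft2(mag) * keep))
    mag_lp = np.clip(mag_lp, 0.0, None)  # magnitudes cannot be negative
    # Resynthesize with the original phase
    _, y = istft(mag_lp * np.exp(1j * np.angle(Z)), fs, nperseg=512, noverlap=384)
    return y
```

Because only slow spectral (< 0.5 cyc/kHz) and temporal (< 3 Hz) modulations survive, the coarse energy envelope of speech is preserved while the fine structure supporting phoneme identification is removed.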
Degraded real and pseudo-sentence probes with their clear matched real sentence versions were used in congruent and incongruent conditions. For the congruent condition, 20 degraded real sentence probes were used with matched clear real sentence primes. Similarly, 20 degraded pseudo-sentence probes were used with matched clear real sentence primes. For the incongruent condition, 10 degraded real sentence probes with different clear real sentence primes, and 10 degraded pseudo-sentence probes with different clear real sentence primes were used. All degraded real and pseudo-sentence probes in both congruent and incongruent conditions were presented only once in the testing session. Sample trials for all four conditions can be found in Table 1.
Table 1.
Example of the different prime sentences with real and pseudo probe sentences in congruent and incongruent conditions.
| Condition | Prime sentence (clear) | Probe sentence (degraded) |
|---|---|---|
| Real Congruent | The teacher wrote the problem on the board. | The teacher wrote the problem on the board. |
| Real Incongruent | The hot shower filled the bathroom with steam. | For his date the man bought a long-stemmed rose. |
| Pseudo Congruent | The little girl was excited to lose her first tooth. | Tha fittle girmn wam expited du roos har derst cooth. |
| Pseudo Incongruent | She wore a colorful scarf around her neck. | Te junged imm tha lape ank maig e kig splach. |
Sixty sentence pairs in four conditions were randomized into 54 series using Mix Software (van Casteren and Davis 2006), with the stipulation that consecutive trials could not be a repetition of a condition more than twice in a row. Due to a transcription error, three participants were administered a list in a single session in which three consecutive trials were from one condition. The 54 sets were used to ensure that each participant was administered a pseudorandomized order of sentence pairs within and across sessions. In total, across all sessions, participants were administered three different pseudorandomized lists (one unique list per session) and heard each sentence pair only once within each session. Therefore, each pseudorandomized list comprised 60 trials per session.
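A sequencing constraint of this kind (no condition more than twice in a row) can be reproduced with simple rejection sampling; Mix itself offers far richer constraints. A hypothetical Python sketch:

```python
import random

def pseudorandomize(trials, max_run=2, seed=None):
    """Shuffle trials until no condition repeats more than max_run times
    in a row. Each trial is a (condition, item) tuple."""
    rng = random.Random(seed)
    order = trials[:]
    while True:
        rng.shuffle(order)
        run, ok = 1, True
        for a, b in zip(order, order[1:]):
            run = run + 1 if a[0] == b[0] else 1
            if run > max_run:
                ok = False
                break
        if ok:
            return order
```

With four conditions of 15 trials each, most random shuffles are rejected, but a valid order is typically found within a few dozen attempts, so rejection sampling is entirely practical at this list length.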
Apparatus
Presentation of auditory stimuli and response recording were accomplished on a desktop PC with 26″ display using the Cogent 2000 toolbox extension (v1.32; http://www.vislab.ucl.ac.uk/cogent_2000.php) for MATLAB Software (The MathWorks 2019). A noise-canceling microphone (Neewer NW-7000) was used to record participant responses in digital audio format. Reaction latencies were determined using Chronset (https://www.bcbl.eu/databases/chronset), an automated tool for detecting speech onset available for MATLAB Software. To verify the obtained speech onsets, a Praat script for checking Chronset speech onset times was used (Van Scherpenberg et al. 2020).
Transcranial magnetic stimulation
As a prerequisite for neuronavigation, T1-weighted structural images were acquired using a 3T MAGNETOM Prisma (Siemens Healthineers, Erlangen, Germany) equipped with a 64-channel receive-only phased-array head coil. The images were acquired using a magnetization-prepared rapid acquisition gradient-echo (MPRAGE) sequence [256 × 256 matrix, resolution (0.9 mm)³, flip angle 8°, TI 900 ms, TR 1,800 ms, TE 2.35 ms]. Neuronavigated rTMS was performed using the visor2 software (https://www.ant-neuro.com/products/visor2), aided by the Polaris Vicra camera. Montreal Neurological Institute (MNI) coordinates were defined for each targeted region. The coordinates for the left pSTG (MNI x, y, z = −59.56, −30.53, 7.08) were selected from the TMS study by Krieger-Redwood et al. (2013). The right pSTG coordinates (MNI x, y, z = 65, −27, 15) were selected from the TMS study by Bueti et al. (2008). The coordinates for the control site at the Vertex (MNI x, y, z = 0, 0, +90) were selected from Kennedy-Higgins et al. (2020). Projections of the electrical fields over all three regions (see Fig. 1) were simulated with simNIBS (version 4.0.1) (Thielscher et al. 2015). To ensure precise targeting of these regions in each participant, their T1 images were intensity nonuniformity–corrected, segmented and normalized to the MNI template using SPM12 (Wellcome Trust Centre for Neuroimaging, University College London, UK). The MNI coordinates of each region were then back-transformed to each participant's native space.
Fig. 1.
Simulated electric field projections for a) left pSTG at MNI = −59.56, −30.53, 7.08; b) right pSTG at MNI = 65, −27, 15; and c) vertex at MNI = 0, 0, 90. MagnE = magnitude (or strength) of the electric field (V/m) obtained from simNIBS (Thielscher et al. 2015).
T1-weighted images for each participant were uploaded to the visor2 software, which was then used to specify MRI fiducials (nasion, left and right preauricular points), and anatomical markers (anterior and posterior commissure, interhemispheric point, anterior point, posterior point, superior point, inferior point, and left and right points). The head model was created using an automated function to segment brain and scalp compartments. This process was conducted only once, prior to the first session: The file was saved for all participants and used again for sessions two and three. For each session, the relevant target region was positioned on the brain model, corresponding to the MNI coordinates transformed to individual space. Following coil calibration, relevant head placement and shape points were determined. The coil was then guided and fixed into place over the target of interest using the camera and visual feedback on the visor2 system.
Inhibitory stimulation protocol
We delivered low-frequency offline TMS prior to the speech perception task over either the left pSTG, right pSTG, or Vertex at a frequency of 1 Hz for a duration of 900 s (900 pulses) at 90% of each participant's motor threshold (MT). A stimulation duration of 900 s was selected to accommodate the behavioral task, since the effect of repetitive offline TMS has been shown to last no longer than 30 min after stimulation (Krieger-Redwood et al. 2013; Lambon Ralph et al. 2009; Pascual-Leone et al. 1998; Pobric et al. 2007). The offline rTMS protocol was delivered using a Magstim Super Rapid2 Plus1 stimulator and an AirFilm Coil—Rapid Version. The coil was held in place using Brainsight Tracker Fixation adapters, which were attached to the Brainsight 4th Generation Subject Chair.
Procedure
Participants were seated comfortably in an upright position facing the display at an approximate distance of 100 cm. Prior to the start of the TMS protocol, participants underwent a familiarization phase in which they were provided with feedback to ensure correct understanding of the task requirements and a stable performance level. The practice consisted of six example trials from the congruent and incongruent conditions with real sentences (two each) and a written version of the degraded sentence as feedback. Practice trials followed the same structure as the test trials. On each trial, participants were instructed to clearly report any words that they recognized from the degraded sentence (Probe). If they were unable to understand any words, they were instructed to say “Don’t know.” Figure 2 shows the experimental trial presentation: A crosshair appeared in the middle of the screen for 250 ms followed by a schematic picture of a person displayed for 1,000 ms. Then, a “speaking bubble” appeared next to the picture indicating the start of the trial, and the auditory presentation of the prime–probe sequence started automatically at the same time. We employed an interstimulus interval (ISI) of 1,000 ms between the speech samples as previous work showed that ISIs between 500 and 1,500 ms are sufficient to produce the priming effect with these stimuli (e.g. Holdgraf et al. 2016). Following presentation, participants were asked to produce a vocal response. The maximum recording time was 5,000 ms, after which recording stopped automatically and the next trial started. During practice, participants saw the written version of the degraded sentence (correct response) after the response screen and were given feedback on their performance. During the experimental task, no feedback was given and the response screen was followed automatically by the start of a new trial.
Fig. 2.

Example trial structure in the experiment.
Figure 3 shows the schematic of the TMS session. Once the practice phase was completed, the TMS phase started with finding the resting motor threshold (MT) for each participant. For this study, MT was defined as the lowest stimulation intensity required to produce a minimum EMG response of 50 μV in 50% of pulses (5 out of 10) in the abductor pollicis brevis (APB) muscle. Once the MT was established, using neuronavigated TMS, the coil was oriented and placed over the left pSTG, right pSTG, or Vertex based on the participant's T1-weighted MRI. The coil was fixed in place using the fixation adapter, and the chair headrest and forehead restraint were used to keep the participant's head in a fixed position. To reduce noise from the discharging TMS coil, participants were supplied with earplugs. Stimulation was applied at an intensity of 90% of the individual's MT for 15 min (900 pulses). Although all participants tolerated this intensity well, one participant asked to receive a lowered stimulation intensity of 86% of MT over all regions across the three TMS sessions. Since a minimum stimulation intensity of 60% of MT is required for the target region to be adequately stimulated, no participants were excluded on this basis. The coil remained fixed in the same position during the offline rTMS stimulation. After the stimulation was finished, the participants were allowed a short 1–2 min break while the coil was removed from the chair and the speech task was prepared on the experimental computer. Once the participant was ready, and the headphones and the microphone were in place, the experimental task began. Participants underwent 60 auditory trials that in total lasted about 15 min from the end of the stimulation.
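The motor-threshold criterion (an MEP of at least 50 μV on at least 5 of 10 pulses) reduces to a simple count; a sketch of the decision rule (the function name is ours):

```python
def meets_mt_criterion(mep_amplitudes_uv, min_amp_uv=50.0, min_hits=5):
    """True if enough pulses at a given intensity evoke a motor-evoked
    potential (MEP) of at least min_amp_uv microvolts in the APB muscle."""
    return sum(a >= min_amp_uv for a in mep_amplitudes_uv) >= min_hits
```

In a threshold search, stimulator intensity is adjusted until this rule first returns True; the experimental stimulation intensity (90% of MT here) is then derived from that value.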
Fig. 3.

Schematic of the TMS session: Each of the three TMS sessions started with a short practice task followed by preparation that included setting up navigated TMS (all sessions) and resting motor threshold search using single-pulse TMS (first session only). Next, active rTMS was applied for a duration of 15 min (1 Hz for 900 pulses) while the participant was resting. Following inhibitory stimulation, the participant was released from the TMS set-up, and positioned for the experimental task with headphones and microphone. This was accomplished within 2 min. Finally, the experimental task was performed for approximately 13 min. T = time, rTMS = repetitive transcranial magnetic stimulation.
Analyses
Participants’ vocal responses were transcribed and scored against the probe’s matched real-sentence counterpart. The percentage of words in each sentence that were reported correctly was established following the criteria of Davis et al. (2005). Words were scored as correct if reported identically and in the same serial position, even if intervening incorrect words were reported. All morphological variants were scored as incorrect. We accepted a sole report of an article (i.e. only “a” or “the”) for an entire sentence as correct. However, if there were two or more correct words (including articles), the rule of word order applied. To ensure inter-rater reliability, 30% of the participants’ responses were randomly sampled, transcribed, and scored by an independent native speaker of Australian English using the same criteria as above. The intraclass correlation coefficient was calculated using R version 4.2.1 (R Core Team 2021). Percent correct words per sentence was analyzed using Jamovi (The jamovi project 2022).
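The serial-order rule amounts to an ordered word match, i.e. a longest-common-subsequence count between response and target. A simplified Python sketch that implements only that rule (the article and morphology exceptions described above would need extra handling):

```python
def percent_words_correct(response, target):
    """Score a transcribed response against the target sentence: a word
    counts as correct if it matches exactly and appears in the same
    relative order (longest common subsequence), even when incorrect
    words intervene."""
    r = response.lower().split()
    t = target.lower().split()
    # Standard LCS dynamic programme over response x target words
    dp = [[0] * (len(t) + 1) for _ in range(len(r) + 1)]
    for i, rw in enumerate(r):
        for j, tw in enumerate(t):
            dp[i + 1][j + 1] = dp[i][j] + 1 if rw == tw else max(dp[i][j + 1], dp[i + 1][j])
    return 100.0 * dp[-1][-1] / len(t)
```

For example, scoring the response "teacher wrote board" against the eight-word target "the teacher wrote the problem on the board" credits three in-order words.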
The accuracy responses from all 3,240 trials were analyzed with generalized linear mixed effects modeling (GLMM) using Jamovi (Version 2.2.5) with the GAMLj module (Gallucci and Love 2018). We used a negative binomial distribution and logarithm link function to reproduce the distribution of our accuracy data. Condition, Region, and Session were included in the model as fixed effects. (We selected a negative binomial rather than Poisson distribution given the fixed number of trials and overdispersion (i.e. variances greater than means) of the accuracy data from the incongruent conditions with both sentence types. We included Session as a fixed effect given that the three sessions corresponded to our three stimulation sites and our inferences were restricted to them.) Participants and Items were included in the model as random factors. Session was also modeled because previous studies on degraded speech report a strong perceptual learning effect that occurs as early as after 30 presentations (e.g. Davis et al. 2005). We followed the procedure recommended by Matuschek and colleagues (Matuschek et al. 2017) to achieve optimal power without inflating the Type I error rate by selecting a random effect structure supported by the data. We started with a model with a maximal random effects structure: all fixed effects and their interactions were included as random intercepts and slopes for both participants and items. Upon nonconvergence or singular fit, the model was decreased in complexity by reducing the random effect structure until a fit occurred. We selected the “best fit” model as the model with the lowest Bayesian information criterion (BIC; Schwarz 1978) value, given the confirmatory nature of our hypotheses (Aho et al. 2014). An alpha level of 0.05 was used for all main effects.
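Model selection by lowest BIC can be made concrete with a short sketch; the log-likelihoods and parameter counts below are hypothetical, not values from the study:

```python
import math

def bic(log_lik, n_params, n_obs):
    """Schwarz Bayesian information criterion: lower values indicate a
    better trade-off between goodness of fit and model complexity."""
    return n_params * math.log(n_obs) - 2.0 * log_lik

# Hypothetical candidates: maximal vs intercepts-only random-effects structures,
# both fit to the 3,240 accuracy observations
candidates = {
    "maximal": bic(log_lik=-1500.2, n_params=25, n_obs=3240),
    "intercepts_only": bic(log_lik=-1510.8, n_params=12, n_obs=3240),
}
best = min(candidates, key=candidates.get)  # model with the lowest BIC wins
```

Here the simpler model wins despite its slightly worse likelihood because the BIC complexity penalty (n_params × ln n_obs) outweighs the likelihood difference, mirroring the reduction toward intercept-only random effects reported in the Results.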
Following previous TMS studies of speech perception, reaction time (RT) on correct trials was adopted as the primary dependent variable (cf. Krieger-Redwood et al. 2013; Lambon Ralph et al. 2009; Pascual-Leone et al. 2000; Pobric et al. 2007; Pobric et al. 2010). The RTs for correct responses in the congruent conditions were screened for outliers using R (R Core Team 2021) and the package “trimr” (Grange 2015). We used a Gamma distribution with an identity link function, as recommended by Lo and Andrews (2015). The model included the fixed effects of Region (left pSTG, right pSTG, Vertex), Condition (RealMatch, PseudoMatch), and Session (first, second, third). Participants and Items were included in the model as random factors. The model selection process was identical to the accuracy analysis.
Results
Accuracy analyses
All data and analysis scripts are publicly available at: https://osf.io/vkdj7/. Participants’ vocal responses were transcribed and scored for the percentage of words in each real and pseudo-sentence probe that were reported correctly relative to its real counterpart, following the approach of previous studies (e.g. Davis et al. 2005; Holdgraf et al. 2016). The ICC between the two scorers was 0.998 (mean diff = −0.101; SE = 1.90; P = 0.958). On average across the three stimulated regions, participants recognized 97% (SD = 11.0) of filtered words in the congruent condition with real sentence probes and 14% (SD = 21.8) in the incongruent condition. With pseudo-sentence probes, participants on average misperceived 79% (SD = 32.4) of the degraded speech in the congruent condition as prime words and 3% (SD = 7.17) in the incongruent condition. We also examined participants’ responses for any pseudowords that might have been reported from the degraded probe sentences, despite the instruction to report words. There was only a single instance of one participant reporting a pseudoword in one sentence from the congruent condition.
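For reference, the inter-rater ICC reported above can be computed from a two-way ANOVA decomposition. A minimal sketch assuming the two-way mixed, consistency, single-rater form, ICC(3,1) — the paper’s R computation may have used a different variant:

```python
from statistics import mean

def icc_3_1(ratings):
    """Two-way mixed, consistency, single-rater ICC(3,1).

    `ratings` is a list of [rater1, rater2, ...] rows, one row per
    scored sentence. ICC(3,1) = (MS_rows - MS_err) /
    (MS_rows + (k - 1) * MS_err), from the two-way ANOVA sums of squares.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
```

Because this is a consistency (not absolute-agreement) form, a constant offset between raters still yields an ICC of 1.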
Analysis of all conditions resulted in a best-fitting model including both by-participant and by-item random intercepts allowing for correlation. Table 2 presents the significant results from the GLMM (the full model output is presented in Table S1 in Supplementary materials). The main effects of Condition [χ2(3) = 357.12, P < 0.001] and Session [χ2(2) = 53.70, P < 0.001], and the Condition*Region [χ2(6) = 15.65, P = 0.016], Condition*Session [χ2(6) = 45.92, P < 0.001], and Condition*Region*Session [χ2(12) = 39.15, P < 0.001] interactions, were significant. The main effect of Region [χ2(2) = 5.20, P = 0.074] and the Region*Session interaction [χ2(4) = 6.95, P = 0.139] were not significant.
Table 2.
GLMM estimates for accuracy as a function of condition (PseudoMatch = degraded pseudo-sentence probe with clear matched real sentence, PseudoMismatch = degraded pseudo-sentence probe with clear mismatched real sentence, RealMatch = degraded real English sentence probe with clear matched real sentence, RealMismatch = degraded real English sentence probe with clear mismatched real sentence), region (vertex, left pSTG = left posterior superior temporal gyrus, right pSTG = right posterior superior temporal gyrus) and session (1, 2, 3). Only statistically significant contrasts are included. Full model is presented in Table S1 in Supplementary materials.
| Names | Effect | Estimate | SE | exp(B) | z | p |
|---|---|---|---|---|---|---|
| (Intercept) | (Intercept) | 2.72565 | 0.1032 | 15.2663 | 26.4209 | < 0.001 |
| Condition1 | PseudoMismatch—PseudoMatch | −4.39390 | 0.2778 | 0.0124 | −15.8188 | < 0.001 |
| Condition3 | RealMismatch—PseudoMatch | −2.34101 | 0.2732 | 0.0962 | −8.5680 | < 0.001 |
| Region2 | rightPSTG—Vertex | −0.10476 | 0.0466 | 0.9005 | −2.2492 | 0.024 |
| Session1 | 2–1 | 0.28997 | 0.0473 | 1.3364 | 6.1267 | < 0.001 |
| Session2 | 3–1 | 0.31371 | 0.0469 | 1.3685 | 6.6938 | < 0.001 |
| Condition3 * Region1 | RealMismatch—PseudoMatch * leftPSTG—Vertex | −0.28704 | 0.1121 | 0.7505 | −2.5611 | 0.010 |
| Condition1 * Region2 | PseudoMismatch—PseudoMatch * rightPSTG—Vertex | −0.32575 | 0.1493 | 0.7220 | −2.1819 | 0.029 |
| Condition1 * Session1 | PseudoMismatch—PseudoMatch * 2–1 | 0.36501 | 0.1514 | 1.4405 | 2.4106 | 0.016 |
| Condition3 * Session1 | RealMismatch—PseudoMatch * 2–1 | 0.47638 | 0.1139 | 1.6102 | 4.1832 | < 0.001 |
| Condition1 * Session2 | PseudoMismatch—PseudoMatch * 3–1 | 0.33088 | 0.1500 | 1.3922 | 2.2058 | 0.027 |
| Condition3 * Session2 | RealMismatch—PseudoMatch * 3–1 | 0.60045 | 0.1129 | 1.8229 | 5.3185 | < 0.001 |
| Region2 * Session1 | rightPSTG—Vertex * 2–1 | 0.30544 | 0.1402 | 1.3572 | 2.1786 | 0.029 |
| Condition1 * Region2 * Session1 | PseudoMismatch—PseudoMatch * rightPSTG—Vertex * 2–1 | 1.37645 | 0.3786 | 3.9608 | 3.6359 | < 0.001 |
| Condition3 * Region2 * Session1 | RealMismatch—PseudoMatch * rightPSTG—Vertex * 2–1 | 0.69564 | 0.2810 | 2.0050 | 2.4755 | 0.013 |
| Condition1 * Region2 * Session2 | PseudoMismatch—PseudoMatch * rightPSTG—Vertex * 3–1 | 0.96562 | 0.3722 | 2.6264 | 2.5945 | 0.009 |
Post hoc tests with Bonferroni–Holm family-wise error (FWE) correction for the main effect of Condition showed accuracy for real sentence probes was significantly higher in the congruent than the incongruent condition [exp(B) = 13.09624, SE = 3.57866, z = 9.41, P < 0.001], confirming that the matched lexical (word form) and sublexical (segmental-level) information across the prime–probe pairs enhanced perception. Crucially, accuracy for pseudo-sentence probes was also significantly higher in the congruent than the incongruent condition [exp(B) = 80.95564, SE = 22.48655, z = 15.82, P < 0.001], demonstrating that the partially matching segmental-level information across the prime and nonword probe pairs resulted in participants misperceiving the prime sentences’ lexical content. For the main effect of Session, post hoc tests showed a significant difference in accuracy between sessions 1 and 2 [exp(B) = 0.748, SE = 0.0354, z = −6.127, P < 0.001] and between sessions 1 and 3 [exp(B) = 0.731, SE = 0.0342, z = −6.694, P < 0.001], indicating that participants benefited from repeated exposure to the prime–probe pairings. There was no significant difference in accuracy between sessions 2 and 3 [exp(B) = 0.977, SE = 0.0441, z = −0.526, P = 0.599].
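The Bonferroni–Holm step-down procedure used for these post hoc contrasts multiplies the i-th smallest of m p-values by (m − i + 1) and enforces monotonicity of the adjusted values. A minimal sketch of the adjustment (the contrasts themselves were computed in Jamovi):

```python
def holm_correct(pvals):
    """Bonferroni-Holm step-down FWE correction.

    Sort p-values ascending; multiply the i-th smallest (0-based rank)
    by (m - i); carry forward the running maximum so adjusted values
    never decrease; cap at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted
```

For three raw p-values of 0.01, 0.04, and 0.03, the adjusted values are 0.03, 0.06, and 0.06: the smallest is tripled, and the largest is raised to match the second so the ordering is preserved.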
To compare the priming effects for real and pseudo-sentence probes (i.e. misperceptions), we conducted an additional analysis of the congruent real and pseudo-sentence probes only (n = 2,160 trials) following the same approach. The best-fitting model included both by-participant and by-item random intercepts allowing for correlation. Table 3 presents the overall results from the GLMM including only statistically significant contrasts (the full model structure can be found in Table S2 in Supplementary materials). The main effects of Condition [χ2(1) = 25.822, P < 0.001] and Session [χ2(2) = 13.260, P = 0.001], and the Condition*Region*Session interaction [χ2(4) = 34.157, P < 0.001], were significant. The main effect of Region [χ2(2) = 2.327, P = 0.312] and the Condition*Region [χ2(2) = 0.325, P = 0.850], Condition*Session [χ2(2) = 2.452, P = 0.293], and Region*Session [χ2(4) = 5.486, P = 0.241] interactions were not significant.
Table 3.
GLMM estimates for accuracy as a function of condition (PseudoMatch = degraded pseudo-sentence probe with clear matched real sentence, RealMatch = degraded real English sentence probe with clear matched real sentence), region (vertex, left pSTG = left posterior superior temporal gyrus, right pSTG = right posterior superior temporal gyrus) and session (1, 2, 3) in the congruent conditions only. Statistically significant contrasts are included. Full model can be found in Table S2 in supplementary materials.
| Names | Effect | Estimate | SE | exp(B) | z | p |
|---|---|---|---|---|---|---|
| (Intercept) | (Intercept) | 4.45148 | 0.0452 | 85.754 | 98.4795 | < 0.001 |
| Condition1 | RealMatch—PseudoMatch | 0.25405 | 0.0500 | 1.289 | 5.0815 | < 0.001 |
| Session1 | 2–1 | 0.06946 | 0.0220 | 1.072 | 3.1590 | 0.002 |
| Session2 | 3–1 | 0.06933 | 0.0220 | 1.072 | 3.1518 | 0.002 |
| Condition1 * Region2 * Session1 | RealMatch—PseudoMatch * rightPSTG—Vertex * 2–1 | 0.30620 | 0.1078 | 1.358 | 2.8395 | 0.005 |
| Condition1 * Region1 * Session2 | RealMatch—PseudoMatch * leftPSTG—Vertex * 3–1 | −0.40309 | 0.1084 | 0.668 | −3.7186 | < 0.001 |
| Condition1 * Region2 * Session2 | RealMatch—PseudoMatch * rightPSTG—Vertex * 3–1 | −0.23064 | 0.1078 | 0.794 | −2.1395 | 0.032 |
Post hoc tests for the main effect of Condition with Bonferroni–Holm FWE correction showed accuracy for congruent pseudo-sentence probes was significantly lower than for congruent real-sentence probes [exp(B) = 0.776, SE = 0.0388, z = −5.08, P < 0.001; mean difference = 18%], indicating participants benefited from the additional lexical-level (i.e. word form) information in the real-sentence prime–probe pairs. For the main effect of Session, post hoc tests showed a significant difference in accuracy scores between sessions 1 and 2 [exp(B) = 0.933, SE = 0.0205, z = −3.15897, P = 0.005] and between sessions 1 and 3 [exp(B) = 0.933, SE = 0.0205, z = −3.15179, P = 0.005], again indicating that participants benefited from repeated exposure to the prime–probe pairings. There was no significant difference in accuracy scores between sessions 2 and 3 [exp(B) = 1.000, SE = 0.0220, z = 0.00593, P = 0.995].
Reaction time analyses
Trials involving any accuracy errors in the congruent conditions were removed (557 of 2,160 trials, 25.7% of the data). Outliers more than 2 SD above or below the participant and item means, or below 200 ms, were excluded from analysis (42 trials, 2.62% of the remaining data). A total of 1,561 trials (72.2%) were available for further analysis. Table 4 presents a summary of the reaction time results.
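The outlier screening above (an absolute 200-ms floor, then a ±2 SD window around both the participant mean and the item mean) can be sketched as follows. The actual screening used R’s “trimr” package; `trim_rts` and its dict-based trial records are hypothetical stand-ins:

```python
from collections import defaultdict
from statistics import mean, stdev

def trim_rts(trials, floor_ms=200, n_sd=2):
    """Drop RTs below floor_ms, then drop RTs more than n_sd sample
    standard deviations from the participant mean or the item mean."""
    trials = [t for t in trials if t["rt"] >= floor_ms]

    def stats_by(key):
        # Per-group (participant or item) mean and sample SD of RTs.
        groups = defaultdict(list)
        for t in trials:
            groups[t[key]].append(t["rt"])
        return {k: (mean(v), stdev(v) if len(v) > 1 else 0.0)
                for k, v in groups.items()}

    by_subject, by_item = stats_by("subject"), stats_by("item")

    def keep(t):
        for m, s in (by_subject[t["subject"]], by_item[t["item"]]):
            if s > 0 and abs(t["rt"] - m) > n_sd * s:
                return False
        return True

    return [t for t in trials if keep(t)]
```

Note the floor is applied first, so the SD criterion is computed over the remaining trials only.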
Table 4.
Estimated marginal mean response latencies in milliseconds (ms) and standard error (SE) as a function of condition (PseudoMatch = degraded pseudo-sentence probe with clear matched real sentence, RealMatch = degraded real English sentence probe with clear matched real sentence), region (vertex, left pSTG = left posterior superior temporal gyrus, right pSTG = right posterior superior temporal gyrus) and session (1, 2, 3).
| Condition | Region | Session | Mean (ms) | SE |
|---|---|---|---|---|
| PseudoMatch | Vertex | 1 | 844 | 23.8 |
| RealMatch | Vertex | 1 | 660 | 16.1 |
| PseudoMatch | leftPSTG | 1 | 832 | 21.9 |
| RealMatch | leftPSTG | 1 | 738 | 16.9 |
| PseudoMatch | rightPSTG | 1 | 815 | 22.5 |
| RealMatch | rightPSTG | 1 | 711 | 31.9 |
| PseudoMatch | Vertex | 2 | 758 | 24.4 |
| RealMatch | Vertex | 2 | 670 | 19.1 |
| PseudoMatch | leftPSTG | 2 | 783 | 27.7 |
| RealMatch | leftPSTG | 2 | 674 | 21.8 |
| PseudoMatch | rightPSTG | 2 | 731 | 22.6 |
| RealMatch | rightPSTG | 2 | 677 | 21.3 |
| PseudoMatch | Vertex | 3 | 764 | 25.2 |
| RealMatch | Vertex | 3 | 683 | 23.9 |
| PseudoMatch | leftPSTG | 3 | 802 | 23.9 |
| RealMatch | leftPSTG | 3 | 690 | 18.4 |
| PseudoMatch | rightPSTG | 3 | 694 | 21.4 |
| RealMatch | rightPSTG | 3 | 621 | 18.1 |
Figure 4 presents the distribution of the reaction time data according to condition, region, and session. The best-fitting model included both by-participant and by-item random intercepts and a by-item random slope for Condition, allowing for correlation. Table 5 presents the overall results from the GLMM including only statistically significant contrasts (the full model structure can be found in Table S3 in Supplementary materials). The main effects of Condition [χ2(1) = 61.64, P < 0.001], Region [χ2(2) = 20.41, P < 0.001], and Session [χ2(2) = 58.39, P < 0.001], and the Condition*Region [χ2(2) = 6.79, P = 0.033], Condition*Session [χ2(2) = 16.69, P < 0.001], Region*Session [χ2(4) = 15.36, P = 0.004], and Condition*Region*Session [χ2(4) = 149.62, P < 0.001] interactions, were all significant.
Fig. 4.

Distribution of response latencies (RT) according to region of inhibitory stimulation (Vertex, leftPSTG = left posterior superior temporal gyrus, rightPSTG = right posterior superior temporal gyrus), session (first, second, third), and degraded probe type (PseudoMatch = pseudo-sentence, RealMatch = real English sentence).
Table 5.
GLMM estimates for response latencies as a function of condition (PseudoMatch = degraded pseudo-sentence probe with clear matched real sentence, RealMatch = degraded real English sentence probe with clear matched real sentence), region (vertex, left pSTG = left posterior superior temporal gyrus, right pSTG = right posterior superior temporal gyrus) and session (1, 2, 3). Only statistically significant contrasts are included. Full model can be found in Table S3 in Supplementary materials.
| Names | Effect | Estimate | SE | exp(B) | z | p |
|---|---|---|---|---|---|---|
| (Intercept) | (Intercept) | 730.4 | 16.50 | Inf | 44.272 | < 0.001 |
| Region1 | leftPSTG—Vertex | 23.6 | 9.59 | 1.73e+10 | 2.457 | 0.014 |
| Region2 | rightPSTG—Vertex | −21.3 | 9.00 | 5.63e-10 | −2.367 | 0.018 |
| Condition1 | RealMatch—PseudoMatch | −99.8 | 12.71 | 4.69e-44 | −7.851 | < 0.001 |
| Session1 | 2–1 | −51.4 | 9.34 | 4.61e-23 | −5.508 | < 0.001 |
| Session2 | 3–1 | −57.8 | 8.16 | 8.28e-26 | −7.076 | < 0.001 |
| Region2 * Condition1 | rightPSTG—Vertex * RealMatch—PseudoMatch | 40.5 | 17.29 | 3.96e+17 | 2.343 | 0.019 |
| Region2 * Session2 | rightPSTG—Vertex * 3–1 | −77.2 | 31.60 | 3.02e-34 | −2.443 | 0.015 |
| Condition1 * Session1 | RealMatch—PseudoMatch * 2–1 | 43.1 | 13.10 | 5.33e+18 | 3.292 | < 0.001 |
| Condition1 * Session2 | RealMatch—PseudoMatch * 3–1 | 38.5 | 11.83 | 5.13e+16 | 3.253 | 0.001 |
| Region1 * Condition1 * Session1 | leftPSTG—Vertex * RealMatch—PseudoMatch * 2–1 | −111.1 | 11.12 | 5.53e-49 | −9.989 | < 0.001 |
| Region2 * Condition1 * Session1 | rightPSTG—Vertex * RealMatch—PseudoMatch * 2–1 | −45.5 | 12.87 | 1.77e-20 | −3.535 | < 0.001 |
| Region1 * Condition1 * Session2 | leftPSTG—Vertex * RealMatch—PseudoMatch * 3–1 | −119.1 | 19.42 | 1.80e-52 | −6.134 | < 0.001 |
Post hoc tests with Bonferroni–Holm FWE correction for the main effect of Condition showed response latencies for pseudo-sentence probes were significantly slower compared to the real sentence probes (diff = 99.8, SE = 12.7, z = 7.85, P < 0.001). Post hoc comparisons for the significant main effect of Region revealed slower response latencies after inhibitory stimulation of left pSTG compared to the right pSTG (diff = 44.9, SE = 9.94, z = 4.51, P < 0.001). Surprisingly, faster response latencies were observed after inhibitory stimulation of the right pSTG compared to Vertex (diff = −21.3, SE = 9.00, z = 2.37, P = 0.028). As expected, slower response latencies were observed after inhibitory stimulation of left pSTG compared to Vertex (diff = 23.6, SE = 9.59, z = −2.46, P = 0.028). Post hoc comparisons for the significant main effect of Session revealed slower response latencies during session 1 compared to session 2 (diff = 51.43, SE = 9.34, z = 5.508, P < 0.001), and compared to session 3 (diff = 57.75, SE = 8.16, z = 7.076, P < 0.001), indicating that processing benefited from the repeated exposure to the prime–probe pairings. There was no difference in response latencies between sessions 2 and 3 (diff = 6.32, SE = 9.59, z = 0.659, P = 0.510).
Post hoc comparisons for the significant Region*Condition interaction showed that for matched pseudo-sentence probes, response latencies were slower after inhibitory stimulation of the left pSTG compared to the right pSTG (diff = 58.73, SE = 15.44, z = 3.804, P < 0.001). Faster response latencies for pseudo-sentences were observed after inhibitory stimulation of the right pSTG compared to Vertex (diff = −41.56, SE = 14.77, z = 2.814, P = 0.024), confirming a role for this region in generating misperceptions (i.e. prediction errors) from the partially matching segmental-level information in the prime–probe pairs. However, the direction of this effect differed from that predicted (facilitation vs. disruption). Contrary to predictions, there was no significant difference for pseudo-sentences after inhibitory stimulation of Vertex compared to the left pSTG (diff = −17.18, SE = 11.65, z = −1.475, P = 0.281), indicating that disrupting the left pSTG did not influence misperceptions with these pairings. For real sentence probes, we observed slower response latencies after inhibitory stimulation of the left pSTG compared to Vertex (diff = 29.97, SE = 10.64, z = −2.816, P = 0.024) as expected, confirming a role for this region in the perceptual enhancement produced by matching lexical (word form) and sublexical (segmental-level) information in prime–probe pairs (i.e. correct predictions). Surprisingly, there was no significant difference in response latencies for real sentence probes after inhibitory stimulation of Vertex compared to the right pSTG (diff = 1.04, SE = 9.66, z = 0.107, P = 0.914), indicating disrupting the latter region did not influence the perceptual enhancement with these stimuli. The trend toward slower response latencies after stimulation of the left pSTG compared to the right pSTG was not significant (diff = 31.01, SE = 13.23, z = 2.344, P = 0.057).
Based on the region of inhibitory stimulation, we observed significantly slower response latencies for pseudo-sentence probes compared to real sentence probes after inhibitory stimulation of right pSTG (diff = 77.02, SE = 15.89, z = 4.849, P < 0.001), left pSTG (diff = 104.74, SE = 16.82, z = 6.227, P < 0.001), and Vertex (diff = 117.54, SE = 15.36, z = 7.653, P < 0.001).
Self-report data
Participants reported only mild sensations after TMS was applied (ratings between 1 = absent and 3 = moderate; Brunoni et al. 2011). Table 6 shows the mean pre- to poststimulation rating differences for each region of stimulation. There were no significant differences in mean sensation ratings compared to the stimulation control region (Vertex).
Table 6.
Mean differences in pre- and poststimulation ratings for adverse effects of TMS in the speech perception task. Left pSTG = left posterior superior temporal gyrus, right pSTG = right posterior superior temporal gyrus.
| Sensations | Left pSTG: Mean diff | Left pSTG: SD | Right pSTG: Mean diff | Right pSTG: SD | Vertex: Mean diff | Vertex: SD |
|---|---|---|---|---|---|---|
| Headache | 0.11 | 0.47 | 0.06 | 0.42 | 0 | 0 |
| Neck pain | 0.06 | 0.24 | 0 | 0 | 0.06 | 0.24 |
| Scalp pain | 0 | 0 | 0 | 0 | 0.06 | 0.24 |
| Tingling | 0 | 0 | 0.11 | 0.32 | 0.11 | 0.32 |
| Itching | 0.11 | 0.32 | 0.06 | 0.42 | 0.17 | 0.38 |
| Burning sensation | 0 | 0 | 0 | 0 | 0.06 | 0.24 |
| Skin redness | 0.11 | 0.32 | 0 | 0 | 0 | 0 |
| Sleepiness | 0.06 | 0.64 | 0.11 | 0.32 | 0.16 | 0.51 |
| Trouble concentrating | 0.06 | 0.42 | 0.17 | 0.51 | 0.06 | 0.24 |
| Ear ringing | 0 | 0 | 0.11 | 0.32 | 0 | 0 |
Discussion
We investigated causal roles for the left and right pSTG in the priming effect in speech perception using spectrotemporally degraded real English sentences and pseudo-sentences containing nonwords matched for sublexical properties, designed to elicit misperceptions. Specifically, we hypothesized that inhibitory stimulation of the left and right pSTG would disrupt the perceptual enhancement of degraded speech relative to a control site (vertex) for both real and pseudo-sentence probes, as indexed by response latencies. Our findings only partially supported these predictions. Inhibitory stimulation of the left pSTG slowed responses to real sentence probes but did not significantly affect responses to pseudo-sentence probes. Inhibitory stimulation of the right pSTG did not differentially affect priming with real sentence probes; however, it paradoxically facilitated misperceptions of pseudo-sentences.
Consistent with previous reports, we found a perceptual enhancement effect for spectrotemporally degraded probe sentences when they were preceded by a clear prime matched in content compared to mismatched content (Al-Zubaidi et al. 2022; Blank et al. 2018; Holdgraf et al. 2016; Sohoglu et al. 2014). More importantly, we also found misperceptions for pseudo-sentence probes. Listeners misperceived a substantial proportion (~79%) of the lexical information from the prime sentences in the degraded pseudo-sentences, with the identical sentence-level meaning and lexical, grammatical, and acoustic–phonetic information contained in the real sentence probes conferring a relatively smaller (~18%) gain in accuracy. These prediction errors are strong evidence that processing of sublexical (segmental-level) information is sufficient to reconstruct (predict) context-relevant word form information in the absence of any lexical-level input, supporting the notion that sublexical cues are responsible for much of the priming effect with these stimuli. These results extend earlier findings of misperceptions in spoken word recognition with spectrally noise-vocoded speech (e.g. Sohoglu et al. 2014; see also Blank et al. 2018) to spoken sentences. Unlike the spectral noise-vocoding of words employed in previous studies, the spectrotemporal degradation technique produced a more drastic reduction in phonemic and prosodic information in the acoustic–phonetic signal and also eliminated potential hemispheric differences in the processing of spectral vs. temporal acoustic features (Elliott and Theunissen 2009).
Our TMS results provide novel evidence for a causal role of the left pSTG in the priming effect. However, our hypothesis that the left pSTG would show involvement with both real and pseudo-sentence probes was only partially confirmed. As expected, we found that inhibitory TMS to the left pSTG produced slower response latencies with degraded real sentences, confirming its involvement in lexical-level speech processing (Evans and McGettigan 2017; Evans et al. 2016; McGettigan et al. 2012; Narain et al. 2003; Scott et al. 2000; Scott and Johnsrude 2003; Scott and McGettigan 2013). Surprisingly, inhibitory TMS to the left pSTG did not disrupt priming of pseudo-sentence probes. This was an unexpected result, as prior neuroimaging and electrophysiological studies indicated a role for the left pSTG in sublexical processing (Ghaleh et al. 2018; Leonard et al. 2015; Leonard et al. 2016; Mesgarani et al. 2014; Rogalsky et al. 2022; Turkeltaub and Coslett 2010; Yi et al. 2019). We return to this discrepancy below.
Contrary to our hypothesis, inhibitory TMS over the right pSTG did not affect the priming effect with real sentence probes. However, we observed faster response latencies for priming of pseudo-sentence probes. To our knowledge, this is a novel finding that provides the first TMS evidence for a causal role for the right pSTG in sublexical processing, consistent with the predictions of bilateral accounts (e.g. Hickok 2012). Yet the direction of this finding was unexpected, as low-frequency (1 Hz) rTMS is generally used to produce an inhibitory effect that slows behavioral responses, although facilitatory effects have also been reported (Pascual-Leone et al. 1998; see also Luber and Lisanby 2014). Previous TMS studies investigating the involvement of superior temporal cortices in speech perception reported delayed RTs compared to active stimulation of a control region (e.g. Krieger-Redwood et al. 2013). Krieger-Redwood et al. (2013) applied rTMS at 1 Hz for 600 s at 120% of motor threshold, rather than 1 Hz for 900 s at 90% of motor threshold as in the present study, and their tasks entailed a simple button-press response. As motor threshold parameters ranging from 80% to 120% are widely used in low-frequency rTMS protocols (Turi et al. 2021), it is possible the opposite polarity we observed for right pSTG inhibitory stimulation reflects the difference in manual versus verbal response modalities across studies. Alternatively, as Luber and Lisanby (2014) noted, TMS facilitation effects might be due to modulation of either a brain area directly or a network of interacting brain regions, leading to more efficient processing, or to disruption of a processing mechanism that slows or intervenes in task performance. We consider both of these potential explanations in relation to sublexical processing below.
We selected our stimulation target areas in the left and right pSTG based on prior TMS studies (Bueti et al. 2008; Krieger-Redwood et al. 2013). These regions were not precise homologues, with the right hemisphere target approximately 7 mm more dorsal than the left. Hence, it is possible this discrepancy contributed to the differential effects we observed across hemispheres and priming conditions. However, we think this is unlikely for two reasons. Firstly, the macro-anatomical structure of the superior temporal plane varies substantially across hemispheres (Gulban et al. 2020); hence, there is no evidence to support a precise homology for targeting with TMS. Secondly, it is well known that inhibitory TMS effects extend beyond the focal targeted area (Hartwigsen and Silvanto 2023), and our electric field simulations showed that the effect of inhibitory stimulation encompassed the pSTG in both hemispheres.
Overall, our findings support the proposal that speech perception is bilaterally organized in superior temporal cortices (Hickok and Poeppel 2007; Leonard et al. 2016; Okada et al. 2010; Price 2012; Turkeltaub and Coslett 2010). The inhibitory TMS disruption to the priming of real sentences we observed supports previous studies’ attribution of a preferential role for the left pSTG in lexical-level processing. Although the null effect of inhibitory stimulation with pseudo-sentences does not appear consistent with evidence from lesion-symptom mapping (LSM) studies demonstrating left pSTG involvement in sublexical (segmental-level) processing (e.g. Rogalsky et al. 2022), it might nonetheless be reconciled with the bilateral account if it is assumed that the right pSTG in healthy individuals is able to compensate for segmental-level processing after TMS disruption to the left pSTG (Hickok and Poeppel 2007). Instead, the impairments observed after left temporal lobe lesions might be due to more extensive damage to connecting interhemispheric fiber tracts such as the tapetum, preventing bilateral transfer of segmental-level information that would enable prediction of lexico-semantic content (e.g. Turken and Dronkers 2011). Indeed, there is evidence suggesting that in the long term, the right pSTG is implicated in successful auditory comprehension rehabilitation in chronic poststroke aphasia (Fleming et al. 2020).
As noted above, the paradoxical facilitation effect we observed with inhibitory right pSTG stimulation has two potential explanations. First, it could be that the right hemisphere simply processes segmental-level information more slowly than the left because it is less specialized for language, making it the rate-limiting node in the bilateral network. Inhibitory TMS applied to the right pSTG would therefore permit the left pSTG to complete the task more quickly on its own. Alternatively, inhibitory TMS applied to the right pSTG might have impaired a mechanism engaged at a later processing stage, such as monitoring/verification of segmental-level phonotactic structure (e.g. Vitevitch 2003), that slows or intervenes in task performance. The latter explanation receives some support from a recent magnetoencephalography (MEG) study of narrative perception showing that listeners employ local and unified predictive models in parallel (Brodbeck et al. 2022). In that study, the right STG demonstrated a preference for local-context processing of sublexical phoneme sequences (i.e. phonotactic probabilities) according to surprisal and entropy measures. It also receives some support from speech production models implicating the right pSTG/pSTS (posterior superior temporal sulcus) in integrating auditory expectations with auditory input and/or in vocal motor control (see Liu et al. 2023; Yamamoto et al. 2019).
Future directions
Future studies might consider employing an online high-frequency TMS design (e.g. Bestelmeyer et al. 2011; Kennedy-Higgins et al. 2020) to provide more insight into the timing of both regions’ responses to stimulation and nature of the processes involved. To confirm the bilateral nature of sublexical processing and control for the possibility of interhemispheric compensatory effects, bilateral inhibitory TMS could also be employed in future work.
Conclusion
We investigated causal roles for the left and right pSTG in the perception and misperception of speech using inhibitory TMS with spectrotemporally degraded real English sentences and pseudo-sentences containing nonwords matched for sublexical properties. We hypothesized that inhibitory TMS applied to the left and right pSTG would disrupt the priming effect with both real and pseudo-sentence probes. Stimulation of the left pSTG delayed responses to real sentence probes but did not significantly affect misperceptions of pseudo-sentence probes. In addition, right pSTG inhibitory stimulation paradoxically facilitated misperceptions of pseudo-sentences while not significantly affecting responses to real sentences. Together, these results confirm that inhibitory stimulation of either the left or right pSTG influences the perception of degraded speech, consistent with accounts proposing the engagement of bilateral mechanisms during speech perception (e.g. Hickok 2012; Hickok and Poeppel 2007).
Supplementary Material
Acknowledgments
We thank Dr Emma Ward for her help with stimuli recording. We thank Dr Elaine Kearney, Dr Angelique Volfart, and Marko Krsmanovic for their help with data collection. We are particularly grateful to all participants for their contributions.
Contributor Information
Valeriya Tolkacheva, Queensland University of Technology, School of Psychology and Counselling, O Block, Kelvin Grove, Queensland, 4059, Australia.
Sonia L E Brownsett, Queensland Aphasia Research Centre, School of Health and Rehabilitation Sciences, University of Queensland, Surgical Treatment and Rehabilitation Services, Herston, Queensland, 4006, Australia; Centre of Research Excellence in Aphasia Recovery and Rehabilitation, La Trobe University, Melbourne, Health Sciences Building 1, 1 Kingsbury Drive, Bundoora, Victoria, 3086, Australia.
Katie L McMahon, Herston Imaging Research Facility, Royal Brisbane & Women’s Hospital, Building 71/918, Royal Brisbane & Women’s Hospital, Herston, Queensland, 4006, Australia; Queensland University of Technology, School of Clinical Sciences and Centre for Biomedical Technologies, 60 Musk Avenue, Kelvin Grove, Queensland, 4059, Australia.
Greig I de Zubicaray, Queensland University of Technology, School of Psychology and Counselling, O Block, Kelvin Grove, Queensland, 4059, Australia.
Author contributions
Valeriya Tolkacheva (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing—original draft), Sonia L.E. Brownsett (Methodology, Supervision, Writing—review & editing), Katie L. McMahon (Methodology, Resources, Software, Supervision, Writing—review & editing), and Greig I. de Zubicaray (Conceptualization, Funding acquisition, Investigation, Methodology, Supervision, Writing—review & editing).
Funding
This research was supported by an Australian Research Council Discovery Project Grant DP220101853 awarded to G.Z. and Australian Government Research Training Program (RTP) Scholarship awarded to V.T.
Conflict of interest statement: None declared.
References
- Ahissar M, Hochstein S. The reverse hierarchy theory of visual perceptual learning. Trends Cogn Sci. 2004:8(10):457–464.
- Aho K, Derryberry D, Peterson T. Model selection for ecologists: the worldviews of AIC and BIC. Ecology. 2014:95(3):631–636.
- Al-Zubaidi A, Bräuer S, Holdgraf CR, Schepers IM, Rieger JW. Sublexical cues affect degraded speech processing: insights from fMRI. Cereb Cortex Commun. 2022:3(1). 10.1093/texcom/tgac007.
- Alba-Ferrara L, Ellison A, Mitchell RLC. Decoding emotional prosody: resolving differences in functional neuroanatomy from fMRI and lesion studies using TMS. Brain Stimul. 2012:5(3):347–353.
- Andoh J, Paus T. Combining functional neuroimaging with off-line brain stimulation: modulation of task-related activity in language areas. J Cogn Neurosci. 2011:23(2):349–361.
- Bestelmeyer PE, Belin P, Grosbras MH. Right temporal TMS impairs voice detection. Curr Biol. 2011:21(20):R838–R839.
- Blank H, Davis MH. Prediction errors but not sharpened signals simulate multivoxel fMRI patterns during speech perception. PLoS Biol. 2016:14(11):e1002577. 10.1371/journal.pbio.1002577.
- Blank H, Spangenberg M, Davis MH. Neural prediction errors distinguish perception and misperception of speech. J Neurosci. 2018:38(27):6076–6089.
- Block CK, Baldwin CL. Cloze probability and completion norms for 498 sentences: behavioral and neural validation using event-related potentials. Behav Res Methods. 2010:42(3):665–670. 10.3758/BRM.42.3.665.
- Brodbeck C, Bhattasali S, Cruz Heredia AAL, Resnik P, Simon JZ, Lau E. Parallel processing in speech perception with local and global representations of linguistic context. eLife. 2022:11:e72056.
- Brunoni AR, Amadera J, Berbel B, Volz MS, Rizzerio BG, Fregni F. A systematic review on reporting and assessment of adverse effects associated with transcranial direct current stimulation. Int J Neuropsychopharmacol. 2011:14(8):1133–1145.
- Bueti D, Dongen EV, Walsh V. The role of superior temporal cortex in auditory timing. PLoS One. 2008:3(6):e2481. 10.1371/journal.pone.0002481.
- Cope TE, Sohoglu E, Peterson KA, Jones PS, Rua C, Passamonti L, Sedley W, Post B, Coebergh J, Butler CR, et al. Temporal lobe perceptual predictions for speech are instantiated in motor cortex and reconciled by inferior frontal cortex. Cell Rep. 2023:42(5):112422. 10.1016/j.celrep.2023.112422.
- Criel Y, Boon C, Depuydt E, Stalpaert J, Huysman E, Miatton M, Santens P, Mierlo P, De Letter M. Aging and sex effects on phoneme perception: an exploratory mismatch negativity and P300 investigation. Int J Psychophysiol. 2023:190:69–83.
- Davis MH, Johnsrude IS, Hervais-Adelman A, Taylor K, McGettigan C. Lexical information drives perceptual learning of distorted speech: evidence from the comprehension of noise-vocoded sentences. J Exp Psychol Gen. 2005:134(2):222–241.
- Davis MH, Sohoglu E. Three functions of prediction error for Bayesian inference in speech perception. In: Poeppel D, Mangun GR, Gazzaniga MS, editors. The cognitive neurosciences. 6th ed. Cambridge, MA: MIT Press; 2020.
- Durfee AZ, Sheppard SM, Blake ML, Hillis AE. Lesion loci of impaired affective prosody: a systematic review of evidence from stroke. Brain Cogn. 2021:152:105759. 10.1016/j.bandc.2021.105759.
- Elliott TM, Theunissen FE. The modulation transfer function for speech intelligibility. PLoS Comput Biol. 2009:5(3):e1000302. 10.1371/journal.pcbi.1000302.
- Evans S, McGettigan C. Comprehending auditory speech: previous and potential contributions of functional MRI. Lang Cogn Neurosci. 2017:32(7):829–846.
- Evans S, McGettigan C, Agnew ZK, Rosen S, Scott SK. Getting the cocktail party started: masking effects in speech perception. J Cogn Neurosci. 2016:28(3):483–500.
- Fleming V, Brownsett S, Krason A, Maegli MA, Coley-Fisher H, Ong YH, Nardo D, Leach R, Howard D, Robson H, et al. Efficacy of spoken word comprehension therapy in patients with chronic aphasia: a cross-over randomised controlled trial with structural imaging. J Neurol Neurosurg Psychiatry. 2020:92(4):418–424.
- Gajardo-Vidal A, Lorca-Puls DL, Hope TMH, Parker Jones O, Seghier ML, Prejawa S, Crinion JT, Leff AP, Green DW, Price CJ. How right hemisphere damage after stroke can impair speech comprehension. Brain. 2018:141(12):3389–3404.
- Gallucci M. GAMLj: general analyses for linear models [jamovi module]; 2018.
- Ghaleh M, Skipper-Kallal LM, Xing S, Lacey E, DeWitt I, DeMarco A, Turkeltaub PE. Phonotactic processing deficit following left-hemisphere stroke. Cortex. 2018:99:346–357.
- Grange JA. trimr: an implementation of common response time trimming methods. R package version 1.0.1; 2015.
- Gulban OF, Goebel R, Moerel M, Zachlod D, Mohlberg H, Amunts K, de Martino F. Improving a probabilistic cytoarchitectonic atlas of auditory cortex using a novel method for inter-individual alignment. eLife. 2020:9:e56963. 10.7554/eLife.56963.
- Hall KC, Mackie JS, Lo RY-H. Phonological CorpusTools: software for doing phonological analysis on transcribed corpora. Int J Corpus Linguist. 2019:24(4):522–535. 10.1075/ijcl.18009.hal.
- Hartwigsen G, Baumgaertner A, Price CJ, Koehnke M, Ulmer S, Siebner HR. Phonological decisions require both the left and right supramarginal gyri. Proc Natl Acad Sci USA. 2010:107(38):16494–16499.
- Hartwigsen G, Silvanto J. Noninvasive brain stimulation: multiple effects on cognition. Neuroscientist. 2023:29(5):639–653. 10.1177/10738584221113806.
- Hervais-Adelman AG, Davis MH, Johnsrude IS, Carlyon RP. Perceptual learning of noise vocoded words: effects of feedback and lexicality. J Exp Psychol Hum Percept Perform. 2008:34(2):460–474.
- Hickok G. The cortical organization of speech processing: feedback control and predictive coding the context of a dual-stream model. J Commun Disord. 2012:45(6):393–402.
- Hickok G, Okada K, Barr W, Pa J, Rogalsky C, Donnelly K, Barde L, Grant A. Bilateral capacity for speech sound processing in auditory comprehension: evidence from Wada procedures. Brain Lang. 2008:107(3):179–184. 10.1016/j.bandl.2008.09.006.
- Hickok G, Poeppel D. Towards a functional neuroanatomy of speech perception. Trends Cogn Sci. 2000:4(4):131–138.
- Hickok G, Poeppel D. Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition. 2004:92(1–2):67–99.
- Hickok G, Poeppel D. The cortical organization of speech processing. Nat Rev Neurosci. 2007:8(5):393–402.
- Holdgraf CR, De Heer W, Pasley B, Rieger J, Crone N, Lin JJ, Knight RT, Theunissen FE. Rapid tuning shifts in human auditory cortex enhance speech intelligibility. Nat Commun. 2016:7:13654.
- Kennedy-Higgins D, Devlin JT, Nuttall HE, Adank P. The causal role of left and right superior temporal gyri in speech perception in noise: a transcranial magnetic stimulation study. J Cogn Neurosci. 2020:32(6):1092–1103.
- Keuleers E, Brysbaert M. Wuggy: a multilingual pseudoword generator. Behav Res Methods. 2010:42(3):627–633.
- Kim K, Adams L, Keator LM, Sheppard SM, Breining BL, Rorden C, Fridriksson J, Bonilha L, Rogalsky C, Love T, et al. Neural processing critical for distinguishing between speech sounds. Brain Lang. 2019:197:104677. 10.1016/j.bandl.2019.104677.
- Krieger-Redwood K, Gaskell MG, Lindsay S, Jefferies E. The selective role of premotor cortex in speech perception: a contribution to phoneme judgements but not speech comprehension. J Cogn Neurosci. 2013:25(12):2179–2188.
- Lambon Ralph MA, Pobric G, Jefferies E. Conceptual knowledge is underpinned by the temporal pole bilaterally: convergent evidence from rTMS. Cereb Cortex. 2009:19(4):832–838.
- Leonard MK, Baud MO, Sjerps MJ, Chang EF. Perceptual restoration of masked speech in human cortex. Nat Commun. 2016:7(1):13619.
- Leonard MK, Bouchard KE, Tang C, Chang EF. Dynamic encoding of speech sequence probability in human temporal cortex. J Neurosci. 2015:35(18):7203–7214.
- Liu D, Chang Y, Dai G, Guo Z, Jones JA, Li T, Chen X, Chen M, Li J, Wu X, et al. Right, but not left, posterior superior temporal gyrus is causally involved in vocal feedback control. NeuroImage. 2023:278:120282. 10.1016/j.neuroimage.2023.120282.
- Lo S, Andrews S. To transform or not to transform: using generalized linear mixed models to analyse reaction time data. Front Psychol. 2015:6:1171.
- Luber B, Lisanby SH. Enhancement of human cognitive performance using transcranial magnetic stimulation (TMS). NeuroImage. 2014:85(Pt 3):961–970.
- Luthra S, Mechtenberg H, Giorio C, Theodore RM, Magnuson JS, Myers EB. Using TMS to evaluate a causal role for right posterior temporal cortex in talker-specific phonetic processing. Brain Lang. 2023:240:105264. 10.1016/j.bandl.2023.105264.
- Mattys SL, Davis MH, Bradlow AR, Scott SK. Speech recognition in adverse conditions: a review. Lang Cogn Process. 2012:27(7–8):953–978.
- Mattys SL, White L, Melhorn JF. Integration of multiple speech segmentation cues: a hierarchical framework. J Exp Psychol Gen. 2005:134(4):477–500.
- Matuschek H, Kliegl R, Vasishth S, Baayen H, Bates D. Balancing type I error and power in linear mixed models. J Mem Lang. 2017:94:305–315.
- McGettigan C, Evans S, Rosen S, Agnew ZK, Shah P, Scott SK. An application of univariate and multivariate approaches in fMRI to quantifying the hemispheric lateralization of acoustic and linguistic processes. J Cogn Neurosci. 2012:24(3):636–652.
- Mesgarani N, Cheung C, Johnson K, Chang EF. Phonetic feature encoding in human superior temporal gyrus. Science. 2014:343(6174):1006–1010.
- Narain C, Scott SK, Wise RJ, Rosen S, Leff A, Iversen SD, Matthews PM. Defining a left-lateralized response specific to intelligible speech using fMRI. Cereb Cortex. 2003:13(12):1362–1368.
- Nunez AIR, Yue QH, Pasalar S, Martin RC. The role of left vs. right superior temporal gyrus in speech perception: an fMRI-guided TMS study. Brain Lang. 2020:209:104838.
- Okada K, Rong F, Venezia J, Matchin W, Hsieh IH, Saberi K, Serences JT, Hickok G. Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech. Cereb Cortex. 2010:20(10):2486–2495.
- Pascual-Leone A, Tormos JM, Keenan J, Tarazona F, Cañete C, Catalá MD. Study and modulation of human cortical excitability with transcranial magnetic stimulation. J Clin Neurophysiol. 1998:15(4):333–343.
- Pascual-Leone A, Walsh V, Rothwell J. Transcranial magnetic stimulation in cognitive neuroscience: virtual lesion, chronometry, and functional connectivity. Curr Opin Neurobiol. 2000:10(2):232–237.
- Peelle JE, Miller RL, Rogers CS, Spehar B, Sommers MS, Van Engen KJ. Completion norms for 3085 English sentence contexts. Behav Res Methods. 2020:52(4):1795–1799. 10.3758/s13428-020-01351-1.
- Pobric G, Jefferies E, Lambon Ralph MA. Category-specific versus category-general semantic impairment induced by transcranial magnetic stimulation. Curr Biol. 2010:20(10):964–968.
- Pobric G, Jefferies E, Lambon Ralph MA. Anterior temporal lobes mediate semantic representation: mimicking semantic dementia by using rTMS in normal participants. Proc Natl Acad Sci USA. 2007:104(50):20137–20141.
- Price CJ. A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. NeuroImage. 2012:62(2):816–847.
- R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2021.
- Rastle K, Harrington J, Coltheart M. 358,534 nonwords: the ARC nonword database. Q J Exp Psychol A. 2002:55(4):1339–1362.
- Rauschecker JP, Scott SK. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat Neurosci. 2009:12(6):718–724.
- Rogalsky C, Basilakos A, Rorden C, Pillay S, LaCroix AN, Keator L, Mickelsen S, Anderson SW, Love T, Fridriksson J, et al. The neuroanatomy of speech processing: a large-scale lesion study. J Cogn Neurosci. 2022:34(8):1355–1375.
- Rosen S, Wise RJS, Chadha S, Conway E-J, Scott SK. Hemispheric asymmetries in speech perception: sense, nonsense and modulations. PLoS One. 2011:6(9):e24672. 10.1371/journal.pone.0024672.
- Schwarz G. Estimating the dimension of a model. Ann Stat. 1978:6(2):461–464. http://www.jstor.org/stable/2958889.
- Scott SK, Blank CC, Rosen S, Wise RJ. Identification of a pathway for intelligible speech in the left temporal lobe. Brain. 2000:123(12):2400–2406.
- Scott SK, Johnsrude IS. The neuroanatomical and functional organization of speech perception. Trends Neurosci. 2003:26(2):100–107.
- Scott SK, McGettigan C. Do temporal processes underlie left hemisphere dominance in speech perception? Brain Lang. 2013:127(1):36–45.
- Shannon RV, Zeng F-G, Kamath V, Wygonski J, Ekelid M. Speech recognition with primarily temporal cues. Science. 1995:270(5234):303–304.
- Sohoglu E, Davis MH. Rapid computations of spectrotemporal prediction error support perception of degraded speech. eLife. 2020:9:e58077. 10.7554/eLife.58077.
- Sohoglu E, Peelle JE, Carlyon RP, Davis MH. Top-down influences of written text on perceived clarity of degraded speech. J Exp Psychol Hum Percept Perform. 2014:40(1):186–199.
- Talarico M, Abdilla G, Aliferis M, Balazic I, Giaprakis I, Stefanakis T, Foenander K, Grayden DB, Paolini AG. Effect of age and cognition on childhood speech in noise perception abilities. Audiol Neurootol. 2007:12(1):13–19.
- The jamovi project. jamovi (version 2.3) [computer software]. Sydney, Australia; 2022. https://www.jamovi.org.
- The MathWorks Inc. MATLAB R2019b [computer software]. Natick, MA: The MathWorks Inc.; 2019.
- Thielscher A, Antunes A, Saturnino GB. Field modeling for transcranial magnetic stimulation: a useful tool to understand the physiological effects of TMS? In: 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); Milan, Italy; 2015. p. 222–225. 10.1109/EMBC.2015.7318340.
- Turi Z, Lenz M, Paulus W, Mittner M, Vlachos A. Selecting stimulation intensity in repetitive transcranial magnetic stimulation studies: a systematic review between 1991 and 2020. Eur J Neurosci. 2021:53(10):3404–3415.
- Turkeltaub PE, Coslett HB. Localization of sublexical speech perception components. Brain Lang. 2010:114(1):1–15.
- Turken AU, Dronkers NF. The neural architecture of the language comprehension network: converging evidence from lesion and connectivity analyses. Front Syst Neurosci. 2011:5:1. 10.3389/fnsys.2011.00001.
- Vaden KI, Muftuler LT, Hickok G. Phonological repetition-suppression in bilateral superior temporal sulci. NeuroImage. 2010:49(1):1018–1023.
- van Casteren M, Davis MH. Mix, a program for pseudorandomization. Behav Res Methods. 2006:38(4):584–589.
- Van Scherpenberg C, Just AC, Hauber RC. Check voice onset times from Chronset with Praat script [internet]. OSF; 2020. Available from: osf.io/fmwqb.
- Vitevitch MS. The influence of sublexical and lexical representations on the processing of spoken words in English. Clin Linguist Phon. 2003:17(6):487–499.
- Vitevitch MS, Luce PA. A web-based interface to calculate phonotactic probability for words and nonwords in English. Behav Res Methods Instrum Comput. 2004:36(3):481–487.
- Yamamoto AK, Parker Jones O, Hope TMH, Prejawa S, Oberhuber M, Ludersdorfer P, Yousry TA, Green DW, Price CJ. A special role for the right posterior superior temporal sulcus during speech production. NeuroImage. 2019:203:116184. 10.1016/j.neuroimage.2019.116184.
- Yi HG, Leonard MK, Chang EF. The encoding of speech sounds in the superior temporal gyrus. Neuron. 2019:102(6):1096–1110.
- Zatorre RJ, Belin P. Spectral and temporal processing in human auditory cortex. Cereb Cortex. 2001:11(10):946–953.