Journal of Speech, Language, and Hearing Research
2025 Dec 16;69(1):166–181. doi: 10.1044/2025_JSLHR-24-00872

The Role of Speech Reading During Visual Word Processing in Hearing Children: A Functional Magnetic Resonance Imaging Study

Anna Banaszkiewicz a,b, Neelima Wagley a,c, Clara Plutzer a,d, Rachael Rice a,e, James R. Booth a
PMCID: PMC12926796  PMID: 41401803

Abstract

Purpose:

Speech reading, or the ability to identify speech components from visual cues of the face, contributes to the development of phonological awareness, which in turn supports reading acquisition. The left superior temporal sulcus (STS) is a key region known to be involved in multisensory integration of speech stimuli. Previous studies have shown a behavioral relation between speech reading and reading skill and, separately, engagement of the STS in audiovisual integration for speech reading and word reading. No prior study has directly demonstrated that speech reading mechanisms in the STS are related to word reading skill.

Method:

In the current study, we evaluate the role of the left STS in 10- to 16-year-old hearing children (N = 39) during a speech reading task and during phonological processing of visual words to examine the extent to which the left STS is involved with reading skills.

Results:

Based on a series of preregistered and exploratory analyses, we report three main findings. First, the left STS, functionally localized using an independent speech reading task, was engaged during a visual word-rhyming task. Second, there was weak evidence that the activation of the left STS during word rhyming was related to word reading skills. Third, there was strong evidence that reading skill was more strongly related to phonological processing in the STS than to semantic processing.

Conclusion:

Our results suggest that better reading skill relies on more robust engagement of specific phonological mechanisms in the STS.

Supplemental Material:

https://doi.org/10.23641/asha.30767582


Human face-to-face spoken communication carries not only auditory but also visual sensory signals. Speech comprehension is a multimodal function dependent upon the perception and integration of auditory components of spoken words and visual speech gestures (e.g., movements of the lips and tongue; Fröhlich et al., 2019; Rosenblum, 2008). One of the widely studied examples of the integration of multimodal speech components is the “McGurk effect” (McGurk & MacDonald, 1976). This occurs when a syllable presented auditorily (e.g., “ba”) is paired with a video of a different syllable (e.g., “ga”), leading to the illusory perception of a third syllable (“da”). Perception and integration of audiovisual information has been consistently shown to facilitate speech intelligibility (see Grant & Bernstein, 2019, for review), and this ability emerges early in life (Gijbels et al., 2021; Kyle et al., 2013; Lalonde & Holt, 2015).

Previous research suggests that visual speech gestures conveyed by speech reading (SR) support the development of linguistic competencies including phonological representations (e.g., Jerger et al., 2018; Teinonen et al., 2008), which in turn can contribute to reading acquisition (Bradley & Bryant, 1983; Melby-Lervåg et al., 2012). The evidence for the relation between SR, phonological, and reading abilities has been primarily observed in studies with deaf and hard-of-hearing children (e.g., Buchanan-Worster et al., 2020; Kyle & Harris, 2010, 2011; Pimperton et al., 2019), suggesting a facilitatory role of visual speech when access to auditory input is reduced. However, studies of hearing children indicate that SR also contributes to linguistic development when auditory information is fully accessible. For instance, in their longitudinal experiment, Kyle and Harris (2011) found a strong positive association between SR abilities in 4- to 5-year-old hearing children and their phonological awareness assessed 2 years later using a picture task that measured alliteration and rhyme similarity judgment skill. Buchanan-Worster et al. (2020) reported that a concurrent relation between SR and single-word reading in 5- to 8-year-old hearing children was mediated by phonological awareness, measured as phoneme deletion skill. Finally, a recently published study of hearing 6- to 7-year-old participants investigated the associations between SR, word reading, and a variety of measures tapping into phonological skill (rhyme and alliteration awareness, nonword reading, and rapid automatized naming; Kyle & Trickey, 2024). The authors found that SR and phonological measures were positively correlated regardless of the type of task. They also reported a significant relation between SR and word reading skill; however, a subsequent multiple regression analysis did not indicate that SR skill served as a unique predictor of the word reading score beyond the contribution of the phonological measures (entered in the analysis as a composite score). Furthermore, by analyzing errors during SR, the authors found that better speech readers use phonological strategies while performing the SR task. In summary, existing behavioral research provides some evidence for the association between SR and various aspects of phonological skill in hearing children. However, it should be noted that some studies have failed to find a positive relationship between SR and phonology (see Harris et al., 2017; Kyle & Harris, 2006; Tye-Murray et al., 2014).

Neuroimaging studies indicate that a key region for multisensory integration for nonspeech (e.g., objects) and speech stimuli is the left superior temporal sulcus (STS; Bernstein & Liebenthal, 2014; Calvert, 2001; Gao et al., 2023; Nath et al., 2011; Stevenson & James, 2009; Venezia et al., 2017). Located between the auditory cortex in the superior temporal gyrus (STG) and visual cortex in the posterior lateral temporal cortex, the STS contains populations of topographically organized neurons that respond to auditory, visual, and audiovisual stimuli (Beauchamp et al., 2004; see also Bernstein & Liebenthal, 2014, for review). One framework for the neurobiology of auditory speech processing—the dual route model by Hickok and Poeppel (2007)—holds that together with the STG (which receives the initial auditory signal), the bilateral STS is involved in speech processing by subserving access to phonological representations. This sublexical processing further diverges into two pathways: the ventral stream in the temporal lobe provides lexical and conceptual-semantic access, whereas the dorsal stream into the parietal lobe enables sensory-motor integration that underlies mapping between speech sounds and motor representations in the frontal cortex. A number of studies with adult participants have linked the left STS with sublexical processing of speech in the auditory (e.g., Liebenthal et al., 2005; Scharinger et al., 2016; Vaden et al., 2010; Venezia et al., 2017), visual (SR; e.g., Okada & Hickok, 2009; Venezia et al., 2017), and audiovisual (e.g., Beauchamp et al., 2010; Nath & Beauchamp, 2012; Venezia et al., 2017; see also meta-analyses: Erickson et al., 2014; Gao et al., 2023, 2024; Liebenthal et al., 2014; Scheliga et al., 2023; Turkeltaub & Coslett, 2010) domains. Moreover, the STS has been indicated as a convergence region for auditory (spoken) and visual (written) phonological processing, using both low-level stimuli such as single phonemes and letters (van Atteveldt et al., 2004) and higher-level stimuli such as intelligible and unintelligible spoken and written narrative segments (Wilson et al., 2018). In summary, previous research with adults consistently points to the multimodality of the left STS and its crucial role in various linguistic processes, including phonology (Oron et al., 2016).

Although the developmental literature is limited, there is some evidence suggesting that the STS in children is involved in audiovisual integration similar to what is observed in adults (e.g., see Nath et al., 2011). As with speech comprehension, reading development is also an intrinsically multimodal process—fluent reading requires learning the associations between letters and their corresponding auditory phonological representations. McNorgan et al. (2013) used a rhyming judgment task in unimodal (e.g., word pairs presented visually or auditorily) and cross-modal (e.g., auditory followed by visual) conditions during functional magnetic resonance imaging (fMRI). They examined the relations between phonological awareness (measured with an elision task) and congruence of pairs of stimuli (overlapping orthography and phonology vs. overlapping phonology only) in typically developing readers and those who had a reading difficulty (Mage = 11 years). They found a positive correlation between the elision score and sensitivity to the cross-modal congruency effect within the left STS in the group of typically developing readers. A subsequent study by McNorgan et al. (2014) examined the relationship between cross-modal congruency effects within the temporal lobe and literacy (measured as reading and spelling). Sensitivity to the cross-modal congruency effect in the planum temporale, located in the STG, was related to reading skill; however, they did not find a similar association for the STS. The authors suggest that while the left posterior STS might subserve successful initial integration at smaller grain sizes (e.g., phonemes), the planum temporale might be involved in maintaining phonological representations, especially during reading of more phonologically complex items (e.g., word pairs).

Although there is research showing a behavioral relation between SR and reading skill, perhaps through phonological processing, and there is research showing that the STS is involved in audiovisual integration for SR and word reading, previous work has not directly demonstrated that SR mechanisms in the STS are related to word reading skill. The current study directly evaluates the role of the left STS in 10- to 16-year-old hearing children during phonological processing of visual words and the extent to which the left STS is involved with reading skills.

In an fMRI scanner, we used an SR localizer task to measure visual-to-phonological mapping, which requires converting mouth movements into likely phonemes. Using a standardized behavioral measure, we assessed phonological skill—awareness of the sounds in spoken words and the ability to manipulate those sounds. We then correlated the phonological awareness measure with SR activation to more precisely identify voxels in the STS involved in phonological processing. Next, we examined how these identified voxels were engaged in an in-scanner visual word-rhyming task that requires grapheme-to-phoneme mapping and whether these voxels were related to individual differences in a standardized behavioral measure of reading skill. We further investigated whether the activation in the left STS and its association with reading skill are selective to phonological processing in comparison to semantic processing. Finally, in a behavioral analysis, we investigated whether SR skill is positively associated with phonological skill. We hypothesized that (a) activation within the left STS, localized using an SR task, will be observed during a visual word-rhyming task; (b) the strength of activation within the left STS during the visual word-rhyming task will be positively associated with word reading skill; (c) the correlation between the strength of activation within the left STS and word reading skill will be stronger for the visual word-rhyming task than for the meaning task; and (d) there will be a positive correlation between SR skill and phonological skill, but not between SR skill and semantic skill.

Method

The study hypotheses and analytical plan were preregistered through the Open Science Framework (OSF) after data cleaning but prior to analyses (https://doi.org/10.17605/OSF.IO/8CHUK). This study analyzes data from a larger neuroimaging dataset on children and adolescents. The dataset, including participants' structural and functional MRI scans, questionnaire data, standardized behavioral scores, and experimental materials, is publicly shared on OpenNeuro (see Wang et al., 2025, for details; https://doi.org/10.18112/openneuro.ds006239.v1.0.2). The code used for data analysis as well as the list of participants included in the current study from the larger dataset is available on OSF (https://osf.io/zx2y9/; see "Data Analysis Scripts" and "Behavioral Data" folders).

Participants

All participants were recruited by September 1, 2023. Forty-six children with complete standardized assessment data and two full runs of the three experimental fMRI tasks were originally selected for analysis. Seven participants were excluded for not meeting the inclusion criteria (see details below). Thus, data from 39 children (20 girls, 19 boys) were analyzed in the final sample (Mage = 12.6, SDage = 2.0, range: 10.1–16.9 years old on the day of the reading skill assessment; see Table 1). All participants were recruited from the greater Nashville, Tennessee, area. The experimental procedures were approved by the institutional review board of Vanderbilt University (Project #171739). Informed consent was collected from participants' parents or guardians, and assent was collected from children before participation in the study. We collected the following demographic and background information from participants' parents: maternal education background (n = 1, had a 2-year/associate's degree; n = 20, held a bachelor's degree or higher; n = 5, chose not to report) and household income (n = 1, $25,000–$50,000; n = 1, $50,000–$75,000; n = 3, $75,000–$100,000; n = 14, more than $100,000; n = 7, did not report). This indicates that most participants in our sample came from households where maternal education was higher than national U.S. norms (Osterman et al., 2024), and most participants in our sample fell above the national median household income of ~$75,000 (Osterman et al., 2024; U.S. Census Bureau, 2024).

Table 1.

Participant (N = 39) demographics and performance on standardized assessments of nonverbal IQ, phonological awareness, and word reading skill.

Variable M (SD)
Age in years 12.70 (2.06)
Gender 20 females, 19 males
KBIT-2, Matrices* 113.56 (13.44)
CTOPP-2, Elision** 10.25 (2.13)
WJ-III, Letter–Word Identification* 111.64 (12.90)

Note. KBIT-2 = Kaufman Brief Intelligence Test–Second Edition, Matrices subtest; CTOPP-2 = Comprehensive Test of Phonological Processing–Second Edition, Elision subtest; WJ-III = Woodcock-Johnson Test of Academic Achievement–Third Edition, Letter–Word Identification subtest.

* Standard score.
** Scaled score.

The inclusionary criteria were as follows: (a) primarily right-handed, assessed using five actions (writing, drawing, picking up, opening, and throwing), with a score ≥ 3 indicating right-handedness; (b) native English speaker, as reported in the parent questionnaire; (c) no clinical diagnosis of neurological, psychiatric, or developmental disorders, as reported in the parent questionnaire; (d) normal hearing and normal or corrected vision, as reported in the parent questionnaire; (e) nonverbal IQ standard score of 80 or higher, assessed using the Matrices subtest of the Kaufman Brief Intelligence Test–Second Edition (KBIT-2; Kaufman & Kaufman, 2004; n = 2 excluded); (f) complete standardized assessment data; (g) complete data for the fMRI experimental tasks; (h) acceptable in-scanner task performance as defined in the task details section below (n = 1 excluded); and (i) acceptable in-scanner motion for all fMRI experimental tasks as defined in the fMRI preprocessing section below (n = 4 excluded).

Behavioral Standardized Assessments

The Matrices subtest of the KBIT-2 (used to measure nonverbal intelligence) measures the ability to solve new problems, perceive relationships, and complete visual analogies without testing vocabulary or language skill. Standard scores were calculated from 46 test items (M = 113.6, SD = 13.4, range: 81–140).

Phonological skill was assessed using the Elision subtest of the Comprehensive Test of Phonological Processing–Second Edition (CTOPP-2; Wagner et al., 2013). This subtest measures the ability to remove phonological segments from spoken words to form other words (e.g., Say “spider.” Now say “spider” without saying “der”; correct answer: “spy”). We selected this subtest as a widely accepted measure of phonemic awareness, which is a predictor of individual differences in reading development (Barbosa-Pereira et al., 2020; Bradley & Bryant, 1983; Melby-Lervåg et al., 2012). Scaled scores were calculated from 34 test items.

Word reading skill was assessed using the Letter–Word Identification subtest of the Woodcock-Johnson Test of Academic Achievement–Third Edition (WJ-III; Wendling et al., 2009). This subtest measures the ability to correctly read single words aloud from an increasingly difficult list. Standard scores were calculated from 76 test items.

fMRI Tasks and Stimuli

SR Task

Participants performed three in-scanner fMRI tasks. First, a silent SR judgment task was implemented as a functional localizer to identify a region of interest (ROI) within the left STS (see Figure 1). Participants were presented with two sequential silent videos of a man or woman saying a single monosyllabic word and were asked to indicate whether the pairs of words were similar (see Table 2). The SR task included three experimental conditions in which pairs of words had the same initial consonant and vowel (onset; e.g., "bat" and "bad"; 12 trials), the same final consonant and vowel (offset; e.g., "bid" and "rid"; 12 trials), or did not share any phonemes (different; e.g., "zip" and "bun"; 12 trials).

Figure 1.


Defining the region of interest in the left superior temporal sulcus (STS; red) as the intersection between the left superior temporal gyrus (STG; blue) and the left middle temporal gyrus (MTG; green) for the primary analyses. In yellow is the mask of the top 1,000 voxels within the left STS showing the strongest correlation between task activation during speech reading (lexical > fixation) and phonological awareness measured using CTOPP-2 Elision.

Table 2.

Task conditions and participant accuracy (percent correct) for the in-scanner tasks.

Condition Response Example M (SD) Range
Speech reading
Offset Yes Bid–Rid 72.6 (23.2) 33–100
Onset Yes Bat–Bad 84.8 (20.1) 38–100
Different No Sit–Hug 87.9 (11.0) 67–100
Perceptual control Yes Two face photos of a woman with long hair wearing a black shirt. 95.9 (8.9) 75–100
No Two face photos of a woman with long hair wearing a black shirt; each photo shows a different facial expression. 95.7 (9.0) 75–100
Fixation control Yes 2 green plus signs or 2 red plus signs. 96.3 (8.3) 75–100
No Green and red plus signs or red and green plus signs. 93.7 (11.6) 67–100
Phonology
O+P+ (Rhyming) Yes Goat–Float 91.8 (14.6) 56–100
O−P+ (Rhyming) Yes Roast–Ghost 88.7 (14.4) 50–100
O+P− (Non-rhyming) No Pint–Mint 75.1 (23.8) 4–96
O−P− (Non-rhyming) No June–Loop 89.9 (13.6) 61–100
Perceptual control Yes The image displays 8 identical symbols. A hyphen is between the fourth and fifth symbol. 95.9 (10.3) 72–100
No The image displays 2 sets of symbols. The first set consists of 4 identical symbols. The second set also has 4 identical symbols. A hyphen is between the two sets. 95.3 (10.9) 67–100
Fixation control Yes 2 green plus signs or 2 red plus signs. 95.1 (12.1) 67–100
No Green and red plus signs or red and green plus signs. 95.1 (13.7) 61–100
Semantic
High association (Related) Yes Little–Big 95.5 (11.6) 47–100
Low association (Related) Yes Water–Drink 95.5 (8.5) 69–100
Unrelated No Corn–Blame 96.9 (5.5) 88–100
Perceptual control Yes The image displays 8 identical symbols. A hyphen is between the fourth and fifth symbol. 95.4 (9.5) 67–100
No The image displays 2 sets of symbols. The first set consists of 4 identical symbols. The second set also has 4 identical symbols. A hyphen is between the two sets. 98.9 (4.1) 89–100
Fixation control Yes 2 green plus signs or 2 red plus signs. 96.9 (7.5) 83–100
No Green and red plus signs or red and green plus signs. 96.9 (8.4) 75–100

Words for the SR task were selected to exclude glide consonants (/w/ and /y/) as well as past tense verbs due to their phonological complexity and irregularity in English. Both within-condition (prime and target) and between-condition (onset, offset, and different) lists were matched for part of speech, written log frequency, orthographic neighborhood, phonological neighborhood, and bigram sum and mean, based on characteristics obtained from the English Lexicon Project (Balota et al., 2007; http://elexicon.wustl.edu/). The SR task also included two control conditions: a perceptual control and a fixation control. In the perceptual control condition, participants indicated whether two nonlinguistic mouth movements matched (12 trials; e.g., tongue protrusion toward the tip of the nose, puffing the cheeks out, a quick and repetitive teeth-clenching motion with the mouth open). In the fixation control condition, participants indicated whether two green or red fixation crosses matched in color (12 trials).

Stimulus videos were presented for 1,200 ms, followed by a 200-ms blank interstimulus interval and a second 1,200-ms video stimulus. The response window showed a black fixation cross and varied from 2,500 to 3,500 ms. An additional blue fixation cross then appeared for 500 ms to indicate that the next trial would begin soon. Participants were instructed to use their right index finger to respond "yes" to linguistically similar or perceptually matching trials and their right middle finger to respond "no" to linguistically nonsimilar or perceptually nonmatching trials. All trials were pseudorandomized (i.e., presented in a fixed pseudorandom order) and evenly split into two fMRI runs, with each run lasting 4:54 min over 150 volumes.

Phonology and Semantic Tasks

Second, participants performed two visual word reading tasks assessing phonological and semantic processing. Timings for both word reading tasks were as follows: The first word was presented for 800 ms, followed by a 200-ms blank interstimulus interval and a second 800-ms word stimulus. The response window showed a black fixation cross and varied from 2,500 to 3,500 ms, with a third of trials at each of the intertrial intervals: 2,500, 3,000, and 3,500 ms. An additional blue fixation cross was shown for 500 ms to indicate that the next trial would begin soon. Participants were instructed to use their right index finger to respond "yes" to linguistically matching (rhyming/semantically associated) and perceptually matching trials and their right middle finger to respond "no" to linguistically nonmatching (non-rhyming/unassociated) and perceptually nonmatching trials.

In the phonology task, participants determined whether two sequentially presented monosyllabic words rhymed. This task included four experimental conditions with 24 trials in each condition (see Table 2). Half of all the word pairs rhymed and half did not. For the rhyming conditions, the two words had either similar orthography and similar phonology (O+P+; e.g., "goat" and "float") or different orthography and similar phonology (O−P+; e.g., "roast" and "ghost"). For the non-rhyming conditions, the two words had either similar orthography and different phonology (O+P−; e.g., "pint" and "hint") or different orthography and different phonology (O−P−; e.g., "june" and "loop"). All words were non-homophonic and monosyllabic and were matched across conditions and within word pairs (prime, target) for written frequency in children (Plaut et al., 1996; Zeno et al., 1995). Friends (number of rhymes that were spelled the same or pronounced the same), orthographic enemies (number of different spellings for the same rhyme), and phonological enemies (number of different pronunciations for the same rhyme) were also matched across word pairs based on the English Lexicon Project database (Balota et al., 2007). The rhyme judgment task also included two control conditions: a false-font perceptual control condition, in which participants indicated whether pairs of visually presented symbols matched (12 trials), and a fixation control condition as described above (12 trials). All trials were evenly split into two runs, balanced for items per condition, with each run lasting 6:22 min over 195 volumes.

In the semantic task, participants determined whether two sequentially presented words were semantically related. This meaning judgment task included three experimental conditions with 24 trials in each condition (see Table 2). The high association condition was defined as word pairs having a strong semantic relation (e.g., "little" and "big"), with an association strength between 0.36 and 0.77 (M = 0.60, SD = 0.11). The low association condition was defined as word pairs having a weak semantic relation (e.g., "lettuce" and "salad"), with an association strength between 0.14 and 0.60 (M = 0.30, SD = 0.11). Seven word pairs from the low association condition overlapped in association values (> 0.36) with the high association condition. The unrelated condition was defined as word pairs that shared no semantic relation (e.g., "corn" and "blame"). Word-pair semantic relations were based on free association values (USF Free Association Norms; Nelson et al., 2004). All words were non-homophonic, contained two syllables or fewer, and were matched by condition (related, unrelated) for written word frequency in children (Zeno et al., 1995). The correlations of word frequency with association strength were not significant, indicating that any association effects should not be due to frequency differences. The meaning judgment task also included the same two control conditions as in the rhyming task. All trials were evenly split into two fMRI runs, balanced for items per condition, with each run lasting 5:19 min over 163 volumes.

Only participants who completed both runs of all three tasks within 6 months, achieved acceptable accuracy, and showed no response bias were included in the final analysis. Acceptable accuracy was defined as at least 50% in the onset condition for the SR task, the O+P+ condition in the phonology task, the high association condition in the semantic task, and the fixation and perceptual control conditions in all three tasks. Response bias was defined as a greater than 50% difference in accuracy between the onset and different conditions for the SR task, the O+P+ and O−P− conditions for the rhyming task, and the high association and unrelated conditions for the meaning task.
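These criteria can be summarized in a minimal Python sketch; the condition labels and input format are hypothetical (the study's actual analysis scripts are available on OSF).

```python
# A minimal sketch of the accuracy and response-bias inclusion checks
# described above. Condition labels are hypothetical shorthand for the
# conditions named in the text, not taken from the study's code.

def passes_inclusion(acc: dict) -> bool:
    """acc maps condition label -> percent correct (0-100) for one participant."""
    # At least 50% accuracy in the key "yes" condition of each task and in
    # the fixation and perceptual control conditions of all three tasks.
    accuracy_ok = all(
        acc[c] >= 50.0
        for c in ["sr_onset", "phon_OpPp", "sem_high",
                  "sr_fix", "sr_percept", "phon_fix", "phon_percept",
                  "sem_fix", "sem_percept"]
    )
    # Response bias: no more than a 50-point accuracy gap between the
    # paired "yes" and "no" conditions within each task.
    unbiased = all(
        abs(acc[yes] - acc[no]) <= 50.0
        for yes, no in [("sr_onset", "sr_different"),
                        ("phon_OpPp", "phon_OmPm"),
                        ("sem_high", "sem_unrelated")]
    )
    return accuracy_ok and unbiased
```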

fMRI Data Acquisition

Images were acquired using a Philips Ingenia Elition 3.0T X scanner with a 32-channel head coil. Participants lay supine in the scanner with a response button box placed in their right hand. Visual stimuli were projected onto a screen and viewed via a mirror attached to the head coil. The blood oxygen level–dependent signal was measured using a susceptibility-weighted single-shot echo planar imaging (EPI) method. Functional images were acquired with multiband EPI using the following parameters: echo time (TE) = 30 ms, flip angle = 72°, matrix size = 96 × 94, field of view (FOV) = 216 mm², slice thickness = 2.25 mm, number of slices = 54, repetition time (TR) = 2,000 ms, multiband acceleration factor = 24, voxel size = 2.25 × 2.25 × 2.25 mm. A high-resolution T1-weighted (T1w) MPRAGE scan was acquired prior to functional image acquisition with the following scan parameters: TR = 1,900 ms, TE = 4.60 ms, flip angle = 8°, matrix size = 256 × 256, FOV = 256 mm², slice thickness = 1 mm, number of slices = 192.

fMRI Data Analysis

Preprocessing

fMRI data were analyzed using SPM12 (http://www.fil.ion.ucl.ac.uk/spm) in MATLAB R2018b (The MathWorks Inc.). First, for all participants, functional images were realigned to their mean functional image across runs and motion corrected. The anatomical T1w image was segmented and warped to a tissue probability map template to obtain the transformation field. An anatomical brain mask was created by combining the segmented products (i.e., gray matter, white matter, and cerebrospinal fluid) and then applied to the original anatomical image to produce a skull-stripped anatomical image. All functional images, including the mean functional image, were coregistered to the skull-stripped anatomical image. Functional images were normalized to the Montreal Neurological Institute (MNI) space by applying the transformation fields acquired from T1w images, with a voxel size of 2 × 2 × 2 mm. All the normalized functional images were smoothed with a 6-mm isotropic Gaussian kernel. ArtRepair (Mazaika et al., 2007) was used to identify outlier volumes among the functional images. Outlier volumes were defined as volumes exceeding 1.5-mm volume-to-volume head movement in any direction, head movement greater than 5 mm in any direction from the mean functional image, or deviations of more than 4% from the global mean signal. Outlier volumes were repaired using interpolated values of the adjacent nonoutlier volumes. All participants included in the final analyses had less than 10% of outlier volumes and no more than six consecutive volumes repaired within each run.
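For readers unfamiliar with ArtRepair, the three outlier criteria above can be sketched as follows. This is a schematic NumPy re-implementation with our own variable names, not the toolbox the study actually used.

```python
import numpy as np

def flag_outlier_volumes(motion: np.ndarray, global_signal: np.ndarray) -> np.ndarray:
    """motion: (n_volumes, 6) realignment parameters, with x, y, z translations
    in mm in the first three columns (rotations are ignored in this sketch);
    global_signal: (n_volumes,) mean signal per volume.
    Returns a boolean vector marking volumes to repair."""
    trans = motion[:, :3]
    # (a) > 1.5-mm volume-to-volume movement in any direction
    step = np.abs(np.diff(trans, axis=0)).max(axis=1)
    fast = np.concatenate(([False], step > 1.5))
    # (b) > 5-mm movement in any direction from the mean position
    drift = np.abs(trans - trans.mean(axis=0)).max(axis=1) > 5.0
    # (c) > 4% deviation from the global mean signal
    dev = np.abs(global_signal - global_signal.mean()) > 0.04 * global_signal.mean()
    return fast | drift | dev

# Runs with > 10% flagged volumes, or more than six consecutive repaired
# volumes, led to participant exclusion.
```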

Task-Based Activation Contrasts

Statistical analyses at the first level on individual participants' data and the group-level second-level analyses were performed using a general linear model. For each run of each task, the first-level model included trial timings of all conditions (onset, offset, different, and both control conditions in the SR localizer task; rhyming, non-rhyming, and both control conditions in the phonology task; and high, low, unrelated, and both control conditions in the semantic task) as regressors of interest and the six head movement parameters obtained from the realignment step (x, y, z translations and pitch, roll, yaw rotations) as regressors of no interest (repaired volumes were deweighted). The data, convolved with the hemodynamic response function, were high-pass filtered with a cutoff frequency of 1/128 Hz (i.e., a 128-s period).

In the preregistered primary analyses, brain activation for each task was evaluated using the respective contrasts of interest: [(Offset & Onset) > Fixation] for SR, [Rhyming (O+P+ & O−P+) > Fixation] for the phonology task, and [Related (High & Low Association) > Fixation] for the semantic task. To assess the difference in activation between the phonology and semantic tasks, we used the [Rhyming (O+P+ & O−P+) > Fixation] − [Related (High & Low) > Fixation] and [Related (High & Low) > Fixation] − [Rhyming (O+P+ & O−P+) > Fixation] contrasts. In the preregistered secondary analyses, we duplicated the primary analyses using a higher-level baseline for each task: the perceptual control conditions. For ease of comprehension, the phonology task conditions O+P+ and O−P+ analyzed together will be referred to as the "rhyming" condition, and similarly, the semantic task conditions of high and low association analyzed together will be referred to as the "related" condition.
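As an illustration, these contrasts amount to simple weight vectors over the first-level condition regressors. The regressor ordering in the sketch below is our assumption, not taken from the study's SPM batch.

```python
# Illustrative contrast weights for the phonology task, assuming the
# first-level regressors are ordered as (our assumption):
# [O+P+, O-P+, O+P-, O-P-, perceptual, fixation]
rhyming_gt_fixation   = [0.5, 0.5, 0, 0,  0, -1]  # primary analyses
rhyming_gt_perceptual = [0.5, 0.5, 0, 0, -1,  0]  # secondary analyses
# The phonology-versus-semantics comparison is then computed voxel by voxel
# on the resulting contrast images from the two tasks:
# (Rhyming > Fixation) minus (Related > Fixation), and the reverse.
```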

Localizing Phonological Processing in the Left STS

The focus of this study was on the relation of phonological processing to word identification. To localize phonological processing, we correlated phonological awareness to activation in the left STS during an SR task. To test our primary hypotheses, we defined the ROI in the following steps (see Figure 1). First, the anatomical mask of the left STS was created using the automated anatomical labeling (AAL) atlas integrated in the WFU PickAtlas toolbox (Maldjian et al., 2003). The anatomical left STS was constructed by isolating the intersection between AAL definitions of the left STG and the left middle temporal gyrus, each 3D dilated with a factor of 2. Second, we computed a voxel-wise regression analysis within the left STS anatomical mask to evaluate the correlation between brain activation during SR [(Offset & Onset) > Fixation] and the Elision subtest (scaled scores). In the secondary analyses, we used the [(Offset & Onset) > Perceptual] contrast for this voxel-wise regression. Lastly, for all subsequent analyses, the top 1,000 voxels showing the highest correlation regardless of significance were selected and used as the STS ROI, separately, for the primary and secondary analyses (see Figure 1 and Supplemental Materials S3 and S4).
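Schematically, the voxel-selection step can be expressed as follows. This is a NumPy sketch under the assumption that subject-level SR contrast estimates have been extracted into a matrix; the study performed the voxel-wise regression in SPM12.

```python
import numpy as np

def top_k_roi(betas: np.ndarray, elision: np.ndarray, mask: np.ndarray, k: int = 1000) -> np.ndarray:
    """betas: (n_subjects, n_voxels) speech reading contrast estimates;
    elision: (n_subjects,) CTOPP-2 Elision scaled scores;
    mask: (n_voxels,) boolean anatomical left STS mask.
    Returns a boolean ROI mask of the k most correlated voxels."""
    assert mask.sum() >= k, "anatomical mask must contain at least k voxels"
    z = (elision - elision.mean()) / elision.std()
    b = (betas - betas.mean(axis=0)) / (betas.std(axis=0) + 1e-12)
    r = b.T @ z / len(z)              # voxel-wise Pearson correlation
    r[~mask] = -np.inf                # restrict selection to the STS
    roi = np.zeros(mask.shape, dtype=bool)
    roi[np.argsort(r)[-k:]] = True    # top k, regardless of significance
    return roi
```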

Preregistered Primary and Secondary fMRI Analyses

To investigate whether SR-related voxels were engaged during a visual word-rhyming task, we computed a one-sample t test for the phonology task (Rhyming > Fixation) within the STS ROI. To investigate whether there was an association between word reading skill and the strength of activation within the left STS during the word-rhyming task, we computed a voxel-wise regression analysis during the phonology task (Rhyming > Fixation) and standardized scores on the Letter–Word Identification subtest. In this analysis, we controlled for participants' in-scanner task performance by including it as a covariate of no interest, measured as a mean score for the two rhyming conditions across both runs. To test whether the association between the strength of activation in the left STS ROI and word reading skill would be selective for phonological processing, we computed two voxel-wise regression analyses. We used the [(Phonology: Rhyming > Fixation) − (Semantic: Related > Fixation)] and [(Semantic: Related > Fixation) − (Phonology: Rhyming > Fixation)] contrasts while controlling for participants' in-scanner task performance as covariates of no interest. In-scanner task performance was measured as a mean score of the rhyming conditions for the phonology task and a mean score of the related conditions for the semantic task. Additionally, we duplicated all the above analyses in a set of secondary analyses using the perceptual control conditions in lieu of the fixation control conditions.

Preregistered Planned Behavioral Analyses

To investigate whether the positive relation between accuracy in the SR task—which involves phonological processing—and accuracy in the word reading tasks is specific to the visual rhyming task, and not evident in the semantic task, we ran two correlation analyses. Due to the non-normal distribution of the data, we performed a Spearman correlation of the in-scanner task accuracy for SR, measured as a mean score for the matching conditions (onset and offset), with the in-scanner task accuracy for phonology, measured as a mean score for the rhyming conditions. We repeated the same correlation of task accuracy for SR with task accuracy for semantics, measured as a mean score for the related conditions. Second, we compared the correlation coefficients across the two word reading tasks. To do so, we used the online quantpsy tool with a Fisher's r-to-z transformation to test the difference between two dependent correlations with one variable in common (Lee & Preacher, 2013; see also Steiger, 1980).
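For reference, the test implemented by the quantpsy calculator can be sketched as a minimal Python function: Steiger's (1980) Z, using Fisher's r-to-z transformation and Steiger's Equation 10 estimate of the dependency between the two correlations. Applying it to the rounded coefficients reported in the Results reproduces the reported Z values up to rounding of the inputs.

```python
import math

def steiger_z(r12: float, r13: float, r23: float, n: int) -> float:
    """Steiger's (1980) Z for two dependent correlations sharing one variable:
    compares r12 (e.g., SR-phonology) against r13 (e.g., SR-semantics),
    given r23 (phonology-semantics) and sample size n."""
    z12, z13 = math.atanh(r12), math.atanh(r13)  # Fisher r-to-z
    rbar = (r12 + r13) / 2
    # Correlation between the two dependent correlations (Steiger's Eq. 10)
    psi = r23 * (1 - 2 * rbar**2) - 0.5 * rbar**2 * (1 - 2 * rbar**2 - r23**2)
    c = psi / (1 - rbar**2) ** 2
    return math.sqrt(n - 3) * (z12 - z13) / math.sqrt(2 - 2 * c)

# Full-sample comparison from the Results (rounded inputs): Z ~ 0.84, close
# to the reported Z = .82, which was computed from unrounded coefficients.
print(round(steiger_z(0.50, 0.37, 0.43, n=39), 2))
```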

Results

Behavioral Task Performance

Distribution of scores on the phonological awareness and word reading standardized behavioral measures are shown in Figure 2. Overall, participants scored average to above average on the CTOPP-2 Elision (M = 10.3, SD = 2.1, range: 5–14 scaled score) and the WJ-III Letter–Word Identification (M = 111.6, SD = 12.9, range: 73–114 standard score). In-scanner task accuracies for the SR, phonology, and semantic tasks are reported in Table 2. Distribution of task performance on the conditions of interest for the experimental tasks is plotted in Supplemental Material S6.

Figure 2.


Distribution of task performance on the CTOPP-2 Elision (A) and WJ-III Letter–Word Identification (ID; B) standardized assessments. Red dots indicate mean values.

Preregistered Primary fMRI Analysis

Results from the preregistered primary analyses are shown in Figure 3. We first examined whether SR-related voxels were also engaged during the visual word-rhyming task. Within the left STS ROI, there was significant functional activation for rhyming words compared to the fixation baseline (see Table 3, Hypothesis 1). However, results from a voxel-wise regression analysis showed no correlation between the strength of the STS ROI engagement in the word-rhyming task and word reading skill as measured behaviorally outside of the scanner, controlling for word-rhyming task accuracy (see Table 3, Hypothesis 2). Lastly, we examined whether the strength of the activation in the STS ROI, as associated with word reading skill, was selective for phonological processing in comparison to semantic processing. Results of two voxel-wise regression analyses indicated no significant differences for the Rhyming > Related and Related > Rhyming comparisons (see Table 3, Hypothesis 3).

Figure 3.


Results from the preregistered primary analyses using fixation as the baseline examining activation during the visual word rhyming task (left), its relation to word reading skill (middle), and activation for the comparison between word rhyming and meaning tasks in relation to reading skill (right). All analyses were performed within the superior temporal sulcus region of interest. FWEc = familywise error, cluster level.

Table 3.

Results from analyses within the superior temporal sulcus region of interest with coordinates based on the automated anatomical labeling atlas.

Region Extent t value x y z (MNI coordinates)
Preregistered primary analysis
Preregistered threshold p < .001 uncorrected, p < .05 FWEc, small volume corrected
 Hypothesis 1
  L middle temporal 295 6.1 −58 −30 0
  L middle temporal 5.4 −60 −40 8
 Hypothesis 2 —
 Hypothesis 3 —
Preregistered secondary analysis
Preregistered threshold p < .001 uncorrected, p < .05 FWEc, small volume corrected
 Hypothesis 1
  L middle temporal 246 7.0 −56 −40 8
  L middle temporal 5.1 −56 −32 2
  L middle temporal 4.3 −62 −52 8
 Hypothesis 2 —
 Hypothesis 3
  L middle temporal 121 5.3 −64 −24 −2
  L middle temporal 4.0 −66 −30 8
  L superior temporal 128 4.7 −48 −36 14
  L cerebral white matter* 4.2 −40 −38 6
  L middle temporal 4.1 −50 −40 0

Note. Em dashes indicate no suprathreshold clusters. MNI = Montreal Neurological Institute; FWEc = familywise error, cluster level; L = left.

* Region labels with asterisks were identified using the Neuromorphometrics, Inc., atlas implemented in SPM12.

After viewing the results of the planned primary analyses, we performed a set of exploratory analyses (not preregistered). First, we repeated the above analyses using a reduced significance threshold. Results are reported in Supplemental Material S1. Second, we agreed with an anonymous reviewer's comment that the lack of correlation between activation in the STS ROI during the word-rhyming task and word reading skill may not be independent of word-rhyming task accuracy (see Supplemental Material S8, panel B). Therefore, we computed two additional exploratory voxel-wise regression analyses: (a) to test the association between activation in the STS ROI during the word-rhyming task and word reading skill without controlling for word-rhyming task accuracy and (b) to test the association between activation in the STS ROI during the word-rhyming task and word-rhyming task accuracy. We observed no suprathreshold voxels at either p < .001 or p < .05 in either analysis.

Preregistered Secondary fMRI Analysis

Results from the preregistered secondary analyses are shown in Figure 4. We replicated all primary analyses using the higher-level perceptual baseline as the control condition for each task. Consistent with our previous result, there was significant functional activation for rhyming words compared to the perceptual baseline, further confirming the involvement of SR voxels within the left STS during a visual word-rhyming task. Also consistent with the results of the primary analysis, there was no correlation between the strength of the STS engagement during the word-rhyming task and word reading score measured behaviorally. Contrary to the primary analysis, this secondary analysis revealed a significant difference between phonological processing in comparison to semantic processing (but not for the reverse contrast), indicating selectivity for phonological processing within the STS ROI in association with word reading skill.

Figure 4.


Results from the preregistered secondary analyses using perceptual as the baseline examining activation during the visual word rhyming task (left), its relation to word reading skill (middle), and activation for the comparison between word rhyming and meaning tasks in relation to reading skill (right). All analyses were performed within the superior temporal sulcus region of interest. FWEc = familywise error, cluster level.

Preregistered Primary Behavioral Analyses

We observed a significant positive correlation between the in-scanner task accuracy for SR and the in-scanner task accuracy for phonology, ρ(37) = .50, p = .0011 (see Figure 5A), as well as between the task accuracy for SR and semantics, ρ(37) = .37, p = .019 (see Figure 5B). To compare the coefficients of two dependent correlations sharing one common variable, we additionally computed a third correlation between accuracies of the phonology and semantic tasks, ρ(37) = .43, p = .006. These coefficients were entered into the quantpsy calculator using the Fisher's r-to-z transformation (Lee & Preacher, 2013; see also Steiger, 1980). There was no significant difference between the coefficients for the SR and phonology correlation and the SR and semantic correlation (Z = .82, one-tailed p = .21).

Figure 5.


Scatter plots showing correlations between the in-scanner accuracy for the speech reading task and in-scanner accuracy for the phonology (A) and semantic (B) tasks using contrasts of interest. Assc. = association.

It should be noted that the accuracy scores of two participants were identified as outliers in the semantic task, Z = −3.32 (73%) and Z = −4.46 (65%; see Supplemental Material S6). Thus, we repeated the correlation analysis for SR and semantics after removing the two outliers. This exploratory analysis revealed no significant association between SR and semantic task accuracy, ρ(35) = .28, p = .098 (see Supplemental Material S7). Subsequently, we repeated the comparison between correlation coefficients after removing the two outliers. First, we recalculated the correlation between accuracies for the phonology and SR tasks, ρ(35) = .46, p = .0043. Second, we recalculated the correlation between accuracies for the phonology and semantic tasks, ρ(35) = .37, p = .023. The difference between the correlation coefficients, recalculated using the quantpsy calculator, remained nonsignificant (Z = 1.26, one-tailed p = .10).

Discussion

The current study examined whether the left STS was engaged during a visual word-rhyming task requiring phonological processing, whether engagement of the left STS during this task was related to reading skills measured behaviorally outside the scanner, and whether this relation was selective to phonological processing in comparison to a meaning task requiring semantic processing. A novel aspect of this study is that the STS was functionally localized by correlating phonological awareness skill with activation during an independent silent SR task. Across a set of preregistered and exploratory analyses, we observed results broadly in support of our hypotheses. First, we found strong evidence in the planned analyses that the left STS was engaged during a visual word-rhyming task. Second, there was some evidence in exploratory analyses that activation of the left STS during word rhyming was related to word reading skills. Lastly, there was strong evidence that reading skill was more strongly related to phonological processing in the STS during the word-rhyming task than to semantic processing during the meaning task. Together, these results suggest that better reading skill relies on more robust engagement of specific phonological mechanisms in the STS.

The most robust finding of the present study was that the same left STS voxels were engaged during SR and the visual word-rhyming task. This pattern of results was consistent when using a lexical versus lower-level fixation baseline contrast and when using a lexical versus higher-level perceptual baseline contrast. Overlapping engagement of the same voxels during SR and visual word rhyming suggests that both tasks were tapping into phonological processing mechanisms (Kyle & Harris, 2011; Kyle & Trickey, 2024). Based on the dual-stream model (Hickok & Poeppel, 2007), converging STS engagement indicates accessing phonological representations when analyzing visual speech gestures to detect spoken word differences and when recognizing sound differences in written words, both without any auditory speech input. Our results are in line with prior neuroimaging research showing engagement of the left STS during sublexical and lexical phonological processing across auditory, visual, and audiovisual domains in adults (Liebenthal et al., 2014; Turkeltaub & Coslett, 2010) and in children (McNorgan et al., 2013; Nath et al., 2011).

Functional localizers have previously been used in fMRI studies with children and adults to locate areas engaged in audiovisual integration during speech perception, specifically the STS (e.g., Nath et al., 2011; Szycik et al., 2008). A novel methodological contribution of this study was the use of an independent functional localizer task that was correlated with phonological awareness skill to delineate the key voxels of interest within the left STS. The left STS is a large region involved in multisensory processing. To focus on phonological mechanisms specifically, we correlated activation during the localizer task with scores on a standardized measure collected outside of the scanner. We then used the top 1,000 voxels that showed the highest correlation as the ROI in subsequent analyses. We used two different functional contrasts (higher- and lower-level control conditions) to localize these voxels for the primary and secondary analyses: lexical condition > fixation cross and lexical condition > perceptual condition. Based on visual inspection, there is large overlap in the STS voxels active during the SR task in relation to phonological awareness when assessed using the fixation and perceptual baselines (see Supplemental Material S5). This overlap confirms that activation patterns are tapping into linguistic processing and not solely sensory visual processing, and it instills confidence in the results of the primary and secondary analyses despite some spatial differences in voxel locations across them.

Activation of the left STS during visual word-rhyming judgments was not related to children's reading skills in the planned analyses. We observed no active clusters at the preregistered threshold in either the primary or secondary analyses. However, in exploratory analyses at a reduced threshold, we observed a cluster showing a correlation of reading skill to activation during the visual word-rhyming task. Prior research in hearing children ages 5–8 years shows a positive relation between SR and single-word reading (Buchanan-Worster et al., 2020; Kyle & Trickey, 2024), and this association is mediated by phonological awareness (Buchanan-Worster et al., 2020). Although the analysis testing our first hypothesis showed that the same voxels were engaged in both the SR and visual word-rhyming tasks, activation during the rhyming task was only weakly related to reading skill in our exploratory analyses.

We consider three possible explanations for these weak effects. First, reading acquisition involves multiple mechanisms when learning to map orthographic onto phonological representations. A neural model of audiovisual integration in reading suggests that, in addition to the STS, the left planum temporale and the fusiform gyrus are key regions involved (Blau et al., 2008; Hickok et al., 2009). A study by McNorgan et al. (2014) examined brain activation in children ages 8–14 years during a rhyming judgment task using word pairs where one word was presented auditorily followed by the second word presented visually. Activation in the temporal lobe during these cross-modal conditions was related to reading skill, but only in the planum temporale and not in the STS (McNorgan et al., 2014). These findings suggest that the left STS may be engaged initially in the integration of visual and phonological information at smaller units (e.g., grapheme-to-phoneme correspondence) followed by engagement of the planum temporale in processing larger grain sizes and more complex units such as rhyming word pairs. Thus, the weak relation between left STS engagement and reading skill may reflect this region's lower sensitivity to large-grain orthographic–phonological integration. Second, prior research on children with and without reading difficulties shows that the two groups have differential activation in the STS during cross-modal conditions (Blau et al., 2008; McNorgan et al., 2013). Similarly, in adults with dyslexia, Rüsseler et al. (2018) found less activation in the left STS in comparison to typical readers, suggesting a potential difficulty in recruiting audiovisual processing areas in lower skilled readers. Participants in our analyses were typically developing children with strong reading skills (the majority of standard scores were above 90). Thus, the weak findings in relation to the STS could also be due to the lack of variability in reading skill in the current sample. Finally, an anonymous reviewer suggested that the weak relationship between the STS activation and word reading skill could be influenced by rhyming task accuracy. A subsequent exploratory voxel-wise regression analysis did not reveal a significant association between the STS ROI activity and word reading skill when not controlling for rhyming accuracy. This contrasts with the original exploratory analysis, where a weak correlation was found when controlling for accuracy. It is important to emphasize that the original analyses were conducted using a liberal threshold, so the relation of reading skill to SR mechanisms is unclear. It is also possible that this reflects a statistical suppression effect (Martinez Gutierrez & Cribbie, 2021). Altogether, our findings will need to be replicated in the future with a group of children with greater variability in reading skills.

During word reading, both phonological and semantic representations of words may be automatically engaged, regardless of the task demands (e.g., Booth & Burman, 2005; Brozdowski & Booth, 2021; Cao et al., 2006, 2010; Joo et al., 2021). For example, in a study of hearing children with typical reading skills, SR has been shown to account for unique variance in reading accuracy, whereas vocabulary was predictive of reading accuracy and comprehension (Kyle et al., 2016). Thus, in our final set of analyses, we tested whether the relation of reading skill to activation of the STS was stronger for phonological processing during the rhyming task than for semantic processing during the meaning task. In the neuroimaging results, we observed a sizable cluster of activation related to reading skill for the rhyming compared to the meaning task and no clusters for the opposite comparison. This suggests that the relation between STS activation during the rhyming task and reading skill is selective to phonological processing in comparison to semantic processing. However, this result was only evident in the secondary analysis using the perceptual baseline condition. When using the fixation baseline condition, this effect was only present at the exploratory reduced threshold.

In the initial behavioral analysis, SR performance was positively related to both word-rhyming and word-meaning task performance, with no reliable difference in the strength of the correlation coefficients. This would suggest that those who were better at SR were also better at both the phonology and semantic judgments, which may be generally related to the overall strong reading skill in this sample. The subsequent reanalysis after removing outliers in the semantic task indicated that the association between performance in the SR and word-meaning tasks was driven by the participants with lower semantic scores. Thus, the behavioral results are at least partially in line with prior findings showing that better speech readers may use phonological strategies while performing SR tasks (Kyle & Trickey, 2024). However, given that the previous study used an SR task involving deciphering words, sentences, and short stories and that our results highlight the need for including participants with a greater range of reading skills, we emphasize that this finding also needs to be replicated by future studies. Considering the neuroimaging and behavioral results together, our findings suggest a trend toward a positive relation between SR and visual word rhyming, both engaging underlying phonological mechanisms.

Our results are in line with prior research showing that SR skill may account for individual differences in reading ability (e.g., Buchanan-Worster et al., 2020; Gullick & Booth, 2014; Hahn et al., 2014; Karipidis et al., 2021; Kyle & Harris, 2011; Žarić et al., 2014). All individuals, including hearing children and adults, those with hearing loss, and those who struggle with reading, rely on audiovisual information to support reading acquisition. Prior research suggests that SR may be particularly helpful for those with lower reading skills (e.g., dyslexia; Pennington & Bishop, 2009) or when there is limited access to auditory input (e.g., deaf and hard of hearing; Harris et al., 2017; Kyle & Harris, 2010; Pimperton et al., 2019). Francisco et al. (2017) found that SR was a unique contributor to variance in phonological awareness in dyslexic adult readers. Those who scored higher on SR also scored lower on phonological awareness, suggesting that increased reliance on visual speech may be a compensatory factor in dyslexia (Francisco et al., 2017). However, prior behavioral studies also show positive associations between SR abilities in hearing children and phonological awareness (Buchanan-Worster et al., 2020; Kyle & Harris, 2011; Kyle & Trickey, 2024). Through behavioral and neuroimaging evidence, our results add to this literature to suggest that SR skill may be a facilitatory mechanism for literacy development in hearing populations. However, more studies are needed to better understand links between SR skills and word reading ability. By incorporating both behavioral and neuroimaging approaches, researchers could further examine the individual differences in how SR supports reading acquisition.

Conclusions

Overall, our study adds to the growing body of evidence that better reading skill in hearing children is supported by more robust engagement of specific phonological mechanisms, including those conveyed by visual speech gestures. Moreover, our results are the first to show that this relation is supported by the left STS, a key region for multisensory integration of linguistic stimuli. These findings contribute to current knowledge of how reading abilities develop and how the neurobiological mechanisms of SR contribute to the improvement of these skills.

Author Contributions

Anna Banaszkiewicz: Conceptualization, Methodology, Data curation, Formal analysis, Visualization, Writing – original draft, Writing – review & editing. Neelima Wagley: Conceptualization, Methodology, Project administration, Investigation, Data curation, Formal analysis, Visualization, Writing – original draft, Writing – review & editing. Clara Plutzer: Investigation, Data curation, Resources, Writing – review & editing. Rachael Rice: Investigation, Data curation, Resources, Writing – review & editing. James R. Booth: Funding acquisition, Supervision, Conceptualization, Methodology, Data curation, Writing – review & editing.

Data Availability Statement

The data set, including participants' structural and functional MRI scans, questionnaire data, standardized behavioral scores, and experimental materials, is publicly shared on OpenNeuro (see Wang et al., 2025, for details; https://doi.org/10.18112/openneuro.ds006239.v1.0.2).
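As one illustration (not part of the article), OpenNeuro datasets can typically be fetched programmatically by accession number; the sketch below assumes the third-party openneuro-py package (pip install openneuro-py), with the accession ds006239 taken from the DOI above.

    # Anonymous download of the public dataset via openneuro-py (an assumption:
    # this helper package is not mentioned in the article itself).
    from openneuro import download

    download(dataset="ds006239", target_dir="ds006239")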

Supplementary Material

Supplemental Material S1. Results of the exploratory fMRI analysis at reduced significance threshold.
JSLHR-69-166-s001.pdf (581.1KB, pdf)
Supplemental Material S2. Results from exploratory analysis within the STS ROI with coordinates based on the AAL atlas. (*Region labels with asterisks were identified using the Neuromorphometrics, Inc. atlas implemented in SPM12).
JSLHR-69-166-s002.pdf (629.4KB, pdf)
Supplemental Material S3. Defining the region of interest in the left superior temporal sulcus (STS; red) for the secondary analyses. In yellow is the mask of the top 1000 voxels within the left STS showing the strongest correlation between task activation during speech reading (lexical>perceptual) and phonological awareness measured using CTOPP-2 Elision.
JSLHR-69-166-s003.pdf (642.7KB, pdf)
Supplemental Material S4. Results of the correlation between the speech reading task and phonological awareness used to identify the functional regions of interest. The top 1000 voxels within the STS anatomical mask (outlined in white) that were most correlated with the Elision subtest are shown. The color bar indicates beta values from the GLM regression in which Elision was entered as the covariate of interest. There were two main clusters, at MNI coordinates x = –56, y = –50, z = 18 (k = 969) and x = –40, y = 2, z = –18 (k = 31).
JSLHR-69-166-s004.pdf (700.2KB, pdf)
Supplemental Material S5. Visualization of the two STS regions of interest used in the primary and secondary analyses, respectively. The mask of the top 1000 voxels within the left STS showing the strongest correlation between task activation during speech reading and phonological awareness for the lexical>fixation contrast (light blue) and the lexical>perceptual contrast (light green).
JSLHR-69-166-s005.pdf (608.7KB, pdf)
Supplemental Material S6. Distribution of task performance on the conditions of interest for the Speech Reading (A), Phonology (B), and Semantic (C) in-scanner tasks. Red dots indicate mean values.
JSLHR-69-166-s006.pdf (549.5KB, pdf)
Supplemental Material S7. Correlation between in-scanner accuracy for the Speech Reading task and in-scanner accuracy for the Semantic task using the contrasts of interest, after removal of two outlier participants whose z scores on the Semantic task exceeded 3.
JSLHR-69-166-s007.pdf (531.1KB, pdf)
Supplemental Material S8. Correlations between word-reading skill and in-scanner accuracy for the speech reading (Panel A: rho = 0.34, p = 0.033), phonology (Panel B: rho = 0.55, p < 0.001), and semantic (Panel C: rho = 0.29, p = 0.074) tasks.
JSLHR-69-166-s008.pdf (617.9KB, pdf)
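For readers interested in the mechanics of the functional ROI definition described in Supplemental Materials S3 and S4 above, the following is a minimal Python sketch of selecting the top 1000 voxels within an anatomical mask, assuming a beta map from the speech reading–Elision regression and a left STS mask in the same voxel space (both file names are hypothetical).

    import numpy as np
    import nibabel as nib

    # Hypothetical inputs: the beta map from the speech reading ~ Elision
    # regression and the anatomical left STS mask, in the same voxel space.
    beta_img = nib.load("sr_elision_beta.nii.gz")
    mask = nib.load("left_sts_mask.nii.gz").get_fdata() > 0
    beta = beta_img.get_fdata()

    # Keep only in-mask voxels, then take the 1,000 with the largest betas.
    in_mask = np.where(mask, beta, -np.inf).ravel()
    top_idx = np.argpartition(in_mask, -1000)[-1000:]

    roi = np.zeros(beta.shape, dtype=np.uint8)
    roi.ravel()[top_idx] = 1  # binary ROI mask of the top 1,000 voxels
    nib.save(nib.Nifti1Image(roi, beta_img.affine), "sts_top1000_roi.nii.gz")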

Acknowledgments

The study was supported by the National Institutes of Health (R01 DC018171) awarded to James R. Booth. This article is published Open Access under a read and publish agreement between Vanderbilt University and the American Speech-Language-Hearing Association. The authors gratefully acknowledge all the participants.


References

1. Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445–459. 10.3758/BF03193014
2. Barbosa-Pereira, D., Martins, P. S., Guimarães, A. P., Silva, E. D. O., Batista, L. T., Haase, V. G., & Lopes-Silva, J. B. (2020). How good is the phoneme elision test in assessing reading, spelling and arithmetic-related abilities? Archives of Clinical Neuropsychology, 35(4), 413–428. 10.1093/arclin/acz085
3. Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A. (2004). Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience, 7(11), 1190–1192. 10.1038/nn1333
4. Beauchamp, M. S., Nath, A. R., & Pasalar, S. (2010). fMRI-guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect. Journal of Neuroscience, 30(7), 2414–2417. 10.1523/JNEUROSCI.4865-09.2010
5. Bernstein, L. E., & Liebenthal, E. (2014). Neural pathways for visual speech perception. Frontiers in Neuroscience, 8, Article 386. 10.3389/fnins.2014.00386
6. Blau, V., Van Atteveldt, N., Formisano, E., Goebel, R., & Blomert, L. (2008). Task-irrelevant visual letters interact with the processing of speech sounds in heteromodal and unimodal cortex. European Journal of Neuroscience, 28(3), 500–509. 10.1111/j.1460-9568.2008.06350.x
7. Booth, J. R., & Burman, D. D. (2005). Using neuroimaging to test developmental models of reading acquisition. In H. W. Catts & A. G. Kamhi (Eds.), The connections between language and reading disabilities (pp. 131–153). Lawrence Erlbaum Associates.
8. Bradley, L., & Bryant, P. E. (1983). Categorizing sounds and learning to read: A causal connection. Nature, 301(5899), 419–421. 10.1038/301419a0
9. Brozdowski, C., & Booth, J. R. (2021). Reading skill correlates in frontal cortex during semantic and phonological processing. Open Science Framework. 10.31234/osf.io/d3mj7
10. Buchanan-Worster, E., MacSweeney, M., Pimperton, H., Kyle, F., Harris, M., Beedie, I., Ralph-Lewis, A., & Hulme, C. (2020). Speechreading ability is related to phonological awareness and single-word reading in both deaf and hearing children. Journal of Speech, Language, and Hearing Research, 63(11), 3775–3785. 10.1044/2020_JSLHR-20-00159
11. Calvert, G. A. (2001). Crossmodal processing in the human brain: Insights from functional neuroimaging studies. Cerebral Cortex, 11(12), 1110–1123. 10.1093/cercor/11.12.1110
12. Cao, F., Bitan, T., Chou, T. L., Burman, D. D., & Booth, J. R. (2006). Deficient orthographic and phonological representations in children with dyslexia revealed by brain activation patterns. Journal of Child Psychology and Psychiatry, 47(10), 1041–1050. 10.1111/j.1469-7610.2006.01684.x
13. Cao, F., Khalid, K., Zaveri, R., Bolger, D. J., Bitan, T., & Booth, J. R. (2010). Neural correlates of priming effects in children during spoken word processing with orthographic demands. Brain and Language, 114(2), 80–89. 10.1016/j.bandl.2009.07.005
14. Erickson, L. C., Heeg, E., Rauschecker, J. P., & Turkeltaub, P. E. (2014). An ALE meta-analysis on the audiovisual integration of speech signals. Human Brain Mapping, 35(11), 5587–5605. 10.1002/hbm.22572
15. Francisco, A. A., Jesse, A., Groen, M. A., & McQueen, J. M. (2017). A general audiovisual temporal processing deficit in adult readers with dyslexia. Journal of Speech, Language, and Hearing Research, 60(1), 144–158. 10.1044/2016_JSLHR-H-15-0375
16. Fröhlich, M., Sievers, C., Townsend, S. W., Gruber, T., & van Schaik, C. P. (2019). Multimodal communication and language origins: Integrating gestures and vocalizations. Biological Reviews, 94(5), 1809–1829. 10.1111/brv.12535
17. Gao, C., Green, J. J., Yang, X., Oh, S., Kim, J., & Shinkareva, S. V. (2023). Audiovisual integration in the human brain: A coordinate-based meta-analysis. Cerebral Cortex, 33(9), 5574–5584. 10.1093/cercor/bhac443
18. Gao, D., Liang, X., Ting, Q., Nichols, E. S., Bai, Z., Xu, C., Mingnan, C., & Liu, L. (2024). A meta-analysis of letter–sound integration: Assimilation and accommodation in the superior temporal gyrus. Human Brain Mapping, 45(15), Article e26713. 10.1002/hbm.26713
19. Gijbels, L., Yeatman, J. D., Lalonde, K., & Lee, A. K. C. (2021). Audiovisual speech processing in relationship to phonological and vocabulary skills in first graders. Journal of Speech, Language, and Hearing Research, 64(12), 5022–5040. 10.1044/2021_JSLHR-21-00196
20. Grant, K. W., & Bernstein, J. G. W. (2019). Toward a model of auditory-visual speech intelligibility. In A. Lee, M. Wallace, A. Coffin, A. Popper, & R. Fay (Eds.), Multisensory processes: The auditory perspective (pp. 33–57). Springer. 10.1007/978-3-030-10461-0_3
21. Gullick, M. M., & Booth, J. R. (2014). Individual differences in crossmodal brain activity predict arcuate fasciculus connectivity in developing readers. Journal of Cognitive Neuroscience, 26(7), 1331–1346. 10.1162/jocn_a_00581
22. Hahn, N., Foxe, J. J., & Molholm, S. (2014). Impairments of multisensory integration and cross-sensory learning as pathways to dyslexia. Neuroscience & Biobehavioral Reviews, 47, 384–392. 10.1016/j.neubiorev.2014.09.007
23. Harris, M., Terlektsi, E., & Kyle, F. E. (2017). Concurrent and longitudinal predictors of reading for deaf and hearing children in primary school. The Journal of Deaf Studies and Deaf Education, 22(2), 233–242. 10.1093/deafed/enw101
24. Hickok, G., Okada, K., & Serences, J. T. (2009). Area Spt in the human planum temporale supports sensory-motor integration for speech processing. Journal of Neurophysiology, 101(5), 2725–2732. 10.1152/jn.91099.2008
25. Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8(5), 393–402. 10.1038/nrn2113
26. Jerger, S., Damian, M. F., McAlpine, R. P., & Abdi, H. (2018). Visual speech fills in both discrimination and identification of non-intact auditory speech in children. Journal of Child Language, 45(2), 392–414. 10.1017/S0305000917000265
27. Joo, S. J., Tavabi, K., Caffarra, S., & Yeatman, J. D. (2021). Automaticity in the reading circuitry. Brain and Language, 214, Article 104906. 10.1016/j.bandl.2020.104906
28. Karipidis, I. I., Pleisch, G., Di Pietro, S. V., Fraga-González, G., & Brem, S. (2021). Developmental trajectories of letter and speech sound integration during reading acquisition. Frontiers in Psychology, 12, Article 750491. 10.3389/fpsyg.2021.750491
29. Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Brief Intelligence Test–Second Edition. AGS.
30. Kyle, F. E., Campbell, R., & MacSweeney, M. (2016). The relative contributions of speechreading and vocabulary to deaf and hearing children's reading ability. Research in Developmental Disabilities, 48, 13–24. 10.1016/j.ridd.2015.10.004
31. Kyle, F. E., Campbell, R., Mohammed, T., Coleman, M., & MacSweeney, M. (2013). Speechreading development in deaf and hearing children: Introducing the Test of Child Speechreading. Journal of Speech, Language, and Hearing Research, 56(2), 416–426. 10.1044/1092-4388(2012/12-0039)
32. Kyle, F. E., & Harris, M. (2006). Concurrent correlates and predictors of reading and spelling achievement in deaf and hearing school children. Journal of Deaf Studies and Deaf Education, 11(3), 273–288. 10.1093/deafed/enj037
33. Kyle, F. E., & Harris, M. (2010). Predictors of reading development in deaf children: A 3-year longitudinal study. Journal of Experimental Child Psychology, 107(3), 229–243. 10.1016/j.jecp.2010.04.011
34. Kyle, F. E., & Harris, M. (2011). Longitudinal patterns of emerging literacy in beginning deaf and hearing readers. Journal of Deaf Studies and Deaf Education, 16(3), 289–304. 10.1093/deafed/enq069
35. Kyle, F. E., & Trickey, N. (2024). Speechreading, phonological skills, and word reading ability in children. Language, Speech, and Hearing Services in Schools, 55(3), 756–766. 10.1044/2024_LSHSS-23-00129
36. Lalonde, K., & Holt, R. F. (2015). Preschoolers benefit from visually salient speech cues. Journal of Speech, Language, and Hearing Research, 58(1), 135–150. 10.1044/2014_JSLHR-H-13-0343
37. Lee, I. A., & Preacher, K. J. (2013). Calculation for the test of the difference between two dependent correlations with one variable in common [Computer software]. https://quantpsy.org/corrtest/corrtest2.htm
38. Liebenthal, E., Binder, J. R., Spitzer, S. M., Possing, E. T., & Medler, D. A. (2005). Neural substrates of phonemic perception. Cerebral Cortex, 15(10), 1621–1631. 10.1093/cercor/bhi040
39. Liebenthal, E., Desai, R. H., Humphries, C., Sabri, M., & Desai, A. (2014). The functional organization of the left STS: A large scale meta-analysis of PET and fMRI studies of healthy adults. Frontiers in Neuroscience, 8, Article 289. 10.3389/fnins.2014.00289
40. Maldjian, J. A., Laurienti, P. J., Kraft, R. A., & Burdette, J. H. (2003). An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets. NeuroImage, 19(3), 1233–1239. 10.1016/S1053-8119(03)00169-1
41. Martinez Gutierrez, N., & Cribbie, R. (2021). Incidence and interpretation of statistical suppression in psychological research. Canadian Journal of Behavioural Science, 53(4), 480–488. 10.1037/cbs0000267
42. Mazaika, P., Whitfield-Gabrieli, S., Reiss, A., & Glover, G. (2007). Artifact repair for fMRI data from high motion clinical subjects [Conference presentation]. Organization for Human Brain Mapping Annual Meeting, Chicago, IL, United States.
43. McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264(5588), 746–748. 10.1038/264746a0
44. McNorgan, C., Awati, N., Desroches, A. S., & Booth, J. R. (2014). Multimodal lexical processing in auditory cortex is literacy skill dependent. Cerebral Cortex, 24(9), 2464–2475. 10.1093/cercor/bht100
45. McNorgan, C., Randazzo-Wagner, M., & Booth, J. R. (2013). Cross-modal integration in the brain is related to phonological awareness only in typical readers, not in those with reading difficulty. Frontiers in Human Neuroscience, 7, Article 388. 10.3389/fnhum.2013.00388
46. Melby-Lervåg, M., Lyster, S. A., & Hulme, C. (2012). Phonological skills and their role in learning to read: A meta-analytic review. Psychological Bulletin, 138(2), 322–352. 10.1037/a0026744
47. Nath, A. R., & Beauchamp, M. S. (2012). A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion. NeuroImage, 59(1), 781–787. 10.1016/j.neuroimage.2011.07.024
48. Nath, A. R., Fava, E. E., & Beauchamp, M. S. (2011). Neural correlates of interindividual differences in children's audiovisual speech perception. Journal of Neuroscience, 31(39), 13963–13971. 10.1523/JNEUROSCI.2605-11.2011
49. Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers, 36(3), 402–407. 10.3758/BF03195588
50. Okada, K., & Hickok, G. (2009). Two cortical mechanisms support the integration of visual and auditory speech: A hypothesis and preliminary data. Neuroscience Letters, 452(3), 219–223. 10.1016/j.neulet.2009.01.060
51. Oron, A., Wolak, T., Zeffiro, T., & Szelag, E. (2016). Cross-modal comparisons of stimulus specificity and commonality in phonological processing. Brain and Language, 155–156, 12–23. 10.1016/j.bandl.2016.02.001
52. Osterman, M. J., Hamilton, B. E., Martin, J. A., Driscoll, A. K., & Valenzuela, C. P. (2024). Births: Final data for 2022. National Vital Statistics Reports, 73(2), 1–56. 10.15620/cdc:145588
53. Pennington, B. F., & Bishop, D. V. (2009). Relations among speech, language, and reading disorders. Annual Review of Psychology, 60(1), 283–306. 10.1146/annurev.psych.60.110707.163548
54. Pimperton, H., Kyle, F., Hulme, C., Harris, M., Beedie, I., Ralph-Lewis, A., Worster, E., Rees, R., Donlan, C., & MacSweeney, M. (2019). Computerized speechreading training for deaf children: A randomized controlled trial. Journal of Speech, Language, and Hearing Research, 62(8), 2882–2894. 10.1044/2019_JSLHR-H-19-0073
55. Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103(1), 56–115. 10.1037/0033-295x.103.1.56
56. Rosenblum, L. D. (2008). Speech perception as a multimodal phenomenon. Current Directions in Psychological Science, 17(6), 405–409. 10.1111/j.1467-8721.2008.00615.x
57. Rüsseler, J., Ye, Z., Gerth, I., Szycik, G. R., & Münte, T. F. (2018). Audio–visual speech perception in adult readers with dyslexia: An fMRI study. Brain Imaging and Behavior, 12(2), 357–368. 10.1007/s11682-017-9694-y
58. Scharinger, M., Domahs, U., Klein, E., & Domahs, F. (2016). Mental representations of vowel features asymmetrically modulate activity in superior temporal sulcus. Brain and Language, 163, 42–49. 10.1016/j.bandl.2016.09.002
59. Scheliga, S., Kellermann, T., Lampert, A., Rolke, R., Spehr, M., & Habel, U. (2023). Neural correlates of multisensory integration in the human brain: An ALE meta-analysis. Reviews in the Neurosciences, 34(2), 223–245. 10.1515/revneuro-2022-0065
60. Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2), 245–251. 10.1037/0033-2909.87.2.245
61. Stevenson, R. A., & James, T. W. (2009). Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage, 44(3), 1210–1223. 10.1016/j.neuroimage.2008.09.034
62. Szycik, G. R., Tausche, P., & Münte, T. F. (2008). A novel approach to study audiovisual integration in speech perception: Localizer fMRI and sparse sampling. Brain Research, 1220, 142–149. 10.1016/j.brainres.2007.08.027
63. Teinonen, T., Aslin, R. N., Alku, P., & Csibra, G. (2008). Visual speech contributes to phonetic learning in 6-month-old infants. Cognition, 108(3), 850–855. 10.1016/j.cognition.2008.05.009
64. Turkeltaub, P. E., & Coslett, H. B. (2010). Localization of sublexical speech perception components. Brain and Language, 114(1), 1–15. 10.1016/j.bandl.2010.03.008
65. Tye-Murray, N., Hale, S., Spehar, B., Myerson, J., & Sommers, M. S. (2014). Lipreading in school-age children: The roles of age, hearing status, and cognitive ability. Journal of Speech, Language, and Hearing Research, 57(2), 556–565. 10.1044/2013_JSLHR-H-12-0273
66. United States Census Bureau. (2024, September 19). Median household income by county in the United States and Puerto Rico [Interactive visualization]. https://www.census.gov/library/visualizations/interactive/median-household-income.html
67. Vaden, K. I., Jr., Muftuler, L. T., & Hickok, G. (2010). Phonological repetition–suppression in bilateral superior temporal sulci. NeuroImage, 49(1), 1018–1023. 10.1016/j.neuroimage.2009.07.063
68. van Atteveldt, N., Formisano, E., Goebel, R., & Blomert, L. (2004). Integration of letters and speech sounds in the human brain. Neuron, 43(2), 271–282. 10.1016/j.neuron.2004.06.025
69. Venezia, J. H., Vaden, K. I., Jr., Rong, F., Maddox, D., Saberi, K., & Hickok, G. (2017). Auditory, visual and audiovisual speech processing streams in superior temporal sulcus. Frontiers in Human Neuroscience, 11, Article 174. 10.3389/fnhum.2017.00174
70. Wagner, R., Torgesen, J., & Rashotte, C. (2013). Comprehensive Test of Phonological Processing–Second Edition. PRO-ED.
71. Wang, J., Vess, A., Mathur, A., Wagley, N., Quinto-Pozos, D., & Booth, J. R. (2025). A fMRI neuroimaging dataset of word reading with semantic and phonological localizers in children and adolescents. Data in Brief, 63, Article 112248. 10.1016/j.dib.2025.112248
72. Wendling, B. J., Mather, N., & Schrank, F. A. (2009). Woodcock-Johnson III Tests of Cognitive Abilities. In J. A. Naglieri & S. Goldstein (Eds.), Practitioner's guide to assessing intelligence and achievement (pp. 191–232). Wiley.
73. Wilson, S. M., Bautista, A., & McCarron, A. (2018). Convergence of spoken and written language processing in the superior temporal sulcus. NeuroImage, 171, 62–74. 10.1016/j.neuroimage.2017.12.068
74. Žarić, G., Fraga González, G., Tijms, J., van der Molen, M. W., Blomert, L., & Bonte, M. (2014). Reduced neural integration of letters and speech sounds in dyslexic children scales with individual differences in reading fluency. PLOS ONE, 9(10), Article e110337. 10.1371/journal.pone.0110337
75. Zeno, S. M., Ivens, S. H., Millard, R. T., & Duvvuri, R. (1995). The educator's word frequency guide. Touchstone Applied Science Associates.
