Abstract
Letter–sound integration is a relatively new cultural skill, yet it is readily acquired even though the brain has not had time to evolve dedicated machinery for it. Leading theories of how the brain accommodates literacy acquisition include the neural recycling hypothesis and the assimilation–accommodation hypothesis. The neural recycling hypothesis proposes that a new cultural skill is developed by “invading” preexisting neural structures that support a similar cognitive function, while the assimilation–accommodation hypothesis holds that a new cognitive skill relies on direct invocation of preexisting systems (assimilation) and adds brain areas based on task requirements (accommodation). Both theories agree that letter–sound integration may be achieved by reusing pre‐existing, functionally similar neural bases, but they differ in how they propose this occurs. We examined the evidence for each hypothesis by systematically comparing the similarities and differences between letter–sound integration and two other types of preexisting and functionally similar audiovisual (AV) processes, namely object–sound and speech–sound integration, using an activation likelihood estimation (ALE) meta‐analysis. All three types of AV integration recruited the left posterior superior temporal gyrus (STG), while speech–sound integration additionally activated the bilateral middle STG, and letter–sound integration directly invoked the AV areas involved in speech–sound integration. These findings suggest that letter–sound integration may reuse the STG regions serving speech–sound and object–sound integration through an assimilation–accommodation mechanism.
Keywords: assimilation–accommodation, audiovisual integration, neural reuse, object–sound, speech–sound
Object–sound, speech–sound, and letter–sound audiovisual integration all recruited the left posterior superior temporal gyrus (assimilation). Speech–sound integration additionally activated the bilateral middle superior temporal gyrus (accommodation), and letter–sound integration directly invoked the audiovisual areas involved in speech–sound integration (assimilation). Assimilation–accommodation might therefore be the neural reuse mechanism underlying the acquisition of letter–sound integration.
Practitioner Points.
There is a double dissociation between validating and conflicting integration, with the superior temporal gyrus (STG) involved in validating integration and the insula/middle frontal gyrus involved in conflicting integration.
All three types of AV integration recruited the left posterior STG, speech–sound integration additionally activated the bilateral middle STG, and letter–sound integration shared the same AV areas as speech–sound integration.
Letter–sound integration may rely on a neural reuse mechanism of assimilation and accommodation of pre‐existing audiovisual integration.
1. INTRODUCTION
Hearing and seeing are two key perceptual channels that carry rich information about the world, and their integration provides an obvious survival benefit, helping organisms to identify objects and events more quickly and accurately and to avoid dangerous environments. As a relatively new cultural skill, reading has emerged and developed so recently that our genes have not yet been able to express an inherently literate brain. However, as a critical step in reading acquisition (Brem et al., 2010; Ehri, 2005; Preston et al., 2016), letter–sound integration, the formation of connections between visual symbols and their corresponding speech sounds, is acquired effortlessly and rapidly. How then do we successfully achieve letter–sound integration without a neural foundation established from birth?
“Neural reuse” has been proposed as a principle underlying such brain exaptation (Anderson, 2010, 2021), in which a structure comes to serve a function different from the one it originally evolved for. During both development and evolution, neural circuits built for a specific purpose can retain their original functions while being put to new uses, such as reading.
There are two competing theories that attempt to explain how neural reuse occurs in literacy acquisition.
The first is the neural recycling hypothesis (Dehaene & Cohen, 2007), which proposes that a new cultural skill is developed by “invading” a preexisting neural niche within a structure that supports a similar cognitive function. Learning repurposes part of the originally less specialized neural resources for the new use, with limited anatomical reorganization. This hypothesis has been supported by the development of the visual word form area (VWFA) during reading acquisition. The occipitotemporal cortex naturally comprises a mosaic of specialized subareas involved in processing different visual categories such as buildings, faces, and objects (Dehaene et al., 2010; Hasson et al., 2003). With literacy instruction, a new subarea sensitive to written words (VWFA) emerges within this mosaic between the face and object subareas, in a region that was initially only weakly responsive to any type of visual stimulus (Dehaene‐Lambertz et al., 2018). Like visual word form processing, letter–sound integration is a key process in reading acquisition, and as proficiency increases, most individuals come to integrate letters and sounds automatically or even unconsciously. It is therefore conceivable that a specialized area for letter–sound integration develops, and we expect that the neural recycling mechanism may extend from visual word form processing to letter–sound integration.
The second theory of neural reuse is the assimilation–accommodation hypothesis, derived from Jean Piaget's theory of cognitive development, which offers a promising way of adapting to endlessly emerging cultural skills without rebuilding the brain (Piaget & Mays, 1972). Assimilation means responding to new cognitive requirements by directly accessing a similar existing brain network. If the new requirements exceed what the current network can do, accommodation modifies that network by adding supplementary areas and removing unnecessary ones. This hypothesis has been well supported in second language learning (Perfetti et al., 2007). Some studies support assimilation by finding overlapping brain activation between first (L1) and second language (L2) processing (Cao et al., 2013; Nichols et al., 2021; Tan et al., 2003); others support accommodation by showing that L2 requires additional brain regions not involved in L1 (Liu et al., 2007; Nelson et al., 2009; Nichols & Joanisse, 2016). On this account, letter–sound integration could directly adopt other, similar neural circuits for its own use.
We therefore asked which of these two hypotheses better explains the reuse mechanism of letter–sound integration. Because both hypotheses share the premise that a functionally similar precursor is reused, the first step is to identify functionally similar candidates for letter–sound integration. Two types of audiovisual (AV) processing are of interest here because they are both inherent functions of the brain and develop earlier than letter–sound integration.
One is object–sound integration, which refers to the process of recognizing an object by seeing its visual image while hearing the natural sound it makes. Studies have demonstrated that 4‐ to 8‐month‐old infants can integrate wood or metal objects with their corresponding sound information (Ujiie et al., 2018) and that 6‐month‐old infants have the ability to fuse asynchronous hand movements and clapping sounds (Kopp, 2014). In the absence of schooling, the area of the occipitotemporal cortex that could have developed into the VWFA was found to be gradually invaded by nearby object representation cortex (Dehaene‐Lambertz et al., 2018). Furthermore, almost all modern alphabets are derived from Egyptian hieroglyphs, whose ideographs are made up of simplified symbols for objects (Murray & Murray, 2017). The close connection between writing and objects in their neural basis and origin raises the possibility of reusing object–sound integration for letter–sound integration.
The other type of AV processing of interest is speech–sound integration, which refers to the process of speech perception by hearing speech sounds and seeing the articulatory movements of the speaker's lips. Mercure et al. (2022) found that 7‐ to 10‐month‐old infants are already able to bind visual and auditory speech. Given that a high degree of overlap in brain activation exists between speech and reading in the frontal, parietal, and temporal lobes (Price, 2012; Rueckl et al., 2015), reading ability is likely acquired by linking grapheme regions to the existing spoken language networks (Bouhali et al., 2014; Chyl et al., 2018; Dehaene et al., 2015). Thus, letter–sound and speech–sound integration are likely to share certain neural circuits.
Neuroimaging studies of these three types of AV integration have provided favorable evidence for the possibility of reuse.
Beauchamp et al. (2004) investigated the neural basis of object–sound integration using pictures and sounds of animals and tools as stimuli, finding that the posterior superior temporal sulcus (pSTS) was activated more strongly when the auditory and visual channels were presented together than when either was presented alone, and that this activation was associated with performance on the object identification task. Similar results have been found for simple physical stimuli such as tones and gratings (Starke et al., 2020; Werner & Noppeney, 2011). Moreover, the superior temporal sulcus and superior temporal gyrus (STS/STG) have also been identified as a site of AV integration in studies manipulating AV congruency, synchronization, or signal‐to‐noise ratio (SNR) (Hein et al., 2007; Marchant et al., 2012; Stevenson & James, 2009). In addition to the STG, the frontal lobe, insula, and claustrum are sometimes involved in object–sound integration (Naghavi et al., 2007), but the involvement of these regions is less consistent, varying with the experimental paradigm, stimulus characteristics, and analysis methods.
The STS/STG has also been identified in speech–sound integration, regardless of whether the stimuli are at the sub‐lexical level (phonemes, vowel‐consonant combinations) or the lexical level (words, sentences), and regardless of whether the experiment manipulates AV congruency, synchrony, or perceptual measures (e.g., the McGurk effect) (Aparicio et al., 2017; Calvert et al., 2000; Francisco et al., 2018; Miller, 2005; Nath & Beauchamp, 2011; Pekkola et al., 2006; Ye et al., 2017). In line with these studies, a meta‐analysis revealed consistent brain activation in the bilateral pSTG (Erickson et al., 2014). Further, the STG was found to be involved in the congruent AV speech condition, whereas more dorsolateral regions such as the inferior frontal gyri were involved in the incongruent AV speech condition (Erickson et al., 2014).
Raij et al. (2000) investigated the mechanism of letter–sound integration using magnetoencephalography (MEG) and demonstrated that bilateral STS/STG activation was stronger in a matched letter–sound condition than in non‐matched and control conditions. Similarly, fMRI studies in adults and children, as well as individuals with dyslexia, have replicated and reinforced the critical role played by STS/STG in learning grapheme–phoneme connections (Blau et al., 2009, 2010; van Atteveldt et al., 2004).
Taken together, the STS/STG, traditionally considered classical auditory processing regions (Buchsbaum et al., 2001; Howard et al., 2000; Schönwiesner et al., 2007; Yi et al., 2019), are involved in all three types of AV integration. However, the specific locations sensitive to the different types may not be identical. The only within‐subject study to examine object–sound and speech–sound integration found that object–sound integration was located posterior to speech–sound integration along the STS/STG (Stevenson & James, 2009). This highlights the need for direct comparisons of object–sound, speech–sound, and letter–sound integration.
We postulate that letter–sound integration is achieved by reusing the pre‐existing AV integration functions of the STS/STG. Specifically, (1) according to the neural recycling hypothesis, letter–sound integration would produce a new sensitive subarea in the STG/STS that is spatially separate from those for object–sound and speech–sound integration; (2) according to the assimilation–accommodation hypothesis, letter–sound integration may directly recruit the AV areas for object–sound or speech–sound integration (assimilation), and if the STG/STS is not fully capable, other regions would be co‐activated to support the process (accommodation). In the latter case, whether assimilation acts alone or together with accommodation, a full or partial overlap between the letter–sound areas and the object–sound or speech–sound AV integration areas would be expected.
To test these predictions, we conducted an activation likelihood estimation (ALE) meta‐analysis of object–sound, speech–sound, and letter–sound AV integration studies and compared the similarities and differences among the three types of AV integration. As the results are likely to be more stable in the mature brain, and the number of studies of children is too small to perform meta‐analysis, we have focused on adult native speakers of alphabetic languages.
2. MATERIALS AND METHODS
2.1. Literature search and study selection
We searched “Web of Science” and “ProQuest” for studies on object–sound, speech–sound, and letter–sound AV integration published between January 1900 and October 2022, searching within titles, abstracts, and keywords. To reduce the omission of relevant studies, we adopted a “loose‐in” strategy using broad search terms. Each combination of search terms had three key elements: (1) AV integration, or other synonyms or substitutes used in the literature, which may be common to all types of integration or unique to a particular one; (2) the category of integration (object, speech, reading); and (3) neuroimaging techniques (fMRI, PET) with high spatial resolution that can provide coordinates. For example, “audiovisual, object, fMRI” forms one combination, with each term corresponding to one element (see Table S1 for the full list of combinations). We also tracked research groups with a long‐standing interest in AV integration and reviewed the references of identified studies for additional publications.
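For illustration, the full set of search queries can be generated programmatically from the three elements; the sketch below uses abbreviated, hypothetical term lists rather than the complete lists given in Table S1.

```python
from itertools import product

# Abbreviated, illustrative term lists; the complete lists are given in Table S1.
integration_terms = ["audiovisual", "multisensory integration", "crossmodal"]
category_terms = ["object", "speech", "letter"]
technique_terms = ["fMRI", "PET"]

# Each query combines one term from each of the three key elements.
queries = [" AND ".join(combo) for combo in
           product(integration_terms, category_terms, technique_terms)]

print(len(queries))  # 3 x 3 x 2 = 18 candidate queries
print(queries[0])    # "audiovisual AND object AND fMRI"
```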
After removing duplicate studies, two rounds of screening were performed to obtain eligible studies (Figure 1). The purpose of the first screening round was to eliminate irrelevant articles due to the loose‐in search strategy by reviewing titles and abstracts. Studies were excluded if they were (1) not related to any type of AV integration (object–sound, speech–sound, or letter–sound) or confounded with other cognitive processes such as attention, emotion, memory or learning; (2) reviews, meta‐analyses, case studies or research on statistical methods; (3) not using fMRI or PET; and (4) not available in full text or not formally published, such as papers in preparation and conference abstracts.
FIGURE 1.
Flowchart of the selection process. The lowercase n indicates the number of studies, which is the number of included papers. The capital N indicates the number of experiments, which is the number of different subject groups.
Once the candidate literature had been significantly narrowed down, full‐text articles were assessed for eligibility in a second screening round. Included studies had to (1) be restricted to alphabetic languages, as the neural basis of integration in ideographic languages is different from that in alphabetic languages (Shinozaki et al., 2016); and (2) include at least one group of healthy adults, excluding children, older adults and clinical populations. More restrictively, only healthy native speakers were included, as the brain mechanisms of speech–sound integration are different for first‐ and second‐language speakers (Barros‐Loscertales et al., 2013). Studies also had to (3) present stimuli in both visual and auditory modalities and use eligible stimulus types. Specifically, object visual stimuli included tools, animals, musical instruments, body parts, natural scenes, simple physical features such as lines, dots, flashes, which can be presented as photographs, pictures, or line drawings. Auditory stimuli were sounds emitted by the corresponding objects. Speech visual stimuli were vocalized lips or faces and auditory stimuli included spoken letters, strings, words, numbers, and sentences. Letter visual stimuli were scripts and auditory stimuli included spoken letters, strings, words, and numbers. Studies had to (4) perform whole‐brain analysis, not just region of interest (ROI) analysis; and (5) perform general linear modeling (GLM). It should be noted that we also retrieved a MEG study (Raij et al., 2000) because the electrodes covered almost the whole brain and it is generally cited as a classic study of letter–sound integration.
2.2. Coordinate retrieval
To ensure reliable and unbiased results, we included studies that met at least one of the following generally accepted criteria (Table 1). If an experiment met multiple criteria, all coordinates obtained according to the eligible contrasts were qualified.
Criterion 1: Multisensory area. The AV integration area should be activated in both the visual‐only and the auditory‐only condition [(A > B) ∩ (V > B), where A stands for the auditory‐only condition, V for the visual‐only condition, and B for a baseline such as fixation, a scrambled picture, or a still face].
Criterion 2: Interaction effect. The activation of the AV condition is not equivalent to the simple addition of the activities in the visual‐only and auditory‐only conditions [AV ≠ (A + V)]. This criterion can be tested by one of the following contrasts (a numeric sketch of these contrasts is given after Criterion 10): (1) super‐additivity criterion: AV activation is greater than the sum of A and V [AV > (A + V)]; (2) sub‐additivity criterion: AV activation is weaker than the sum of A and V [AV < (A + V)]; (3) max criterion: AV activation is greater than the maximum value of A and V [AV > max (A, V)]; (4) mean criterion: AV activation is greater than the mean value of A and V [AV > mean (A, V)].
Criterion 3: Conceptual congruence. Visual and auditory stimuli are more likely to be integrated when they are semantically or conceptually congruent [AVconC ≠ AVconI, con stands for concept, C stands for congruence, I stands for incongruence].
Criterion 4: Temporal congruence. Visual and auditory stimuli are more likely to be integrated when they are presented simultaneously [AVtemC ≠ AVtemI, tem stands for temporal].
Criterion 5: Spatial congruence. Visual and auditory stimuli are more likely to be integrated when they have the same spatial location [AVspaC ≠ AVspaI, spa stands for spatial].
Criterion 6: Inverse effect. The lower the signal‐to‐noise ratio (SNR) of auditory or visual stimuli, the greater the benefit of AV stimuli [Benefit = (AV – [A + V])/(A + V), Benefit of AV low‐SNR > Benefit of AV high‐SNR].
Criterion 7: Adaptation effect. For two AV stimuli presented back‐to‐back, brain activation decreases when the second stimulus is identical to the first and increases when the second stimulus is different from the first [AV different > AV identical].
Criterion 8: Perception effect. Subjects report their subjective perception of whether the visual and auditory information is integrated into a whole [AV fuse ≠ AV non‐fuse]. In general, congruence of information is a prerequisite for fusion and the more congruent it is, the easier it is to fuse. One exception is the McGurk effect, in which participants are able to integrate the different auditory and visual information into a completely new concept, such as hearing /ba/, seeing /ga/, perceiving it as /da/ [AV McGurk ≠ AV non‐McGurk; AVC ≠ AV McGurk].
Criterion 9: Type specific effect. This represents the unique integration area of a particular type of AV integration compared with others [AV type1 > AV type2, type stands for one of the three types of AV integration in our study].
Criterion 10: Brain–behavior correlation. The brain areas associated with integrated AV processing should be correlated with the corresponding behavioral indicators [positive or negative correlations].
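To make the contrast logic concrete, the sketch below checks the interaction‐effect contrasts (Criterion 2) and the inverse effect (Criterion 6) for a single voxel, using hypothetical beta estimates; it illustrates the formulas above rather than any included study's analysis.

```python
import numpy as np

# Hypothetical beta estimates (arbitrary units) for one voxel.
A, V, AV = 0.8, 0.6, 1.9   # auditory-only, visual-only, audiovisual conditions

super_additive = AV > (A + V)          # Criterion 2.1: AV > (A + V)
sub_additive   = AV < (A + V)          # Criterion 2.2: AV < (A + V)
max_criterion  = AV > max(A, V)        # Criterion 2.3: AV > max (A, V)
mean_criterion = AV > np.mean([A, V])  # Criterion 2.4: AV > mean (A, V)

def av_benefit(av, a, v):
    """Criterion 6 benefit: (AV - [A + V]) / (A + V)."""
    return (av - (a + v)) / (a + v)

# Inverse effect: the multisensory benefit should be larger for low-SNR stimuli.
inverse_effect = av_benefit(1.2, 0.3, 0.4) > av_benefit(1.9, 0.8, 0.6)

print(super_additive, sub_additive, max_criterion, mean_criterion, inverse_effect)
```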
TABLE 1.
Criteria, corresponding experimental conditions, and contrasts used to extract the coordinates.
Criteria | Experimental conditions | Validating integration contrast | Conflicting integration contrast
---|---|---|---
Multisensory area | A, V, B | (A > B) ∩ (V > B) |
Interaction effect | A, V, AV | AV > (A + V); AV < (A + V); AV > max (A, V); AV > mean (A, V) |
Conceptual congruence | Conceptually congruent and incongruent AV | AVconC > AVconI | AVconC < AVconI
Temporal congruence | Temporally congruent and incongruent AV | AVtemC > AVtemI | AVtemC < AVtemI
Spatial congruence | Spatially congruent and incongruent AV | AVspaC > AVspaI | AVspaC < AVspaI
Inverse effect | AV with different SNR | Benefit of AV low‐SNR > Benefit of AV high‐SNR |
Adaptation effect | Two identical or different AV pairs presented one after the other | AV different > AV identical |
Perceptual effect | Fused and unfused AV | AV fuse > AV non‐fuse; AV McGurk > AV non‐McGurk; AVC > AV McGurk | AV fuse < AV non‐fuse; AV McGurk < AV non‐McGurk; AVC < AV McGurk
Type specific effect | Two types of AV integration | AV type1 > AV type2 |
Brain–behavior correlation | Brain and behavioral indicators for AV | Positive correlation | Negative correlation
Abbreviations: A, auditory stimuli; AV, audiovisual stimuli; B, baseline; C, congruence; con, conceptual; I, incongruence; SNR, signal‐noise ratio; spa, spatial; tem, temporal; V, visual stimuli.
Additionally, inspired by a meta‐analysis of speech–sound integration (Erickson et al., 2014), which found that validating AV speech recruits ventral stream areas and conflicting AV speech recruits dorsal stream areas, we divided all eligible contrasts into two classes. Validating contrasts are those in which (1) the visual and auditory information are conceptually, temporally, or spatially congruent; (2) subjects report that the information from the different sensory channels is fused into a whole; or (3) the researchers specify that the detected area contributes to multisensory integration. Conflicting contrasts are the opposite of validating contrasts (Table 1).
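As a minimal sketch of this coding step, the function below labels a contrast string as validating or conflicting following the rules just described; the marker strings and the fallback rule are simplified illustrations of Table 1, not the coding script actually used.

```python
def classify_contrast(contrast: str) -> str:
    """Label a contrast as 'validating' or 'conflicting' integration (simplified)."""
    conflicting_markers = [
        "AVconC < AVconI", "AVtemC < AVtemI", "AVspaC < AVspaI",
        "AV fuse < AV non-fuse", "negative correlation",
    ]
    if any(marker in contrast for marker in conflicting_markers):
        return "conflicting"
    # Congruent, fused, multisensory-area, interaction, adaptation, and
    # positively correlated contrasts are treated as validating integration.
    return "validating"

print(classify_contrast("AVconC > AVconI"))  # validating
print(classify_contrast("AVtemC < AVtemI"))  # conflicting
```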
All of the above steps for screening studies and extracting coordinates were first carried out independently by three graduate students. Controversial studies were then discussed together to decide on their inclusion. The final study list included 33 object–sound experiments with 248 foci in 508 individuals, 33 speech–sound experiments with 371 foci in 458 individuals, and 10 letter–sound experiments with 56 foci in 140 individuals (Table 2).
TABLE 2.
Studies included in the meta‐analysis.
Studies | Number of subjects (male/female) | Mean age (SD/range) | Handedness (right/left) | Language | Stimuli | Task | Baseline | Criteria | Validating contrast | Validating source | Validating foci | Conflicting contrast | Conflicting source | Conflicting foci
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Object–sound integration | ||||||||||||||
Baumann et al. (2018) | 34 (18/16) | 21 (18–27) | ~ | Gabor grating, tone | Judgment | Time congruency | AVtemC < AVtemI | Table 4 | 3 | |||||
Beauchamp et al. (2004) | 26 (~/~) | ~ (~) | ~ | Animal, tool | Judgment | Fixation | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 1 | ||||
Olivetti Belardinelli et al. (2004) | 13 (13/0) | 22.8 (~) | 13/0 | Animal, tool, human | Identification | Interaction | AV > max (A, V) | Table 1 | 12 | AVI > max (A, V) | Table 2 | 5 | ||
Bushara et al. (2001) | 12 (7/5) | ~ (27–56) | 12/0 | Circle, tone | Judgment | Time congruency | AVtemC < AVtemI | Table 1 | 9 | |||||
Bushara et al. (2003) | 7 (3/4) | ~ (~) | 7/0 | Bar, collision sound | Judgment | Perception | AV fused > AV non‐fused | Table 2 | 8 | AV fused < AV non‐fused | Table 2 | 6 | ||
10 (5/5) | 24.0 (2.2) | Artificial object | Passive perception | Interaction | AV > (A + V) | Table 1 | 2 | |||||||
Butler et al. (2011) | 10 (5/5) | 24.7 (4.1) | 10/0 | Artificial object | Passive perception | Interaction | AV < (A + V) | Table 1 | 1 | |||||
Butler and James (2013) | 15 (6/9) | 23 (3) | 10/0 | Artificial object | Passive perception | Fixation | Multisensory area | (A > B) ∩ (V > B) | Table 1e | 2 | ||||
Hein et al. (2007) | 18 (11/7) | 29.8 (23–41) | 17/1 | Animal | Passive perception | Fixation | Multisensory area and Interaction | (A > B) ∩ (V > B) ∩ [AVC > max (A, V)] | Figure 2 | 3 | (A > B) ∩ (V > B) ∩ [AVI > max (A, V)] | Figure 2 | 2 | |
Hocking and Price (2009) | 17 (6/12) | 26 (20–36) | 17/0 | Object | Judgment | Type specific effect | AV object–sound > AV letter–sound | Table 3 | 1 | |||||
James et al. (2011) | 12 (6/6) | 21.7 (~) | 12/0 | Tool | One‐back | Scrambled A or V | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 6 | ||||
James et al. (2011b) | 12 (6/6) | 21.7 (~) | 12/0 | Tool | One‐back | Scrambled A or V | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 6 | ||||
Kassuba et al. (2011) | 19 (9/10) | 25.37 (21–33) | 19/0 | Object | Judgment | Texture | Multisensory area‐and type‐specific effect | (A > B) ∩ (V > B) ∩ (A > rest > V > rest) | Table 1 | 1 | ||||
Laing et al. (2015) | 9 (7/2) | 24 (21–26) | 9/0 | Cuboid, tone | Judgment | Congruency effect | AVconC > AVconI | Table 4 | 4 | |||||
Laurienti et al. (2003) | 12 (9/7) | 32 (~) | 15/1 | Object | Judgment | Congruent effect | AVconC > AVconI | Table 1 | 1 | AVconC < AVconI | Table 1 | 2 | ||
Time congruency | AVtemC > AVtemI | Table 1 | 14 | |||||||||||
Lewis and Noppeney (2010) | 16 (11/5) | ~ (18–31) | 14/2 | Dot, click | Judgment | Brain–behavior correlation | Positive correlation | Table 1 | 1 | |||||
van der Linden et al. (2011) | 16 (5/11) | 21.6 (18–26) | ~ | Bird | Passive perception | Congruent effect | AVconC > AVconI | Table 2 | 34 | AVconC < AVconI | Table 2 | 3 | ||
Love et al. (2018) | 20 (10/10) | 24 (20–32) | 20/0 | Drummer beat | Judgment | Time congruency | AVtemC > AVtemI | Figure 5 | 2 | AVtemC < AVtemI | Figure 5 | 4 | ||
Man et al. (2015) | 18 (10/8) | ~ (~) | 18/0 | Object | Passive perception | Rest | Multisensory area | (A > B) ∩ (V > B) | Figure 2 | 1 | ||||
Marchant et al. (2012) | 16 (7/9) | 24.7 (~) | 16/0 | Checkerboard, tone | Detection | Time congruency | AVtemC > AVtemI | Table 2 | 9 | |||||
One‐back | Time congruency | AVtemC > AVtemI | Table 2a | 2 | AVtemC < AVtemI | Table 2b | 3 | |||||||
McCormick et al. (2018) | 18 (9/9) | 24.75 (~) | 18/0 | Circle, tone | Oddball | Congruent effect | AVconC > AVconI | Table 4a | 8 | |||||
Naghavi et al. (2007) | 23 (12/11) | 24 (19–30) | 23/0 | Animal, tool | Passive perception | Congruent effect | AVconC > AVconI | Figure 1 | 3 | |||||
Noesselt et al. (2007) | 24 (14/10) | 24 (~) | ~ | Optic fibers, tone | Detection | Rest | Multisensory area and time congruency | (A > B) ∩ (V > B) ∩ (AVtemC > AVtemI) | Table 2 | 2 | ||||
Rest | Multisensory area and inverse effect | (A > B) ∩ (V > B) ∩ (Benefit of AV low‐SNR > Benefit of AV high‐SNR) | Table 1 | 6 | ||||||||||
Noesselt et al. (2010) | 12 (6/6) | ~ (21–29) | ~ | Grating, tone | Detection | Brain–behavior correlation | Positive correlation | Table 3 | 4 | |||||
Noppeney et al. (2010) | 19 (7/12) | 22.1 (19–26) | 18/1 | Tool, instrument | Judgment | Congruent effect | AVconC < AVconI | Table 2 | 3 | |||||
Space congruency | AVspaC > AVspaI | Results | 1 | |||||||||||
Plank et al. (2012) | 15 (4/11) | ~ (20–34) | 13/2 | Animal, tool, vehicle, instrument | Judgment | Brain–behavior correlation | Positive correlation | Figure 7 | 1 | |||||
Porada et al. (2021) | 16 (9/7) | 26.9 (3.2) | 16/0 | Object | Judgment | Interaction | AV > (A + V) | Table A.1 | 13 | |||||
Starke et al. (2020) | 20 (13/7) | 26.2 (21–33) | ~ | Checkerboard, tone | Detection | Interaction | AV > (A + V) | Table 3 | 10 | |||||
Identification | Congruent effect | AVconC > AVconI | Table 3 | 1 | ||||||||||
Sestieri et al. (2006) | 10 (10/0) | 26.2 (20–34) | 10/0 | Animal, weapon, instrument, vehicle | Judgment | Space congruency | AVspaC > AVspaI | Table 4 | 2 | |||||
Study1 11 (5/6) | 25.9 (~) | 11/0 | Tool | Identification | A or V noise | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 2 | |||||
Stevenson and James (2009) | Study2 11 (5/6) | 24.4 (~) | 11/0 | Tool | Identification | A or V noise | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 2 | ||||
Stevenson et al. (2011) | Study1 10 (5/6) | 26.5 (~) | 11/0 | Tool | Identification | Inverse effect | Benefit of AV low‐SNR > Benefit of AV high‐SNR | Table 1 | 9 | Benefit of AV low‐SNR < Benefit of AV high‐SNR | Table 1 | 6 | ||
AV > (A + V) | Table 1 | 3 | ||||||||||||
Categorization | Interaction | AV < (A + V) | Table 1 | 8 | ||||||||||
Inverse effect | Benefit of AV low‐SNR > Benefit of AV high‐SNR | Table 1 | 7 | |||||||||||
Werner and Noppeney (2011) | 20 (10/10) | 25.8 (~) | 20/0 | Tool, instrument | Detection | Brain–behavior correlation | Positive correlation | Table 1 | 6 | |||||
Werner and Noppeney (2011) | 17 (9/8) | 26.5 (~) | 17/0 | Dot, tone | Detection | Fixation | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV > (A + V)] | Table 1 | 3 | ||||
Speech–sound integration | ||||||||||||||
Aparicio et al. (2017) | 15 (6/9) | 25.2 (20–37) | 15/0 | French | Word | Detection | Still face | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV > (A + V)] | Table 5 | 3 | |||
Barros‐Loscertales et al. (2013) | 42 (23/19) | ~ (20–46) | ~ | English, Spanish | Sentence | Passive perception | Rest | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ AVC > max (A, V) | Table 3 | 3 | |||
Perceptual effect | AVC < AV McGurk | Table 2 | 14 | |||||||||||
Congruent effect | AVconC < AVconI | Table 2 | 14 | |||||||||||
Benoit et al. (2009) | 15 (5/11) | 29.6 (19–47) | 15/0 | English | McGurk (CV) | Judgment | Perception | AV McGurk < AV non‐McGurk | Table 2 | 12 | ||||
Callan et al. (2003) | 6 (6/0) | ~ (20–45) | 6/0 | English | Word | Passive perception | Still face | Inverse effect | (AV low‐SNR > B) ∩ (Benefit of AV low‐SNR > Benefit of AV high‐SNR) | Table 1 | 12 | |||
Calvert et al. (2000) | 5 (3/2) | 35 (24–49) | 5/0 | English | Number | Rehearsing | Interaction | AV > max (A, V) | Table 1 | 6 | ||||
Calvert et al. (2000) | 10 (5/5) | 30.1 (22–45) | 10/0 | English | Sentence | Passive perception | Rest | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AVC > (A + V)] | Table 1 | 9 | (A > B) ∩ (V > B) ∩ [AVI < (A + V)] | Table2 | 6 |
Erickson et al. (2014) | 10 (4/6) | 25.72 (3.01) | 10/0 | English | McGurk (CV) | Count | Interaction | AV > max (A, V) | Table 1 | 6 | ||||
Fairhall and Macaluso (2009) | 12 (6/7) | 26.6 (~/~) | 12/0 | Italian | Story | Passive perception | Congruent effect | AVconC > AVconI | Table 1 | 6 | ||||
Francisco et al. (2018) | 20 (8/12) | 25.75 (4.06) | ~ | Dutch | Word, CV | One‐back | Interaction | AV > (A + V) | Table B3 | 6 | ||||
Congruent effect | AVconC < AVconI | Table 2 | 21 | |||||||||||
Gau and Noppeney (2016) | 16 (10/6) | 30.1 (22–45) | 16/0 | German | McGurk (CV) | Identification | Perception | AV McGurk < AV non‐McGurk | Table 2 | 1 | ||||
Lee and Noppeney (2011a) | 30 (15/16) | 24.7 (~) | 30/0 | German | Sentence | Detection | Interaction | AV < (A + V) | Table 3 | 6 | ||||
Lee and Noppeney (2011b) | 19 (~/~) | 27.1 (3.4) | ~ | German | Sentence | Passive perception | Time congruency | AVtemC < AVtemI | Table S3 | 6 | ||||
Lüttke et al. (2016) | 23 (3/20) | ~ (19–30) | 23/0 | Dutch | McGurk (CVC) | Identification | Perceptual effect | AVC > AV McGurk | Table 1 | 4 | ||||
Macaluso et al. (2004) | 8 (8/0) | 36 (3) | 8/0 | English | Word | Detection | Time concurrency | AVtemC > AVtemI | Table 1a | 4 | ||||
Matchin et al. (2014) | 20 (8/12) | ~ (20–30) | 20/0 | English | McGurk (CV) | Identification | Perceptual effect | AVC < AV McGurk | Table 3 | 11 | ||||
Miller (2005) | 11 (5/6) | ~ (18–33) | 11/0 | English | VCV | Judgment | Perception | AV fused > AV non‐fused | Table 1 | 2 | AV fused < AV non‐fused | Table 1 | 15 | |
Multisensory area and perception | (A > B) ∩ (V > B) ∩ (AV fused > AV non‐fused) | Table 1 | 4 | (A > B) ∩ (V > B) ∩ (AV fused < AV non‐fused) | Table 1 | 7 | ||||||||
Noesselt et al. (2012) | 14 (7/7) | ~ (~) | ~ | German | Sentence | Judgment | Fixation | Multisensory area and time congruency | (A > B) ∩ (V > B) ∩ (AVtemC > AVtemI) | Table 2 | 3 | (A > B) ∩ (V > B) ∩ (AVtemC < AVtemI) | Table 2 | 15 |
Ojanen et al. (2005) | 10 (5/5) | 26 (22–31) | 10/0 | Finnish | Letter | Detection | Congruent effect | AVconC < AVconI | Table 1 | 4 | | | |
Pekkola et al. (2006) | 10 (6/4) | 27.0 (22–34) | 10/0 | Finnish | Letter | Detection | Congruent effect | AVconC < AVconI | Table 3 | 2 | | | |
Stevenson and James (2009) | Study2 11 (5/6) | 24.4 (~) | 11/0 | English | Word | Judgment | A or V noise | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 2 | |||
A or V noise | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 2 | ||||||||||
Stevenson et al. (2011) | 8 (4/4) | 24.1 (~) | 8/0 | English | Word | Judgment | Time concurrency | (AVtemC > AVtemI) | Table 1 | 2 | ||||
Fixation | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 2 | ||||||||||
Stevenson et al. (2011) | 12 (6/6) | 22.3 (2.8) | 12/0 | English | Word | Judgment | Time concurrency | (AVtemC > AVtemI) | Table 1 | 2 | ||||
Skipper et al. (2007) | 13 (~) | ~ (~) | 13/0 | English | McGurk (CV) | Passive perception | Perceptual effect | AVC > AV McGurk | Table 4 | 57 | AVC < AV McGurk | Table 3 | 30 | |
Rest | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV > max (A, V)] | Results 2.1 | 2 | ||||||||||
Szycik et al. (2008) | 12 (~/~) | 23 (21–26) | 12/0 | German | Word | Detection | Congruency effect | AVconC > AVconI | Table 2 | 1 | ||||
Szycik et al. (2009) | 11 (6/5) | 24.6 (21–29) | 11/0 | German | Word | Detection | Congruent effect | AVconC < AVconI | Table 1 | 9 | | | |
Szycik et al. (2009) | 15 (7/8) | 36.5 (9.4) | 15/0 | German | Word | Identification | Congruent effect | AVconC < AVconI | Table 1 | 15 | | | |
Szycik et al. (2012) | 7 (7/5) | ~ (21–39) | 12/0 | German | McGurk (CV) | Identification | Perception | AV McGurk > AV non‐McGurk | Table 2 | 2 | AVC < AV McGurk | Table 2 | 19 | |
Tietze et al. (2019) | 14 (15/1) | 33.75 (8.22) | 12/2 | German | Word | Judgment | Congruent effect | AVconC < AVconI | Table 2 | 6 | | | |
Treille et al. (2017) | 12 (7/7) | 26 (18–44) | 12/0 | French | CV | Passive perception | Interaction | AV > max (A, V) | Table 5 | 6 | ||||
Venezia et al. (2017) | 18 (17/3) | ~ (~) | 18/0 | English | CV | Oddball | Rotated A or silent gestured face | Multisensory area | (A > B) ∩ (V > B) | Whole‐Brain results | 2 | |||
Wiersinga‐Post et al. (2010) | 14 (9/5) | ~ (22–45) | ~ | Dutch | McGurk (CVC) | Identification | Perception | Brain‐behavior correlation | Negative correlation | Table 1 | 7 | |||
AV > mean (A, V) | Table 3 | 2 | ||||||||||||
Ye et al. (2017) | 13 (11/2) | 25.3 (6.8) | 9/1 | German | Number | Judgment | Interaction | AV > max (A, V) | Table 3 | 1 | ||||
Letter–sound integration | ||||||||||||||
Blau et al. (2008) | 19 (8/11) | 21.4 (3.5) | 19/0 | Dutch | Letter | Judgment | Congruent effect | AVconC > AVconI | Table 1 | 9 | ||||
Blau et al. (2009) | 13 (9/4) | 26.8 (5.4) | ~ | Dutch | Letter | Passive perception | Congruent effect | AVconC > AVconI | Figure 2 | 2 | ||||
Holloway et al. (2015) | 18 (9/9) | 24 (19–35) | 18/0 | English | Letter, number | Passive perception | Congruent effect | AVconC > AVconI | Figure 3 | 3 | AVconC < AVconI | Table 1 | 9 | |
Hocking and Price (2009) | 17 (6/12) | 26 (20–36) | 17/0 | English | Word | Judgment | Type specific effect | AV letter–sound > AV object–sound | Table 2 | 1 | ||||
Raij et al. (2000) | 8 (5/4) | ~ (22–32) | 8/1 | Finnish | Letter | Detection | Deformed A or V | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV < (A + V)] | Table 1 | 5 | | | |
van Atteveldt et al. (2004) | 16 (3/13) | 22 (19–27) | 16/0 | Dutch | Letter | Passive perception | Fixation | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV > max (A, V)] | Table 1 | 4 | |||
Congruent effect | AVconC > AVconI | Table 2 | 4 | |||||||||||
van Atteveldt et al. (2006) | 8 (1/7) | 23 (19–29) | 8/0 | Dutch | Letter | Passive perception | Fixation | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV > max (A, V)] | Table 3 | 1 | |||
Study1 12 (4/8) | 23 (20–27) | 12/0 | Dutch | Letter | Passive perception | Congruent effect | AVconC > AVconI | Table 2 | 5 | |||||
van Atteveldt et al. (2007) | Study2 13 (4/9) | 23 (18–34) | 13/0 | Dutch | Letter | Passive perception and judgment | Congruent effect | AVconC > AVconI | Table 3 | 2 | AVconC < AVconI | Table 4 | 7 | |
van Atteveldt et al. (2010) | 16 (6/10) | 22.8 (19–32) | 16/0 | Dutch | Letter | Detection | Adaptation effect | AV different > AV identical | Table 1 | 4 |
2.3. ALE statistical analysis
2.3.1. Localization of three different types of AV integration
We first localized object–sound, speech–sound, and letter–sound AV integration by conducting three separate ALE analyses in GingerALE 3.0.2 (www.brainmap.org). To build the dataset for each AV integration condition, we converted the coordinates of foci to Montreal Neurological Institute (MNI) space using the tal2mni or Brett tal2mni conversion tools provided by GingerALE. Foci were then organized into experiments by subject group rather than by contrast, to eliminate false positives due to within‐group effects (Turkeltaub et al., 2012).
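The mechanics of such a conversion can be sketched as applying the inverse of an MNI‐to‐Talairach affine to the reported peaks in homogeneous coordinates. The scale‐then‐pitch parameterization below follows the commonly described Brett approximation, but the exact parameters and the piecewise handling of coordinates below the AC plane should be taken from GingerALE's own converters; treat the values here as illustrative assumptions.

```python
import numpy as np

def mni2tal_affine(z_scale=0.92):
    """Approximate MNI-to-Talairach affine: anisotropic scaling, then a small
    pitch rotation (illustrative values in the spirit of the Brett transform)."""
    scale = np.diag([0.99, 0.97, z_scale, 1.0])
    c, s = np.cos(0.05), np.sin(0.05)
    pitch = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0,   c,   s, 0.0],
                      [0.0,  -s,   c, 0.0],
                      [0.0, 0.0, 0.0, 1.0]])
    return pitch @ scale

def tal2mni(coords):
    """Convert an (N, 3) array of Talairach peaks to MNI space."""
    homogeneous = np.c_[coords, np.ones(len(coords))]
    inverse = np.linalg.inv(mni2tal_affine())
    return (homogeneous @ inverse.T)[:, :3]

peaks_tal = np.array([[-52.0, -50.0, 8.0]])  # example Talairach peak
print(tal2mni(peaks_tal))
```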
After preparing each AV integration dataset, the ALE analysis steps were conducted as follows. First, ALE modeled Gaussian probability distributions for all foci reported in a given experiment, centered on the peak coordinates, with a width based on an empirical estimate of the spatial uncertainty due to different brain templates and sample sizes. Second, the Gaussian distributions were used to generate the modeled activation (MA) map for the given experiment, in which each value represented the activation probability at a given voxel. The first two steps were repeated for all included experiments, and the voxel‐wise union of all MA maps yielded the overall ALE map. Finally, above‐chance convergence in the ALE map was identified by a random‐effects significance test against the null hypothesis that the foci were distributed homogeneously over the brain. Significant convergence clusters were identified at an uncorrected cluster‐forming threshold of p < .001 and a family‐wise error (FWE) corrected cluster‐level threshold of p < .05 (Eickhoff et al., 2016, 2017). The minimum cluster size was 200 mm3 and the number of permutations was 1000. To avoid the meta‐analytic results being driven by a single non‐representative study (Button et al., 2013; David et al., 2013), we additionally required at least two contributing experiments per convergence cluster in all analyses (Erickson et al., 2014; Turkeltaub et al., 2012).
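The core of this procedure can be illustrated with a toy implementation on a small voxel grid: each experiment's foci are blurred with a Gaussian kernel, combined into a modeled activation (MA) map, and the MA maps are merged with the probabilistic union used by ALE; the permutation null is only sketched schematically. The grid size, kernel width, and foci below are made‐up values and do not reproduce GingerALE's actual implementation.

```python
import numpy as np

GRID = (20, 20, 20)  # toy grid; real analyses use a brain mask in MNI space

def ma_map(foci, sigma=3.0, shape=GRID):
    """Modeled activation map for one experiment: per-voxel activation
    probability, taken as the maximum over Gaussians centered on its foci."""
    zi, yi, xi = np.indices(shape)
    ma = np.zeros(shape)
    for fz, fy, fx in foci:
        d2 = (zi - fz) ** 2 + (yi - fy) ** 2 + (xi - fx) ** 2
        ma = np.maximum(ma, np.exp(-d2 / (2 * sigma ** 2)))
    return ma

def ale_map(experiments):
    """Voxel-wise probabilistic union of the experiments' MA maps."""
    ma_stack = np.stack([ma_map(foci) for foci in experiments])
    return 1.0 - np.prod(1.0 - ma_stack, axis=0)

# Two toy "experiments" with foci near the grid center (voxel coordinates).
experiments = [[(10, 10, 10), (12, 9, 11)], [(9, 11, 10)]]
ale = ale_map(experiments)

# Schematic permutation null: redistribute the same numbers of foci at random.
rng = np.random.default_rng(0)
null_max = [ale_map([[tuple(rng.integers(0, 20, 3)) for _ in foci]
                     for foci in experiments]).max()
            for _ in range(100)]
print(ale.max(), np.quantile(null_max, 0.95))  # observed peak vs. null 95th centile
```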
Although only 10 experiments were included in the letter–sound category, previous work suggests that if the effect is strong, a smaller sample size may be sufficient for a reliable meta‐analysis (Müller et al., 2018). Thus the robustness of the letter–sound integration effect was examined by performing a leave‐one‐out analysis, generating the ALE map with one experiment removed at a time (Enge et al., 2021).
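Using the toy ale_map helper from the previous sketch, the leave‐one‐out check amounts to a simple loop over the included experiments; in the actual analysis each iteration would go through GingerALE's full thresholding pipeline.

```python
import numpy as np

# Assumes `experiments` and `ale_map` from the previous sketch.
loo_maps = []
for held_out in range(len(experiments)):
    subset = experiments[:held_out] + experiments[held_out + 1:]
    loo_maps.append(ale_map(subset))

# A convergence cluster is considered robust if it stays above threshold
# (an arbitrary toy cutoff here) in every leave-one-out map.
threshold = 0.5
robust_voxels = np.all(np.stack(loo_maps) > threshold, axis=0)
print(int(robust_voxels.sum()), "voxels survive all leave-one-out iterations")
```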
2.3.2. Localization of validating and conflicting AV integration
We further performed ALE analysis on validating and conflicting subcategories for each type of AV integration. As letter–sound integration contained only two conflicting experiments, we obtained only its validating map. The steps for analysis and the threshold used for all subcategories were the same as in the previous analysis.
For object–sound and speech–sound integration, we also compared the similarities and differences between their validating and conflicting ALE maps. The conjunction map was created using the voxel‐level minimum value of the thresholded validating and conflicting ALE maps. A conjunction cluster was required to contain at least 200 mm3. The contrast map was generated by directly subtracting one from the thresholded ALE map of the other. In generating an empirical null distribution, GingerALE merged the foci from both validating and conflicting maps and then randomly divided them into two simulated datasets the same size as the original. Similarly, we conducted a subtraction analysis between the two simulated datasets to obtain their contrast map. Significantly different clusters emerged by comparing the contrast map of the simulated dataset with that of the true dataset. After 10,000 iterations, a voxel‐wise p‐value map was obtained, showing where the values of the true data lie in the distribution of values in that voxel. An uncorrected threshold p < .001 at the voxel level with a minimum cluster volume of 200 mm3 was used (Enge et al., 2021). In addition, we also used a looser threshold of p < .01 and a stricter threshold of p < .0001 to explore convergence trends and the robustness of the results. We then converted the p‐value into a z‐score.
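Continuing the toy sketch above, the conjunction is the voxel‐wise minimum of the two thresholded ALE maps, and the contrast is the observed difference assessed against a null built by pooling and randomly re‐splitting the foci; the foci, iteration count, and p‐value computation below are illustrative only and differ from GingerALE's implementation in detail.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy validating and conflicting experiment sets (voxel-space foci);
# assumes `ale_map` from the earlier sketch.
validating = [[(10, 10, 10), (11, 9, 10)], [(9, 10, 11)]]
conflicting = [[(5, 14, 6)], [(6, 13, 5), (4, 15, 7)]]
val_map, con_map = ale_map(validating), ale_map(conflicting)

# Conjunction: voxel-wise minimum of the two (already thresholded) ALE maps.
conjunction = np.minimum(val_map, con_map)

# Contrast: observed difference versus a null built by pooling all foci and
# randomly re-splitting them into two sets with the original experiment sizes.
observed = val_map - con_map
pooled = [focus for exp in validating + conflicting for focus in exp]
sizes = [len(exp) for exp in validating + conflicting]
null_diffs = []
for _ in range(200):
    order = rng.permutation(len(pooled))
    regrouped, start = [], 0
    for size in sizes:
        regrouped.append([pooled[i] for i in order[start:start + size]])
        start += size
    null_diffs.append(ale_map(regrouped[:len(validating)])
                      - ale_map(regrouped[len(validating):]))

# Voxel-wise p-values for the validating > conflicting direction.
p_map = (np.stack(null_diffs) >= observed).mean(axis=0)
print(float(p_map.min()))
```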
2.3.3. Similarities and differences of the three types of validating AV integration
To determine whether letter–sound integration reuses the previously developed neural basis of AV integration through neural recycling or assimilation–accommodation, we conducted a pairwise conjunction and contrast analysis of the three types of validating AV integration. A total of three pairs of comparisons are involved: letter–sound versus object–sound, letter–sound versus speech–sound, and object–sound versus speech–sound.
However, as the ALE maps of the three types of audiovisual integration were derived from a diverse set of experiments, the nature of the stimuli varied (including simple physical features, concepts, and sub‐lexical and lexical items) and different cognitive processes of integration were engaged (including both the initial binding process and the retrieval of unified multisensory representations). Thus, we conducted an additional screening step to further improve the homogeneity and comparability between the types of integration before comparison.
All letter–sound experiments used stimuli at the conceptual level (letters, numbers, words) and were related to the retrieval process. This is because conceptual stimuli imply that the brain must have stored the corresponding unified representations, and thus AV integration involves the process of retrieving the multisensory representations. To match letter–sound experiments, only object–sound experiments using conceptual stimuli such as animals, tools, instruments, and bodies, rather than simple physical stimuli, were included in the new dataset for comparison. Finally, speech–sound experiments using conceptual stimuli such as letters, words, numbers, and sentences were included. All experiments were concerned with the retrieval process except for the McGurk effect, in which different AV stimuli are bound together to form a new stimulus.
Experiments from the temporal congruence, spatial congruence, and type‐specific effect criteria were dropped. The first two criteria were excluded because none of the letter–sound integration experiments used them to define integrated brain areas, and because studies have found a double dissociation between time/space‐sensitive and integration‐sensitive areas (Miller, 2005; Sestieri et al., 2006; Stevenson et al., 2011). The third criterion was excluded because it potentially favored the neural recycling hypothesis, as the type‐specific effect emphasizes differences rather than overlap between the different types of AV integration.
After obtaining convergent areas for each of the three types of AV integration based on the reorganized datasets, we then performed pairwise comparison. The steps and thresholds were the same as in the previous two analyses.
2.3.4. Subgroup analysis of validating AV integration
In the above analyses, the experiments that formed each ALE dataset used a variety of stimuli. The object–sound experiments included simple physical features (e.g., sounds, fibers) as well as representational concepts (e.g., animals, tools), and speech–sound and letter–sound experiments included both sub‐lexical (e.g., letters) and lexical (e.g., words) stimuli. To investigate whether different stimulus types influence the locations of convergent clusters, we divided each type of AV integration into different subgroups according to the properties of the stimuli.
Object–sound validating integration was categorized into feature (binding) and concept (retrieval) subgroups. The feature subgroup comprised experiments with tones, fibers, gratings, and checkerboards. As simple physical features have no inherent stored representations and are bound together for the first time in the experiment, we also refer to this as the binding subgroup. The concept subgroup comprised experiments with animals, tools, instruments, and bodies. As a unified representation must be retrieved in these experiments, we also refer to this as the retrieval subgroup.
Speech–sound validating integration was categorized into sub‐lexical and lexical subgroups. The sub‐lexical subgroup was composed of experiments using letters or strings. The lexical subgroup was composed of experiments using words, numbers or sentences.
Letter–sound validating integration was also categorized into sub‐lexical and lexical subgroups. The sub‐lexical subgroup was composed of experiments with letters. The lexical subgroup was composed of experiments with words and numbers.
If the number of experiments in a subgroup was >5, we obtained its corresponding ALE map; otherwise no meta‐analysis was performed for that subgroup. If the ALE maps of two mutually exclusive subgroups could both be obtained, we then produced their conjunction and contrast maps. The steps and thresholds for all analyses were the same as in the previous analyses (Sections 2.3.1 and 2.3.2).
3. RESULTS
3.1. Localization of three different types of AV integration
To investigate the brain regions consistently responsible for object–sound, speech–sound and letter–sound AV integration, the following maps were obtained for each AV type (Figure 2). (1) An overall ALE map was obtained based on all contrasts that met the criteria described in the methods. (2) A validating map and (3) a conflicting ALE map were obtained based on contrasts representing validating and conflicting integration, respectively (Table 1). Finally, (4) a validating–conflicting contrast map, (5) a conflicting–validating contrast map, and (6) a validating and conflicting conjunction map were obtained based on the ALE maps from (2) and (3).
FIGURE 2.
Localization of three types of audiovisual integration. From top to bottom are the clusters that are consistently activated in the audiovisual integration of (A) object–sound (blue), (B) speech–sound (green), and (C) letter–sound (red). The maps are, from left to right: (1) the overall map obtained from all contrasts; (2) the validating map obtained from validating contrasts; (3) the conflicting map obtained from conflicting contrasts; (4) the validating–conflicting map obtained from the validating > conflicting contrast; (5) the conflicting–validating map obtained from the validating < conflicting contrast. The z value below each brain slice represents the axial coordinate. For maps (1) to (3), significant convergence clusters were identified at an uncorrected cluster‐forming threshold of p < .001 and a family‐wise error corrected cluster‐level threshold of p < .05, with a minimum cluster size of 200 mm3. The shade of the color represents the ALE statistic, with a color bar on the right. For maps (4) and (5), significant clusters were identified at an uncorrected voxel‐level threshold of p < .01 with a minimum cluster volume of 200 mm3. The shade of the color represents the Z statistic, with a color bar on the right. L and R stand for the left and right hemisphere, respectively. In the lower right corner is an axial slice at z = 10, illustrating the positions of the middle and posterior superior temporal gyrus (mSTG and pSTG). The numbers at the end of the dashed lines represent the y coordinates.
3.1.1. Object–sound integration
For object–sound integration (Figure 2a and Table 3), the overall map was derived from 33 experiments with 508 subjects and 248 foci. Five significantly convergent clusters were identified. The two larger clusters were located in the bilateral pSTG/MTG with Y values of −50 and −34 for the left and right peaks with the maximum ALE value respectively. A smaller cluster was found in the left middle STG (mSTG)/MTG with a Y value of −20. Two more convergences were located in the bilateral insula close to the inferior frontal gyrus (IFG).
TABLE 3.
Brain areas consistently active during object–sound integration.
Clusters | Volume (mm3) | x | y | z | ALE | Z | Brain region |
---|---|---|---|---|---|---|---|
Overall map | |||||||
1 | 4256 | 62 | −34 | 18 | 0.023 | 5.30 | Right insula |
60 | −48 | 12 | 0.019 | 4.57 | Right superior temporal gyrus | ||
58 | −38 | 0 | 0.017 | 4.21 | Right middle temporal gyrus | ||
48 | −40 | 14 | 0.011 | 3.24 | Right superior temporal gyrus | ||
2 | 3808 | −54 | −50 | 8 | 0.026 | 5.71 | Left middle temporal gyrus |
−58 | −36 | 8 | 0.014 | 3.65 | Left middle temporal gyrus | ||
3 | 744 | −34 | 24 | −2 | 0.017 | 4.29 | Left insula |
−32 | 26 | 6 | 0.014 | 3.82 | Left insula | ||
4 | 720 | −52 | −20 | 6 | 0.020 | 4.75 | Left superior temporal gyrus |
5 | 680 | 38 | 22 | −6 | 0.017 | 4.31 | Right insula |
36 | 24 | 4 | 0.012 | 3.26 | Right insula | ||
Validating map | |||||||
1 | 4368 | −54 | −50 | 8 | 0.026 | 5.89 | Left middle temporal gyrus |
−58 | −36 | 8 | 0.014 | 3.80 | Left middle temporal gyrus | ||
2 | 4184 | 60 | −36 | 12 | 0.019 | 4.74 | Right superior temporal gyrus |
60 | −48 | 12 | 0.019 | 4.73 | Right superior temporal gyrus | ||
58 | −38 | 0 | 0.016 | 4.18 | Right middle temporal gyrus | ||
3 | 728 | 50 | −20 | 10 | 0.019 | 4.69 | Right transverse temporal gyrus |
Conflicting map | |||||||
1 | 760 | 36 | 24 | 4 | 0.011 | 4.17 | Right insula |
Validating–conflicting map | |||||||
1 | 1968 | −58 | −40 | 13 | 2.65 | Left superior temporal gyrus | |
−57 | −50 | 8 | 2.52 | Left middle temporal gyrus | |||
−54 | −56 | 2 | 2.54 | Left middle temporal gyrus | |||
−52 | −44 | 8 | 2.52 | Left middle temporal gyrus | |||
−58 | −49 | 10 | 2.52 | Left middle temporal gyrus | |||
−50 | −56 | 4 | 2.49 | Left middle temporal gyrus | |||
−61 | −38 | 6 | 2.48 | Left middle temporal gyrus | |||
−54 | −48 | 2 | 2.41 | Left middle temporal gyrus | |||
−48 | −52 | 6 | 2.37 | Left middle temporal gyrus | |||
−61 | −41 | 2 | 2.36 | Left middle temporal gyrus | |||
−49 | −49 | 9 | 2.35 | Left superior temporal gyrus | |||
Conflicting–validating map | |||||||
1 | 328 | 40 | 24 | 4 | 2.44 | Right insula | |
36 | 24 | 7 | 2.44 | Right insula | |||
Comparable map | |||||||
1 | 1952 | −50 | −48 | 12 | 0.016 | 4.75 | Left superior temporal gyrus |
−58 | −50 | 16 | 0.016 | 4.72 | Left superior temporal gyrus | ||
2 | 1512 | 60 | −36 | 16 | 0.013 | 4.55 | Right superior temporal gyrus |
3 | 720 | 44 | −20 | 8 | 0.014 | 3.92 | Right insula |
Feature map | |||||||
1 | 2560 | −54 | −50 | 8 | 0.018 | 5.24 | Left middle temporal gyrus |
−58 | −36 | 6 | 0.012 | 4.06 | Left middle temporal gyrus | ||
2 | 1840 | 64 | −34 | 10 | 0.014 | 4.32 | Right superior temporal gyrus |
58 | −36 | 2 | 0.013 | 4.17 | Right middle temporal gyrus | ||
54 | −34 | 8 | 0.013 | 4.10 | Right superior temporal gyrus | ||
50 | −38 | 2 | 0.009 | 3.49 | Right superior temporal gyrus | ||
66 | −32 | 16 | 0.009 | 3.45 | Right superior temporal gyrus | ||
Concept map | |||||||
1 | 1827 | −50 | −48 | 12 | 0.016 | 4.73 | Left superior temporal gyrus |
−58 | −50 | 16 | 0.016 | 4.73 | Left superior temporal gyrus | ||
2 | 1408 | 60 | −36 | 16 | 0.014 | 4.16 | Right superior temporal gyrus |
3 | 672 | 44 | −20 | 8 | 0.014 | 4.31 | Right insula |
Feature map ∩ Concept map | |||||||
1 | 296 | −52 | −50 | 10 | 0.012 | Left superior temporal gyrus |
The validating map was derived from 30 experiments with 443 subjects and 202 foci. All three clusters identified were located in the temporal cortex. The two larger ones were located in the bilateral pSTG/MTG, in almost the same locations as in the overall map. The smaller cluster was found in the right mSTG/MTG (Y = −20). The conflicting map was derived from 11 experiments with 179 subjects and 46 foci. Only one cluster was found, in the right insula, with the same peak as in the overall map.
In the contrast analysis, the bilateral pSTG/MTG were more involved in validating integration, while the right insula was more involved in conflicting integration. In the conjunction analysis, we did not find clusters shared by the validating map and conflicting map. Taken together, the results illustrated a double dissociation between validating and conflicting integration.
3.1.2. Speech–sound integration
For speech–sound integration (Figure 2b and Table 4), the overall map was derived from 33 experiments with 458 subjects and 371 foci. We found four significantly convergent clusters in the bilateral temporal and frontal cortex. The largest cluster contained 7 peaks extending from the left pSTG/MTG forward to the mSTG/MTG, with Y values from −54 to −22. The right STG/MTG cluster was relatively concentrated, containing 3 peaks with Y values from −34 to −14. The remaining two smaller clusters were located in the left and right inferior and middle frontal gyri (IFG/MFG).
TABLE 4.
Brain areas consistently active during speech–sound integration.
Clusters | Volume (mm3) | x | y | z | ALE | Z | Brain region |
---|---|---|---|---|---|---|---|
Overall map | |||||||
1 | 7784 | −52 | −22 | 10 | 0.032 | 6.33 | Left superior temporal gyrus |
−60 | −28 | 6 | 0.024 | 5.23 | Left superior temporal gyrus | ||
−54 | −54 | 10 | 0.023 | 5.08 | Left superior temporal gyrus | ||
−56 | −40 | 10 | 0.021 | 4.69 | Left superior temporal gyrus | ||
−62 | −34 | −2 | 0.019 | 4.44 | Left middle temporal gyrus | ||
−64 | −36 | 14 | 0.018 | 4.20 | Left superior temporal gyrus | ||
−48 | −46 | 16 | 0.014 | 3.57 | Left superior temporal gyrus | ||
2 | 6816 | 60 | −30 | 4 | 0.028 | 5.78 | Right superior temporal gyrus |
64 | −14 | −6 | 0.021 | 4.77 | Right superior temporal gyrus | ||
42 | −34 | 4 | 0.017 | 4.13 | Right superior temporal gyrus | ||
3 | 1304 | −46 | 24 | 22 | 0.025 | 5.28 | Left middle frontal gyrus |
−46 | 16 | 32 | 0.017 | 3.99 | Left middle frontal gyrus | ||
4 | 768 | 52 | 24 | 22 | 0.023 | 4.96 | Right middle frontal gyrus |
Validating map | |||||||
1 | 7768 | −54 | −20 | 10 | 0.031 | 7.00 | Left transverse temporal gyrus |
−58 | −40 | 10 | 0.019 | 5.10 | Left superior temporal gyrus | ||
−48 | −44 | 16 | 0.012 | 3.84 | Left superior temporal gyrus | ||
−60 | −34 | −4 | 0.011 | 3.60 | Left middle temporal gyrus | ||
−50 | −64 | 10 | 0.009 | 3.24 | Left middle temporal gyrus | ||
2 | 6816 | 58 | −34 | 8 | 0.024 | 6.04 | Right middle temporal gyrus |
64 | −22 | 6 | 0.018 | 5.00 | Right superior temporal gyrus | ||
42 | −34 | 4 | 0.017 | 4.83 | Right superior temporal gyrus | ||
64 | −14 | −6 | 0.017 | 4.79 | Right superior temporal gyrus | ||
Conflicting map | |||||||
1 | 1640 | −48 | 24 | 24 | 0.019 | 4.79 | Left middle frontal gyrus |
−46 | 16 | 32 | 0.017 | 4.43 | Left middle frontal gyrus | ||
2 | 1120 | 52 | 24 | 22 | 0.023 | 5.46 | Right middle frontal gyrus |
Validating–conflicting map | |||||||
1 | 1344 | −56 | −17 | 4 | 3.24 | Left superior temporal gyrus | |
Conflicting–validating map | |||||||
1 | 1120 | 50 | 21 | 22 | 3.89 | Right middle frontal gyrus | |
53 | 27 | 21 | 2.97 | Right middle frontal gyrus | |||
2 | 992 | −47 | 14 | 32 | 3.72 | Left middle frontal gyrus | |
−43 | 15 | 35 | 3.54 | Left middle frontal gyrus | |||
−49 | 21 | 29 | 2.79 | Left middle frontal gyrus | |||
Comparable map | |||||||
1 | 8240 | −54 | −20 | 10 | 0.031 | 7.06 | Left transverse temporal gyrus |
−58 | −40 | 10 | 0.019 | 5.16 | Left superior temporal gyrus | ||
−48 | −44 | 16 | 0.012 | 3.89 | Left superior temporal gyrus | ||
−60 | −34 | −4 | 0.011 | 3.65 | Left middle temporal gyrus | ||
−50 | −64 | 10 | 0.009 | 3.29 | Left middle temporal gyrus | ||
2 | 7216 | 58 | −34 | 8 | 0.024 | 6.09 | Right middle temporal gyrus |
64 | −22 | 6 | 0.018 | 5.05 | Right superior temporal gyrus | ||
64 | −14 | −6 | 0.017 | 4.84 | Right superior temporal gyrus | ||
42 | −32 | 2 | 0.015 | 4.48 | Right caudate | ||
Sub‐lexical map | |||||||
1 | 1032 | −50 | −22 | 8 | 0.013 | 4.51 | Left superior temporal gyrus |
2 | 760 | 66 | −32 | 6 | 0.014 | 4.67 | Right superior temporal gyrus |
Lexical map | |||||||
1 | 7208 | 58 | −34 | 8 | 0.021 | 5.97 | Right middle temporal gyrus |
64 | −20 | 6 | 0.017 | 5.38 | Right superior temporal gyrus | ||
42 | −34 | 4 | 0.017 | 5.36 | Right superior temporal gyrus | ||
64 | −14 | −6 | 0.017 | 5.31 | Right superior temporal gyrus | ||
2 | −54 | −20 | 10 | 0.019 | 5.66 | Left transverse temporal gyrus | |
−60 | −26 | 4 | 0.014 | 4.78 | Left superior temporal gyrus | ||
−60 | −34 | −4 | 0.011 | 4.04 | Left middle temporal gyrus | ||
3 | −56 | −40 | 10 | 0.014 | 4.72 | Left superior temporal gyrus | |
−48 | −44 | 16 | 0.012 | 4.32 | Left superior temporal gyrus | ||
−54 | −54 | 10 | 0.010 | 3.84 | Left superior temporal gyrus | ||
Sub‐lexical map ∩ Lexical map | |||||||
1 | 792 | −50 | −22 | 8 | 0.013 | Left superior temporal gyrus | |
2 | 200 | 62 | −32 | 6 | 0.010 | Right middle temporal gyrus |
The validating map was derived from 23 experiments with 314 subjects and 157 foci. Bilateral posterior and middle STG/MTG were consistently activated and 5 criteria contributed to each cluster. The conflicting map was derived from 15 experiments with 199 subjects and 214 foci. Bilateral IFG/MFG were consistently activated (Table S2).
Similar to object–sound integration, the contrast analysis confirmed a trend toward a double dissociation between validating and conflicting integration: validating integration recruited more of the left mSTG/MTG, while conflicting integration recruited more of the bilateral IFG/MFG. In the conjunction analysis, no brain areas were shared by the validating and conflicting maps.
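To make the logic of these conjunction and contrast comparisons concrete, a minimal numpy sketch is given below. The published analyses were run in GingerALE, so the array names, the toy data, and the placeholder threshold are illustrative assumptions rather than the authors' pipeline; in particular, GingerALE derives significance for contrast maps by permuting experiments between datasets, which is not reproduced here.

```python
import numpy as np

# Toy ALE maps on a common voxel grid; in the actual analysis these would be the
# validating and conflicting ALE maps exported from GingerALE.
rng = np.random.default_rng(0)
shape = (91, 109, 91)
validating_ale = rng.random(shape)
conflicting_ale = rng.random(shape)

# Binary masks of voxels surviving thresholding in each map (placeholder cutoff).
validating_mask = validating_ale > 0.95
conflicting_mask = conflicting_ale > 0.95

# Conjunction: voxels significant in BOTH maps, valued by the voxel-wise minimum
# of the two ALE maps (the minimum-statistic convention).
conjunction = np.where(validating_mask & conflicting_mask,
                       np.minimum(validating_ale, conflicting_ale), 0.0)

# Contrast: signed voxel-wise difference between the two ALE maps; the
# validating-conflicting and conflicting-validating maps report where this
# difference is reliably positive or negative.
validating_minus_conflicting = validating_ale - conflicting_ale

print("conjunction voxels:", int((conjunction > 0).sum()))
```

The conjunction keeps only voxels present in both thresholded maps, while the signed difference underlies the two directional contrast maps.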
3.1.3. Letter–sound integration
For letter–sound integration (Figure 2c and Table 5), the overall map was derived from 10 experiments with 140 subjects and 56 foci. Two significantly convergent clusters were identified in the bilateral STG. The larger left cluster contained 7 peaks along the left STG, with Y values from −46 to −6. The smaller right cluster contained only two closely spaced peaks, with Y values of −20 and −28.
TABLE 5.
Brain areas consistently active during letter–sound integration.
Clusters | Volume (mm3) | x | y | z | ALE | Z | Brain region |
---|---|---|---|---|---|---|---|
Overall map | |||||||
1 | 5112 | −48 | −20 | 4 | 0.015 | 5.02 | Left superior temporal gyrus |
−58 | −6 | 6 | 0.014 | 4.80 | Left superior temporal gyrus | ||
−62 | −38 | 14 | 0.012 | 4.35 | Left superior temporal gyrus | ||
−58 | −46 | 12 | 0.010 | 3.92 | Left superior temporal gyrus | ||
−52 | −14 | 12 | 0.009 | 3.76 | Left transverse temporal gyrus | ||
−56 | −18 | 10 | 0.009 | 3.72 | Left transverse temporal gyrus | ||
−64 | −24 | 16 | 0.009 | 3.60 | Left postcentral gyrus | ||
2 | 1576 | 54 | −20 | 4 | 0.018 | 5.47 | Right superior temporal gyrus |
42 | −28 | 8 | 0.008 | 3.58 | Right transverse temporal gyrus | ||
Validating map | |||||||
1 | 5800 | −48 | −20 | 4 | 0.015 | 5.21 | Left superior temporal gyrus |
−58 | −6 | 6 | 0.014 | 4.99 | Left superior temporal gyrus | ||
−62 | −38 | 14 | 0.012 | 4.53 | Left superior temporal gyrus | ||
−58 | −46 | 12 | 0.010 | 4.09 | Left superior temporal gyrus | ||
−52 | −14 | 12 | 0.009 | 3.94 | Left transverse temporal gyrus | ||
−56 | −18 | 10 | 0.009 | 3.90 | Left transverse temporal gyrus | ||
−64 | −24 | 16 | 0.009 | 3.78 | Left postcentral gyrus | ||
2 | 1792 | 54 | −20 | 4 | 0.018 | 5.67 | Right superior temporal gyrus |
42 | −28 | 8 | 0.008 | 3.77 | Right transverse temporal gyrus | ||
Comparable map | |||||||
1 | 5528 | −48 | −20 | 4 | 0.015 | 5.24 | Left superior temporal gyrus |
−58 | −6 | 6 | 0.014 | 5.01 | Left superior temporal gyrus | ||
−62 | −38 | 14 | 0.012 | 4.48 | Left superior temporal gyrus | ||
−58 | −46 | 12 | 0.010 | 4.11 | Left superior temporal gyrus | ||
−52 | −14 | 12 | 0.009 | 3.96 | Left transverse temporal gyrus | ||
−56 | −18 | 10 | 0.009 | 3.92 | Left transverse temporal gyrus | ||
−64 | −24 | 16 | 0.009 | 3.80 | Left postcentral gyrus | ||
2 | 1816 | 54 | −20 | 4 | 0.018 | 5.70 | Right superior temporal gyrus |
42 | −28 | 8 | 0.008 | 3.78 | Right transverse temporal gyrus | ||
Sub‐lexical map | |||||||
1 | 4656 | −48 | −20 | 4 | 0.015 | 5.29 | Left superior temporal gyrus |
−62 | −38 | 14 | 0.012 | 4.53 | Left superior temporal gyrus | ||
−58 | −46 | 12 | 0.010 | 4.14 | Left superior temporal gyrus | ||
−52 | −14 | 12 | 0.009 | 3.97 | Left transverse temporal gyrus | ||
−56 | −18 | 10 | 0.009 | 3.94 | Left transverse temporal gyrus | ||
−64 | −24 | 16 | 0.009 | 3.84 | Left postcentral gyrus | ||
2 | 1592 | 54 | −20 | 4 | 0.018 | 5.75 | Right superior temporal gyrus
As the sample size for letter–sound integration was small, the results were susceptible to undue influence from any single heterogeneous study. To guard against such bias, we verified the sources of the convergence clusters and found that all experiments contributed to at least one cluster, with 9 out of 10 contributing to the left STG and 6 out of 10 to the right STG. The leave‐one‐out analysis showed that significant convergence in the bilateral STG was consistently detected after systematically removing each experiment, and the center coordinates of the clusters remained almost identical (Table 6). A conceptual sketch of this leave‐one‐out procedure is given after the table.
TABLE 6.
The leave‐one‐out analysis of letter–sound integration.
ALE analysis without an experiment | Center MNI coordinates in the left STG (x, y, z) | Center MNI coordinates in the right STG (x, y, z) |
---|---|---|
Blau et al. (2008) | −62, −36, 14; −48, −20, 4 | 54, −20, 4 |
Blau et al. (2009) | −62, −38, 10 | 54, −20, 2 |
Holloway et al. (2015) | −62, −38, 10 | 54, −20, 4 |
Hocking and Price (2009) | −62, −38, 12 | 54, −20, 4 |
Raij et al. (2000) | −62, −38, 10 | 54, −20, 2 |
van Atteveldt et al. (2004) | −62, −36, 10 | 54, −20, 4 |
van Atteveldt et al. (2006) | −62, −36, 8 | 54, −20, 4 |
van Atteveldt et al. (2007), study 1 | −62, −38, 10 | 54, −22, 4 |
van Atteveldt et al. (2007), study 2 | −62, −38, 8; −42, −24, 10 | 54, −20, 4 |
van Atteveldt et al. (2010) | −62, −36, 8; −48, −20, 4 | 54, −20, 4 |
Abbreviation: STG, superior temporal gyrus.
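As a conceptual illustration of the jackknife procedure described above, the sketch below repeats a toy ALE‐like analysis once per experiment with that experiment removed and reports the surviving clusters. Here `run_ale` and `cluster_centers` are hypothetical stand‐ins for the actual GingerALE analysis and cluster reporting, and the foci are random, so only the structure of the loop is meaningful.

```python
import numpy as np
from scipy import ndimage

def run_ale(experiments, shape=(91, 109, 91)):
    """Hypothetical stand-in for a full ALE analysis (GingerALE in the original study):
    pool all reported foci and smooth them into a toy 'ALE map' so the sketch runs."""
    grid = np.zeros(shape)
    for exp in experiments:
        for x, y, z in exp["foci"]:
            grid[x, y, z] += 1.0
    return ndimage.gaussian_filter(grid, sigma=3)

def cluster_centers(ale_map, rel_threshold=0.5):
    """Centers of mass of clusters above a fraction of the map maximum (toy threshold)."""
    labels, n_clusters = ndimage.label(ale_map > rel_threshold * ale_map.max())
    return ndimage.center_of_mass(ale_map, labels, range(1, n_clusters + 1))

# Toy experiments; in the real analysis each entry is one letter-sound fMRI experiment.
rng = np.random.default_rng(1)
experiments = [{"name": f"study_{i:02d}",
                "foci": rng.integers(20, 70, size=(5, 3)).tolist()}
               for i in range(10)]

full_centers = cluster_centers(run_ale(experiments))

# Jackknife: re-run the analysis once per study with that study removed and check
# that convergent clusters (and their center coordinates) are still detected.
for left_out in experiments:
    rest = [e for e in experiments if e is not left_out]
    centers = cluster_centers(run_ale(rest))
    print(f"without {left_out['name']}: {len(centers)} clusters")
```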
The validating map was derived from 10 experiments with 140 subjects and 40 foci. Again, two significantly convergent clusters were found, located in the bilateral STG.
3.2. Similarities and differences of the three types of AV integration
As described in the methods, we reorganized the datasets to improve homogeneity and re‐obtained the ALE maps of the three types of AV integration for comparison (comparable maps, middle row of Figure 4a). Object–sound integration retained 17 experiments with 232 subjects and 101 foci; two larger clusters in the bilateral pSTG/MTG and a smaller cluster in the right transverse temporal gyrus (TTG)/insula were identified (Table 3). Speech–sound integration retained 19 experiments with 261 subjects and 83 foci; bilateral STG/MTG clusters extending from posterior to middle portions were identified (Table 4). Letter–sound integration retained 9 experiments with 223 subjects and 39 foci; bilateral STG clusters were identified, with the cluster on the left being three times larger than that on the right (Table 5).
FIGURE 4.
Convergent clusters of the subgroup analyses for the three types of AV integration. From top to bottom are the clusters consistently activated in the subgroup analyses of (A) object–sound (blue), (B) speech–sound (green), and (C) letter–sound (red) AV integration. The z value below each brain slice represents the axial coordinate. Colored areas represent the locations of statistically significant convergence clusters, identified at an uncorrected cluster‐forming threshold of p < .001 and a family‐wise error (FWE) corrected cluster‐level threshold of p < .05, with a minimum cluster size of 200 mm3.
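As a rough illustration of the extent‐based part of the thresholding described in the figure legends, the sketch below removes supra‐threshold clusters smaller than 200 mm3 from a toy z map. The 2 mm isotropic voxel size (8 mm3 per voxel) and the toy data are assumptions for illustration only, and the FWE‐corrected cluster‐level threshold itself, which GingerALE estimates by permutation, is not reproduced here.

```python
import numpy as np
from scipy import ndimage

def apply_cluster_extent(z_map, z_thresh=3.09, min_volume_mm3=200.0, voxel_mm3=8.0):
    """Zero out supra-threshold clusters smaller than min_volume_mm3.
    z_thresh = 3.09 corresponds to one-tailed p < .001; the 2 mm isotropic voxel
    (8 mm^3) is an assumed resolution for illustration only."""
    min_voxels = int(np.ceil(min_volume_mm3 / voxel_mm3))
    labels, n_clusters = ndimage.label(z_map > z_thresh)
    sizes = ndimage.sum(np.ones_like(z_map), labels, range(1, n_clusters + 1))
    keep_labels = [i + 1 for i, size in enumerate(sizes) if size >= min_voxels]
    mask = np.isin(labels, keep_labels)
    return np.where(mask, z_map, 0.0)

# Toy z map standing in for an ALE z map.
rng = np.random.default_rng(2)
z_map = rng.normal(size=(91, 109, 91))
filtered = apply_cluster_extent(z_map)
print("voxels surviving the extent filter:", int((filtered != 0).sum()))
```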
Figure 3a and Table 7 show the common and unique clusters for the three types of AV integration in a pairwise comparison.
FIGURE 3.
The common and unique convergent clusters for the three types of audiovisual integration in pairwise comparisons. Colored areas represent the locations of statistically significant clusters rather than statistical values. (A) The middle row shows the comparable maps of object–sound (blue), speech–sound (green), and letter–sound (red) audiovisual integration. The rows above and below show the results of the pairwise conjunction and contrast analyses. The z value below each brain slice represents the axial coordinate. For comparable maps, significant convergence clusters were identified at an uncorrected cluster‐forming threshold of p < .001 and an FWE (family‐wise error) corrected cluster‐level threshold of p < .05, with a minimum cluster size of 200 mm3. For contrast maps, significant clusters were identified at an uncorrected voxel‐level threshold of p < .001 with a minimum cluster volume of 200 mm3. O represents object–sound integration, S represents speech–sound integration, and L represents letter–sound integration. No sig. indicates that no significant convergence clusters were found. The numbers at the end of the dashed lines represent the y coordinates. The red circles mark the brain region shared by the three types of AV integration. (B) A conceptual assimilation–accommodation scheme for letter–sound integration. Blue clusters are recruited in object–sound integration and are consistently invoked in subsequent speech–sound and letter–sound integration. Green clusters are recruited together with the blue ones upon the acquisition of speech–sound integration and are consistently invoked in subsequent letter–sound integration. Letter–sound integration recruits no new regions beyond those of speech–sound integration. The arrows indicate the developmental order of the three types of audiovisual integration abilities.
TABLE 7.
Results of the conjunction and contrast analyses for the three types of audiovisual integration based on the comparable maps.
Clusters | Volume (mm3) | x | y | z | ALE | Z | Brain region |
---|---|---|---|---|---|---|---|
Object‐sound and speech–sound | |||||||
Object–sound ∩ speech–sound | |||||||
1 | 1024 | −48 | −44 | 14 | 0.011 | Left middle temporal gyrus | |
−54 | −52 | 12 | 0.011 | Left superior temporal gyrus | |||
Speech–sound > object–sound | |||||||
1 | 1864 | 62 | −24 | −1 | 3.89 | Right superior temporal gyrus |
66 | −16 | −7 | 3.72 | Right superior temporal gyrus | |
70 | −16 | 6 | 3.54 | Right transverse temporal gyrus | |
2 | 632 | −57 | −29 | 6 | 3.89 | Left superior temporal gyrus |
−58 | −26 | 12 | 3.54 | Left superior temporal gyrus | |
Letter–sound and speech–sound | |||||||
Letter–sound ∩ speech–sound | |||||||
1 | 1224 | −62 | −38 | 14 | 0.012 | Left superior temporal gyrus | |
−58 | −46 | 12 | 0.010 | Left superior temporal gyrus | |||
2 | 880 | −50 | −22 | 8 | 0.012 | Left superior temporal gyrus | |
−56 | −18 | 10 | 0.009 | Left transverse temporal gyrus | |||
3 | 384 | 58 | −22 | 4 | 0.011 | Right superior temporal gyrus | |
60 | −16 | −2 | 0.009 | Right superior temporal gyrus | |||
Letter‐sound and object–sound | |||||||
Letter–sound ∩ object–sound | |||||||
1 | 200 | −56 | −48 | 12 | 0.010 | Left superior temporal gyrus | |
Letter–sound > object–sound | |||||||
1 | 1224 | −48 | −25 | 7 | 3.89 | Left superior temporal gyrus | |
−46 | −19 | 6 | 3.72 | Left superior temporal gyrus | |||
−46 | −14 | 4 | 3.43 | Left insula | |||
−43 | −21 | 3 | 3.35 | Left insula | |||
2 | 280 | 60 | −20 | 2 | 3.43 | Right superior temporal gyrus | |
58 | −18 | −4 | 3.35 | Right superior temporal gyrus | |||
55 | −24 | −1 | 3.09 | Right superior temporal gyrus |
Both object–sound and speech–sound integration consistently activated the left pSTG/MTG. Along the STG, slightly more anteriorly, speech–sound integration additionally recruited the bilateral mSTG relative to object–sound integration. Object–sound integration did not recruit any additional areas relative to speech–sound integration.
For letter–sound and speech–sound integration, both recruited the left posterior STG/MTG and the bilateral middle STG/MTG. No significant differences were found between AV conditions.
For letter–sound and object–sound integration, only the left pSTG was shared. Letter–sound integration recruited additional clusters in the bilateral mSTG relative to object–sound integration. Object–sound integration did not recruit any additional areas relative to letter–sound integration.
In summary (Figure 3b), object–sound, speech–sound, and letter–sound integration all recruited the left pSTG. On this basis, speech–sound integration recruited two additional regions in the bilateral mSTG. Finally, letter–sound integration directly recruited the regions involved in speech–sound integration.
3.3. Subgroup localizations of three types of validating AV integration
3.3.1. Object–sound integration
Feature and concept subgroups
The feature map was derived from 10 experiments with 159 subjects and 95 foci. Two significantly convergent clusters were identified in the bilateral pSTG/MTG. The concept map was derived from 20 experiments with 284 subjects and 107 foci. Two larger significantly convergent clusters were identified in the bilateral pSTG, and a smaller cluster was located in the right middle TTG extending inwards to the insula. Both feature‐level and concept‐level stimuli activated the left pSTG, and no differences were found between the two subgroups (Figure 4a and Table 3).
3.3.2. Speech–sound integration
Sub‐lexical and lexical subgroups
The sub‐lexical map was derived from 8 experiments with 144 subjects and 85 foci. Two significantly convergent clusters were identified in the bilateral mSTG/MTG. The lexical map was derived from 15 experiments with 198 subjects and 72 foci. The bilateral mSTG and the left pSTG were detected. Both subgroups activated the bilateral mSTG and no differences were found between them (Figure 4b and Table 4).
3.3.3. Letter–sound integration
Sub‐lexical and lexical subgroups
The sub‐lexical map was derived from 9 experiments with 123 subjects and 37 foci. Two significantly convergent clusters were identified in the bilateral STG/TTG. Only two experiments used lexical stimuli, so the ALE map for the lexical subgroup was not available (Figure 4c and Table 5).
4. DISCUSSION
The present study links letter–sound integration in reading with the brain's innate ability to perform object–sound and speech–sound integration from a neural reuse perspective at the individual level. Using the ALE meta‐analysis approach, we investigated the neural basis of the three types of AV integration and identified the similarities and differences between them. Our results showed a double dissociation between validating and conflicting integration, with the STG recruited in validating integration and the insula/MFG recruited in conflicting integration. Importantly, the comparison across the three types of AV integration provides support for the assimilation–accommodation hypothesis. All three types of AV integration recruited the left posterior superior temporal gyrus (STG), speech–sound integration additionally activated the bilateral middle STG, and letter–sound integration directly invoked the AV areas involved in speech–sound integration.
4.1. Regions recruited for AV integration: STG and MFG/insula
The vast majority of previous imaging studies have linked the STG/STS to the integration of natural and artificial AV information, whether during object–sound (Beauchamp et al., 2004), speech–sound (Noesselt et al., 2012), or letter–sound associations (Raij et al., 2000). Our results supported this previous work, showing that the bilateral STG was activated during all three types of AV integration. More specifically, the STG was involved in validating integration, as it showed significant convergence in the validating maps for all three integration types and was more active in the validating–conflicting maps for object–sound and speech–sound integration. Furthermore, the STG clusters comprised posterior and middle subregions. The left pSTG appears to be an essential cluster, as it was recruited regardless of stimulus type in all three types of AV integration. Although the pSTG is thought to be more involved in processing items at the conceptual or lexical level (Choi et al., 2015; Graves et al., 2008; Okada & Hickok, 2006), we found that it was also activated during object–sound integration at the feature level and letter–sound integration at the sub‐lexical level. The mSTG was activated by both sub‐lexical and lexical AV stimuli during speech–sound and letter–sound integration, but not by either type of stimulus during object–sound integration. This is understandable because the mSTG is thought to be associated with the processing of phonological features (Hickok & Poeppel, 2004) or of stimuli with temporally linear characteristics (Stevenson et al., 2011), suggesting its greater involvement in detecting phonetic and continuous temporal dynamics during language‐related AV integration.
In addition to the STG, the right insula (for object–sound integration) and the bilateral MFG/IFG (for speech–sound integration) were also recruited during AV integration. Consistent with previous review and meta‐analysis studies (Doehrmann & Naumer, 2008; Erickson et al., 2014), we found that these regions were more strongly activated in the conflicting and conflicting–validating maps. Because the insula and IFG/MFG have tended to show activation under the same contrasts in previous studies, researchers have argued that they serve similar rather than distinct functions (Bushara et al., 2001; Noesselt et al., 2012; Szycik et al., 2012). Given the multifunctionality of the insula and the MFG/IFG, several interpretations can be offered. First, according to the error likelihood hypothesis (Brown & Braver, 2007), the involvement of the insula may reflect its role in error processing and conflict monitoring (Bossaerts, 2010; Botvinick et al., 2004). Second, the MFG/IFG are thought to be selectively involved in the storage, retrieval, and manipulation of semantic representations (Badre & Wagner, 2007; Wagner et al., 2001); violations between auditory and visual information increase the processing load, leading to higher activation in the conflicting condition. Finally, as the IFG has long been considered a multisensory brain region, it may be directly involved in cross‐modal binding (Hein et al., 2007). In this case, a conflicting stimulus can be seen as a new AV association that revises the original concept.
All of the above clusters were derived from a variety of criteria, suggesting that their stable activation may be independent of the experimental paradigm and instead reflect the type of AV integration in which they are involved.
4.2. The assimilation–accommodation mechanism in letter–sound integration
The analysis of the similarities and differences across the three types of AV integration allowed us to examine whether letter–sound integration is better explained by the neural recycling or the assimilation–accommodation hypothesis. The spatial overlap in the left pSTG and the bilateral mSTG during both letter–sound and speech–sound integration suggests an assimilation mechanism: to acquire the later‐developing letter–sound integration, the areas supporting speech–sound integration are directly invoked, rather than a new STG subregion becoming specialized for letter–sound integration.
The present results also support an assimilation–accommodation effect in speech–sound integration. Specifically, speech–sound integration directly invoked the left pSTG involved in object–sound integration (assimilation) and added the bilateral mSTG (accommodation). In terms of development, it is difficult to determine whether object–sound or speech–sound integration arises earlier in life. However, on an evolutionary scale, it is clear that object–sound stimuli emerged before speech–sound stimuli. Thus, although assimilation–accommodation is inherently rooted in individual learning during development, our results seem to provide indirect evidence for extending this hypothesis to an evolutionary scale.
Of note, these two reuse mechanisms may not be contradictory, but rather complementary for different steps in reading acquisition. For visual word processing, numerous studies have supported neural recycling, whereas for AV integration, assimilation–accommodation seems to provide a better explanation. Considering the whole process of literacy acquisition, multiple reuse mechanisms may be needed for its different cognitive components.
4.3. Limitations
It should be noted that we did not identify a letter–sound‐sensitive subarea, possibly due to the limited spatial resolution of fMRI. Our hypotheses could therefore be tested with other methods, such as a rapid adaptation paradigm or higher‐resolution fMRI. Additionally, tracking the three types of AV integration longitudinally before and after reading acquisition could provide more direct insight into the development of letter–sound integration within the STG. Finally, we cannot assume that the assimilation–accommodation hypothesis can be extended to explain reuse on a broader evolutionary scale. Future work should provide more direct evidence by investigating multisensory integration in other populations (e.g., illiterate individuals, sign language users, and braille users) and writing systems, and through cross‐species comparisons.
5. CONCLUSION
Using an ALE meta‐analysis, we performed pairwise comparisons of the similarities and differences between object–sound, speech–sound, and letter–sound AV integration. The results showed that speech–sound integration overlapped with object–sound integration in the left pSTG (assimilation) and additionally activated the bilateral mSTG (accommodation), while letter–sound integration overlapped with speech–sound integration in both the left pSTG and the bilateral mSTG (assimilation). Given the order in which the three types of AV integration emerge over an individual's lifespan, letter–sound integration may be achieved by reusing areas that developed earlier for speech–sound integration, suggesting that assimilation–accommodation could support reading acquisition instead of, or in addition to, neural recycling. Considering the order in which the three types of AV integration emerged on an evolutionary time scale, the assimilation–accommodation mechanism might also be extended to explain the evolution of speech–sound or letter–sound integration, although more direct evidence is needed.
AUTHOR CONTRIBUTIONS
Conceptualization: Danqi Gao, Li Liu. Investigation: Danqi Gao, Xitong Liang, Zilin Bai. Methodology: Danqi Gao, Xitong Liang, Mingnan Cai. Visualization: Danqi Gao. Supervision: Li Liu. Writing – original draft: Danqi Gao. Writing – review and editing: Chaoying Xu, Qi Ting, Emily S. Nichols.
CONFLICT OF INTEREST STATEMENT
The authors declare that they have no competing interests.
Supporting information
Table S1. Combination of search terms used in literature retrieval.
Table S2. Criteria that contribute to each of the convergence clusters.
ACKNOWLEDGMENTS
This research was supported by the STI 2030—Major Projects (2021ZD0200500), the National Natural Science Foundation of China (31970977, 31571155), the Interdisciplinary Research Funds of Beijing Normal University and the Fundamental Research Funds for the Central Universities (2015KJJCB28).
Gao, D. , Liang, X. , Ting, Q. , Nichols, E. S. , Bai, Z. , Xu, C. , Cai, M. , & Liu, L. (2024). A meta‐analysis of letter–sound integration: Assimilation and accommodation in the superior temporal gyrus. Human Brain Mapping, 45(15), e26713. 10.1002/hbm.26713
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
REFERENCES
- Anderson, M. L. (2010). Neural reuse: A fundamental organizational principle of the brain. The Behavioral and Brain Sciences, 33, 245–266. 10.1017/S0140525X10000853 [DOI] [PubMed] [Google Scholar]
- Anderson, M. L. (2021). After phrenology: Neural reuse and the interactive brain. MIT Press. [DOI] [PubMed] [Google Scholar]
- Aparicio, M. , Peigneux, P. , Charlier, B. , Balériaux, D. , Kavec, M. , & Leybaert, J. (2017). The neural basis of speech perception through lipreading and manual cues: Evidence from deaf native users of cued speech. Frontiers in Psychology, 8, 426. 10.3389/fpsyg.2017.00426 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Badre, D. , & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive control of memory. Neuropsychologia, 45, 2883–2901. 10.1016/j.neuropsychologia.2007.06.015 [DOI] [PubMed] [Google Scholar]
- Barros‐Loscertales, A. , Ventura‐Campos, N. , Visser, M. , Alsius, A. , Pallier, C. , Avila Rivera, C. , & Soto‐Faraco, S. (2013). Neural correlates of audiovisual speech processing in a second language. Brain and Language, 126, 253–262. 10.1016/j.bandl.2013.05.009 [DOI] [PubMed] [Google Scholar]
- Baumann, O. , Vromen, J. M. G. , Cheung, A. , McFadyen, J. , Ren, Y. , & Guo, C. C. (2018). Neural correlates of temporal complexity and synchrony during audiovisual correspondence detection. Eneuro, 5(1), e0294‐17.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Beauchamp, M. S. , Lee, K. E. , Argall, B. D. , & Martin, A. (2004). Integration of auditory and visual information about objects in superior temporal sulcus. Neuron, 41, 809–823. 10.1016/S0896-6273(04)00070-4 [DOI] [PubMed] [Google Scholar]
- Benoit, M. M. , Raij, T. , Lin, F.‐.H. , Jääskeläinen, I. P. , & Stufflebeam, S. (2009). Primary and multisensory cortical activity is correlated with audiovisual percepts. Human Brain Mapping, 31, 526–538. 10.1002/hbm.20884 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blau, V. , Reithler, J. , van Atteveldt, N. , Seitz, J. , Gerretsen, P. , Goebel, R. , & Blomert, L. (2010). Deviant processing of letters and speech sounds as proximate cause of reading failure: A functional magnetic resonance imaging study of dyslexic children. Brain, 133, 868–879. 10.1093/brain/awp308 [DOI] [PubMed] [Google Scholar]
- Blau, V. , van Atteveldt, N. , Ekkebus, M. , Goebel, R. , & Blomert, L. (2009). Reduced neural integration of letters and speech sounds links phonological and reading deficits in adult dyslexia. Current Biology, 19, 503–508. 10.1016/j.cub.2009.01.065 [DOI] [PubMed] [Google Scholar]
- Blau, V. , van Atteveldt, N. , Formisano, E. , Goebel, R. , & Blomert, L. (2008). Task‐irrelevant visual letters interact with the processing of speech sounds in heteromodal and unimodal cortex. The European Journal of Neuroscience, 28, 500–509. 10.1111/j.1460-9568.2008.06350.x [DOI] [PubMed] [Google Scholar]
- Bossaerts, P. (2010). Risk and risk prediction error signals in anterior insula. Brain Structure & Function, 214, 645–653. 10.1007/s00429-010-0253-1 [DOI] [PubMed] [Google Scholar]
- Botvinick, M. M. , Cohen, J. D. , & Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex: An update. Trends in Cognitive Sciences, 8, 539–546. 10.1016/j.tics.2004.10.003 [DOI] [PubMed] [Google Scholar]
- Bouhali, F. , Thiebaut de Schotten, M. , Pinel, P. , Poupon, C. , Mangin, J.‐F. , Dehaene, S. , & Cohen, L. (2014). Anatomical connections of the visual word form area. The Journal of Neuroscience, 34, 15402–15414. 10.1523/JNEUROSCI.4918-13.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brem, S. , Bach, S. , Kucian, K. , Guttorm, T. K. , Martin, E. , Lyytinen, H. , Brandeis, D. , & Richardson, U. (2010). Brain sensitivity to print emerges when children learn letter‐speech sound correspondences. Proceedings of the National Academy of Sciences of the United States of America, 107, 7939–7944. 10.1073/pnas.0904402107 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brown, J. W. , & Braver, T. S. (2007). Risk prediction and aversion by anterior cingulate cortex. Cognitive, Affective, & Behavioral Neuroscience, 7, 266–277. 10.3758/CABN.7.4.266 [DOI] [PubMed] [Google Scholar]
- Bushara, K. O. , Hanakawa, T. , Immisch, I. , Toma, K. , Kansaku, K. , & Hallett, M. (2003). Neural correlates of cross ‐ modal binding. Nature Neuroscience, 6(2), 190–195. 10.1038/nn993 [DOI] [PubMed] [Google Scholar]
- Butler, A. J. , & James, K. H. (2013). Active Learning of novel sound ‐ producing objects: Motor reactivation and enhancement of visuo ‐ motor connectivity. Journal of Cognitive Neuroscience, 25(2), 203–218. 10.1162/jocn_a_00284 [DOI] [PubMed] [Google Scholar]
- Butler, A. J. , James, T. W. , & James, K. H. (2011). Enhanced multisensory integration and motor reactivation after active motor learning of audiovisual associations. Journal of Cognitive Neuroscience, 23(11), 3515–3528. 10.1162/jocn_a_00015 [DOI] [PubMed] [Google Scholar]
- Buchsbaum, B. R. , Hickok, G. , & Humphries, C. (2001). Role of left posterior superior temporal gyrus in phonological processing for speech perception and production. Cognitive Science, 25, 663–678. 10.1207/s15516709cog2505_2 [DOI] [Google Scholar]
- Bushara, K. O. , Grafman, J. , & Hallett, M. (2001). Neural correlates of auditory‐visual stimulus onset asynchrony detection. The Journal of Neuroscience, 21, 300–304. 10.1523/JNEUROSCI.21-01-00300.2001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Button, K. S. , Ioannidis, J. P. A. , Mokrysz, C. , Nosek, B. A. , Flint, J. , Robinson, E. S. J. , & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376. 10.1038/nrn3475 [DOI] [PubMed] [Google Scholar]
- Callan, D. , Jones, J. , Munhall, K. , Callan, A. , Kroos, C. , & Vatikiotis ‐ Bateson, E. (2003). Neural processes underlying perceptual enhancement by visual speech gestures. Neuroreport, 14(17), 2213–2218. [DOI] [PubMed] [Google Scholar]
- Calvert, G. A. , Campbell, R. , & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649–657. 10.1016/S0960-9822(00)00513-3 [DOI] [PubMed] [Google Scholar]
- Cao, F. , Tao, R. , Liu, L. , Perfetti, C. A. , & Booth, J. R. (2013). High proficiency in a second language is characterized by greater involvement of the first language network: Evidence from Chinese learners of English. Journal of Cognitive Neuroscience, 25, 1649–1663. 10.1162/jocn_a_00414 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Choi, Y.‐H. , Park, H. K. , & Paik, N.‐J. (2015). Role of the posterior temporal lobe during language tasks: A virtual lesion study using repetitive transcranial magnetic stimulation. Neuroreport, 26, 314–319. 10.1097/WNR.0000000000000339 [DOI] [PubMed] [Google Scholar]
- Chyl, K. , Kossowski, B. , Debska, A. , Luniewska, M. , Banaszkiewicz, A. , Zelechowska, A. , Frost, S. J. , Mencl, W. E. , Wypych, M. , Marchewka, A. , Pugh, K. R. , & Jednorog, K. (2018). Prereader to beginning reader: Changes induced by reading acquisition in print and speech brain networks. Journal of Child Psychology and Psychiatry, 59, 76–87. 10.1111/jcpp.12774 [DOI] [PMC free article] [PubMed] [Google Scholar]
- David, S. P. , Ware, J. J. , Chu, I. M. , Loftus, P. D. , Fusar‐Poli, P. , Radua, J. , Munafò, M. R. , & Ioannidis, J. P. A. (2013). Potential reporting bias in fMRI studies of the brain. PLoS One, 8, e70104. 10.1371/journal.pone.0070104 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dehaene, S. , & Cohen, L. (2007). Cultural recycling of cortical maps. Neuron, 56, 384–398. 10.1016/j.neuron.2007.10.004 [DOI] [PubMed] [Google Scholar]
- Dehaene, S. , Cohen, L. , Morais, J. , & Kolinsky, R. (2015). Illiterate to literate: Behavioural and cerebral changes induced by reading acquisition. Nature Reviews Neuroscience, 16, 234–244. 10.1038/nrn3924 [DOI] [PubMed] [Google Scholar]
- Dehaene, S. , Pegado, F. , Braga, L. W. , Ventura, P. , Filho, G. N. , Jobert, A. , Dehaene‐Lambertz, G. , Kolinsky, R. , Morais, J. , & Cohen, L. (2010). How learning to read changes the cortical networks for vision and language. Science, 330, 1359–1364. 10.1126/science.1194140 [DOI] [PubMed] [Google Scholar]
- Dehaene‐Lambertz, G. , Monzalvo, K. , & Dehaene, S. (2018). The emergence of the visual word form: Longitudinal evolution of category‐specific ventral visual areas during reading acquisition. PLoS Biology, 16, e2004103. 10.1371/journal.pbio.2004103 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Doehrmann, O. , & Naumer, M. J. (2008). Semantics and the multisensory brain: How meaning modulates processes of audio‐visual integration. Brain Research, 1242, 136–150. 10.1016/j.brainres.2008.03.071 [DOI] [PubMed] [Google Scholar]
- Ehri, L. C. (2005). Learning to read words: Theory, findings, and issues. Scientific Studies of Reading, 9, 167–188. 10.1207/s1532799xssr0902_4 [DOI] [Google Scholar]
- Eickhoff, S. B. , Laird, A. R. , Fox, P. M. , Lancaster, J. L. , & Fox, P. T. (2017). Implementation errors in the GingerALE software: Description and recommendations. Human Brain Mapping, 38, 7–11. 10.1002/hbm.23342 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eickhoff, S. B. , Nichols, T. E. , Laird, A. R. , Hoffstaedter, F. , Amunts, K. , Fox, P. T. , Bzdok, D. , & Eickhoff, C. R. (2016). Behavior, sensitivity, and power of activation likelihood estimation characterized by massive empirical simulation. NeuroImage, 137, 70–85. 10.1016/j.neuroimage.2016.04.072 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Enge, A. , Abdel Rahman, R. , & Skeide, M. A. (2021). A meta‐analysis of fMRI studies of semantic cognition in children. NeuroImage, 241, 118436. 10.1016/j.neuroimage.2021.118436 [DOI] [PubMed] [Google Scholar]
- Erickson, L. C. , Heeg, E. , Rauschecker, J. P. , & Turkeltaub, P. E. (2014). An ALE meta‐analysis on the audiovisual integration of speech signals: ALE meta‐analysis on AV speech integration. Human Brain Mapping, 35, 5587–5605. 10.1002/hbm.22572 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fairhall, S. L. , & Macaluso, E. (2009). Spatial attention can modulate audiovisual integration at multiple cortical and subcortical sites. European Journal of Neuroscience, 29(6), 1247–1257. 10.1111/j.1460-568.2009.06688.x [DOI] [PubMed] [Google Scholar]
- Francisco, A. A. , Takashima, A. , McQueen, J. M. , van den Bunt, M. , Jesse, A. , & Groen, M. A. (2018). Adult dyslexic readers benefit less from visual input during audiovisual speech processing: fMRI evidence. Neuropsychologia, 117, 454–471. 10.1016/j.neuropsychologia.2018.07.009 [DOI] [PubMed] [Google Scholar]
- Gau, R. , & Noppeney, U. (2016). How prior expectations shape multisensory perception. Neuroimage, 124, 876–886. 10.1016/j.neuroimage.2015.09.045 [DOI] [PubMed] [Google Scholar]
- Graves, W. W. , Grabowski, T. J. , Mehta, S. , & Gupta, P. (2008). Left posterior superior temporal gyrus participates specifically in accessing lexical phonology. Journal of Cognitive Neuroscience, 20, 1698–1710. 10.1162/jocn.2008.20113 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hasson, U. , Harel, M. , Levy, I. , & Malach, R. (2003). Large‐scale mirror‐symmetry organization of human occipito‐temporal object areas. Neuron, 37, 1027–1041. 10.1016/S0896-6273(03)00144-2 [DOI] [PubMed] [Google Scholar]
- Hein, G. , Doehrmann, O. , Muller, N. G. , Kaiser, J. , Muckli, L. , & Naumer, M. J. (2007). Object familiarity and semantic congruency modulate responses in cortical audiovisual integration areas. The Journal of Neuroscience, 27, 7881–7887. 10.1523/JNEUROSCI.1740-07.2007 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hickok, G. , & Poeppel, D. (2004). Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition, 92, 67–99. 10.1016/j.cognition.2003.10.011 [DOI] [PubMed] [Google Scholar]
- Hocking, J. , & Price, C. J. (2009). Dissociating verbal and nonverbal audiovisual object processing. Brain and Language, 108, 89–96. 10.1016/j.bandl.2008.10.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Holloway, I. D. , van Atteveldt, N. , Blomert, L. , & Ansari, D. (2015). Orthographic dependency in the neural correlates of reading: Evidence from audiovisual integration in English readers. Cerebral Cortex, 25, 1544–1553. 10.1093/cercor/bht347 [DOI] [PubMed] [Google Scholar]
- Howard, M. A. , Volkov, I. O. , Mirsky, R. , Garell, P. C. , Noh, M. D. , Granner, M. , Damasio, H. , Steinschneider, M. , Reale, R. A. , Hind, J. E. , & Brugge, J. F. (2000). Auditory cortex on the human posterior superior temporal gyrus. The Journal of Comparative Neurology, 416, 79–92. [DOI] [PubMed] [Google Scholar]
- James, T. W. , Stevenson, R. A. , Kim, S. , VanDerKlok, R. M. , & James, K. H. (2011). Shape from sound: Evidence for a shape operator in the lateral occipital cortex. Neuropsychologia, 49(7), 1807–1815. 10.1016/j.neuropsychologia.2011.03.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
- James, T. W. , VanDerKlok, R. M. , Stevenson, R. A. , & James, K. H. (2011). Multisensory perception of action in posterior temporal and parietal cortices. Neuropsychologia, 49(1), 108–114. 10.1016/j.neuropsychologia.2010.10.030 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kassuba, T. , Klinge, C. , Hölig, C. , Menz, M. M. , Ptito, M. , Röder, B. , & Siebner, H. R. (2011). The left fusiform gyrus hosts trisensory representations of manipulable objects. NeuroImage, 56(3), 1566–1577. 10.1016/j.neuroimage.2011.02.032 [DOI] [PubMed] [Google Scholar]
- Kopp, F. (2014). Audiovisual temporal fusion in 6‐month‐old infants. Developmental Cognitive Neuroscience, 9, 56–67. 10.1016/j.dcn.2014.01.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Laing, M. , Rees, A. , & Vuong, Q. C. (2015). Amplitude ‐ modulated stimuli reveal auditory ‐ visual interactions in brain activity and brain connectivity. Frontiers in Psychology, 6. 10.3389/fpsyg.2015.01440 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Laurienti, P. J. , Wallace, M. T. , Maldjian, J. A. , Susi, C. M. , Stein, B. E. , & Burdette, J. H. (2003). Cross ‐ modal sensory processing in the anterior cingulate and medial prefrontal cortices. Human Brain Mapping, 19(4), 213–223. 10.1002/hbm.10112 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee, H. , & Noppeney, U. (2011a). Long ‐ term music training tunes how the brain temporally binds signals from multiple senses. Proceedings of the National Academy of Sciences of the United States of America, 108(51), E1441–E1450. 10.1073/pnas.1115267108 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee, H. , & Noppeney, U. (2011b). Physical and perceptual factors shape the neural mechanisms that integrate audiovisual signals in speech comprehension. The Journal of Neuroscience, 31(31), 11338–11350. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lewis, R. , & Noppeney, U. (2010). Audiovisual synchrony improves motion discrimination via enhanced connectivity between early visual and auditory areas. Journal of Neuroscience, 30(37), 12329–12339. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Love, S. A. , Petrini, K. , Pernet, C. R. , Latinus, M. , & Pollick, F. E. (2018). Overlapping but divergent neural correlates underpinning audiovisual synchrony and temporal order judgments. Frontiers in Human Neuroscience, 12, 274. 10.3389/fnhum.2018.00274 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lüttke, C. S. , Ekman, M. , van Gerven, M. A. J. , & de Lange, F. P. (2016). Preference for audiovisual speech congruency in superior temporal cortex. Journal of Cognitive Neuroscience, 28(1), 1–7. 10.1162/jocn_a_00874 [DOI] [PubMed] [Google Scholar]
- Liu, Y. , Dunlap, S. , Fiez, J. , & Perfetti, C. (2007). Evidence for neural accommodation to a writing system following learning. Human Brain Mapping, 28, 1223–1234. 10.1002/hbm.20356 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Macaluso, E. , George, N. , Dolan, R. , Spence, C. , & Driver, J. (2004). Spatial and temporal factors during processing of audiovisual speech: A PET study. NeuroImage, 21(2), 725–732. 10.1016/j.neuroimage.2003.09.049 [DOI] [PubMed] [Google Scholar]
- Man, K. , Damasio, A. , Meyer, K. , & Kaplan, J. T. (2015). Convergent and invariant object representations for sight, sound, and touch. Human Brain Mapping, 36(9), 3629–3640. 10.1002/hbm.22867 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Matchin, W. , Groulx, K. , & Hickok, G. (2014). Audiovisual speech integration does not rely on the motor system: Evidence from articulatory suppression, the McGurk effect, and fMRI. Journal of Cognitive Neuroscience, 26(3), 606–620. 10.1162/jocn_a_00515 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Marchant, J. L. , Ruff, C. C. , & Driver, J. (2012). Audiovisual synchrony enhances BOLD responses in a brain network including multisensory STS while also enhancing target‐detection performance for both modalities. Human Brain Mapping, 33, 1212–1224. 10.1002/hbm.21278 [DOI] [PMC free article] [PubMed] [Google Scholar]
- McCormick, K. , Lacey, S. , Stilla, R. , Nygaard, L. C. , & Sathian, K. (2018). Neural basis of the crossmodal correspondence between auditory pitch and visuospatial elevation. Neuropsychologia, 112, 19–30. 10.1016/j.neuropsychologia.2018.02.029 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mercure, E. , Bright, P. , Quiroz, I. , & Filippi, R. (2022). Effect of infant bilingualism on audiovisual integration in a McGurk task. Journal of Experimental Child Psychology, 217, 105351. 10.1016/j.jecp.2021.105351 [DOI] [PubMed] [Google Scholar]
- Miller, L. M. (2005). Perceptual fusion and stimulus coincidence in the cross‐modal integration of speech. The Journal of Neuroscience, 25, 5884–5893. 10.1523/JNEUROSCI.0896-05.2005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Müller, V. I. , Cieslik, E. C. , Laird, A. R. , Fox, P. T. , Radua, J. , Mataix‐Cols, D. , Tench, C. R. , Yarkoni, T. , Nichols, T. E. , Turkeltaub, P. E. , Wager, T. D. , & Eickhoff, S. B. (2018). Ten simple rules for neuroimaging meta‐analysis. Neuroscience and Biobehavioral Reviews, 84, 151–161. 10.1016/j.neubiorev.2017.11.012 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Murray, M. A. (2017). Elementary Egyptian grammar. Bernard Quaritch. [Google Scholar]
- Naghavi, H. R. , Eriksson, J. , Larsson, A. , & Nyberg, L. (2007). The claustrum/insula region integrates conceptually related sounds and pictures. Neuroscience Letters, 422, 77–80. 10.1016/j.neulet.2007.06.009 [DOI] [PubMed] [Google Scholar]
- Nath, A. R. , & Beauchamp, M. S. (2011). Dynamic changes in superior temporal sulcus connectivity during perception of noisy audiovisual speech. The Journal of Neuroscience, 31, 1704–1714. 10.1523/JNEUROSCI.4853-10.2011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nelson, J. R. , Liu, Y. , Fiez, J. , & Perfetti, C. A. (2009). Assimilation and accommodation patterns in ventral occipitotemporal cortex in learning a second writing system. Human Brain Mapping, 30, 810–820. 10.1002/hbm.20551 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nichols, E. S. , Gao, Y. , Fregni, S. , Liu, L. , & Joanisse, M. F. (2021). Representational dissimilarity of first and second language in the bilingual brain. Human Brain Mapping, 42, 5433–5445. 10.1002/hbm.25633 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nichols, E. S. , & Joanisse, M. F. (2016). Functional activity and white matter microstructure reveal the independent effects of age of acquisition and proficiency on second‐language learning. NeuroImage, 143, 15–25. 10.1016/j.neuroimage.2016.08.053 [DOI] [PubMed] [Google Scholar]
- Noesselt, T. , Rieger, J. W. , Schoenfeld, M. A. , Kanowski, M. , Hinrichs, H. , Heinze, H.‐ J. , Driver, J . (2007). Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience, 27(42), 11431–11441. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Noesselt, T. , Tyll, S. , Boehler, C. N. , Budinger, E. , Heinze, H.‐. J. , & Driver, J. (2010). Sound ‐ induced enhancement of low ‐ intensity vision: Multisensory influences on human sensory ‐ specific cortices and thalamic bodies relate to perceptual enhancement of visual detection sensitivity. Journal of Neuroscience, 30(41), 13609–13623. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Noesselt, T. , Bergmann, D. , Heinze, H.‐J. , Münte, T. , & Spence, C. (2012). Coding of multisensory temporal patterns in human superior temporal sulcus. Frontiers in Integrative Neuroscience, 6, 64. 10.3389/fnint.2012.00064 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Okada, K. , & Hickok, G. (2006). Identification of lexical–phonological networks in the superior temporal sulcus using functional magnetic resonance imaging. Neuroreport, 17, 1293–1296. 10.1097/01.wnr.0000233091.82536.b2 [DOI] [PubMed] [Google Scholar]
- Ojanen, V. , Möttönen, R. , Pekkola, J. , Jääskeläinen, I. P. , Joensuu, R. , Autti, T. , & Sams, M. (2005). Processing of audiovisual speech in Broca's area. NeuroImage, 25(2), 333–338. 10.1016/j.neuroimage.2004.12.001 [DOI] [PubMed] [Google Scholar]
- Olivetti Belardinelli, M. , Sestieri, C. , Di Matteo, R. , Delogu, F. , Del Gratta, C. , Ferretti, A. , Caulo, M. , Tartaro, A. , & Romani, G. L. (2004). Audio ‐ visual crossmodal interactions in environmental perception: An fMRI investigation. Cognitive Processing, 5(3), 167–174. [DOI] [Google Scholar]
- Pekkola, J. , Laasonen, M. , Ojanen, V. , Autti, T. , Jääskeläinen, I. P. , Kujala, T. , & Sams, M. (2006). Perception of matching and conflicting audiovisual speech in dyslexic and fluent readers: An fMRI study at 3 T. NeuroImage, 29, 797–807. 10.1016/j.neuroimage.2005.09.069 [DOI] [PubMed] [Google Scholar]
- Perfetti, C. A. , Liu, Y. , Fiez, J. , Nelson, J. , Bolger, D. J. , & Tan, L.‐H. (2007). Reading in two writing systems: Accommodation and assimilation of the brain's reading network. Bilingualism: Language and Cognition, 10, 131–146. 10.1017/S1366728907002891 [DOI] [Google Scholar]
- Piaget, J. , & Mays, W. (1972). The principles of genetic epistemology: Selected works (Vol. 7). Routledge & Kegan Paul Ltd. [Google Scholar]
- Plank, T. , Rosengarth, K. , Song, W. , Ellermeier, W. , & Greenlee, M. W. (2012). Neural correlates of audio ‐ visual object recognition: Effects of implicit spatial congruency. Human Brain Mapping, 33(4), 797–811. 10.1002/hbm.21254 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Porada, D. K. , Regenbogen, C. , Freiherr, J. , Seubert, J. , & Lundström, J. N. (2021). Trimodal processing of complex stimuli in inferior parietal cortex is modality ‐ independent. Cortex, 139, 198–210. 10.1016/j.cortex.2021.03.008 [DOI] [PubMed] [Google Scholar]
- Preston, J. L. , Molfese, P. J. , Frost, S. J. , Mencl, W. E. , Fulbright, R. K. , Hoeft, F. , Landi, N. , Shankweiler, D. , & Pugh, K. R. (2016). Print‐speech convergence predicts future reading outcomes in early readers. Psychological Science, 27, 75–84. 10.1177/0956797615611921 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Price, C. J. (2012). A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. NeuroImage, 62, 816–847. 10.1016/j.neuroimage.2012.04.062 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Raij, T. , Uutela, K. , & Hari, R. (2000). Audiovisual integration of letters in the human brain. Neuron, 28, 617–625. 10.1016/S0896-6273(00)00138-0 [DOI] [PubMed] [Google Scholar]
- Rueckl, J. G. , Paz‐Alonso, P. M. , Molfese, P. J. , Kuo, W.‐J. , Bick, A. , Frost, S. J. , Hancock, R. , Wu, D. H. , Mencl, W. E. , Dunabeitia, J. A. , Lee, J.‐R. , Oliver, M. , Zevin, J. D. , Hoeft, F. , Carreiras, M. , Tzeng, O. J. L. , Pugh, K. R. , & Frost, R. (2015). Universal brain signature of proficient reading: Evidence from four contrasting languages. Proceedings of the National Academy of Sciences of the United States of America, 112, 15510–15515. 10.1073/pnas.1509321112 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schönwiesner, M. , Novitski, N. , Pakarinen, S. , Carlson, S. , Tervaniemi, M. , & Näätänen, R. (2007). Heschl's gyrus, posterior superior temporal gyrus, and mid‐ventrolateral prefrontal cortex have different roles in the detection of acoustic changes. Journal of Neurophysiology, 97, 2075–2082. 10.1152/jn.01083.2006 [DOI] [PubMed] [Google Scholar]
- Sestieri, C. , Di Matteo, R. , Ferretti, A. , Del Gratta, C. , Caulo, M. , Tartaro, A. , Olivetti Belardinelli, M. , & Romani, G. L. (2006). “What” versus “where” in the audiovisual domain: An fMRI study. NeuroImage, 33, 672–680. 10.1016/j.neuroimage.2006.06.045 [DOI] [PubMed] [Google Scholar]
- Shinozaki, J. , Hiroe, N. , Sato, M. , Nagamine, T. , & Sekiyama, K. (2016). Impact of language on functional connectivity for audiovisual speech integration. Scientific Reports, 6, 31388. 10.1038/srep31388 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skipper, J. I. , van Wassenhove, V. , Nusbaum, H. C. , & Small, S. L. (2007). Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17(10), 2387–2399. 10.1093/cercor/bhl147 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szycik, G. R. , Jansma, H. , & Münte, T. F. (2009). Audiovisual integration during speech comprehension: An fMRI study comparing ROI ‐ based and whole brain analyses. Human Brain Mapping, 30(7), 1990–1999. 10.1002/hbm.20640 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szycik, G. R. , Münte, T. F. , Dillo, W. , Mohammadi, B. , Samii, A. , Emrich, H. M. , & Dietrich, D. E. (2009). Audiovisual integration of speech is disturbed in schizophrenia: An fMRI study. Schizophrenia Research, 110(1–3), 111–118. 10.1016/j.schres.2009.03.003 [DOI] [PubMed] [Google Scholar]
- Szycik, G. R. , Tausche, P. , & Münte, T. F. (2008). A novel approach to study audiovisual integration in speech perception: Localizer fMRI and sparse sampling. Brain Research, 1220, 142–149. 10.1016/j.brainres.2007.08.027 [DOI] [PubMed] [Google Scholar]
- Starke, J. , Ball, F. , Heinze, H. , & Noesselt, T. (2020). The spatio‐temporal profile of multisensory integration. The European Journal of Neuroscience, 51, 1210–1223. 10.1111/ejn.13753 [DOI] [PubMed] [Google Scholar]
- Stevenson, R. A. , & James, T. W. (2009). Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage, 44, 1210–1223. 10.1016/j.neuroimage.2008.09.034 [DOI] [PubMed] [Google Scholar]
- Stevenson, R. A. , VanDerKlok, R. M. , Pisoni, D. B. , & James, T. W. (2011). Discrete neural substrates underlie complementary audiovisual speech integration processes. NeuroImage, 55, 1339–1345. 10.1016/j.neuroimage.2010.12.063 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szycik, G. R. , Stadler, J. , Tempelmann, C. , & Muente, T. F. (2012). Examining the McGurk illusion using high‐field 7 Tesla functional MRI. Frontiers in Human Neuroscience, 6, 95. 10.3389/fnhum.2012.00095 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tan, L. , Spinks, J. , Feng, C. , Siok, W. , Perfetti, C. , Xiong, J. , Fox, P. , & Gao, J. (2003). Neural systems of second language reading are shaped by native language. Human Brain Mapping, 18, 158–166. 10.1002/hbm.10089 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tietze, F.‐A. , Hundertmark, L. , Roy, M. , Zerr, M. , Sinke, C. , Wiswede, D. , Walter, M. , Münte, T. F. , & Szycik, G. R . (2019). Auditory deficits in audiovisual speech perception in adult Asperger's syndrome: fMRI study. Frontiers in Psychology, 10, 2286. 10.3389/fpsyg.2019.02286 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Treille, A. , Vilain, C. , Hueber, T. , Lamalle, L. , & Sato, M. (2017). Inside speech: Multisensory and modality ‐ specific processing of tongue and lip speech actions. Journal of Cognitive Neuroscience, 29(3), 448–466. 10.1162/jocn_a_01057 [DOI] [PubMed] [Google Scholar]
- Turkeltaub, P. E. , Eickhoff, S. B. , Laird, A. R. , Fox, M. , Wiener, M. , & Fox, P. (2012). Minimizing within‐experiment and within‐group effects in activation likelihood estimation meta‐analyses. Human Brain Mapping, 33, 1–13. 10.1002/hbm.21186 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ujiie, Y. , Yamashita, W. , Fujisaki, W. , Kanazawa, S. , & Yamaguchi, M. K. (2018). Crossmodal association of auditory and visual material properties in infants. Scientific Reports, 8, 9301. 10.1038/s41598-018-27153-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- van Atteveldt, N. , Formisano, E. , Goebel, R. , & Blomert, L. (2004). Integration of letters and speech sounds in the human brain. Neuron, 43, 271–282. 10.1016/j.neuron.2004.06.025 [DOI] [PubMed] [Google Scholar]
- van Atteveldt, N. M. , Blau, V. C. , Blomert, L. , & Goebel, R. (2010). fMR‐adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex. BMC Neuroscience, 11, 11. 10.1186/1471-2202-11-11 [DOI] [PMC free article] [PubMed] [Google Scholar]
- van Atteveldt, N. M. , Formisano, E. , Blomert, L. , & Goebel, R. (2006). The effect of temporal asynchrony on the multisensory integration of letters and speech sounds. Cerebral Cortex, 17, 962–974. 10.1093/cercor/bhl007 [DOI] [PubMed] [Google Scholar]
- van Atteveldt, N. M. , Formisano, E. , Goebel, R. , & Blomert, L. (2007). Top–down task effects overrule automatic multisensory responses to letter–sound pairs in auditory association cortex. NeuroImage, 36, 1345–1360. 10.1016/j.neuroimage.2007.03.065 [DOI] [PubMed] [Google Scholar]
- van der Linden, M. , van Turennout, M. , & Fernandez, G. (2011). Category training induces cross ‐ modal object representations in the adult human brain. Journal of Cognitive Neuroscience, 23(6), 1315–1331. 10.1162/jocn.2010.21522 [DOI] [PubMed] [Google Scholar]
- Venezia, J. H. , Vaden, K. I., Jr. , Rong, F. , Maddox, D. , Saberi, K. , & Hickok, G. (2017). Auditory, visual and audiovisual speech processing streams in superior temporal sulcus. Frontiers in Human Neuroscience, 11. 10.3389/fnhum.2017.00174 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wagner, A. D. , Paré‐Blagoev, E. J. , Clark, J. , & Poldrack, R. A. (2001). Recovering meaning. Neuron, 31, 329–338. 10.1016/S0896-6273(01)00359-2 [DOI] [PubMed] [Google Scholar]
- Werner, S. , & Noppeney, U. (2011). The contributions of transient and sustained response codes to audiovisual integration. Cerebral Cortex, 21, 920–931. 10.1093/cercor/bhq161 [DOI] [PubMed] [Google Scholar]
- Wiersinga‐ Post, E. , Tomaskovic, S. , Slabu, L. , Renken, R. , de Smit, F. , & Duifhuis, H. (2010). Decreased BOLD responses in audiovisual processing. NeuroReport, 21(18), 1146–1151. 10.1097/WNR.0b013e328340cc47 [DOI] [PubMed] [Google Scholar]
- Ye, Z. , Rüsseler, J. , Gerth, I. , & Münte, T. F. (2017). Audiovisual speech integration in the superior temporal region is dysfunctional in dyslexia. Neuroscience, 356, 1–10. 10.1016/j.neuroscience.2017.05.017 [DOI] [PubMed] [Google Scholar]
- Yi, H. G. , Leonard, M. K. , & Chang, E. F. (2019). The encoding of speech sounds in the superior temporal gyrus. Neuron, 102, 1096–1110. 10.1016/j.neuron.2019.04.023 [DOI] [PMC free article] [PubMed] [Google Scholar]