Abstract
Letter–sound integration is a relatively new cultural skill, yet it is readily acquired even though the brain has not had time to evolve dedicated machinery for it. Leading theories of how the brain accommodates literacy acquisition include the neural recycling hypothesis and the assimilation–accommodation hypothesis. The neural recycling hypothesis proposes that a new cultural skill is developed by “invading” preexisting neural structures that support a similar cognitive function, while the assimilation–accommodation hypothesis holds that a new cognitive skill relies on direct invocation of preexisting systems (assimilation) and adds brain areas based on task requirements (accommodation). Both theories agree that letter–sound integration may be achieved by reusing pre‐existing, functionally similar neural bases, but they differ in how they propose this occurs. We examined the evidence for each hypothesis by systematically comparing the similarities and differences between letter–sound integration and two other types of preexisting and functionally similar audiovisual (AV) processes, namely object–sound and speech–sound integration, using an activation likelihood estimation (ALE) meta‐analysis. All three types of AV integration recruited the left posterior superior temporal gyrus (STG), while speech–sound integration additionally activated the bilateral middle STG, and letter–sound integration directly invoked the AV areas involved in speech–sound integration. These findings suggest that letter–sound integration may reuse the STG regions serving speech–sound and object–sound integration through an assimilation–accommodation mechanism.
Keywords: assimilation–accommodation, audiovisual integration, neural reuse, object–sound, speech–sound
Object–sound, speech–sound, and letter–sound audiovisual integration all recruited the left posterior superior temporal gyrus (assimilation). Speech–sound integration additionally activated the bilateral middle superior temporal gyrus (accommodation), and letter–sound integration directly invoked the audiovisual areas involved in speech–sound integration (assimilation). Assimilation–accommodation might therefore be the neural reuse mechanism underlying the acquisition of letter–sound integration.
Practitioner Points.
There is a double dissociation between validating and conflicting integration, with the superior temporal gyrus (STG) involved in validating integration and the insula/middle frontal gyrus involved in conflicting integration.
All three types of AV integration recruited the left posterior STG, speech–sound integration additionally activated the bilateral middle STG, and letter–sound integration shared the same AV areas as speech–sound integration.
Letter–sound integration may rely on a neural reuse mechanism of assimilation and accommodation of pre‐existing audiovisual integration.
1. INTRODUCTION
Hearing and seeing are two key perceptual channels that carry rich information about the world, and their integration provides an obvious survival benefit, helping organisms to identify objects and events more quickly and accurately and to avoid dangerous environments. As a relatively new cultural skill, reading has emerged and developed so recently that our genes have not yet been able to express an inherently literate brain. However, as a critical step in reading acquisition (Brem et al., 2010; Ehri, 2005; Preston et al., 2016), letter–sound integration, the formation of connections between visual symbols and their corresponding speech sounds, is acquired effortlessly and rapidly. How then do we successfully achieve letter–sound integration without a neural foundation established from birth?
“Neural reuse” has been proposed as a principle underlying such brain exaptation (Anderson, 2010, 2021), in which a structure comes to serve a function different from the one it originally evolved for. During both development and evolution, neural circuits built for a specific purpose can retain their original functions while being put to new uses, such as reading.
There are two competing theories that attempt to explain how neural reuse occurs in literacy acquisition.
The first is the neural recycling hypothesis (Dehaene & Cohen, 2007), which proposes that a new cultural skill is developed by “invading” a preexisting neural niche within a structure that supports a similar cognitive function. Learning repurposes part of the originally less specialized neural resources for the new use, with limited anatomical reorganization. This hypothesis has been supported by the development of the visual word form area (VWFA) during reading acquisition. The occipitotemporal cortex naturally comprises a mosaic of specialized subareas involved in processing different visual categories such as buildings, faces, and objects (Dehaene et al., 2010; Hasson et al., 2003). With literacy instruction, a new subarea sensitive to written words (VWFA) emerges within this mosaic between the face and object subareas, in a region that was initially only weakly responsive to any type of visual stimulus (Dehaene‐Lambertz et al., 2018). Like visual word form processing, letter–sound integration is a key process in reading acquisition, and as proficiency increases, most individuals come to integrate letters and sounds automatically or even unconsciously. It is therefore conceivable that a specialized area for letter–sound integration develops, and we expect that the neural recycling mechanism may extend from visual word form processing to letter–sound integration.
The second theory of neural reuse is the assimilation–accommodation hypothesis, derived from Jean Piaget's theory of cognitive development, which offers a promising way of adapting to endlessly emerging cultural skills without rebuilding the brain (Piaget & Mays, 1972). Assimilation means responding to new cognitive requirements by directly accessing a similar existing brain network. If the new requirements exceed what the current network can do, accommodation modifies that network by adding supplementary areas and removing unnecessary ones. This hypothesis has been well supported in second language learning (Perfetti et al., 2007). Some studies support assimilation by finding overlapping brain activation between first (L1) and second language (L2) processing (Cao et al., 2013; Nichols et al., 2021; Tan et al., 2003); others support accommodation by showing that L2 requires additional brain regions not involved in L1 (Liu et al., 2007; Nelson et al., 2009; Nichols & Joanisse, 2016). On this account, letter–sound integration could directly adopt other, similar neural circuits for its own use.
We therefore asked which of these two hypotheses better explains the reuse mechanism of letter–sound integration. Because both hypotheses share the premise that a functionally similar precursor is reused, the first step is to identify functionally similar candidates for letter–sound integration. Two types of audiovisual (AV) processing are of interest here because they are both inherent functions of the brain and develop earlier than letter–sound integration.
One is object–sound integration, which refers to the process of recognizing an object by seeing its visual image while hearing the natural sound it makes. Studies have demonstrated that 4‐ to 8‐month‐old infants can integrate wood or metal objects with their corresponding sound information (Ujiie et al., 2018) and that 6‐month‐old infants have the ability to fuse asynchronous hand movements and clapping sounds (Kopp, 2014). In the absence of schooling, the area of the occipitotemporal cortex that could have developed into the VWFA was found to be gradually invaded by nearby object representation cortex (Dehaene‐Lambertz et al., 2018). Furthermore, almost all modern alphabets are derived from Egyptian hieroglyphs, whose ideographs are made up of simplified symbols for objects (Murray & Murray, 2017). The close connection between writing and objects in their neural basis and origin raises the possibility of reusing object–sound integration for letter–sound integration.
The other type of AV processing of interest is speech–sound integration, which refers to the process of speech perception by hearing speech sounds and seeing the articulatory movements of the speaker's lips. Mercure et al. (2022) found that 7‐ to 10‐month‐old infants are already able to bind visual and auditory speech. Given that a high degree of overlap in brain activation exists between speech and reading in the frontal, parietal, and temporal lobes (Price, 2012; Rueckl et al., 2015), reading ability is likely acquired by linking grapheme regions to the existing spoken language networks (Bouhali et al., 2014; Chyl et al., 2018; Dehaene et al., 2015). Thus, letter–sound and speech–sound integration are likely to share certain neural circuits.
Neuroimaging studies of these three types of AV integration have provided favorable evidence for the possibility of reuse.
Beauchamp et al. (2004) investigated the neural basis of object–sound integration using pictures and sounds of animals and tools as stimuli, finding that the posterior superior temporal sulcus (pSTS) was activated more strongly when the auditory and visual channels were presented together than when either was presented alone, and that this activation was associated with performance on the object identification task. Similar results have been found for simple physical stimuli such as tones and gratings (Starke et al., 2020; Werner & Noppeney, 2011). Moreover, the superior temporal sulcus and superior temporal gyrus (STS/STG) have also been identified as a site of AV integration in studies manipulating AV congruency, synchronization, or signal‐to‐noise ratio (SNR) (Hein et al., 2007; Marchant et al., 2012; Stevenson & James, 2009). In addition to the STG, the frontal lobe, insula, and claustrum are sometimes involved in object–sound integration (Naghavi et al., 2007), but the involvement of these regions is less consistent, varying with the experimental paradigm, stimulus characteristics, and analysis methods.
The STS/STG has also been identified in speech–sound integration, regardless of whether the stimuli are at the sub‐lexical level (phonemes, vowel‐consonant combinations) or the lexical level (words, sentences), and regardless of whether the experiment manipulates AV congruency, synchrony, or perceptual measures (e.g., the McGurk effect) (Aparicio et al., 2017; Calvert et al., 2000; Francisco et al., 2018; Miller, 2005; Nath & Beauchamp, 2011; Pekkola et al., 2006; Ye et al., 2017). In line with these studies, a meta‐analysis revealed consistent brain activation in the bilateral pSTG (Erickson et al., 2014). Further, the STG was found to be involved in the congruent AV speech condition, whereas more dorsolateral regions such as the inferior frontal gyri were involved in the incongruent AV speech condition (Erickson et al., 2014).
Raij et al. (2000) investigated the mechanism of letter–sound integration using magnetoencephalography (MEG) and demonstrated that bilateral STS/STG activation was stronger in a matched letter–sound condition than in non‐matched and control conditions. Similarly, fMRI studies in adults and children, as well as individuals with dyslexia, have replicated and reinforced the critical role played by STS/STG in learning grapheme–phoneme connections (Blau et al., 2009, 2010; van Atteveldt et al., 2004).
Taken together, the STS/STG, traditionally considered classical auditory processing regions (Buchsbaum et al., 2001; Howard et al., 2000; Schönwiesner et al., 2007; Yi et al., 2019), are involved in all three types of AV integration. However, the specific locations sensitive to the different types may not be identical. The only within‐subject study to examine object–sound and speech–sound integration found that object–sound integration was located posterior to speech–sound integration along the STS/STG (Stevenson & James, 2009). This highlights the need for direct comparisons of object–sound, speech–sound, and letter–sound integration.
We postulate that letter–sound integration is achieved by reusing the pre‐existing AV integration functions of the STS/STG. Specifically, (1) according to the neural recycling hypothesis, letter–sound integration would produce a new sensitive subarea in the STG/STS that is spatially separate from those for object–sound and speech–sound integration; (2) according to the assimilation–accommodation hypothesis, letter–sound integration may directly recruit the AV areas for object–sound or speech–sound integration (assimilation), and if the STG/STS is not fully capable, other regions would be co‐activated to support the process (accommodation). In the latter case, whether assimilation acts alone or together with accommodation, a full or partial overlap between the letter–sound areas and the object–sound or speech–sound AV integration areas would be expected.
To test these predictions, we conducted an activation likelihood estimation (ALE) meta‐analysis of object–sound, speech–sound, and letter–sound AV integration studies and compared the similarities and differences among the three types of AV integration. As the results are likely to be more stable in the mature brain, and the number of studies of children is too small to perform meta‐analysis, we have focused on adult native speakers of alphabetic languages.
2. MATERIALS AND METHODS
2.1. Literature search and study selection
We searched “Web of Science” and “ProQuest” for studies on object–sound, speech–sound, and letter–sound AV integration published between January 1900 and October 2022, searching within titles, abstracts, and keywords. To reduce the omission of relevant studies, we adopted a “loose‐in” strategy using broad search terms. Each combination of search terms had three key elements: (1) AV integration, or other synonyms or substitutes used in the literature, which may be common to all types of integration or unique to a particular one; (2) the category of integration (object, speech, reading); and (3) neuroimaging techniques (fMRI, PET) with high spatial resolution that can provide coordinates. For example, “audiovisual, object, fMRI” forms one combination, with each term corresponding to one element (see Table S1 for the full list of combinations). We also tracked research groups with a long‐standing interest in AV integration and reviewed the references of identified studies for additional publications.
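For illustration, the full set of search queries can be generated programmatically from the three elements; the sketch below uses abbreviated, hypothetical term lists rather than the complete lists given in Table S1.

```python
from itertools import product

# Abbreviated, illustrative term lists; the complete lists are given in Table S1.
integration_terms = ["audiovisual", "multisensory integration", "crossmodal"]
category_terms = ["object", "speech", "letter"]
technique_terms = ["fMRI", "PET"]

# Each query combines one term from each of the three key elements.
queries = [" AND ".join(combo) for combo in
           product(integration_terms, category_terms, technique_terms)]

print(len(queries))  # 3 x 3 x 2 = 18 candidate queries
print(queries[0])    # "audiovisual AND object AND fMRI"
```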
After removing duplicate studies, two rounds of screening were performed to obtain eligible studies (Figure 1). The purpose of the first screening round was to eliminate irrelevant articles due to the loose‐in search strategy by reviewing titles and abstracts. Studies were excluded if they were (1) not related to any type of AV integration (object–sound, speech–sound, or letter–sound) or confounded with other cognitive processes such as attention, emotion, memory or learning; (2) reviews, meta‐analyses, case studies or research on statistical methods; (3) not using fMRI or PET; and (4) not available in full text or not formally published, such as papers in preparation and conference abstracts.
FIGURE 1.
Flowchart of the selection process. The lowercase n indicates the number of studies, which is the number of included papers. The capital N indicates the number of experiments, which is the number of different subject groups.
Once the candidate literature had been significantly narrowed down, full‐text articles were assessed for eligibility in a second screening round. Included studies had to (1) be restricted to alphabetic languages, as the neural basis of integration in ideographic languages is different from that in alphabetic languages (Shinozaki et al., 2016); and (2) include at least one group of healthy adults, excluding children, older adults and clinical populations. More restrictively, only healthy native speakers were included, as the brain mechanisms of speech–sound integration are different for first‐ and second‐language speakers (Barros‐Loscertales et al., 2013). Studies also had to (3) present stimuli in both visual and auditory modalities and use eligible stimulus types. Specifically, object visual stimuli included tools, animals, musical instruments, body parts, natural scenes, simple physical features such as lines, dots, flashes, which can be presented as photographs, pictures, or line drawings. Auditory stimuli were sounds emitted by the corresponding objects. Speech visual stimuli were vocalized lips or faces and auditory stimuli included spoken letters, strings, words, numbers, and sentences. Letter visual stimuli were scripts and auditory stimuli included spoken letters, strings, words, and numbers. Studies had to (4) perform whole‐brain analysis, not just region of interest (ROI) analysis; and (5) perform general linear modeling (GLM). It should be noted that we also retrieved a MEG study (Raij et al., 2000) because the electrodes covered almost the whole brain and it is generally cited as a classic study of letter–sound integration.
2.2. Coordinate retrieval
To ensure reliable and unbiased results, we included studies that met at least one of the following generally accepted criteria (Table 1). If an experiment met multiple criteria, all coordinates obtained according to the eligible contrasts were qualified.
Criterion 1: Multisensory area. The AV integration area should be activated in both the visual‐only and the auditory‐only condition [(A > B) ∩ (V > B), where A stands for the auditory‐only condition, V for the visual‐only condition, and B for a baseline such as fixation, a scrambled picture, or a still face].
Criterion 2: Interaction effect. The activation of the AV condition is not equivalent to the simple addition of the activities in the visual‐only and auditory‐only conditions [AV ≠ (A + V)]. This criterion can be tested by one of the following contrasts (a numeric sketch of these contrasts is given after Criterion 10): (1) super‐additivity criterion: AV activation is greater than the sum of A and V [AV > (A + V)]; (2) sub‐additivity criterion: AV activation is weaker than the sum of A and V [AV < (A + V)]; (3) max criterion: AV activation is greater than the maximum value of A and V [AV > max (A, V)]; (4) mean criterion: AV activation is greater than the mean value of A and V [AV > mean (A, V)].
Criterion 3: Conceptual congruence. Visual and auditory stimuli are more likely to be integrated when they are semantically or conceptually congruent [AVconC ≠ AVconI, con stands for concept, C stands for congruence, I stands for incongruence].
Criterion 4: Temporal congruence. Visual and auditory stimuli are more likely to be integrated when they are presented simultaneously [AVtemC ≠ AVtemI, tem stands for temporal].
Criterion 5: Spatial congruence. Visual and auditory stimuli are more likely to be integrated when they have the same spatial location [AVspaC ≠ AVspaI, spa stands for spatial].
Criterion 6: Inverse effect. The lower the signal‐to‐noise ratio (SNR) of auditory or visual stimuli, the greater the benefit of AV stimuli [Benefit = (AV – [A + V])/(A + V), Benefit of AV low‐SNR > Benefit of AV high‐SNR].
Criterion 7: Adaptation effect. For two AV stimuli presented back‐to‐back, brain activation decreases when the second stimulus is identical to the first and increases when the second stimulus is different from the first [AV different > AV identical].
Criterion 8: Perception effect. Subjects report their subjective perception of whether the visual and auditory information is integrated into a whole [AV fuse ≠ AV non‐fuse]. In general, congruence of information is a prerequisite for fusion and the more congruent it is, the easier it is to fuse. One exception is the McGurk effect, in which participants are able to integrate the different auditory and visual information into a completely new concept, such as hearing /ba/, seeing /ga/, perceiving it as /da/ [AV McGurk ≠ AV non‐McGurk; AVC ≠ AV McGurk].
Criterion 9: Type specific effect. This represents the unique integration area of a particular type of AV integration compared with others [AV type1 > AV type2, type stands for one of the three types of AV integration in our study].
Criterion 10: Brain–behavior correlation. The brain areas associated with integrated AV processing should be correlated with the corresponding behavioral indicators [positive or negative correlations].
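To make the contrast logic concrete, the sketch below checks the interaction‐effect contrasts (Criterion 2) and the inverse effect (Criterion 6) for a single voxel, using hypothetical beta estimates; it illustrates the formulas above rather than any included study's analysis.

```python
import numpy as np

# Hypothetical beta estimates (arbitrary units) for one voxel.
A, V, AV = 0.8, 0.6, 1.9   # auditory-only, visual-only, audiovisual conditions

super_additive = AV > (A + V)          # Criterion 2.1: AV > (A + V)
sub_additive   = AV < (A + V)          # Criterion 2.2: AV < (A + V)
max_criterion  = AV > max(A, V)        # Criterion 2.3: AV > max (A, V)
mean_criterion = AV > np.mean([A, V])  # Criterion 2.4: AV > mean (A, V)

def av_benefit(av, a, v):
    """Criterion 6 benefit: (AV - [A + V]) / (A + V)."""
    return (av - (a + v)) / (a + v)

# Inverse effect: the multisensory benefit should be larger for low-SNR stimuli.
inverse_effect = av_benefit(1.2, 0.3, 0.4) > av_benefit(1.9, 0.8, 0.6)

print(super_additive, sub_additive, max_criterion, mean_criterion, inverse_effect)
```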
TABLE 1.
Criteria, corresponding experimental conditions, and contrasts used to extract the coordinates.
Criteria | Experimental conditions | Validating integration contrast | Conflicting integration contrast
---|---|---|---
Multisensory area | A, V, B | (A > B) ∩ (V > B) |
Interaction effect | A, V, AV | AV > (A + V); AV < (A + V); AV > max (A, V); AV > mean (A, V) |
Conceptual congruence | Conceptually congruent and incongruent AV | AVconC > AVconI | AVconC < AVconI
Temporal congruence | Temporally congruent and incongruent AV | AVtemC > AVtemI | AVtemC < AVtemI
Spatial congruence | Spatially congruent and incongruent AV | AVspaC > AVspaI | AVspaC < AVspaI
Inverse effect | AV with different SNR | Benefit of AV low‐SNR > Benefit of AV high‐SNR |
Adaptation effect | Two identical or different AV pairs presented one after the other | AV different > AV identical |
Perceptual effect | Fused and unfused AV | AV fuse > AV non‐fuse; AV McGurk > AV non‐McGurk; AVC > AV McGurk | AV fuse < AV non‐fuse; AV McGurk < AV non‐McGurk; AVC < AV McGurk
Type specific effect | Two types of AV integration | AV type1 > AV type2 |
Brain–behavior correlation | Brain and behavioral indicators for AV | Positive correlation | Negative correlation
Abbreviations: A, auditory stimuli; AV, audiovisual stimuli; B, baseline; C, congruence; con, conceptual; I, incongruence; SNR, signal‐noise ratio; spa, spatial; tem, temporal; V, visual stimuli.
Additionally, inspired by a meta‐analysis of speech–sound integration (Erickson et al., 2014), which found that validating AV speech recruits ventral stream areas and conflicting AV speech recruits dorsal stream areas, we divided all eligible contrasts into two classes. Validating contrasts are those in which (1) the visual and auditory information are conceptually, temporally, or spatially congruent; (2) subjects report that the information from the different sensory channels is fused into a whole; or (3) the researchers specify that the detected area contributes to multisensory integration. Conflicting contrasts are the opposite of validating contrasts (Table 1).
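As a minimal sketch of this coding step, the function below labels a contrast string as validating or conflicting following the rules just described; the marker strings and the fallback rule are simplified illustrations of Table 1, not the coding script actually used.

```python
def classify_contrast(contrast: str) -> str:
    """Label a contrast as 'validating' or 'conflicting' integration (simplified)."""
    conflicting_markers = [
        "AVconC < AVconI", "AVtemC < AVtemI", "AVspaC < AVspaI",
        "AV fuse < AV non-fuse", "negative correlation",
    ]
    if any(marker in contrast for marker in conflicting_markers):
        return "conflicting"
    # Congruent, fused, multisensory-area, interaction, adaptation, and
    # positively correlated contrasts are treated as validating integration.
    return "validating"

print(classify_contrast("AVconC > AVconI"))  # validating
print(classify_contrast("AVtemC < AVtemI"))  # conflicting
```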
All of the above steps for screening studies and extracting coordinates were first carried out independently by three graduate students. Controversial studies were then discussed together to decide on their inclusion. The final study list included 33 object–sound experiments with 248 foci in 508 individuals, 33 speech–sound experiments with 371 foci in 458 individuals, and 10 letter–sound experiments with 56 foci in 140 individuals (Table 2).
TABLE 2.
Studies included in the meta‐analysis.
Studies | Number of subjects (male/female) | Mean age (SD/range) | Handedness (right/left) | Language | Stimuli | Task | Baseline | Criteria | Validating contrast | Validating source | Validating foci | Conflicting contrast | Conflicting source | Conflicting foci
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Object–sound integration | ||||||||||||||
Baumann et al. (2018) | 34 (18/16) | 21 (18–27) | ~ | Gabor grating, tone | Judgment | Time congruency | AVtemC < AVtemI | Table 4 | 3 | |||||
Beauchamp et al. (2004) | 26 (~/~) | ~ (~) | ~ | Animal, tool | Judgment | Fixation | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 1 | ||||
Olivetti Belardinelli et al. (2004) | 13 (13/0) | 22.8 (~) | 13/0 | Animal, tool, human | Identification | Interaction | AV > max (A, V) | Table 1 | 12 | AVI > max (A, V) | Table 2 | 5 | ||
Bushara et al. (2001) | 12 (7/5) | ~ (27–56) | 12/0 | Circle, tone | Judgment | Time congruency | AVtemC < AVtemI | Table 1 | 9 | |||||
Bushara et al. (2003) | 7 (3/4) | ~ (~) | 7/0 | Bar, collision sound | Judgment | Perception | AV fused > AV non‐fused | Table 2 | 8 | AV fused < AV non‐fused | Table 2 | 6 | ||
10 (5/5) | 24.0 (2.2) | Artificial object | Passive perception | Interaction | AV > (A + V) | Table 1 | 2 | |||||||
Butler et al. (2011) | 10 (5/5) | 24.7 (4.1) | 10/0 | Artificial object | Passive perception | Interaction | AV < (A + V) | Table 1 | 1 | |||||
Butler and James (2013) | 15 (6/9) | 23 (3) | 10/0 | Artificial object | Passive perception | Fixation | Multisensory area | (A > B) ∩ (V > B) | Table 1e | 2 | ||||
Hein et al. (2007) | 18 (11/7) | 29.8 (23–41) | 17/1 | Animal | Passive perception | Fixation | Multisensory area and Interaction | (A > B) ∩ (V > B) ∩ [AVC > max (A, V)] | Figure 2 | 3 | (A > B) ∩ (V > B) ∩ [AVI > max (A, V)] | Figure 2 | 2 | |
Hocking and Price (2009) | 17 (6/12) | 26 (20–36) | 17/0 | Object | Judgment | Type specific effect | AV object–sound > AV letter–sound | Table 3 | 1 | |||||
James et al. (2011) | 12 (6/6) | 21.7 (~) | 12/0 | Tool | One‐back | Scrambled A or V | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 6 | ||||
James et al. (2011b) | 12 (6/6) | 21.7 (~) | 12/0 | Tool | One‐back | Scrambled A or V | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 6 | ||||
Kassuba et al. (2011) | 19 (9/10) | 25.37 (21–33) | 19/0 | Object | Judgment | Texture | Multisensory area‐and type‐specific effect | (A > B) ∩ (V > B) ∩ (A > rest > V > rest) | Table 1 | 1 | ||||
Laing et al. (2015) | 9 (7/2) | 24 (21–26) | 9/0 | Cuboid, tone | Judgment | Congruency effect | AVconC > AVconI | Table 4 | 4 | |||||
Laurienti et al. (2003) | 12 (9/7) | 32 (~) | 15/1 | Object | Judgment | Congruent effect | AVconC > AVconI | Table 1 | 1 | AVconC < AVconI | Table 1 | 2 | ||
Time congruency | AVtemC > AVtemI | Table 1 | 14 | |||||||||||
Lewis and Noppeney (2010) | 16 (11/5) | ~ (18–31) | 14/2 | Dot, click | Judgment | Brain–behavior correlation | Positive correlation | Table 1 | 1 | |||||
van der Linden et al. (2011) | 16 (5/11) | 21.6 (18–26) | ~ | Bird | Passive perception | Congruent effect | AVconC > AVconI | Table 2 | 34 | AVconC < AVconI | Table 2 | 3 | ||
Love et al. (2018) | 20 (10/10) | 24 (20–32) | 20/0 | Drummer beat | Judgment | Time congruency | AVtemC > AVtemI | Figure 5 | 2 | AVtemC < AVtemI | Figure 5 | 4 | ||
Man et al. (2015) | 18 (10/8) | ~ (~) | 18/0 | Object | Passive perception | Rest | Multisensory area | (A > B) ∩ (V > B) | Figure 2 | 1 | ||||
Marchant et al. (2012) | 16 (7/9) | 24.7 (~) | 16/0 | Checkerboard, tone | Detection | Time congruency | AVtemC > AVtemI | Table 2 | 9 | |||||
One‐back | Time congruency | AVtemC > AVtemI | Table 2a | 2 | AVtemC < AVtemI | Table 2b | 3 | |||||||
McCormick et al. (2018) | 18 (9/9) | 24.75 (~) | 18/0 | Circle, tone | Oddball | Congruent effect | AVconC > AVconI | Table 4a | 8 | |||||
Naghavi et al. (2007) | 23 (12/11) | 24 (19–30) | 23/0 | Animal, tool | Passive perception | Congruent effect | AVconC > AVconI | Figure 1 | 3 | |||||
Noesselt et al. (2007) | 24 (14/10) | 24 (~) | ~ | Optic fibers, tone | Detection | Rest | Multisensory area and time congruency | (A > B) ∩ (V > B) ∩ (AVtemC > AVtemI) | Table 2 | 2 | ||||
Rest | Multisensory area and inverse effect | (A > B) ∩ (V > B) ∩ (Benefit of AV low‐SNR > Benefit of AV high‐SNR) | Table 1 | 6 | ||||||||||
Noesselt et al. (2010) | 12 (6/6) | ~ (21–29) | ~ | Grating, tone | Detection | Brain–behavior correlation | Positive correlation | Table 3 | 4 | |||||
Noppeney et al. (2010) | 19 (7/12) | 22.1 (19–26) | 18/1 | Tool, instrument | Judgment | Congruent effect | AVconC < AVconI | Table 2 | 3 | |||||
Space congruency | AVspaC > AVspaI | Results | 1 | |||||||||||
Plank et al. (2012) | 15 (4/11) | ~ (20–34) | 13/2 | Animal, tool, vehicle, instrument | Judgment | Brain–behavior correlation | Positive correlation | Figure 7 | 1 | |||||
Porada et al. (2021) | 16 (9/7) | 26.9 (3.2) | 16/0 | Object | Judgment | Interaction | AV > (A + V) | Table A.1 | 13 | |||||
Starke et al. (2020) | 20 (13/7) | 26.2 (21–33) | ~ | Checkerboard, tone | Detection | Interaction | AV > (A + V) | Table 3 | 10 | |||||
Identification | Congruent effect | AVconC > AVconI | Table 3 | 1 | ||||||||||
Sestieri et al. (2006) | 10 (10/0) | 26.2 (20–34) | 10/0 | Animal, weapon, instrument, vehicle | Judgment | Space congruency | AVspaC > AVspaI | Table 4 | 2 | |||||
Study1 11 (5/6) | 25.9 (~) | 11/0 | Tool | Identification | A or V noise | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 2 | |||||
Stevenson and James (2009) | Study2 11 (5/6) | 24.4 (~) | 11/0 | Tool | Identification | A or V noise | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 2 | ||||
Stevenson et al. (2011) | Study1 10 (5/6) | 26.5 (~) | 11/0 | Tool | Identification | Inverse effect | Benefit of AV low‐SNR > Benefit of AV high‐SNR | Table 1 | 9 | Benefit of AV low‐SNR < Benefit of AV high‐SNR | Table 1 | 6 | ||
AV > (A + V) | Table 1 | 3 | ||||||||||||
Categorization | Interaction | AV < (A + V) | Table 1 | 8 | ||||||||||
Inverse effect | Benefit of AV low‐SNR > Benefit of AV high‐SNR | Table 1 | 7 | |||||||||||
Werner and Noppeney (2011) | 20 (10/10) | 25.8 (~) | 20/0 | Tool, instrument | Detection | Brain–behavior correlation | Positive correlation | Table 1 | 6 | |||||
Werner and Noppeney (2011) | 17 (9/8) | 26.5 (~) | 17/0 | Dot, tone | Detection | Fixation | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV > (A + V)] | Table 1 | 3 | ||||
Speech–sound integration | ||||||||||||||
Aparicio et al. (2017) | 15 (6/9) | 25.2 (20–37) | 15/0 | French | Word | Detection | Still face | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV > (A + V)] | Table 5 | 3 | |||
Barros‐Loscertales et al. (2013) | 42 (23/19) | ~ (20–46) | ~ | English, Spanish | Sentence | Passive perception | Rest | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ AVC > max (A, V) | Table 3 | 3 | |||
Perceptual effect | AVC < AV McGurk | Table 2 | 14 | |||||||||||
Congruent effect | AVconC < AVconI | Table 2 | 14 | |||||||||||
Benoit et al. (2009) | 15 (5/11) | 29.6 (19–47) | 15/0 | English | McGurk (CV) | Judgment | Perception | AV McGurk < AV non‐McGurk | Table 2 | 12 | ||||
Callan et al. (2003) | 6 (6/0) | ~ (20–45) | 6/0 | English | Word | Passive perception | Still face | Inverse effect | (AV low‐SNR > B) ∩ (Benefit of AV low‐SNR > Benefit of AV high‐SNR) | Table 1 | 12 | |||
Calvert et al. (2000) | 5 (3/2) | 35 (24–49) | 5/0 | English | Number | Rehearsing | Interaction | AV > max (A, V) | Table 1 | 6 | ||||
Calvert et al. (2000) | 10 (5/5) | 30.1 (22–45) | 10/0 | English | Sentence | Passive perception | Rest | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AVC > (A + V)] | Table 1 | 9 | (A > B) ∩ (V > B) ∩ [AVI < (A + V)] | Table2 | 6 |
Erickson et al. (2014) | 10 (4/6) | 25.72 (3.01) | 10/0 | English | McGurk (CV) | Count | Interaction | AV > max (A, V) | Table 1 | 6 | ||||
Fairhall and Macaluso (2009) | 12 (6/7) | 26.6 (~/~) | 12/0 | Italian | Story | Passive perception | Congruent effect | AVconC > AVconI | Table 1 | 6 | ||||
Francisco et al. (2018) | 20 (8/12) | 25.75 (4.06) | ~ | Dutch | Word, CV | One‐back | Interaction | AV > (A + V) | Table B3 | 6 | ||||
Congruent effect | AVconC < AVconI | Table 2 | 21 | |||||||||||
Gau and Noppeney (2016) | 16 (10/6) | 30.1 (22–45) | 16/0 | German | McGurk (CV) | Identification | Perception | AV McGurk < AV non‐McGurk | Table 2 | 1 | ||||
Lee and Noppeney (2011a) | 30 (15/16) | 24.7 (~) | 30/0 | German | Sentence | Detection | Interaction | AV < (A + V) | Table 3 | 6 | ||||
Lee and Noppeney (2011b) | 19 (~/~) | 27.1 (3.4) | ~ | German | Sentence | Passive perception | Time congruency | AVtemC < AVtemI | Table S3 | 6 | ||||
Lüttke et al. (2016) | 23 (3/20) | ~ (19–30) | 23/0 | Dutch | McGurk (CVC) | Identification | Perceptual effect | AVC > AV McGurk | Table 1 | 4 | ||||
Macaluso et al. (2004) | 8 (8/0) | 36 (3) | 8/0 | English | Word | Detection | Time concurrency | AVtemC > AVtemI | Table 1a | 4 | ||||
Matchin et al. (2014) | 20 (8/12) | ~ (20–30) | 20/0 | English | McGurk (CV) | Identification | Perceptual effect | AVC < AV McGurk | Table 3 | 11 | ||||
Miller (2005) | 11 (5/6) | ~ (18–33) | 11/0 | English | VCV | Judgment | Perception | AV fused > AV non‐fused | Table 1 | 2 | AV fused < AV non‐fused | Table 1 | 15 | |
Multisensory area and perception | (A > B) ∩ (V > B) ∩ (AV fused > AV non‐fused) | Table 1 | 4 | (A > B) ∩ (V > B) ∩ (AV fused < AV non‐fused) | Table 1 | 7 | ||||||||
Noesselt et al. (2012) | 14 (7/7) | ~ (~) | ~ | German | Sentence | Judgment | Fixation | Multisensory area and time congruency | (A > B) ∩ (V > B) ∩ (AVtemC > AVtemI) | Table 2 | 3 | (A > B) ∩ (V > B) ∩ (AVtemC < AVtemI) | Table 2 | 15 |
Ojanen et al. (2005) | 10 (5/5) | 26 (22–31) | 10/0 | Finnish | Letter | Detection | Congruent effect | AVconC < AVconI | Table 1 | 4 | | | |
Pekkola et al. (2006) | 10 (6/4) | 27.0 (22–34) | 10/0 | Finnish | Letter | Detection | Congruent effect | AVconC < AVconI | Table 3 | 2 | | | |
Stevenson and James (2009) | Study2 11 (5/6) | 24.4 (~) | 11/0 | English | Word | Judgment | A or V noise | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 2 | |||
A or V noise | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 2 | ||||||||||
Stevenson et al. (2011) | 8 (4/4) | 24.1 (~) | 8/0 | English | Word | Judgment | Time concurrency | (AVtemC > AVtemI) | Table 1 | 2 | ||||
Fixation | Multisensory area | (A > B) ∩ (V > B) | Table 1 | 2 | ||||||||||
Stevenson et al. (2011) | 12 (6/6) | 22.3 (2.8) | 12/0 | English | Word | Judgment | Time concurrency | (AVtemC > AVtemI) | Table 1 | 2 | ||||
Skipper et al. (2007) | 13 (~) | ~ (~) | 13/0 | English | McGurk (CV) | Passive perception | Perceptual effect | AVC > AV McGurk | Table 4 | 57 | AVC < AV McGurk | Table 3 | 30 | |
Rest | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV > max (A, V)] | Results 2.1 | 2 | ||||||||||
Szycik et al. (2008) | 12 (~/~) | 23 (21–26) | 12/0 | German | Word | Detection | Congruency effect | AVconC > AVconI | Table 2 | 1 | ||||
Szycik et al. (2009) | 11 (6/5) | 24.6 (21–29) | 11/0 | German | Word | Detection | Congruent effect | AVconC < AVconI | Table 1 | 9 | | | |
Szycik et al. (2009) | 15 (7/8) | 36.5 (9.4) | 15/0 | German | Word | Identification | Congruent effect | AVconC < AVconI | Table 1 | 15 | | | |
Szycik et al. (2012) | 7 (7/5) | ~ (21–39) | 12/0 | German | McGurk (CV) | Identification | Perception | AV McGurk > AV non‐McGurk | Table 2 | 2 | AVC < AV McGurk | Table 2 | 19 | |
Tietze et al. (2019) | 14 (15/1) | 33.75 (8.22) | 12/2 | German | Word | Judgment | Congruent effect | AVconC < AVconI | Table 2 | 6 | | | |
Treille et al. (2017) | 12 (7/7) | 26 (18–44) | 12/0 | French | CV | Passive perception | Interaction | AV > max (A, V) | Table 5 | 6 | ||||
Venezia et al. (2017) | 18 (17/3) | ~ (~) | 18/0 | English | CV | Oddball | Rotated A or silent gestured face | Multisensory area | (A > B) ∩ (V > B) | Whole‐Brain results | 2 | |||
Wiersinga‐Post et al. (2010) | 14 (9/5) | ~ (22–45) | ~ | Dutch | McGurk (CVC) | Identification | Perception | Brain‐behavior correlation | Negative correlation | Table 1 | 7 | |||
AV > mean (A, V) | Table 3 | 2 | ||||||||||||
Ye et al. (2017) | 13 (11/2) | 25.3 (6.8) | 9/1 | German | Number | Judgment | Interaction | AV > max (A, V) | Table 3 | 1 | ||||
Letter–sound integration | ||||||||||||||
Blau et al. (2008) | 19 (8/11) | 21.4 (3.5) | 19/0 | Dutch | Letter | Judgment | Congruent effect | AVconC > AVconI | Table 1 | 9 | ||||
Blau et al. (2009) | 13 (9/4) | 26.8 (5.4) | ~ | Dutch | Letter | Passive perception | Congruent effect | AVconC > AVconI | Figure 2 | 2 | ||||
Holloway et al. (2015) | 18 (9/9) | 24 (19–35) | 18/0 | English | Letter, number | Passive perception | Congruent effect | AVconC > AVconI | Figure 3 | 3 | AVconC < AVconI | Table 1 | 9 | |
Hocking and Price (2009) | 17 (6/12) | 26 (20–36) | 17/0 | English | Word | Judgment | Type specific effect | AV letter–sound > AV object–sound | Table 2 | 1 | ||||
Raij et al. (2000) | 8 (5/4) | ~ (22–32) | 8/1 | Finnish | Letter | Detection | Deformed A or V | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV < (A + V)] | Table 1 | 5 | | | |
van Atteveldt et al. (2004) | 16 (3/13) | 22 (19–27) | 16/0 | Dutch | Letter | Passive perception | Fixation | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV > max (A, V)] | Table 1 | 4 | |||
Congruent effect | AVconC > AVconI | Table 2 | 4 | |||||||||||
van Atteveldt et al. (2006) | 8 (1/7) | 23 (19–29) | 8/0 | Dutch | Letter | Passive perception | Fixation | Multisensory area and interaction | (A > B) ∩ (V > B) ∩ [AV > max (A, V)] | Table 3 | 1 | |||
Study1 12 (4/8) | 23 (20–27) | 12/0 | Dutch | Letter | Passive perception | Congruent effect | AVconC > AVconI | Table 2 | 5 | |||||
van Atteveldt et al. (2007) | Study2 13 (4/9) | 23 (18–34) | 13/0 | Dutch | Letter | Passive perception and judgment | Congruent effect | AVconC > AVconI | Table 3 | 2 | AVconC < AVconI | Table 4 | 7 | |
van Atteveldt et al. (2010) | 16 (6/10) | 22.8 (19–32) | 16/0 | Dutch | Letter | Detection | Adaptation effect | AV different > AV identical | Table 1 | 4 |
2.3. ALE statistical analysis
2.3.1. Localization of three different types of AV integration
We first localized object–sound, speech–sound, and letter–sound AV integration by conducting three separate ALE analyses in GingerALE 3.0.2 (www.brainmap.org). To build the dataset for each AV integration condition, we converted the coordinates of foci to Montreal Neurological Institute (MNI) space using the tal2mni or Brett tal2mni conversion tools provided by GingerALE. Foci were then organized into experiments by subject group rather than by contrast, to eliminate false positives due to within‐group effects (Turkeltaub et al., 2012).
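The mechanics of such a conversion can be sketched as applying the inverse of an MNI‐to‐Talairach affine to the reported peaks in homogeneous coordinates. The scale‐then‐pitch parameterization below follows the commonly described Brett approximation, but the exact parameters and the piecewise handling of coordinates below the AC plane should be taken from GingerALE's own converters; treat the values here as illustrative assumptions.

```python
import numpy as np

def mni2tal_affine(z_scale=0.92):
    """Approximate MNI-to-Talairach affine: anisotropic scaling, then a small
    pitch rotation (illustrative values in the spirit of the Brett transform)."""
    scale = np.diag([0.99, 0.97, z_scale, 1.0])
    c, s = np.cos(0.05), np.sin(0.05)
    pitch = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0,   c,   s, 0.0],
                      [0.0,  -s,   c, 0.0],
                      [0.0, 0.0, 0.0, 1.0]])
    return pitch @ scale

def tal2mni(coords):
    """Convert an (N, 3) array of Talairach peaks to MNI space."""
    homogeneous = np.c_[coords, np.ones(len(coords))]
    inverse = np.linalg.inv(mni2tal_affine())
    return (homogeneous @ inverse.T)[:, :3]

peaks_tal = np.array([[-52.0, -50.0, 8.0]])  # example Talairach peak
print(tal2mni(peaks_tal))
```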
After preparing each AV integration dataset, the ALE analysis steps were conducted as follows. First, ALE modeled Gaussian probability distributions for all foci reported in a given experiment, centered on the peak coordinates, with a width based on an empirical estimate of the spatial uncertainty due to different brain templates and sample sizes. Second, the Gaussian distributions were used to generate the modeled activation (MA) map for the given experiment, in which each value represented the activation probability at a given voxel. The first two steps were repeated for all included experiments, and the voxel‐wise union of all MA maps yielded the overall ALE map. Finally, above‐chance convergence in the ALE map was identified by a random‐effects significance test against the null hypothesis that the foci were distributed homogeneously over the brain. Significant convergence clusters were identified at an uncorrected cluster‐forming threshold of p < .001 and a family‐wise error (FWE) corrected cluster‐level threshold of p < .05 (Eickhoff et al., 2016, 2017). The minimum cluster size was 200 mm3 and the number of permutations was 1000. To avoid the meta‐analytic results being driven by a single non‐representative study (Button et al., 2013; David et al., 2013), we additionally required at least two contributing experiments per convergence cluster in all analyses (Erickson et al., 2014; Turkeltaub et al., 2012).
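The core of this procedure can be illustrated with a toy implementation on a small voxel grid: each experiment's foci are blurred with a Gaussian kernel, combined into a modeled activation (MA) map, and the MA maps are merged with the probabilistic union used by ALE; the permutation null is only sketched schematically. The grid size, kernel width, and foci below are made‐up values and do not reproduce GingerALE's actual implementation.

```python
import numpy as np

GRID = (20, 20, 20)  # toy grid; real analyses use a brain mask in MNI space

def ma_map(foci, sigma=3.0, shape=GRID):
    """Modeled activation map for one experiment: per-voxel activation
    probability, taken as the maximum over Gaussians centered on its foci."""
    zi, yi, xi = np.indices(shape)
    ma = np.zeros(shape)
    for fz, fy, fx in foci:
        d2 = (zi - fz) ** 2 + (yi - fy) ** 2 + (xi - fx) ** 2
        ma = np.maximum(ma, np.exp(-d2 / (2 * sigma ** 2)))
    return ma

def ale_map(experiments):
    """Voxel-wise probabilistic union of the experiments' MA maps."""
    ma_stack = np.stack([ma_map(foci) for foci in experiments])
    return 1.0 - np.prod(1.0 - ma_stack, axis=0)

# Two toy "experiments" with foci near the grid center (voxel coordinates).
experiments = [[(10, 10, 10), (12, 9, 11)], [(9, 11, 10)]]
ale = ale_map(experiments)

# Schematic permutation null: redistribute the same numbers of foci at random.
rng = np.random.default_rng(0)
null_max = [ale_map([[tuple(rng.integers(0, 20, 3)) for _ in foci]
                     for foci in experiments]).max()
            for _ in range(100)]
print(ale.max(), np.quantile(null_max, 0.95))  # observed peak vs. null 95th centile
```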
Although only 10 experiments were included in the letter–sound category, previous work suggests that if the effect is strong, a smaller sample size may be sufficient for a reliable meta‐analysis (Müller et al., 2018). Thus the robustness of the letter–sound integration effect was examined by performing a leave‐one‐out analysis, generating the ALE map with one experiment removed at a time (Enge et al., 2021).
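Using the toy ale_map helper from the previous sketch, the leave‐one‐out check amounts to a simple loop over the included experiments; in the actual analysis each iteration would go through GingerALE's full thresholding pipeline.

```python
import numpy as np

# Assumes `experiments` and `ale_map` from the previous sketch.
loo_maps = []
for held_out in range(len(experiments)):
    subset = experiments[:held_out] + experiments[held_out + 1:]
    loo_maps.append(ale_map(subset))

# A convergence cluster is considered robust if it stays above threshold
# (an arbitrary toy cutoff here) in every leave-one-out map.
threshold = 0.5
robust_voxels = np.all(np.stack(loo_maps) > threshold, axis=0)
print(int(robust_voxels.sum()), "voxels survive all leave-one-out iterations")
```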
2.3.2. Localization of validating and conflicting AV integration
We further performed ALE analysis on validating and conflicting subcategories for each type of AV integration. As letter–sound integration contained only two conflicting experiments, we obtained only its validating map. The steps for analysis and the threshold used for all subcategories were the same as in the previous analysis.
For object–sound and speech–sound integration, we also compared the similarities and differences between their validating and conflicting ALE maps. The conjunction map was created using the voxel‐level minimum value of the thresholded validating and conflicting ALE maps. A conjunction cluster was required to contain at least 200 mm3. The contrast map was generated by directly subtracting one from the thresholded ALE map of the other. In generating an empirical null distribution, GingerALE merged the foci from both validating and conflicting maps and then randomly divided them into two simulated datasets the same size as the original. Similarly, we conducted a subtraction analysis between the two simulated datasets to obtain their contrast map. Significantly different clusters emerged by comparing the contrast map of the simulated dataset with that of the true dataset. After 10,000 iterations, a voxel‐wise p‐value map was obtained, showing where the values of the true data lie in the distribution of values in that voxel. An uncorrected threshold p < .001 at the voxel level with a minimum cluster volume of 200 mm3 was used (Enge et al., 2021). In addition, we also used a looser threshold of p < .01 and a stricter threshold of p < .0001 to explore convergence trends and the robustness of the results. We then converted the p‐value into a z‐score.
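Continuing the toy sketch above, the conjunction is the voxel‐wise minimum of the two thresholded ALE maps, and the contrast is the observed difference assessed against a null built by pooling and randomly re‐splitting the foci; the foci, iteration count, and p‐value computation below are illustrative only and differ from GingerALE's implementation in detail.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy validating and conflicting experiment sets (voxel-space foci);
# assumes `ale_map` from the earlier sketch.
validating = [[(10, 10, 10), (11, 9, 10)], [(9, 10, 11)]]
conflicting = [[(5, 14, 6)], [(6, 13, 5), (4, 15, 7)]]
val_map, con_map = ale_map(validating), ale_map(conflicting)

# Conjunction: voxel-wise minimum of the two (already thresholded) ALE maps.
conjunction = np.minimum(val_map, con_map)

# Contrast: observed difference versus a null built by pooling all foci and
# randomly re-splitting them into two sets with the original experiment sizes.
observed = val_map - con_map
pooled = [focus for exp in validating + conflicting for focus in exp]
sizes = [len(exp) for exp in validating + conflicting]
null_diffs = []
for _ in range(200):
    order = rng.permutation(len(pooled))
    regrouped, start = [], 0
    for size in sizes:
        regrouped.append([pooled[i] for i in order[start:start + size]])
        start += size
    null_diffs.append(ale_map(regrouped[:len(validating)])
                      - ale_map(regrouped[len(validating):]))

# Voxel-wise p-values for the validating > conflicting direction.
p_map = (np.stack(null_diffs) >= observed).mean(axis=0)
print(float(p_map.min()))
```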
2.3.3. Similarities and differences of the three types of validating AV integration
To determine whether letter–sound integration reuses the previously developed neural basis of AV integration through neural recycling or assimilation–accommodation, we conducted a pairwise conjunction and contrast analysis of the three types of validating AV integration. A total of three pairs of comparisons are involved: letter–sound versus object–sound, letter–sound versus speech–sound, and object–sound versus speech–sound.
However, as the ALE maps of the three types of audiovisual integration were derived from a diverse set of experiments, the nature of the stimuli varied (including simple physical features, concepts, and sub‐lexical and lexical items) and different cognitive processes of integration were engaged (including both the initial binding process and the retrieval of unified multisensory representations). Thus, we conducted an additional screening step to further improve the homogeneity and comparability between the types of integration before comparison.
All letter–sound experiments used stimuli at the conceptual level (letters, numbers, words) and were related to the retrieval process. This is because conceptual stimuli imply that the brain must have stored the corresponding unified representations, and thus AV integration involves the process of retrieving the multisensory representations. To match letter–sound experiments, only object–sound experiments using conceptual stimuli such as animals, tools, instruments, and bodies, rather than simple physical stimuli, were included in the new dataset for comparison. Finally, speech–sound experiments using conceptual stimuli such as letters, words, numbers, and sentences were included. All experiments were concerned with the retrieval process except for the McGurk effect, in which different AV stimuli are bound together to form a new stimulus.
Experiments from the temporal congruence, spatial congruence, and type‐specific effect criteria were dropped. The first two criteria were excluded because none of the letter–sound integration experiments used them to define integrated brain areas, and because studies have found a double dissociation between time/space‐sensitive and integration‐sensitive areas (Miller, 2005; Sestieri et al., 2006; Stevenson et al., 2011). The third criterion was excluded because it potentially favored the neural recycling hypothesis, as the type‐specific effect emphasizes differences rather than overlap between the different types of AV integration.
After obtaining convergent areas for each of the three types of AV integration based on the reorganized datasets, we then performed pairwise comparison. The steps and thresholds were the same as in the previous two analyses.
2.3.4. Subgroup analysis of validating AV integration
In the above analyses, the experiments that formed each ALE dataset used a variety of stimuli. The object–sound experiments included simple physical features (e.g., sounds, fibers) as well as representational concepts (e.g., animals, tools), and speech–sound and letter–sound experiments included both sub‐lexical (e.g., letters) and lexical (e.g., words) stimuli. To investigate whether different stimulus types influence the locations of convergent clusters, we divided each type of AV integration into different subgroups according to the properties of the stimuli.
Object–sound validating integration was categorized into feature (binding) and concept (retrieval) subgroups. The feature subgroup comprised experiments with tones, fibers, gratings, and checkerboards. As simple physical features have no inherent stored representations and are bound together for the first time in the experiment, we also refer to this as the binding subgroup. The concept subgroup comprised experiments with animals, tools, instruments, and bodies. As a unified representation must be retrieved in these experiments, we also refer to this as the retrieval subgroup.
Speech–sound validating integration was categorized into sub‐lexical and lexical subgroups. The sub‐lexical subgroup was composed of experiments using letters or strings. The lexical subgroup was composed of experiments using words, numbers or sentences.
Letter–sound validating integration was also categorized into sub‐lexical and lexical subgroups. The sub‐lexical subgroup was composed of experiments with letters. The lexical subgroup was composed of experiments with words and numbers.
If the number of experiments in a subgroup was >5, we obtained its corresponding ALE map; otherwise no meta‐analysis was performed for that subgroup. If the ALE maps of two mutually exclusive subgroups could both be obtained, we then produced their conjunction and contrast maps. The steps and thresholds for all analyses were the same as in the previous analyses (Sections 2.3.1 and 2.3.2).
3. RESULTS
3.1. Localization of three different types of AV integration
To investigate the brain regions consistently responsible for object–sound, speech–sound and letter–sound AV integration, the following maps were obtained for each AV type (Figure 2). (1) An overall ALE map was obtained based on all contrasts that met the criteria described in the methods. (2) A validating map and (3) a conflicting ALE map were obtained based on contrasts representing validating and conflicting integration, respectively (Table 1). Finally, (4) a validating–conflicting contrast map, (5) a conflicting–validating contrast map, and (6) a validating and conflicting conjunction map were obtained based on the ALE maps from (2) and (3).
FIGURE 2.
Localization of three types of audiovisual integration. From top to bottom are the clusters that are consistently activated in the audiovisual integration of (A) object–sound (blue), (B) speech–sound (green), and (C) letter–sound (red). The maps are, from left to right: (1) the overall map obtained from all contrasts; (2) the validating map obtained from validating contrasts; (3) the conflicting map obtained from conflicting contrasts; (4) the validating–conflicting map obtained from the validating > conflicting contrast; (5) the conflicting–validating map obtained from the validating < conflicting contrast. The z value below each brain slice represents the axial coordinate. For maps (1) to (3), significant convergence clusters were identified at an uncorrected cluster‐forming threshold of p < .001 and a family‐wise error corrected cluster‐level threshold of p < .05, with a minimum cluster size of 200 mm3. The shade of the color represents the ALE statistic, with a color bar on the right. For maps (4) and (5), significant clusters were identified at an uncorrected voxel‐level threshold of p < .01 with a minimum cluster volume of 200 mm3. The shade of the color represents the Z statistic, with a color bar on the right. L and R stand for the left and right hemisphere, respectively. In the lower right corner is an axial slice at z = 10, illustrating the positions of the middle and posterior superior temporal gyrus (mSTG and pSTG). The numbers at the end of the dashed lines represent the y coordinates.
3.1.1. Object–sound integration
For object–sound integration (Figure 2a and Table 3), the overall map was derived from 33 experiments with 508 subjects and 248 foci. Five significantly convergent clusters were identified. The two larger clusters were located in the bilateral pSTG/MTG with Y values of −50 and −34 for the left and right peaks with the maximum ALE value respectively. A smaller cluster was found in the left middle STG (mSTG)/MTG with a Y value of −20. Two more convergences were located in the bilateral insula close to the inferior frontal gyrus (IFG).
TABLE 3.
Brain areas consistently active during object–sound integration.
Clusters | Volume (mm3) | x | y | z | ALE | Z | Brain region |
---|---|---|---|---|---|---|---|
Overall map | |||||||
1 | 4256 | 62 | −34 | 18 | 0.023 | 5.30 | Right insula |
60 | −48 | 12 | 0.019 | 4.57 | Right superior temporal gyrus | ||
58 | −38 | 0 | 0.017 | 4.21 | Right middle temporal gyrus | ||
48 | −40 | 14 | 0.011 | 3.24 | Right superior temporal gyrus | ||
2 | 3808 | −54 | −50 | 8 | 0.026 | 5.71 | Left middle temporal gyrus |
−58 | −36 | 8 | 0.014 | 3.65 | Left middle temporal gyrus | ||
3 | 744 | −34 | 24 | −2 | 0.017 | 4.29 | Left insula |
−32 | 26 | 6 | 0.014 | 3.82 | Left insula | ||
4 | 720 | −52 | −20 | 6 | 0.020 | 4.75 | Left superior temporal gyrus |
5 | 680 | 38 | 22 | −6 | 0.017 | 4.31 | Right insula |
36 | 24 | 4 | 0.012 | 3.26 | Right insula | ||
Validating map | |||||||
1 | 4368 | −54 | −50 | 8 | 0.026 | 5.89 | Left middle temporal gyrus |
−58 | −36 | 8 | 0.014 | 3.80 | Left middle temporal gyrus | ||
2 | 4184 | 60 | −36 | 12 | 0.019 | 4.74 | Right superior temporal gyrus |
60 | −48 | 12 | 0.019 | 4.73 | Right superior temporal gyrus | ||
58 | −38 | 0 | 0.016 | 4.18 | Right middle temporal gyrus | ||
3 | 728 | 50 | −20 | 10 | 0.019 | 4.69 | Right transverse temporal gyrus |
Conflicting map | |||||||
1 | 760 | 36 | 24 | 4 | 0.011 | 4.17 | Right insula |
Validating–conflicting map | |||||||
1 | 1968 | −58 | −40 | 13 | 2.65 | Left superior temporal gyrus | |
−57 | −50 | 8 | 2.52 | Left middle temporal gyrus | |||
−54 | −56 | 2 | 2.54 | Left middle temporal gyrus | |||
−52 | −44 | 8 | 2.52 | Left middle temporal gyrus | |||
−58 | −49 | 10 | 2.52 | Left middle temporal gyrus | |||
−50 | −56 | 4 | 2.49 | Left middle temporal gyrus | |||
−61 | −38 | 6 | 2.48 | Left middle temporal gyrus | |||
−54 | −48 | 2 | 2.41 | Left middle temporal gyrus | |||
−48 | −52 | 6 | 2.37 | Left middle temporal gyrus | |||
−61 | −41 | 2 | 2.36 | Left middle temporal gyrus | |||
−49 | −49 | 9 | 2.35 | Left superior temporal gyrus | |||
Conflicting–validating map | |||||||
1 | 328 | 40 | 24 | 4 | 2.44 | Right insula | |
36 | 24 | 7 | 2.44 | Right insula | |||
Comparable map | |||||||
1 | 1952 | −50 | −48 | 12 | 0.016 | 4.75 | Left superior temporal gyrus |
−58 | −50 | 16 | 0.016 | 4.72 | Left superior temporal gyrus | ||
2 | 1512 | 60 | −36 | 16 | 0.013 | 4.55 | Right superior temporal gyrus |
3 | 720 | 44 | −20 | 8 | 0.014 | 3.92 | Right insula |
Feature map | |||||||
1 | 2560 | −54 | −50 | 8 | 0.018 | 5.24 | Left middle temporal gyrus |
−58 | −36 | 6 | 0.012 | 4.06 | Left middle temporal gyrus | ||
2 | 1840 | 64 | −34 | 10 | 0.014 | 4.32 | Right superior temporal gyrus |
58 | −36 | 2 | 0.013 | 4.17 | Right middle temporal gyrus | ||
54 | −34 | 8 | 0.013 | 4.10 | Right superior temporal gyrus | ||
50 | −38 | 2 | 0.009 | 3.49 | Right superior temporal gyrus | ||
66 | −32 | 16 | 0.009 | 3.45 | Right superior temporal gyrus | ||
Concept map | |||||||
1 | 1827 | −50 | −48 | 12 | 0.016 | 4.73 | Left superior temporal gyrus |
−58 | −50 | 16 | 0.016 | 4.73 | Left superior temporal gyrus | ||
2 | 1408 | 60 | −36 | 16 | 0.014 | 4.16 | Right superior temporal gyrus |
3 | 672 | 44 | −20 | 8 | 0.014 | 4.31 | Right insula |
Feature map ∩ Concept map | |||||||
1 | 296 | −52 | −50 | 10 | 0.012 | Left superior temporal gyrus |
The validating map was derived from 30 experiments with 443 subjects and 202 foci. All three clusters identified were located in the temporal cortex. The two larger ones were located in the bilateral pSTG/MTG, in almost the same locations as in the overall map. The smaller cluster was found in the right mSTG/MTG (Y = −20). The conflicting map was derived from 11 experiments with 179 subjects and 46 foci. Only one cluster was found, in the right insula, with the same peak as in the overall map.
In the contrast analysis, the bilateral pSTG/MTG were more involved in validating integration, while the right insula was more involved in conflicting integration. In the conjunction analysis, we did not find clusters shared by the validating map and conflicting map. Taken together, the results illustrated a double dissociation between validating and conflicting integration.
3.1.2. Speech–sound integration
For speech–sound integration (Figure 2b and Table 4), the overall map was derived from 33 experiments with 458 subjects and 371 foci. We found four significantly convergent clusters in the bilateral temporal and frontal cortex. The largest cluster contained 7 peaks extending from the left pSTG/MTG forward to the mSTG/MTG, with Y values from −54 to −22. The right STG/MTG cluster was relatively concentrated, containing 3 peaks with Y values from −34 to −14. The remaining two smaller clusters were located in the left and right inferior and middle frontal gyri (IFG/MFG).
TABLE 4.
Brain areas consistently active during speech–sound integration.
Clusters | Volume (mm3) | x | y | z | ALE | Z | Brain region |
---|---|---|---|---|---|---|---|
Overall map | |||||||
1 | 7784 | −52 | −22 | 10 | 0.032 | 6.33 | Left superior temporal gyrus |
−60 | −28 | 6 | 0.024 | 5.23 | Left superior temporal gyrus | ||
−54 | −54 | 10 | 0.023 | 5.08 | Left superior temporal gyrus | ||
−56 | −40 | 10 | 0.021 | 4.69 | Left superior temporal gyrus | ||
−62 | −34 | −2 | 0.019 | 4.44 | Left middle temporal gyrus | ||
−64 | −36 | 14 | 0.018 | 4.20 | Left superior temporal gyrus | ||
−48 | −46 | 16 | 0.014 | 3.57 | Left superior temporal gyrus | ||
2 | 6816 | 60 | −30 | 4 | 0.028 | 5.78 | Right superior temporal gyrus |
64 | −14 | −6 | 0.021 | 4.77 | Right superior temporal gyrus | ||
42 | −34 | 4 | 0.017 | 4.13 | Right superior temporal gyrus | ||
3 | 1304 | −46 | 24 | 22 | 0.025 | 5.28 | Left middle frontal gyrus |
−46 | 16 | 32 | 0.017 | 3.99 | Left middle frontal gyrus | ||
4 | 768 | 52 | 24 | 22 | 0.023 | 4.96 | Right middle frontal gyrus |
Validating map | |||||||
1 | 7768 | −54 | −20 | 10 | 0.031 | 7.00 | Left transverse temporal gyrus |
−58 | −40 | 10 | 0.019 | 5.10 | Left superior temporal gyrus | ||
−48 | −44 | 16 | 0.012 | 3.84 | Left superior temporal gyrus | ||
−60 | −34 | −4 | 0.011 | 3.60 | Left middle temporal gyrus | ||
−50 | −64 | 10 | 0.009 | 3.24 | Left middle temporal gyrus | ||
2 | 6816 | 58 | −34 | 8 | 0.024 | 6.04 | Right middle temporal gyrus |
64 | −22 | 6 | 0.018 | 5.00 | Right superior temporal gyrus | ||
42 | −34 | 4 | 0.017 | 4.83 | Right superior temporal gyrus | ||
64 | −14 | −6 | 0.017 | 4.79 | Right superior temporal gyrus | ||
Conflicting map | |||||||
1 | 1640 | −48 | 24 | 24 | 0.019 | 4.79 | Left middle frontal gyrus |
−46 | 16 | 32 | 0.017 | 4.43 | Left middle frontal gyrus | ||
2 | 1120 | 52 | 24 | 22 | 0.023 | 5.46 | Right middle frontal gyrus |
Validating–conflicting map | |||||||
1 | 1344 | −56 | −17 | 4 | 3.24 | Left superior temporal gyrus | |
Conflicting–validating map | |||||||
1 | 1120 | 50 | 21 | 22 | 3.89 | Right middle frontal gyrus | |
53 | 27 | 21 | 2.97 | Right middle frontal gyrus | |||
2 | 992 | −47 | 14 | 32 | 3.72 | Left middle frontal gyrus | |
−43 | 15 | 35 | 3.54 | Left middle frontal gyrus | |||
−49 | 21 | 29 | 2.79 | Left middle frontal gyrus | |||
Comparable map | |||||||
1 | 8240 | −54 | −20 | 10 | 0.031 | 7.06 | Left transverse temporal gyrus |
−58 | −40 | 10 | 0.019 | 5.16 | Left superior temporal gyrus | ||
−48 | −44 | 16 | 0.012 | 3.89 | Left superior temporal gyrus | ||
−60 | −34 | −4 | 0.011 | 3.65 | Left middle temporal gyrus | ||
−50 | −64 | 10 | 0.009 | 3.29 | Left middle temporal gyrus | ||
2 | 7216 | 58 | −34 | 8 | 0.024 | 6.09 | Right middle temporal gyrus |
64 | −22 | 6 | 0.018 | 5.05 | Right superior temporal gyrus | ||
64 | −14 | −6 | 0.017 | 4.84 | Right superior temporal gyrus | ||
42 | −32 | 2 | 0.015 | 4.48 | Right caudate | ||
Sub‐lexical map | |||||||
1 | 1032 | −50 | −22 | 8 | 0.013 | 4.51 | Left superior temporal gyrus |
2 | 760 | 66 | −32 | 6 | 0.014 | 4.67 | Right superior temporal gyrus |
Lexical map | |||||||
1 | 7208 | 58 | −34 | 8 | 0.021 | 5.97 | Right middle temporal gyrus |
64 | −20 | 6 | 0.017 | 5.38 | Right superior temporal gyrus | ||
42 | −34 | 4 | 0.017 | 5.36 | Right superior temporal gyrus | ||
64 | −14 | −6 | 0.017 | 5.31 | Right superior temporal gyrus | ||
2 | −54 | −20 | 10 | 0.019 | 5.66 | Left transverse temporal gyrus | |
−60 | −26 | 4 | 0.014 | 4.78 | Left superior temporal gyrus | ||
−60 | −34 | −4 | 0.011 | 4.04 | Left middle temporal gyrus | ||
3 | −56 | −40 | 10 | 0.014 | 4.72 | Left superior temporal gyrus | |
−48 | −44 | 16 | 0.012 | 4.32 | Left superior temporal gyrus | ||
−54 | −54 | 10 | 0.010 | 3.84 | Left superior temporal gyrus | ||
Sub‐lexical map ∩ Lexical map | |||||||
1 | 792 | −50 | −22 | 8 | 0.013 | Left superior temporal gyrus | |
2 | 200 | 62 | −32 | 6 | 0.010 | Right middle temporal gyrus |
The validating map was derived from 23 experiments with 314 subjects and 157 foci. Bilateral posterior and middle STG/MTG were consistently activated and 5 criteria contributed to each cluster. The conflicting map was derived from 15 experiments with 199 subjects and 214 foci. Bilateral IFG/MFG were consistently activated (Table S2).
Similar to object–sound integration, the contrast analysis confirmed a trend toward a double dissociation between validating and conflicting integration: validating integration recruited more of the left mSTG/MTG, while conflicting integration recruited more of the bilateral IFG/MFG. In the conjunction analysis, no brain areas were shared by the validating and conflicting maps.
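To make the logic of these conjunction and contrast comparisons concrete, a minimal numpy sketch is given below. The published analyses were run in GingerALE, so the array names, the toy data, and the placeholder threshold are illustrative assumptions rather than the authors' pipeline; in particular, GingerALE derives significance for contrast maps by permuting experiments between datasets, which is not reproduced here.

```python
import numpy as np

# Toy ALE maps on a common voxel grid; in the actual analysis these would be the
# validating and conflicting ALE maps exported from GingerALE.
rng = np.random.default_rng(0)
shape = (91, 109, 91)
validating_ale = rng.random(shape)
conflicting_ale = rng.random(shape)

# Binary masks of voxels surviving thresholding in each map (placeholder cutoff).
validating_mask = validating_ale > 0.95
conflicting_mask = conflicting_ale > 0.95

# Conjunction: voxels significant in BOTH maps, valued by the voxel-wise minimum
# of the two ALE maps (the minimum-statistic convention).
conjunction = np.where(validating_mask & conflicting_mask,
                       np.minimum(validating_ale, conflicting_ale), 0.0)

# Contrast: signed voxel-wise difference between the two ALE maps; the
# validating-conflicting and conflicting-validating maps report where this
# difference is reliably positive or negative.
validating_minus_conflicting = validating_ale - conflicting_ale

print("conjunction voxels:", int((conjunction > 0).sum()))
```

The conjunction keeps only voxels present in both thresholded maps, while the signed difference underlies the two directional contrast maps.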
3.1.3. Letter–sound integration
For letter–sound integration (Figure 2c and Table 5), the overall map was derived from 10 experiments with 140 subjects and 56 foci. Two significantly convergent clusters were identified in the bilateral STG. The larger left cluster contained 7 peaks along the left STG, with Y values from −46 to −6. The smaller right cluster contained only two closely spaced peaks, with Y values of −20 and −28.
TABLE 5.
Brain areas consistently active during letter–sound integration.
Clusters | Volume (mm3) | x | y | z | ALE | Z | Brain region |
---|---|---|---|---|---|---|---|
Overall map | |||||||
1 | 5112 | −48 | −20 | 4 | 0.015 | 5.02 | Left superior temporal gyrus |
−58 | −6 | 6 | 0.014 | 4.80 | Left superior temporal gyrus | ||
−62 | −38 | 14 | 0.012 | 4.35 | Left superior temporal gyrus | ||
−58 | −46 | 12 | 0.010 | 3.92 | Left superior temporal gyrus | ||
−52 | −14 | 12 | 0.009 | 3.76 | Left transverse temporal gyrus | ||
−56 | −18 | 10 | 0.009 | 3.72 | Left transverse temporal gyrus | ||
−64 | −24 | 16 | 0.009 | 3.60 | Left postcentral gyrus | ||
2 | 1576 | 54 | −20 | 4 | 0.018 | 5.47 | Right superior temporal gyrus |
42 | −28 | 8 | 0.008 | 3.58 | Right transverse temporal gyrus | ||
Validating map | |||||||
1 | 5800 | −48 | −20 | 4 | 0.015 | 5.21 | Left superior temporal gyrus |
−58 | −6 | 6 | 0.014 | 4.99 | Left superior temporal gyrus | ||
−62 | −38 | 14 | 0.012 | 4.53 | Left superior temporal gyrus | ||
−58 | −46 | 12 | 0.010 | 4.09 | Left superior temporal gyrus | ||
−52 | −14 | 12 | 0.009 | 3.94 | Left transverse temporal gyrus | ||
−56 | −18 | 10 | 0.009 | 3.90 | Left transverse temporal gyrus | ||
−64 | −24 | 16 | 0.009 | 3.78 | Left postcentral gyrus | ||
2 | 1792 | 54 | −20 | 4 | 0.018 | 5.67 | Right superior temporal gyrus |
42 | −28 | 8 | 0.008 | 3.77 | Right transverse temporal gyrus | ||
Comparable map | |||||||
1 | 5528 | −48 | −20 | 4 | 0.015 | 5.24 | Left superior temporal gyrus |
−58 | −6 | 6 | 0.014 | 5.01 | Left superior temporal gyrus | ||
−62 | −38 | 14 | 0.012 | 4.48 | Left superior temporal gyrus | ||
−58 | −46 | 12 | 0.010 | 4.11 | Left superior temporal gyrus | ||
−52 | −14 | 12 | 0.009 | 3.96 | Left transverse temporal gyrus | ||
−56 | −18 | 10 | 0.009 | 3.92 | Left transverse temporal gyrus | ||
−64 | −24 | 16 | 0.009 | 3.80 | Left postcentral gyrus | ||
2 | 1816 | 54 | −20 | 4 | 0.018 | 5.70 | Right superior temporal gyrus |
42 | −28 | 8 | 0.008 | 3.78 | Right transverse temporal gyrus | ||
Sub‐lexical map | |||||||
1 | 4656 | −48 | −20 | 4 | 0.015 | 5.29 | Left superior temporal gyrus |
−62 | −38 | 14 | 0.012 | 4.53 | Left superior temporal gyrus | ||
−58 | −46 | 12 | 0.010 | 4.14 | Left superior temporal gyrus | ||
−52 | −14 | 12 | 0.009 | 3.97 | Left transverse temporal gyrus | ||
−56 | −18 | 10 | 0.009 | 3.94 | Left transverse temporal gyrus | ||
−64 | −24 | 16 | 0.009 | 3.84 | Left postcentral gyrus | ||
2 | 1592 | 54 | −20 | 4 | 0.018 | 5.75 | Right superior temporal gyrus
As the sample size for letter–sound integration was small, the results were susceptible to undue influence from any single heterogeneous study. To guard against such bias, we verified the sources of the convergence clusters and found that all experiments contributed to at least one cluster, with 9 out of 10 contributing to the left STG and 6 out of 10 to the right STG. The leave‐one‐out analysis showed that significant convergence in the bilateral STG was consistently detected after systematically removing each experiment, and the center coordinates of the clusters remained almost identical (Table 6). A conceptual sketch of this leave‐one‐out procedure is given after the table.
TABLE 6.
The leave‐one‐out analysis of letter–sound integration.
ALE analysis without an experiment | Center MNI coordinates in the left STG (x, y, z) | Center MNI coordinates in the right STG (x, y, z) |
---|---|---|
Blau et al. (2008) | −62, −36, 14; −48, −20, 4 | 54, −20, 4 |
Blau et al. (2009) | −62, −38, 10 | 54, −20, 2 |
Holloway et al. (2015) | −62, −38, 10 | 54, −20, 4 |
Hocking and Price (2009) | −62, −38, 12 | 54, −20, 4 |
Raij et al. (2000) | −62, −38, 10 | 54, −20, 2 |
van Atteveldt et al. (2004) | −62, −36, 10 | 54, −20, 4 |
van Atteveldt et al. (2006) | −62, −36, 8 | 54, −20, 4 |
van Atteveldt et al. (2007), study 1 | −62, −38, 10 | 54, −22, 4 |
van Atteveldt et al. (2007), study 2 | −62, −38, 8; −42, −24, 10 | 54, −20, 4 |
van Atteveldt et al. (2010) | −62, −36, 8; −48, −20, 4 | 54, −20, 4 |
Abbreviation: STG, superior temporal gyrus.
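As a conceptual illustration of the jackknife procedure described above, the sketch below repeats a toy ALE‐like analysis once per experiment with that experiment removed and reports the surviving clusters. Here `run_ale` and `cluster_centers` are hypothetical stand‐ins for the actual GingerALE analysis and cluster reporting, and the foci are random, so only the structure of the loop is meaningful.

```python
import numpy as np
from scipy import ndimage

def run_ale(experiments, shape=(91, 109, 91)):
    """Hypothetical stand-in for a full ALE analysis (GingerALE in the original study):
    pool all reported foci and smooth them into a toy 'ALE map' so the sketch runs."""
    grid = np.zeros(shape)
    for exp in experiments:
        for x, y, z in exp["foci"]:
            grid[x, y, z] += 1.0
    return ndimage.gaussian_filter(grid, sigma=3)

def cluster_centers(ale_map, rel_threshold=0.5):
    """Centers of mass of clusters above a fraction of the map maximum (toy threshold)."""
    labels, n_clusters = ndimage.label(ale_map > rel_threshold * ale_map.max())
    return ndimage.center_of_mass(ale_map, labels, range(1, n_clusters + 1))

# Toy experiments; in the real analysis each entry is one letter-sound fMRI experiment.
rng = np.random.default_rng(1)
experiments = [{"name": f"study_{i:02d}",
                "foci": rng.integers(20, 70, size=(5, 3)).tolist()}
               for i in range(10)]

full_centers = cluster_centers(run_ale(experiments))

# Jackknife: re-run the analysis once per study with that study removed and check
# that convergent clusters (and their center coordinates) are still detected.
for left_out in experiments:
    rest = [e for e in experiments if e is not left_out]
    centers = cluster_centers(run_ale(rest))
    print(f"without {left_out['name']}: {len(centers)} clusters")
```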
The validating map was derived from 10 experiments with 140 subjects and 40 foci. Again, two significantly convergent clusters were found, located in the bilateral STG.
3.2. Similarities and differences of the three types of AV integration
As described in the methods, we reorganized the datasets to improve homogeneity and re‐obtained the ALE maps of the three types of AV integration for comparison (comparable maps, middle row of Figure 4a). Object–sound integration retained 17 experiments with 232 subjects and 101 foci; two larger clusters in the bilateral pSTG/MTG and a smaller cluster in the right transverse temporal gyrus (TTG)/insula were identified (Table 3). Speech–sound integration retained 19 experiments with 261 subjects and 83 foci; bilateral STG/MTG clusters extending from posterior to middle portions were identified (Table 4). Letter–sound integration retained 9 experiments with 223 subjects and 39 foci; bilateral STG clusters were identified, with the cluster on the left being three times larger than that on the right (Table 5).
FIGURE 4.
Convergent clusters of the subgroup analyses for the three types of AV integration. From top to bottom are the clusters consistently activated in the subgroup analyses of (A) object–sound (blue), (B) speech–sound (green), and (C) letter–sound (red) AV integration. The z value below each brain slice represents the axial coordinate. Colored areas represent the locations of statistically significant convergence clusters, identified at an uncorrected cluster‐forming threshold of p < .001 and a family‐wise error (FWE) corrected cluster‐level threshold of p < .05, with a minimum cluster size of 200 mm3.
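As a rough illustration of the extent‐based part of the thresholding described in the figure legends, the sketch below removes supra‐threshold clusters smaller than 200 mm3 from a toy z map. The 2 mm isotropic voxel size (8 mm3 per voxel) and the toy data are assumptions for illustration only, and the FWE‐corrected cluster‐level threshold itself, which GingerALE estimates by permutation, is not reproduced here.

```python
import numpy as np
from scipy import ndimage

def apply_cluster_extent(z_map, z_thresh=3.09, min_volume_mm3=200.0, voxel_mm3=8.0):
    """Zero out supra-threshold clusters smaller than min_volume_mm3.
    z_thresh = 3.09 corresponds to one-tailed p < .001; the 2 mm isotropic voxel
    (8 mm^3) is an assumed resolution for illustration only."""
    min_voxels = int(np.ceil(min_volume_mm3 / voxel_mm3))
    labels, n_clusters = ndimage.label(z_map > z_thresh)
    sizes = ndimage.sum(np.ones_like(z_map), labels, range(1, n_clusters + 1))
    keep_labels = [i + 1 for i, size in enumerate(sizes) if size >= min_voxels]
    mask = np.isin(labels, keep_labels)
    return np.where(mask, z_map, 0.0)

# Toy z map standing in for an ALE z map.
rng = np.random.default_rng(2)
z_map = rng.normal(size=(91, 109, 91))
filtered = apply_cluster_extent(z_map)
print("voxels surviving the extent filter:", int((filtered != 0).sum()))
```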
Figure 3a and Table 7 show the common and unique clusters for the three types of AV integration in a pairwise comparison.
FIGURE 3.
The common and unique convergent clusters for the three types of audiovisual integration in pairwise comparisons. Colored areas represent the locations of statistically significant clusters rather than statistical values. (A) The middle row shows the comparable maps of object–sound (blue), speech–sound (green), and letter–sound (red) audiovisual integration. The rows above and below show the results of the pairwise conjunction and contrast analyses. The z value below each brain slice represents the axial coordinate. For comparable maps, significant convergence clusters were identified at an uncorrected cluster‐forming threshold of p < .001 and an FWE (family‐wise error) corrected cluster‐level threshold of p < .05, with a minimum cluster size of 200 mm3. For contrast maps, significant clusters were identified at an uncorrected voxel‐level threshold of p < .001 with a minimum cluster volume of 200 mm3. O represents object–sound integration, S represents speech–sound integration, and L represents letter–sound integration. No sig. indicates that no significant convergence clusters were found. The numbers at the end of the dashed lines represent the y coordinates. The red circles mark the brain region shared by the three types of AV integration. (B) A conceptual assimilation–accommodation scheme for letter–sound integration. Blue clusters are recruited in object–sound integration and are consistently invoked in subsequent speech–sound and letter–sound integration. Green clusters are recruited together with the blue ones upon the acquisition of speech–sound integration and are consistently invoked in subsequent letter–sound integration. Letter–sound integration recruits no new regions beyond those of speech–sound integration. The arrows indicate the developmental order of the three types of audiovisual integration abilities.
TABLE 7.
Results of the conjunction and contrast analyses for the three types of audiovisual integration based on the comparable maps.
Clusters | Volume (mm3) | x | y | z | ALE | Z | Brain region |
---|---|---|---|---|---|---|---|
Object‐sound and speech–sound | |||||||
Object–sound ∩ speech–sound | |||||||
1 | 1024 | −48 | −44 | 14 | 0.011 | Left middle temporal gyrus | |
−54 | −52 | 12 | 0.011 | Left superior temporal gyrus | |||
Speech–sound > object–sound | |||||||
1 | 1864 | 62 | −24 | −1 | 3.89 | Right superior temporal gyrus |
66 | −16 | −7 | 3.72 | Right superior temporal gyrus | |
70 | −16 | 6 | 3.54 | Right transverse temporal gyrus | |
2 | 632 | −57 | −29 | 6 | 3.89 | Left superior temporal gyrus |
−58 | −26 | 12 | 3.54 | Left superior temporal gyrus | |
Letter–sound and speech–sound | |||||||
Letter–sound ∩ speech–sound | |||||||
1 | 1224 | −62 | −38 | 14 | 0.012 | Left superior temporal gyrus | |
−58 | −46 | 12 | 0.010 | Left superior temporal gyrus | |||
2 | 880 | −50 | −22 | 8 | 0.012 | Left superior temporal gyrus | |
−56 | −18 | 10 | 0.009 | Left transverse temporal gyrus | |||
3 | 384 | 58 | −22 | 4 | 0.011 | Right superior temporal gyrus | |
60 | −16 | −2 | 0.009 | Right superior temporal gyrus | |||
Letter‐sound and object–sound | |||||||
Letter–sound ∩ object–sound | |||||||
1 | 200 | −56 | −48 | 12 | 0.010 | Left superior temporal gyrus | |
Letter–sound > object–sound | |||||||
1 | 1224 | −48 | −25 | 7 | 3.89 | Left superior temporal gyrus | |
−46 | −19 | 6 | 3.72 | Left superior temporal gyrus | |||
−46 | −14 | 4 | 3.43 | Left insula | |||
−43 | −21 | 3 | 3.35 | Left insula | |||
2 | 280 | 60 | −20 | 2 | 3.43 | Right superior temporal gyrus | |
58 | −18 | −4 | 3.35 | Right superior temporal gyrus | |||
55 | −24 | −1 | 3.09 | Right superior temporal gyrus |
Both object–sound and speech–sound integration consistently activated the left pSTG/MTG. Along the STG, slightly more anteriorly, speech–sound integration additionally recruited the bilateral mSTG relative to object–sound integration. Object–sound integration did not recruit any additional areas relative to speech–sound integration.
For letter–sound and speech–sound integration, both recruited the left posterior STG/MTG and the bilateral middle STG/MTG. No significant differences were found between AV conditions.
For letter–sound and object–sound integration, only the left pSTG was shared. Letter–sound integration recruited additional clusters in the bilateral mSTG relative to object–sound integration. Object–sound integration did not recruit any additional areas relative to letter–sound integration.
In summary (Figure 3b), object–sound, speech–sound, and letter–sound integration all recruited the left pSTG. On this basis, speech–sound integration recruited two additional regions in the bilateral mSTG. Finally, letter–sound integration directly recruited the regions involved in speech–sound integration.
3.3. Subgroup localizations of three types of validating AV integration
3.3.1. Object–sound integration
Feature and concept subgroups
The feature map was derived from 10 experiments with 159 subjects and 95 foci. Two significantly convergent clusters were identified in the bilateral pSTG/MTG. The concept map was derived from 20 experiments with 284 subjects and 107 foci. Two larger significantly convergent clusters were identified in the bilateral pSTG, and a smaller cluster was located in the right middle TTG extending inwards to the insula. Both feature‐level and concept‐level stimuli activated the left pSTG, and no differences were found between the two subgroups (Figure 4a and Table 3).
3.3.2. Speech–sound integration
Sub‐lexical and lexical subgroups
The sub‐lexical map was derived from 8 experiments with 144 subjects and 85 foci. Two significantly convergent clusters were identified in the bilateral mSTG/MTG. The lexical map was derived from 15 experiments with 198 subjects and 72 foci. The bilateral mSTG and the left pSTG were detected. Both subgroups activated the bilateral mSTG and no differences were found between them (Figure 4b and Table 4).
3.3.3. Letter–sound integration
Sub‐lexical and lexical subgroups
The sub‐lexical map was derived from 9 experiments with 123 subjects and 37 foci. Two significantly convergent clusters were identified in the bilateral STG/TTG. Only two experiments used lexical stimuli, so the ALE map for the lexical subgroup was not available (Figure 4c and Table 5).
4. DISCUSSION
The present study links letter–sound integration in reading with the brain's innate ability to perform object–sound and speech–sound integration from a neural reuse perspective at the individual level. Using the ALE meta‐analysis approach, we investigated the neural basis of the three types of AV integration and identified the similarities and differences between them. Our results showed a double dissociation between validating and conflicting integration, with the STG recruited in validating integration and the insula/MFG recruited in conflicting integration. Importantly, the comparison across the three types of AV integration provides support for the assimilation–accommodation hypothesis. All three types of AV integration recruited the left posterior superior temporal gyrus (STG), speech–sound integration additionally activated the bilateral middle STG, and letter–sound integration directly invoked the AV areas involved in speech–sound integration.
4.1. Regions recruited for AV integration: STG and MFG/insula
The vast majority of previous imaging studies have linked the STG/STS to the integration of natural and artificial AV information, whether during object–sound (Beauchamp et al., 2004), speech–sound (Noesselt et al., 2012), or letter–sound associations (Raij et al., 2000). Our results supported this previous work, showing that the bilateral STG was activated during all three types of AV integration. More specifically, the STG was involved in validating integration, as it showed significant convergence in the validating maps for all three integration types and was more active in the validating–conflicting maps for object–sound and speech–sound integration. Furthermore, the STG clusters comprised posterior and middle subregions. The left pSTG appears to be an essential cluster, as it was recruited regardless of stimulus type in all three types of AV integration. Although the pSTG is thought to be more involved in processing items at the conceptual or lexical level (Choi et al., 2015; Graves et al., 2008; Okada & Hickok, 2006), we found that it was also activated during object–sound integration at the feature level and letter–sound integration at the sub‐lexical level. The mSTG was activated by both sub‐lexical and lexical AV stimuli during speech–sound and letter–sound integration, but not by either type of stimulus during object–sound integration. This is understandable because the mSTG is thought to be associated with the processing of phonological features (Hickok & Poeppel, 2004) or of stimuli with temporally linear characteristics (Stevenson et al., 2011), suggesting its greater involvement in detecting phonetic and continuous temporal dynamics during language‐related AV integration.
In addition to the STG, the right insula (for object–sound integration) and the bilateral MFG/IFG (for speech–sound integration) were also recruited during AV integration. Consistent with previous review and meta‐analysis studies (Doehrmann & Naumer, 2008; Erickson et al., 2014), we found that these regions were more strongly activated in the conflicting and conflicting–validating maps. Because the insula and IFG/MFG have tended to show activation under the same contrasts in previous studies, researchers have argued that they serve similar rather than distinct functions (Bushara et al., 2001; Noesselt et al., 2012; Szycik et al., 2012). Given the multifunctionality of the insula and the MFG/IFG, several interpretations can be offered. First, according to the error likelihood hypothesis (Brown & Braver, 2007), the involvement of the insula may reflect its role in error processing and conflict monitoring (Bossaerts, 2010; Botvinick et al., 2004). Second, the MFG/IFG are thought to be selectively involved in the storage, retrieval, and manipulation of semantic representations (Badre & Wagner, 2007; Wagner et al., 2001); violations between auditory and visual information increase the processing load, leading to higher activation in the conflicting condition. Finally, as the IFG has long been considered a multisensory brain region, it may be directly involved in cross‐modal binding (Hein et al., 2007). In this case, a conflicting stimulus can be seen as a new AV association that revises the original concept.
All of the above clusters were derived from a variety of criteria, suggesting that their stable activation may be independent of the experimental paradigm and instead reflect the type of AV integration in which they are involved.
4.2. The assimilation–accommodation mechanism in letter–sound integration
The analysis of the similarities and differences across the three types of AV integration allowed us to examine whether letter–sound integration is better explained by the neural recycling or the assimilation–accommodation hypothesis. The spatial overlap in the left pSTG and the bilateral mSTG during both letter–sound and speech–sound integration suggests an assimilation mechanism: to acquire the later‐developing letter–sound integration, the areas supporting speech–sound integration are directly invoked, rather than a new STG subregion becoming specialized for letter–sound integration.
The present results also support an assimilation–accommodation effect in speech–sound integration. Specifically, speech–sound integration directly invoked the left pSTG involved in object–sound integration (assimilation) and added the bilateral mSTG (accommodation). In terms of development, it is difficult to determine whether object–sound or speech–sound integration arises earlier in life. However, on an evolutionary scale, it is clear that object–sound stimuli emerged before speech–sound stimuli. Thus, although assimilation–accommodation is inherently rooted in individual learning during development, our results seem to provide indirect evidence for extending this hypothesis to an evolutionary scale.
Of note, these two reuse mechanisms may not be contradictory, but rather complementary for different steps in reading acquisition. For visual word processing, numerous studies have supported neural recycling, whereas for AV integration, assimilation–accommodation seems to provide a better explanation. Considering the whole process of literacy acquisition, multiple reuse mechanisms may be needed for its different cognitive components.
4.3. Limitations
It should be noted that we did not identify a letter–sound‐sensitive subarea, possibly due to the limited spatial resolution of fMRI. Our hypotheses could therefore be tested with other methods, such as a rapid adaptation paradigm or higher‐resolution fMRI. Additionally, tracking the three types of AV integration longitudinally before and after reading acquisition could provide more direct insight into the development of letter–sound integration within the STG. Finally, we cannot assume that the assimilation–accommodation hypothesis can be extended to explain reuse on a broader evolutionary scale. Future work should provide more direct evidence by investigating multisensory integration in other populations (e.g., illiterate individuals, sign language users, and braille users) and writing systems, and through cross‐species comparisons.
5. CONCLUSION
Using an ALE meta‐analysis, we performed pairwise comparisons of the similarities and differences between object–sound, speech–sound, and letter–sound AV integration. The results showed that speech–sound integration overlapped with object–sound integration in the left pSTG (assimilation) and additionally activated the bilateral mSTG (accommodation), while letter–sound integration overlapped with speech–sound integration in both the left pSTG and the bilateral mSTG (assimilation). Given the order in which the three types of AV integration emerge over an individual's lifespan, letter–sound integration may be achieved by reusing areas that developed earlier for speech–sound integration, suggesting that assimilation–accommodation could support reading acquisition instead of, or in addition to, neural recycling. Considering the order in which the three types of AV integration emerged on an evolutionary time scale, the assimilation–accommodation mechanism might also be extended to explain the evolution of speech–sound or letter–sound integration, although more direct evidence is needed.
AUTHOR CONTRIBUTIONS
Conceptualization: Danqi Gao, Li Liu. Investigation: Danqi Gao, Xitong Liang, Zilin Bai. Methodology: Danqi Gao, Xitong Liang, Mingnan Cai. Visualization: Danqi Gao. Supervision: Li Liu. Writing – original draft: Danqi Gao. Writing – review and editing: Chaoying Xu, Qi Ting, Emily S. Nichols.
CONFLICT OF INTEREST STATEMENT
The authors declare that they have no competing interests.
Supporting information
Table S1. Combination of search terms used in literature retrieval.
Table S2. Criteria that contribute to each of the convergence clusters.
ACKNOWLEDGMENTS
This research was supported by the STI 2030—Major Projects (2021ZD0200500), the National Natural Science Foundation of China (31970977, 31571155), the Interdisciplinary Research Funds of Beijing Normal University and the Fundamental Research Funds for the Central Universities (2015KJJCB28).
Gao, D. , Liang, X. , Ting, Q. , Nichols, E. S. , Bai, Z. , Xu, C. , Cai, M. , & Liu, L. (2024). A meta‐analysis of letter–sound integration: Assimilation and accommodation in the superior temporal gyrus. Human Brain Mapping, 45(15), e26713. 10.1002/hbm.26713
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
REFERENCES
- Anderson, M. L. (2010). Neural reuse: A fundamental organizational principle of the brain. The Behavioral and Brain Sciences, 33, 245–266. 10.1017/S0140525X10000853 [DOI] [PubMed] [Google Scholar]
- Anderson, M. L. (2021). After phrenology: Neural reuse and the interactive brain. MIT Press. [DOI] [PubMed] [Google Scholar]
- Aparicio, M. , Peigneux, P. , Charlier, B. , Balériaux, D. , Kavec, M. , & Leybaert, J. (2017). The neural basis of speech perception through lipreading and manual cues: Evidence from deaf native users of cued speech. Frontiers in Psychology, 8, 426. 10.3389/fpsyg.2017.00426 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Badre, D. , & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive control of memory. Neuropsychologia, 45, 2883–2901. 10.1016/j.neuropsychologia.2007.06.015 [DOI] [PubMed] [Google Scholar]
- Barros‐Loscertales, A. , Ventura‐Campos, N. , Visser, M. , Alsius, A. , Pallier, C. , Avila Rivera, C. , & Soto‐Faraco, S. (2013). Neural correlates of audiovisual speech processing in a second language. Brain and Language, 126, 253–262. 10.1016/j.bandl.2013.05.009 [DOI] [PubMed] [Google Scholar]
- Baumann, O. , Vromen, J. M. G. , Cheung, A. , McFadyen, J. , Ren, Y. , & Guo, C. C. (2018). Neural correlates of temporal complexity and synchrony during audiovisual correspondence detection. Eneuro, 5(1), e0294‐17.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Beauchamp, M. S. , Lee, K. E. , Argall, B. D. , & Martin, A. (2004). Integration of auditory and visual information about objects in superior temporal sulcus. Neuron, 41, 809–823. 10.1016/S0896-6273(04)00070-4 [DOI] [PubMed] [Google Scholar]
- Benoit, M. M. , Raij, T. , Lin, F.‐.H. , Jääskeläinen, I. P. , & Stufflebeam, S. (2009). Primary and multisensory cortical activity is correlated with audiovisual percepts. Human Brain Mapping, 31, 526–538. 10.1002/hbm.20884 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blau, V. , Reithler, J. , van Atteveldt, N. , Seitz, J. , Gerretsen, P. , Goebel, R. , & Blomert, L. (2010). Deviant processing of letters and speech sounds as proximate cause of reading failure: A functional magnetic resonance imaging study of dyslexic children. Brain, 133, 868–879. 10.1093/brain/awp308 [DOI] [PubMed] [Google Scholar]
- Blau, V. , van Atteveldt, N. , Ekkebus, M. , Goebel, R. , & Blomert, L. (2009). Reduced neural integration of letters and speech sounds links phonological and reading deficits in adult dyslexia. Current Biology, 19, 503–508. 10.1016/j.cub.2009.01.065 [DOI] [PubMed] [Google Scholar]
- Blau, V. , van Atteveldt, N. , Formisano, E. , Goebel, R. , & Blomert, L. (2008). Task‐irrelevant visual letters interact with the processing of speech sounds in heteromodal and unimodal cortex. The European Journal of Neuroscience, 28, 500–509. 10.1111/j.1460-9568.2008.06350.x [DOI] [PubMed] [Google Scholar]
- Bossaerts, P. (2010). Risk and risk prediction error signals in anterior insula. Brain Structure & Function, 214, 645–653. 10.1007/s00429-010-0253-1 [DOI] [PubMed] [Google Scholar]
- Botvinick, M. M. , Cohen, J. D. , & Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex: An update. Trends in Cognitive Sciences, 8, 539–546. 10.1016/j.tics.2004.10.003 [DOI] [PubMed] [Google Scholar]
- Bouhali, F. , Thiebaut de Schotten, M. , Pinel, P. , Poupon, C. , Mangin, J.‐F. , Dehaene, S. , & Cohen, L. (2014). Anatomical connections of the visual word form area. The Journal of Neuroscience, 34, 15402–15414. 10.1523/JNEUROSCI.4918-13.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brem, S. , Bach, S. , Kucian, K. , Guttorm, T. K. , Martin, E. , Lyytinen, H. , Brandeis, D. , & Richardson, U. (2010). Brain sensitivity to print emerges when children learn letter‐speech sound correspondences. Proceedings of the National Academy of Sciences of the United States of America, 107, 7939–7944. 10.1073/pnas.0904402107 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brown, J. W. , & Braver, T. S. (2007). Risk prediction and aversion by anterior cingulate cortex. Cognitive, Affective, & Behavioral Neuroscience, 7, 266–277. 10.3758/CABN.7.4.266 [DOI] [PubMed] [Google Scholar]
- Bushara, K. O. , Hanakawa, T. , Immisch, I. , Toma, K. , Kansaku, K. , & Hallett, M. (2003). Neural correlates of cross ‐ modal binding. Nature Neuroscience, 6(2), 190–195. 10.1038/nn993 [DOI] [PubMed] [Google Scholar]
- Butler, A. J. , & James, K. H. (2013). Active Learning of novel sound ‐ producing objects: Motor reactivation and enhancement of visuo ‐ motor connectivity. Journal of Cognitive Neuroscience, 25(2), 203–218. 10.1162/jocn_a_00284 [DOI] [PubMed] [Google Scholar]
- Butler, A. J. , James, T. W. , & James, K. H. (2011). Enhanced multisensory integration and motor reactivation after active motor learning of audiovisual associations. Journal of Cognitive Neuroscience, 23(11), 3515–3528. 10.1162/jocn_a_00015 [DOI] [PubMed] [Google Scholar]
- Buchsbaum, B. R. , Hickok, G. , & Humphries, C. (2001). Role of left posterior superior temporal gyrus in phonological processing for speech perception and production. Cognitive Science, 25, 663–678. 10.1207/s15516709cog2505_2 [DOI] [Google Scholar]
- Bushara, K. O. , Grafman, J. , & Hallett, M. (2001). Neural correlates of auditory‐visual stimulus onset asynchrony detection. The Journal of Neuroscience, 21, 300–304. 10.1523/JNEUROSCI.21-01-00300.2001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Button, K. S. , Ioannidis, J. P. A. , Mokrysz, C. , Nosek, B. A. , Flint, J. , Robinson, E. S. J. , & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376. 10.1038/nrn3475 [DOI] [PubMed] [Google Scholar]
- Callan, D. , Jones, J. , Munhall, K. , Callan, A. , Kroos, C. , & Vatikiotis ‐ Bateson, E. (2003). Neural processes underlying perceptual enhancement by visual speech gestures. Neuroreport, 14(17), 2213–2218. [DOI] [PubMed] [Google Scholar]
- Calvert, G. A. , Campbell, R. , & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649–657. 10.1016/S0960-9822(00)00513-3 [DOI] [PubMed] [Google Scholar]
- Cao, F. , Tao, R. , Liu, L. , Perfetti, C. A. , & Booth, J. R. (2013). High proficiency in a second language is characterized by greater involvement of the first language network: Evidence from Chinese learners of English. Journal of Cognitive Neuroscience, 25, 1649–1663. 10.1162/jocn_a_00414 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Choi, Y.‐H. , Park, H. K. , & Paik, N.‐J. (2015). Role of the posterior temporal lobe during language tasks: A virtual lesion study using repetitive transcranial magnetic stimulation. Neuroreport, 26, 314–319. 10.1097/WNR.0000000000000339 [DOI] [PubMed] [Google Scholar]
- Chyl, K. , Kossowski, B. , Debska, A. , Luniewska, M. , Banaszkiewicz, A. , Zelechowska, A. , Frost, S. J. , Mencl, W. E. , Wypych, M. , Marchewka, A. , Pugh, K. R. , & Jednorog, K. (2018). Prereader to beginning reader: Changes induced by reading acquisition in print and speech brain networks. Journal of Child Psychology and Psychiatry, 59, 76–87. 10.1111/jcpp.12774 [DOI] [PMC free article] [PubMed] [Google Scholar]
- David, S. P. , Ware, J. J. , Chu, I. M. , Loftus, P. D. , Fusar‐Poli, P. , Radua, J. , Munafò, M. R. , & Ioannidis, J. P. A. (2013). Potential reporting bias in fMRI studies of the brain. PLoS One, 8, e70104. 10.1371/journal.pone.0070104 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dehaene, S. , & Cohen, L. (2007). Cultural recycling of cortical maps. Neuron, 56, 384–398. 10.1016/j.neuron.2007.10.004 [DOI] [PubMed] [Google Scholar]
- Dehaene, S. , Cohen, L. , Morais, J. , & Kolinsky, R. (2015). Illiterate to literate: Behavioural and cerebral changes induced by reading acquisition. Nature Reviews Neuroscience, 16, 234–244. 10.1038/nrn3924 [DOI] [PubMed] [Google Scholar]
- Dehaene, S. , Pegado, F. , Braga, L. W. , Ventura, P. , Filho, G. N. , Jobert, A. , Dehaene‐Lambertz, G. , Kolinsky, R. , Morais, J. , & Cohen, L. (2010). How learning to read changes the cortical networks for vision and language. Science, 330, 1359–1364. 10.1126/science.1194140 [DOI] [PubMed] [Google Scholar]
- Dehaene‐Lambertz, G. , Monzalvo, K. , & Dehaene, S. (2018). The emergence of the visual word form: Longitudinal evolution of category‐specific ventral visual areas during reading acquisition. PLoS Biology, 16, e2004103. 10.1371/journal.pbio.2004103 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Doehrmann, O. , & Naumer, M. J. (2008). Semantics and the multisensory brain: How meaning modulates processes of audio‐visual integration. Brain Research, 1242, 136–150. 10.1016/j.brainres.2008.03.071 [DOI] [PubMed] [Google Scholar]
- Ehri, L. C. (2005). Learning to read words: Theory, findings, and issues. Scientific Studies of Reading, 9, 167–188. 10.1207/s1532799xssr0902_4 [DOI] [Google Scholar]
- Eickhoff, S. B. , Laird, A. R. , Fox, P. M. , Lancaster, J. L. , & Fox, P. T. (2017). Implementation errors in the GingerALE software: Description and recommendations. Human Brain Mapping, 38, 7–11. 10.1002/hbm.23342 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eickhoff, S. B. , Nichols, T. E. , Laird, A. R. , Hoffstaedter, F. , Amunts, K. , Fox, P. T. , Bzdok, D. , & Eickhoff, C. R. (2016). Behavior, sensitivity, and power of activation likelihood estimation characterized by massive empirical simulation. NeuroImage, 137, 70–85. 10.1016/j.neuroimage.2016.04.072 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Enge, A. , Abdel Rahman, R. , & Skeide, M. A. (2021). A meta‐analysis of fMRI studies of semantic cognition in children. NeuroImage, 241, 118436. 10.1016/j.neuroimage.2021.118436 [DOI] [PubMed] [Google Scholar]
- Erickson, L. C. , Heeg, E. , Rauschecker, J. P. , & Turkeltaub, P. E. (2014). An ALE meta‐analysis on the audiovisual integration of speech signals: ALE meta‐analysis on AV speech integration. Human Brain Mapping, 35, 5587–5605. 10.1002/hbm.22572 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fairhall, S. L. , & Macaluso, E. (2009). Spatial attention can modulate audiovisual integration at multiple cortical and subcortical sites. European Journal of Neuroscience, 29(6), 1247–1257. 10.1111/j.1460-568.2009.06688.x [DOI] [PubMed] [Google Scholar]
- Francisco, A. A. , Takashima, A. , McQueen, J. M. , van den Bunt, M. , Jesse, A. , & Groen, M. A. (2018). Adult dyslexic readers benefit less from visual input during audiovisual speech processing: fMRI evidence. Neuropsychologia, 117, 454–471. 10.1016/j.neuropsychologia.2018.07.009 [DOI] [PubMed] [Google Scholar]
- Gau, R. , & Noppeney, U. (2016). How prior expectations shape multisensory perception. Neuroimage, 124, 876–886. 10.1016/j.neuroimage.2015.09.045 [DOI] [PubMed] [Google Scholar]
- Graves, W. W. , Grabowski, T. J. , Mehta, S. , & Gupta, P. (2008). Left posterior superior temporal gyrus participates specifically in accessing lexical phonology. Journal of Cognitive Neuroscience, 20, 1698–1710. 10.1162/jocn.2008.20113 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hasson, U. , Harel, M. , Levy, I. , & Malach, R. (2003). Large‐scale mirror‐symmetry organization of human occipito‐temporal object areas. Neuron, 37, 1027–1041. 10.1016/S0896-6273(03)00144-2 [DOI] [PubMed] [Google Scholar]
- Hein, G. , Doehrmann, O. , Muller, N. G. , Kaiser, J. , Muckli, L. , & Naumer, M. J. (2007). Object familiarity and semantic congruency modulate responses in cortical audiovisual integration areas. The Journal of Neuroscience, 27, 7881–7887. 10.1523/JNEUROSCI.1740-07.2007 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hickok, G. , & Poeppel, D. (2004). Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition, 92, 67–99. 10.1016/j.cognition.2003.10.011 [DOI] [PubMed] [Google Scholar]
- Hocking, J. , & Price, C. J. (2009). Dissociating verbal and nonverbal audiovisual object processing. Brain and Language, 108, 89–96. 10.1016/j.bandl.2008.10.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Holloway, I. D. , van Atteveldt, N. , Blomert, L. , & Ansari, D. (2015). Orthographic dependency in the neural correlates of reading: Evidence from audiovisual integration in English readers. Cerebral Cortex, 25, 1544–1553. 10.1093/cercor/bht347 [DOI] [PubMed] [Google Scholar]
- Howard, M. A. , Volkov, I. O. , Mirsky, R. , Garell, P. C. , Noh, M. D. , Granner, M. , Damasio, H. , Steinschneider, M. , Reale, R. A. , Hind, J. E. , & Brugge, J. F. (2000). Auditory cortex on the human posterior superior temporal gyrus. The Journal of Comparative Neurology, 416, 79–92. [DOI] [PubMed] [Google Scholar]
- James, T. W. , Stevenson, R. A. , Kim, S. , VanDerKlok, R. M. , & James, K. H. (2011). Shape from sound: Evidence for a shape operator in the lateral occipital cortex. Neuropsychologia, 49(7), 1807–1815. 10.1016/j.neuropsychologia.2011.03.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
- James, T. W. , VanDerKlok, R. M. , Stevenson, R. A. , & James, K. H. (2011). Multisensory perception of action in posterior temporal and parietal cortices. Neuropsychologia, 49(1), 108–114. 10.1016/j.neuropsychologia.2010.10.030 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kassuba, T. , Klinge, C. , Hölig, C. , Menz, M. M. , Ptito, M. , Röder, B. , & Siebner, H. R. (2011). The left fusiform gyrus hosts trisensory representations of manipulable objects. NeuroImage, 56(3), 1566–1577. 10.1016/j.neuroimage.2011.02.032 [DOI] [PubMed] [Google Scholar]
- Kopp, F. (2014). Audiovisual temporal fusion in 6‐month‐old infants. Developmental Cognitive Neuroscience, 9, 56–67. 10.1016/j.dcn.2014.01.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Laing, M. , Rees, A. , & Vuong, Q. C. (2015). Amplitude ‐ modulated stimuli reveal auditory ‐ visual interactions in brain activity and brain connectivity. Frontiers in Psychology, 6. 10.3389/fpsyg.2015.01440 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Laurienti, P. J. , Wallace, M. T. , Maldjian, J. A. , Susi, C. M. , Stein, B. E. , & Burdette, J. H. (2003). Cross ‐ modal sensory processing in the anterior cingulate and medial prefrontal cortices. Human Brain Mapping, 19(4), 213–223. 10.1002/hbm.10112 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee, H. , & Noppeney, U. (2011a). Long ‐ term music training tunes how the brain temporally binds signals from multiple senses. Proceedings of the National Academy of Sciences of the United States of America, 108(51), E1441–E1450. 10.1073/pnas.1115267108 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee, H. , & Noppeney, U. (2011b). Physical and perceptual factors shape the neural mechanisms that integrate audiovisual signals in speech comprehension. The Journal of Neuroscience, 31(31), 11338–11350. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lewis, R. , & Noppeney, U. (2010). Audiovisual synchrony improves motion discrimination via enhanced connectivity between early visual and auditory areas. Journal of Neuroscience, 30(37), 12329–12339. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Love, S. A. , Petrini, K. , Pernet, C. R. , Latinus, M. , & Pollick, F. E. (2018). Overlapping but divergent neural correlates underpinning audiovisual synchrony and temporal order judgments. Frontiers in Human Neuroscience, 12, 274. 10.3389/fnhum.2018.00274 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lüttke, C. S. , Ekman, M. , van Gerven, M. A. J. , & de Lange, F. P. (2016). Preference for audiovisual speech congruency in superior temporal cortex. Journal of Cognitive Neuroscience, 28(1), 1–7. 10.1162/jocn_a_00874 [DOI] [PubMed] [Google Scholar]
- Liu, Y. , Dunlap, S. , Fiez, J. , & Perfetti, C. (2007). Evidence for neural accommodation to a writing system following learning. Human Brain Mapping, 28, 1223–1234. 10.1002/hbm.20356 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Macaluso, E. , George, N. , Dolan, R. , Spence, C. , & Driver, J. (2004). Spatial and temporal factors during processing of audiovisual speech: A PET study. NeuroImage, 21(2), 725–732. 10.1016/j.neuroimage.2003.09.049 [DOI] [PubMed] [Google Scholar]
- Man, K. , Damasio, A. , Meyer, K. , & Kaplan, J. T. (2015). Convergent and invariant object representations for sight, sound, and touch. Human Brain Mapping, 36(9), 3629–3640. 10.1002/hbm.22867 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Matchin, W. , Groulx, K. , & Hickok, G. (2014). Audiovisual speech integration does not rely on the motor system: Evidence from articulatory suppression, the McGurk effect, and fMRI. Journal of Cognitive Neuroscience, 26(3), 606–620. 10.1162/jocn_a_00515 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Marchant, J. L. , Ruff, C. C. , & Driver, J. (2012). Audiovisual synchrony enhances BOLD responses in a brain network including multisensory STS while also enhancing target‐detection performance for both modalities. Human Brain Mapping, 33, 1212–1224. 10.1002/hbm.21278 [DOI] [PMC free article] [PubMed] [Google Scholar]
- McCormick, K. , Lacey, S. , Stilla, R. , Nygaard, L. C. , & Sathian, K. (2018). Neural basis of the crossmodal correspondence between auditory pitch and visuospatial elevation. Neuropsychologia, 112, 19–30. 10.1016/j.neuropsychologia.2018.02.029 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mercure, E. , Bright, P. , Quiroz, I. , & Filippi, R. (2022). Effect of infant bilingualism on audiovisual integration in a McGurk task. Journal of Experimental Child Psychology, 217, 105351. 10.1016/j.jecp.2021.105351 [DOI] [PubMed] [Google Scholar]
- Miller, L. M. (2005). Perceptual fusion and stimulus coincidence in the cross‐modal integration of speech. The Journal of Neuroscience, 25, 5884–5893. 10.1523/JNEUROSCI.0896-05.2005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Müller, V. I. , Cieslik, E. C. , Laird, A. R. , Fox, P. T. , Radua, J. , Mataix‐Cols, D. , Tench, C. R. , Yarkoni, T. , Nichols, T. E. , Turkeltaub, P. E. , Wager, T. D. , & Eickhoff, S. B. (2018). Ten simple rules for neuroimaging meta‐analysis. Neuroscience and Biobehavioral Reviews, 84, 151–161. 10.1016/j.neubiorev.2017.11.012 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Murray, M. A. (2017). Elementary Egyptian grammar. Bernard Quaritch. [Google Scholar]
- Naghavi, H. R. , Eriksson, J. , Larsson, A. , & Nyberg, L. (2007). The claustrum/insula region integrates conceptually related sounds and pictures. Neuroscience Letters, 422, 77–80. 10.1016/j.neulet.2007.06.009 [DOI] [PubMed] [Google Scholar]
- Nath, A. R. , & Beauchamp, M. S. (2011). Dynamic changes in superior temporal sulcus connectivity during perception of noisy audiovisual speech. The Journal of Neuroscience, 31, 1704–1714. 10.1523/JNEUROSCI.4853-10.2011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nelson, J. R. , Liu, Y. , Fiez, J. , & Perfetti, C. A. (2009). Assimilation and accommodation patterns in ventral occipitotemporal cortex in learning a second writing system. Human Brain Mapping, 30, 810–820. 10.1002/hbm.20551 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nichols, E. S. , Gao, Y. , Fregni, S. , Liu, L. , & Joanisse, M. F. (2021). Representational dissimilarity of first and second language in the bilingual brain. Human Brain Mapping, 42, 5433–5445. 10.1002/hbm.25633 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nichols, E. S. , & Joanisse, M. F. (2016). Functional activity and white matter microstructure reveal the independent effects of age of acquisition and proficiency on second‐language learning. NeuroImage, 143, 15–25. 10.1016/j.neuroimage.2016.08.053 [DOI] [PubMed] [Google Scholar]
- Noesselt, T. , Rieger, J. W. , Schoenfeld, M. A. , Kanowski, M. , Hinrichs, H. , Heinze, H.‐ J. , Driver, J . (2007). Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience, 27(42), 11431–11441. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Noesselt, T. , Tyll, S. , Boehler, C. N. , Budinger, E. , Heinze, H.‐. J. , & Driver, J. (2010). Sound ‐ induced enhancement of low ‐ intensity vision: Multisensory influences on human sensory ‐ specific cortices and thalamic bodies relate to perceptual enhancement of visual detection sensitivity. Journal of Neuroscience, 30(41), 13609–13623. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Noesselt, T. , Bergmann, D. , Heinze, H.‐J. , Münte, T. , & Spence, C. (2012). Coding of multisensory temporal patterns in human superior temporal sulcus. Frontiers in Integrative Neuroscience, 6, 64. 10.3389/fnint.2012.00064 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Okada, K. , & Hickok, G. (2006). Identification of lexical–phonological networks in the superior temporal sulcus using functional magnetic resonance imaging. Neuroreport, 17, 1293–1296. 10.1097/01.wnr.0000233091.82536.b2 [DOI] [PubMed] [Google Scholar]
- Ojanen, V. , Möttönen, R. , Pekkola, J. , Jääskeläinen, I. P. , Joensuu, R. , Autti, T. , & Sams, M. (2005). Processing of audiovisual speech in Broca's area. NeuroImage, 25(2), 333–338. 10.1016/j.neuroimage.2004.12.001 [DOI] [PubMed] [Google Scholar]
- Olivetti Belardinelli, M. , Sestieri, C. , Di Matteo, R. , Delogu, F. , Del Gratta, C. , Ferretti, A. , Caulo, M. , Tartaro, A. , & Romani, G. L. (2004). Audio ‐ visual crossmodal interactions in environmental perception: An fMRI investigation. Cognitive Processing, 5(3), 167–174. [DOI] [Google Scholar]
- Pekkola, J. , Laasonen, M. , Ojanen, V. , Autti, T. , Jääskeläinen, I. P. , Kujala, T. , & Sams, M. (2006). Perception of matching and conflicting audiovisual speech in dyslexic and fluent readers: An fMRI study at 3 T. NeuroImage, 29, 797–807. 10.1016/j.neuroimage.2005.09.069 [DOI] [PubMed] [Google Scholar]
- Perfetti, C. A. , Liu, Y. , Fiez, J. , Nelson, J. , Bolger, D. J. , & Tan, L.‐H. (2007). Reading in two writing systems: Accommodation and assimilation of the brain's reading network. Bilingualism: Language and Cognition, 10, 131–146. 10.1017/S1366728907002891 [DOI] [Google Scholar]
- Piaget, J. , & Mays, W. (1972). The principles of genetic epistemology: Selected works (Vol. 7). Routledge & Kegan Paul Ltd. [Google Scholar]
- Plank, T. , Rosengarth, K. , Song, W. , Ellermeier, W. , & Greenlee, M. W. (2012). Neural correlates of audio ‐ visual object recognition: Effects of implicit spatial congruency. Human Brain Mapping, 33(4), 797–811. 10.1002/hbm.21254 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Porada, D. K. , Regenbogen, C. , Freiherr, J. , Seubert, J. , & Lundström, J. N. (2021). Trimodal processing of complex stimuli in inferior parietal cortex is modality ‐ independent. Cortex, 139, 198–210. 10.1016/j.cortex.2021.03.008 [DOI] [PubMed] [Google Scholar]
- Preston, J. L. , Molfese, P. J. , Frost, S. J. , Mencl, W. E. , Fulbright, R. K. , Hoeft, F. , Landi, N. , Shankweiler, D. , & Pugh, K. R. (2016). Print‐speech convergence predicts future reading outcomes in early readers. Psychological Science, 27, 75–84. 10.1177/0956797615611921 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Price, C. J. (2012). A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. NeuroImage, 62, 816–847. 10.1016/j.neuroimage.2012.04.062 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Raij, T. , Uutela, K. , & Hari, R. (2000). Audiovisual integration of letters in the human brain. Neuron, 28, 617–625. 10.1016/S0896-6273(00)00138-0 [DOI] [PubMed] [Google Scholar]
- Rueckl, J. G. , Paz‐Alonso, P. M. , Molfese, P. J. , Kuo, W.‐J. , Bick, A. , Frost, S. J. , Hancock, R. , Wu, D. H. , Mencl, W. E. , Dunabeitia, J. A. , Lee, J.‐R. , Oliver, M. , Zevin, J. D. , Hoeft, F. , Carreiras, M. , Tzeng, O. J. L. , Pugh, K. R. , & Frost, R. (2015). Universal brain signature of proficient reading: Evidence from four contrasting languages. Proceedings of the National Academy of Sciences of the United States of America, 112, 15510–15515. 10.1073/pnas.1509321112 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schönwiesner, M. , Novitski, N. , Pakarinen, S. , Carlson, S. , Tervaniemi, M. , & Näätänen, R. (2007). Heschl's gyrus, posterior superior temporal gyrus, and mid‐ventrolateral prefrontal cortex have different roles in the detection of acoustic changes. Journal of Neurophysiology, 97, 2075–2082. 10.1152/jn.01083.2006 [DOI] [PubMed] [Google Scholar]
- Sestieri, C. , Di Matteo, R. , Ferretti, A. , Del Gratta, C. , Caulo, M. , Tartaro, A. , Olivetti Belardinelli, M. , & Romani, G. L. (2006). “What” versus “where” in the audiovisual domain: An fMRI study. NeuroImage, 33, 672–680. 10.1016/j.neuroimage.2006.06.045 [DOI] [PubMed] [Google Scholar]
- Shinozaki, J. , Hiroe, N. , Sato, M. , Nagamine, T. , & Sekiyama, K. (2016). Impact of language on functional connectivity for audiovisual speech integration. Scientific Reports, 6, 31388. 10.1038/srep31388 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skipper, J. I. , van Wassenhove, V. , Nusbaum, H. C. , & Small, S. L. (2007). Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17(10), 2387–2399. 10.1093/cercor/bhl147 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szycik, G. R. , Jansma, H. , & Münte, T. F. (2009). Audiovisual integration during speech comprehension: An fMRI study comparing ROI ‐ based and whole brain analyses. Human Brain Mapping, 30(7), 1990–1999. 10.1002/hbm.20640 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szycik, G. R. , Münte, T. F. , Dillo, W. , Mohammadi, B. , Samii, A. , Emrich, H. M. , & Dietrich, D. E. (2009). Audiovisual integration of speech is disturbed in schizophrenia: An fMRI study. Schizophrenia Research, 110(1–3), 111–118. 10.1016/j.schres.2009.03.003 [DOI] [PubMed] [Google Scholar]
- Szycik, G. R. , Tausche, P. , & Münte, T. F. (2008). A novel approach to study audiovisual integration in speech perception: Localizer fMRI and sparse sampling. Brain Research, 1220, 142–149. 10.1016/j.brainres.2007.08.027 [DOI] [PubMed] [Google Scholar]
- Starke, J. , Ball, F. , Heinze, H. , & Noesselt, T. (2020). The spatio‐temporal profile of multisensory integration. The European Journal of Neuroscience, 51, 1210–1223. 10.1111/ejn.13753 [DOI] [PubMed] [Google Scholar]
- Stevenson, R. A. , & James, T. W. (2009). Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage, 44, 1210–1223. 10.1016/j.neuroimage.2008.09.034 [DOI] [PubMed] [Google Scholar]
- Stevenson, R. A. , VanDerKlok, R. M. , Pisoni, D. B. , & James, T. W. (2011). Discrete neural substrates underlie complementary audiovisual speech integration processes. NeuroImage, 55, 1339–1345. 10.1016/j.neuroimage.2010.12.063 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szycik, G. R. , Stadler, J. , Tempelmann, C. , & Muente, T. F. (2012). Examining the McGurk illusion using high‐field 7 Tesla functional MRI. Frontiers in Human Neuroscience, 6, 95. 10.3389/fnhum.2012.00095 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tan, L. , Spinks, J. , Feng, C. , Siok, W. , Perfetti, C. , Xiong, J. , Fox, P. , & Gao, J. (2003). Neural systems of second language reading are shaped by native language. Human Brain Mapping, 18, 158–166. 10.1002/hbm.10089 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tietze, F.‐A. , Hundertmark, L. , Roy, M. , Zerr, M. , Sinke, C. , Wiswede, D. , Walter, M. , Münte, T. F. , & Szycik, G. R . (2019). Auditory deficits in audiovisual speech perception in adult Asperger's syndrome: fMRI study. Frontiers in Psychology, 10, 2286. 10.3389/fpsyg.2019.02286 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Treille, A. , Vilain, C. , Hueber, T. , Lamalle, L. , & Sato, M. (2017). Inside speech: Multisensory and modality ‐ specific processing of tongue and lip speech actions. Journal of Cognitive Neuroscience, 29(3), 448–466. 10.1162/jocn_a_01057 [DOI] [PubMed] [Google Scholar]
- Turkeltaub, P. E. , Eickhoff, S. B. , Laird, A. R. , Fox, M. , Wiener, M. , & Fox, P. (2012). Minimizing within‐experiment and within‐group effects in activation likelihood estimation meta‐analyses. Human Brain Mapping, 33, 1–13. 10.1002/hbm.21186 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ujiie, Y. , Yamashita, W. , Fujisaki, W. , Kanazawa, S. , & Yamaguchi, M. K. (2018). Crossmodal association of auditory and visual material properties in infants. Scientific Reports, 8, 9301. 10.1038/s41598-018-27153-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- van Atteveldt, N. , Formisano, E. , Goebel, R. , & Blomert, L. (2004). Integration of letters and speech sounds in the human brain. Neuron, 43, 271–282. 10.1016/j.neuron.2004.06.025 [DOI] [PubMed] [Google Scholar]
- van Atteveldt, N. M. , Blau, V. C. , Blomert, L. , & Goebel, R. (2010). fMR‐adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex. BMC Neuroscience, 11, 11. 10.1186/1471-2202-11-11 [DOI] [PMC free article] [PubMed] [Google Scholar]
- van Atteveldt, N. M. , Formisano, E. , Blomert, L. , & Goebel, R. (2006). The effect of temporal asynchrony on the multisensory integration of letters and speech sounds. Cerebral Cortex, 17, 962–974. 10.1093/cercor/bhl007 [DOI] [PubMed] [Google Scholar]
- van Atteveldt, N. M. , Formisano, E. , Goebel, R. , & Blomert, L. (2007). Top–down task effects overrule automatic multisensory responses to letter–sound pairs in auditory association cortex. NeuroImage, 36, 1345–1360. 10.1016/j.neuroimage.2007.03.065 [DOI] [PubMed] [Google Scholar]
- van der Linden, M. , van Turennout, M. , & Fernandez, G. (2011). Category training induces cross ‐ modal object representations in the adult human brain. Journal of Cognitive Neuroscience, 23(6), 1315–1331. 10.1162/jocn.2010.21522 [DOI] [PubMed] [Google Scholar]
- Venezia, J. H. , Vaden, K. I., Jr. , Rong, F. , Maddox, D. , Saberi, K. , & Hickok, G. (2017). Auditory, visual and audiovisual speech processing streams in superior temporal sulcus. Frontiers in Human Neuroscience, 11. 10.3389/fnhum.2017.00174 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wagner, A. D. , Paré‐Blagoev, E. J. , Clark, J. , & Poldrack, R. A. (2001). Recovering meaning. Neuron, 31, 329–338. 10.1016/S0896-6273(01)00359-2 [DOI] [PubMed] [Google Scholar]
- Werner, S. , & Noppeney, U. (2011). The contributions of transient and sustained response codes to audiovisual integration. Cerebral Cortex, 21, 920–931. 10.1093/cercor/bhq161 [DOI] [PubMed] [Google Scholar]
- Wiersinga‐ Post, E. , Tomaskovic, S. , Slabu, L. , Renken, R. , de Smit, F. , & Duifhuis, H. (2010). Decreased BOLD responses in audiovisual processing. NeuroReport, 21(18), 1146–1151. 10.1097/WNR.0b013e328340cc47 [DOI] [PubMed] [Google Scholar]
- Ye, Z. , Rüsseler, J. , Gerth, I. , & Münte, T. F. (2017). Audiovisual speech integration in the superior temporal region is dysfunctional in dyslexia. Neuroscience, 356, 1–10. 10.1016/j.neuroscience.2017.05.017 [DOI] [PubMed] [Google Scholar]
- Yi, H. G. , Leonard, M. K. , & Chang, E. F. (2019). The encoding of speech sounds in the superior temporal gyrus. Neuron, 102, 1096–1110. 10.1016/j.neuron.2019.04.023 [DOI] [PMC free article] [PubMed] [Google Scholar]