Abstract
Visual inhibition of return (IOR) is a mechanism for preventing attention from returning to previously examined spatial locations. Previous studies have found that auditory stimuli presented simultaneously with a visual target can reduce or even eliminate the visual IOR. However, the mechanism responsible for decreased visual IOR accompanied by auditory stimuli is unclear. Using functional magnetic resonance imaging, we aimed to investigate how auditory stimuli reduce visual IOR. Behaviorally, we found that the visual IOR accompanying auditory stimuli was significant but smaller than the visual IOR. Neurally, only in the validly cued trials did the superior temporal gyrus show increased neural coupling with the intraparietal sulcus, presupplementary motor area, and some other areas in audiovisual conditions compared with visual conditions. These results suggest that the reduction in visual IOR by the simultaneous auditory stimuli may be due to a dual mechanism: rescuing the suppressed visual salience and facilitating response initiation. Our results support the view that crossmodal interactions can occur across multiple neural levels and cognitive processing stages. This study provides a new perspective for understanding attention‐orienting networks and response initiation based on crossmodal information.
Keywords: crossmodal interaction, inhibition of return, intraparietal sulcus, presupplementary motor area, response initiation, saliency map, superior temporal gyrus
The visual inhibition of return (IOR) accompanying auditory stimuli was significant but smaller than the unimodal visual IOR. In the validly cued trials, the superior temporal gyrus showed increased functional coupling with the intraparietal sulcus, presupplementary motor area, and other areas in audiovisual conditions compared with visual conditions.

1. INTRODUCTION
At every moment, we are surrounded by a wealth of information from different sensory modalities. Attention helps to select some information and filter out others. Inhibition of return (IOR) is a mechanism for preventing attention from returning to previously examined spatial locations, which can improve the efficiency of sampling information (Klein, 1988, 2000; Posner et al., 1985). Neural imaging studies have shown that IOR is correlated with the posterior parietal cortex (PPC), frontal eye field (FEF), supplementary motor area (SMA), and other areas (Chi et al., 2014; Mayer et al., 2004; Zhou & Chen, 2008).
Researchers found that auditory stimuli presented simultaneously with the visual target reduced or even eliminated the visual IOR (Tang et al., 2019, 2021; Van der Stoep et al., 2017). Tang et al. (2019) proposed that the combination of sound with a visual target increased perceptual salience (Van der Burg et al., 2008, 2011) and counteracted IOR (decreased salience; Itti & Koch, 2001; Prime & Ward, 2006). Moreover, several studies have suggested that IOR is caused by inhibiting response components (Ivanoff & Klein, 2001; Pastötter et al., 2008; Taylor & Klein, 1998). Specifically, a response bias that participants are reluctant to respond to a target at the cued location causes IOR (Ivanoff & Klein, 2006). Simultaneous auditory stimuli could affect the response component of the visual target (Chen & Zhou, 2013; Makovac et al., 2015). Participants tend to execute responses to visual targets accompanied by auditory stimuli (Li et al., 2015; Sun et al., 2022). In other words, combining auditory stimuli and visual targets can facilitate response initiation. Could this effect contribute to reducing visual IOR? It is unclear whether the decrease in visual IOR accompanied by auditory stimuli is due to auditory stimuli increasing the salience of the visual target, facilitating the response initiation to the visual target, or both.
On the one hand, changes in perceptual saliency can be represented as a saliency map. The saliency map is a topographic map that depicts how exogenous attention is allocated based on bottom‐up saliency in the visual scene (Itti & Koch, 2000, 2001). Evidence from numerous neurophysiological and imaging studies has shown that the intraparietal sulcus (IPS) of the PPC (Bisley & Goldberg, 2010; Wang et al., 2022), FEF of the prefrontal cortex (Bogler et al., 2011; Thompson & Bichot, 2005), some visual cortical areas (Burrows & Moore, 2009; Zhang et al., 2012), and the superior colliculus (SC) of subcortical areas (Veale et al., 2017; White et al., 2017) could realize the saliency map. Moreover, some imaging studies on IOR found that activation of the PPC (Mayer et al., 2004), FEF (Zhou & Chen, 2008), and visual cortex (Müller & Kleinschmidt, 2007) was correlated with maintaining inhibitory bias against returning attention to previously attended locations. If the effect of auditory stimuli on visual salience is responsible for the reduction in visual IOR, then the auditory cortex (such as the superior temporal gyrus [STG]) should interact with brain regions involved in forming visual saliency maps.
On the other hand, the pre‐SMA is central for initiating and inhibiting responses (Hoshi & Tanji, 2004; Mostofsky & Simmonds, 2008; Nachev et al., 2008). Wolpe et al. (2022) suggested that the dynamic adjustment of pre‐SMA activation played a role in controlling response inhibition and response initiation. Specifically, stronger activation of the pre‐SMA will lead to faster response initiation (Mansfield et al., 2011; Tosun et al., 2017). Some imaging studies on IOR found that activation of the pre‐SMA was correlated with maintaining response bias (Mayer et al., 2004; Müller & Kleinschmidt, 2007; Yang & Mayer, 2014). If the effect of auditory stimuli on response initiation is responsible for reducing visual IOR, then we predict that the STG will interact with the pre‐SMA.
Using functional magnetic resonance imaging (fMRI) and a multimodal exogenous cue‐target paradigm in this study, we mainly investigated how auditory stimuli affect the visual IOR. The study manipulated cue validity (cued and uncued) and target type (visual; audiovisual, that is, visual stimuli accompanied by auditory stimuli; and auditory). We compared the audiovisual (AV) and visual (V) trials, that is, the “AV > V” contrast, and used the mask “A > V” to restrict activation in the audiovisual condition to areas where auditory information is processed. Furthermore, we conjoined the “AV > V” contrast across the cued and uncued conditions, thus creating a region independent of cue validity. The processing of auditory information activated the STG (Alho et al., 2014; Lattner et al., 2005). Then, we performed a psychophysiological interaction (PPI) analysis with neural activity in the STG as the physiological factor and the “AV > V” contrast as the psychological factor. By examining the functional coupling of the STG under different cue validities, the current study aims to show how auditory stimuli reduce the visual IOR.
2. MATERIALS AND METHODS
2.1. Participants
We calculated the appropriate sample size based on the G*Power toolbox (Faul et al., 2007). To achieve the recommended 80% statistical power at α = .05 and a medium effect size of 0.25 (Cohen, 1988), the appropriate sample size was defined as at least 19 participants. Twenty‐four participants (age range 19–26; mean age 23 years; 8 males) took part in the fMRI experiment. All participants were right handed, with normal or corrected‐to‐normal vision and hearing and no neurological or psychiatric illness history. They all signed informed consent before participating in the study and were rewarded for participation. This experiment was approved by the Ethics Committee of Liaoning Normal University and conducted in line with the Declaration of Helsinki. One participant was excluded due to excessive movement within the scanner (larger than 3 mm, final sample N = 23).
2.2. Stimuli
E‐prime software (1.0 version, Neurobehavioral Systems, Inc.) was used to present the stimuli and to record the responses. An LCD projector was used to present visual stimuli, which were projected onto a screen behind the participants' heads and were viewed by participants via an angled mirror on the head coil of the MRI setup. MR‐compatible stereo headphones delivered binaural auditory stimuli. The headphone volume was adjusted for each participant to ensure they could hear the auditory stimuli clearly despite the background scanner noise. Visual stimuli were presented on a black (0.4 cd/m²) background display. The fixation stimulus consisted of a white (155.2 cd/m²) fixation cross (0.05° × 0.05° of visual angle) flanked by two white (155.2 cd/m²) square outline boxes (7.7° × 7.7°, horizontal distance: 14.3°, vertical distance: 3.9°). Starting from the fixation stimulus, the outline of one box at the bottom left or right of the screen, chosen at random, was thickened to serve as a peripheral cue, whereas the fixation cross was enlarged to 0.1° × 0.1° of visual angle to serve as a central cue that summoned attention back to the central location and hastened the appearance of IOR (Visser Ta & Lm, 2006). There were three types of target stimuli: visual (V), audiovisual (AV), and auditory (A) targets. The V target (duration of 100 ms) consisted of checkerboards with two black dots. The A target (produced by SoundEngine Free; duration of 100 ms) was a 1600 Hz sinusoidal tone with linear rise and fall times of 10 ms and an intensity of 65 dB, presented through the headphones. The AV target consisted of the simultaneous presentation of the visual and auditory stimuli (see Figure 1b).
FIGURE 1.

Experimental procedure and target type. (a) Schematic of the stimulus procedure. The example shows a cued trial, in which the target appeared on the same side as the cue. A trial was uncued when the target appeared in the opposite square outline box. ITI, intertrial interval (ms); ISI, interstimulus interval (ms). (b) Target types (visual, auditory, audiovisual, and catch).
2.3. Experimental procedure and task
At the beginning of each trial, the fixation stimulus was presented for 750 ms in the center of the monitor (see Figure 1a). Following the presentation of the fixation stimulus, a spatially uninformative peripheral cue (left or right) was presented for 75 ms. The fixation stimulus then reappeared for a random interval of 225–375 ms, after which the central cue was presented for 75 ms. Before the target occurrence, the fixation stimulus appeared again for a random interval of 225–375 ms. Thus, the SOA between the peripheral cue and the target varied randomly from 600 to 900 ms. The varied SOA was set to prevent the temporal expectation of the target's appearance. The target stimulus appeared at one of the two locations (left or right), either the same location as the cue or the opposite one. Therefore, there were two cue validities: cued (i.e., same lateral location) and uncued (i.e., opposite lateral location). Participants were instructed to press a button (“1” on the response pad) as quickly and accurately as possible with the index finger of their dominant hand whenever a sound stimulus, a visual stimulus, or an audiovisual stimulus was presented (86%) and to withhold their response when the target stimulus was not presented (14%). Finally, the intertrial interval was randomly set to 1000–1100 ms to allow the participants to make corresponding responses.
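The SOA range stated above follows directly from the trial timeline, and a short sketch can make the arithmetic explicit. Only the durations come from the text; the function names and the uniform jitter distribution are illustrative assumptions, not the authors' experiment script.

```python
import random

# Durations (ms) taken from the procedure description.
PERIPHERAL_CUE_MS = 75
CENTRAL_CUE_MS = 75
FIXATION_RANGE_MS = (225, 375)  # each of the two jittered fixation intervals


def soa_bounds():
    """Return the (min, max) cue-target SOA implied by the trial timeline."""
    fixed = PERIPHERAL_CUE_MS + CENTRAL_CUE_MS
    shortest = fixed + 2 * FIXATION_RANGE_MS[0]
    longest = fixed + 2 * FIXATION_RANGE_MS[1]
    return shortest, longest


def sample_soa(rng):
    """Draw one SOA; uniform jitter of both intervals is an assumption."""
    first_gap = rng.uniform(*FIXATION_RANGE_MS)
    second_gap = rng.uniform(*FIXATION_RANGE_MS)
    return PERIPHERAL_CUE_MS + first_gap + CENTRAL_CUE_MS + second_gap


print(soa_bounds())  # (600, 900)
```

The SOA is measured from peripheral-cue onset to target onset, so it includes both cue durations and both jittered fixation intervals.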
A 2 (cue validity: cued and uncued) by 3 (target type: V, AV, and A) event‐related parametric design was adopted in the present fMRI experiment. Each participant completed two experimental runs for a total of 800 trials, including 240 cued trials (80 auditory, 80 visual, and 80 audiovisual trials), 240 uncued trials (80 auditory, 80 visual, and 80 audiovisual trials), 80 catch trials, and 240 null trials. The null trials, in which only the fixation stimulus was presented, were used as the implicit baseline and improved the signal‐to‐noise ratio. In the catch trials, cue stimuli appeared, but target stimuli did not appear (see Figure 1b). Every participant performed two practice blocks outside the scanner prior to the experiment to become familiar with the stimuli and the task. Each experimental run lasted ~16 min; the participants were asked to rest for a short period after the first run was completed. During the short rest period, the scanner stopped running.
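As a quick consistency check on the design, the trial counts can be tallied in a few lines. The sketch below assumes that the 86% and 14% response proportions given in the procedure are computed over trials on which a cue appeared (target plus catch trials, excluding null trials); that reading is an assumption, and the names are illustrative.

```python
# Trial counts per design cell, as reported in the text.
TARGET_TRIALS = {
    ("cued", "A"): 80, ("cued", "V"): 80, ("cued", "AV"): 80,
    ("uncued", "A"): 80, ("uncued", "V"): 80, ("uncued", "AV"): 80,
}
CATCH_TRIALS = 80
NULL_TRIALS = 240


def design_summary():
    """Tally totals and go/catch proportions for the 2 x 3 design."""
    n_target = sum(TARGET_TRIALS.values())
    n_with_cue = n_target + CATCH_TRIALS  # assumed denominator for 86% / 14%
    return {
        "target_trials": n_target,
        "total_trials": n_with_cue + NULL_TRIALS,
        "go_fraction": n_target / n_with_cue,
        "catch_fraction": CATCH_TRIALS / n_with_cue,
    }


summary = design_summary()
print(summary["total_trials"])           # 800
print(round(summary["go_fraction"], 3))  # 0.857
```

Under this reading, 480 of 560 cue-bearing trials required a response, which rounds to the reported 86%.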
2.4. Statistical analysis of behavior
Trials with response times (RTs) shorter than 100 ms (anticipations) or longer than 1000 ms (misses) were removed (Van der Stoep et al., 2015), which excluded 0.13% of the data. Incorrect responses were also removed (3.8% of the data); only correct responses were used in the RT analysis. The RT data were compared using a 2 (cue validity: cued, uncued) × 3 (target type: A, V, and AV) repeated‐measures ANOVA. The Greenhouse–Geisser epsilon correction was used to correct for nonsphericity. The Bonferroni correction was applied to the post hoc comparisons. One‐sample t‐tests were used to test whether there was an IOR effect (RT in the cued condition minus RT in the uncued condition) for each target modality. Furthermore, we used paired t‐tests (two‐tailed) to compare the IOR effect between target modalities. The effect size was calculated as Cohen's d for mean comparisons and as partial eta‐squared (ηp²) for the repeated‐measures ANOVA. For all tests, p‐values are reported. Additionally, we used JASP (Version 0.16.4, JASP Team 2022) to calculate and report Bayes factors (BF10), which assess the strength of the evidence for H1 relative to H0 (Wagenmakers et al., 2017). A BF10 above 3 indicates substantial evidence for H1, a BF10 below 1/3 indicates substantial evidence for H0, and values between these bounds indicate that the data are insensitive (Dienes, 2014).
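The trimming rule, the IOR definition, and the Bayes factor thresholds described above can be sketched as follows. The function names and the example RTs are hypothetical; only the 100 ms and 1000 ms bounds, the cued-minus-uncued definition, and the 3 and 1/3 thresholds come from the text.

```python
from statistics import mean

RT_MIN_MS, RT_MAX_MS = 100, 1000  # trimming bounds from the text


def trim_rts(rts):
    """Drop anticipations (< 100 ms) and misses (> 1000 ms)."""
    return [rt for rt in rts if RT_MIN_MS <= rt <= RT_MAX_MS]


def ior_effect(cued_rts, uncued_rts):
    """IOR effect: mean cued RT minus mean uncued RT (positive = inhibition)."""
    return mean(trim_rts(cued_rts)) - mean(trim_rts(uncued_rts))


def interpret_bf10(bf10):
    """Classify a Bayes factor using the 3 and 1/3 thresholds."""
    if bf10 > 3:
        return "evidence for H1"
    if bf10 < 1 / 3:
        return "evidence for H0"
    return "insensitive"


# Hypothetical single-participant RTs (ms); 90 and 1200 are trimmed.
cued = [450, 430, 90, 460]
uncued = [420, 410, 1200, 430]
print(round(ior_effect(cued, uncued), 1))  # 26.7
```

A per-participant IOR value computed this way would then enter the one-sample and paired t-tests at the group level.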
2.5. Imaging data acquisition and preprocessing
A 3 T Siemens Trio system with a standard head coil (Erlangen, Germany) was used to obtain T2*‐weighted echo‐planar images (EPIs) with blood oxygenation level‐dependent (BOLD) contrast (matrix size: 64 × 64, voxel size: 3.0 × 3.0 × 3.0 mm3). Thirty‐seven transverse slices of 3 mm thickness that covered the whole brain were acquired sequentially with a 0.3 mm gap (TR = 2.2 s, echo time = 30 ms, field of view = 192 mm, flip angle = 90°).
The scanning protocol closely followed that used in the study by Sun et al. (2022), so for consistency, we have reused the text describing these procedures in this article, with modifications as appropriate. Data were preprocessed with Statistical Parametric Mapping software SPM12 (Wellcome Department of Imaging Neuroscience, London; http://www.fil.ion.ucl.ac.uk). Images were realigned to the first image to correct for head motion during the scan. Then, the mean EPI image of each subject was computed and spatially normalized to the Montreal Neurological Institute (MNI) single‐subject template using the “unified segmentation” function in SPM12. Before performing the statistical analysis, data were smoothed with an 8 mm full‐width half‐maximum Gaussian kernel to decrease spatial noise.
2.6. Statistical analysis of imaging data
The time series of all voxels were high‐pass filtered to 1/128 Hz and then analyzed with a general linear model (GLM) using SPM12. At the first level, the GLM was used to construct a multiple regression design matrix. Cued auditory trials, uncued auditory trials, cued visual trials, uncued visual trials, cued audiovisual trials, and uncued audiovisual trials were modeled as regressors of interest. In addition, the behavioral error trials, catch trials, and null trials were modeled as regressors of no interest. The six head movement parameters derived from realignment were included as nuisance covariates to account for residual motion artifacts. All the trials were time locked to the onset of the target stimuli by a canonical synthetic hemodynamic response function and its first‐order time derivative with an event duration of 0 s. Parameter estimates were subsequently calculated for each voxel using weighted least‐squares analysis to provide maximum likelihood estimators based on the temporal autocorrelation of the data. No global scaling was applied.
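To illustrate what modeling zero-duration events with a canonical hemodynamic response function means, the sketch below builds one regressor by summing shifted copies of a double-gamma function and sampling it once per TR. The gamma shapes (6 and 16) and the 1/6 undershoot ratio are commonly used defaults assumed here, not values reported in this study, and the sketch omits the time derivative, the high-pass filter, and the microtime resolution that SPM12 uses.

```python
import math

TR_S = 2.2  # repetition time from the acquisition parameters


def double_gamma_hrf(t):
    """Double-gamma HRF at time t (s): response minus a scaled undershoot.

    Shapes 6 and 16 and the 1/6 ratio are assumed defaults.
    """
    if t <= 0:
        return 0.0
    response = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return response - undershoot / 6.0


def event_regressor(onsets_s, n_scans):
    """Convolve zero-duration events with the HRF, sampled at each scan."""
    regressor = []
    for scan in range(n_scans):
        t_scan = scan * TR_S
        regressor.append(sum(double_gamma_hrf(t_scan - onset) for onset in onsets_s))
    return regressor


# Hypothetical target onsets (s) for one condition in a short run.
print([round(x, 3) for x in event_regressor([0.0, 11.0], n_scans=8)])
```

Because the events have a duration of 0 s, the convolution reduces to a sum of HRFs shifted to each target onset.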
For each participant, simple main effects of each of the six events of interest were computed by assigning 1 to the regressor of interest and 0 to all the other regressors, namely, the experimental trials versus the baseline mean. Then, employing a random‐effects model (flexible factorial design in SPM12 including an additional factor modeling the participant means), the six first‐level individual contrast images were input into a within‐participants ANOVA at the group level. In the model of variance components, we modeled nonindependence across parameter estimates from the same participants. We used the standard implementation in SPM12 to allow unequal variances both between conditions and participants, thereby allowing violations of sphericity. To investigate how auditory stimuli reduced visual IOR, we were particularly interested in the differential neural activity between target modalities under cued validity (Cued_Audiovisual vs. Cued_Visual) and under uncued validity (Uncued_Audiovisual vs. Uncued_Visual). To ensure that the activation of contrast “AV > V” reflects the processing of auditory information, the contrast “AV > V” was inclusively masked by the contrast “A > V” at a liberal threshold of p < .05, uncorrected at the voxel level. Areas of activation were identified as significant only if they passed a conservative threshold of p < .001 after the parametric family‐wise error (FWE) correction for multiple comparisons at the cluster level, with an underlying voxel level of an uncorrected p < .001 (Poline et al., 1997).
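The contrast weights and the inclusive mask described above can be written out explicitly. This is a minimal sketch: the ordering of the six regressors, the helper names, and the example numbers are assumptions for illustration, and real masking operates on whole statistical images rather than short lists.

```python
# Assumed ordering of the six regressors of interest.
CONDITIONS = ["Cued_A", "Uncued_A", "Cued_V", "Uncued_V", "Cued_AV", "Uncued_AV"]


def contrast(positive, negative=None):
    """Weights: +1 on `positive`, -1 on `negative` (if given), 0 elsewhere."""
    return [1 if c == positive else -1 if c == negative else 0 for c in CONDITIONS]


def contrast_value(weights, betas):
    """Linear combination of parameter estimates for one voxel."""
    return sum(w * b for w, b in zip(weights, betas))


def inclusive_mask(stat_values, mask_p_values, alpha=0.05):
    """Keep a voxel's statistic only where the masking contrast has p < alpha."""
    return [s if p < alpha else None for s, p in zip(stat_values, mask_p_values)]


simple_effect = contrast("Cued_AV")           # condition versus implicit baseline
cued_av_gt_v = contrast("Cued_AV", "Cued_V")  # "AV > V" under cued validity
print(simple_effect)  # [0, 0, 0, 0, 1, 0]
print(cued_av_gt_v)   # [0, 0, -1, 0, 1, 0]
```

The inclusive mask restricts the "AV > V" map to voxels that also respond more strongly to auditory than to visual targets, which is how the analysis confines activation to auditory processing areas.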
2.7. PPI analysis
To further investigate how auditory stimuli reduce visual IOR, PPI analysis was used to examine the context‐specific functional modulation of neural activity across the brain by the neural activity in the bilateral STG (derived from the conjunction of contrast “AV > V” in the cued condition and “AV > V” contrast in the uncued condition). PPI analyses can serve to establish condition‐dependent functional coupling (or “effective connectivity”) between brain regions (Gitelman et al., 2003).
We used the contrast “AV > V” as the psychological factor and the neural activity in the bilateral STG as the physiological factor (seeds). For each participant, the contrast “AV > V” from the cued condition and the contrast “AV > V” from the uncued condition were first calculated at the individual level. Subsequently, to assess neural activity in the contrasts listed above, each participant's individual peak voxel was determined as the maximally activated voxel within a sphere with an 8 mm radius around the coordinates of the peak voxel within the right STG (MNI: 66, −30, 6) and the left STG (MNI: −51, −27, 6) from the group‐level analysis. Individual peak voxels from every participant were located in the same anatomical structure. Next, the STG time series were extracted from a sphere with a 6 mm radius around the individual peak voxels. The PPI analysis at the individual (first) level employed three regressors. The first regressor represented the extracted time series in the STG (the physiological variable). The second regressor represented the psychological variable of interest, “AV > V,” and the last regressor represented the cross product of the previous two (the PPI interaction term). An SPM was calculated to reveal areas whose activation was predicted by the PPI interaction term. The physiological and psychological regressors were treated as confounding variables, that is, by assigning 1 to the PPI regressor and 0 to the physiological and psychological regressors, respectively. At the group level, a random‐effects analysis was adopted: the individual SPM corresponding to the PPI term in each participant was subsequently analyzed using a one‐sample t‐test. Areas of activation were identified as significant only if they passed a conservative threshold of p < .001 after the parametric FWE correction for multiple comparisons at the cluster level, with an underlying voxel level of an uncorrected p < .001.
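The three first-level PPI regressors can be illustrated with a simplified sketch. It forms the interaction term as the element-wise product of a mean-centered seed time series and a +1/−1 coding of "AV > V". This is a deliberate simplification: the SPM implementation additionally deconvolves the BOLD signal before forming the product (Gitelman et al., 2003), which is omitted here, and the function name and toy data are hypothetical.

```python
from statistics import mean

# Contrast over (physiological, psychological, PPI): test the PPI term only.
PPI_CONTRAST = [0, 0, 1]


def ppi_regressors(seed_ts, condition_labels):
    """Return (physiological, psychological, interaction) regressors.

    seed_ts: seed (STG) signal, one value per time point.
    condition_labels: "AV", "V", or anything else, one per time point.
    """
    seed_mean = mean(seed_ts)
    physio = [x - seed_mean for x in seed_ts]  # mean-centered seed signal
    coding = {"AV": 1.0, "V": -1.0}            # psychological factor "AV > V"
    psych = [coding.get(label, 0.0) for label in condition_labels]
    ppi = [p * q for p, q in zip(physio, psych)]  # cross product
    return physio, psych, ppi


# Toy data: four time points of a hypothetical seed signal.
physio, psych, ppi = ppi_regressors([1.0, 2.0, 3.0, 4.0], ["AV", "V", "other", "AV"])
print(ppi)  # [-1.5, 0.5, 0.0, 1.5]
```

Keeping the physiological and psychological regressors in the model while testing only the interaction term means that any coupling effect is estimated over and above the main effects of seed activity and task condition.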
To compare functional coupling strength under different validity conditions, we extracted mean parameter estimates of the PPI regressor from the activated areas. Furthermore, we submitted them to a paired t‐test (one‐tailed) to determine whether the functional coupling strength under the cued validity was significantly stronger than uncued validity.
3. RESULTS
3.1. Behavioral results
The mean RTs are shown in Table 1. The 2 (cue validity: cued, uncued) × 3 (target type: A, V, AV) repeated‐measures ANOVA on RTs revealed a significant main effect of cue validity (F[1, 22] = 23.489, p < .001, ηp² = 0.516, BF10 = 69.9). The results showed that the responses in the cued condition (402 ms) were slower than those in the uncued condition (390 ms), which suggested that IOR occurred.
TABLE 1.
Mean reaction times (ms) for each target type under the cued and uncued conditions (M ± SD).
| Cue type | Target type: V | Target type: AV | Target type: A |
|---|---|---|---|
| Cued | 447 ± 79 | 355 ± 72 | 403 ± 81 |
| Uncued | 421 ± 83 | 347 ± 74 | 402 ± 78 |
Abbreviations: A, auditory; AV, audiovisual; V, visual.
The main effect of the target type was significant (F[1.117, 24.573] = 40.245, p < .001, ηp² = 0.647, BF10 > 1000), which was driven by responses to AV targets (351 ms) being faster than responses to A targets (403 ms) and V targets (434 ms). Responses to V targets were not significantly slower than responses to A targets.
In addition, the interaction between the target type and cue validity was significant (F[1.331, 29.283] = 15.472, p < .001, ηp² = 0.413, BF10 > 1000). One‐sample t‐tests showed that the IOR effect was significant for both the V targets (25.2 ms, t[22] = 7.176, p < .001, Cohen's d = 1.528, BF10 > 1000) and the AV targets (8.5 ms, t[22] = 2.953, p = .007, Cohen's d = 0.616, BF10 = 6.36) but not significant for the A targets (1.1 ms, t[22] = 0.277, p = .784, Cohen's d = 0.058, BF10 = 0.226). A paired‐sample t‐test found that the IOR effect of the visual target was larger than that of the audiovisual target (t[22] = 6.486, p < .001, Cohen's d = 1.35, BF10 > 1000; see Figure 2).
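The partial eta-squared values reported for the three ANOVA effects can be recovered from the F statistics and their degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the reported values; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared from an F statistic and its degrees of freedom."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# (F, df_effect, df_error) as reported in the behavioral results.
effects = {
    "cue validity": (23.489, 1, 22),
    "target type": (40.245, 1.117, 24.573),
    "interaction": (15.472, 1.331, 29.283),
}
for name, (f_value, df1, df2) in effects.items():
    print(name, round(partial_eta_squared(f_value, df1, df2), 3))
# cue validity 0.516
# target type 0.647
# interaction 0.413
```

The recomputed values match the reported effect sizes, including those based on Greenhouse–Geisser corrected degrees of freedom.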
FIGURE 2.

Inhibition of return (IOR) effect in different conditions. The position of the square and circle represents the average IOR effect for audiovisual and visual modalities, respectively. Each point represents the IOR effect of the individual. The error bars represent the standard deviation of the mean. At the bottom is a box plot of the IOR effect for different conditions.
3.2. Imaging results
3.2.1. Common and specific neural correlates underlying the audiovisual/visual targets and cued/uncued condition
We identified brain regions specifically activated by visual stimuli accompanied by auditory stimuli under cued validity (activation was restricted to areas where auditory information was processed); the bilateral STG was activated by the contrast “CuedAV > CuedV” masked by “CuedA > CuedV” (see Figure 3a and Table 2a). Analogously, we detected bilateral STG activation in response to the contrast “UncuedAV > UncuedV,” which was masked by “UncuedA > UncuedV” (see Figure 3b and Table 2b). Furthermore, a conjunction analysis of the two contrasts “AV > V” under the cued and uncued conditions, masked by “Cued (A‐V) ∩ Uncued (A‐V),” resulted in significant bilateral STG activation (see Figure 3c and Table 2c).
FIGURE 3.

Specific activation of visual stimuli accompanied by auditory stimuli under different cue validity and conjunction activation. (a) Bilateral superior temporal gyrus (STG) was activated by the contrast “CuedAV > CuedV” masked by “CuedA > CuedV.” (b) Bilateral STG was activated by the contrast “UncuedAV > UncuedV” masked by “UncuedA > UncuedV.” (c) A conjunction analysis between the two contrasts “AV > V” under the cued and uncued conditions masked by “Cued (A‐V) ∩ Uncued (A‐V)” significantly activated the bilateral STG.
TABLE 2.
Results from the contrast “CuedAV > CuedV” masked by “CuedA > CuedV,” “UncuedAV > UncuedV” masked by “UncuedA > UncuedV” and the conjunction contrast between cued and uncued conditions masked by “Cued (A‐V) ∩ Uncued (A‐V).”
| Anatomical region | Side | Cluster peak (mm) | t‐score | K E (voxels) |
|---|---|---|---|---|
| (a) Cued condition | ||||
| CuedAV > CuedV masked by CuedA > CuedV | ||||
| Superior temporal gyrus | R | 66, −30, 6 | 11.53 | 565 |
| Superior temporal gyrus | R | 51, −3, 6 | 4.32 | |
| Superior temporal gyrus | R | 42, −18, −3 | 3.52 | |
| Superior temporal gyrus | L | −51, −27, 6 | 9.10 | 596 |
| Superior temporal gyrus | L | −60, −36, 9 | 8.98 | |
| Superior temporal gyrus | L | −45, −18, −3 | 5.21 | |
| (b) Uncued condition | ||||
| UncuedAV > UncuedV masked by UncuedA > UncuedV | ||||
| Superior temporal gyrus | R | 63, −27, 9 | 14.02 | 1310 |
| Superior temporal gyrus | R | 54, −24, 9 | 12.90 | |
| Superior temporal gyrus | R | 51, −9, −6 | 9.22 | |
| Superior temporal gyrus | L | −51, −27, 6 | 12.23 | 1322 |
| Superior temporal gyrus | L | −60, −30, 9 | 12.13 | |
| Middle temporal gyrus | L | −63, −48, 6 | 9.19 | |
| (c) Conjunction | ||||
| Cued (AV‐V) ∩ Uncued (AV‐V) masked by Cued (A‐V) ∩ Uncued (A‐V) | ||||
| Superior temporal gyrus | R | 66, −30, 6 | 11.53 | 554 |
| Superior temporal gyrus | R | 51, −3, −6 | 4.32 | |
| Superior temporal gyrus | L | −51, −27, 6 | 9.10 | 587 |
| Middle temporal gyrus | L | −60, −36, 9 | 8.98 | |
| Superior temporal gyrus | L | −45, −18, −3 | 5.21 |
Note: The coordinates (x, y, z) correspond to Montreal Neurological Institute coordinates. The coordinates of the maximally activated voxel within a significant cluster and the coordinates of relevant local maxima within the cluster (in italics with larger indents) are shown.
3.2.2. PPI analysis with the STG as the seed
PPI analysis was performed with the different target modalities (i.e., “AV vs. V”) as the psychological factor and neural activity in the right and left STG as the physiological factor. For cued validity, the right STG showed increased neural coupling with the cerebellum, pre‐SMA, cuneus, middle temporal gyrus (MTG), and middle frontal gyrus (MFG) in audiovisual conditions compared with visual conditions (see Table 3a and Figure 4a). No significant modulation of neural coupling was obtained in visual compared with audiovisual conditions. For uncued validity, however, no modulation of neural coupling was observed in either direction (audiovisual compared with visual conditions, or visual compared with audiovisual conditions). For cued validity, the left STG showed increased neural coupling with the vermis, thalamus, and IPS in audiovisual conditions compared with visual conditions (see Table 3b and Figure 4b). As with the right STG, we did not find significant neural coupling with the left STG in the uncued condition.
TABLE 3.
Results from the psychophysiological interaction analyses.
| Anatomical region | Side | Cluster peak (mm) | t‐score | K E (voxels) |
|---|---|---|---|---|
| (a) PPI analyses: CuedAV > CuedV | ||||
| Right STG as the seed | ||||
| Cerebellum_6 | R | 27, −63, −24 | 5.61 | 111 |
| Inferior occipital gyrus | R | 42, −72, −12 | 4.86 | |
| Middle temporal gyrus | R | −51, −60, −3 | 5.38 | 54 |
| Cuneus | L | 0, −90, 18 | 5.33 | 99 |
| Presupplementary motor area | L | −6, 9, 48 | 4.82 | 112 |
| Middle cingulate | R | 9, 15, 45 | 3.92 | |
| Middle frontal gyrus | R | 36, 54, 30 | 4.48 | 61 |
| PPI analyses: UncuedAV > UncuedV | ||||
| Right STG as the seed | ||||
| No significant neural coupling | ||||
| (b) PPI analyses: CuedAV > CuedV | ||||
| Left STG as the seed | ||||
| Vermis | R | 6, −63, −24 | 6.64 | 121 |
| Cerebellum_4_5 | R | 18, −51, −18 | 4.99 | |
| Thalamus | R | 6, −21, 6 | 5.50 | 50 |
| Intraparietal sulcus | L | −30, −60, 66 | 5.10 | 47 |
| PPI analyses: UncuedAV > UncuedV | ||||
| Left STG as the seed | ||||
| No significant neural couplings | ||||
Note: The coordinates (x, y, z) correspond to Montreal Neurological Institute coordinates. The coordinates of the maximally activated voxel within a significant cluster and the coordinates of relevant local maxima within the cluster (in italics with larger indents) are displayed.
FIGURE 4.

Results of the PPI analysis. (a) PPI analysis based on neural activity in the right STG, with the contrast “cuedAV > cuedV” as the psychological factor. For cued validity, the right STG showed significantly higher functional coupling with the right cerebellum, left presupplementary motor area, left cuneus, and right middle frontal gyrus in audiovisual conditions. (b) PPI analysis based on neural activity in the left STG, with the contrast “cuedAV > cuedV” as the psychological factor. For cued validity, in audiovisual conditions, the left STG showed significantly higher functional coupling with the right vermis, right thalamus, and left intraparietal sulcus. PPI, psychophysiological interaction; STG, superior temporal gyrus.
Paired t‐tests showed that the functional coupling strength between the STG and the other regions (except for the MTG, p = .237) was significantly higher in the cued condition than in the uncued condition (pre‐SMA: p = .01, cuneus: p = .002, MFG: p = .001, cerebellum: p = .035, IPS: p = .014, thalamus: p < .001, vermis: p = .009). This suggests that the heightened functional coupling observed in the audiovisual condition is significantly greater under cued validity than under uncued validity.
4. DISCUSSION
The current fMRI study aimed to investigate how auditory stimuli reduce visual IOR. At the behavioral level, we found that the visual IOR accompanying auditory stimuli was significant but smaller than the unimodal visual IOR (Figure 2). These findings were consistent with previous studies (Tang et al., 2019, 2021). At the neural level, we found that in both cued and uncued conditions, audiovisual stimuli evoked stronger neural activity in the bilateral STG than visual stimuli (Figure 3). Notably, we found only in the validly cued trials that the STG showed increased functional coupling with the IPS, pre‐SMA, and other areas in audiovisual conditions compared with visual conditions (Figure 4). Our results suggest that auditory stimuli could affect both visual target salience and response initiation (see Figure 5), resulting in reduced visual IOR accompanying auditory stimuli.
FIGURE 5.

Information processing of visual targets accompanied by auditory stimuli. Cued visual targets are suppressed in perceptual processing and response execution. Simultaneous auditory stimuli rescue suppressed visual targets in the perceptual processing and response execution stages. In cued conditions, auditory stimuli facilitate inhibited processing, resulting in a smaller IOR. The content in the gray boxes indicates evidence from our research and from Tang et al. (2021) that auditory stimuli facilitate perceptual processing and response execution. The uncued target is not inhibited in perceptual processing and response execution. IOR, inhibition of return; STG, superior temporal gyrus.
The IOR effect is behaviorally defined as the difference in reaction times between cued and uncued conditions. Previous fMRI studies using the contrast “cued versus uncued” to study the neural mechanism of IOR, however, generally did not obtain significant differential brain activations (Hanlon et al., 2017; Mayer et al., 2004; Müller & Kleinschmidt, 2007; Zhou & Chen, 2008). Mayer et al. (2004) posited that the neuronal substrates responsible for the IOR theoretically cancel out when the cued and uncued conditions are directly compared. Consequently, our research did not focus on differential neural activity between cued and uncued trials within the visual condition or within the audiovisual condition. Instead, to investigate how auditory stimuli reduced visual IOR, we directly compared the audiovisual (AV) and visual (V) trials to represent the specific activation under the audiovisual condition. Furthermore, the resulting activation (STG) was used as a seed to investigate the difference in functional coupling between the audiovisual condition and the visual condition under different cue validity. Thus, the heightened functional coupling in the audiovisual condition, which was found only under cued validity, likely accounts for the visual IOR accompanying auditory stimuli being smaller than the visual IOR.
In addition to processing auditory information, STG activation has typically been associated with multisensory interactions in previous research (Foxe & Schroeder, 2005; Kayser et al., 2007, 2009; Lewis & Noppeney, 2010). Nevertheless, our results do not support the notion that STG activation is driven by multisensory interactions. First, we employed a mask that limited the analysis of audiovisual activation to the area responsible for auditory information processing. Second, the conjunction between “AV > V” and “AV > A,” which was used to calculate multisensory interaction activation in fMRI (Beauchamp, 2005; Stevenson et al., 2014), did not identify significant activation in the STG (the results and a discussion of the results are presented in the Supplementary Material S1). Thus, our results suggest that the functional coupling between the STG and other regions observed in our study reflects the influence of auditory stimuli on other cognitive functions.
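The conjunction logic mentioned above can be sketched as follows. This is a simplified illustration that assumes a minimum-statistic conjunction (a voxel counts only if it exceeds the threshold in both contrasts); this section does not state how the conjunction was implemented, and the t-values, the threshold of 3.1, and the function name `conjunction` are hypothetical.

```python
def conjunction(t_av_gt_v, t_av_gt_a, threshold):
    """Flag voxels where both contrasts exceed the threshold.

    This is equivalent to thresholding the voxelwise minimum of the two t-maps.
    """
    return [min(t1, t2) > threshold for t1, t2 in zip(t_av_gt_v, t_av_gt_a)]

# Hypothetical t-values for five voxels (illustrative, not data from the study)
t_av_gt_v = [4.2, 3.9, 0.5, 5.1, 2.0]  # contrast "AV > V"
t_av_gt_a = [0.8, 3.5, 4.0, 1.2, 2.5]  # contrast "AV > A"

mask = conjunction(t_av_gt_v, t_av_gt_a, threshold=3.1)
print(mask)  # only the second voxel survives both contrasts
```

A voxel that responds strongly in "AV > V" but not in "AV > A" (such as the first voxel here) fails the conjunction, which is the sense in which STG activation driven by the auditory stimulus alone would not count as multisensory interaction.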
We found that the STG showed increased functional coupling with the left IPS in audiovisual conditions compared with visual conditions under cued validity. Traditionally, the right IPS is thought to guide the allocation of attention (Corbetta & Shulman, 2002) by inheriting the bottom‐up visual saliency map computed by the visual cortex and SC (Wang et al., 2022; White et al., 2017; Zhang et al., 2012). Surprisingly, we only found functional coupling with the left IPS. Mevorach and colleagues suggested that the left and right IPS have opposite roles, and the activation of the left IPS is associated with the suppression of the more salient stimuli (Mevorach et al., 2006, 2009, 2010). Some researchers found that the salience of auditory stimuli modulates the activation of the STG (Bordier et al., 2013; Nardo et al., 2014). Therefore, the functional couplings between the STG and left IPS may reflect auditory stimuli modulating suppressed visual saliency in the cued validity trials. In our research, auditory stimuli presented simultaneously and on the same side of space rescued the suppressed saliency of the visual target. Rescued visual salience indirectly promoted reorienting of attention and thus facilitated perceptual processing (Hillyard et al., 1998). The simultaneously presented auditory stimuli facilitated the perceptual processing of visual targets, leading to a smaller IOR effect (see Figure 5).
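The idea of condition-dependent functional coupling with a seed region can be illustrated with a simplified sketch. It is not the analysis pipeline of this study: a full psychophysiological interaction analysis fits an interaction regressor within a general linear model on deconvolved signals (Gitelman et al., 2003). Here the coupling change is approximated as the difference between the seed-to-target regression slopes in the two conditions, and all signals are simulated; the names `slope` and `ppi_effect` and the coupling strengths 0.9 and 0.3 are assumptions of the example.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y regressed on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ppi_effect(seed, target, condition):
    """Coupling difference: seed-to-target slope in AV minus slope in V."""
    av = [(s, t) for s, t, c in zip(seed, target, condition) if c == "AV"]
    v = [(s, t) for s, t, c in zip(seed, target, condition) if c == "V"]
    slope_av = slope([s for s, _ in av], [t for _, t in av])
    slope_v = slope([s for s, _ in v], [t for _, t in v])
    return slope_av - slope_v

# Simulated signals (hypothetical): the target region follows the seed
# more strongly in AV trials (0.9) than in V trials (0.3).
rng = random.Random(0)
condition = ["AV", "V"] * 100
seed = [rng.gauss(0, 1) for _ in condition]
target = [
    (0.9 if c == "AV" else 0.3) * s + rng.gauss(0, 0.05)
    for s, c in zip(seed, condition)
]

effect = ppi_effect(seed, target, condition)
print(round(effect, 2))
```

A positive value indicates stronger coupling between the seed and the target region in audiovisual trials than in visual trials, which is the kind of effect reported here for the STG seed in cued trials.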
In addition to the IPS, we also found that the STG showed increased functional coupling with the pre‐SMA in audiovisual conditions compared with visual conditions under cued validity. Previous anatomical and functional data consistently suggest an inherent link between the auditory and motor systems. Anatomically, the auditory cortex is connected directly with the premotor cortex (Luppino et al., 2001; Zatorre et al., 2007). Functionally, passive listening to sounds can engage the pre‐SMA (Chen et al., 2008; Mayka et al., 2006). The enhanced functional coupling between the STG and pre‐SMA may reflect pre‐SMA activation by auditory stimuli during validly cued trials. Activation of the pre‐SMA reduces the amount of information required to initiate a response (Mansfield et al., 2011). Therefore, simultaneously presented auditory stimuli facilitated response initiation, leading to a smaller IOR effect (see Figure 5).
We found significantly enhanced functional coupling of the STG in audiovisual conditions compared with visual conditions only under cued validity. This result may be due to the inhibition of the saliency and response components of the visual target under cued validity (Ivanoff & Klein, 2006; Martín‐Arévalo et al., 2016; Prime & Ward, 2006). Several studies are consistent with our hypothesis (Gifford & Cohen, 2004; Huang et al., 2015; Noesselt et al., 2010). For example, a study found that when audition dominated vision (i.e., the visual response was suppressed), the auditory cortex showed enhanced connectivity with the sensorimotor area. In contrast, no such functional connectivity was found when vision dominated audition (Huang et al., 2015). Our results further support the idea that matching between modalities is essential for experiencing a multisensory percept (Miller et al., 2015; Otto et al., 2013).
For cued validity, the STG showed increased neural coupling with the IPS and pre‐SMA in audiovisual conditions. These results suggest that auditory stimuli reduce visual IOR due to a “dual mechanism”: rescuing the suppressed visual salience and facilitating response initiation. Specifically, auditory stimuli facilitate perceptual processing and response execution, leading to a smaller IOR effect. A previous ERP study found no P1 and N2 neural modulation for the visual IOR accompanying auditory stimuli. In contrast, they found P1 and N2 neural modulation for the visual IOR (Tang et al., 2021). It is well known that P1 and N2 components, respectively, represent early perceptual processing (Hopfinger & Mangun, 1998) and late response execution processes (Falkenstein et al., 1999). The results of Tang et al. (2021) directly supported our dual mechanism hypothesis.
However, some researchers have proposed that the cause of the IOR effect may be fundamentally different depending on the activation state of the oculomotor system (Hilchey et al., 2014; Redden et al., 2021). Specifically, the IOR effect is due to the inhibition of response components when eye movements are permitted. In contrast, the IOR effect is due to the inhibition of perceptual/attentional components when eye movements are prohibited. We suggest that our dual mechanism results are due to differences in experimental design. In our research, we asked subjects to control their eye movements, but we could not guarantee that they made no eye movements. Furthermore, some studies have found that the response task affects the IOR effect (Adam et al., 2005; Lupiáñez et al., 2007; Prime & Jolicoeur, 2009). For example, Prime and Jolicoeur (2009) proposed that when the response is prepotent (e.g., a simple detection response or high probability response is required), response bias and perceptual impairment for cued targets contribute to the IOR effect. This study used a detection task with a response probability of 86%. Therefore, in future studies, we need to monitor eye movements and manipulate experimental tasks to investigate further the mechanisms by which auditory stimuli affect visual IOR.
We also found increased functional coupling between the STG and other areas (e.g., MFG, cerebellum, and thalamus) under cued validity. Previous research has suggested that the activation of the right MFG (Corbetta et al., 2008; Downar et al., 2000; Shulman et al., 2003) and right cerebellum (Baier et al., 2010; Lepsien & Pollmann, 2002) is associated with attention reorienting. The functional coupling of the STG with the MFG and cerebellum may represent the effect of auditory stimuli on attention reorienting. For the thalamus, Noesselt et al. (2010) found that increased neural coupling strength between the thalamus and auditory cortex is related to the enhancement of detection sensitivity when lower‐intensity visual events are paired with sounds. We propose that the functional coupling of the STG and thalamus, like that of the IPS, may also represent the rescue of suppressed visual salience by auditory stimuli. Similar attentional networks involving the left IPS, right MFG, and right cerebellum have been observed during attention shifts across different modalities (Saito et al., 2005; Shomstein & Yantis, 2004). Therefore, the functional coupling of the STG with the MFG, cerebellum, IPS, and thalamus reflects that auditory stimuli promote attention reorienting to cued locations directly or indirectly.
The left and right STG showed different coupling patterns. On the one hand, the spatial attention network is right lateralized (Corbetta et al., 2008; Corbetta & Shulman, 2002). Therefore, the right STG showed increased functional coupling with the right MFG and right cerebellum. On the other hand, the brain processes auditory information asymmetrically at more fundamental levels of auditory processing. The left STG specializes in the rapid temporal processing of auditory stimuli (Heimrath et al., 2014; Jamison et al., 2006; Yamasaki et al., 2005). Our auditory stimuli lasted for 100 ms and appeared randomly in the scanner noise stream. The coding of auditory saliency in our experiments may be more dependent on temporal contrast (Kayser et al., 2005). Therefore, we only found functional coupling of the left STG with the IPS and the thalamus.
In a long SOA cue‐target paradigm, the peripheral cue inhibits the processing of visual targets that appear at the same location as the cue from early perceptual processing to late response execution, resulting in the IOR effect. Correspondingly, simultaneous auditory stimuli rescue suppressed visual targets in the perceptual processing and response execution stages by rescuing the suppressed visual salience, directly promoting attention reorienting, and facilitating response initiation, resulting in a smaller IOR effect. Our results support the view that crossmodal interactions can occur across multiple neural levels and cognitive processing stages. This study provides a new perspective for understanding attention‐orienting networks and response initiation based on crossmodal information.
AUTHOR CONTRIBUTIONS
Yufeng He: Conceptualization, software, data curation, formal analysis, writing‐original draft, and writing‐review & editing. Xing Peng: Conceptualization, data curation, and formal analysis, writing‐original draft, and writing‐review & editing. Jiaying Sun: Conceptualization, software, data curation, and formal analysis. Xiaoyu Tang: Conceptualization, funding acquisition, investigation, supervision, project administration, writing‐original draft, and writing‐review & editing. Aijun Wang: Conceptualization, formal analysis, investigation, supervision, project administration, and writing‐review & editing. Ming Zhang: Funding acquisition, investigation, supervision, and project administration.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
Supporting information
Data S1: Supporting Information.
ACKNOWLEDGMENTS
The research was supported by the Youth Project of Humanities and Social Sciences Financed by Ministry of Education of China (22YJC190020 to Xiaoyu Tang), the Natural Science Foundation of Liaoning Province of China (2022‐MS‐312 to Xiaoyu Tang), the Natural Science Basic Scientific Research Project of Educational Department of Liaoning Province of China (LJKZ0987 to Xiaoyu Tang), the Suzhou Science and Technology Development Plan (People's Livelihood Science and Technology: SKY2022113 to Aijun Wang), the Interdiscipline Research Team of Humanities and Social Sciences of Soochow University (2022), the National Natural Science Foundation of China (31871092 to Ming Zhang), and the Japan Society for the Promotion of Science KAKENHI (20K04381 to Ming Zhang).
He, Y., Peng, X., Sun, J., Tang, X., Wang, A., & Zhang, M. (2023). The auditory stimulus reduced the visual inhibition of return: Evidence from psychophysiological interaction analysis. Human Brain Mapping, 44(10), 4152–4164. 10.1002/hbm.26336
Yufeng He and Xing Peng contributed equally to this work and are considered as cofirst authors.
Contributor Information
Xiaoyu Tang, Email: tangyu-2006@163.com.
Aijun Wang, Email: ajwang@suda.edu.cn.
Ming Zhang, Email: psychzm@mail.usts.edu.cn.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
REFERENCES
- Adam, J., O'Donnell, C., & Pratt, J. (2005). Response selection influences inhibition of return. European Journal of Cognitive Psychology, 17(3), 319–328. 10.1080/09541440440000069
- Alho, K., Rinne, T., Herron, T. J., & Woods, D. L. (2014). Stimulus‐dependent activations and attention‐related modulations in the auditory cortex: A meta‐analysis of fMRI studies. Hearing Research, 307, 29–41. 10.1016/j.heares.2013.08.001
- Baier, B., Dieterich, M., Stoeter, P., Birklein, F., & Müller, N. G. (2010). Anatomical correlate of impaired covert visual attentional processes in patients with cerebellar lesions. The Journal of Neuroscience, 30(10), 3770–3776. 10.1523/JNEUROSCI.0487-09.2010
- Beauchamp, M. S. (2005). Statistical criteria in FMRI studies of multisensory integration. Neuroinformatics, 3(2), 93–113. 10.1385/NI:3:2:093
- Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21. 10.1146/annurev-neuro-060909-152823
- Bogler, C., Bode, S., & Haynes, J.‐D. (2011). Decoding successive computational stages of saliency processing. Current Biology, 21(19), 1667–1671. 10.1016/j.cub.2011.08.039
- Bordier, C., Puja, F., & Macaluso, E. (2013). Sensory processing during viewing of cinematographic material: Computational modeling and functional neuroimaging. NeuroImage, 67, 213–226. 10.1016/j.neuroimage.2012.11.031
- Burrows, B. E., & Moore, T. (2009). Influence and limitations of popout in the selection of salient visual stimuli by area V4 neurons. The Journal of Neuroscience, 29(48), 15169–15177. 10.1523/JNEUROSCI.3710-09.2009
- Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Listening to musical rhythms recruits motor regions of the brain. Cerebral Cortex, 18(12), 2844–2854. 10.1093/cercor/bhn042
- Chen, Q., & Zhou, X. (2013). Vision dominates at the preresponse level and audition dominates at the response level in cross‐modal interaction: Behavioral and neural evidence. Journal of Neuroscience, 33(17), 7109–7121. 10.1523/JNEUROSCI.1985-12.2013
- Chi, Y., Yue, Z., Liu, Y., Mo, L., & Chen, Q. (2014). Dissociable identity and modality‐specific neural representations as revealed by cross‐modal nonspatial inhibition of return. Human Brain Mapping, 35(8), 4002–4015. 10.1002/hbm.22454
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge. 10.4324/9780203771587
- Corbetta, M., Patel, G., & Shulman, G. L. (2008). The reorienting system of the human brain: From environment to theory of mind. Neuron, 58(3), 306–324. 10.1016/j.neuron.2008.04.017
- Corbetta, M., & Shulman, G. L. (2002). Control of goal‐directed and stimulus‐driven attention in the brain. Nature Reviews Neuroscience, 3(3), 201–215. 10.1038/nrn755
- Dienes, Z. (2014). Using Bayes to get the most out of non‐significant results. Frontiers in Psychology, 5, 781. 10.3389/fpsyg.2014.00781
- Downar, J., Crawley, A. P., Mikulis, D. J., & Davis, K. D. (2000). A multimodal cortical network for the detection of changes in the sensory environment. Nature Neuroscience, 3(3), 277–283. 10.1038/72991
- Falkenstein, M., Hoormann, J., & Hohnsbein, J. (1999). ERP components in Go/Nogo tasks and their relation to inhibition. Acta Psychologica, 101(2), 267–291. 10.1016/S0001-6918(99)00008-6
- Faul, F., Erdfelder, E., Lang, A.‐G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. 10.3758/bf03193146
- Foxe, J. J., & Schroeder, C. E. (2005). The case for feedforward multisensory convergence during early cortical processing. Neuroreport, 16(5), 419–423. 10.1097/00001756-200504040-00001
- Gifford, G. W., & Cohen, Y. E. (2004). Effect of a central fixation light on auditory spatial responses in area LIP. Journal of Neurophysiology, 91(6), 2929–2933. 10.1152/jn.01117.2003
- Gitelman, D. R., Penny, W. D., Ashburner, J., & Friston, K. J. (2003). Modeling regional and psychophysiologic interactions in fMRI: The importance of hemodynamic deconvolution. NeuroImage, 19(1), 200–207. 10.1016/s1053-8119(03)00058-2
- Hanlon, F. M., Dodd, A. B., Ling, J. M., Bustillo, J. R., Abbott, C. C., & Mayer, A. R. (2017). From behavioral facilitation to inhibition: The neuronal correlates of the orienting and reorienting of auditory attention. Frontiers in Human Neuroscience, 11, 293. 10.3389/fnhum.2017.00293
- Heimrath, K., Kuehne, M., Heinze, H.‐J., & Zaehle, T. (2014). Transcranial direct current stimulation (tDCS) traces the predominance of the left auditory cortex for processing of rapidly changing acoustic information. Neuroscience, 261, 68–73. 10.1016/j.neuroscience.2013.12.031
- Hilchey, M. D., Hashish, M., MacLean, G. H., Satel, J., Ivanoff, J., & Klein, R. M. (2014). On the role of eye movement monitoring and discouragement on inhibition of return in a go/no‐go task. Vision Research, 96, 133–139. 10.1016/j.visres.2013.11.008
- Hillyard, S. A., Vogel, E. K., & Luck, S. J. (1998). Sensory gain control (amplification) as a mechanism of selective attention: Electrophysiological and neuroimaging evidence. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 353(1373), 1257–1270. 10.1098/rstb.1998.0281
- Hopfinger, J. B., & Mangun, G. R. (1998). Reflexive attention modulates processing of visual stimuli in human extrastriate cortex. Psychological Science, 9(6), 441–447. 10.1111/1467-9280.00083
- Hoshi, E., & Tanji, J. (2004). Differential roles of neuronal activity in the supplementary and presupplementary motor areas: From information retrieval to motor planning and execution. Journal of Neurophysiology, 92(6), 3482–3499. 10.1152/jn.00547.2004
- Huang, S., Li, Y., Zhang, W., Zhang, B., Liu, X., Mo, L., & Chen, Q. (2015). Multisensory competition is modulated by sensory pathway interactions with fronto‐sensorimotor and default‐mode network regions. Journal of Neuroscience, 35(24), 9064–9077. 10.1523/JNEUROSCI.3760-14.2015
- Itti, L., & Koch, C. (2000). A saliency‐based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10), 1489–1506. 10.1016/S0042-6989(99)00163-7
- Itti, L., & Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3), 194–203. 10.1038/35058500
- Ivanoff, J., & Klein, R. M. (2001). The presence of a nonresponding effector increases inhibition of return. Psychonomic Bulletin & Review, 8(2), 307–314. 10.3758/bf03196166
- Ivanoff, J., & Klein, R. M. (2006). Inhibition of return: Sensitivity and criterion as a function of response time. Journal of Experimental Psychology. Human Perception and Performance, 32(4), 908–919. 10.1037/0096-1523.32.4.908
- Jamison, H. L., Watkins, K. E., Bishop, D. V. M., & Matthews, P. M. (2006). Hemispheric specialization for processing auditory nonspeech stimuli. Cerebral Cortex, 16(9), 1266–1275. 10.1093/cercor/bhj068
- Kayser, C., Petkov, C. I., Augath, M., & Logothetis, N. K. (2007). Functional imaging reveals visual modulation of specific fields in auditory cortex. The Journal of Neuroscience, 27(8), 1824–1835. 10.1523/JNEUROSCI.4737-06.2007
- Kayser, C., Petkov, C. I., Lippert, M., & Logothetis, N. K. (2005). Mechanisms for allocating auditory attention: An auditory saliency map. Current Biology, 15(21), 1943–1947. 10.1016/j.cub.2005.09.040
- Kayser, C., Petkov, C. I., & Logothetis, N. K. (2009). Multisensory interactions in primate auditory cortex: FMRI and electrophysiology. Hearing Research, 258(1–2), 80–88. 10.1016/j.heares.2009.02.011
- Klein, R. (1988). Inhibitory tagging system facilitates visual search. Nature, 334(6181), 430–431. 10.1038/334430a0
- Klein, R. (2000). Inhibition of return. Trends in Cognitive Sciences, 4(4), 138–147. 10.1016/s1364-6613(00)01452-2
- Lattner, S., Meyer, M. E., & Friederici, A. D. (2005). Voice perception: Sex, pitch, and the right hemisphere. Human Brain Mapping, 24(1), 11–20. 10.1002/hbm.20065
- Lepsien, J., & Pollmann, S. (2002). Covert reorienting and inhibition of return: An event‐related fMRI study. Journal of Cognitive Neuroscience, 14(2), 127–144. 10.1162/089892902317236795
- Lewis, R., & Noppeney, U. (2010). Audiovisual synchrony improves motion discrimination via enhanced connectivity between early visual and auditory areas. The Journal of Neuroscience, 30(37), 12329–12339. 10.1523/JNEUROSCI.5745-09.2010
- Li, Q., Yang, H., Sun, F., & Wu, J. (2015). Spatiotemporal relationships among audiovisual stimuli modulate auditory facilitation of visual target discrimination. Perception, 44(3), 232–242. 10.1068/p7846
- Lupiáñez, J., Ruz, M., Funes, M. J., & Milliken, B. (2007). The manifestation of attentional capture: Facilitation or IOR depending on task demands. Psychological Research, 71(1), 77–91. 10.1007/s00426-005-0037-z
- Luppino, G., Calzavara, R., Rozzi, S., & Matelli, M. (2001). Projections from the superior temporal sulcus to the agranular frontal cortex in the macaque. The European Journal of Neuroscience, 14(6), 1035–1040. 10.1046/j.0953-816x.2001.01734.x
- Makovac, E., Buonocore, A., & McIntosh, R. D. (2015). Audio‐visual integration and saccadic inhibition. Quarterly Journal of Experimental Psychology, 68(7), 1295–1305. 10.1080/17470218.2014.979210
- Mansfield, E. L., Karayanidis, F., Jamadar, S., Heathcote, A., & Forstmann, B. U. (2011). Adjustments of response threshold during task switching: A model‐based functional magnetic resonance imaging study. Journal of Neuroscience, 31(41), 14688–14692. 10.1523/JNEUROSCI.2390-11.2011
- Martín‐Arévalo, E., Chica, A. B., & Lupiáñez, J. (2016). No single electrophysiological marker for facilitation and inhibition of return: A review. Behavioural Brain Research, 300, 1–10. 10.1016/j.bbr.2015.11.030
- Mayer, A. R., Seidenberg, M., Dorflinger, J. M., & Rao, S. M. (2004). An event‐related fMRI study of exogenous orienting: Supporting evidence for the cortical basis of inhibition of return? Journal of Cognitive Neuroscience, 16(7), 1262–1271. 10.1162/0898929041920531
- Mayka, M. A., Corcos, D. M., Leurgans, S. E., & Vaillancourt, D. E. (2006). Three‐dimensional locations and boundaries of motor and premotor cortices as defined by functional brain imaging: A meta‐analysis. NeuroImage, 31(4), 1453–1474. 10.1016/j.neuroimage.2006.02.004
- Mevorach, C., Hodsoll, J., Allen, H., Shalev, L., & Humphreys, G. (2010). Ignoring the elephant in the room: A neural circuit to downregulate salience. Journal of Neuroscience, 30(17), 6072–6079. 10.1523/JNEUROSCI.0241-10.2010
- Mevorach, C., Humphreys, G. W., & Shalev, L. (2006). Opposite biases in salience‐based selection for the left and right posterior parietal cortex. Nature Neuroscience, 9(6), 740–742. 10.1038/nn1709
- Mevorach, C., Shalev, L., Allen, H. A., & Humphreys, G. W. (2009). The left intraparietal sulcus modulates the selection of low salient stimuli. Journal of Cognitive Neuroscience, 21(2), 303–315. 10.1162/jocn.2009.21044
- Miller, R. L., Pluta, S. R., Stein, B. E., & Rowland, B. A. (2015). Relative unisensory strength and timing predict their multisensory product. The Journal of Neuroscience, 35(13), 5213–5220. 10.1523/JNEUROSCI.4771-14.2015
- Mostofsky, S. H., & Simmonds, D. J. (2008). Response inhibition and response selection: Two sides of the same coin. Journal of Cognitive Neuroscience, 20(5), 751–761. 10.1162/jocn.2008.20500
- Müller, N. G., & Kleinschmidt, A. (2007). Temporal dynamics of the attentional spotlight: Neuronal correlates of attentional capture and inhibition of return in early visual cortex. Journal of Cognitive Neuroscience, 19(4), 587–593. 10.1162/jocn.2007.19.4.587
- Nachev, P., Kennard, C., & Husain, M. (2008). Functional role of the supplementary and pre‐supplementary motor areas. Nature Reviews Neuroscience, 9(11), 856–869. 10.1038/nrn2478
- Nardo, D., Santangelo, V., & Macaluso, E. (2014). Spatial orienting in complex audiovisual environments. Human Brain Mapping, 35(4), 1597–1614. 10.1002/hbm.22276
- Noesselt, T., Tyll, S., Boehler, C. N., Budinger, E., Heinze, H.‐J., & Driver, J. (2010). Sound‐induced enhancement of low‐intensity vision: Multisensory influences on human sensory‐specific cortices and thalamic bodies relate to perceptual enhancement of visual detection sensitivity. Journal of Neuroscience, 30(41), 13609–13623. 10.1523/JNEUROSCI.4524-09.2010
- Otto, T. U., Dassy, B., & Mamassian, P. (2013). Principles of multisensory behavior. Journal of Neuroscience, 33(17), 7463–7474. 10.1523/JNEUROSCI.4678-12.2013
- Pastötter, B., Hanslmayr, S., & Bäuml, K.‐H. (2008). Inhibition of return arises from inhibition of response processes: An analysis of oscillatory beta activity. Journal of Cognitive Neuroscience, 20(1), 65–75. 10.1162/jocn.2008.20010
- Poline, J. B., Worsley, K. J., Evans, A. C., & Friston, K. J. (1997). Combining spatial extent and peak intensity to test for activations in functional imaging. NeuroImage, 5(2), 83–96. 10.1006/nimg.1996.0248
- Posner, M. I., Rafal, R. D., Choate, L. S., & Vaughan, J. (1985). Inhibition of return: Neural basis and function. Cognitive Neuropsychology, 2(3), 211–228. 10.1080/02643298508252866
- Prime, D. J., & Jolicoeur, P. (2009). Response‐selection conflict contributes to inhibition of return. Journal of Cognitive Neuroscience, 21(5), 991–999. 10.1162/jocn.2009.21105
- Prime, D. J., & Ward, L. M. (2006). Cortical expressions of inhibition of return. Brain Research, 1072(1), 161–174. 10.1016/j.brainres.2005.11.081
- Redden, R. S., MacInnes, W. J., & Klein, R. M. (2021). Inhibition of return: An information processing theory of its natures and significance. Cortex, 135, 30–48. 10.1016/j.cortex.2020.11.009
- Saito, D. N., Yoshimura, K., Kochiyama, T., Okada, T., Honda, M., & Sadato, N. (2005). Cross‐modal binding and activated attentional networks during audio‐visual speech integration: A functional MRI study. Cerebral Cortex, 15(11), 1750–1760. 10.1093/cercor/bhi052
- Shomstein, S. , & Yantis, S. (2004). Control of attention shifts between vision and audition in human cortex. The Journal of Neuroscience, 24(47), 10702–10706. 10.1523/jneurosci.2939-04.2004 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shulman, G. L. , McAvoy, M. P. , Cowan, M. C. , Astafiev, S. V. , Tansy, A. P. , d'Avossa, G. , & Corbetta, M. (2003). Quantitative analysis of attention and detection signals during visual search. Journal of Neurophysiology, 90(5), 3384–3397. 10.1152/jn.00343.2003 [DOI] [PubMed] [Google Scholar]
- Stevenson, R. A. , Ghose, D. , Fister, J. K. , Sarko, D. K. , Altieri, N. A. , Nidiffer, A. R. , Kurela, L. R. , Siemann, J. K. , James, T. W. , & Wallace, M. T. (2014). Identifying and quantifying multisensory integration: A tutorial review. Brain Topography, 27(6), 707–730. 10.1007/s10548-014-0365-7 [DOI] [PubMed] [Google Scholar]
- Sun, J. , Huang, J. , Wang, A. , Zhang, M. , & Tang, X. (2022). The role of the interaction between the inferior parietal lobule and superior temporal gyrus in the multisensory go/No‐go task. NeuroImage, 254, 119140. 10.1016/j.neuroimage.2022.119140 [DOI] [PubMed] [Google Scholar]
- Ta, V. , & Lm, W. (2006). Reorienting attention and inhibition of return. Perception & Psychophysics, 68(8), 1310–1323. 10.3758/bf03193730 [DOI] [PubMed] [Google Scholar]
- Tang, X. , Gao, Y. , Yang, W. , Ren, Y. , Wu, J. , Zhang, M. , & Wu, Q. (2019). Bimodal‐divided attention attenuates visually induced inhibition of return with audiovisual targets. Experimental Brain Research, 237(4), 1093–1107. 10.1007/s00221-019-05488-0 [DOI] [PubMed] [Google Scholar]
- Tang, X. , Wang, X. , Peng, X. , Li, Q. , Zhang, C. , Wang, A. , & Zhang, M. (2021). Electrophysiological evidence of different neural processing between visual and audiovisual inhibition of return. Scientific Reports, 11(1), 8056. 10.1038/s41598-021-86999-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Taylor, T. L. , & Klein, R. M. (1998). On the causes and effects of inhibition of return. Psychonomic Bulletin & Review, 5(4), 625–643. 10.3758/BF03208839 [DOI] [Google Scholar]
- Thompson, K. G. , & Bichot, N. P. (2005). A visual salience map in the primate frontal eye field. Progress in Brain Research, 147, 251–262. 10.1016/S0079-6123(04)47019-8 [DOI] [PubMed] [Google Scholar]
- Tosun, T. , Berkay, D. , Sack, A. T. , Çakmak, Y. Ö. , & Balcı, F. (2017). Inhibition of pre‐supplementary motor area by continuous theta burst stimulation leads to more cautious decision‐making and more efficient sensory evidence integration. Journal of Cognitive Neuroscience, 29(8), 1433–1444. 10.1162/jocn_a_01134 [DOI] [PubMed] [Google Scholar]
- Van der Burg, E. , Olivers, C. N. L. , Bronkhorst, A. W. , & Theeuwes, J. (2008). Pip and pop: Nonspatial auditory signals improve spatial visual search. Journal of Experimental Psychology: Human Perception and Performance, 34(5), 1053–1065. 10.1037/0096-1523.34.5.1053 [DOI] [PubMed] [Google Scholar]
- Van der Burg, E. , Talsma, D. , Olivers, C. N. L. , Hickey, C. , & Theeuwes, J. (2011). Early multisensory interactions affect the competition among multiple visual objects. NeuroImage, 55(3), 1208–1218. 10.1016/j.neuroimage.2010.12.068 [DOI] [PubMed] [Google Scholar]
- Van der Stoep, N. , Van der Stigchel, S. , & Nijboer, T. C. W. (2015). Exogenous spatial attention decreases audiovisual integration. Attention, Perception & Psychophysics, 77(2), 464–482. 10.3758/s13414-014-0785-1 [DOI] [PubMed] [Google Scholar]
- Van der Stoep, N. , Van der Stigchel, S. , Nijboer, T. C. W. , & Spence, C. (2017). Visually induced inhibition of return affects the integration of auditory and visual information. Perception, 46(1), 6–17. 10.1177/0301006616661934 [DOI] [PubMed] [Google Scholar]
- Veale, R., Hafed, Z. M., & Yoshida, M. (2017). How is visual salience computed in the brain? Insights from behavior, neurobiology and modeling. Philosophical Transactions of the Royal Society B‐Biological Sciences, 372(1714), 20160113. 10.1098/rstb.2016.0113
- Wagenmakers, E.‐J., Verhagen, A. J., Ly, A., Matzke, D., Steingroever, H., Rouder, J., & Morey, R. (2017). The need for Bayesian hypothesis testing in psychological science. In S. O. Lilienfeld & I. D. Waldman (Eds.), Psychological science under scrutiny: Recent challenges and proposed solutions (pp. 123–138). Wiley Blackwell. 10.1002/9781119095910.ch8
- Wang, L., Huang, L., Li, M., Wang, X., Wang, S., Lin, Y., & Zhang, X. (2022). An awareness‐dependent mapping of saliency in the human visual system. NeuroImage, 247, 118864. 10.1016/j.neuroimage.2021.118864
- White, B., Kan, J., Levy, R., Itti, L., & Munoz, D. (2017). Superior colliculus encodes visual saliency before the primary visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 114(35), 9451–9456. 10.1073/pnas.1701003114
- Wolpe, N., Hezemans, F. H., Rae, C. L., Zhang, J., & Rowe, J. B. (2022). The pre‐supplementary motor area achieves inhibitory control by modulating response thresholds. Cortex, 152, 98–108. 10.1016/j.cortex.2022.03.018
- Yamasaki, T., Goto, Y., Taniwaki, T., Kinukawa, N., Kira, J., & Tobimatsu, S. (2005). Left hemisphere specialization for rapid temporal processing: A study with auditory 40 Hz steady‐state responses. Clinical Neurophysiology, 116(2), 393–400. 10.1016/j.clinph.2004.08.005
- Yang, Z., & Mayer, A. R. (2014). An event‐related FMRI study of exogenous orienting across vision and audition: Cross‐modal orienting. Human Brain Mapping, 35(3), 964–974. 10.1002/hbm.22227
- Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory‐motor interactions in music perception and production. Nature Reviews Neuroscience, 8(7), 547–558. 10.1038/nrn2152
- Zhang, X., Zhaoping, L., Zhou, T., & Fang, F. (2012). Neural activities in V1 create a bottom‐up saliency map. Neuron, 73(1), 183–192. 10.1016/j.neuron.2011.10.035
- Zhou, X., & Chen, Q. (2008). Neural correlates of spatial and non‐spatial inhibition of return (IOR) in attentional orienting. Neuropsychologia, 46(11), 2766–2775. 10.1016/j.neuropsychologia.2008.05.017
Supplementary Materials
Data S1: Supporting Information.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
