Abstract
Object recognition benefits maximally from multimodal sensory input when stimulus presentation is noisy or degraded. Whether this advantage can be attributed specifically to the extent of overlap in object‐related information, or rather to object‐unspecific enhancement due to the mere presence of additional sensory stimulation, remains unclear. Further, the cortical processing differences driving increased multisensory integration (MSI) for degraded compared with clear information remain poorly understood. Here, two consecutive studies first compared behavioral benefits of audio‐visual overlap of object‐related information, relative to conditions where one channel carried information and the other carried noise. A hierarchical drift diffusion model indicated performance enhancement when auditory and visual object‐related information was simultaneously present for degraded stimuli. A subsequent fMRI study revealed visual dominance on a behavioral and neural level for clear stimuli, while degraded stimulus processing was mainly characterized by activation of a frontoparietal multisensory network, including the intraparietal sulcus (IPS). Connectivity analyses indicated that integration of degraded object‐related information relied on IPS input, whereas clear stimuli were integrated through direct information exchange between visual and auditory sensory cortices. These results indicate that the inverse effectiveness observed for identification of degraded relative to clear objects, in behavior and brain activation, might be facilitated by selective recruitment of an executive cortical network that uses the IPS as a relay mediating crossmodal sensory information exchange.
Keywords: intraparietal sulcus, multisensory integration, perception threshold, principle of inverse effectiveness
1. INTRODUCTION
The act of perceiving and processing information through multiple senses and attributing congruent stimuli to the same source is often referred to as multisensory integration (MSI). Commonly, a distinction is made between mere temporal or spatial congruency (Slutsky & Recanzone, 2001) and higher‐order semantic congruency of object‐related information (Chen & Spence, 2010; Hein et al., 2007), which refers to overlap in sensory information in multiple channels that can be attributed to the same emitting object. To facilitate the appropriate grouping of object‐related information between noisy sensory channels, our brain depends on probabilistic Bayesian causal inference, which optimizes decision‐making about an object's identity through weighting of all incoming sensory information (Rohe & Noppeney, 2015; Seilheimer, Rosenberg, & Angelaki, 2014). The principle of inverse effectiveness states that MSI is most effective, and therefore elicits maximal behavioral enhancements, when degraded or ambiguous, and thus difficult, individual stimuli are presented (Meredith & Stein, 1983, 1986). Inverse effectiveness has been shown to apply at both the level of integration of basic temporospatial overlap (Kayser, Petkov, Augath, & Logothetis, 2005; Senkowski, Saint‐Amour, Hofle, & Foxe, 2011) and the level of object‐specific feature integration (Rohe & Noppeney, 2012; Stevenson & James, 2009; however, see Chandrasekaran, Chan, & Wong, 2011; Schepers, Schneider, Hipp, Engel, & Senkowski, 2013; for exceptions). Even though many sensory stimuli we encounter in real life are degraded, mixed with other co‐occurring perceptions, and multisensory in character, studies exploring the role of MSI during object identification have predominantly used clearly perceivable sensory stimuli, where the need for integration of multisensory information is naturally limited. Further, common comparisons assess the effectiveness of recognizing multisensory relative to purely unisensory stimuli, e.g., comparing the audiovisual sound and sight of a tool to separate presentations of either the sound or the sight alone. The latter practice limits the possibility to attribute effects of multisensory enhancement causally to the integration of object‐relevant knowledge from multiple sensory channels, given that the presence of sensory input alone, in the absence of overlapping object information, has been demonstrated to result in performance improvements, likely as a result of attentional or priming‐related mechanisms (e.g., Gleiss & Kayser, 2014a, 2014b; Kayser, Philiastides, & Kayser, 2017). In other words, whether the addition of sensory input in the other modality was sufficient to elicit multisensory effects in the absence of object‐related information could not be established, due to the lack of a baseline consisting of parallel sensory stimulation without object information (compare with Werner & Noppeney, 2010b).
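To make this weighting principle concrete, consider the standard maximum‐likelihood formulation of reliability‐weighted cue combination (our illustration; the notation is not drawn from the cited studies). Given unisensory estimates $\hat{s}_A$ and $\hat{s}_V$ with variances $\sigma_A^2$ and $\sigma_V^2$, the statistically optimal combined estimate weights each cue by its relative reliability:

$$\hat{s}_{AV} = w_A\,\hat{s}_A + w_V\,\hat{s}_V, \qquad w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2}, \quad w_V = 1 - w_A,$$

$$\sigma_{AV}^2 = \frac{\sigma_A^2\,\sigma_V^2}{\sigma_A^2 + \sigma_V^2} \leq \min\!\left(\sigma_A^2, \sigma_V^2\right).$$

Degrading one channel inflates its variance and shifts weight toward the other channel, and the relative gain from combining cues is largest when both unisensory estimates are unreliable, which offers one way to motivate the principle of inverse effectiveness described above.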
We (Regenbogen, Johansson, Andersson, Olsson, & Lundstrom, 2016) recently demonstrated enhancement of MSI for object‐related information within a Bayesian hierarchical drift diffusion model (HDDM) (Wiecki, Sofer, & Frank, 2013). Here, the task was to identify dynamic and degraded presentations of congruent auditory/visual information. Although sensory input was always presented on both auditory and visual sensory channels, trials varied in whether object‐related information was present in the auditory/visual channel only, while the respective other channel carried noise, or whether meaningful object‐relevant information could be extracted from both channels (compare to Werner & Noppeney, 2010a). A few additional studies have recently investigated the effectiveness of degraded stimuli for crossmodal object identification (Ohla, Hochenberger, Freiherr, & Lundstrom, in press; Stevenson, Geoghegan, & James, 2007; Werner & Noppeney, 2010a, 2010b) and demonstrated superadditive effects as an indicator of object‐based MSI. Behavioral findings include superior performance levels for bimodal compared to unisensory object information in both accuracy and reaction times (Werner & Noppeney, 2010b). On a neural level, increased blood‐oxygen‐level‐dependent (BOLD) responses were found for multisensory compared to summed unisensory signals in MSI sites, for example, the superior temporal sulcus (STS) (Stevenson et al., 2007), the intraparietal sulcus (IPS), the superior frontal gyrus and superior medial frontal gyrus, as well as the angular gyrus (Werner & Noppeney, 2010b).
Although a few studies (Werner & Noppeney, 2010b; Stevenson & James, 2009) specifically tested for "inverse effectiveness" and reported a neural correlate of this contrast in the STS, it remained speculative how the contrast of degraded versus clear information was characterized on the behavioral as well as the neural level. In other words, whether the effects were unique to degraded stimuli or equally observable in clear stimuli, and whether the neural mechanisms of object‐related information integration differed from those identified for basic sensory integration, remained unclear. A second question regarding the neural correlates specifically pertained to the IPS. While the IPS is often referred to as a central "hub" of MSI in the literature (Bremmer et al., 2001; Calvert, Campbell, & Brammer, 2000; Driver & Noesselt, 2008; Grefkes & Fink, 2005), its exact function during this process remains poorly understood. This is because multisensory tasks may involve shifts in focused attention, for which the IPS also holds an important role (i.e., the dorsal attention network; Tang, Wu, & Shen, 2016), specifically when it comes to task difficulty (Basten, Biele, Heekeren, & Fiebach, 2010; Hare, Schultz, Camerer, O'Doherty, & Rangel, 2011). Differences in task difficulty for identification of degraded relative to clear stimuli may thus have exerted a strong effect on the recruitment of networks supporting the management of tasks with higher cognitive load. The aim of this study was therefore to more specifically investigate IPS involvement in the perception of degraded and difficult multisensory stimuli, especially its role in the top‐down mediation of the information exchange between unisensory auditory and visual cortices.
This was achieved through comparisons of degraded and clear multisensory information processing in two independent yet interlinked experiments. In Experiment 1, we set out to replicate our previous behavioral finding of strong multisensory enhancement of degraded information using an HDDM in which sensory object‐related information, rather than sensory input, was uni‐ or multisensory (i.e., variations of overlap in object information were dissociated from the mere temporal concordance of sensory channels). In Experiment 2, we used these dynamic and degraded stimuli, as well as their clear counterparts, to define the role of the IPS in processing multisensory information. We ensured stable difficulty levels throughout presentation of the degraded stimuli by assessing personalized and adaptive perception thresholds (see Regenbogen et al., 2016). This procedure was chosen to ensure that attention would not be unevenly divided between the two senses, as can be observed for true unisensory stimuli (Kording et al., 2007), and to focus on object‐related MSI rather than nonobject‐related sensory MSI (e.g., Gleiss & Kayser, 2014a, 2014b; Kayser et al., 2017). Paying attention to both senses simultaneously minimizes the difference between unisensory and multimodal stimuli regarding attention allocation. The difference thus relates to the actual object identity and represents relative multimodal gain (Ross, Saint‐Amour, Leavitt, Javitt, & Foxe, 2007) and informativeness (Werner & Noppeney, 2010a). For this article, this also means that "unisensory" and "multisensory" refer to the informativeness and object‐relatedness, not the general sensory input, which was indeed always bimodal. In line with our previous findings, we hypothesized that degraded and difficult stimuli would show larger evidence of MSI on a behavioral level in both experiments. Moreover, given its role in both task difficulty and MSI, we hypothesized that the IPS would be preferentially involved in the integration of multisensory information within a noisy environment.
2. EXPERIMENT 1
2.1. Material and methods
2.1.1. Participants
Forty‐three healthy participants were recruited from a student population. Three participants were excluded due to technical errors and two were excluded due to performance below chance level, which reduced the final sample size to 38 participants (age M = 25.74, SD = 5.86, 20 females). Participants provided written informed consent, the study conformed to the Declaration of Helsinki, and all aspects were approved by the regional ethics review board in Stockholm, Sweden.
2.1.2. Stimuli
Stimuli consisted of four sound clips and four video clips (http://www.shutterstock.com) depicting "wood fire" (ID no. 1644241), "lawn mower" (ID no. 3193405), "popcorn" (ID no. 3211516), and "flopping fish" (ID no. 2004905). Sound clips were cut to a length of 2000 ms, including 100 ms fade‐in and fade‐out ramps and a root mean square equalization to −23 dB loudness (http://audacity.sourceforge.net, Adobe Audition). Auditory noise control files (100% pink noise) were created using MATLAB (R2014a) and included a fade‐in and fade‐out ramp and root mean square equalization. Videos were cut to 2000 ms in length and had a resolution of 720 × 400 pixels (http://www.virtualdub.org). Visual noise control files (salt‐and‐pepper noise) were created using MATLAB.
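The original noise files were generated in MATLAB; purely as an illustration, comparable noise generators could be sketched in Python as follows (function names, parameters, and normalization choices are our own assumptions, not the published implementation):

```python
import numpy as np

def pink_noise(n_samples, seed=0):
    """Approximate pink (1/f) noise by shaping white noise in the frequency domain."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]                  # avoid division by zero at DC
    spectrum /= np.sqrt(freqs)           # 1/f power corresponds to 1/sqrt(f) amplitude
    pink = np.fft.irfft(spectrum, n_samples)
    return pink / np.abs(pink).max()     # normalize to [-1, 1]

def salt_and_pepper(frame, density, seed=0):
    """Replace a random fraction (density) of pixels with black or white."""
    noisy = frame.copy()
    mask = np.random.default_rng(seed).random(frame.shape[:2])
    noisy[mask < density / 2] = 0                           # "pepper"
    noisy[(mask >= density / 2) & (mask < density)] = 255   # "salt"
    return noisy
```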
2.1.3. Unisensory threshold session
Prior to the experiment, unisensory identification thresholds were separately assessed for each object and each participant. In two separate blocks (auditory and visual, counter‐balanced order), either sounds or videos were individually presented using MATLAB. Visual stimuli were presented in the screen's center and auditory stimuli were presented binaurally through headphones at a constant loudness level. Degradation was induced by adding pink noise to the auditory stimulus and salt‐and‐pepper noise to the visual stimulus, so that the signal‐to‐noise relation gradually shifted toward noise. An adaptive staircase design was used to determine, within each object category, the degradation level corresponding to 75% performance accuracy. This procedure was chosen to guarantee sufficient training and performance on the resulting degraded stimuli to enable adequate subsequent application in the multisensory assessment. After a stimulus presentation of 2 s (randomized order), participants indicated the identity of the presented object (fire, lawn mower, popcorn, fish, or nothing) through a button press on one of five keyboard buttons using their five fingers (five‐alternative forced‐choice test, 5AFCT). The assignment of a button to an object was always the same, as was the order in which the object names appeared on the screen. Two consecutive correct responses (choosing the object that was presented) and a single error, respectively, triggered a reversal, and the mean masking level of the last four of a total of six reversals constituted the participant's degraded stimuli.
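The following is a minimal Python sketch of the two‐down/one‐up staircase logic described above, under our reading of the procedure (function and variable names are hypothetical, not the published code):

```python
def run_staircase(present_trial, levels, start=0, n_reversals=6, max_trials=200):
    """Two-down/one-up staircase: two consecutive correct responses step the
    masking level up (harder), a single error steps it down (easier); stops
    after n_reversals direction changes (max_trials is a safety bound)."""
    level, streak, direction = start, 0, 0
    reversal_levels = []
    for _ in range(max_trials):
        if len(reversal_levels) >= n_reversals:
            break
        correct = present_trial(levels[level])     # one 5AFCT trial at this level
        if correct:
            streak += 1
            if streak < 2:
                continue                           # need two hits in a row
            streak, step = 0, +1                   # harder: more masking
        else:
            streak, step = 0, -1                   # easier: less masking
        if direction and step != direction:
            reversal_levels.append(levels[level])  # direction change = reversal
        direction = step
        level = min(max(level + step, 0), len(levels) - 1)
    return sum(reversal_levels[-4:]) / 4           # mean masking of last four reversals
```

Here, `levels` would be the 20‐step masking ladder described below, and `present_trial` a callback running one trial and returning whether the response was correct.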
In contrast to the subsequent main experiment in which MSI of bimodal information was assessed, sounds and video clips were not accompanied by stimulation of the respective other modality; stimulation was therefore truly unisensory. In the visual threshold session, visual noise input to the stimulus ranged from 40% (60% remaining clear stimulus) to a maximum of 98% (2% remaining clear stimulus) in 20 steps of 2%. In the auditory threshold session, auditory noise input to the stimulus ranged from an equal signal‐to‐noise ratio (SNR) of 0 to −23.75 dB in 20 steps of 1.25 dB. Participants' mean visual threshold level was 17.38 ± 0.71 steps (values reported as mean ± SD), which corresponded to a 93% masking level. Participants' mean auditory threshold level was 16.20 ± 1.63 steps, which corresponded to a −19.00 dB SNR masking level. Individual auditory and visual degraded stimuli at the selected threshold were saved as new files for the main experiment in which MSI of bimodal information was assessed.
2.1.4. MSI of bimodal information session
The experimental protocol consisted of presentations of degraded auditory, visual, and audiovisual stimuli in a 5AFCT (Figure 1). Participants were instructed to press the space bar as soon as they thought they had identified the presented object (within a maximum of 2 s). Each trial started with a 2 s black fixation cross, followed by the presentation of a stimulus. Upon space bar press, the video disappeared and a response screen appeared. Participants pressed one of the buttons to select their answer out of five alternatives (the true answer, the three remaining object labels, and "nothing") within another 2 s. Response data consisted of object identification accuracy and reaction time (space bar press).
Figure 1.

In both experiments, participants were presented with degraded unimodal and bimodal stimuli (Ad, Vd, AVd, and control condition NN, all in randomized order) followed by an object identification forced‐choice task (five alternatives). In Experiment 1, the response was speeded; in Experiment 2, the response was given after the audiovisual presentation to prevent movement‐related brain activation. In Experiment 2, three further conditions depicted clear uni‐ and bimodal stimuli (Ac, Vc, AVc)
Each participant was presented with 40 stimuli in each of the three conditions: degraded auditory and visual stimuli ("auditory degraded" [Ad] and "visual degraded" [Vd]) and their combination ("audiovisual degraded" [AVd]), as well as 30 additional noise control trials (NN), resulting in a total of 150 stimuli. Stimuli carrying object‐relevant information in only one modality were always paired with 100% noise in the respective other modality (see also Regenbogen et al., 2016; Werner & Noppeney, 2010a, 2010b). In contrast to the preceding threshold session, all trials therefore presented stimulation to both the auditory and visual modality, and as such should be considered "bimodal" in terms of their sensory input. Their information value with regard to object recognition, however, could be uni‐ or multisensory (compare to Werner & Noppeney, 2010a). For simplicity, we will use the terms "unisensory" and "multisensory" to refer to the specific conditions of our experiment throughout the results and discussion sections.
2.1.5. Data analysis
Our aim in Experiment 1 was to determine potential MSI of object information in individually optimized degraded visual and auditory stimuli when presented in a speeded‐response task, using a new approach based on our recently published work (Regenbogen et al., 2016). To achieve this, we analyzed single‐trial behavioral data (accuracy and RT) with a HDDM (v.0.5.5) (Wiecki et al., 2013) within an IPython (Python 3.5.1) interpreter shell (Perez & Granger, 2007). The HDDM models choice data as sequential accumulation of noisy information, which eventually leads to a decision when a response boundary (threshold parameter "a") is crossed; the mean rate of this accumulation is the drift rate "v". The model specifies a likelihood function (Navarro & Fuss, 2009), which links the observed data to the model parameters and their prior probabilities. Based on single‐trial data, the model draws 10,000 posterior samples using a Markov‐chain Monte Carlo algorithm (Gamerman & Lopes, 2006), with the first 1,000 samples discarded as burn‐in for model stabilization. On a trial‐by‐trial basis, the HDDM estimates the evidence accumulated to reach a decision (correct/false) and, across trials, obtains a temporal accumulation of evidence. The speed with which the evidence accumulation approaches one of the two boundaries is the drift rate v. The model also accounts for drift‐rate intertrial variability and the condition‐independent time it takes for perception, movement initiation, and execution of a decision (nondecision time t). This results in a joint posterior distribution of all model parameters (group parameters for each condition, as well as individual subject parameters). Drift rate v is thus first estimated on the individual level and later constrained by group‐level distributions (Nilsson, Rieskamp, & Wagenmakers, 2011; Shiffrin, Lee, Kim, & Wagenmakers, 2008). HDDM calculations were quality‐controlled by visual inspection of the trace, autocorrelation, and the marginal posterior, as well as the Gelman–Rubin statistic (< 1.02) (Gelman & Rubin, 1992). All indices indicated successful convergence of the model.
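A minimal sketch of this estimation with the HDDM package is shown below (the file name, column layout, and condition labels are our assumptions about the data format, not the published analysis script):

```python
import hddm

# Expected columns: 'rt' (in s), 'response' (1 = correct, 0 = incorrect),
# 'condition' (Ad, Vd, AVd), and 'subj_idx' (participant identifier)
data = hddm.load_csv('experiment1_trials.csv')

# Drift rate v varies by condition; boundary separation a and nondecision
# time t are shared, and 'sv' adds intertrial drift-rate variability
model = hddm.HDDM(data, depends_on={'v': 'condition'}, include=['sv'])
model.sample(10000, burn=1000)   # 10,000 MCMC samples, first 1,000 discarded

# Extract and plot group-level posteriors of condition-specific drift rates
v_Ad, v_Vd, v_AVd = model.nodes_db.node[['v(Ad)', 'v(Vd)', 'v(AVd)']]
hddm.analyze.plot_posterior_nodes([v_Ad, v_Vd, v_AVd])
```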
HDDM‐derived group‐level posteriors, representing a combined speed‐accuracy measure, were extracted for each condition (auditory, visual, audiovisual) and analyzed with a repeated‐measures analysis of variance (ANOVA, within‐subject factor "Condition"). Degrees of freedom were corrected using Greenhouse‐Geisser estimates where indicated by Mauchly's test. Post hoc two‐tailed paired t tests (p < .05) were Bonferroni‐corrected. Effect size estimates were given by partial eta‐squared (ηp² for ANOVAs) and by Cohen's d adjusted1 (for Student's t tests) (Cumming, 2012).
We analyzed drift‐rate posteriors for audiovisual stimuli regarding MSI using the “Probability Summation” index (Stevenson et al., 2014): (AVd > Ad + Vd − [Ad × Vd]). This index compares the probability of a multisensory response to the probability of the added unisensory responses, correcting for their joint probability. The Probability Summation index was tested against evidence of no change with a one‐sample Student's t test (two‐tailed, p < .05).
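A sketch of how this index and its test could be computed per participant is given below (the array contents are placeholders; in the study the index was applied to HDDM‐derived drift‐rate posteriors):

```python
import numpy as np
from scipy import stats

def probability_summation(av, a, v):
    """Multisensory benefit beyond statistical facilitation: positive values
    indicate AV responses exceed A + V - A*V (Stevenson et al., 2014)."""
    return av - (a + v - a * v)

# Placeholder per-subject values on a [0, 1] scale (38 participants)
rng = np.random.default_rng(1)
av_rates, a_rates, v_rates = rng.random((3, 38))

ps_index = probability_summation(av_rates, a_rates, v_rates)
t, p = stats.ttest_1samp(ps_index, 0.0)   # two-tailed test against no change
print(f"t(37) = {t:.2f}, p = {p:.3f}")
```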
2.2. Results
We first assessed whether there was a significant difference between the three conditions. The one‐way ANOVA showed a main effect of Condition on posterior drift rates (F[1.42, 52.42] = 48.03, p < .01, ηp² = 0.56). Pairwise tests of the main effect indicated that drift rates were higher in AVd compared with Ad (t[37] = 10.09, p < .001, d = 2.01) and compared with Vd (t[37] = 9.48, p < .001, d = 1.20). Drift rates were also higher in Vd compared with Ad (t[37] = 2.21, p = .03, d = 0.52). Figure 2 displays the density plots of the drift rate distributions within each condition.
Figure 2.

Left panel: Posterior drift rates derived from the HDDM in Experiment 1 are displayed for each of the three conditions. An ANOVA revealed significantly higher drift rates for AVd stimuli than for both other conditions (Ad and Vd). Right panel: Calculation of the Probability Summation. To account for statistical facilitation, the detection rate for bimodal (AVd) responses is compared to the sum of the rates for each of the unisensory presentations (Ad and Vd, respectively), less the probability that both the unisensory auditory and visual stimulus were detected on the same trial (Ad × Vd). A detection rate (P) that exceeds this predicted rate implies that information is being integrated (compare with Stevenson et al., 2014). Values displayed represent mean and SEM
We then assessed whether this difference was a demonstration of MSI rather than a mere additive summation effect by calculating the Probability Summation Index. The one‐sample t test on Probability Summation showed a significant effect (t[37] = 8.85, p < .001, d = 1.42), indicating MSI.
2.3. Discussion experiment 1
In Experiment 1, we applied a two‐step procedure to investigate MSI of degraded multisensory information: we created individually optimized degraded dynamic stimuli at individuals' perceptual thresholds and then established evidence for a response advantage on speeded responses within an HDDM analysis framework. Group‐derived drift rate measures from the HDDM demonstrated significant multisensory enhancement for degraded dynamic audio‐visual objects (AVd) per a conservative MSI criterion ("Probability Summation"). This means that drift rates in the AVd condition not only showed the fastest accumulation of evidence toward correct decisions, compared with both the Ad and Vd conditions, but also met a criterion that has traditionally been applied to accuracy data without considering the underlying RT distribution (Stevenson et al., 2014).
These results replicate and extend our previous findings from a nonspeeded response task (Regenbogen et al., 2016) to a speeded response task. HDDM‐derived drift rates, which account for potential speed‐accuracy trade‐offs and subject‐specific effects, can thus be generated and analyzed to test against conservative MSI criteria (e.g., Probability Summation) in 5‐AFCTs, and potentially n‐AFCTs, although the latter remains to be tested in future studies. Using speeded response tasks increases the ecological validity of MSI studies, since sensory integration processes and object identity decisions are more likely to be based on a spontaneous and optimally fast process rather than on a consciously made decision within a certain response window.
Although our degraded stimuli met the requirements for effective integration of sensory information as defined by exceedance of the Probability Summation (Meredith & Stein, 1983; Ross et al., 2007), a direct comparison to clear stimuli was absent in this study. A comparison between processing of degraded and clear information on a behavioral as well as neural level was included in Experiment 2. Here, we presented degraded as well as clearly perceivable stimuli while participants underwent an fMRI scanning procedure. This experiment allowed the investigation of differences in the neural correlates of MSI of sensory information between clear and degraded stimuli. Further, a connectivity analysis was conducted to contrast the engagement of the IPS in MSI of clear and easy, compared to degraded and difficult, multisensory stimuli.
3. EXPERIMENT 2
3.1. Material and methods
3.1.1. Participants
Thirty‐five healthy volunteers were recruited from a student population. One participant was excluded due to excessive head movement and five were excluded due to performance below chance level, which reduced the final sample size to 29 individuals (M age = 28 years, SD = 6.02, 15 females). Participants provided written informed consent and the study conformed to the Declaration of Helsinki and was approved by the regional ethics review board in Stockholm, Sweden.
3.1.2. Stimuli
Stimuli were identical to those used in Experiment 1.
3.1.3. Threshold (region‐of‐interest) session
Before the main experiment, participants underwent a separate scanning session serving two purposes: first, individual thresholds were assessed in the same environment as experienced during imaging and, second, we obtained regions of interest (ROIs) that were independent of the data acquired in the experiment session. As in Experiment 1, individual identification thresholds were assessed for each object and each participant using an adaptive staircase design and a 5AFCT. This was done within the MR scanner during two separate functional imaging sessions (auditory and visual threshold assessment, counter‐balanced order). We determined individual detection thresholds during scanning to account for scanner noise and the potential increase in stress that the scanner environment constitutes. Visual noise contribution to the stimulus ranged from 70% to a maximum of 98% in 15 steps of 2%. Auditory noise contribution to the stimulus ranged from a perceivable signal‐to‐noise ratio (SNR) of 15 to −20 dB in 15 steps of 2.5 dB. After six reversals, the threshold level was determined based on the participant's performance. Participants' mean visual threshold level was 12.74 ± 0.54 steps (values reported as mean ± SD), which corresponded to a masking degree of 93%. Participants' mean auditory threshold level was 8.87 ± 1.60 steps, which corresponded to an SNR of −4.68 dB. Auditory and visual degraded stimuli were saved as new files for the experiment; we also saved versions 1, 2, or 3 steps more or less difficult than this threshold in order to adapt to the participants' performance in the actual experiment.
3.1.4. MSI of bimodal information session
We presented degraded and clear auditory, visual, and audiovisual stimuli to participants while they were lying in the scanner (Figure 1). Participants were instructed to focus on the object and to respond only after the presentation, to avoid movement‐related artifacts in stimulus‐ and identification‐related brain activation. Each trial started with a black fixation cross for an average of 5.8 s (jittered between 3 and 9 s), followed by a 2 s stimulus presentation and a 2.5 s 5AFC task during which participants had to indicate the object identity out of five possible choices (the true answer, the three remaining object labels, and "nothing"). Response data comprised object identification accuracy and reaction time (button press of the respective response key).
Each participant was presented with 32 stimuli in each of the six conditions: degraded auditory and visual stimuli ("auditory degraded" [Ad] and "visual degraded" [Vd]) and their combination ("audiovisual degraded" [AVd]), clear auditory and visual stimuli ("auditory clear" [Ac] and "visual clear" [Vc]) and their combination ("audiovisual clear" [AVc]), as well as 32 control trials (NN), resulting in 224 stimuli in total. As in Experiment 1, all stimuli were bimodal regarding sensory input, but unisensory/multisensory regarding informativeness.
To ensure that the degraded stimuli were consistently presented at optimal task difficulty throughout the study, an adaptive paradigm was used: if participants correctly identified the same object three times in a row, a one‐step more difficult stimulus was displayed; if they responded incorrectly once, a one‐step easier stimulus was displayed. AVd presentations were presented at the level of the most recently correctly identified unisensory stimuli (Ad and Vd).
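A minimal illustration of this adaptive rule follows (a sketch under our reading of the procedure; class and attribute names are hypothetical):

```python
class DifficultyTracker:
    """Adaptive rule from the main experiment: three consecutive correct
    responses -> one step harder, any incorrect response -> one step easier
    (the step range is an assumption based on the saved +/- 3 step files)."""
    def __init__(self, start_step, n_steps):
        self.step, self.n_steps, self.hits = start_step, n_steps, 0

    def update(self, correct):
        if correct:
            self.hits += 1
            if self.hits == 3:                                # 3-in-a-row rule
                self.hits = 0
                self.step = min(self.step + 1, self.n_steps - 1)  # harder
        else:
            self.hits = 0
            self.step = max(self.step - 1, 0)                     # easier
        return self.step
```

Under this reading, each object category would hold its own tracker, so that difficulty adapts independently per object.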
3.1.5. Data analysis
Behavior
We analyzed single‐trial behavioral data (accuracy and RT) using a HDDM (Wiecki et al., 2013) with the same model specifications as reported in Experiment 1. HDDM calculations were quality‐controlled by visual inspection of the trace, autocorrelation, and the marginal posterior, as well as the Gelman–Rubin statistic (< 1.02) (Gelman & Rubin, 1992). All indices indicated successful convergence. HDDM‐derived group‐level posteriors were analyzed by means of a two‐factorial repeated‐measures ANOVA (within‐subject factors "Clarity" and "Condition"). Degrees of freedom were corrected using Greenhouse‐Geisser estimates where indicated by Mauchly's test. Post hoc two‐tailed paired t tests (p < .05) were Bonferroni‐corrected. Effect size estimates were given by partial eta‐squared (ηp² for ANOVAs) and by Cohen's d adjusted2 (for Student's t tests) (Cumming, 2012).
Drift‐rate posteriors for clear and degraded audiovisual stimuli, respectively, were tested separately for evidence of MSI using the “Probability Summation Index” (Stevenson et al., 2014) using one‐sample Student's t‐tests (two‐tailed, p < .05).
fMRI
Functional images were acquired during threshold assessment and during the main experiment using a T2*‐weighted, gradient‐echo, echoplanar imaging (EPI) sequence with BOLD contrast on a 3T GE (General Electric) 750 MR scanner using an eight‐channel head coil. Volumes had whole‐brain coverage (TE = 30 ms, TR = 2,238 ms, field of view = 220 × 220 mm, matrix size = 64 × 64, flip angle = 70°) and consisted of 47 axial slices (3 mm slice thickness, no gap) acquired in interleaved ascending order.
Data were analyzed using SPM8. Images underwent a two‐pass realignment procedure, in which the time‐series was first realigned to the mean functional image and then to the first image. The mean functional image was nonlinearly segmented and delivered the priors for a unified segmentation process (Ashburner & Friston, 2005). All images were normalized and spatial smoothing was performed with a 9‐mm full‐width‐at‐half‐maximum Gaussian kernel. Functional images acquired during the threshold/ROI session, and acquired during the main experiment in which MSI of bimodal information was assessed, were analyzed separately. The analysis of the threshold/ROI session was used to generate ROIs for the subsequent connectivity analysis (dynamic causal modeling [DCM]). Analyses of the main experiment were primarily used to investigate and compare degraded versus clear information on a neural level.
3.1.6. Threshold/ROI session
Separate general linear models (GLMs) were set up for visual and auditory information, respectively. In each GLM, regressors modeled the onset times of correct trials (with reaction times to correct trials as a parametric modulator), incorrect trials, and the period during which the 5AFCT was presented; six realignment parameters were included as regressors of no interest. Regressors were convolved with the canonical hemodynamic response function (HRF). A high‐pass filter with a cutoff of 128 s and a first‐order autoregressive model (AR[1]) were applied. From each GLM, individual simple main contrasts of the stimulus HRF regressors were submitted to a mixed‐effects GLM, which modeled subjects as random effects and conditions as fixed effects. Deviations from sphericity were corrected for by setting up variance components assuming heteroscedasticity.
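The original analysis was run in SPM8; purely as an illustration of this design, an equivalent first‐level design matrix could be built in Python with nilearn as follows (event timings and scan count are placeholders, not the study's actual values):

```python
import numpy as np
import pandas as pd
from nilearn.glm.first_level import make_first_level_design_matrix

TR, n_scans = 2.238, 400                      # n_scans is a placeholder
frame_times = np.arange(n_scans) * TR

# Hypothetical event table: correct trials (RT would enter separately as a
# parametric modulator), incorrect trials, and the 5AFCT response period
events = pd.DataFrame({
    'onset':      [10.0, 14.5, 30.2],
    'duration':   [2.0, 2.0, 2.5],
    'trial_type': ['correct', 'incorrect', '5AFCT'],
})

# SPM-style canonical HRF and a cosine drift model acting as a 128 s
# high-pass filter; realignment parameters could be appended via add_regs
design = make_first_level_design_matrix(
    frame_times, events, hrf_model='spm',
    drift_model='cosine', high_pass=1.0 / 128)
```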
The contrast “unisensory auditory” represented the simple main effect of correctly identified truly unisensory auditory stimuli across clear and degraded conditions. The contrast “unisensory visual” represented the simple main effect of correctly identified truly unisensory visual stimuli across clear and degraded conditions. Statistical parametrical maps were thresholded at a peak level threshold of p < .05 and corrected for multiple comparisons using family‐wise error correction based on random field theory.
3.1.7. MSI of bimodal information session
A GLM was set up with regressors corresponding to the onset times of correct trials of each experimental condition (AVd, Vd, Ad, AVc, Vc, Ac). Two additional regressors modeling the onset times of incorrect trials and the period during which the 5AFCT was presented, as well as six realignment parameters, were included as regressors of no interest. Regressors were convolved with the canonical HRF. A high‐pass filter with a cutoff of 128 s and a first‐order autoregressive model (AR[1]) were used. Individual simple main contrasts were submitted to a mixed‐effects GLM modeling subjects as random effects and conditions as fixed effects. Deviations from sphericity were corrected for by setting up variance components assuming heteroscedasticity.
Two contrasts were set up to address our principal research interest, the neural correlates of MSI of degraded bimodal information. The contrast “degraded > clear” compared all trials from degraded conditions (AVd, Ad, Vd) against all trials from clear conditions (AVc, Ac, Vc), the contrast “clear > degraded” was the reverse comparison (AVc, Ac, Vc > AVd, Ad, Vd). The resulting statistical parametrical maps were thresholded at p < .05 at the peak‐level and corrected for multiple comparisons using family‐wise error correction based on random field theory within the SPM program environment.
3.1.8. Dynamic causal modeling
Our next aim was to establish a directional model of cortical information exchange underlying the processing of degraded and clear multisensory information. To this end, condition‐dependent changes in effective connectivity among brain regions involved in processing MSI‐related information were estimated with DCM10. Relevant ROIs were selected based on activation peaks for correctly identified visual and auditory stimuli during the unisensory threshold/ROI session (VIS [MNI coordinates: x = 18, y = −93, z = −12], AUD [MNI coordinates: x = 69, y = −21, z = 3]). These ROIs (6‐mm spheres around the local maxima in the respective sensory cortices) were chosen to represent the main effect of individually optimized degraded information from true unisensory auditory and visual stimuli, respectively (Table 1).
Table 1.
Stereotaxic coordinates (MNI space) of local maxima of BOLD activation for "unisensory visual", corresponding to the main effect of correctly identified video clips, and "unisensory auditory", corresponding to the main effect of correctly identified audio clips (t‐contrasts from a random‐effects GLM, T > 4.74, peak‐level FWE corrected p < .05)
| Contrast | Brain region/Anatomy | Hem | Size | T | p FWE | x | y | z |
|---|---|---|---|---|---|---|---|---|
| Unisensory visual | Lingual/middle occipital gyrus | R/L | 10,971 | 14.84 | <.001 | −33 | −81 | 21 |
| | Orbitofrontal gyrus | R/L | 193 | 9.72 | <.001 | 33 | 27 | −6 |
| | Thalamus | R/L | 188 | 6.18 | .004 | −9 | −21 | −9 |
| Unisensory auditory | TE 3 (Auditory cortex) | R/L | 12,290 | 16.83 | <.001 | 69 | −21 | 3 |
| | Inferior occipital gyrus (hOc1/2/3v/4v) | R/L | 338 | 9.53 | <.001 | −30 | −87 | −12 |
| | Lingual gyrus (hOc1/2/3v/4v) | R/L | 161 | 8.33 | <.001 | 18 | −96 | −9 |
| | Thalamus | R/L | 420 | 7.93 | <.001 | −12 | −15 | 3 |
| | Brainstem | R | 14 | 5.47 | .003 | 3 | −30 | −42 |
Hem, Hemisphere, with bold letters indicating the hemisphere of peak activation in bilateral activations (R/L). Anatomy, probabilistic cytoarchitectonic maps for structure‐function relationships in standard reference space were assigned using the Anatomy toolbox (Eickhoff et al., 2005) (for labeling please refer to Eickhoff et al., 2007)
To investigate how the IPS communicates with visual and auditory cortices at different levels of clarity, activation in the IPS (MNI coordinates: x = 27, y = −60, z = 48) was extracted based on the local maximum in the IPS in the contrast "degraded > clear" from the main experiment (MSI of bimodal information) for each subject (6‐mm sphere around the individual peak, adjusted by the effects of interest across all sessions).
To optimize the estimation of the coupling parameters and to follow the modeling constraints of DCM, data were organized into four condition types, performed within each session: one “driving input” (all trials, 2 s duration), two “stimulus/modulatory inputs” (AVd and AVc trials, 2 s duration), and one condition of no interest (5AFCT period, 5 s duration). Realignment parameters were included as nuisance regressors. Model estimation was identical to the GLM models described above. Six different single‐state models (Figure 3), including linear as well as nonlinear effects, were set up, representing different assumptions on the integration of the selected regions in an easy (clear multisensory) compared with a difficult (degraded multisensory) condition. An identical pattern of full intrinsic connectivity (A‐matrix, see Supplementary Material) was assumed for all six models. Forward and backward endogenous connections were set between AUD, VIS, and IPS. Driving inputs (all audiovisual events) were set to modulate neuronal activity in AUD and VIS.
Figure 3.

DCM model space. Depicted are driving inputs, linear direct and modulatory effects as well as nonlinear modulatory effects (B‐, C‐, and D‐matrix). For depiction of intrinsic connectivity matrix (A‐matrix) please refer to the Supporting Information
Linear modulatory effects (B‐matrix) and linear direct effects (C‐matrix) were specified as follows: inputs from degraded trials (AVd) were always included in the model structure, either as direct inputs onto IPS (Models 1–3) or as modulatory inputs onto the reciprocal connection between AUD and VIS (Models 4–6). Inputs from clear trials (AVc) were included either as direct inputs onto IPS (Models 2 and 6) or as modulatory inputs onto the reciprocal connection between AUD and VIS (Models 3 and 5), or were left out (Models 1 and 4). Nonlinear modulatory effects (D‐matrix) were modeled as a top‐down effect from IPS on the connection between AUD and VIS in four models (Models 1, 2, 3, and 6). These nonlinear modulatory effects allow inference on how the connection between two neuronal units (i.e., AUD and VIS) is enabled by activity in another unit (i.e., IPS) (Stephan et al., 2008).
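As a conceptual illustration of how such a specification translates into DCM matrices, the following numpy sketch encodes Model 3 under SPM's conventions (the region and input ordering are our assumptions; the actual analysis used DCM10 in SPM):

```python
import numpy as np

# Node order: 0 = AUD, 1 = VIS, 2 = IPS; input order: 0 = driving (all
# audiovisual events), 1 = AVd (degraded), 2 = AVc (clear).
# SPM convention: entry [i, j] denotes a connection from region j to region i.

# A-matrix: full intrinsic connectivity between all three regions
A = np.ones((3, 3))

# C-matrix: the driving input enters AUD and VIS; in Model 3 degraded
# trials (AVd) additionally provide a direct input onto IPS
C = np.array([[1, 0, 0],    # AUD
              [1, 0, 0],    # VIS
              [0, 1, 0]])   # IPS

# B-matrix: in Model 3, clear trials (AVc, input 2) linearly modulate the
# reciprocal AUD <-> VIS connections (one 3x3 slice per input)
B = np.zeros((3, 3, 3))
B[0, 1, 2] = B[1, 0, 2] = 1

# D-matrix: nonlinear modulation -- activity in IPS (region 2) gates the
# reciprocal AUD <-> VIS connections (one 3x3 slice per region)
D = np.zeros((3, 3, 3))
D[0, 1, 2] = D[1, 0, 2] = 1
```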
Based on the assumption that the physiological mechanisms underlying MSI are a basic governing principle of human brain function, fixed‐effects Bayesian model selection (BMS) was used to identify the most probable model given the data, among the models defined in the model space (Stephan, Penny, Daunizeau, Moran, & Friston, 2009). For the winning model, Bayesian Parameter Averaging (BPA) was calculated for each coupling parameter.
3.2. Results
3.2.1. Behavior
We first assessed whether there was a difference in behavioral measures between the various conditions. The two‐way ANOVA showed a main effect for both Clarity (F[1,28] = 303.77, p < .001, ηp² = 0.92) and Condition (F[2,56] = 143.77, p < .001, ηp² = 0.84), as well as a significant interaction between the two factors (F[2,56] = 33.44, p < .001, ηp² = 0.54). Pairwise tests of the interaction indicated that for both clear and degraded stimuli, the audiovisual condition displayed significantly higher drift rates than the auditory condition (AVc > Ac: t[28] = 11.15, p < .001, d = 2.06; AVd > Ad: t[28] = 12.22, p < .001, d = 2.57) and than the visual condition (AVc > Vc: t[28] = 3.62, p = .001, d = 0.45; AVd > Vd: t[28] = 11.41, p < .001, d = 2.78). Drift rates in the auditory and visual conditions differed significantly for clear stimuli, with higher drift rates in the visual condition (Vc > Ac: t[28] = 8.58, p < .001, d = 1.66), but this comparison did not survive Bonferroni correction for degraded stimuli (Vd > Ad: t[28] = 2.32, p = .3, d = 0.54) (Figure 4).
Figure 4.

Results of the ANOVA testing Clarity‐ and Condition‐dependent drift rate differences between the stimuli
As in Experiment 1, we then assessed whether these differences were demonstrations of MSI. Two t tests on Probability Summation showed a significant effect for clear stimuli (t[28] = 10.12, p < .001, d = 1.87) as well as for degraded stimuli (t[28] = 11.09, p < .001, d = 2.05), indicating MSI effects for both clear and degraded stimuli.
3.2.2. Functional imaging
MSI based on clear and degraded percepts
Our first aim was to identify cortical processing sites for the individually optimized degraded multisensory information. The contrast "degraded > clear" yielded significant activation in a bilateral frontoparietal network including superior and inferior frontal gyri, precentral gyrus, IPS, and neighboring parietal gyri, as well as cingulate cortex, hippocampus, and cerebellum (Table 2, Figure 5, top). The opposite contrast, "clear > degraded", yielded significant activation in widespread visual cortex (middle occipital gyrus, lingual, and fusiform gyrus), in the angular gyrus, superior frontal, and inferior temporal gyrus (Table 2, Figure 5, bottom).
Table 2.
Stereotaxic coordinates (MNI space) of local maxima of BOLD activation to “degraded > clear” and “clear > degraded” (t‐contrasts from a random‐effects GLM, T > 4.60, peak‐level FWE corrected p < .05)
| Contrast | Brain region/Anatomy | Hem | Size | T | p FWE | x | y | z |
|---|---|---|---|---|---|---|---|---|
| degraded > clear | Superior frontal gyrus | R | 94 | 6.47 | <.001 | 27 | −3 | 54 |
| | IFG, Premotor cortex, Thalamus, BG | R | 3,738 | 14.10 | <.001 | 33 | 24 | −9 |
| | IPS | R | 648 | 9.30 | <.001 | 27 | −63 | 48 |
| | IPS | L | 631 | 8.85 | <.001 | −33 | −51 | 45 |
| | MCC/PCC | – | 158 | 7.81 | <.001 | 0 | −24 | 27 |
| | dmPFC, MCC | – | 898 | 12.54 | <.001 | 0 | 21 | 45 |
| | Cerebellum (Lob VIIa and b) | L | 693 | 11.15 | <.001 | −9 | −78 | −39 |
| | Cerebellum (Lob VIIa and b) | R | 146 | 7.76 | <.001 | 36 | −57 | −36 |
| | Cerebellum (Lob VIIa, VI) | L | 76 | 5.86 | <.001 | −33 | −63 | −33 |
| | Cerebellum (Lob VIIIa and b, X, IX, VI) | R | 24 | 5.31 | .003 | 18 | −45 | −48 |
| | Brainstem (Pons) | R | 26 | 5.92 | <.001 | 3 | −33 | −45 |
| clear > degraded | MFG, Fp0,1,2, Superior medial gyrus | R/L | 891 | 8.21 | <.001 | −30 | 27 | 39 |
| | MFG | R | 16 | 5.05 | .008 | 33 | 21 | 48 |
| | Angular gyrus | L | 332 | 8.87 | <.001 | 45 | −63 | 24 |
| | Inferior temporal gyrus | L | 109 | 6.92 | <.001 | −66 | −18 | −27 |
| | Inferior temporal gyrus | R | 92 | 7.38 | <.001 | 60 | −6 | −30 |
| | PCC | L | 176 | 6.86 | <.001 | −3 | −51 | 33 |
| | Lingual/fusiform gyrus/MOG/SUP | L | 1,861 | 12.31 | <.001 | −21 | −87 | −15 |
| | Lingual/fusiform gyrus/MOG/SUP | R | 1,005 | 12.10 | <.001 | 21 | −87 | −12 |
Anatomy, probabilistic cytoarchitectonic maps for structure‐function relationships in standard reference space were assigned using the Anatomy toolbox (Eickhoff et al., 2005) (for labeling please refer to Eickhoff et al., 2007). Hem, Hemisphere, with bold letters indicating the hemisphere of peak activation in bilateral activations (R/L); IFG, inferior frontal gyrus; BG, basal ganglia; IPS, intraparietal sulcus; MCC, middle cingulate cortex; PCC, posterior cingulate cortex; dmPFC, dorsomedial prefrontal cortex; MFG, middle frontal gyrus; MOG, middle occipital gyrus; SUP, superior occipital gyrus
Figure 5.

Whole‐brain activation to "degraded > clear" (top) and to "clear > degraded" (bottom), t‐contrasts from a random‐effects GLM, displayed in neurological convention at T > 4.60, peak‐level FWE corrected p < .05, extent threshold > 10 voxels, color bar depicting t‐values of local maxima peak activation. IFG, inferior frontal gyrus; BG, basal ganglia; dmPFC, dorsomedial prefrontal cortex; MCC/PCC, middle/posterior cingulate cortex; SFG, superior frontal gyrus; IPS, intraparietal sulcus; AG, angular gyrus; MFG, middle frontal gyrus; MOG, middle occipital gyrus; ITG, inferior temporal gyrus; FFG, fusiform gyrus. For a complete list of all activations please refer to Table 2
Dynamic causal modeling
The DCM analysis, establishing a directional model of cortical information exchange underlying the perception of easily perceivable (clear) and difficult to perceive (degraded) multisensory information, revealed the following results: fixed‐effects BMS indicated Model 3 to be the most likely model, by both the relative log‐evidence and the posterior probability (Figure 6). Model 3 comprised a linear input from degraded trials (AVd) directly onto IPS, and two types of modulation of the reciprocal connection between AUD and VIS: a nonlinear modulation from IPS and a linear modulation from clear trials (AVc).
Figure 6.

(a) Display of coupling regions selected for DCM in IPS, auditory cortex (AUD), and visual cortex (VIS). (b) Winning model graph from DCM BMS. (c) Winning Model 3 with mean coupling parameters based on Bayesian Parameter Averaging
In other words, in this model, degraded information, which is more difficult, modulates the connection between the primary unisensory processing sites by means of a nonlinear top‐down modulation from IPS onto the connection between the two sensory modalities, while clear, and easily perceivable, information modulates the connection between AUD and VIS linearly and directly, independent of IPS.
3.3. Discussion experiment 2
We applied a two‐step procedure to investigate MSI on a combined measure of accuracy and RT for individually optimized degraded perithreshold, relative to clear, dynamic stimuli conveying multisensory information. Replicating the results of Experiment 1, group‐derived posterior drift rate measures demonstrated a significant multisensory enhancement for naturalistic and dynamic audio‐visual objects, for degraded as well as clear stimuli. "Probability Summation" also supported the presence of MSI of the information for both degraded and clear stimuli. As can be seen in Figure 4, the drift rate difference between multisensory and auditory conditions was comparable for clear and degraded stimuli (AVd − Ad = 1.95, AVc − Ac = 1.49). In contrast, there was a significant effect of degradedness on the drift rate difference between multisensory and visual conditions: while there was a large gap in drift rate between degraded multisensory and visual stimuli (AVd − Vd = 1.68), the difference between clear multisensory and visual stimuli (AVc − Vc = 0.31) was smaller, yet still significant. That drift rates of visual clear stimuli were almost as large as those of multisensory clear stimuli (Figure 4) supports the theorized visual dominance among the senses (Posner, Nissen, & Klein, 1976) and the relatively small benefit that participants may have gained from clear multisensory compared to clear unisensory visual information. In contrast, for degraded stimuli, visual dominance could not be detected, and drift rates of auditory and visual unisensory stimuli were not significantly different from each other, leading us to assume that the degrading of stimuli negates (or even reverses, as in Alais & Burr, 2004) visual dominance. This indicates that participants' multisensory benefit is larger for degraded stimuli than for clear ones and supports the use of perceptually degraded, and thus more difficult, stimuli to follow the implications of the principle of inverse effectiveness in MSI studies (Ozker, Schepers, Magnotti, Yoshor, & Beauchamp, 2017; Regenbogen et al., 2016; Ross et al., 2007; Werner & Noppeney, 2010a, 2010b). It also suggests that the extent to which MSI arises may differ depending on the respective sensory modalities involved. This is especially important for fMRI studies, which, unless a sparse‐sampling technique is applied, suffer from degradation of even the clear auditory stimuli, while visual stimuli remain unaffected by the scanner environment.
The processing advantage for clear visual stimuli was also reflected on a neural level. The contrast of clear compared to degraded dynamic natural video clips (AcVc > AdVd) showed higher activation in bilateral lingual and fusiform gyri, middle occipital gyri, as well as inferior temporal and middle/superior frontal gyri, which correspond to the ventrolateral visual stream (Smith et al., 2009). The angular gyri are involved in visuospatial attention (Cattaneo, Silvanto, Pascual‐Leone, & Battelli, 2009) but also serve as a cross‐modal hub linking different subsystems, structurally (Hagmann et al., 2008) as well as functionally (Rademacher, Galaburda, Kennedy, Filipek, & Caviness, 1992). Here, bilateral angular gyrus activation together with a robust visual network and inferior temporal gyrus involvement may represent the neural correlate of clearly perceivable dynamic multisensory information, including preparation for the decision that needs to be made on the object's identity. In other words, the observed response may facilitate the process of giving sense and meaning to an event within a contextualized environment, based on prior expectations and knowledge, and toward an intended action (Seghier, 2013).
The reverse contrast of degraded greater than clear dynamic natural video clips (AdVd > AcVc) yielded bilateral activation in a more medially located frontoparietal network of core areas important for converging different sensory information (Bremmer et al., 2001), including IPS and neighboring parietal cortex, precentral cortex, inferior and middle frontal gyri, cingulate cortex, and left hippocampus. Both clear and degraded dynamic stimuli therefore activate multisensory processing networks, with a distinct ventrolateral, visually dominated network for clear multisensory information and a more medially located frontoparietal network for degraded multisensory stimuli, thus possibly representing different aspects of multisensory processing.
Given the frequently demonstrated crucial role of the IPS for sensory integration (Bremmer et al., 2001; Calvert et al., 2000; Driver & Noesselt, 2008; Grefkes & Fink, 2005), we specifically tested its impact on information exchange between earlier sensory regions (i.e., auditory and visual cortex) and how this differed between clear and degraded stimulus presentations. To achieve this, we established a model space of six possible effective connectivity patterns and used a Bayesian modeling approach (DCM) to establish which model most appropriately represented the data. The winning model (Model 3) assumed direct linear input to the IPS for degraded multisensory information, and a nonlinear top‐down modulation from IPS onto the connections between auditory and visual regions, which may represent a mechanism for visual enhancement of sound processing, as speculated in a study that found somatosensory‐auditory interactions at low stimulus intensities ("high‐excitability") (Lakatos, Chen, O'Connell, Mills, & Schroeder, 2007). Clear multisensory trials, however, directly and linearly fed onto the connection between these two sensory areas without any evidence for IPS modulation. In other words, clear information directly enhanced bidirectional information exchange between visual and auditory cortices (with a bias towards information flow from auditory to visual areas, in line with previous evidence of robust auditory influences on visual cortices, e.g., Mercier et al., 2013), whereas degraded information was fed through the IPS, which modulated the auditory‐visual connection. While clear multisensory stimuli are thus processed through direct information exchange between visual and auditory sensory cortices, possibly by way of oscillatory synchronization through phase alignment (Mercier et al., 2015), degraded information requires the IPS to act as a relay regulating this inter‐regional communication. This interpretation is in line with previous work highlighting the IPS' increasing involvement with task difficulty (Basten et al., 2010; Hare et al., 2011) and extends it to explain the IPS' role in MSI. This explanation is also in line with several event‐related potential studies suggesting reallocation of attentional resources in case of crossmodal cognitive overflow (Haroush et al., 2011; Lavie, 2005; Regenbogen et al., 2012). Given that degraded stimuli require more cognitive resources than clear stimuli to be identified, it seems plausible that additional attentional resources in the form of a top‐down dorsal attention network are allocated to the task of integrating visual and auditory stimuli. The IPS is usually implicated as a central mediator of such changes in attentional allocation (Tang et al., 2016). This activation may be adaptive for the successful completion of MSI under noisy stimulus conditions by providing a mechanism that protects the task at hand from interference by irrelevant distraction. Such an account would be in line with attentional load theory (Lavie, 2005), which posits that high perceptual load (present, in our case, in the degraded stimuli) protects from distraction through recruitment of a higher‐order attentional network. Although our study did not explicitly test whether IPS recruitment was linked to more interference‐resistant processing, and our interpretations therefore remain speculative, future studies should investigate this as a possible functional interpretation of the activity changes observed.
One critical aspect to discuss is the absence of activation in superior temporal regions, that is, the STS and the superior temporal gyrus, both structures repeatedly reported in audiovisual integration studies (ROI analysis: Stevenson & James, 2009; whole‐brain analysis: Werner & Noppeney, 2010b). However, these findings often correspond to "superadditivity" (Stevenson & James, 2009), which requires a direct comparison of multisensory stimuli to the sum of their unisensory components, not a direct comparison between two different types of multisensory stimuli, as done here. As both our unisensory stimuli already contained input in the respective other modality (auditory pink noise and visual salt‐and‐pepper noise, respectively), we do not report this type of comparison here. Indeed, Werner and Noppeney (2010b), who reported this "superadditive" activation pattern, subsequently failed to detect suprathreshold activation in the STS on a whole‐brain or ROI level when degraded or clear stimuli were accompanied by high‐level noise in the respective other modality (Werner & Noppeney, 2010b).
4. GENERAL DISCUSSION AND CONCLUSION
In two independent experiments, we demonstrated that MSI of bimodal object‐related information in higher‐order object identification is present for dynamic and complex sensory stimuli with high ecological relevance. Half of these stimuli were tailored to perithreshold identification of the individual participant (degraded), and all stimuli (degraded and clear) allowed us to control for low‐level integration and attention effects by adding noise to the respective other channel in unisensory stimuli. Across both experiments, we used a Bayesian statistical approach on traditional behavioral indices of MSI to demonstrate that individually optimized degraded and dynamic stimuli yield a significant benefit of sensory integration. The estimated posterior probability of drift rates for multisensory dynamic stimuli significantly exceeded the drift rates for unisensory ones, demonstrating information integration across sensory modalities while controlling for sensory input. This was indicated by the statistically significant "Probability Summation", evident for both clear and degraded stimuli.
Experiment 2 demonstrated that MSI of clear and effortlessly perceivable stimuli was characterized by strong visual dominance, on both a behavioral and a neural level. A brain network of ventrolateral cortical regions (visual cortex, fusiform gyrus, angular gyrus, superior frontal, and inferior temporal gyrus) was responsible for coding clear compared to degraded dynamic auditory, visual, and audiovisual percepts. We interpret this pattern as evidence for a visually dominated perceptual system, which extracts information mainly from the visual modality when this information is clear enough to be effortlessly perceived. As further evident in the behavioral data, where drift rates for visual clear stimuli were almost as large as those observed for multisensory clear stimuli, the visual signal alone provided enough information for the participant to identify the object. Correspondingly, the effective connectivity analysis showed that visual and auditory cortices directly exchange information (and, speculatively, subsequently feed forward to higher‐order areas) without involving afferent information from the IPS as a central crossmodal integration hub.
In contrast to the patterns observed for clear stimuli, the strong drift‐rate benefit observed for degraded multisensory compared to both visual and auditory unisensory stimuli on a behavioral level suggests superadditive sensory integration as the driving force for this activation pattern. This stronger reliance on sensory integration under conditions of perceptual uncertainty is expected based on the principle of inverse effectiveness, which states that MSI becomes more relevant as less information can be extracted from a single modality alone. Correspondingly, we were able to show that degraded, and thus difficult, stimuli activated an extended frontoparietal network relative to clear stimuli. This network not only included "classical" integration areas, such as the IPS, premotor, and inferior frontal cortex, but also additional sites of supramodal processing, such as the hippocampus. The presentation of degraded multisensory stimuli was further associated with a stronger involvement of bilateral IPS in the connectivity analysis, which indicated a crucial role for this region in mediating information exchange between visual and auditory cortices to reach a decision on the object's identity, possibly via a top‐down control mechanism. This contrasted with the pattern under clear perceptual conditions, where no such involvement was observed.
In summary, these findings suggest an enhanced recruitment of supramodal higher‐order cortical structures that regulate exchange between early perceptual areas under conditions of higher task difficulty. Whether the same principle also applies to other "core" areas of MSI (e.g., STG, premotor cortex) should be tested in future studies. Our two experiments further provide evidence that degraded, and thus difficult, sensory stimuli can serve as a tool to induce overt behavioral responses conforming to the multisensory principle of inverse effectiveness. Our results demonstrate that such tasks not only yield stronger evidence for MSI, but also recruit a fundamentally different network to accomplish sensory integration, one which involves the IPS as a necessary relay for bimodal sensory information exchange.
Supporting information
Additional Supporting Information may be found online in the supporting information tab for this article.
ACKNOWLEDGMENTS
This research was supported by the Knut and Alice Wallenberg Foundation (KAW 2012.0141, J.L.), the Swedish Research Council (2014‐1346, J.N.L. and 2014‐1384, J.S.), as well as a DAAD postdoctoral fellowship and a grant by the Else Kröner‐Fresenius Stiftung (2014_A273, C.R.). We thank Jonathan Berrebi and Rouslan Sitnikov for their help in optimizing the scan protocol and stimulus presentation, and Thilo Kellermann for assistance with the DCM analyses.
Regenbogen C, Seubert J, Johansson E, Finkelmeyer A, Andersson P, Lundström JN. The intraparietal sulcus governs multisensory integration of audiovisual information based on task difficulty. Hum Brain Mapp. 2018;39:1313–1326. 10.1002/hbm.23918
Funding information Knut and Alice Wallenberg Foundation, Grant/Award Number: KAW 2012.0141; the Swedish Research Council, Grant/Award Number: 2014‐1346 and 2014‐1384; Else Kröner‐Fresenius Stiftung, Grant/Award Number: 2014_A273
Footnotes
For dependent two‐sample t tests: , and for one‐sample t tests: .
Contributor Information
Christina Regenbogen, Email: cregenbogen@ukaachen.de.
Johan N. Lundström, Email: johan.lundstrom@ki.se.
REFERENCES
- Alais, D., & Burr, D. (2004). The ventriloquist effect results from near‐optimal bimodal integration. Current Biology, 14, 257–262.
- Ashburner, J., & Friston, K. J. (2005). Unified segmentation. NeuroImage, 26, 839–851.
- Basten, U., Biele, G., Heekeren, H. R., & Fiebach, C. J. (2010). How the brain integrates costs and benefits during decision making. Proceedings of the National Academy of Sciences of the United States of America, 107, 21767–21772.
- Bremmer, F., Schlack, A., Shah, N. J., Zafiris, O., Kubischik, M., Hoffmann, K.‐P., … Fink, G. R. (2001). Polymodal motion processing in posterior parietal and premotor cortex: A human fMRI study strongly implies equivalencies between humans and monkeys. Neuron, 29, 287–296.
- Calvert, G. A., Campbell, R., & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649–657.
- Cattaneo, Z., Silvanto, J., Pascual‐Leone, A., & Battelli, L. (2009). The role of the angular gyrus in the modulation of visuospatial attention by the mental number line. NeuroImage, 44, 563–568.
- Chandrasekaran, B., Chan, A. H., & Wong, P. C. (2011). Neural processing of what and who information in speech. Journal of Cognitive Neuroscience, 23, 2690–2700.
- Chen, Y. C., & Spence, C. (2010). When hearing the bark helps to identify the dog: Semantically‐congruent sounds modulate the identification of masked pictures. Cognition, 114(3), 389–404.
- Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta‐analysis. Abingdon, UK: Routledge.
- Driver, J., & Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on 'sensory‐specific' brain regions, neural responses, and judgments. Neuron, 57, 11–23.
- Eickhoff, S. B., Paus, T., Caspers, S., Grosbras, M. H., Evans, A. C., Zilles, K., & Amunts, K. (2007). Assignment of functional activations to probabilistic cytoarchitectonic areas revisited. NeuroImage, 36, 511–521.
- Eickhoff, S. B., Stephan, K. E., Mohlberg, H., Grefkes, C., Fink, G. R., Amunts, K., & Zilles, K. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage, 25, 1325–1335.
- Gamerman, D., & Lopes, H. F. (2006). Markov chain Monte Carlo: Stochastic simulation for Bayesian inference. Boca Raton, FL: CRC Press.
- Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472.
- Gleiss, S., & Kayser, C. (2014a). Oscillatory mechanisms underlying the enhancement of visual motion perception by multisensory congruency. Neuropsychologia, 53, 84–93.
- Gleiss, S., & Kayser, C. (2014b). Acoustic noise improves visual perception and modulates occipital oscillatory states. Journal of Cognitive Neuroscience, 26, 699–711.
- Grefkes, C., & Fink, G. R. (2005). The functional organization of the intraparietal sulcus in humans and monkeys. Journal of Anatomy, 207, 3–17.
- Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C. J., Wedeen, V. J., & Sporns, O. (2008). Mapping the structural core of human cerebral cortex. PLoS Biology, 6, e159.
- Hare, T. A., Schultz, W., Camerer, C. F., O'Doherty, J. P., & Rangel, A. (2011). Transformation of stimulus value signals into motor commands during simple choice. Proceedings of the National Academy of Sciences of the United States of America, 108, 18120–18125.
- Haroush, K., Deouell, L. Y., & Hochstein, S. (2011). Hearing while blinking: Multisensory attentional blink revisited. Journal of Neuroscience, 31, 922–927.
- Hein, G., Doehrmann, O., Muller, N. G., Kaiser, J., Muckli, L., & Naumer, M. J. (2007). Object familiarity and semantic congruency modulate responses in cortical audiovisual integration areas. Journal of Neuroscience, 27(30), 7881–7887.
- Kayser, C., Petkov, C. I., Augath, M., & Logothetis, N. K. (2005). Integration of touch and sound in auditory cortex. Neuron, 48(2), 373–384.
- Kayser, S. J., Philiastides, M. G., & Kayser, C. (2017). Sounds facilitate visual motion discrimination via the enhancement of late occipital visual representations. NeuroImage, 148, 31–41.
- Kording, K. P., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception. PLoS One, 2, e943.
- Lakatos, P., Chen, C.‐M., O'Connell, M. N., Mills, A., & Schroeder, C. E. (2007). Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron, 53, 279–292.
- Lavie, N. (2005). Distracted and confused? Selective attention under load. Trends in Cognitive Sciences, 9, 75–82.
- Mercier, M. R., Foxe, J. J., Fiebelkorn, I. C., Butler, J. S., Schwartz, T. H., & Molholm, S. (2013). Auditory‐driven phase reset in visual cortex: Human electrocorticography reveals mechanisms of early multisensory integration. NeuroImage, 79, 19–29.
- Mercier, M. R., Molholm, S., Fiebelkorn, I. C., Butler, J. S., Schwartz, T. H., & Foxe, J. J. (2015). Neuro‐oscillatory phase alignment drives speeded multisensory response times: An electro‐corticographic investigation. The Journal of Neuroscience, 35, 8546–8557.
- Meredith, M. A., & Stein, B. E. (1983). Interactions among converging sensory inputs in the superior colliculus. Science, 221, 389–391.
- Meredith, M. A., & Stein, B. E. (1986). Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology, 56, 640–662.
- Navarro, D. J., & Fuss, I. G. (2009). Fast and accurate calculations for first‐passage times in Wiener diffusion models. Journal of Mathematical Psychology, 53, 222–230.
- Nilsson, H., Rieskamp, J., & Wagenmakers, E.‐J. (2011). Hierarchical Bayesian parameter estimation for cumulative prospect theory. Journal of Mathematical Psychology, 55, 84–93.
- Ohla, K., Hochenberger, R., Freiherr, J., & Lundstrom, J. N. (in press). Super‐ and subadditive neural processing of dynamic auditory‐visual objects in the presence of congruent odors. Chemical Senses. 10.1093/chemse/bjx068
- Ozker, M., Schepers, I. M., Magnotti, J. F., Yoshor, D., & Beauchamp, M. S. (2017). A double dissociation between anterior and posterior superior temporal gyrus for processing audiovisual speech demonstrated by electrocorticography. Journal of Cognitive Neuroscience, 29, 1044–1060.
- Perez, F., & Granger, B. E. (2007). IPython: A system for interactive scientific computing. Computing in Science & Engineering, 9, 21–29.
- Posner, M. I., Nissen, M. J., & Klein, R. M. (1976). Visual dominance: An information‐processing account of its origins and significance. Psychological Review, 83, 157–171.
- Rademacher, J., Galaburda, A. M., Kennedy, D. N., Filipek, P. A., & Caviness, V. S., Jr. (1992). Human cerebral cortex: Localization, parcellation, and morphometry with magnetic resonance imaging. Journal of Cognitive Neuroscience, 4, 352–374.
- Regenbogen, C., Johansson, E., Andersson, P., Olsson, M. J., & Lundstrom, J. N. (2016). Bayesian‐based integration of multisensory naturalistic perithreshold stimuli. Neuropsychologia, 88, 123–130.
- Rohe, T., & Noppeney, U. (2012). Intraparietal sulcus represents audiovisual space. In Bernstein Conference 2012.
- Rohe, T., & Noppeney, U. (2015). Cortical hierarchies perform Bayesian causal inference in multisensory perception. PLoS Biology, 13, e1002073.
- Ross, L. A., Saint‐Amour, D., Leavitt, V. M., Javitt, D. C., & Foxe, J. J. (2007). Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex, 17, 1147–1153.
- Schepers, I. M., Schneider, T. R., Hipp, J. F., Engel, A. K., & Senkowski, D. (2013). Noise alters beta‐band activity in superior temporal cortex during audiovisual speech processing. NeuroImage, 70, 101–112.
- Seghier, M. L. (2013). The angular gyrus: Multiple functions and multiple subdivisions. The Neuroscientist, 19, 43–61.
- Seilheimer, R. L., Rosenberg, A., & Angelaki, D. E. (2014). Models and processes of multisensory cue combination. Current Opinion in Neurobiology, 25, 38–46.
- Senkowski, D., Saint‐Amour, D., Hofle, M., & Foxe, J. J. (2011). Multisensory interactions in early evoked brain activity follow the principle of inverse effectiveness. NeuroImage, 56(4), 2200–2208.
- Shiffrin, R. M., Lee, M. D., Kim, W., & Wagenmakers, E. J. (2008). A survey of model evaluation approaches with a tutorial on hierarchical Bayesian methods. Cognitive Science, 32, 1248–1284.
- Slutsky, D. A., & Recanzone, G. H. (2001). Temporal and spatial dependency of the ventriloquism effect. Neuroreport, 12(1), 7–10.
- Smith, S. M., Fox, P. T., Miller, K. L., Glahn, D. C., Fox, P. M., Mackay, C. E., … Beckmann, C. F. (2009). Correspondence of the brain's functional architecture during activation and rest. Proceedings of the National Academy of Sciences of the United States of America, 106, 13040–13045.
- Stephan, K. E., Kasper, L., Harrison, L. M., Daunizeau, J., den Ouden, H. E. M., Breakspear, M., & Friston, K. J. (2008). Nonlinear dynamic causal models for fMRI. NeuroImage, 42, 649–662.
- Stephan, K. E., Penny, W. D., Daunizeau, J., Moran, R. J., & Friston, K. J. (2009). Bayesian model selection for group studies. NeuroImage, 46, 1004–1017.
- Stevenson, R. A., Geoghegan, M. L., & James, T. W. (2007). Superadditive BOLD activation in superior temporal sulcus with threshold non‐speech objects. Experimental Brain Research, 179, 85–95.
- Stevenson, R. A., Ghose, D., Fister, J. K., Sarko, D. K., Altieri, N. A., Nidiffer, A. R., … Wallace, M. T. (2014). Identifying and quantifying multisensory integration: A tutorial review. Brain Topography, 27, 707–730.
- Stevenson, R. A., & James, T. W. (2009). Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage, 44, 1210–1223.
- Tang, X., Wu, J., & Shen, Y. (2016). The interactions of multisensory integration with endogenous and exogenous attention. Neuroscience & Biobehavioral Reviews, 61, 208–224.
- Werner, S., & Noppeney, U. (2010a). Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. The Journal of Neuroscience, 30, 2662–2675.
- Werner, S., & Noppeney, U. (2010b). Superadditive responses in superior temporal sulcus predict audiovisual benefits in object categorization. Cerebral Cortex, 20, 1829–1842.
- Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the Drift‐Diffusion Model in Python. Frontiers in Neuroinformatics, 7, 14.