Skip to main content
Nature Communications logoLink to Nature Communications
. 2022 Jul 7;13:3924. doi: 10.1038/s41467-022-31549-0

Audiovisual adaptation is expressed in spatial and decisional codes

Máté Aller 1,2,✉,#, Agoston Mihalik 1,3,#, Uta Noppeney 1,4
PMCID: PMC9262908  PMID: 35798733

Abstract

The brain adapts dynamically to the changing sensory statistics of its environment. Recent research has started to delineate the neural circuitries and representations that support this cross-sensory plasticity. Combining psychophysics and model-based representational fMRI and EEG we characterized how the adult human brain adapts to misaligned audiovisual signals. We show that audiovisual adaptation is associated with changes in regional BOLD-responses and fine-scale activity patterns in a widespread network from Heschl’s gyrus to dorsolateral prefrontal cortices. Audiovisual recalibration relies on distinct spatial and decisional codes that are expressed with opposite gradients and time courses across the auditory processing hierarchy. Early activity patterns in auditory cortices encode sounds in a continuous space that flexibly adapts to misaligned visual inputs. Later activity patterns in frontoparietal cortices code decisional uncertainty consistent with these spatial transformations. Our findings suggest that regions within the auditory processing hierarchy multiplex spatial and decisional codes to adapt flexibly to the changing sensory statistics in the environment.

Subject terms: Perception, Neural decoding, Sensory processing


The brain adapts dynamically to the statistics of its environment. Here, the authors use psychophysics and model-based representational fMRI and EEG to show that audiovisual recalibration relies on distinct spatial and decisional codes that are expressed with opposite gradients and time courses across the auditory processing hierarchy.

Introduction

Throughout life the brain needs to adapt dynamically to changes in the environment and the sensorium. Changes in the sensory statistics evolve across multiple timescales ranging from milliseconds to years and bring cues from different sensory modalities into conflict. Most notably, physical growth, ageing or entering a room with reverberant acoustics can profoundly alter the sensory cues that guide the brain’s construction of spatial representations. To maintain auditory and visual-spatial maps in co-registration the brain needs to constantly recalibrate the senses1.

Recalibrating auditory and visual-spatial maps is particularly challenging because the two sensory systems encode space not only in different reference frames (i.e. eye vs. head-centred) but also in different representational formats24. In vision, spatial location is encoded directly in the sensory epithelium and retinotopic maps of visual cortices (‘place code’)5. In audition, azimuth location is computed mainly from interaural time and level differences in the brain stem2. In primate auditory cortices, sound location is thought to be encoded by activity differences between two neuronal populations, broadly tuned to ipsi- or contra-lateral hemifields (i.e. ‘hemifield code’)69. Less is known about the coding principles at successive processing stages in parietal or prefrontal cortices, in which the hemifield code may be converted into a place code with narrow spatial tuning functions comparable to vision (e.g. ventral intraparietal area10).

Mounting behavioural research illustrates the brain’s extraordinary ability for cross-sensory plasticity. Most prominently, exposure to synchronous, yet spatially misaligned audiovisual signals biases the observer’s perceived sound location towards a previously presented visual stimulus—a phenomenon coined ventriloquist aftereffect3,1119. This cross-sensory adaptation to intersensory disparities emerges at multiple timescales ranging from milliseconds13,18,20 to minutes3,16,17,19 or even days21.

Yet, the underlying neural circuitries, mechanisms and representations that support cross-sensory plasticity remain unknown. The frequency selectivity of audiovisual recalibration has initially pointed towards early stages in tonotopically organized auditory cortices (see ref. 16 but ref. 14). By contrast, the involvement of hybrid spatial reference frames that are neither eye- nor head-centred suggested a pivotal role for parietal cortices or inferior colliculus3,22.

Importantly, recalibration may arise at multiple levels affecting spatial and choice-related computations. While the former alters the neural encoding of sound location irrespective of the task, the latter affects the read-out of decisional choices from those neural representations. The dissociation of the two is challenging for behavioural research that is forced to estimate recalibration from behavioural responses. Likewise, previous neuroimaging studies were not able to disentangle the two, because they employed a spatial localization task that maps each sound location onto one particular response choice, thereby conflating spatial and choice-related processes13,19,20. To dissociate changes in spatial and decisional representations, we need neuroimaging experiments that map different sound locations onto identical decisional choices17 (i.e. classification tasks).

In this work, we show that changes in audiovisual statistics shape neural representations in a spatial classification task on auditory stimuli. Combining model-based analyses of regional BOLD-responses and fine-scaled EEG and fMRI activity patterns we reveal that audiovisual recalibration relies on distinct spatial and decisional codes that are expressed with opposite gradients and time courses across the auditory processing hierarchy. Our findings suggest that regions within the auditory processing hierarchy multiplex spatial and decisional codes to adapt flexibly to the changing sensory statistics in the environment.

Results

For each participant, we performed psychophysics, functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) experiments for 13 days (Fig. 1a). Each experiment included unisensory auditory pre-adaptation, left (i.e. VA) and right (i.e. AV) audiovisual adaptation, and auditory postVA/postAV-adaptation phases. During pre- and post-adaptation phases, participants were presented with auditory stimuli at 7 spatial locations (±12°, ±5°, ±2° and 0°) along the azimuth. They performed a left-right spatial classification task only on 22% of ‘response trials’ that were randomly interspersed in ‘non-response trials’ (Fig. 1b). During the audiovisual adaptation phases, participants were presented with a sound in synchrony with a visual stimulus that was displaced in separate sessions by 15° to the left or right of the sound location. To focus on implicit perceptual recalibration, observers performed a non-spatial task during the adaptation phases: they detected small changes in the contrast to the visual stimulus that occurred on 10% ‘response trials’ (see Fig. 1b). Direct comparison of postVA and postAV-adaptation trials enables the assessment of recalibration unconfounded by time and learning effects.

Fig. 1. Study and experimental design.

Fig. 1

a The study included one day of pre-screening, 4 days of psychophysics testing, 4 days of fMRI and 4 days of EEG. b Each day included (i) auditory pre-adaptation, (ii) left (i.e. VA) or right (AV) audiovisual adaptation and (iii) auditory post-adaptation phases. In the pre-adaptation and post-adaptation phases, observers were presented with unisensory auditory stimuli sampled from seven (±12°, ±5°, ±2° and 0° visual angle) locations along the azimuth. They performed a spatial (left vs. right) classification task on 22% of the trials (‘response trial’) that were indicated by a brief dimming of the fixation cross 500 ms after sound onset. In the audiovisual adaptation phases, observers were presented with audiovisual stimuli with a spatial disparity of ±15°: the visual signal was sampled from three locations along the horizontal plane (i.e. −5°, 0° and 5°) and the visual signal was spatially shifted by 15° either to the left (i.e. VA-adaptation) or to the right (i.e. AV-adaptation) with respect to the auditory stimulus. Observers were engaged in a non-spatial visual detection task on 10% of the trials (‘response-trial’) indicated by the lower contrast of the visual signals. Each day started with the pre-adaptation phase. Next, VA (or AV) adaptation phases alternated with postVA (or postAV) adaptation phases. This figure shows the experimental details for the fMRI experiments, which were slightly modified for EEG and psychophysics experiments (for details see Experimental design and procedure section).

Behavioural results

Throughout all experiments, observers were attentive and followed task instructions as indicated by a sensitivity index (d’) of >4 across all experimental phases (Supplementary Table 1). Observers responded to >88% of ‘response trials’ (i.e. ‘hits’) both in the pre-/post-adaptation (i.e. 22% response trials) and the adaptation phases (i.e. 10% response trials, Supplementary Table 1). They made only <1.43% responses to ‘non-response trials’ (i.e. ‘false alarms’).

Likewise, eye movement analyses during the psychophysics experiment showed that fixation was well maintained on 94.02 ± 1.00% (mean ± SEM) of the trials throughout the entire experiment with no significant differences in eye movement indices between postVA- and postAV-adaptation phases (i.e. % saccades, % eye blinks and post-stimulus mean horizontal fixation position, see Supplementary Note 1: Eye movement results). Potential recalibration effects are therefore not confounded by eye movements.

To assess whether exposure to misaligned visual signals recalibrates observers’ sound representations, we fitted cumulative Gaussians as psychometric functions to the percentage ‘perceived right’ responses in the auditory post-adaptation phases (Fig. 2a). Consistent with previous research1517 observers’ perceived sound locations shifted towards the displaced visual stimulus that was presented in the prior adaptation phases, causing the psychometric function to shift in the opposite direction (see Supplementary Fig. 1). Bayesian model comparison confirmed that the recalibration model that allows for shifts in the point of subjective equality (PSE) across pre-, postVA- and postAV-adaptation phases was exceedingly more likely than the static model in which the PSE values were constrained to be equal (random effects analysis with Akaike criterion as an approximation to the model evidence yielded protected exceedance probabilities >0.90 in each of the psychophysics, fMRI and EEG experiments; Supplementary Table 2). Moreover, paired t-tests of the PSE values (from the recalibration model) between postVA- and postAV-adaptation phases revealed significantly more positive PSE for postVA-adaptation than for postAV-adaptation phases (one-tailed p-values for the second level bootstrap-based paired t-tests: psychophysics: t(14) = 11.4, p < 0.0001; fMRI: t(4) = 9.4 p = 0.010; EEG: t(4) = 9.2, p = 0.011) (Fig. 2b). This was also consistently reflected in participants’ mean response times across the three experiments (see Supplementary Fig. 2). Collectively, our behavioural results confirmed that prior exposure to disparate audiovisual signals recalibrates observers’ spatial classification responses to sounds.

Fig. 2. Behavioural results.

Fig. 2

a Psychometric functions fitted to the behavioural data from the psychophysics, fMRI and EEG experiments for pre-, postAV- and postVA-adaptation phases (semi-transparent lines represent subject-specific psychometric functions, solid lines represent across-subjects’ mean). b Across-subjects’ mean (±SEM, n = 15 subjects for psychophysics, n = 5 subjects for fMRI and EEG) of the PSE for pre-adaptation, postVA- and postAV-adaptation phases. The subject-specific PSEs are overlaid as line graphs with circular markers. The PSE was shifted towards the left for postAV- (purple) relative to postVA-adaptation (dark blue) phases in every single participant. PSE point of subjective equality, SEM standard error of the mean. Source data are provided as a Source Data file.

fMRI: spatial encoding and recalibration indices

Using multivariate decoding we identified brain areas that (i) encoded sound location and (ii) recalibrated this encoded sound location. Guided by previous research7,19,2326 we focused on five regions of interest (ROI): Heschl’s gyrus (HG), higher auditory cortex (hA, mainly planum temporale), intraparietal sulcus (IPS), inferior parietal lobule (IPL) and frontal eye-field (FEF). In each of those ROIs we trained a linear support vector regression model (SVR, LIBSVM27) in a four-fold cross-validation scheme to learn the mapping from BOLD-response patterns of the pre-adaptation phase to external auditory locations. We used this learnt mapping to decode the sound location from the activation patterns of the pre-, postVA- and postAV-adaptation phases. Focusing selectively on the pre-adaptation trials, we observed significantly better than chance decoding accuracies (i.e. ‘spatial encoding index’, the Fisher z-transformed Pearson correlation coefficient between the true and the decoded sound locations) across all ROIs along the auditory spatial processing hierarchy with a maximal decoding accuracy in planum temporale (t-values and FDR-adjusted one-tailed p-values for the second level bootstrap-based one-sample t-tests: HG: t(4) = 2.47, p = 0.028; hA: t(4) = 6.58, p = 0.020; IPS: t(4) = 4.04, p = 0.020; IPL: t(4) = 4.27, p = 0.020; FEF: t(4) = 2.84, p = 0.023, see Fig. 3a).

Fig. 3. fMRI multivariate decoding and representational dissimilarity matrices.

Fig. 3

a Across-subjects’ mean (±SEM, n = 5 subjects) spatial encoding index (left bars, Fisher z-transformed correlation coefficient) and across-subjects’ mean (±SEM, n = 5 subjects) recalibration index (right bars, difference in fraction ‘decoded right’ (i.e. positive azimuth) for postAV-adaptation vs. postVA-adaptation) in the ROIs as indicated. FDR-corrected one-tailed p-values from one-sample bootstrap-based t-tests for spatial encoding index: HG: p = 0.028; hA: p = 0.020; IPS: p = 0.020; IPL: p = 0.020; FEF: p = 0.024; and recalibration index: HG: p = 0.042; hA: p = 0.010; IPL: p = 0.025; FEF: p = 0.046; IPS: p = 0.086. Asterisks denote alpha level *p ≦ 0.05, **p ≦ 0.01. The individual data points are overlaid as black circles. b Neurometric function fitted to the fMRI decoded sound locations across ROIs for pre-, postVA- and postAV-adaptation phases. c RDMs (across-subjects’ mean) showing the Mahalanobis distances for the fMRI decoded sound locations across ROIs. Top row: 7 × 7 RDM for pre-adaptation phase. Bottom row: 14 × 14 matrix for postVA- and postAV-adaptation. The drawings on the left indicate the organization of the RDMs of spatial locations for pre- (top) as well as for postAV- and postVA-adaptation phases (bottom). Using MDS we projected each RDM onto a single dimension (i.e. representing space along the azimuth). For illustration purposes, we vertically offset the MDS projections for postVA- and postAV-adaptation phases and connected the projections corresponding to the same physical stimulus locations. The MDS results show that the projected locations of the postVA-adaptation phase are shifted towards the left in relation to those of the postAV-adaptation phase particularly in hA and IPS, while a more complex relationship arises in IPL and FEF. HG Heschl’s gyrus, hA higher auditory cortex, IPL inferior parietal lobule, IPS intraparietal sulcus, FEF frontal eye-field, RDM representational dissimilarity matrix, MDS multidimensional scaling, FDR false discovery rate. Source data are provided as a Source Data file.

To assess whether these encoded sound locations adapt to displaced visual stimuli, we computed the recalibration index (RI) as the difference between the fraction of ‘decoded right’ (i.e. positive azimuth) in the postAV-adaptation phase minus the postVA-adaptation phase. If the decoded sound location shifts towards the disparate visual signal during the AV- and VA-adaptation phase, we would expect a positive recalibration index. Consistent with this conjecture we observed a RI that was significantly greater than zero in nearly all ROIs (t-values and FDR-adjusted one-tailed p-value of second level bootstrap-based one-sample t-tests, HG: t(4) = 3.55, p = 0.042, hA: t(4) = 3.33, p = 0.010, IPL: t(4) = 4.31, p = 0.026, FEF: t(4) = 2.49, p = 0.047, and a trend in IPS: t(4) = 1.81, p = 0.086, see Fig. 3a).

These results provide initial evidence that regions along the dorsal auditory processing hierarchy show an overall effect of recalibration averaged across all sound locations.

fMRI: neurometric functions

Similar to our behavioural analysis, we fitted cumulative Gaussians as neurometric functions to the percentage ‘decoded right’ (i.e. positive azimuth) separately for the pre-, postVA- and postAV-adaptation phases at the group level (see Fig. 3b). Consistent with our behavioural findings the neurometric functions shifted rightwards after VA-adaptation and leftwards after AV-adaptation. Bayesian model comparison28 provided strong evidence29 for the recalibration model that allows for changes in PSE values between pre, postVA- and postAV-adaptation phases relative to a static model in which the PSE values were constrained to be equal (Akaike criterion > 2.3 (evidence in favour of the recalibration model) in all ROIs; HG: 10.6, hA: 15.5, IPS: 12.7, IPL: 8.0, FEF: 5.6; Supplementary Table 3).

fMRI: representational similarity analysis

Our linear decoding results indicate that auditory representations throughout the processing stream are altered by prior AV recalibration. Critically, our results so far are agnostic about the coding principles underlying this pervasive cross-sensory plasticity. Next, we therefore characterized the geometry of the neural representations across the seven sound locations using representational similarity analysis (RSA30). For visualization, we projected the group-level representational dissimilarity matrices (RDMs, averaged across participants) with non-classical multidimensional scaling (MDS31) onto a single dimension to incorporate the spatial organization along the azimuth (i.e. as physical space, see Fig. 3c). Particularly, the higher auditory cortex (hA, including planum temporale) and IPS encoded the seven sound locations largely consistent with the physical distances of the sound locations. For instance, in hA, IPS and IPL the MDS distance is greater for the sound locations 12° and 5° than for the locations 5° and 0°. This was also reflected in the Spearman’s rank-correlation (RS) between MDS projected and true spatial locations (hA: RS(5) = 1; IPS: RS(5) = 1; IPL: RS(5) = 1). By contrast, the representational geometry in HG and FEF does not fully correspond to the physical sound distances (HG: RS(5) = 0.86; FEF: RS(5) = 0.54; see Fig. 3c bottom row). Across all regions, the MDS shows spatial shifts to the left (resp. right) after VA- (resp. AV)-adaptation. Again, the similarity between neural representations in hA, IPS, and IPL obeyed the physical order of the sound locations (hA: RS(12) = 0.91; IPS: RS(12) = 0.92; IPL: RS(12) = 0.93), while this is less clear for HG and FEF (HG: RS(12) = 0.68; FEF: RS(12) = 0.41). For instance, in FEF the neural representations for the ‘−12° sound’ are shifted towards the centre (see Fig. 3c, FEF in bottom row).

Model-based fMRI analysis: dissociating perceptual and decisional codes

MDS revealed subtle differences in representational geometry across regions that may arise from the mixing of multiple representational components. Most notably, recalibration may affect spatial and decisional (i.e. choice-related) representations. To arbitrate between these hypotheses, we compared a spatial, a decisional and a combined spatial + decisional uncertainty model as explanations for the regional mean BOLD-response and/or the fine-scale activation patterns using pattern component modelling (PCM).

Pattern component modelling is a relatively recent methodological approach that, similar to related multivariate analysis approaches (e.g. multivariate multiple regression or multivariate analysis of variance) models multivariate observations (i.e. multi-voxel activity patterns) by a set of explanatory variables (e.g. experimental conditions) under Gaussian assumptions. Critically, it does not fit individual voxel weights but second-order parameters that determine the similarity structure or distribution of activity profiles over voxels as incorporated in the second-moment matrix. By marginalizing over all possible voxel-specific feature weights PCM obtains the marginal likelihood of the data given the model that can be used for model comparison (for further details see refs. 32,33). PCM thus allows us to adjudicate whether the activity distribution over voxels is better accounted for by spatial, decisional or combined spatial + decisional coding.

To model spatial/perceptual representations we used the hemifield model (see Fig. 4a) that encodes sound location in the relative activity of two subpopulations of neurons, each broadly tuned to the ipsi- or contra-lateral hemifield6,34. Because the hemifield and place code models make near-indistinguishable predictions for the pattern similarity structure over the relatively central locations used in the current study, we applied the hemifield model as a generic spatial model to all regions (for details, see Supplementary Methods: Comparison of spatial hemifield and place code models). The decisional uncertainty model (see Fig. 4a) codes observers’ decisional uncertainty as a non-linear function of the distance between the spatial estimate from the left/right classification boundary35 (see Methods for further details).

Fig. 4. Spatial and decisional uncertainty models.

Fig. 4

a Neural models: The spatial (hemifield) model encodes spatial location in the relative activity of two subpopulations of neurons each broadly tuned either to the ipsi- or contra-lateral hemifield. The ratio of the ipsi- and contralaterally tuned neurons was set to 30%/70% consistent with prior research8. The decisional uncertainty model encodes observers’ decisional uncertainty as a non-linear function of the distance between the spatial estimates and the spatial classification boundary. b Predicted mean BOLD-response as a function of sound location along the azimuth in a left hemisphere region for pre-, postVA- and postAV-adaptation. The spatial model predicts a BOLD-response that increases linearly for sound locations along the azimuth. The decisional uncertainty model predicts BOLD-response that decays with the distance from the decision boundary in an inverted U-shaped function. Further, this inverted U-shaped function shifts along the azimuth when spatial estimates are recalibrated. The model predictions were obtained by averaging the simulated neural activities across 360 neurons in a left hemisphere region. c Predicted representational dissimilarity matrices (RDM) based on the individual model neural activity profiles across spatial locations (−12° to 12°) and experimental phases (pre, postVA and postAV). We simulated RDMs from the spatial (left) and the decisional (right) model for (i) top row: no recalibration, i.e. without a representational shift and (ii) bottom row: with recalibration, i.e. with a representational shift. Solid white lines delineate the sub-RDM matrices that show the representational dissimilarities for different stimulus locations within and between different experimental phases. Diagonal dashed white lines highlight the RDM dissimilarity values for two identical physical locations of the postAV- and the postVA-adaptation phases. Comparing the RDMs with and without recalibration along those dashed white lines shows how the shift in spatial representations towards the previously presented visual stimulus alters the representational dissimilarity of corresponding stimulus locations in postAV- and postVA-adaptation phases, while the off-diagonals show the dissimilarity values for neighbouring spatial locations. Source data are provided as a Source Data file.

Because a region may combine spatial and decisional coding, we first assessed whether (i) the spatial model (S), (ii) the decisional uncertainty model (D) or (iii) the combined spatial + decisional uncertainty model (S + D) were the best explanations for our data from the pre-adaptation phase. In a second step, we used the combined spatial + decisional uncertainty model to assess the contributions of the spatial and decisional components to recalibration by comparing: (i) spatial + decisional with no recalibration in either model component (S + D), (ii) spatial with recalibration + decisional (SR + D), (iii) spatial + decisional with recalibration (S + DR) and (iv) spatial with recalibration + decisional with recalibration (SR + DR) as explanations for the data from the pre-, postAV- and postVA-adaptation phase. Each model component accounted for recalibration by shifting the ‘encoded sound locations’ by a constant ±2.3° (i.e. across-subjects’ mean in behavioural PSE shift) to the right (postVA-adaptation) vs. left (postAV-adaptation). As shown in Fig. 4, the spatial and decisional uncertainty models make distinct predictions for the regional mean BOLD-response (Fig. 4b) and the similarity structure of activity patterns over sound locations (Fig. 4c). In supplementary analyses, we also explored whether additional model components that account for observers’ binary left/right decisional choices account for additional variance in our data (see Supplementary Note 2: Assessment of decisional choice model in fMRI and EEG). In the main text we focus mainly on the predictions of the spatial and decisional uncertainty models because the predictions of the decisional uncertainty model (but not of the decisional choice model) are largely independent from those of the spatial model.

Regional mean BOLD-response—linear mixed-effects modelling

The spatial model predicts that the regional mean BOLD-response of—for instance—a left hemisphere region increases for stimuli towards the right hemifield. Likewise, its response to all sounds irrespective of location should be greater after right recalibration. By contrast, the decisional uncertainty model predicts a mean BOLD-response that peaks at the decisional boundary and tapers off with a greater distance from the boundary. As a result of recalibration, this peak shifts towards the left or right, because the brain’s spatial estimates have been recalibrated leading to a change in the relative distance between the spatial estimates and the decision boundary (see Fig. 4b). For instance, the encoded spatial estimate for a sound stimulus at the physical location of 5° may be shifted towards the left after left recalibration (i.e. visual signal displaced towards the left of the auditory location). As a result, the decisional boundary and spatial location that is associated with the greatest decisional uncertainty is shifted towards the right after left recalibration (see Supplementary Fig. 1).

In hA the mean BOLD-response increased progressively for stimuli along the azimuth as expected under the spatial model (Fig. 5a). By contrast, in IPS, IPL and FEF the BOLD-response peaked for physical 0° sound location in the pre-adaptation phase, left sound locations (i.e. <0°) in the postAV-adaptation and right sound locations (i.e. >0°) in the postVA-adaptation phase as expected under the decisional uncertainty model. This visual impression was confirmed by Bayesian model comparison of the three (i.e. spatial, decisional, spatial + decisional) linear mixed-effects (LME) models that included the simulated activity of the respective models as fixed effects predictors (with the additional constant term) and were fitted to the pre-adaptation phase alone. The Bayes factors provided strong evidence for the spatial model in hA and moderate evidence for the decisional uncertainty model in IPS (see Supplementary Table 4). In HG, IPL and FEF the most parsimonious baseline model outperformed or was on par with the other models suggesting that the regional BOLD-response in those regions did not provide reliable information about sound location in the pre-adaptation runs. For the recalibration analysis, we designed each LME model using the simulated activity of the spatial and/or decisional uncertainty models as fixed effects predictors (in addition to a constant regressor) for the 7 (sound locations) × 3 (pre-, postAV- and postVA-adaptation) conditions. We observed a small trend toward spatial coding in hA (i.e. SR + D and SR + DR > S + D and S + DR), but moderate evidence towards decisional coding in IPL and strong evidence in IPS and FEF (i.e. SR + DR and S + DR > S + D and SR + D, Fig. 5b bottom row, Supplementary Table 4). This suggests that the mean regional BOLD-response follows the predictions of the hemifield model only in hA. By contrast in IPS, IPL and FEF the regional mean BOLD-response mainly reflects activity associated with the decisional uncertainty invoked by mapping the recalibrated spatial estimates onto observers’ decisional boundary. Adding predictors encoding observers’ decisional spatial choices did not increase the model evidence in any of those regions (see Supplementary Note 2: Assessment of decisional choice model in fMRI and EEG).

Fig. 5. fMRI results: regional BOLD-response and pattern component modelling (PCM).

Fig. 5

a Across-subjects’ mean (n = 5 subjects) positive BOLD-responses shown as a function of spatial location in pre-, postVA- and postAV-adaptation phases across ROIs. The shaded areas indicate SEM. b Results of the linear mixed-effects analysis of regional mean BOLD-response across ROIs: Loge-Bayes factors (LogeBF) for a specific target model as indicated by the capital letter relative to the null model, which includes only the constant term (see Supplementary Methods: Plotting of regional mean BOLD-responses). Top row: The spatial, decisional, and spatial + decisional uncertainty models without recalibration. Bottom row: The models factorially manipulate whether the spatial and/or decisional component include recalibration. c PCM results—Spatial and/or decisional uncertainty models as predictors for fine-grained BOLD-response patterns across ROIs. Across-subjects’ mean LogeBFs (±SEM, n = 5 subjects) and individual data points (circular markers) for each predictor model. Top row: The spatial, decisional,- and spatial + decisional uncertainty models without recalibration relative to a null model that allows for no similarity between activity patterns. Bottom row: The models factorially manipulate whether the spatial and/or decisional component accounts for recalibration. Loge-Bayes factors are relative to the spatial + decisional uncertainty model (without recalibration) as the null model. Dash-dotted grey lines indicate the relative LogeBF for the fully flexible models as noise ceiling estimated separately for the top and bottom row model comparisons (for details, see Methods). S = spatial model without recalibration, D = decisional uncertainty model without recalibration, SR = spatial model with recalibration, DR = decisional uncertainty model with recalibration32. Source data are provided as a Source Data file.

Fine-scale activation pattern—pattern component modelling

Using pattern component modelling (PCM32) we investigated whether spatial and decisional codes contributed to the fine-scale voxel-level activation patterns and the recalibration effects. Consistent with our analysis of the regional mean BOLD-response we fitted and compared a baseline model that assumes independence of activation patterns across conditions with the spatial, decisional, and combined spatial + decisional PCM models. All these models were fitted to the 7 (sound locations) in the pre-adaptation conditions. In a second step we used the combined spatial + decisional uncertainty model to assess whether recalibration was expressed in spatial, decisional, or spatial + decisional codes using data from the pre-, postVA- and postAV-adaptation phases (see Fig. 5c). We fitted all models (e.g. spatial and/or decisional etc.) and a fully flexible model using a leave-one-subject-out cross-validation scheme. The fully flexible model accommodates any possible covariance structure of activity profiles and enables us to assess whether a particular target model accounts for the key representational structure in the observations given the measurement noise and inter-subject variability (see Fig. 5c).

As shown in Fig. 5c top row, the Loge-Bayes factors (averaged across participants ±SEM) provided strong evidence for the spatial relative to the decisional uncertainty model in HG and hA (see Supplementary Table 5 for exact values). In HG, the spatial component alone was sufficient to reach the threshold set by the fully flexible model. In all other regions, i.e. hA, IPS, IPL and FEF, the combined spatial + decisional uncertainty model outperformed the single-component models suggesting that in those regions spatial and decisional representations together contribute to adaptive coding. Moreover, the decisional component became progressively more dominant along the dorsal processing hierarchy. In FEF the decisional uncertainty model even outperformed the spatial model.

The recalibration results further emphasized these opposite gradients for spatial and decisional coding along the auditory processing stream. As shown in Fig. 5c bottom row, in HG and hA the models expressing recalibration in spatial coding (SR + D and SR + DR) outperformed those without recalibration in spatial coding (S + D and S + DR). The evidence was strong in hA, but not conclusive in HG. In hA, the combined model incorporating recalibration of spatial and decisional coding outperformed more parsimonious models. Conversely in IPS, IPL and FEF we observed strong evidence for the models that expressed recalibration in the decisional code (S + DR and SR + DR) relative to those that did not (S + D and SR + D, see Supplementary Table 5 for exact values). Again, in IPL and IPS the combined model (SR + DR) outperformed the more parsimonious models.

In summary, our advanced model-based representational analyses demonstrate that audiovisual adaptation relies on plastic changes in spatial and decisional uncertainty representations. While all regions (apart from HG) multiplexed spatial and decisional coding, their contributions arose with opposite gradients along the dorsal auditory hierarchy. The spatial model provided a better explanation for activations in HG and hA, the decisional uncertainty model dominated in IPS, IPL and FEF. Adding a decisional choice component (i.e. observers’ left/right choices) substantially increased the model evidence only in IPS for spatial coding in pre-adaptation runs and in IPS and hA for recalibration (see Supplementary Note 2: Assessment of decisional choice model in fMRI and EEG, Supplementary Methods: Comparison of response time and decisional choice models, and Supplementary Tables S6 and S7). These results suggest that IPL may be more involved in computing decisional uncertainty, while IPS codes observers’ decisional spatial choice, which in turn influences spatial coding in hA via feedback connectivity. Crucially, only characterizing the representational geometry of the neural representations enabled us to reveal that spatial and decisional codes evolve with opposite gradients across regions. This highlights the critical importance to move beyond simple linear decoding analyses.

EEG: spatial encoding and recalibration indices

Using EEG, we characterized how spatial and decisional coding evolved post-stimulus. Consistent with our fMRI analysis, we first computed the spatial encoding and recalibration indices for EEG activity patterns across time. We trained linear support vector regression models (SVR, LIBSVM27) on the EEG activity patterns of the pre-adaptation trials in overlapping 50 ms time windows sliding from −100 to 500 ms post-stimulus and generalized them to trials from pre- and post-adaptation trials in a four-fold cross-validation scheme. The cluster-based bootstrap test on the Fisher z-transformed Pearson correlation coefficient between the true and the decoded sound locations revealed a significant cluster extending from 110 to 500 ms post-stimulus (one-tailed p = 0.01 corrected for multiple comparisons within the entire [−50 to 500] ms window). The decoding accuracy was significantly better than chance from about 110 ms post-stimulus, rose steadily and peaked at about 355 ms (Fig. 6a). Likewise, we assessed the recalibration index (RI) within the time window that showed a significant effect of spatial encoding, i.e. [110–500] ms post-stimulus. In the cluster-based bootstrap test the RI was significantly positive in two clusters from [185–285] ms (one-tailed p = 0.019) and [335–470] ms (one-tailed p = 0.005) post-stimulus (Fig. 6b). Moreover, because previous ERP analyses suggested that the N100 potential is affected by recalibration13, we performed a temporal ROI analysis selectively on the EEG activity pattern averaged within the N100 time window (i.e. [70–130] ms). This temporal ROI analysis showed that the decoding accuracy was significantly better than chance (mean ± SEM: 0.125 ± 0.059; t(4) = 2.12, one-tailed p = 0.0406) and the RI was significantly greater than zero (mean ± SEM: 5.963 ± 1.743; t(4) = 3.42, one-tailed p = 0.0019).

Fig. 6. EEG results: multivariate decoding and pattern component modelling.

Fig. 6

a Time course of the spatial encoding index (across-subjects’ mean ± SEM, n = 5 subjects, grey line and shaded area) and EEG evoked potentials (across-subjects’ mean, n = 5 subjects, averaged over central channels, see inset) for the 7 spatial locations in pre-adaptation phase (±12°, ±5°, ±2° and 0° azimuth). b Time course of the recalibration index (across-subjects’ mean ± SEM, n = 5 subjects, grey line and shaded area) and the EEG evoked potentials (across-subjects’ mean, n = 5 subjects, averaged over central channels, see inset) for sounds presented at 0° azimuth for pre-, postVA-, and postAV-adaptation phases. Clusters underlying significant effects of spatial encoding (a) and recalibration (b) (p < 0.05, one-sided, cluster corrected) are indicated by grey boxes. Areas within the dashed boxes indicate the a priori defined time window focusing on early recalibration effects13. c PCM results—Spatial and/or decisional uncertainty models as predictors for EEG activity patterns across four time windows. Across-subjects’ mean LogeBFs (±SEM, n = 5 subjects) and individual data points (circular markers) for each model. Top row: The spatial, decisional, and spatial + decisional uncertainty models without recalibration relative to a null model that allows for no similarity between activity patterns. Bottom row: The models factorially manipulate whether the spatial and/or decisional component accounts for recalibration. Loge-Bayes factors are relative to the spatial + decisional uncertainty model (without recalibration). Dash-dotted grey lines indicate the relative LogeBF for the fully flexible models as noise ceiling. S = spatial model, D = decisional uncertainty model, SR = spatial model with recalibration, DR = decisional uncertainty model with recalibration. d PCM results—BOLD-response patterns from the five ROIs as predictors for EEG activity patterns across four time windows. Across-subjects’ mean LogeBFs (±SEM, n = 5 subjects) and individual data points (circular markers) for each target model relative to the model using BOLD-response patterns in HG as predictors. HG Heschl’s gyrus, hA higher auditory cortex, IPS intraparietal sulcus, IPL inferior parietal lobule, FEF frontal eye-field. Source data are provided as a Source Data file.

Model-based EEG analysis: dissociating perceptual and decisional codes

Combining EEG and PCM we investigated whether spatial and decisional representations evolved with different time courses. For this, we compared the null, spatial, decisional and combined spatial + decisional uncertainty models as explanations for EEG activity patterns over spatial locations in four consecutive time windows: [50–150], [150–250], [250–350] and [350–450] ms post-stimulus (Fig. 6c). From [50–150] ms the spatial model alone provided a sufficient explanation of the EEG data; however, including the decisional uncertainty model as well did explain moderately more evidence. From [150–250] ms and beyond the spatial + decisional uncertainty model performed better than the spatial model with LogeBFs of at least 6.1 (Supplementary Table 8). These results suggest that the decisional code becomes progressively more important at later processing stages. This representational gradient across post-stimulus time was also apparent in our recalibration results (Fig. 6c, bottom row). Spatial recalibration was expressed mainly via spatial coding (i.e. SR + D and SR + DR better than S + D and S + DR by LogeBFs of at least 2.4, see Supplementary Table 8) from 150 to 250 ms. From 250 ms onwards, recalibration relied jointly on spatial and decisional codes (i.e. SR + DR explained the most variance consistently), as in our fMRI results in later stages. Consistent with the fMRI results, adding the decisional choice component explained additional variance in the EEG activity patterns from 250 to 450 ms (see Supplementary Note 2: Assessment of decisional choice model in fMRI and EEG, Supplementary Methods: Comparison of response time and decisional choice models, and Supplementary Table 9). Collectively, our EEG results revealed that spatial codes are more prominent in earlier processing stages and decisional coding in later ones. Critically, at later stages recalibration was expressed jointly in spatial and decisional codes.

Fusing EEG and fMRI in pattern component modelling

The fMRI results revealed a mixture of spatial and decisional codes in several ROIs across the dorsal processing hierarchy. To link fMRI and EEG results more directly, we fused them in PCMs in which the second-moment matrices of the BOLD-response activation patterns in one of our five ROIs (i.e. HG, hA, IPS, IPL or FEF) formed the predictor for the distribution of EEG activity patterns over sound locations. With these PCMs we asked when the pattern similarity structure that we observed in a particular ROI is expressed in the EEG activity patterns. This EEG-fMRI PCM fusion approach is related to previous approaches that for instance correlated the representational similarity matrices obtained from fMRI for different ROIs from EEG across post-stimulus time36. It can also be considered a form of EEG source analysis where we try to explain the similarity structure over EEG channels across time by source activation patterns obtained from fMRI. Bayesian model comparison across these five fMRI-EEG fusions PCMs confirmed a direct relationship between our fMRI and EEG results (Fig. 6d). The PCM with hA (i.e. including planum temporale) explained EEG activity patterns best (exceeding other models by LogeBFs of at least 5.0, see Supplementary Table 8) from [150–250] ms and the PCM with IPL from 250 ms onwards (exceeding other models by LogeBFs of at least 3.2, see Supplementary Table 8). We suspect that FEF has less explanatory power because of its smaller size, thereby contributing less to EEG scalp potentials. Collectively, combining EEG and fMRI spatiotemporally resolved the influence of recalibration on the encoding of sound location and decisional uncertainty.

Discussion

This study demonstrates that the brain recalibrates the senses by flexibly adapting spatial and decisional codes, which are expressed with opposite gradients along the auditory processing hierarchy. Early activity patterns in planum temporale encode sound locations in a continuous space that dynamically shifts towards misaligned visual inputs. Later activity patterns in frontoparietal cortices mainly code choice-related uncertainty in line with these representational changes.

Our behavioural results provide robust evidence that the brain recalibrates auditory space to keep auditory and visual maps in co-registration1517. The point of subjective equality changed significantly between left and right recalibration in every observer (Fig. 2). Because the audiovisual adaptation phase used a non-spatial task, this robust cross-sensory plasticity relied mainly on implicit perceptual rather than choice-related mechanisms3,18,20,3739.

Consistent with previous research in non-human primates, our multivariate fMRI analyses showed that sound location can be decoded from activity patterns in a widespread network including primary auditory regions, planum temporale and frontoparietal cortices7,19,2426,40. Moreover, the activity patterns in all of these regions adapted to changes in the sensory statistics as indicated by recalibration indices and neurometric functions (Fig. 3a. b). This widespread cross-sensory plasticity converges with our EEG results showing persistent spatial encoding and recalibration starting early with the auditory N1 component, generated in primary and secondary auditory cortices41, and extending until 500 ms post-stimulus (Fig. 6a, b).

Critically, multidimensional scaling unravelled subtle representational differences in this network of regions (see Fig. 3c). While recalibration induced a constant representational shift in planum temporale, a more complex similarity structure arose in IPL and FEF. We investigated whether this complex representational geometry resulted from multiplexing of spatial and decisional codes42 by comparing a spatial, a decisional, and a combined spatial + decisional uncertainty model (Fig. 4). The decisional uncertainty model computes choice-related uncertainty based on the distance of the spatial estimates from the classification boundary (see Fig. 4 and ref. 35). The spatial hemifield model encodes sound location in the relative activity of two subpopulations of neurons each broadly tuned either to the ipsi- or contra-lateral hemifield69 - it is currently the leading model for spatial coding in human auditory cortices. Yet, we applied it as a model not only to auditory but also to frontoparietal areas7,25, because its pattern similarity structure over the limited (i.e. [−12° to 12°]) sound locations is indistinguishable from that of the competing place code model (see Supplementary Fig. 3).

Crucially, Bayesian model comparison showed a functional gradient along the auditory processing stream: the spatial model provided a better explanation for the regional mean and fine-scale activation patterns in planum temporale, while the decisional uncertainty model performed better in frontoparietal cortices. As shown in Fig. 5a, the regional BOLD-response increased linearly for stimuli along the azimuth in planum temporale, but followed an inverted U-shaped function that adapts to audiovisual conflicts in frontoparietal cortices (Fig. 5b top and bottom row)7,19,2426,43,44. Likewise, the recalibration of the fine-scale activation patterns was better captured by the spatial model in planum temporale, but by the decisional uncertainty model in frontoparietal cortices. Yet, despite this predominance of decisional coding, IPS, IPL and even FEF multiplexed spatial and decisional patterns (Fig. 5c top row).

This spatial-decisional gradient across the cortical hierarchy was mirrored in their temporal evolutions. Recalibration of spatial coding was reflected in early EEG activity patterns ([150–250] ms), while adaptation of decisional uncertainty was more prominent in later activity (after 250 ms, Fig. 6c bottom row). We could even directly link the functional gradients across time and regions by fusing fMRI and EEG via pattern component modelling: the BOLD-response patterns in hA predicted EEG activity patterns mainly at early ([150–250] ms) stages and IPL at later stages from 250 ms onwards (Fig. 6d).

Collectively, our fMRI and EEG results demonstrate that the brain adapts flexibly to the changing statistics of the environment. They also highlight the critical importance to move beyond decoding analyses, as the ability to decode a feature value (e.g. sound location) from activity patterns does not imply that this feature is well represented32. Instead, in order to define the features or mixtures of features (e.g. spatial, decisional) that are encoded in activations, we need to characterize the fine-grained representational geometry and assess the explanatory power of different sets of features via model comparison. Only the latter was able to show that later recalibration effects in frontoparietal cortices reflect mainly choice-related rather than genuine spatial coding. The expression of recalibration in distinct spatial and decisional codes opens the intriguing possibility that recalibration may not always impact both codes together. Instead, the representations and neural circuitries involved in recalibration may depend on the particular task and/or the duration of recalibration. For instance, while our study focused on implicit long-term recalibration using a non-spatial task, other studies have shown rapid recalibration effects when observers performed spatial localization tasks during the adaptation and/or post-adaptation phases18,20,38,39. It remains unknown whether these short-term recalibration effects involve plastic changes in the spatial representations or selectively alter choice-related activity. In line with this notion, accumulating research suggests that recalibration relies on multiple mechanisms that operate over different timescales12,4547. Future studies thus need to disentangle the impact of recalibration duration and task on spatial and decisional coding.

So far, we have emphasized that spatial and decisional codes are expressed with opposite gradients across the cortical hierarchy and post-stimulus time. Yet, Bayesian model comparison also indicated that the combined spatial + decisional uncertainty model significantly outperformed the more parsimonious decisional or spatial models throughout the cortical hierarchy. Likewise, a benefit was observed for the pattern component models in which recalibration was expressed in both spatial and decisional codes. This mixture of spatial and decisional codes may result from true multiplexing of multiple neural codes concurrently within the same region or from a methodological limitation of fMRI, namely the sluggishness of the BOLD-response whereby neural codes that are present at different latencies nevertheless mix in the BOLD-response patterns. The EEG-fMRI fusion approach suggests that early neural activity until 250 ms is dominated by spatial coding in higher-order auditory cortices. By contrast, later neural activity reflects a mixture of spatial and decisional coding. We therefore suggest that later activity in all higher-order cortical regions (i.e. hA, IPS, IPL and FEF) multiplexed perceptual and decisional codes and adapt both codes to recalibrate the senses42. Moreover, because multiplexing of spatial and decisional codes arose mainly later from 250 ms post-stimulus, the coding of decisional uncertainty in auditory cortices (hA) is likely to reflect top-down influences from frontoparietal areas48,49.

In conclusion, our results show that audiovisual adaptation relies on spatial and decisional coding. Crucially, the expression of spatial and decisional codes evolves with different time courses and opposite gradients along the auditory processing hierarchy. Early neural activity in planum temporale encoded sound location within a continuous auditory space. Later frontoparietal activity mainly encoded observers’ decisional uncertainty that flexibly adapts in accordance with these transformations of auditory space.

Methods

This study was conducted in accordance with the Declaration of Helsinki and approved by the research ethics committee of the University of Birmingham (approval number: ERN_11_0470AP4).

Participants

Fifteen participants (10 females, mean age = 22.1; SD = 4.1) participated in the psychophysics study. Five of those participants (4 females, mean age = 22.2; SD = 5.1, one author of the study, A.M.) completed the fMRI and EEG experiments (for full selection criteria see Supplementary Methods: Participants). All participants had no history of neurological or psychiatric illnesses, had a normal or corrected-to-normal vision and had normal hearing. Participants gave informed written consent to participate in the study and were compensated with £6 per hour for behavioural and £8 per hour for fMRI and EEG sessions.

Stimuli

The auditory stimulus consisted of a burst of white noise with a duration of 50 ms and 5 ms on/off ramp delivered at a 75 dB sound pressure level. To create a virtual auditory spatial signal, the noise was convolved with spatially specific head-related transfer functions thereby providing both binaural and monaural cues for sound location50. Head-related transfer functions from the available locations in the MIT database (http://sound.media.mit.edu/resources/KEMAR.html) were interpolated to the desired spatial locations. For the EEG experiment, scanner background noise was superimposed on the spatial sound stimuli to match the task environment of the fMRI experiment. The visual stimulus was a cloud of 15 white dots (diameter = 0.4° visual angle) sampled from a bivariate Gaussian distribution with a vertical and horizontal standard deviation of 1.5° and a duration of 50 ms presented on a dark grey background (90% contrast) in synchrony with the auditory stimuli.

Experimental design and procedure

The psychophysics, fMRI, and EEG experiments used the same design including three phases: (i) unisensory auditory pre-adaptation, (ii) audiovisual adaptation (AV- or VA-adaptation) and (iii) unisensory auditory post-adaptation (postAV-adaptation or postVA-adaptation) (Fig. 1b). Psychophysics, fMRI, EEG included 2 days each for left (i.e. VA) and 2 days for right (i.e. AV) adaptation (except for one participant who completed EEG over 2 days), i.e. 12 days of experimental testing + 1 day pre-screening for each participant. The order of left and right adaptation was counterbalanced across participants. Throughout all experiments, participants fixated on a central fixation cross (0.5° diameter).

Auditory pre- and post-adaptation

In unisensory pre- and post-adaptation phases (Fig. 1b), participants were presented with auditory stimuli that were sampled uniformly from 7 spatial locations (±12°, ±5°, ±2° and 0° visual angle) along the azimuth (stimulus onset asynchrony (SOA) = 2000 ± 200 ms jitter). Participants performed a two-alternative forced-choice left-right spatial classification task explicitly only on a fraction of trials (22%), the so-called ‘response trials’, which were randomly interspersed and indicated 500 ms after sound onset by a brief (i.e. 200 ms duration) dimming of the fixation cross to 55% of its initial contrast. The fMRI decoding was based only on the non-response trials to minimize motor confounds.

Participants indicated their left-right spatial classification response by pressing one of two buttons with the index or middle fingers of their left or right hand. The response hand was alternated over runs within a day to control for potential motor confounds (see fMRI multivariate decoding section below). The order of left and right response hands was counterbalanced across days (for number of trials, runs etc., see Supplementary Methods: Auditory pre- and post-adaptation, Organization of each testing day).

Audiovisual adaptation

In the audiovisual adaptation phase, participants were presented with spatially disparate (±15° visual angle) audiovisual stimuli (SOA = 500 ms): the visual stimulus was uniformly sampled from three locations along the azimuth (i.e. −5°, 0°, 5°). On separate days, the visual stimulus was spatially shifted by 15° either to the left (i.e. VA-adaptation) or to the right (i.e. AV-adaptation) with respect to the auditory stimulus. Hence, we included the following audiovisual stimulus location pairs: [A = −20°, V = −5°], [A = −15°, V = 0°], [A = −10°, V = 5°] in (right) AV-adaptation phases and [A = 10°, V = −5°], [A = 15°, V = 0°], [A = 20°, V = 5°] in (left) VA-adaptation phases. The locations of the audiovisual stimulus pairs were fixed within mini-blocks of 5 (psychophysics, i.e. duration of 2.5 s) or 20 (fMRI, EEG, i.e. duration of 10 s) consecutive trials.

Participants detected slightly dimmer visual stimuli (80% of normal contrast), pseudorandomly interspersed on 10% of the trials (i.e. so-called ‘response trials’). This non-spatial task ensures the maintenance of participants’ attention and introduces audiovisual recalibration at the perceptual rather than decisional or motor response level. To allow sufficient time for responding (given the short SOA of 500 ms), we ensured that each response trial was followed by at least three consecutive non-response trials (for further details see Supplementary Methods: Audiovisual adaptation, Organization of each testing day).

Experimental setup

In all experiments, visual and auditory stimuli were presented using Psychtoolbox version 3.0.1151,52 under MATLAB R2011b (MathWorks Inc.) on a MacBook Pro running Mac OSX 10.6.8 (Apple Inc.). In the psychophysics and EEG experiments, participants were seated at a desk with their heads rested on a chinrest. Two accessory rods were mounted on the chinrest serving as forehead rest and allowing stable and reliable head positioning. Visual stimuli were presented at a viewing distance of 60 cm via a gamma-corrected 24” LCD monitor (ProLite B2483HS, iiyama Corp.) with a resolution of 1920 × 1080 pixels at a frame rate of 60 Hz. Auditory stimuli were delivered via circumaural headphones (HD 280 Pro, Sennheiser electronic GmbH & Co. KG) in the psychophysics experiment and via in-ear earphones (E-A-RTONE GOLD, 3 M Company Auditory Systems) in the EEG experiment. Participants used a standard USB keyboard for responding. In the fMRI experiment, visual stimuli were back-projected to a plexiglass screen using a D-ILA projector (DLA-SX21, JVC, JVCKENWOOD UK Ltd.) with a resolution of 1400 × 1050 pixels at a frame rate of 60 Hz. The screen was visible to the subject through a mirror mounted on the magnetic resonance (MR) head coil and the eye-to-screen distance was 68 cm. Auditory stimuli were delivered via a pair of MR-compatible headphones (MR Confon HP-VS03, Cambridge Research Systems Ltd). Participants responded using a two-button MR-compatible keypad (LXPAD 1 × 5–10 M, NATA Technologies). Exact audiovisual onset timing in adaptation trials was confirmed by recording visual and auditory signals concurrently with a photodiode and a microphone.

Eye movement recording

Eye movement recordings were calibrated in the recommended field of view (32° horizontally and 24° vertically) for the EyeLink 1000 Plus system (SR Research Ltd.) with the desktop mount at a sampling rate of 2000 Hz. Eye position data were online parsed into events (saccade, fixation, eye blink) using the EyeLink 1000 Plus software. The ‘cognitive configuration’ was used for saccade detection (velocity threshold = 30°/sec, acceleration threshold = 8000°/sec2, motion threshold = 0.15°) with an additional criterion of radial amplitude >1°. Fixation position was post-hoc offset corrected. In the fMRI experiment, precise positioning of participants’ heads inside the scanner bore was critical for the sensitive measurement of spatial recalibration, so high-quality eye movement recordings were not possible.

Behavioural analysis for psychophysics, fMRI and EEG experiments

Signal detection measures

Participants responded only on a fraction of ‘response trials’, i.e. 22% in auditory pre- and post-adaptation and 10% in AV- and VA-adaptation. This enabled us to assess their performance with the signal sensitivity measure d’:

d=ZphitZpfalsealarm 1

where phit and pfalse alarm are the hit and false alarm rates, respectively. Hits are ‘response trials’, on which observers gave a response. False alarms are ‘non-response trials’, in which observers gave a response. 100% hit rate and 0% false alarm rate were approximated by 99.999% and 0.001%, respectively, to enable the calculation of Z-scores.

Psychometric functions

For the pre-, postVA- and postAV-adaptation phases we fitted cumulative Gaussians as psychometric functions (PF) to the percentage ‘perceived right’ responses on the ‘response trials’ as a function of stimulus location (±12°, ±5°, ±2° and 0°) (see Fig. 2). To account for potential overdispersion in the data across sessions, we used the beta-binomial model53,54. The beta-binomial model assumes that the percentage of right responses at a particular stimulus location is not fixed throughout the entire experiment but a beta-distributed random variable with variance determined by the scaling factor η (between 0 and 1). The models were fitted individually to the behavioural data of each participant with maximum-likelihood estimation55 and using the beta-binomial model56,57. To enable reliable parameter estimates for each participant, we employed a multi-condition fitting using the following constraints: (i) the just noticeable differences (JND or slope parameter) were set equal across all conditions (i.e. pre-, postVA- and postAV-adaptation phases); (ii) guess and lapse rates were set equal to each other and (iii) equal across all conditions. Furthermore, (iv) we constrained the fitted guess and lapse rate parameters to be within 0 and 0.1 and (v) the variance of the beta-distribution (i.e. η in the beta-binomial model) was set equal across all conditions.

For statistical inference, we assessed the goodness of fit of the cumulative Gaussian for each condition and participant using a likelihood ratio test. This likelihood ratio test compares the likelihood of participants’ responses given the model that is constrained by a cumulative Gaussian function (i.e. our ‘target model’) to the likelihood given by a so-called ‘saturated model’ that models observers’ responses with one parameter for each stimulus location in each condition. The resulting likelihood ratio for the original dataset is then compared with a null distribution of likelihood ratios generated by parametrically bootstrapping the data (5000×) from the ‘target model’ fitted to the original dataset and refitting the ‘target’ and ‘saturated’ models. Since the likelihood ratio for the original dataset was not smaller in any of the participants than 5% of the parametrically bootstrapped likelihood ratios (i.e. p > 0.05), we inferred sufficient goodness of fit for all participants.

Next, we assessed whether AV- and VA-adaptation induced a shift in participants’ perceived auditory location by comparing a ‘static’ model, which constrains PSEs to be equal for pre-, postVA- and postAV-adaptation, with a ‘recalibration’ model, which includes three PSE values for the pre-, postVA- and postAV-adaptation PFs. For each participant and model, we calculated the Akaike Information Criterion (AIC28) according to the following formula:

AIC=logeLN 2

where L stands for the likelihood of the model given the data, N is the number of parameters, and loge is the natural logarithm. The summed AIC values over participants for each model provided an approximation to the model evidence. We performed Bayesian model comparison at the level of the random effect as implemented in SPM12 to obtain the protected exceedance probability for the candidate models58. A uniform prior was used over candidate models.

fMRI data acquisition and analysis

fMRI data acquisition

We used a 3 T Philips Achieva scanner to acquire both T1-weighted anatomical images (TR/TE/TI, 7.4/3.5/min. 989 ms; 176 slices; image matrix, 256 × 256; spatial resolution, 1 × 1 × 1 mm3 voxels) and T2*-weighted echo-planar images (EPI) with blood oxygenation level-dependent (BOLD) contrast (fast field echo; TR/TE, 2800/40 ms; 38 axial slices acquired in ascending direction; image matrix, 76 × 75; slice thickness, 2.5 mm; interslice gap, 0.5 mm; spatial resolution, 3 × 3 × 3 mm3 voxels). On each of the 4 days we acquired five auditory pre-adaptation runs and four audiovisual adaptation runs. Each fMRI run started and ended with 10 s fixation and the first 4 volumes of each run were discarded to allow for T1 equilibration effects.

Pre-processing and general linear model

The data were analysed with Statistical Parametric Mapping (SPM12; http://www.fil.ion.ucl.ac.uk/spm/59). Scans from each participant were realigned using the first as a reference, unwarped and slice-time corrected. The time series in each voxel was high-pass filtered to 1/128 Hz. The EPI images were spatially smoothed with a Gaussian kernel of 3 mm FWHM and analysed in native space. The data were modelled in a mixed event/block fashion with regressors entered into the design matrix after convolving the unit impulse or the block with a canonical hemodynamic response function and its first temporal derivative. In the unisensory auditory pre- and post-adaptation phases, unisensory sound stimuli were modelled as events separately for each of our 7 (sound location) × 2 (response vs. non-response) × 3 (pre, post-VA, post-AV) conditions. In the AV- and VA-adaptation phases, AV stimulus presentations were modelled as blocks separately for the 3 (visual locations) × 2 (VA vs. AV-adaptation) conditions. In addition, we modelled all response trials during the adaptation phase with a single regressor to account for motor responses. Realignment parameters were included as nuisance covariates. Condition-specific effects for each subject were estimated according to the general linear model (GLM). To minimise confounds of motor response, we limited all subsequent fMRI analyses to the parameter estimates pertaining to the ‘non-response’ trials.

For the BOLD-response analysis, we computed contrast images comparing auditory stimulus at a particular location > fixation in each subject (averaged over runs) resulting in 21 contrast images (i.e. 7 (sound location) × 3 (pre, postVA, postAV)). Moreover, we computed a contrast and associated t-image that compared all 21 sound conditions relative to fixation baseline (for identification of sound-responsive voxels).

For the multivariate decoding and representational similarity analyses, we applied multivariate spatial noise normalization to the parameter estimates using the noise covariance matrix obtained from the residuals of the GLM and the optimal shrinkage method60 and finally performed Euclidean normalization.

Regions of interest for fMRI analysis

We defined five regions of interest (ROI, combined from two hemispheres) that have previously been implicated in auditory spatial processing based on neurophysiology and neuroimaging research10,24,25. Heschl’s gyrus (HG), higher auditory cortex (hA) and inferior parietal lobule (IPL) were defined using the following parcellations of the Destrieux atlas of Freesurfer 5.3.061: (i) HG: Heschl’s gyrus and anterior transverse temporal gyrus; (ii) hA: higher auditory cortex, i.e. transverse temporal sulcus, planum temporale and posterior ramus of the lateral sulcus; (iii) IPL: inferior parietal lobule, i.e., supramarginal gyrus and inferior part of the postcentral sulcus. The intraparietal sulcus (IPS) and frontal eye-field (FEF) were defined using the following group-level retinotopic probabilistic maps62: (iv) IPS: IPS0, IPS1, IPS2, IPS3, IPS4, IPS5 and SPL1; (v) FEF: hFEF. Because previous studies of audiovisual spatial integration suggested relatively similar auditory influences on several IPS subfields6366 we pooled over them for this analysis. All probabilistic maps were thresholded to a probability of 0.1 (i.e. probability that a vertex belongs to a particular ROI) and inverse normalized into each participant’s native space.

fMRI multivariate decoding—spatial encoding and recalibration indices

We extracted the voxel response patterns in a particular ROI from the pre-whitened and normalized parameter estimate images pertaining to the magnitude of the BOLD-response for each condition and run. To avoid motor confounds, we used the parameter estimate images only from the ‘non-response trials’. In a four-fold stratified cross-validation procedure, we trained support vector regression models (C = 1, ν = 0.5, LIBSVM 3.2027) to learn the mapping from the condition-specific fMRI response patterns (i.e. examples) to external spatial locations (i.e. labels) using examples selectively from the unisensory auditory pre-adaptation runs of all but one fold64,65. This learnt mapping was used to decode the spatial locations from the BOLD-response patterns of the remaining pre-adaptation fold and all postVA- and postAV-adaptation examples (acquired in separate runs).

To determine whether a ROI encodes auditory spatial representations, we computed the Pearson correlation coefficients between the true and the decoded auditory locations for the pre-adaptation runs for each participant as a ‘spatial encoding index’.

To determine whether auditory spatial representations in a region of interest are recalibrated by misaligned visual signals, we binarized the predicted auditory locations into left vs. right predictions and computed the difference in the fraction of ‘decoded right responses’ (i.e. positive azimuth) between auditory postVA- and postAV-adaptation phases as ‘recalibration index’ (RI).

RI=pdecoded right∣postAVexamplespdecodedright∣postVAexamples 3

Importantly, postVA- and postAV-adaptation phases were matched in terms of time, exposure, training and other non-specific effects.

To allow for generalization at the population level, we entered the subject-specific Fisher z-transformed spatial encoding and recalibration indices into separate bootstrap-based one-sample t-tests against zero at the group level67. Briefly, an empirical null distribution of t-values was generated from the data by resampling the subject-specific values with replacement, subtracting the across-subjects’ mean and computing the one-sample t-statistic for each bootstrap sample. A one-tailed p-value was computed based on the proportion of t-values in the bootstrapped t-distribution greater than the observed t-value. We used the Benjamini-Hochberg algorithm68 to correct for false discovery rate (FDR) across ROIs.

fMRI multivariate decoding—neurometric functions

Because of the lower signal-to-noise ratio of fMRI data and decoding analyses, we fitted cumulative Gaussians as neurometric functions (NF) in each ROI to the percentage of ‘decoded right’ (i.e. positive azimuth) averaged across participants as a function of stimulus location (for similar motivation and analysis strategy, see ref. 63)

Consistent with the behavioural analysis we assessed whether AV- and VA-adaptation induced a shift in the decoded location of the unisensory auditory stimuli by comparing a ‘static’ model with a ‘recalibration’ model using the Akaike Information Criterion (AIC28, for details see Methods: Psychometric functions and Supplementary Methods: fMRI multivariate decoding—neurometric functions).

fMRI multivariate pattern analysis—representational dissimilarity analyses and multidimensional scaling

We generated 21 condition-specific contrast images for the 7 auditory spatial locations × 3 (pre-, postVA-, and postAV-adaptation) by averaging parameter estimate images across fMRI runs for each participant. We then characterized the geometry of spatial representations using representational dissimilarity matrices (RDMs30) based on the Mahalanobis distance for each participant and each ROI separately for pre-adaptation as well as postVA- and postAV-adaptation phases (see Supplementary Methods: fMRI multivariate pattern analysis—representational dissimilarity analyses and multidimensional scaling). Using non-classical multidimensional scaling (MDS)31 with non-metric scaling, we projected the group-level RDMs (i.e. averaged across participants) onto a one-dimensional space (‘reflecting’ spatial dimension along the azimuth). To quantify how well the MDS projection of neural representations corresponds to the relative physical positions of stimuli, we computed the Spearman’s rank-correlation between the MDS projections and the corresponding spatial locations separately for pre- and post-adaptation phases in each ROI. As the MDS projection only reflects the relative distance between neural representations, it is agnostic about the absolute orientation of stimuli in physical space (i.e. whether ‘left’ or ‘right’ is associated with a negative or positive sign). Therefore, only the magnitude, not the sign of the rank-correlation is meaningful, hence we report the absolute value of rank-correlation coefficients. For illustration purposes, we flipped the MDS projections around zero consistently for all conditions in ROIs where the correlation between MDS projections and corresponding spatial locations was negative.

EEG data acquisition and analysis

EEG data acquisition

Continuous EEG signals were recorded from 64 channels using Ag/AgCl active electrodes arranged in 10–20 layout (ActiCap, Brain Products GmbH, Gilching, Germany) at a sampling rate of 1000 Hz with FCz as reference. Channel impedances were kept below 10 kΩ.

EEG pre-processing

Pre-processing was performed with the FieldTrip toolbox69 (http://www.fieldtriptoolbox.org/). Raw data were high pass filtered at 0.1 Hz, inspected for bad channels, re-referenced to the average of all channels, and low pass filtered at 45 Hz. Bad channels were identified by visual inspection, rejected, and interpolated using spherical spline interpolation70 based on the neighbouring channels (across-subjects’ mean number of rejected channels = 1.5, min = 0, max = 4). Trial epochs for the unisensory auditory pre-adaptation and the auditory post-adaptation conditions were extracted between [−100 to 500] ms post-stimulus (i.e. the onset of the response cue on the response trials), baseline corrected and down-sampled to 200 Hz. Epochs containing artefacts within the time window of interest (i.e. between [0–500] ms post-stimulus) were identified based on visual inspection and rejected. Furthermore, based on eyetracking data, trials were rejected if they (i) contained eye blinks or (ii) saccades or (iii) the eye gaze was away from the fixation cross by more than 2 degrees (% rejected trials across-subjects’ mean ± SEM: 8.2 ± 1.0%). Grand average ERPs were computed by averaging all trials for each condition first within each participant and then across participants.

For the multivariate analysis, we applied spatial multivariate noise normalization to the individual trials using a noise covariance matrix estimated separately for each time point and the optimal shrinkage method60. Furthermore, the EEG activity patterns were divided by their Euclidean norm separately for each time point and trial for normalization. Since the EEG responses on trials with and without behavioural response were identical until 500 ms post-stimulus, we pooled over response and no response trials in all EEG analyses.

EEG multivariate decoding—spatial encoding and recalibration indices

Similar to our fMRI analysis, we trained a SVR in a four-fold stratified cross-validation (C = 1, ν = 0.527) to learn the mapping from evoked potentials (averages of 16 randomly sampled trials) of the pre-adaptation run to external auditory space over 50 ms time windows, shifting in increments of 5 ms, from −100 to 500 ms post-stimulus. The learnt mapping was used to decode the sound location from the EEG activity patterns of the pre-adaptation examples in the remaining fold and all post-adaptation examples. To minimize sampling variance, we averaged the decoded locations across 50 repetitions of this cross-validation procedure to compute the ‘spatial encoding’ and ‘recalibration’ indices for each 50 ms window. At the level of the random effect, we report results from a bootstrap-based t-test67 against zero corrected for multiple comparisons across [−50 to 500] ms using a cluster-based correction with an auxiliary cluster-defining height threshold of p < 0.05 uncorrected71.

Based on our a priori hypothesis that spatial location is encoded in the N100 potential13, we also performed bootstrap-based t-tests on the spatial encoding and recalibration indices obtained from evoked potentials averaged within a [70–130] ms window.

Spatial hemifield and decisional uncertainty models for fMRI and EEG

We assessed a spatial and a decisional encoding model as explanations for the regional mean BOLD-response and fine-scale fMRI/EEG patterns.

Spatial hemifield model

The hemifield model69 encodes sound location in the relative activity of two subpopulations of neurons each broadly tuned either to the ipsi- or contra-lateral hemifield (Fig. 4a). We simulated 360 neurons with broad Gaussian tuning functions. The standard deviation was set to 64°. The means of the tuning functions were sampled uniformly from 80° to 100° for the neuronal population tuned to the contra-lateral hemifield and from −80° to −100° azimuth for the neuronal population tuned to the ipsilateral hemifield. Consistent with previous research8 the ratio of the ipsi- and contralaterally tuned neurons was set to 30%/70% (see Fig. 4a, Supplementary Fig. 3a). Critically, additional simulations indicated that the ratio of ipsi/contralaterally tuned neurons affected only the predictions for the regional mean BOLD-responses but not those for the pattern similarity structure. In fact, the pattern similarity structure is nearly identical for simulations with the ratio between ipsi/contralaterally tuned neurons set to 30%/70% and 50%/50% (see Supplementary Methods: Comparison of spatial hemifield and place code models).

For the pre-adaptation conditions we sampled neural responses from the seven sound locations in our paradigm. For the post-adaptation conditions, we sampled again from these seven locations for the non-recalibration model. For the recalibration model we sampled the neural responses from the above locations shifted by 2.3° to the right (postVA-adaptation) or left (postAV-adaptation). The shift by 2.3° was calculated as the difference between the across-subjects’ mean PSE values in postVA- and postAV-adaptation phases from the psychometric functions.

Decisional uncertainty model

In the decisional uncertainty model the activity of a neuron encodes observers’ choice-related uncertainty that depends non-linearly on the distance between observers’ spatial estimates and their left-right spatial classification boundary35 according to:

Decisionaluncertainty=Fxμ,σ0.50.50.5 4

where F(x| μ, σ) is the cumulative normal distribution function with mean μ and standard deviation σ evaluated at spatial location x:

Fxμ,σ=1σ2πxekμ22σ2dk 5

The standard deviations of the cumulative normal distribution were set to 10° and their means were uniformly sampled between −1° and +1° (see Fig. 4a). We simulated responses from 360 neurons. Exactly as for the spatial model, we sampled neural responses from the seven sound locations for the pre-adaptation conditions and the post-adaptation conditions for the non-recalibration model. For the recalibration model we sampled the neural responses from the above locations shifted by 2.3° to the right (postVA-adaptation) or left (postAV-adaptation).

As shown in Fig. 4b, c, the spatial and decisional uncertainty models make distinct predictions for the regional mean BOLD-response and the pattern similarity structure over the 21 conditions = 7 spatial locations × 3 phases (pre, postVA, postAV).

While decisional uncertainty depends on the distance of the noisy signal from the decision boundary in signal detection theory models (see above), experimentally it is closely related to observers’ confidence and response times72,73—although differences between decisional uncertainty in first-order perceptual judgments and explicit confidence judgment are well-established74,75. We have therefore assessed to what extent the decisional uncertainty as computed above is related to observers’ response times for spatial locations during the pre- and post-adaptation phases. As expected, the predictions for the spatial locations in pre- and post-adaptation phases are highly correlated between the decisional uncertainty model and the response time models (Spearman’s rank-correlation (RS): RS(19) = 0.95, p < 0.001, see Supplementary Note 3: Pairwise correlations between the predictions of the spatial, decisional uncertainty, decisional choice and response time models; Response time analysis; Supplementary Figs. 2 and 4).

Regional mean BOLD-response: spatial and decisional linear mixed-effects models

Regional mean BOLD-response: For each of the 2 (hemisphere: left, right) × 5 (ROI: HG, hA, IPS, IPL, FEF) regions we selected the 20 most reliably responsive voxels, i.e. with the greatest t-values for all unisensory sound conditions relative to fixation. For each of those 10 regions we extracted the BOLD-response magnitude for each of the 7 locations × 3 phases (pre-, postAV- and postVA-adaptation) and formed the regional mean.

Linear mixed-effects modelling: To account for lateralization effects7, we performed separate analyses for each region and hemisphere. Separately for each hemisphere we used the spatial (resp. decisional) model to generate predicted regional mean BOLD-responses for each of the 21 conditions = 7 locations × 3 phases (pre-, postAV- and postVA-adaptation) by averaging activations of 360 simulated neurons.

We generated seven linear mixed-effects (LME) models that varied in their fixed effects predictors:

  • Null LME: single intercept term.

  • Spatial LME model (S): predictor from the spatial encoding model without recalibration and intercept term.

  • Decisional LME model (D): predictor from the decisional uncertainty model without recalibration and intercept term.

The remaining LME models included spatial, decisional and intercept terms (i.e. three fixed effects regressors) and factorially manipulated whether the spatial and/or the decisional predictor modelled recalibration:

  • (S + D) Spatial without recalibration + decisional without recalibration

  • (SR + D) Spatial with recalibration + decisional without recalibration,

  • (S + DR) Spatial without recalibration + decisional with recalibration

  • (SR + DR) Spatial with recalibration + decisional with recalibration.

Subject-level effects were included as random effects. For each of the 2 hemispheres × 5 ROIs we fitted (1) the Null, S, D, S + D LME models to the pre-adaptation phase (i.e. 7 conditions, see Fig. 5b top row) and (2) the Null, S + D, SR + D, S + DR, SR + DR LME models to the pre- and post-adaptation phase (i.e. 21 conditions, see Fig. 5b bottom row). All LME models were fitted using maximum-likelihood estimation and the Bayesian information criterion (BIC) was computed as76:

BIC=logeL12MlogeN 6

where L stands for the likelihood of the model given the data, M is the number of parameters, N is the number of observations, and loge is the natural logarithm. The figures show the natural logarithm of the Bayes factors (Loge-Bayes factors) averaged across hemispheres for LME models relative to the null LME (i.e. BICmodel − BICnull; for visualization details see Supplementary Methods: Plotting of regional mean BOLD-responses).

Multivariate pattern: pattern component modelling of fMRI and EEG data

To assess whether spatial and/or decisional uncertainty models can explain the fine-scale fMRI or EEG activity patterns across the 7 (sound location) × 3 (pre, post-VA, post-AV) = 21 conditions, we combined Pattern Component Modelling (PCM32, https://github.com/jdiedrichsen/pcm_toolbox) and Bayesian model comparison. Like the more widely used representational similarity analyses, pattern component modelling allows one to investigate whether specific representational structures—as implied for instance by the spatial or decisional uncertainty model—are expressed in fMRI or EEG activity patterns. Critically, PCM goes beyond standard representational similarity analyses by allowing us to assess arbitrary mixtures of representational components such as a combined spatial + decisional uncertainty model (i.e. with unknown mixture weights, for further information see Supplementary Methods: Multivariate pattern:—pattern component modelling of fMRI and EEG data).

Consistent with our LME analysis, we generated second-moment matrices (‘pattern components’) as predictors for PCM based on the activations of 360 simulated neurons from the spatial and decisional uncertainty models, respectively.

We compared the following PCM models for fMRI and EEG data:

Null PCM: all conditions are independent (i.e. the second-moment matrix is the identity matrix).

Spatial PCM (S): activity patterns generated by the spatial model without recalibration.

Decisional PCM (D): activity patterns generated by the decisional uncertainty model without recalibration.

Combined (spatial + decisional) PCMs: activity patterns are a weighted linear combination of the patterns generated by the spatial and the decisional uncertainty model. We factorially manipulated whether the spatial and/or decisional uncertainty model accommodates audiovisual recalibration:

  • (S + D) spatial component without recalibration and decisional component without recalibration.

  • (SR + D) spatial component with recalibration and decisional component without recalibration.

  • (S + DR) spatial component without recalibration and decisional component with recalibration.

  • (SR + DR) spatial component with recalibration and decisional component with recalibration.

Free (i.e. fully flexible) PCM imposes no constraints on the second-moment matrix and provides an upper benchmark. If a model performs at least as good as the cross-validated free PCM, it is sufficiently complex to capture all consistent variations in the data32.

fMRI and EEG fusion PCMs: To assess whether the fMRI activation patterns evolve with different time courses in EEG, we computed five PCMs each including only one pattern component generated from the BOLD-response patterns of HG, hA, IPS, IPL and FEF.

Model estimation and comparison: In our fMRI analysis these PCM models were applied to the pre-whitened parameter estimates from the first-level GLM analysis separately for each of the 5 fMRI ROIs (i.e. HG, hA, IPS, IPL, FEF pooled over both hemispheres). In our EEG analysis they were applied to pre-whitened evoked EEG potentials averaged within individual runs (20 trials) and within each of the 4 EEG time windows (i.e. [50–150] ms, [150–250] ms, [250–350] ms, [350–450] ms). Consistent with our LME analysis, we fitted (1) the Null, S, D, S + D PCMs to the pre-adaptation phase (i.e. 7 conditions, see Fig. 5c top row and Fig. 6c top row) and (2) the Null, S + D, SR + D, S + DR, SR + DR PCMs to the pre- and post-adaptation phase (i.e. 21 conditions, see Fig. 5c bottom row and 6c bottom row). We estimated the parameters of the PCM models in a leave-one-subject-out cross-validation scheme at the group level32. Because the parameter estimates were computed relative to the common baseline, we modelled the run mean as a fixed effect in fMRI when comparing the spatial, decisional, and spatial + decisional uncertainty models without recalibration. When comparing the four spatial + decisional uncertainty models with/without recalibration, we modelled the run effects as random, because a fixed effect run mean would have modelled out the recalibration effect across runs. In EEG, we modelled the run effect as a random effect (as they were not computed relative to a common baseline).

The marginal likelihood for each model and subject from the leave-one-subject-out cross-validation scheme was used as an approximation to the model evidence. We compared the models using the natural logarithm (loge) of the Bayes factors averaged across participants32.

Evidential categories for Loge-Bayes factors

Throughout the manuscript we report model comparison results using Bayes factors (BF), specifically the natural logarithm of the Bayes factors (LogeBF) which are a convenient way of expressing evidence in favour of a given model with respect to a reference model. Although BFs and LogeBFs are defined on a continuous scale, it is useful to subdivide this continuous scale into discrete evidential categories29. In our manuscript we followed the heuristic classification scheme described by29 first proposed by77 expressing the level of evidence in favour of the tested model with respect to the null model. Anecdotal evidence: BF of 1–3 (LogeBF of 0–1.1); Moderate evidence: BF of 3–10 (LogeBF of 1.1–2.3); Strong evidence: BF of >10 (LogeBF of >2.3).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

Reporting Summary (302.6KB, pdf)

Acknowledgements

This research was funded by the European Research Council (https://erc.europa.eu/; grant number: ERC-2012-StG_20111109 multsens).

Source data

Source Data (6.2MB, xlsx)

Author contributions

Conceptualization: M.A., A.M. and U.N.; Methodology: M.A. and A.M.; Software: M.A. and A.M.; Formal analysis: M.A., A.M. and U.N.; Investigation: M.A. and A.M.; Resources: U.N.; Data curation: M.A. and A.M.; Writing—original draft: M.A., A.M. and U.N.; Writing—review and editing: M.A., A.M. and U.N.; Visualization: M.A. and A.M.; Supervision: U.N.; Funding acquisition: U.N.

Peer review

Peer review information

Nature Communications thanks Biyu He, William Penny, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Data availability

The processed data files necessary to reproduce the results using the shared analysis code are available at figshare (10.6084/m9.figshare.19469861.v2)78. The raw data are available for research purposes only upon request from the corresponding author (M.A., mate.aller@mrc-cbu.cam.ac.uk), because of constraints imposed by the ethics approval under which this study was conducted. Source data are provided with this paper.

Code availability

The analysis code for this manuscript is available at GitHub (https://github.com/allermat/audiovisual_adaptation_fMRI_EEG)79.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Máté Aller, Agoston Mihalik.

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-022-31549-0.

References

  • 1.Chen L, Vroomen J. Intersensory binding across space and time: a tutorial review. Atten. Percept. Psychophys. 2013;75:790–811. doi: 10.3758/s13414-013-0475-4. [DOI] [PubMed] [Google Scholar]
  • 2.Grothe B, Pecka M, McAlpine D. Mechanisms of sound localization in mammals. Physiol. Rev. 2010;90:983–1012. doi: 10.1152/physrev.00026.2009. [DOI] [PubMed] [Google Scholar]
  • 3.Kopčo N, Lin I-F, Shinn-Cunningham BG, Groh JM. Reference frame of the ventriloquism aftereffect. J. Neurosci. 2009;29:13809–13814. doi: 10.1523/JNEUROSCI.2783-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Maier JX, Groh JM. Multisensory guidance of orienting behavior. Hear. Res. 2009;258:106–112. doi: 10.1016/j.heares.2009.05.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Wandell BA, Dumoulin SO, Brewer AA. Visual field maps in human cortex. Neuron. 2007;56:366–383. doi: 10.1016/j.neuron.2007.10.012. [DOI] [PubMed] [Google Scholar]
  • 6.McAlpine D, Jiang D, Palmer AR. A neural code for low-frequency sound localization in mammals. Nat. Neurosci. 2001;4:396–401. doi: 10.1038/86049. [DOI] [PubMed] [Google Scholar]
  • 7.Ortiz-Rios M, et al. Widespread and opponent fMRI signals represent sound location in macaque auditory cortex. Neuron. 2017;93:971–983. doi: 10.1016/j.neuron.2017.01.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Salminen NH, May PJC, Alku P, Tiitinen H. A population rate code of auditory space in the human cortex. PLoS ONE. 2009;4:e7600. doi: 10.1371/journal.pone.0007600. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Stecker GC, Harrington IA, Middlebrooks JC. Location coding by opponent neural populations in the auditory cortex. PLoS Biol. 2005;3:0520–0528. doi: 10.1371/journal.pbio.0030078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Schlack A, Sterbing-D’Angelo SJ, Hartung K, Hoffmann K-P, Bremmer F. Multisensory space representations in the macaque ventral intraparietal area. J. Neurosci. 2005;25:4616–4625. doi: 10.1523/JNEUROSCI.0455-05.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Bertelson P, Frissen I, Vroomen J, de Gelder B. The aftereffects of ventriloquism: patterns of spatial generalization. Percept. Psychophys. 2006;68:428–436. doi: 10.3758/BF03193687. [DOI] [PubMed] [Google Scholar]
  • 12.Bosen AK, Fleming JT, Allen PD, O’Neill WE, Paige GD. Multiple time scales of the ventriloquism aftereffect. PLoS ONE. 2018;13:e0200930. doi: 10.1371/journal.pone.0200930. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Bruns P, Liebnau R, Röder B. Cross-modal training induces changes in spatial representations early in the auditory processing pathway. Psychol. Sci. 2011;22:1120–1126. doi: 10.1177/0956797611416254. [DOI] [PubMed] [Google Scholar]
  • 14.Frissen I, Vroomen J, de Gelder B, Bertelson P. The aftereffects of ventriloquism: generalization across sound-frequencies. Acta Psychol. (Amst.) 2005;118:93–100. doi: 10.1016/j.actpsy.2004.10.004. [DOI] [PubMed] [Google Scholar]
  • 15.Radeau M, Bertelson P. The after-effects of ventriloquism. Q. J. Exp. Psychol. 1974;26:63–71. doi: 10.1080/14640747408400388. [DOI] [PubMed] [Google Scholar]
  • 16.Recanzone GH. Rapidly induced auditory plasticity: the ventriloquism aftereffect. Proc. Natl Acad. Sci. USA. 1998;95:869–875. doi: 10.1073/pnas.95.3.869. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Woods TM, Recanzone GH. Visually induced plasticity of auditory spatial perception in macaques. Curr. Biol. 2004;14:1559–1564. doi: 10.1016/j.cub.2004.08.059. [DOI] [PubMed] [Google Scholar]
  • 18.Wozny DR, Shams L. Recalibration of auditory space following milliseconds of cross-modal discrepancy. J. Neurosci. 2011;31:4607–4612. doi: 10.1523/JNEUROSCI.6079-10.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Zierul B, Röder B, Tempelmann C, Bruns P, Noesselt T. The role of auditory cortex in the spatial ventriloquism aftereffect. NeuroImage. 2017;162:257–268. doi: 10.1016/j.neuroimage.2017.09.002. [DOI] [PubMed] [Google Scholar]
  • 20.Park H, Kayser C. Shared neural underpinnings of multisensory integration and trial-by-trial perceptual recalibration in humans. eLife. 2019;8:e47001. doi: 10.7554/eLife.47001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Zwiers MP, Van Opstal AJ, Paige GD. Plasticity in human sound localization induced by compressed spatial vision. Nat. Neurosci. 2003;6:175–181. doi: 10.1038/nn999. [DOI] [PubMed] [Google Scholar]
  • 22.Mullette-Gillman OA, Cohen YE, Groh JM. Eye-centered, head-centered, and complex coding of visual and auditory targets in the intraparietal sulcus. J. Neurophysiol. 2005;94:2331–2352. doi: 10.1152/jn.00021.2005. [DOI] [PubMed] [Google Scholar]
  • 23.Michalka SW, Rosen ML, Kong L, Shinn-Cunningham BG, Somers DC. Auditory spatial coding flexibly recruits anterior, but not posterior, visuotopic parietal cortex. Cereb. Cortex. 2016;26:1302–1308. doi: 10.1093/cercor/bhv303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Rauschecker JP, Tian B. Mechanisms and streams for processing of ‘what’ and ‘where’ in auditory cortex. Proc. Natl Acad. Sci. USA. 2000;97:11800–11806. doi: 10.1073/pnas.97.22.11800. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.van der Heijden K, Rauschecker JP, de Gelder B, Formisano E. Cortical mechanisms of spatial hearing. Nat. Rev. Neurosci. 2019;20:609–623. doi: 10.1038/s41583-019-0206-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Zatorre RJ, Bouffard M, Ahad P, Belin P. Where is ‘where’ in the human auditory cortex? Nat. Neurosci. 2002;5:905–909. doi: 10.1038/nn904. [DOI] [PubMed] [Google Scholar]
  • 27.Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011;2:1–27. doi: 10.1145/1961189.1961199. [DOI] [Google Scholar]
  • 28.Akaike, H. A New Look at the Statistical Model Identification. in Selected Papers of Hirotugu Akaike (eds. Parzen, E., Tanabe, K. & Kitagawa, G.) 215–222 (Springer New York, 1998).
  • 29.Schönbrodt FD, Wagenmakers E-J. Bayes factor design analysis: planning for compelling evidence. Psychon. Bull. Rev. 2018;25:128–142. doi: 10.3758/s13423-017-1230-y. [DOI] [PubMed] [Google Scholar]
  • 30.Nili H, et al. A toolbox for representational similarity analysis. PLoS Comput. Biol. 2014;10:e1003553. doi: 10.1371/journal.pcbi.1003553. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Cox, M. A. A. & Cox, T. F. Multidimensional Scaling. in Handbook of Data Visualization (eds. Chen, C., Härdle, W. & Unwin, A.) 315–347 (Springer, 2008).
  • 32.Diedrichsen J, Yokoi A, Arbuckle SA. Pattern component modeling: a flexible approach for understanding the representational structure of brain activity patterns. NeuroImage. 2018;180:119–133. doi: 10.1016/j.neuroimage.2017.08.051. [DOI] [PubMed] [Google Scholar]
  • 33.Friston KJ, Diedrichsen J, Holmes E, Zeidman P. Variational representational similarity analysis. NeuroImage. 2019;201:115986. doi: 10.1016/j.neuroimage.2019.06.064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Middlebrooks JC, Clock AE, Xu L, Green DM. A panoramic code for sound location by cortical neurons. Science. 1994;264:842–844. doi: 10.1126/science.8171339. [DOI] [PubMed] [Google Scholar]
  • 35.Grinband J, Hirsch J, Ferrera VP. A neural representation of categorization uncertainty in the human brain. Neuron. 2006;49:757–763. doi: 10.1016/j.neuron.2006.01.032. [DOI] [PubMed] [Google Scholar]
  • 36.Cichy RM, Oliva A. A M/EEG-fMRI fusion primer: resolving human brain responses in space and time. Neuron. 2020;107:772–781. doi: 10.1016/j.neuron.2020.07.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Zaidel A, Ma WJ, Angelaki DE. Supervised calibration relies on the multisensory percept. Neuron. 2013;80:1544–1557. doi: 10.1016/j.neuron.2013.09.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Bruns P, Röder B. Sensory recalibration integrates information from the immediate and the cumulative past. Sci. Rep. 2015;5:12739. doi: 10.1038/srep12739. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Mendonça C, Escher A, van de Par S, Colonius H. Predicting auditory space calibration from recent multisensory experience. Exp. Brain Res. 2015;233:1983–1991. doi: 10.1007/s00221-015-4259-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Tian B, Reser D, Durham A, Kustov A, Rauschecker JP. Functional specialization in rhesus monkey auditory cortex. Science. 2001;292:290–293. doi: 10.1126/science.1058911. [DOI] [PubMed] [Google Scholar]
  • 41.Winkler, I., Denham, S. & Escera, C. Auditory Event-related Potentials. in Encyclopedia of Computational Neuroscience (eds. Jaeger, D. & Jung, R.) 1–29 (Springer, 2013).
  • 42.Raposo D, Kaufman MT, Churchland AK. A category-free neural population supports evolving demands during decision-making. Nat. Neurosci. 2014;17:1784–1792. doi: 10.1038/nn.3865. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Petkov CI, Kayser C, Augath M, Logothetis NK. Functional imaging reveals numerous fields in the monkey auditory cortex. PLoS Biol. 2006;4:e215. doi: 10.1371/journal.pbio.0040215. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Zimmer U, Macaluso E. High binaural coherence determines successful sound localization and increased activity in posterior auditory areas. Neuron. 2005;47:893–905. doi: 10.1016/j.neuron.2005.07.019. [DOI] [PubMed] [Google Scholar]
  • 45.Watson DM, Akeroyd MA, Roach NW, Webb BS. Distinct mechanisms govern recalibration to audio-visual discrepancies in remote and recent history. Sci. Rep. 2019;9:8513. doi: 10.1038/s41598-019-44984-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Park H, Kayser C. The neurophysiological basis of the trial-wise and cumulative ventriloquism aftereffects. J. Neurosci. 2021;41:1068–1079. doi: 10.1523/JNEUROSCI.2091-20.2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Noppeney U. Perceptual inference, learning, and attention in a multisensory world. Annu. Rev. Neurosci. 2021;44:449–473. doi: 10.1146/annurev-neuro-100120-085519. [DOI] [PubMed] [Google Scholar]
  • 48.Mihalik A, Noppeney U. Causal inference in audiovisual perception. J. Neurosci. 2020;40:6600–6612. doi: 10.1523/JNEUROSCI.0051-20.2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Werner S, Noppeney U. Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. J. Neurosci. 2010;30:2662–2675. doi: 10.1523/JNEUROSCI.5091-09.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Gardner WG, Martin KD. HRTF measurements of a KEMAR. J. Acoust. Soc. Am. 1995;97:3907–3908. doi: 10.1121/1.412407. [DOI] [Google Scholar]
  • 51.Brainard DH. The psychophysics toolbox. Spat. Vis. 1997;10:433–436. doi: 10.1163/156856897X00357. [DOI] [PubMed] [Google Scholar]
  • 52.Kleiner, M., Brainard, D. & Pelli, D. What’s new in Psychtoolbox-3? in Perception, vol. 36 (EVCP Abstract Supplement) (Pion Ltd., 2007).
  • 53.Fründ I, Haenel NV, Wichmann FA. Inference for psychometric functions in the presence of nonstationary behavior. J. Vis. 2011;11:16. doi: 10.1167/11.6.16. [DOI] [PubMed] [Google Scholar]
  • 54.Schütt HH, Harmeling S, Macke JH, Wichmann FA. Painfree and accurate Bayesian estimation of psychometric functions for (potentially) overdispersed data. Vis. Res. 2016;122:105–123. doi: 10.1016/j.visres.2016.02.002. [DOI] [PubMed] [Google Scholar]
  • 55.Prins N, Kingdom FAA. Applying the model-comparison approach to test specific research hypotheses in psychophysical research using the palamedes toolbox. Front. Psychol. 2018;9:1250. doi: 10.3389/fpsyg.2018.01250. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Meijer D, Veselič S, Calafiore C, Noppeney U. Integration of audiovisual spatial signals is not consistent with maximum likelihood estimation. Cortex. 2019;119:74–88. doi: 10.1016/j.cortex.2019.03.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Buergers, S. & Noppeney, U. The role of alpha oscillations in temporal binding within and across the senses. Nat. Hum. Behav. 10.1038/s41562-022-01294-x (2022). [DOI] [PMC free article] [PubMed]
  • 58.Rigoux L, Stephan KE, Friston KJ, Daunizeau J. Bayesian model selection for group studies—-revisited. NeuroImage. 2014;84:971–985. doi: 10.1016/j.neuroimage.2013.08.065. [DOI] [PubMed] [Google Scholar]
  • 59.Friston KJ, et al. Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 1994;2:189–210. doi: 10.1002/hbm.460020402. [DOI] [Google Scholar]
  • 60.Ledoit O, Wolf M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 2004;88:365–411. doi: 10.1016/S0047-259X(03)00096-4. [DOI] [Google Scholar]
  • 61.Destrieux C, Fischl B, Dale A, Halgren E. Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage. 2010;53:1–15. doi: 10.1016/j.neuroimage.2010.06.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Wang L, Mruczek REB, Arcaro MJ, Kastner S. Probabilistic maps of visual topography in human cortex. Cereb. Cortex N. Y. N. 1991. 2015;25:3911–3931. doi: 10.1093/cercor/bhu277. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Rohe T, Noppeney U. Reliability-weighted integration of audiovisual signals can be modulated by top-down attention. eNeuro. 2018;5:ENEURO.0315-17.2018. doi: 10.1523/ENEURO.0315-17.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Rohe T, Noppeney U. Cortical hierarchies perform Bayesian causal inference in multisensory perception. PLoS Biol. 2015;13:e1002073. doi: 10.1371/journal.pbio.1002073. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Rohe T, Noppeney U. Distinct computational principles govern multisensory integration in primary sensory and association cortices. Curr. Biol. 2016;1:1–6. doi: 10.1016/j.cub.2015.12.056. [DOI] [PubMed] [Google Scholar]
  • 66.Ferrari A, Noppeney U. Attention controls multisensory perception via two distinct mechanisms at different levels of the cortical hierarchy. PLoS Biol. 2021;19:e3001465. doi: 10.1371/journal.pbio.3001465. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap. (Springer US, 1993).
  • 68.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple. Test. J. R. Stat. Soc. Ser. B Methodol. 1995;57:289–300. [Google Scholar]
  • 69.Oostenveld R, Fries P, Maris E, Schoffelen J-M. FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Comput. Intell. Neurosci. 2011;2011:156869. doi: 10.1155/2011/156869. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Perrin F, Pernier J, Bertrand O, Echallier JF. Spherical splines for scalp potential and current density mapping. Electroencephalogr. Clin. Neurophysiol. 1989;72:184–187. doi: 10.1016/0013-4694(89)90180-6. [DOI] [PubMed] [Google Scholar]
  • 71.Maris E, Oostenveld R. Nonparametric statistical testing of EEG- and MEG-data. J. Neurosci. Methods. 2007;164:177–190. doi: 10.1016/j.jneumeth.2007.03.024. [DOI] [PubMed] [Google Scholar]
  • 72.Zylberberg A, Fetsch CR, Shadlen MN. The influence of evidence volatility on choice, reaction time and confidence in a perceptual decision. eLife. 2016;5:e17688. doi: 10.7554/eLife.17688. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Kiani R, Corthell L, Shadlen MN. Choice certainty is informed by both evidence and decision time. Neuron. 2014;84:1329–1342. doi: 10.1016/j.neuron.2014.12.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Fleming SM, Daw ND. Self-evaluation of decision-making: a general Bayesian framework for metacognitive computation. Psychol. Rev. 2017;124:91–114. doi: 10.1037/rev0000045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Schulz, L., Fleming, S. M. & Dayan, P. Metacognitive computations for information search: confidence in control. bioRxiv10.1101/2021.03.01.433342 (2021). [DOI] [PubMed]
  • 76.Bishop, C. M. Pattern Recognition and Machine Learning (Information Science and Statistics). (Springer-Verlag, 2006).
  • 77.Jeffreys, H. Theory of probability. (Oxford University Press, 1961).
  • 78.Aller, M., Mihalik, A. & Noppeney, U. Source data for research article ‘Audiovisual adaptation is expressed in spatial and decisional codes’. figshare10.6084/m9.figshare.19469861.v2 (2022). [DOI] [PMC free article] [PubMed]
  • 79.Aller, M., Mihalik, A. & Noppeney, U. Audiovisual adaptation is expressed in spatial and decisional codes, allermat/audiovisual_adaptation_fMRI_EEG: av_adapt 0.1.1. Zenodo 10.5281/zenodo.6572895 (2022). [DOI] [PMC free article] [PubMed]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Reporting Summary (302.6KB, pdf)

Data Availability Statement

The processed data files necessary to reproduce the results using the shared analysis code are available at figshare (10.6084/m9.figshare.19469861.v2)78. The raw data are available for research purposes only upon request from the corresponding author (M.A., mate.aller@mrc-cbu.cam.ac.uk), because of constraints imposed by the ethics approval under which this study was conducted. Source data are provided with this paper.

The analysis code for this manuscript is available at GitHub (https://github.com/allermat/audiovisual_adaptation_fMRI_EEG)79.


Articles from Nature Communications are provided here courtesy of Nature Publishing Group

RESOURCES