Abstract
The integration of multisensory information has been shown to be guided by spatial and temporal proximity, as well as to be influenced by attention. Here we used neural measures of the multisensory spread of attention to investigate the spatial and temporal linking of synchronous versus near-synchronous auditory and visual events. Human participants attended selectively to one of two lateralized visual-stimulus streams while task-irrelevant tones were presented centrally. Electrophysiological measures of brain activity showed that tones occurring simultaneously with an attended visual stimulus, or delayed by 100 ms relative to it, were temporally linked to that stimulus, as reflected by robust cross-modal spreading-of-attention activity, whereas tones delayed by 300 ms were not. The neural data also indicated a ventriloquist-like spatial linking of the auditory to the attended visual stimuli, but only when the two occurred simultaneously. These neurophysiological results thus provide unique insight into the temporal and spatial principles of multisensory feature integration and the fundamental role attention plays in such integration.
Keywords: Multisensory, Attention, Temporal, Spatial, ERP, Ventriloquism
Introduction
Sensory perception entails the ability to combine information from multiple modalities to reliably determine the objects in one’s environment. This can be accomplished in various ways, often invoking selective attention toward one or multiple modalities to guide the stimulus processing. Fundamental to the grouping of multisensory information into meaningful entities are several general principles that underscore the temporal and spatial linking of stimulus input components (Stein and Meredith, 1993, Stein and Stanford, 2008).
The necessity for some degree of temporal correspondence for multisensory integration to occur has been shown both neurophysiologically and behaviorally. More specifically, as the stimulus onset asynchrony (SOA) between the unisensory components of a multisensory stimulus increases (beyond the typical temporal window of integration of ~150ms), the probability that they will be neurally integrated and judged as from the same source or event decreases (Meredith et al., 1987, Stone et al., 2001, Schneider and Bavelier, 2003).
Likewise, with both neural recordings (Meredith and Stein, 1986, Wallace et al., 1996) and behavioral measures (Spence et al., 2003, Gondan et al., 2005, Keetels and Vroomen, 2005, Bolognini et al., 2007), as spatial discrepancy increases, the likelihood also decreases for physiological multisensory interaction and for the behavioral judgment of perceptual correspondence. Importantly, however, as manifested in the phenomenon of ventriloquism, when a physically separated sound occurs concurrently with a visual stimulus, the perceived auditory location tends to be shifted toward the visual (Bertelson and Radeau, 1981, Hairston et al., 2003), and spatially separated but synchronous multisensory stimuli can still yield behavioral and neural enhancements (Teder-Salejarvi et al., 2005). Moreover, in an explicit auditory localization task, this perceptual shift has been associated with a shift in the auditory brain response toward the side contralateral to the visual stimulus, presumably reflecting perceptual spatial integration of these spatially disparate stimuli (Bonath et al., 2007). To date, however, relatively little is known about how the temporal factors of multisensory integration interact neurally with the spatial factors, such as during the ventriloquist illusion, and the role attention might play in these interactions.
Previously, we had reported neural activity measures showing that attention to stimuli in one modality (vision) can spread to irrelevant but synchronous stimuli in another modality (audition), even when arising from different locations (Busse et al., 2005). This effect was reflected electrophysiologically by a late-onsetting (>200ms), long-lasting, negative-polarity ERP wave, as well as enhanced auditory-cortex fMRI activity, being elicited by sounds occurring synchronously with an attended, spatially disparate, visual stimulus. This spreading-of-attention effect was interpreted as a cross-sensory, object-related, linking process (see also Molholm et al., 2007, Fiebelkorn et al., 2009), reflecting a multisensory version of attentional spreading previously observed across unimodal visual objects (Egly et al., 1994, Martinez et al., 2007). Here, we investigated the temporal and spatial linking of multisensory stimulus components, and the role of attention in this linking, by examining the cross-modal attentional spreading between spatially disparate visual and auditory stimulus events occurring with different temporal separations, both within and outside of the temporal window of integration. We report a fundamental role of attention in the multisensory-linking processes, as well as a dissociation between patterns of brain activity reflecting the temporal and spatial linking of the stimuli.
Methods and Materials
Participants
Eighteen healthy right-handed adult volunteers (9 male) participated in the study (ages: 18–24 yrs; M = 21.1). Two additional participants were excluded due to poor behavioral task performance. Participants gave written informed consent and were financially compensated for their time. All procedures were approved by the Duke University Health System Institutional Review Board.
Stimuli and Task
To determine the spread of attention at varying SOAs, we adapted the bilateral attentional streaming paradigm we had employed previously for simultaneous visual and auditory events (Busse et al., 2005). During each block, participants were instructed to covertly attend to the left side or to the right side of a central fixation point (Fig. 1). Visual stimuli were randomly presented to the lower left or lower right quadrant of the screen (at 12.3° visual angle to the left and right of the center, and 3.4° below the central fixation). The visual stimuli were checkerboard images with 0, 1, or 2 dots contained within the checkerboard. Each visual stimulus was on the screen for 33 ms, and the inter-trial interval was jittered between 950 and 1050 ms. Participants were instructed to detect an occasional target visual image (a checkerboard on the designated side with 2 dots, 14% probability) in the attended visual stream, and to press a button when this image appeared. Accuracy and reaction times (RTs) were recorded, and for each participant the difficulty level was titrated by adjusting the contrast and size of the dots within the target images so that participants were ~80% correct in detecting the target-stimulus checkerboard possessing two dots.
All visual stimuli (targets and non-targets, both on the attended and unattended sides) were presented in the following multisensory conditions: Visual Only, Visual with Simultaneous Auditory (Simultaneous), Visual with Auditory Delayed by 100 ms (Delay-100), or Visual with Auditory Delayed by 300 ms (Delay-300; Fig. 1B). In each of the multisensory trial types, the auditory stimulus consisted of a tone pip (33 ms duration, 1200 Hz, 60 dBSL, 5 ms rise and fall periods) presented centrally. Participants were instructed to ignore all of the auditory stimuli as being irrelevant to their task. All stimuli were presented in Matlab (Mathworks) using Psychophysics Toolbox 3 (Brainard, 1997, Pelli, 1997). After one practice block, participants completed a total of 30 experimental blocks (half attend left, half attend right), each a little over two minutes in duration. The trial types were presented in randomized and counterbalanced order within each block, and the order of the blocks was randomized for all participants.
Post-EEG Behavioral Assessment of Simultaneous Judgment Perception
To assess participants’ ability to determine the temporal separation between auditory and visual events, each participant was behaviorally tested using a simultaneity judgment task immediately after the EEG recording session. In this task, as before, participants were instructed to covertly attend to the left or right side of a central fixation point, and lateralized visual streams were presented as during the EEG session. The auditory stimuli, also as before, were presented centrally, either simultaneously with the visual stimulus (Simultaneous), delayed by 100 ms (Delay-100), or delayed by 300 ms (Delay-300), and the visual stimuli were always accompanied by one of these auditory conditions (i.e., there were no Visual-Only trials here). The time between consecutive visual stimuli was jittered from 1450 to 1550 ms, to allow enough time for participants to make a simultaneity judgment and to respond. More specifically, participants were instructed to judge whether the visual and auditory components of the stimulus were simultaneous, indicating their judgment with a button press. A total of 48 trials were completed for each of the 3 conditions.
Behavioral Data Analysis
Reaction times (RTs), hits, and false alarms were obtained for each subject from the behavioral data collected during the EEG recording session. Outlier trials, defined as those with reaction times more than two standard deviations from each subject’s mean reaction time, were excluded from the analysis. Repeated-measures analyses of variance (ANOVAs) were conducted to test the effect of multisensory-SOA condition on reaction time and accuracy, and any significant effects, using an alpha level of 0.05, were followed up with t-tests. In the simultaneity judgment task, the ‘percent simultaneous’ judgment responses were calculated for each condition, and a repeated-measures ANOVA was conducted to test whether these judgments differed between the SOA conditions.
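The outlier rule described above can be illustrated with a minimal Python sketch. The function name and the use of the sample standard deviation are assumptions made for illustration; the text does not state which estimator or software was used for this step.

```python
from statistics import mean, stdev


def exclude_rt_outliers(rts, n_sd=2.0):
    """Keep only reaction times within n_sd standard deviations of the mean.

    `rts` holds one participant's reaction times in ms. The sample SD is
    used here as an assumption; the paper does not specify the estimator.
    """
    if len(rts) < 2:
        # A standard deviation is undefined for fewer than two trials.
        return list(rts)
    m = mean(rts)
    sd = stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * sd]
```

Applying the rule per participant, rather than on pooled data, matches the description of the criterion as being relative to each subject's own mean.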
EEG Recording and Analysis
Continuous electroencephalogram (EEG) was recorded using a Synamps Neuroscan system (Charlotte, NC) and a customized elastic electrode cap (Electro-Cap International, Eaton, OH) containing 64 electrodes. The data were sampled at 500 Hz, bandpass filtered online from 0.01–100 Hz, and referenced to the right mastoid electrode site. Eye blinks and eye movements were recorded using two electrodes lateral to each eye, referenced to each other, and two electrodes inferior to each orbit, referenced to electrodes above the eyes.
Offline, the data were filtered with a low-pass filter that strongly attenuated signal frequencies above 50 Hz. Trials that contained eye movements or blinks were rejected, as were trials with excess muscle activity or excess slow drift. The time range around each trial used for assessment of artifact was −250 to 950 ms, relative to the onset of the visual stimulus. The artifact-rejection threshold level was titrated individually for each participant, and that value was then used in a computer-applied rejection process run automatically across all the trials for that participant. The data were re-referenced to the algebraic average of the left and right mastoid electrodes. Time-locked ERP averages were obtained for each of the different conditions, and difference waves were calculated based on these averages. For the analyses reported here, only the non-target trials were considered, thereby focusing on the influence of the visual spatial attention manipulation without the presence of the large, long-latency ERP waves (e.g., P300s) associated with target detection. To examine the differences between conditions, repeated-measures ANOVAs were conducted on mean amplitude measures of brain activity (see Results) across subjects using a prestimulus baseline of 200 ms. All offline processing was done using the ERPSS software package (UCSD, San Diego, CA).
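As a rough illustration of the baseline correction and mean-amplitude measurement mentioned above, the following Python sketch operates on a single averaged waveform stored as a list of samples. The function names and the epoch-start parameter are hypothetical; the actual processing was done in ERPSS, whose internals are not reproduced here.

```python
def time_to_index(t_ms, epoch_start_ms, srate_hz):
    """Convert a latency in ms (relative to stimulus onset) to a sample index."""
    return round((t_ms - epoch_start_ms) * srate_hz / 1000)


def _window(wave, t0_ms, t1_ms, epoch_start_ms, srate_hz):
    """Return the samples in the half-open latency window [t0_ms, t1_ms)."""
    i0 = time_to_index(t0_ms, epoch_start_ms, srate_hz)
    i1 = time_to_index(t1_ms, epoch_start_ms, srate_hz)
    if i0 < 0 or i1 > len(wave) or i1 <= i0:
        raise ValueError("window falls outside the epoch")
    return wave[i0:i1]


def baseline_correct(wave, epoch_start_ms, srate_hz, baseline_ms=200):
    """Subtract the mean of the prestimulus baseline from every sample."""
    base = _window(wave, -baseline_ms, 0, epoch_start_ms, srate_hz)
    offset = sum(base) / len(base)
    return [v - offset for v in wave]


def mean_amplitude(wave, t0_ms, t1_ms, epoch_start_ms, srate_hz):
    """Mean amplitude of the waveform within a latency window."""
    seg = _window(wave, t0_ms, t1_ms, epoch_start_ms, srate_hz)
    return sum(seg) / len(seg)
```

With the 500 Hz sampling rate reported here, each sample spans 2 ms, so a 200–700 ms window covers 250 samples.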
Extraction of Spreading-of-Attention Activity
To extract the activity associated with the processing of the task-irrelevant auditory tone as a function of whether it was accompanied by an attended or unattended visual stimulus, the following analysis was conducted. In each of the three auditory conditions (Simultaneous, Delay-100 and Delay-300), the task-irrelevant tones were always presented centrally in the same trial with either a visually attended or a visually unattended lateral stimulus. To remove the contribution of the visual stimuli to the ERPs in the multisensory conditions, the Visual-Only condition was subtracted from the Simultaneous, Delay-100 and Delay-300 conditions (all time-locked to the onset of the visual stimulus), separately for each visual-attention condition, isolating the activity linked to the processing of the auditory stimulus under each multisensory attentional context. These extracted ERP responses to the central tones when they were accompanied by an attended versus an unattended lateral visual event could then be compared to extract the possible spread of attention across modality and space to the tones (see Fig. 4). In addition, the conditions were collapsed across the left and right side to obtain this overall attentional spreading effect, regardless of the side of visual stimulation. This spreading-of-attention activity was extracted and analyzed for each of the three SOA delay conditions.
Two additional comparisons were performed between the extracted auditory responses for the three SOA conditions: one for when the lateral visual stimulus was attended and one for when it was unattended. These assessed the effect of the SOA manipulation separately within each of the visual-attention conditions.
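The two-step subtraction described in this section can be sketched in Python as follows. The function and argument names are hypothetical, and each waveform is assumed to be an averaged ERP at one electrode, time-locked to visual onset and stored as an equal-length list of samples.

```python
def subtract(a, b):
    """Pointwise difference wave a - b for two equal-length ERP averages."""
    if len(a) != len(b):
        raise ValueError("waveforms must have the same number of samples")
    return [x - y for x, y in zip(a, b)]


def spreading_of_attention(av_att, v_att, av_unatt, v_unatt):
    """Extract tone-related activity and the attention-spread difference.

    av_att / av_unatt: multisensory ERP with the visual stimulus attended
    or unattended; v_att / v_unatt: the matching Visual-Only ERPs.
    Returns (auditory_attended, auditory_unattended, spread).
    """
    aud_att = subtract(av_att, v_att)        # tone response, visual attended
    aud_unatt = subtract(av_unatt, v_unatt)  # tone response, visual unattended
    spread = subtract(aud_att, aud_unatt)    # attended minus unattended
    return aud_att, aud_unatt, spread
```

The same call would be repeated for each SOA condition (Simultaneous, Delay-100, Delay-300), using that condition's multisensory averages and the shared Visual-Only averages.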
Results
Behavioral results for the visual attention task during the EEG runs
For the visual attention task during which the EEG was recorded, response times (RTs) and detection accuracy for the visual target stimuli were collected. (Note that the centrally presented auditory tones were always task irrelevant in these runs.) No significant differences in accuracy for the visual targets were observed between the three stimulus onset asynchrony (SOA) conditions, with the performance in each case being close to the desired difficulty titration level of 80% correct (Visual Only: M = 78.3%, SD = 13.3%; Simultaneous: M = 78.8%, SD = 9.5%; Delay-100: M = 77.0%, SD = 14.1%; Delay-300: M = 78.8%, SD = 12.5%). For the RTs (Visual Only: M = 587 ms, SD = 46 ms; Simultaneous: M = 583 ms, SD = 49 ms; Delay-100: M = 586 ms, SD = 48 ms; Delay-300: M = 595 ms, SD = 50 ms), however, an analysis of variance (ANOVA) revealed a main effect of condition (F(3,51) = 2.82; P < 0.05), with the Delay-300 condition being significantly slower than the other conditions with an auditory component (Simultaneous vs. Delay-300: t(17) = 2.53; P < 0.05; Delay-100 vs. Delay-300: t(17) = 2.90; P = 0.01; Visual Only vs. Delay-300: t(17) = 1.96; P = 0.07).
Simultaneity-judgment task (behavior only)
In the separate behavioral task after the EEG session, participants were highly likely to judge the visual and auditory events as simultaneous in both the Simultaneous and Delay-100 conditions (87.55% and 78.85%, respectively), while they were very unlikely to judge them as simultaneous in the Delay-300 condition (37.16%; Fig. 2). An ANOVA revealed a main effect of condition (F(2,34) = 60.81; P < 0.001), with post-hoc t-tests showing differences between all three conditions (Simultaneous vs. Delay-100: t(17) = 4.11; P = 0.001; Simultaneous vs. Delay-300: t(17) = 8.68; P < 0.001; Delay-100 vs. Delay-300: t(17) = 7.21; P < 0.001).
Visual Spatial Attention ERP Effects
Visual spatial attention effects to the non-target visual stimuli that occurred by themselves (Visual Only trials) were assessed to ensure that the manipulation of the subjects’ covert visual spatial attention was effective. Classical attentional modulations (e.g., Hillyard and Anllo-Vento, 1998) of the early sensory ERP components contralateral to the direction of visual attention were observed for both directed loci of attention (left and right). In particular, attended compared to unattended stimuli showed an increased positive-polarity component at contralateral occipital sites (P1 effect) between 90 and 120 ms post stimulus, followed by an increased negative-polarity wave over contralateral parietal-occipital sites (posterior N1 effect) between 190 and 230 ms (Fig. 3). An ANOVA that included the factors of attention (attended vs. unattended), stimulus location (left vs. right visual field), and hemisphere (left vs. right electrode location) confirmed the presence of a significant contralateral P1 attention effect over the latency window 90–120 ms, with a significant three-way interaction across the occipital sites TO1/TO2, O1i/O2i, and P3i/P4i (F(1,17) = 22.87, P < 0.0005). The analyses similarly showed a significant N1 attention effect at posterior sites P3i/P4i, P3a/P4a, and O1/O2 (F(1,17) = 5.02, P < 0.05) from 190 to 230 ms. These attention effects on the sensory evoked ERP waves indicate that subjects were appropriately focusing their visual attention to the instructed side.
The Cross-modal Spread of Attention from Vision to Audition as a Function of SOA
Simultaneous Condition
Looking first at the Simultaneous condition, the ERPs to attended visual stimuli occurring alone were subtracted from the ERPs to attended visual stimuli occurring with a task-irrelevant central tone, thereby deriving an extracted ERP to the central auditory tone elicited in the context of occurring synchronously with an attended lateral visual stimulus from a different location. An analogous subtraction was performed to extract the ERP to the central tone when it occurred with an unattended lateral visual stimulus. Comparing these two extracted ERP responses should reveal any differences in auditory activity derived from the differential visual attention, reflecting the spreading of attention from the visual event to the synchronous auditory stimulus (Busse et al., 2005). In the Simultaneous condition here, we replicated our previously reported pattern of multisensory-attentional-spread activity, appearing as in Busse et al. (2005) as a long-lasting, fronto-central, negative-polarity wave from ~200 to ~700 ms (F(1,17) = 16.80, P < 0.001, at sites Fz, FCz, FC1 and FC2; Fig. 4), elicited by the central tones occurring simultaneously with an attended lateral visual stimulus relative to an unattended one (see Table 1 for detailed statistics). The ERPs at time points prior to and beyond the time period of 200–700 ms did not differ significantly (all Ps > 0.05). Because the spread-of-attention effect seemed to have a particularly anterior distribution, the additional, somewhat more anterior sites of Fp1m, Fp2m, and Fpz (Fig. 5) were also tested for significant differences between the attended-visual and unattended-visual multisensory conditions from 200 to 700 ms, and these differences were indeed also highly significant (F(1,17) = 15.01, P = 0.001).
Table 1.
Time Window (Attended vs. Unattended) | DF | Simultaneous: F | Simultaneous: P | Delay-100: F | Delay-100: P | Delay-300: F | Delay-300: P |
---|---|---|---|---|---|---|---|
150 to 200 ms | 1,17 | <0.1 | NS (0.90) | <0.1 | NS (0.78) | 0.12 | NS (0.74) |
200 to 250 ms | 1,17 | 18.83 | 0.0004 | 1.25 | NS (0.28) | 0.44 | NS (0.52) |
250 to 300 ms | 1,17 | 14.2 | 0.002 | 1.86 | NS (0.20) | 0.2 | NS (0.66) |
300 to 350 ms | 1,17 | 7.32 | 0.02 | 15.12 | 0.001 | <0.1 | NS (0.77) |
350 to 400 ms | 1,17 | 8.99 | 0.008 | 10.42 | 0.0049 | 0.58 | NS (0.46) |
400 to 450 ms | 1,17 | 4.98 | 0.04 | 1.84 | NS (0.20) | <0.1 | NS (0.87) |
450 to 500 ms | 1,17 | 6.31 | 0.02 | 7.7 | 0.01 | 0.62 | NS (0.40) |
500 to 550 ms | 1,17 | 12.44 | 0.003 | 12.96 | 0.002 | 1.57 | NS (0.20) |
550 to 600 ms | 1,17 | 18.4 | 0.0005 | 8.52 | 0.01 | 0.82 | NS (0.40) |
600 to 650 ms | 1,17 | 13.55 | 0.002 | 7.39 | 0.01 | 1.82 | NS (0.20) |
650 to 700 ms | 1,17 | 8.42 | 0.01 | 2.17 | NS (0.20) | 4.22 | NS (0.06) |
700 to 750 ms | 1,17 | 5.77 | 0.03 | 1.77 | NS (0.20) | 2.11 | NS (0.20) |
750 to 800 ms | 1,17 | 3.42 | NS (0.08) | 0.98 | NS (0.30) | 2.22 | NS (0.20) |
800 to 850 ms | 1,17 | 2.14 | NS (0.16) | 1.08 | NS (0.30) | 3.87 | NS (0.07) |
850 to 900 ms | 1,17 | 2.6 | NS (0.13) | <0.01 | NS (0.90) | 3.44 | NS (0.08) |
900 to 950 ms | 1,17 | 1.18 | NS (0.29) | 0.17 | NS (0.68) | 0.62 | NS (0.40) |
950 to 1000 ms | 1,17 | 0.52 | NS (0.48) | 0.27 | NS (0.61) | 0.2 | NS (0.60) |
1000 to 1050 ms | 1,17 | 0.16 | NS (0.69) | 0.11 | NS (0.74) | 0.4 | NS (0.50) |
Tone-Delayed-by-100-ms Condition (Delay-100)
In the Delay-100 condition, as indicated in Figs 4 and 5, the onset of the spreading-attention effect was shifted in time by ~100 ms, with the apparent onset of the late negative wave starting at 300 ms, rather than 200 ms. Testing the same frontal-central sites (Fz, FCz, FC1 and FC2; Fig. 5) as for the Simultaneous condition revealed a significant effect of condition (attended-visual vs. unattended-visual) from 300 to 800 ms (F(1,17) = 6.71, P < 0.05; Table 1). Importantly, no effects of multisensory attentional context were found prior to 300 ms, supporting the presence of a 100-ms temporal shift for the attention-spreading effect when the auditory stimulus was delayed by 100 ms. As with the Simultaneous condition, an additional set of more anterior electrodes (Fp1m, Fp2m, and Fpz) was also tested from 300 to 800 ms and, as above, a significant effect of multisensory attentional context was also observed over these channels (F(1,17) = 8.57, P < 0.01).
Tone-Delayed-by-300-ms Condition (Delay-300)
Using the same subtractive methods, the Delay-300 condition was examined to look at the effects of the spreading of attention to the centrally presented auditory tone delayed by this greater interval. As indicated in Fig. 4 and 5, the late negative wave observed in the Simultaneous and Delay-100 conditions was essentially eliminated for the Delay-300 condition. The analyses revealed no 50-ms time periods that had significant attentional-spread activity between 500 and 1000 ms (Table 1), nor earlier or later; however, an analysis of the more anterior electrodes did reveal a small significant effect of the spreading of attention if taken across the entire time range (500–1000 ms: F(1,17) = 6.00, P < 0.05; Fig. 4 and 5).
Although the analyses described above show clear differences between the extracted auditory responses in the Simultaneous, Delay-100, and Delay-300 conditions as a function of whether the accompanying lateral visual stimulus was attended versus unattended, we wanted to determine whether the SOA effects observed were present in both the attended-visual and unattended-visual conditions, but just larger in the attended, or whether they were present only in the attended. To assess this, we conducted two additional ANOVAs of the extracted auditory activity, separately for when the accompanying lateral visual stimulus was attended and for when it was unattended, with SOA as the main factor. The data that went into these ANOVAs were the mean amplitude values across the fronto-central ROI of sites Fz, FCz, FC1 and FC2, extracted from 200–700 ms for the Simultaneous condition, from 300–800 ms for the Delay-100 condition, and from 500–1000 ms for the Delay-300 condition (i.e., after subtracting off the corresponding visual-only responses). For the attended-visual-stimulus conditions, this analysis revealed a main effect of SOA, showing that the extracted auditory activity differed as a function of the relative delay of the onset of the auditory stimulus when the accompanying lateral visual stimulus was attended (F(2,34) = 3.28, P = 0.05). In contrast, the corresponding ANOVA across the same channels and time periods for the extracted auditory activity for the unattended-visual-stimulus conditions showed no effect of SOA (F<1), indicating that when the accompanying lateral visual stimulus was unattended, there were no significant differences in the response to the auditory stimulus as a function of its relative timing.
Distribution Comparisons
A close inspection of the topographic maps (Fig. 5) suggests that there was a change in the distribution for the spreading-of-attention effect in the earliest time range of activity in the Simultaneous versus the Delay-100 conditions, with the Simultaneous condition showing an initial period (latency 200–250 ms) of fronto-central activity, which then shifted to a more anterior position shortly later (i.e., shifting anteriorly at around 250 to 300 ms post-stimulus onset). In contrast, the effect in the Delay-100 condition appeared to lack the early fronto-central effect, having the more anterior distribution across its entire duration, suggesting the presence of an additional early source in the Simultaneous conditions that was not present in the Delay-100 condition.
To determine if this apparent distributional difference was statistically significant we analyzed data from the 20 most anterior electrodes in different time windows. Using these electrodes, we vector-scaled the data using the McCarthy and Wood approach (McCarthy and Wood, 1985) and then determined if any time-window by electrode interactions existed, which, if present, would indicate the presence of a significant shift in the distribution between those two time periods. Such a significant shift in distribution was indeed observed when comparing the initial onset of the negativity (200 to 250 ms) with a later portion of this long-lasting negative wave (300 to 350 ms; F(9,153) = 3.33, P = 0.001), with the activity being more anterior in the later time window. Moreover, the Delay-100 condition appeared to lack this initial, more central distribution. This was examined statistically by testing the initial period of the spreading-of-attention effect, again using the twenty most anterior electrodes, for the Delay-100 condition (300 to 350 ms) versus the initial phase of the Simultaneous condition (200 to 250 ms), which also revealed a significant difference in distribution (F(9,153) = 3.97, P = 0.0001). In addition, the distribution of the Delay-100 condition in its initial effect period from time 300 to 350 ms did not differ from the distribution of the Simultaneous condition in the same (300–350 ms) latency (F<1), both being the more anterior distribution. These results thus provide further converging evidence that there was an additional early source present in the Simultaneous attentional-spreading activity that was not present in the Delay-100 condition (see Fig. 5).
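The vector-scaling step can be sketched in Python as below, assuming the common vector-length form of the McCarthy and Wood (1985) normalization, in which each condition's amplitudes are divided by the root sum of squares across electrodes. The function name is hypothetical, and the sketch covers only the scaling, not the subsequent time-window by electrode ANOVA.

```python
import math


def vector_scale(topography):
    """Scale one condition's electrode amplitudes to unit vector length.

    `topography` holds one mean amplitude per electrode for a single
    condition and time window. After scaling, overall amplitude
    differences between conditions are removed, so any remaining
    condition-by-electrode interaction reflects a difference in the
    shape of the scalp distribution.
    """
    norm = math.sqrt(sum(v * v for v in topography))
    if norm == 0:
        raise ValueError("cannot scale an all-zero topography")
    return [v / norm for v in topography]
```

Two topographies that differ only by a multiplicative factor become identical after this scaling, which is why a surviving interaction is read as a genuine distributional shift rather than a change in strength.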
Spatial Shifts and Ventriloquism
Ventriloquism is defined as a shift in the perceived location of a sound toward a simultaneous visual stimulus occurring in a different location (Bertelson and Radeau, 1981). Here, the tones were always centrally presented and were task irrelevant, with the attended and unattended visual stimuli being lateralized. If the perceived location of the tones was shifted in position toward the simultaneous visual stimulus when the latter was attended (as opposed to unattended), then there should be a lateral shift of the representation of the auditory stimulus in the brain toward the side contralateral to the attended visual stimulus, as observed previously in an explicit localization task – that is, the spreading-of-attention activity should be shifted contralaterally. To determine if this occurred, the spreading-of-attention activity for each of the SOA conditions was analyzed separately for when the visual stimulus was on the left versus on the right (Fig. 6). In the top panel, the unattended Simultaneous condition for left visual stimuli was subtracted from the attended Simultaneous condition for left visual stimuli (all conditions having already subtracted the respective Visual-Only ERP responses), analyzed for the time period of 200 to 250 ms. This subtraction revealed that the initial attentional-spread neural activity in the Simultaneous conditions was indeed shifted toward the side contralateral to the attended visual stimulus. Correspondingly, the analogous analysis for auditory tones occurring with an attended right stimulus revealed a shift in the opposite direction (Fig. 6). As observed in the middle and bottom panels of Fig. 6, this same subtraction conducted for the corresponding initial time period in the Delay-100 and Delay-300 conditions (from 300 to 350 and 500 to 550 ms, respectively) showed no such lateralization for the processing of the extracted responses to the auditory stimuli.
To statistically assess this effect, we performed an ANOVA of the activity in the time period from 200 to 250 ms for the Simultaneous condition over the fronto-central sites C1a, C1p, C5a, C2a, C2p, and C6a, using the factors of stimulus location, hemisphere (electrode location), and attention. This analysis confirmed a significant interaction (F(1,17) = 4.58, P < 0.05) between these factors, due to the attentional-spreading effect being shifted to the side contralateral to the visual stimulus. There was also a significant lateralization interaction from 250 to 300 ms (F(1,17) = 7.36, P < 0.05) and 300 to 350 ms (F(1,17) = 417.09, P < 0.001); however, these later interactions were driven by shifts toward the side contralateral to the right attended visual stimulus, with no shifts toward the side contralateral to the left visual stimulus (P’s > 0.1 for attended left minus unattended left on right versus left channels). No such significant interactions were observed for the Delay-100 and Delay-300 conditions, analyzed in the corresponding initial time windows for the effects in those conditions (300 to 350 ms and 500 to 550 ms, respectively), nor in any other time windows for those conditions (F's<1).
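For a design in which every within-subject factor has two levels, the three-way interaction can be expressed as a single per-participant contrast, and its F(1, n−1) equals the squared one-sample t statistic on that contrast. The Python sketch below illustrates this under assumed names and data layout; it is not the ERPSS analysis actually used.

```python
import math
from statistics import mean, stdev


def lateralization_contrast(cell):
    """Contralateral-minus-ipsilateral attention-spread for one participant.

    `cell[(side, hemi, att)]` is a mean amplitude, where side is the
    visual-stimulus side ("L"/"R"), hemi the electrode hemisphere
    ("L"/"R"), and att is "att" or "unatt". This key layout is an
    assumption for illustration.
    """
    def spread(side, hemi):
        return cell[(side, hemi, "att")] - cell[(side, hemi, "unatt")]

    left_stim = spread("L", "R") - spread("L", "L")   # contra minus ipsi
    right_stim = spread("R", "L") - spread("R", "R")  # contra minus ipsi
    return (left_stim + right_stim) / 2


def interaction_f(contrasts):
    """F(1, n-1) for the interaction, computed as t squared on the contrasts."""
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / math.sqrt(n))
    return t * t
```

Because the effect of interest is a negative-polarity wave, a negative contrast value corresponds to activity shifted toward the hemisphere contralateral to the attended visual stimulus.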
Discussion
This study is the first to provide a clear dissociation between the multisensory linking of the temporal and spatial aspects of the auditory and visual components of a multisensory stimulus, reflected by the spreading-of-attention across a multisensory object. While it is apparent that the principles of sensory integration are fundamental to the successful linking of multisensory information (Stein and Stanford, 2008), to date the degree to which the temporal factors can interact with the spatial ones, and how these are modulated by attention, has not been much explored. Here, using an attentional manipulation and obtaining both neural and behavioral measures, we provide a new account of the spatial and temporal linking of auditory and visual stimuli, summarized in Table 2.
Table 2.
Stimulus | | Neural | | Perception |
---|---|---|---|---|
Simultaneous Auditory and Visual Stimuli | → | Spatial Linking (200–250 ms); Temporal Linking (200–700 ms); Spreading of Attention (200–700 ms) | → | Spatial Shift (Ventriloquism); Judged as Simultaneous |
Auditory Tone Delayed by 100 ms Compared to Visual | → | Temporal Linking (300–800 ms); Spreading of Attention (300–800 ms) | → | No Spatial Shift; Judged as Simultaneous |
Auditory Tone Delayed by 300 ms Compared to Visual | → | No Spatial Linking; No Temporal Linking; No Attentional Spreading | → | No Spatial Shift; Not Judged as Simultaneous |
First, as shown in the separate behavioral sessions, participants were likely to judge our stimuli as occurring simultaneously when they were either actually presented simultaneously or were delayed by 100 ms, but not when delayed by 300 ms, thus indicating the time window over which the stimuli are perceptually linked from a temporal standpoint. Second, in line with these behavioral findings, the neural (EEG) measures indicated that attention spread from the visual to the auditory modality when the stimuli were simultaneous or when the tone was delayed by 100 ms, but not when delayed by 300 ms. Therefore, it appears that for attention to spread successfully the stimuli need to be temporally linked within the classic time window of audiovisual perceptual integration (Meredith et al., 1987, Schneider and Bavelier, 2003, Zampini et al., 2005, van Wassenhove et al., 2007), or, conversely, for the stimuli to be temporally linked, attention needs to spread between them. While design limitations precluded the determination of any potential modulation in the unattended channel compared to baseline, significant differences between the extracted auditory activity for the different SOA conditions were only present for attended-visual-stimulus trials, and not for unattended-visual-stimulus ones, suggesting that only attended stimuli are differentially processed and linked under our temporal manipulation. Finally, only when the auditory stimuli were presented simultaneously with the lateral visual stimulus were they spatially linked, as indicated by a shift in the neural response to the central tone when it occurred with an attended versus an unattended lateralized visual stimulus (cf., Bonath et al., 2007). Delaying the tone by even 100 ms, while still leading to the temporal linking of the stimuli and a robust spreading of attention, did not lead to any neural reflection of spatial linking, as reflected by the lack of a contralateral shift of neural processing. 
Importantly, due to the design of the present paradigm, the spatial linking and neural shift effects that were observed in the simultaneous conditions occurred directly as a result of a cross-sensory attentional manipulation (i.e., deriving from the same auditory stimulus occurring with an attended versus an unattended lateral visual stimulus), underscoring the importance of attention in the ventriloquism effect and in the processes underlying the spatial and temporal linking of multisensory stimulus components.
Temporal linking of multisensory stimuli
There is considerable behavioral evidence suggesting that as the SOA between them increases, multisensory stimuli are more likely to be perceived as separate (Spence et al., 2001, Schneider and Bavelier, 2003, Zampini et al., 2005), as observed here in our simultaneity judgment task. Further, in multisensory speech identification, a temporal separation of more than ~150 ms generally yields little or no audio-visual benefit behaviorally compared to visual input alone (McGrath and Summerfield, 1985), whereas there is a clear benefit at SOAs of less than this separation. Neurally, semantically unrelated stimuli (e.g., a tone and a light) are linked together, as indexed by enhanced firing to the multisensory stimulus, but only when they occur within the temporal window of integration (Meredith et al., 1987).
Consistent with this previous work, our findings indicate that simple multisensory stimulus components will be perceptually linked when presented within the temporal window of integration. More importantly here, however, we demonstrate that attention will spread from one modality to another only when stimuli are presented within this temporal window of integration, thereby illustrating the correspondence between the temporal window of integration and the temporal window over which attention will spread across modalities. Indeed, the present data suggest the intriguing hypothesis that attention, with its striking ability to spread across sensory modalities, plays a key role in aiding (or may even be necessary for) the temporal linking of the component features of multisensory stimulus input into a perceptual whole. One might speculate further that it may be this sort of attentional spread that underlies, or at least contributes to, the perceptual linking of the various features of any multifeature object (e.g., Schoenfeld et al., 2003).
Spatial linking of multisensory stimuli
Another important new finding here is that of the tighter temporal constraints that appear to be required for the spatial linking of the different multisensory components. Only in the simultaneous condition, at the onset of the negative-polarity wave reflecting the attentional-spreading activity for the centrally presented tone, did we observe an additional ERP component showing a lateral shift to the side contralateral to the visual stimulus. The location and timing of this lateralized neural activity (centrally/fronto-centrally distributed, occurring at ~200–250 ms) are very similar to those found in an explicit auditory localization task by Bonath et al. (2007). In that study, on trials in which the percept of the spatial location of the auditory stimulus was shifted toward the visual, there was a corresponding lateralized shift in the distribution of the ERP activity contralateral to the location of the visual stimulus, with this activity being modeled as arising from auditory cortex (Bonath et al., 2007). The fMRI part of our previous study confirmed the presence of spreading-of-attention activity in auditory cortex (Busse et al., 2005), and similar regions in the planum temporale are involved in discriminating the spatial location of sound (Deouell et al., 2007).
In the present study, a particularly intriguing aspect of the ventriloquist-related finding was that the observed neural processing shift occurred directly as a result of an attentional manipulation, emerging as a difference for identical tones that occurred simultaneously with an attended versus an unattended visual stimulus. Moreover, the effect occurred only in the Simultaneous condition, was present only in the initial 50–100 ms phase of the activity, and was elicited for auditory stimuli that were completely task irrelevant. Although a delay of 100 ms still allowed attention to spread from the visual modality to the auditory modality, as reflected by the elicitation of the sustained negative-polarity ERP wave, and the stimuli were still judged as synchronous, this temporal offset appeared to be enough to abolish the lateral neural-processing shift associated with a ventriloquist effect. This neural result is consistent with previous behavioral studies reporting that increasing the audio-visual temporal separation reduces the perceived location shift of the auditory stimulus toward the visual, as compared to stimuli presented simultaneously or delayed by only 50 ms (Slutsky and Recanzone, 2001, Lewald and Guski, 2003). The present study provides the first electrophysiological evidence for the temporal limits of the neural processes that lead to the ventriloquist illusion.
Further, the present study emphasizes the important role of visual attention for this lateralization effect to occur, as it occurred directly as a result of the attentional manipulation. Although previous behavioral studies have suggested that the ventriloquist illusion is pre-attentive and not influenced by attention (e.g., Bertelson et al., 2000), the present findings argue strongly against such a conclusion, with explicit neural evidence showing a direct modulation of the spatial linking of auditory and visual stimuli as a function of attention. The present findings showing the key role of attention in multisensory integration are in line with other recent evidence suggesting that other multisensory illusions, such as the sound-induced extra-flash visual illusion (Shams et al., 2000), can be modulated by attention (Mishra et al., 2010).
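As a purely illustrative aid, the attended-versus-unattended comparison and the contralateral shift discussed in this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis pipeline used in the study: the function names, the synthetic waveforms, the 50 ms sampling step, and the sign convention (a more negative contralateral value taken to indicate a contralateral shift of the negative-polarity activity) are all hypothetical. Only the ~200–250 ms measurement window is taken from the text.

```python
from statistics import mean


def difference_wave(attended, unattended):
    """Attended-minus-unattended difference for the same tone, sample by sample."""
    if len(attended) != len(unattended):
        raise ValueError("waveforms must have the same number of samples")
    return [a - u for a, u in zip(attended, unattended)]


def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean of the samples whose time stamps fall within [start_ms, end_ms]."""
    window = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the requested window")
    return mean(window)


def lateralization_index(contra_diff, ipsi_diff, times_ms, start_ms=200, end_ms=250):
    """Contralateral-minus-ipsilateral mean amplitude of the difference waves.

    'Contralateral' and 'ipsilateral' are defined relative to the side of the
    visual stimulus. The default window follows the ~200-250 ms latency
    mentioned in the text; everything else is an illustrative assumption.
    """
    contra = mean_amplitude(contra_diff, times_ms, start_ms, end_ms)
    ipsi = mean_amplitude(ipsi_diff, times_ms, start_ms, end_ms)
    return contra - ipsi


# Synthetic (made-up) waveforms in microvolts, sampled every 50 ms from 0 to 400 ms.
times_ms = list(range(0, 401, 50))
attended_contra = [0.0, 0.0, 0.0, 0.0, -2.0, -2.0, -1.0, -1.0, 0.0]
attended_ipsi = [0.0, 0.0, 0.0, 0.0, -1.0, -1.0, -1.0, -1.0, 0.0]
unattended_contra = [0.0] * len(times_ms)
unattended_ipsi = [0.0] * len(times_ms)

contra_diff = difference_wave(attended_contra, unattended_contra)
ipsi_diff = difference_wave(attended_ipsi, unattended_ipsi)
lateral_shift = lateralization_index(contra_diff, ipsi_diff, times_ms)
print(lateral_shift)  # -1.0 for these synthetic values
```

In this toy example the difference wave is larger (more negative) over the hypothetical contralateral site in the 200–250 ms window and equal across sites afterwards, mimicking an early, transient lateral shift followed by a non-lateralized sustained negativity.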
To summarize, we show that visual attention can spread robustly across both modality and space to a task-irrelevant and spatially separated auditory tone when it occurs within the temporal window of integration, with this spread essentially being eliminated for SOAs outside that window. Further, when the auditory stimulus is delayed in time relative to the visual event, but is still within the temporal window of integration, the visual attention will spread (albeit slightly attenuated), and the onset of this spreading will be correspondingly delayed. Finally, only when auditory and visual stimuli occur in close temporal proximity is there a spatial linking of the unisensory components. In particular, only under these circumstances is there a shift of the auditory neural processing to the side contralateral to the visual stimulus, consistent with a ventriloquist-like perceptual shift of the centrally presented auditory stimulus toward the visual. Moreover, all of these cross-modal effects on the processing of task-irrelevant auditory stimuli occurred directly as a function of whether the synchronous or near-synchronous visual stimulus was or was not attended, underscoring the fundamental role of attention in these multisensory integration processes (Talsma et al., 2010). These results thus shed new light on the temporal and spatial constraints by which the various unisensory components of multisensory stimuli are linked together into a perceptual whole, and the way in which attention modulates these stimulus-linking processes.
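The pattern of results summarized above can be written out as a small lookup table. The following Python sketch simply restates, for attended-visual-stimulus trials, the outcomes reported for the three tested tone delays; the class and function names are hypothetical, and the lookup deliberately refuses to extrapolate to delays that were not tested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkingOutcome:
    judged_simultaneous: bool  # behavioral simultaneity judgment
    attention_spread: bool     # cross-modal spreading-of-attention activity
    spatial_shift: bool        # contralateral shift of the auditory response


# Outcomes as reported in the text for the three tested tone delays (in ms).
REPORTED_OUTCOMES = {
    0: LinkingOutcome(judged_simultaneous=True, attention_spread=True, spatial_shift=True),
    100: LinkingOutcome(judged_simultaneous=True, attention_spread=True, spatial_shift=False),
    300: LinkingOutcome(judged_simultaneous=False, attention_spread=False, spatial_shift=False),
}


def outcome_for(soa_ms: int) -> LinkingOutcome:
    """Return the reported outcome for a tested SOA; untested SOAs raise an error."""
    try:
        return REPORTED_OUTCOMES[soa_ms]
    except KeyError:
        raise ValueError(f"an SOA of {soa_ms} ms was not tested") from None


for soa_ms in sorted(REPORTED_OUTCOMES):
    print(soa_ms, outcome_for(soa_ms))
```

Laid out this way, the dissociation is easy to see: temporal linking and attentional spread hold at both 0 and 100 ms, whereas the spatial shift is present only at 0 ms.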
Acknowledgements
The authors thank Maria A. Pavlova for assistance with data collection and analysis. This work was supported by an NSF graduate research fellowship to S.E.D. and by an NINDS grant (R01-NS051048) to M.G.W.
Footnotes
Electrodes are labeled according to the International 10–20 system. For electrode locations that are close, but not identical, to the standard 10–20 system locations, the postscripts “i” and “a” are used to indicate a location slightly inferior or anterior to the standard location.
References
- Bertelson P, Radeau M. Cross-Modal bias and perceptual fusion with auditory-visual spatial discordance. Perception & Psychophysics. 1981;29:578–584. doi: 10.3758/bf03207374.
- Bertelson P, Vroomen J, de Gelder B, Driver J. The ventriloquist effect does not depend on the direction of deliberate visual attention. Perception & Psychophysics. 2000;62:321–332. doi: 10.3758/bf03205552.
- Bolognini N, Leo F, Passamonti C, Stein BE, Ladavas E. Multisensory-mediated auditory localization. Perception. 2007;36:1477–1485. doi: 10.1068/p5846.
- Bonath B, Noesselt T, Martinez A, Mishra J, Schwiecker K, Heinze HJ, Hillyard SA. Neural basis of the ventriloquist illusion. Current Biology. 2007;17:1697–1703. doi: 10.1016/j.cub.2007.08.050.
- Brainard DH. The psychophysics toolbox. Spatial Vision. 1997;10:433–436.
- Busse L, Roberts KC, Crist RE, Weissman DH, Woldorff MG. The spread of attention across modalities and space in a multisensory object. Proc Natl Acad Sci U S A. 2005;102:18751–18756. doi: 10.1073/pnas.0507704102.
- Deouell LY, Heller AS, Malach R, D'Esposito M, Knight RT. Cerebral responses to change in spatial location of unattended sounds. Neuron. 2007;55:985–996. doi: 10.1016/j.neuron.2007.08.019.
- Egly R, Driver J, Rafal RD. Shifting visual-attention between objects and locations-evidence from normal and parietal lesion subjects. Journal of Experimental Psychology-General. 1994;123:161–177. doi: 10.1037//0096-3445.123.2.161.
- Fiebelkorn IC, Foxe JJ, Molholm S. Dual Mechanisms for the Cross-Sensory Spread of Attention: How Much Do Learned Associations Matter? Cereb Cortex. 2010;20(1):109–120. doi: 10.1093/cercor/bhp083.
- Gondan M, Niederhaus B, Rosler F, Roder B. Multisensory processing in the redundant-target effect: A behavioral and event-related potential study. Perception & Psychophysics. 2005;67:713–726. doi: 10.3758/bf03193527.
- Hairston WD, Wallace MT, Vaughan JW, Stein BE, Norris JL, Schirillo JA. Visual localization ability influences cross-modal bias. Journal of Cognitive Neuroscience. 2003;15:20–29. doi: 10.1162/089892903321107792.
- Hillyard SA, Anllo-Vento L. Event-related brain potentials in the study of visual selective attention. Proc Natl Acad Sci U S A. 1998;95:781–787. doi: 10.1073/pnas.95.3.781.
- Keetels M, Vroomen J. The role of spatial disparity and hemifields in audio-visual temporal order judgments. Experimental Brain Research. 2005;167:635–640. doi: 10.1007/s00221-005-0067-1.
- Lewald J, Guski R. Cross-modal perceptual integration of spatially and temporally disparate auditory and visual stimuli. Cognitive Brain Research. 2003;16:468–478. doi: 10.1016/s0926-6410(03)00074-0.
- Martinez A, Ramanathan DS, Foxe JJ, Javitt DC, Hillyard SA. The role of spatial attention in the selection of real and illusory objects. Journal of Neuroscience. 2007;27:7963–7973. doi: 10.1523/JNEUROSCI.0031-07.2007.
- McCarthy G, Wood CC. Scalp distributions of event-related Potentials - an ambiguity associated with analysis of variance models. Electroencephalography and Clinical Neurophysiology. 1985;62:203–208. doi: 10.1016/0168-5597(85)90015-2.
- McGrath M, Summerfield Q. Intermodal timing relations and audiovisual speech recognition by normal-hearing adults. Journal of the Acoustical Society of America. 1985;77:678–685. doi: 10.1121/1.392336.
- Meredith MA, Nemitz JW, Stein BE. Determinants of multisensory integration in superior colliculus neurons. 1. Temporal factors. Journal of Neuroscience. 1987;7:3215–3229. doi: 10.1523/JNEUROSCI.07-10-03215.1987.
- Meredith MA, Stein BE. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology. 1986;56:640–662. doi: 10.1152/jn.1986.56.3.640.
- Mishra J, Martinez A, Hillyard SA. Effect of attention on early cortical processes associated with the sound-induced extra flash illusion. Journal of Cognitive Neuroscience. 2010;22:1714–1729. doi: 10.1162/jocn.2009.21295.
- Molholm S, Martinez A, Shpaner M, Foxe JJ. Object-based attention is multisensory: co-activation of an object's representations in ignored sensory modalities. European Journal of Neuroscience. 2007;26:499–509. doi: 10.1111/j.1460-9568.2007.05668.x.
- Munhall KG, Gribble P, Sacco L, Ward M. Temporal constraints on the McGurk effect. Perception & Psychophysics. 1996;58:351–362. doi: 10.3758/bf03206811.
- Pelli DG. The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision. 1997;10:437–442.
- Schneider KA, Bavelier D. Components of visual prior entry. Cognitive Psychology. 2003;47:333–366. doi: 10.1016/s0010-0285(03)00035-5.
- Schoenfeld MA, Tempelmann C, Martinez A, Hopf JM, Sattler C, Heinze HJ, Hillyard SA. Dynamics of feature binding during object-selective attention. Proc Natl Acad Sci U S A. 2003;100:11806–11811. doi: 10.1073/pnas.1932820100.
- Shams L, Kamitani Y, Shimojo S. Illusions. What you see is what you hear. Nature. 2000;408:788. doi: 10.1038/35048669.
- Slutsky DA, Recanzone GH. Temporal and spatial dependency of the ventriloquism effect. Neuroreport. 2001;12:7–10. doi: 10.1097/00001756-200101220-00009.
- Spence C, Baddeley R, Zampini M, James R, Shore DI. Multisensory temporal order judgments: When two locations are better than one. Perception & Psychophysics. 2003;65:318–328. doi: 10.3758/bf03194803.
- Spence C, Shore DI, Klein RM. Multisensory prior entry. Journal of Experimental Psychology-General. 2001;130:799–832. doi: 10.1037//0096-3445.130.4.799.
- Stein BE, Meredith MA. The Merging of the Senses. Cambridge: MIT Press; 1993.
- Stein BE, Stanford TR. Multisensory integration: current issues from the perspective of the single neuron. Nature Reviews Neuroscience. 2008;9:255–266. doi: 10.1038/nrn2331.
- Stone JV, Hunkin NM, Porrill J, Wood R, Keeler V, Beanland M, Port M, Porter NR. When is now? Perception of simultaneity. Proceedings of the Royal Society of London Series B-Biological Sciences. 2001;268:31–38. doi: 10.1098/rspb.2000.1326.
- Talsma D, Senkowski D, Soto-Faraco S, Woldorff MG. The multifaceted interplay between attention and multisensory integration. Trends in Cognitive Sciences. 2010;14:400–410. doi: 10.1016/j.tics.2010.06.008.
- Teder-Salejarvi WA, Di Russo F, McDonald JJ, Hillyard SA. Effects of spatial congruity on audio-visual multimodal integration. Journal of Cognitive Neuroscience. 2005;17:1396–1409. doi: 10.1162/0898929054985383.
- van Wassenhove V, Grant KW, Poeppel D. Temporal window of integration in auditory-visual speech perception. Neuropsychologia. 2007;45:598–607. doi: 10.1016/j.neuropsychologia.2006.01.001.
- Wallace MT, Wilkinson LK, Stein BE. Representation and integration of multiple sensory inputs in primate superior colliculus. Journal of Neurophysiology. 1996;76:1246–1266. doi: 10.1152/jn.1996.76.2.1246.
- Zampini M, Guest S, Shore DI. Audio-visual simultaneity judgments. Perception & Psychophysics. 2005;67:531–544. doi: 10.3758/BF03193329.