Abstract
A single flash accompanied by two auditory beeps tends to be perceived as two flashes (Shams et al. Nature 408:788, 2000, Cogn Brain Res 14:147–152, 2002). This phenomenon is known as the ‘sound-induced flash illusion.’ Previous neuroimaging studies have shown that this illusion is correlated with modulation of activity in early visual cortical areas (Arden et al. Vision Res 43(23):2469–2478, 2003; Bhattacharya et al. NeuroReport 13:1727–1730, 2002; Shams et al. NeuroReport 12(17):3849–3852, 2001, Neurosci Lett 378(2):76–81, 2005; Watkins et al. Neuroimage 31:1247–1256, 2006, Neuroimage 37:572–578, 2007; Mishra et al. J Neurosci 27(15):4120–4131, 2007). We examined how robust the illusion is by testing whether the frequency of the illusion can be reduced by providing feedback. We found that the sound-induced flash illusion was resistant to feedback training, except when the amount of monetary reward was made dependent on accuracy in performance. However, even in the latter case the participants reported that they still perceived illusory double flashes even though they correctly reported a single flash. Moreover, the feedback training effect seemed to disappear once the participants were no longer provided with feedback, suggesting a short-lived refinement of discrimination between illusory and physical double flashes rather than vanishing of the illusory percept. These findings indicate that the effect of sound on the perceptual representation of visual stimuli is strong and robust to feedback training, and provide further evidence against decision factors accounting for the sound-induced flash illusion.
Keywords: Crossmodal integration, Visual perception, Reward, Learning
Introduction
The sound-induced flash illusion (SIFI) is a demonstration of influence of sound on visual perception, wherein a single flash in the periphery, accompanied by multiple auditory beeps, induces a percept of multiple flashes (Shams et al. 2000, 2002). Recent fMRI, MEG, and ERP studies have shown that the percept of the sound-induced flash illusion is correlated with modulation of activity in early visual cortical areas (Arden et al. 2003; Bhattacharya et al. 2002; Shams et al. 2001, 2005; Watkins et al. 2006, 2007; Mishra et al. 2007). Here we investigated the robustness of the illusion by examining the effect of feedback training on the frequency of the illusion. An effect of a single-session feedback-training on the magnitude of the illusion would support the possibility that the illusion involves modification of visual processing at high, decision-related levels. Alternatively, resistance of the illusion to feedback training would suggest that the main modification underlying the illusion is in perceptual processes, which are unlikely to be immediately modified by feedback training. We compared the frequency of the illusion when participants were given immediate feedback about their performance (accuracy with regard to the physical stimulus) to that when participants were not given any feedback. In both cases participants were instructed to report whether a single flash or multiple flashes were presented.
In Experiment 1, we examined the effect of feedback training in an experimental design consistent with previous studies of SIFI, i.e., with trials of different conditions presented in random order (e.g., Shams et al. 2000, 2002). In Experiment 2, we presented the different sound conditions in separate blocks, since this is thought to improve attention efficiency and perceptual learning (e.g., Gorea and Sagi 2000; Yu et al. 2004). In Experiment 3, we examined the effect of performance-dependent monetary reward on the frequency of the illusion.
Experiment 1
Using standard SIFI inducing conditions, we compared the frequency of the SIFI when participants were given no feedback to that when participants were given feedback. We used a between-subject design for the feedback factor in order to avoid carry-over effects.
Participants
Twenty naïve volunteers participated in the experiment (10 females). All participants had normal or corrected-to-normal vision and normal hearing. Their ages ranged from 22 to 33 years. Participants gave their informed consent before inclusion in the study. Participants were randomly assigned to one of two groups, the feedback group and the no-feedback group (see below). Five subjects were later excluded from further analysis due to low performance in the visual-only condition (see below), resulting in seven subjects in the feedback group and eight subjects in the no-feedback group. Participants were paid $10 per hour.
Stimuli
Stimuli were similar to those used previously (Shams et al. 2000, 2002). Visual stimuli were presented on a computer screen. In each trial a uniform bright white disk subtending 1.4° of visual field at 7° eccentricity below fixation was flashed, on a black background, either once or twice for a duration of one frame (~13 ms) per flash. When presented twice, the SOA was 53 ms. The flashes were accompanied by 0–2 beeps, presented simultaneously from two speakers located on the two sides of the screen. The height of the speakers was set so as to induce the percept that the beeps came from the location of the disk. The beeps were pure tones (3.5 kHz) presented for a duration of 7 ms at ~75 dB sound pressure level. The first beep always preceded the first flash by 23 ms. Consecutive beeps were spaced 57 ms apart. The 2-beep-1-flash condition was previously found to elicit the perception of illusory double flashes. Our main interest was in the results from the 2-beep-1-flash condition. The no-sound conditions were used as baseline conditions for participants’ performance and feedback training effects. The 1-beep conditions were included in order to maintain the standard conditions of SIFI experiments.
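The timing values above can be laid out on a single time axis. The following is a minimal sketch, not the authors' presentation code: the function name `event_onsets_ms` is hypothetical, and it assumes that the 23 ms lead and the 57 ms and 53 ms spacings are all measured onset to onset, with the first flash at t = 0.

```python
def event_onsets_ms(n_flashes, n_beeps, flash_soa=53, beep_soa=57, beep_lead=23):
    """Return (flash_onsets, beep_onsets) in ms, relative to the first flash.

    Assumption: all intervals are onset-to-onset and the first flash is t = 0.
    """
    flashes = [i * flash_soa for i in range(n_flashes)]
    # The first beep leads the first flash by beep_lead ms; later beeps follow
    # at beep_soa intervals.
    beeps = [-beep_lead + i * beep_soa for i in range(n_beeps)]
    return flashes, beeps


# Illusion condition: one flash with two beeps.
flashes, beeps = event_onsets_ms(n_flashes=1, n_beeps=2)
print("flash onsets:", flashes)
print("beep onsets:", beeps)
```

Under these assumptions the second beep starts 34 ms after the single flash. In the 2-flash-2-beep condition it would lead the second flash by 19 ms (53 − 34).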
Procedure
Participants sat at a viewing distance of 57 cm from the computer screen and speakers. A fixation point was presented at the center of the screen throughout each trial. The participant’s task was to decide whether a single flash (i.e., one pulse) or multiple flashes (multiple pulses) had been displayed and to rate their confidence as high or low on each trial. Participants, therefore, chose one of four keys corresponding to the number of flashes and confidence level (i.e., 2 and high, 2 and low, 1 and low, 1 and high). Participants were given unlimited time to make a response. In the feedback group, feedback was provided by presenting the word “right” (green font) or “wrong” (red font) above fixation for 500 ms.
Prior to the start of the experiment, each participant was familiarized with the task in the no-sound conditions. Practice terminated when participant reached a criterion level of 90% correct, which generally only took a few minutes of training. In the few cases that participants failed to reach the criterion, practice was repeated, using a slightly longer flash SOA of 66 ms. One participant reached criterion with this longer SOA. One candidate participant who failed to reach criterion in both cases was not included in the experiment. After practice, participants were informed that sound stimuli would now be included. They were told that this sound would be ‘distracting’ and were asked to ignore the sound and to do their best to perform accurately. The experiment consisted of 60 trials of each condition, amounting to a total of 360 trials, ordered pseudo-randomly. In most cases the session took approximately an hour. Five participants who showed low performance level (≤65% correct ‘multiple flash’ responses; probably due to lack of motivation) in the no-beep condition, despite passing the practice criterion, were excluded from further analysis.
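The trial count follows from crossing two flash counts with three beep counts (six conditions) and repeating each 60 times. The sketch below shows one way such a list could be built. It assumes a plain seeded shuffle, because the text does not specify what constraints the pseudo-random ordering used. The function name `make_trial_list` is hypothetical.

```python
import random
from itertools import product


def make_trial_list(n_per_condition=60, seed=None):
    """Build a shuffled list of (n_flashes, n_beeps) trials.

    Assumption: an unconstrained shuffle stands in for the unspecified
    pseudo-randomization used in the experiment.
    """
    conditions = list(product((1, 2), (0, 1, 2)))  # 2 flash counts x 3 beep counts
    trials = [cond for cond in conditions for _ in range(n_per_condition)]
    random.Random(seed).shuffle(trials)
    return trials


trials = make_trial_list(seed=1)
print(len(trials))  # 6 conditions x 60 repetitions
```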
Data Analysis
Because there was inconsistency across participants in their use of the rating scale and many participants seemed to have chosen to use only one confidence level across trials and conditions, we lumped the data from high and low confidence ratings together. We used signal detection theory to differentiate between changes in participants’ general response bias (β) and changes in their perceptual sensitivity, d′ (the ability to perceptually discriminate single and double flashes (Macmillan and Creelman 1991)). Sensitivity and response bias were calculated as follows: d′ = z(H) − z(F) and β = 0.5*(z(H) + z(F)), where z(p) denotes the inverse of the cumulative Normal distribution corresponding to response rate p, and H and F denote hit (correct detection of multiple flashes) and ‘false-alarm’ (incorrect report of multiple flashes) response rates. Response rates of P = 0 and P = 1 were approximated by 1/N and 1 − (1/N), respectively, where N is the number of trials tested. As d′ reflects how well 1 and 2 flashes are discriminated, SIFI is expected to be expressed by a significantly lower d′ level in 2-beep trials compared to 0-beep trials (Watkins et al. 2006, 2007; Wozny et al. 2008). Therefore the magnitude of the illusion should correlate with the difference between d′ values in the 2-beep and no-sound (baseline) conditions: Δd′0,2 = d′(no-sound) − d′(2-beeps). Also, the addition of two beeps may increase uncertainty, which could lead to greater absolute response bias |β| in the 2-beep condition. To evaluate this effect we also looked at the difference in response biases between the 2-beep and no-sound conditions: Δ|β|0,2 = |β|(no-sound) − |β|(2-beeps).
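The measures defined above can be computed directly from response counts. The following is a minimal sketch using only the Python standard library. The function names and the example counts are hypothetical and are not data from the study. It assumes 2-flash trials are treated as signal trials and 1-flash trials as noise trials within each sound condition.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def clamp_rate(count, n_trials):
    """Response rate, with 0 and 1 replaced by 1/N and 1 - 1/N as in the text."""
    p = count / n_trials
    return min(max(p, 1.0 / n_trials), 1.0 - 1.0 / n_trials)


def sdt_measures(hits, n_signal, false_alarms, n_noise):
    """Return (d_prime, beta) with d' = z(H) - z(F) and beta = 0.5*(z(H) + z(F))."""
    z = NormalDist().inv_cdf  # inverse cumulative standard Normal
    z_hit = z(clamp_rate(hits, n_signal))
    z_fa = z(clamp_rate(false_alarms, n_noise))
    return z_hit - z_fa, 0.5 * (z_hit + z_fa)


# Hypothetical counts for one participant, 60 trials per cell (not study data).
d_silent, b_silent = sdt_measures(hits=54, n_signal=60, false_alarms=6, n_noise=60)
d_beeps, b_beeps = sdt_measures(hits=57, n_signal=60, false_alarms=36, n_noise=60)

delta_d = d_silent - d_beeps                     # magnitude of the illusion
delta_abs_beta = abs(b_silent) - abs(b_beeps)    # change in absolute bias
print(round(delta_d, 2), round(delta_abs_beta, 2))
```

With these made-up counts, a high false-alarm rate in the 2-beep condition lowers d′ and raises |β|, so the first difference is positive and the second is negative, matching the sign pattern reported in the Results.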
Most hypothesis tests were planned comparisons. Post-hoc tests are explicitly specified. Unless otherwise noted, one-tailed t-tests were used, as feedback is expected to improve performance (Watkins et al. 2006, 2007; Wozny et al. 2008). Paired t-tests were used for within-subject comparisons.
Results
Figure 1 summarizes the results of Experiment 1. As in previous studies, participants by and large report the correct number of flashes in silence. However, when a single flash is accompanied by two beeps, there is a large increase in the probability of reporting multiple flashes, reflecting the SIFI (Fig. 1a). More importantly, there is no significant effect of feedback on how performance (proportion of reported “multiple”) differs between the no-sound and 2-beeps conditions (performance(no-sound) − performance(2-beeps)) in either 1-flash (P = 0.33) or 2-flash (P = 0.24) trials.
However, performance accuracy data combines stimulus sensitivity and decision bias factors, which could potentially mask different feedback effects on the different factors. We, therefore, examined the effect of feedback on each factor separately. Note that sensitivity (d′) served here as the factor-of-interest as it is linked to perception-related determinants of performance. Comparison of d′ values between no sound and 2-beep conditions indicated that the addition of two beeps significantly reduced participants’ ability to perceptually discriminate between single and double flashes (P < 0.01) in both feedback and no-feedback groups, as shown in Fig. 1b. Moreover, comparison of the absolute response bias between no sound and 2-beep conditions showed that two beeps biased participants towards reporting multiple flashes (P < 0.01; see Fig. 1c).
Next, we compared the (within-subject) difference in d′ between no-sound and 2-beep conditions, Δd′0,2 (reflecting the magnitude of the illusion), between feedback and no-feedback groups. Remarkably, we found no significant effect of feedback (Δd′0,2(no-feedback) = 0.92 ± 0.5, Δd′0,2(feedback) = 1.3 ± 0.8; P = 0.13). Likewise, no significant effect of feedback was found on the difference in response bias (Δ|β|0,2) between sound conditions (Δ|β|0,2(no-feedback) = −1.01 ± 0.74; Δ|β|0,2(feedback) = −0.98 ± 0.63; P = 0.46). It is possible that the effect of feedback is not detectable in sensitivity and bias measures, but may be detected in a change in confidence of responses (e.g., by making participants less confident in illusion trials when feedback is provided). We, therefore, also evaluated the effect of feedback on the difference in the rate of high-confidence reports (hcr) between 2-beep and no-sound conditions (hcr(no-sound) − hcr(2-beeps)). The feedback and no-feedback groups show comparable differences in high-confidence report rates in both the 1-flash (P = 0.18) and 2-flash (P = 0.25) trials (2-tailed t-test). Note also that none of the individual conditions differed significantly between the two training conditions (P > 0.05).
Experiment 2
The absence of any significant effect of feedback on the illusion in Experiment 1 is striking, and suggests that the effect of sound on visual flash perception is strong. However, there are several possible reasons why Experiment 1 may not have been optimal for testing feedback-training effects on the sound-induced flash illusion. First, large across-subject variance in the between-groups comparison may have masked small training effects. Second, interleaving the various sound conditions may have interfered with making the best use of feedback. In Experiment 1, sound can be regarded as a distractor. Interleaving distractor conditions has been suggested to prevent learning (e.g., Yu et al. 2004), introduce more uncertainty about the distractor (e.g., Coles et al. 1985; Gold and Shadlen 2002; Herzog and Fahle 1997), reduce attention efficiency (Gorea and Sagi 2000), lead to slower processing of input evaluation (Coles et al. 1985), and compromise stimulus-response criteria (e.g., Gorea and Sagi 2000; Coles et al. 1985).
In Experiment 2, we addressed these issues by making feedback a within-subjects factor and presenting different sound conditions in separate blocks. We examined whether these modifications would lead to revelation of any training effects.
Participants
Eight naïve participants (5 females), aged 23 to 40 years, with normal or corrected-to-normal vision and normal hearing took part in the experiment. None of them had participated in Experiment 1. Participants gave their informed consent before inclusion in the study. Two subjects were later excluded from the sample due to poor performance in the visual-only conditions.
Stimuli and Procedure
Stimulus conditions were identical to those used in experiment 1, with the exception that the different sound conditions (0, 1, and 2 beeps) were presented in separate blocks. For each participant, the experiment consisted of two phases, one included feedback and the other did not. The order of no-feedback/feedback phases was counterbalanced across the participants. Trials within each phase were then blocked according to the sound condition, and the order of block presentation was counterbalanced across participants. Each block consisted of 30 trials, amounting to a total of 180 trials in the experiment.
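The blocked design can be sketched as a per-participant schedule: two phases, each containing three sound blocks of 30 trials, for 180 trials in total. The code below is illustrative only and rests on several assumptions the text does not state: each block holds equal numbers of 1-flash and 2-flash trials, the same block order is used in both phases, and counterbalancing is done by alternating phase order and rotating through the six possible block orders by participant index. The function name `make_schedule` is hypothetical.

```python
import random
from itertools import permutations

BEEP_ORDERS = list(permutations((0, 1, 2)))  # the 6 possible block orders


def make_schedule(participant_index, trials_per_block=30, seed=0):
    """Return a list of (phase, n_beeps, n_flashes) trials for one participant.

    Assumptions (not from the text): equal 1-flash/2-flash split per block,
    same block order in both phases, index-based counterbalancing.
    """
    rng = random.Random(seed + participant_index)
    phases = ("no-feedback", "feedback")
    if participant_index % 2:  # alternate which phase comes first
        phases = phases[::-1]
    beep_order = BEEP_ORDERS[(participant_index // 2) % len(BEEP_ORDERS)]

    schedule = []
    for phase in phases:
        for n_beeps in beep_order:
            flashes = [1, 2] * (trials_per_block // 2)
            rng.shuffle(flashes)  # randomize flash count within the block
            schedule.extend((phase, n_beeps, f) for f in flashes)
    return schedule


schedule = make_schedule(participant_index=0)
print(len(schedule))  # 2 phases x 3 blocks x 30 trials
```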
As in Experiment 1, participants practiced with the no-sound conditions prior to the experiment. For two participants, an SOA of 66 ms was used. All participants succeeded in reaching the performance criterion.
Results
Data were analyzed as in Experiment 1, with the exception that the feedback vs no-feedback conditions were compared using paired t-tests. As seen in Fig. 2, results are similar to those of Experiment 1. Double-beep sounds significantly reduced perceptual sensitivity, d′, reflecting SIFI, and increased response bias, |β| (Figs. 2b and c, respectively; P < 0.05 in both cases). There was also a trend to show a smaller response bias in the blocked 2-beep trials compared to those of Experiment 1 (interleaved trials), though a post-hoc comparison (Experiment 1 vs Experiment 2) failed to reach significance (2-tailed t-tests, P = 0.16, and P = 0.06 for no-feedback and feedback conditions, respectively).
Similar to Experiment 1, feedback did not significantly reduce the magnitude of the illusion, Δd′0,2 (Δd′0,2(no-feedback) = 0.87 ± 0.97, Δd′0,2(feedback) = 0.78 ± 0.81; P = 0.43), leaving the effect of sound (d′(no sound) vs d′(2-beeps)) very large (Cohen’s d effect size of 1.24 standard deviations). Also, it is worth mentioning that the actual Δd′0,2 values in the no-feedback and feedback conditions of Experiment 2 were comparable to the corresponding Δd′0,2 values of Experiment 1 (no-feedback: P = 0.90; feedback: P = 0.26, 2-tailed t-test). The replication of the null feedback effect (despite the easier conditions in Experiment 2), together with the persistence of a very large illusion effect under feedback training, argues against the possibility that this null result reflects insufficient statistical power, although such a possibility cannot be ruled out. Likewise, the difference between no-sound and 2-beeps conditions in high-confidence report rates was comparable in the feedback and no-feedback phases in both the 1-flash (P = 0.60) and 2-flash (P = 0.83) trials (2-tailed paired t-test). Also, none of the individual conditions differed significantly between the two training conditions (P > 0.05).
Unlike Experiment 1, there was a significant effect of feedback on the criterion bias difference (Δ|β|0,2(no-feedback) = −0.60 ± 0.68, Δ|β|0,2(feedback) = −0.18 ± 0.56; P < 0.05). This effect stemmed mainly from the reduction in the general bias to respond “multiple flashes” in the 2-beep condition when feedback was provided.
Experiment 3
The purpose of Experiment 3 was to examine whether increasing motivation to perform accurately would render feedback training effective in reducing the magnitude of sound-induced flash illusion. This is yet another way of examining robustness of the illusion to training. To increase motivation, the amount of the monetary reward given to participants was set to depend on their accumulated performance accuracy.
Participants
Six naïve participants (3 females; 20–33 years old) with normal or corrected-to-normal vision and normal hearing took part in the experiment. None of them had participated in Experiment 1 or 2. Participants gave their informed consent before inclusion in the study.
Stimuli and Procedure
Stimulus conditions were identical to those used in experiment 2 except that in this experiment the different sound conditions were pseudo-randomized, as in Experiment 1 (in order to evaluate effects of enhanced motivation in the standard conditions in which SIFI has been studied). The procedure was identical to that of Experiment 2, except for the following. Participants were told before the beginning of the first phase, and were reminded before the second phase that they would be paid according to their performance, between $7 and $20 for each phase. As in Experiment 2, the order of no-feedback and feedback phases was counter-balanced across participants.
As in the previous experiments, participants received short practice prior to the experiment. For one participant, a flash SOA of 66 ms was used because he failed to detect two flashes with the 53 ms SOA. All of the participants succeeded in reaching criterion with either the 53 or the 66 ms SOA.
Results
Similar to Experiments 1 and 2, results from the no-feedback phase reveal a significant illusion, i.e., a significantly lower d′ (P < 0.005; Fig. 3b) in the 2-beep (d′ = 0.78 ± 0.64) compared to no-sound condition (d′ = 2.16 ± 0.38), as well as a significantly larger criterion bias in the 2-beep condition (P < 0.05; Fig. 3c; |β|(no-sound) = 0.27 ± 0.21, |β|(2-beeps) = 0.95 ± 0.51). For the no-feedback phase, the magnitude of the illusion (Δd′0,2 = 1.38 ± 0.77) and difference in criterion bias (Δ|β|0,2 = −0.68 ± 0.52) did not differ significantly from those of Experiment 1 (2-tailed t-tests, P = 0.19 and 0.37, respectively).
However, in contrast to Experiments 1 and 2, providing performance-dependent reward led to a statistically significant effect of feedback on the magnitude of the illusion, Δd′0,2 (P < 0.05; Cohen’s d effect size of 0.78). Compared to the no-feedback condition, observers exhibited a smaller degree of illusion in the feedback condition (Δd′0,2 = 0.68 ± 1.16), so that the difference in d′ between the no-sound (2.2 ± 0.71) and 2-beep conditions (1.52 ± 0.99) was no longer statistically significant (P = 0.105). The presence or absence of feedback did not affect the difference in criterion biases (Δ|β|0,2(feedback) = −0.39 ± 0.50; P = 0.13).
Interestingly, it did not seem to matter whether participants were presented with the no-feedback block before or after the feedback block (Pearson correlation coefficient = −0.29, P = 0.6). This suggests that the effect of feedback is short-lived. Importantly, following the experiment, the participants typically reported that they noticed a subtle phenomenological difference between the percepts induced by the actual and the illusory flashes and learned to discriminate between them (owing to the feedback), though they continued to perceive the illusory flash.
The feedback and no-feedback phases showed comparable differences between no-sound and 2-beep conditions in high-confidence report rates in both 1-flash and 2-flash trials (P = 0.36 and 0.23, respectively; 2-tailed, paired t-test). None of the sound conditions differed significantly between the two training phases (P > 0.05).
Discussion
In this study we examined whether it is possible to reduce the magnitude of the illusion by providing feedback. Experiments 1 and 2 revealed that the illusion is resistant to feedback, even under relatively easy task conditions (Experiment 2), demonstrating that this modulation of vision by sound is robust to decision-related influences. In Experiment 3, where the monetary reward was made dependent on accuracy in performance, we found that the rate of illusion-based report of the illusory double-flash was reduced by feedback. However, (a) the improvement in accuracy seems not to be a result of perceptual learning, because the report of SIFI increased again immediately when they were no longer provided with feedback and was similar to the degree of the reported illusion without prior feedback training, (b) the debriefings obtained from participants after the experiment indicated that they did continue to perceive double-flashes in the 2-beep (illusion) condition but with the help of feedback they were pushed to notice and use a difference between the illusory and physical double flashes, (c) monetary reward affected the degree of reported illusion only when feedback was provided; the magnitude of reported illusion in the no-feedback condition of Experiment 3 did not significantly differ from that of Experiment 1.
Altogether, (a), (b), and (c) suggest that the ability of the participants to learn to report one flash when the flash is accompanied by two beeps does not necessarily reflect a weakening of the illusory percept. Instead it appears that there is a subtle difference between the illusory double flash percept and the percept elicited by the specific physical double flashes used in this experiment, which can serve as the basis for discrimination between the two. However, because the distinction between the two is rather subtle, it requires much effort and is thus only possible when the observers are highly motivated and receive feedback in every trial.
This study indicates that the sound-induced flash illusion is resistant to correct feedback, which informs observers that they are wrong in illusion trials. Extrapolating from the current results, we believe that the illusion would also show resistance in the case of false feedback training, in which observers are incorrectly informed about the ‘correct’ response in illusion trials. Although it is not unlikely that such false feedback could increase the proportion of “multiple flashes” reports in 1-flash-2-beep trials via a feedback effect on the response criterion, it is unlikely that the incorrect feedback would significantly affect the change in sensitivity induced by sound (Δd′0,2). It may also be worth mentioning that, in a previous study, we did not find an effect of two beeps on response bias (Watkins et al. 2006), whereas here we consistently found a higher response bias in the 2-beeps trials. Response biases are affected by a variety of factors, such as instruction to the subject, practice, familiarity with the task, reward, etc. These factors differed in our two studies, and any of them could potentially contribute to the difference in response bias results.
The neuroimaging studies of the illusion (Arden et al. 2003; Bhattacharya et al. 2002; Shams et al. 2001, 2005; Watkins et al. 2006, 2007; Mishra et al. 2007) have shown that the percept of the illusion is correlated with modulation of activity in early visual cortex. The present findings are consistent with low level of perceptual processing as the neural underpinning of the illusion, and provide further evidence against decision factors as the explanation for the flash illusion.
Acknowledgments
We thank Ione Fine for her thorough and insightful comments. This study was funded by a grant from the Human Frontier Science Program.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Footnotes
This article is published as part of the Special Issue on Multisensory Integration.
Contributor Information
Orna Rosenthal, Email: o.rosenthal@bham.ac.uk.
Ladan Shams, Email: ladan@psych.ucla.edu.
References
- Arden GB, Wolf JE, Messiter C (2003) Electrical activity in visual cortex associated with combined auditory and visual stimulation in temporal sequences known to be associated with a visual illusion. Vision Res 43(23):2469–2478
- Bhattacharya J, Shams L, Shimojo S (2002) Sound-induced illusory flash perception: role of gamma band responses. NeuroReport 13:1727–1730
- Coles MG, Gratton G, Bashore TR, Eriksen CW, Donchin E (1985) A psychophysiological investigation of the continuous flow model of human information processing. J Exp Psychol Hum Percept Perform 11:529–553
- Gold JI, Shadlen MN (2002) Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward. Neuron 36:299–308
- Gorea A, Sagi D (2000) Failure to handle more than one internal representation in visual detection tasks. Proc Natl Acad Sci 97:12380–12384
- Herzog MH, Fahle M (1997) The role of feedback in learning a vernier discrimination task. Vision Res 37:2133–2141
- Macmillan NA, Creelman CD (1991) Detection theory: a user’s guide. Cambridge University Press, Cambridge
- Mishra J, Martinez A, Sejnowski T, Hillyard SA (2007) Early cross-modal interactions in auditory and visual cortex underlie a sound-induced visual illusion. J Neurosci 27(15):4120–4131
- Shams L, Kamitani Y, Shimojo S (2000) What you see is what you hear. Nature 408:788
- Shams L, Kamitani Y, Thompson S, Shimojo S (2001) Sound alters visual evoked potentials in humans. NeuroReport 12(17):3849–3852
- Shams L, Kamitani Y, Shimojo S (2002) Visual illusion induced by sound. Cogn Brain Res 14:147–152
- Shams L, Iwaki S, Chawla A, Bhattacharya J (2005) Early modulation of visual cortex by sound: an MEG study. Neurosci Lett 378(2):76–81
- Watkins S, Shams L, Tanaka S, Haynes J-D, Rees G (2006) Sound alters activity in human V1 in association with illusory visual perception. Neuroimage 31:1247–1256
- Watkins S, Shams L, Josephs O, Rees G (2007) Activity in human V1 follows multisensory perception. Neuroimage 37:572–578
- Wozny DR, Beierholm UR, Shams L (2008) Human trimodal perception follows optimal statistical inference. J Vis 8(3):24.1–24.11
- Yu C, Klein SA, Levi DM (2004) Perceptual learning in contrast discrimination and the (minimal) role of context. J Vis 4(3):169–182