(a) Sequence of Processing Steps. Inputs from the sense organs are thought to interact cross-modally at multiple phases of the processing pathways, including very early stages [8]. Stimuli can be integrated automatically if a number of conditions are satisfied: (1) if the initial saliency of one of the stimuli is at or above a critical threshold, preprocessing stages will attempt to spatio-temporally realign this stimulus with one of lesser salience [(2); spatio-temporal realignment]. The matched streams will then be monitored for congruence in their stimulus patterns [(3); congruency detection]. If the realignment and/or congruency matching processes succeed, the neural responsiveness of the brain areas that process the input streams will be increased [(4); recurrent stimulus-driven sensitivity adjustments] to sustain the integration process. If the stimuli cannot be realigned, or if incongruency is detected, the sensory gain will tend to be decreased. Note that we consider these potential gain adjustments to be mainly stimulus-driven, and therefore a reflection of the bottom-up-driven shift part of the interaction between multisensory integration and attention. If none of the stimuli is of sufficient saliency, top-down attention may be necessary to set up an initial selection of to-be-integrated stimuli [(5); top-down sensory gain adjustments]. The resulting boost in sensory sensitivity due to a top-down gain manipulation might then be sufficient to initiate the processes of spatio-temporal alignment and congruency matching that would otherwise not have occurred. In addition, top-down attention can modulate processing at essentially all of these stages of multisensory processing and integration.
(b) Three examples of interactive influences between multisensory integration and attention: (1) bottom-up multisensory integration, which can then drive a shift of attention; (2) the need for top-down attention for multisensory integration in the presence of many competing stimulus representations; (3) the spreading of attention across space and modality (visual to auditory).
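The sequence of steps in panel (a) can be read as a decision flow, and a toy sketch may make the branching easier to follow. The Python below is an illustration only, not a model proposed in the text. Every name and number in it (`Stimulus`, `integrate`, the threshold, the alignment windows, the gain step sizes) is a hypothetical placeholder. It also simplifies the framework in two ways: it requires both realignment and congruency to succeed, where the caption says "and/or", and it treats gain as a single scalar.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stimulus:
    modality: str                # e.g. "auditory" or "visual"
    salience: float              # arbitrary units (assumption)
    onset_ms: float              # onset time in milliseconds
    location_deg: float          # azimuth in degrees
    pattern: Tuple[int, ...]     # coarse temporal pattern, e.g. (1, 0, 1)


# All constants below are hypothetical values chosen for illustration only.
SALIENCE_THRESHOLD = 0.5     # critical salience threshold, step (1)
MAX_ONSET_GAP_MS = 100.0     # temporal window for realignment, step (2)
MAX_SPATIAL_GAP_DEG = 15.0   # spatial window for realignment, step (2)
GAIN_STEP = 0.2              # stimulus-driven gain change, step (4)
TOP_DOWN_BOOST = 0.3         # top-down gain change, step (5)


def can_realign(a: Stimulus, b: Stimulus) -> bool:
    """Step (2): streams are realignable if close enough in time and space."""
    return (abs(a.onset_ms - b.onset_ms) <= MAX_ONSET_GAP_MS
            and abs(a.location_deg - b.location_deg) <= MAX_SPATIAL_GAP_DEG)


def is_congruent(a: Stimulus, b: Stimulus) -> bool:
    """Step (3): here, congruence simply means identical temporal patterns."""
    return a.pattern == b.pattern


def integrate(a: Stimulus, b: Stimulus,
              top_down_attention: bool = False,
              gain: float = 1.0) -> Tuple[bool, float, List[str]]:
    """Return (integrated, final_gain, trace) for one pair of stimuli."""
    trace: List[str] = []
    peak = max(a.salience, b.salience)

    # Step (1): is at least one stimulus salient enough on its own?
    if peak < SALIENCE_THRESHOLD:
        if not top_down_attention:
            trace.append("below threshold, no top-down selection")
            return False, gain, trace
        # Step (5): a top-down gain boost may lift the input over threshold.
        gain += TOP_DOWN_BOOST
        trace.append("(5) top-down gain boost")
        if peak * gain < SALIENCE_THRESHOLD:
            trace.append("boost insufficient")
            return False, gain, trace

    # Steps (2) and (3): realignment, then congruency detection.
    if not can_realign(a, b):
        trace.append("(2) realignment failed, gain decreased")
        return False, gain - GAIN_STEP, trace
    if not is_congruent(a, b):
        trace.append("(3) incongruency detected, gain decreased")
        return False, gain - GAIN_STEP, trace

    # Step (4): success raises gain to sustain the integration process.
    trace.append("(4) aligned and congruent, gain increased")
    return True, gain + GAIN_STEP, trace


if __name__ == "__main__":
    beep = Stimulus("auditory", 0.8, 0.0, 0.0, (1, 0, 1))
    flash = Stimulus("visual", 0.3, 40.0, 5.0, (1, 0, 1))
    print(integrate(beep, flash))
```

In this sketch, a salient sound paired with a weaker but aligned and congruent flash follows the bottom-up route through steps (1) to (4). Two weak stimuli are integrated only when `top_down_attention=True` and the boost is large enough, which corresponds to route (5) in the caption.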