Abstract
Studies of multisensory integration by single neurons have traditionally emphasized empirical principles that describe nonlinear interactions between inputs from two sensory modalities. We previously proposed that many of these empirical principles could be explained by a divisive normalization mechanism operating in brain regions where multisensory integration occurs. This normalization model makes a critical diagnostic prediction: a non-preferred sensory input from one modality, which activates the neuron on its own, should suppress the response to a preferred input from another modality. We tested this prediction by recording from neurons in macaque area MSTd that integrate visual and vestibular cues regarding self-motion. We show that many MSTd neurons exhibit the diagnostic form of cross-modal suppression, whereas unisensory neurons in area MT do not. The normalization model also fits population responses better than a model based on subtractive inhibition. These findings provide strong support for a divisive normalization mechanism in multisensory integration.
Introduction
We are often faced with situations in which multiple sources of information need to be combined to make good decisions. Multisensory integration provides an attractive model system in which to explore the neural mechanisms by which multiple sensory signals are combined to make perceptual decisions (Alais et al., 2010; Ernst and Bulthoff, 2004; Raposo et al., 2012; Spence, 2010, 2011). Despite many empirical studies (Driver and Noesselt, 2008; Stein and Stanford, 2008), a detailed understanding of the neural computations underlying multisensory integration remains elusive.
A diverse array of empirical observations has been usefully characterized as a set of empirical principles of multisensory integration (Meredith and Stein, 1986; Stein and Stanford, 2008). For example, multisensory neurons in cat superior colliculus often show super-additive integration of weak visual and auditory inputs, whereas they show additive or sub-additive summation for stronger inputs; this is known as the principle of inverse effectiveness (Meredith and Stein, 1986; Perrault et al., 2003, 2005; Stanford et al., 2005; Wallace et al., 1996). As another example, the spatial/temporal principle holds that multisensory stimuli need to be congruent in space/time for cross-modal enhancement of neural responses to occur; otherwise, cross-modal suppression may occur, a phenomenon in which the multisensory response of a neuron is less than its response to the stronger unisensory input (Kadunce et al., 1997; Meredith et al., 1987; Meredith and Stein, 1986; Wallace et al., 1996). These empirical principles generally involve nonlinear interactions between inputs from different modalities, but experimental evidence that can distinguish among different possible types of nonlinearities has remained very limited.
We previously proposed a divisive normalization model of multisensory integration that accounts naturally for many of the key empirical principles of multisensory integration (Ohshiro et al., 2011). The model is based on a few well-established neural operations: linear summation of inputs (Ferster, 1994; Jagadeesh et al., 1993; Skaliora et al., 2004), a power-law nonlinearity that may characterize the transformation between neural inputs and outputs (Carandini and Ferster, 2000; Carandini et al., 1997; Heeger, 1992a; Miller and Troyer, 2002; Priebe and Ferster, 2008), and divisive normalization among neurons within a brain region (Carandini and Heeger, 2012; Heeger, 1992b). Divisive normalization has been suggested to account for nonlinear aspects of stimulus interactions in V1 (Busse et al., 2009), motion integration in area MT (Britten and Heuer, 1999), value representations in area LIP (Louie et al., 2011), attentional modulation in visual cortex (Lee and Maunsell, 2009; Reynolds and Heeger, 2009), and stimulus interactions in olfactory processing (Olsen et al., 2010). Thus, normalization may be a canonical neural computation (Carandini and Heeger, 2012).
We showed previously that a divisive normalization model of multisensory integration makes a critical prediction that can distinguish it from other models: specifically, a non-optimal stimulus for one modality, which activates a neuron when presented alone, should be able to suppress the response to a near-optimal stimulus of the other modality when the cues are combined (Ohshiro et al., 2011). Here, we provide a direct test of this prediction by recording from multisensory neurons in the dorsal medial superior temporal (MSTd) area that combine visual and vestibular signals regarding self-motion (Duffy, 1998; Gu et al., 2006). Neurons in area MSTd have been previously linked to perceptual integration of visual and vestibular heading cues (Gu et al., 2008) and perceptual weighting of these cues according to their reliability (Fetsch et al., 2012). MSTd has also been causally linked to perception of heading based on visual and vestibular cues (Britten and Van Wezel, 2002; Gu et al., 2007, 2012). Here, we show that responses of MSTd neurons fulfill the critical prediction of the normalization model with regard to cross-modal suppression. We also demonstrate that the normalization model, fit simultaneously to data from our entire population of MSTd neurons, accounts better for the data than an alternative model based on subtractive inhibition.
If cross-modal suppression is a diagnostic feature of a multisensory stage of normalization, then it should not be seen in relevant unisensory areas. MSTd receives its primary visual input from the adjacent middle temporal (MT) area (Born and Bradley, 2005; Maunsell and Van Essen, 1983; Ungerleider and Desimone, 1986), and MT neurons are not thought to carry vestibular signals (Chowdhury et al., 2009; Smith et al., 2012). Thus, we also probed for the existence of cross-modal suppression in area MT, and did not find any evidence for it. Together, our findings provide strong experimental support for the proposal that divisive normalization, operating at the level of multisensory integration, can account for many of the empirical principles of multisensory integration exhibited by single neurons (Ohshiro et al., 2011).
Results
Cross-modal suppression as predicted by the model
The divisive normalization model of visual-vestibular cue integration (Fig. 1a) assumes that MSTd neurons receive heading-selective vestibular and visual (optic flow) inputs, with heading preferences that may be matched or mismatched to varying degrees (Gu et al., 2006). Each neuron performs a weighted linear sum of its visual and vestibular inputs, with weights dvest and dvis that are fixed for that neuron. The result is then passed through a power-law nonlinearity, after which its output gain is multiplicatively modulated by a function of the pooled spiking activity of other neurons in the population (“feedback normalization”; Heeger, 1992b; see Discussion). Note that the semi-saturation constant in the model (α in Eqns. 5-7) produces responses that rise gradually and saturate as a function of stimulus intensity. Note also that, while this type of normalization model can produce averaging behavior under some circumstances (e.g., Busse et al., 2009), it more generally produces a range of possible interactions depending on the relative strengths of the stimuli and input weights.
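The core computation can be sketched in a few lines. The toy simulation below is our own illustration, not the fitted model from Methods: it uses half-rectified cosine heading tuning, an exponent n = 2, a small pool of 24 heading preferences, and arbitrary amplitude units. At a large stimulus amplitude, a cue offset 60° from the preferred heading activates the unit on its own yet suppresses the combined response, because it strongly drives the normalization pool while adding little to the unit's own drive.

```python
import numpy as np

def population_response(prefs, vest_hd, vis_hd, a_vest, a_vis, n=2.0, alpha=1.0):
    """Toy divisive-normalization population (illustrative parameters only).
    Each unit linearly sums half-rectified cosine-tuned vestibular and visual
    drives, applies a power-law nonlinearity, and is divisively normalized by
    the pooled activity of the whole population."""
    def tuned(amp, stim_hd):
        if stim_hd is None:                      # cue absent
            return np.zeros_like(prefs)
        return amp * np.maximum(np.cos(np.deg2rad(stim_hd - prefs)), 0.0)
    e = (tuned(a_vest, vest_hd) + tuned(a_vis, vis_hd)) ** n
    return e / (alpha ** n + e.mean())           # divisive normalization

prefs = np.arange(0.0, 360.0, 15.0)              # preferred headings of the pool
amp = 10.0                                       # a "large" stimulus amplitude
i = 0                                            # unit preferring 0 deg for both cues

r_cue1    = population_response(prefs, 0.0, None, amp, amp)[i]   # vestibular alone
r_cue2    = population_response(prefs, None, 60.0, amp, amp)[i]  # visual alone, 60 deg off
r_comb_0  = population_response(prefs, 0.0, 0.0, amp, amp)[i]    # both at preferred
r_comb_60 = population_response(prefs, 0.0, 60.0, amp, amp)[i]   # preferred + offset

print(r_comb_0 > r_cue1)    # cross-modal enhancement at matched preferred headings
print(r_cue2 > 0)           # offset cue activates the unit on its own...
print(r_comb_60 < r_cue1)   # ...yet suppresses the combined response
```

All three comparisons print True in this sketch; the suppressive regime depends on the amplitude and offset, as developed below.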
Figure 1. Multisensory normalization model and the diagnostic prediction of cross-modal suppression.
(a) Schematic illustration of the divisive normalization model for visual-vestibular cue integration. Model MSTd neurons (rectangles) perform a weighted sum of heading-tuned inputs from unisensory (vestibular, visual) neurons, with weights given by dvest, dvis. The weighted sum is raised to an exponent and gain-modulated multiplicatively by a function of the total activity of the population of neurons (see Methods). (b) The diagnostic prediction of the normalization model. When visual and vestibular stimuli are presented at the heading preferences of the model neuron (Δ = 0°), the combined response (black) exceeds the single-cue responses (cyan, red) for all stimulus amplitudes (left). When the heading of cue2 is offset from the cell's heading preference by Δ = 60° (middle), cue2 is activating on its own but suppresses the combined response below that of cue1. For a larger offset (Δ = 120°), cue2 becomes suppressive on its own and suppression of the combined response is trivial (right). Motion amplitude indicates the total displacement of the body within a trial. (c) Summary of predictions of the normalization model (circles) and a family of alternative models involving subtractive inhibition (triangles). Responses of model neurons were simulated for heading offsets of cue2 (Δ) ranging from 0° to 180°, in steps of 15°. The response to cue2 (Rcue2) and the combined response (Rcomb) are divided by the response to cue1 (Rcue1), and the ratios (Rcue2/Rcue1, Rcomb/Rcue1) are plotted against each other. Rcue1, Rcue2 and Rcomb denote simulated responses to the largest stimulus amplitude.
More specifically, simulated responses of a typical model neuron show cross-modal enhancement when both cues are presented together at the visual and vestibular heading preferences of the neuron (Fig. 1b, left). However, the same neuron shows a paradoxical cross-modal suppression when one of the two cues (cue2) is presented at a non-preferred heading (Fig. 1b, middle): the non-preferred cue2 activates the model neuron when presented alone, but suppresses responses when presented together with the preferred cue1. This diagnostic cross-modal suppression only occurs at large stimulus intensities and only within a narrow heading range for the non-preferred cue. If the heading of cue2 is further from the cell's heading preference, then this non-optimal input can become suppressive on its own (Fig. 1b, right). In such a case, cross-modal suppression is trivially explained by summation of an activating input with a suppressive input. Thus, to test for operation of divisive normalization at the level of multisensory integration, we seek evidence of this narrow regime in which the non-optimal input drives a response on its own but induces cross-modal suppression.
To summarize these relationships graphically, we plot the combined response (Rcomb) to both cues against the response to the non-preferred cue (Rcue2), with both values normalized by the response to the preferred cue (Rcue1). In this representation (Fig. 1c), values above and below unity on the ordinate indicate cross-modal enhancement and suppression, respectively, and values greater than zero on the abscissa indicate that the non-preferred input activates the unit above baseline response. Thus, the critical prediction of multisensory normalization is that data should fall within the lower-right quadrant for some offsets (Δ) of cue2 from the preferred heading for that modality. Indeed, predicted responses of the model neuron move from the top-right quadrant through the lower-right quadrant as Δ is increased (Fig. 1c, circles). Eventually, as Δ is further increased, data points enter the lower-left quadrant, corresponding to the case of trivial cross-modal suppression.
Ursino et al. (2009) proposed an alternative model of multisensory integration that accounts for the principle of inverse effectiveness and other key properties of multisensory integration in the superior colliculus (Cuppini et al., 2010). This alternative model, which is based on subtractive inhibition, makes a distinct prediction from the normalization model with regard to cross-modal suppression (Ohshiro et al., 2011). For the alternative (subtractive) model (Fig. 1c, triangles), cross-modal suppression only occurs when the non-preferred cue is suppressive on its own. As a result, the predictions of this class of model do not pass through the lower-right quadrant of Fig. 1c. We further demonstrate mathematically that this class of alternative model, in which the multisensory input to each neuron is subtractively modulated by the pooled spiking activity of other neurons, does not predict non-trivial cross-modal suppression under a realistic set of assumptions (see Methods). Therefore, cross-modal suppression by an activating non-preferred input is a diagnostic indicator of multisensory divisive normalization.
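The intuition for why subtraction cannot produce the diagnostic effect can be checked in a toy simulation. The sketch below is a feedforward caricature of subtractive inhibition (the published model subtracts pooled spiking activity recurrently; tuning curves, amplitudes, and the inhibition weight beta are all our own illustrative choices). Because the pooled drive is subtracted rather than divided, the combined response falls below the cue1 response only when cue2 is itself suppressive, so no offset Δ lands in the lower-right quadrant of Fig. 1c.

```python
import numpy as np

prefs = np.arange(0.0, 360.0, 15.0)    # preferred headings of the population

def tuned(amp, stim_hd):
    """Half-rectified cosine tuning; returns zero drive when the cue is absent."""
    if stim_hd is None:
        return np.zeros_like(prefs)
    return amp * np.maximum(np.cos(np.deg2rad(stim_hd - prefs)), 0.0)

def subtractive(vest_hd, vis_hd, amp, beta=0.03):
    """Feedforward caricature of subtractive inhibition: pooled drive is
    subtracted from, rather than divided into, each unit's input."""
    drive = tuned(amp, vest_hd) + tuned(amp, vis_hd)
    return np.maximum(drive - beta * drive.sum(), 0.0)

amp = 10.0
r_cue1 = subtractive(0.0, None, amp)[0]          # cue1 at the preferred heading

nontrivial, trivial = [], []
for delta in np.arange(0.0, 181.0, 15.0):
    r_cue2 = subtractive(None, delta, amp)[0]    # cue2 alone, offset by delta
    r_comb = subtractive(0.0, delta, amp)[0]
    nontrivial.append(r_cue2 > 0 and r_comb < r_cue1)
    trivial.append(r_cue2 == 0 and r_comb < r_cue1)

print(any(nontrivial))   # False: no suppression by an activating cue2
print(any(trivial))      # True: suppression only when cue2 is itself suppressive
```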
Cross-modal suppression in multisensory MSTd neurons
We sought to demonstrate the diagnostic form of cross-modal suppression by recording from MSTd neurons in macaques that were trained to fixate a visual target while multisensory (visual/vestibular) stimuli were presented using a projector mounted on a motion platform (see Methods for details). We recorded from a total of 165 MSTd neurons in two animals; among these, we obtained sufficient data (at least 5 repetitions of each stimulus) from a group of 101 neurons (42 from monkey A and 59 from monkey B), which forms the main database for this study. Of the remaining 64 neurons, isolation was lost for 55 neurons before the protocol was completed, and 9 neurons were discarded because they showed no heading selectivity for visual or vestibular cues. Among the 101 MSTd cells studied in detail, 68 were multisensory neurons (26 from monkey A and 42 from monkey B) that showed clear heading selectivity for both visual and vestibular cues, as well as firing rates that increased with stimulus amplitude. We first examine cross-modal suppression in these multisensory neurons. The remaining 33/101 cells (16 from monkey A and 17 from monkey B) were unisensory neurons that responded reliably only to optic flow. These neurons are analyzed separately in a later section.
Once a neuron was isolated, we first characterized the 3D heading selectivity of the neuron (Gu et al., 2006) and determined its heading preferences for each modality (Fig. 2a, white crosses). For most multisensory neurons, we used a ‘screening’ protocol (Fig. 2b) to estimate the range of headings over which a non-preferred input might elicit cross-modal suppression while activating the neuron on its own. The heading for one modality (visual in this example) was offset by 30° steps in elevation from its preferred heading (white dots in Fig. 2a) while the heading of the other modality (vestibular) was fixed at the preferred value. As expected, visual responses of this neuron peaked at zero offset (Δ = 0°) and declined with the absolute value of the offset, |Δ| (Fig. 2b, red). The combined response (black) was suppressed below the vestibular response (cyan) for |Δ| > 30°. This cross-modal suppression is of the trivial form for |Δ| > 90°, because visual responses are below baseline activity (gray) for this range of Δ, suggesting that the visual cue becomes suppressive. Critically, within a narrow range of Δ values around ± 60°, this neuron shows cross-modal suppression along with significant visual activation, consistent with the diagnostic prediction of the divisive normalization model.
Figure 2. Data from an example multisensory MSTd neuron.
(a) 3D heading tuning functions (left: vestibular, right: visual) are shown as color contour maps. This neuron prefers rightward self-motion (-30° azimuth, 0° elevation; white crosses) for both modalities. White dots indicate visual headings that were tested in the screening protocol. (b) Screening test. Combined (black), vestibular (cyan), and visual (red) responses are plotted as a function of the offset (Δ°) of the visual heading from the cell's heading preference. The heading tuning curves were fit with a modified sinusoid (Eqn. 3). Motion amplitude = 10.0 cm. Error bars denote s.e.m. (c) Responses are plotted as a function of stimulus amplitude. Data are shown for Δ = 0° (left), +60° (middle), and -60° (right). Asterisks indicate significant suppression or activation ([**], p < 0.01; Wilcoxon rank-sum test). Smooth curves show the hyperbolic-ratio functions (Eqn. 1) that best fit the amplitude-response curves.
To quantify these effects in detail, we tested each neuron with a range of stimulus intensities and a substantial number of stimulus repetitions (5-15, median: 8). Stimulus intensity was manipulated by varying the amplitude of self-motion, which followed a Gaussian velocity profile (see Methods, Fig. S1). Each neuron was tested with stimuli having Δ = 0°, as well as at least one non-zero Δ value chosen to probe for cross-modal suppression (many cells were tested with multiple non-zero values of Δ, see Methods). For the example neuron, vestibular and visual responses rose monotonically with self-motion amplitude for Δ = 0°, and combined responses exceeded unisensory responses for all amplitudes (Fig. 2c, left). Offsetting the visual heading by Δ = +60° (Fig. 2c, middle) greatly reduced visual responses, but these responses were still significantly greater than baseline activity for each non-zero motion amplitude (p < 1.8 × 10⁻⁴ for each amplitude, Wilcoxon rank-sum tests). Critically, at this offset, the combined response was significantly suppressed below the vestibular response for the two largest stimulus amplitudes (p < 1.8 × 10⁻⁴ for amplitudes of 4.4 and 11.8 cm, Wilcoxon rank-sum tests). Very similar observations were made for Δ = -60° (Fig. 2c, right). These results demonstrate clearly that a non-optimally presented cue, which activates the neuron when presented alone, can suppress the response to an optimal stimulus of the other modality, as predicted by the divisive normalization model.
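The amplitude manipulation can be made concrete: with a Gaussian velocity profile, the total displacement of the body equals the area under the velocity curve, so scaling the profile scales the motion amplitude. A minimal sketch (the trial duration, peak time, and profile width are illustrative values, not those given in Methods):

```python
import numpy as np

def gaussian_velocity(t, amplitude, t0=1.0, sigma=0.25):
    """Gaussian velocity profile (cm/s) whose time integral, i.e. the total
    displacement of the body, equals `amplitude` (cm). t0 and sigma are
    illustrative values, not the parameters used in the experiments."""
    return (amplitude / (sigma * np.sqrt(2.0 * np.pi))
            * np.exp(-((t - t0) ** 2) / (2.0 * sigma ** 2)))

t = np.linspace(0.0, 2.0, 2001)                  # 2 s trial, 1 ms resolution
v = gaussian_velocity(t, amplitude=11.8)         # largest amplitude tested
displacement = np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t))  # trapezoidal integral

print(round(displacement, 2))   # ≈ 11.8 cm
```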
Data from a few additional example neurons are illustrated in Fig. 3. These include another neuron for which a non-preferred visual stimulus suppressed the combined response below the level of the vestibular response (Fig. 3a), as well as two additional neurons for which the vestibular stimulus was offset from the preferred heading and caused a suppression of the visual response (Fig. 3b,c). Note that the example neurons in Fig. 3a, c had mismatched visual and vestibular heading preferences (“opposite cells”, Gu et al., 2006). For such neurons, a combined stimulus having the preferred heading for each modality has different directions of motion for visual and vestibular cues.
Figure 3. Demonstration of cross-modal suppression for 3 additional example neurons.
Format as in Fig. 2c. (a) A second example neuron showing cross-modal suppression by a non-preferred visual heading stimulus. Data are shown for Δ = 0° (left) and Δ = 80° (right). Note that this neuron is an opposite cell; thus, the stimulus headings for Δ = 0° have different directions. (b, c) Two example neurons showing cross-modal suppression by a non-preferred vestibular stimulus. The non-zero vestibular offsets were Δ = 60° (b, right) and Δ = 180° (c, right). Asterisks indicate significant suppression or activation ([*], p < 0.05; [**], p < 0.01; Wilcoxon rank-sum test). Smooth curves show the best-fitting hyperbolic-ratio functions.
These examples illustrate that non-preferred stimuli of either modality can elicit cross-modal suppression while still producing responses on their own. This symmetry of the interaction between vestibular and visual inputs is expected from linear summation of tuned unisensory inputs, combined with multisensory normalization.
Population summary
All 68 multisensory neurons were tested with Δ = 0° and with either the visual or the vestibular stimulus at a non-preferred heading (non-zero Δ), and 8 neurons were tested with multiple non-preferred headings. For 9 neurons, an additional block of trials was run in which the other sensory modality was set to a non-preferred heading. In total, 86 data sets were collected with preferred headings for both modalities, and 97 data sets were collected with a combination of preferred and non-preferred headings. To summarize our findings and pool data across conditions, we defined cue1 to be the cue that was held fixed at the neuron's preferred heading and cue2 to be the cue for which heading was varied from preferred to non-preferred directions.
To quantify cross-modal suppression, we fit the amplitude-response curves for each neuron with a sigmoidal function (see Methods, Eqn. 1), and we quantified the strength of activity using the response gain parameter (G). These fits provided an excellent description of the data (e.g., Fig. 4a,b), with a median R² value of 0.97 across the population. We adopt a notation in which G(cue1, cue2) indicates the response gain for a particular combination of cue1 and cue2, while p and n denote whether each cue was preferred or non-preferred. Thus, G(p,p) represents the response gain when both cue1 and cue2 have preferred directions, G(p,n) denotes gain for a preferred cue1 and a non-preferred cue2, G(p,0) indicates gain for a unisensory preferred cue1 stimulus, and G(0,p) or G(0,n) represent gains for unisensory stimuli with preferred or non-preferred headings for cue2, respectively. We then computed the ratio of the combined response gain (G(p,p) or G(p,n)) to that of the modality presented at the preferred heading (G(p,0)). This ratio indicates whether combining the two cues produces cross-modal enhancement (ratio > 1), or cross-modal suppression (ratio < 1). Similarly, the ratio of the gain parameter for cue2 (G(0,p) or G(0,n)) to G(p,0) indicates whether cue2 was activating (ratio > 0) or suppressive (ratio < 0). These gain ratios were compared in a scatter plot, analogous to Fig. 1c.
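As a concrete illustration of this analysis, the sketch below fits a Naka-Rushton-style hyperbolic-ratio function (one plausible form of Eqn. 1; the exact parameterization is not reproduced here) to synthetic amplitude-response curves and computes the two gain ratios. The firing rates and amplitudes are invented, noise-free values chosen only to show the bookkeeping.

```python
import numpy as np
from scipy.optimize import curve_fit

def hyperbolic_ratio(a, r0, g, a50, n):
    """Hyperbolic-ratio amplitude-response function (an assumed form of the
    paper's Eqn. 1): baseline r0, response gain g, semi-saturation a50."""
    return r0 + g * a ** n / (a ** n + a50 ** n)

amps = np.array([0.0, 1.2, 2.4, 4.4, 11.8])       # motion amplitudes (cm)

# noise-free synthetic rates (spikes/s), generated from the function itself
r_p0 = hyperbolic_ratio(amps, 10.0, 30.0, 2.4, 2.0)   # preferred cue1 alone
r_0n = hyperbolic_ratio(amps, 10.0, 8.0, 2.4, 2.0)    # non-preferred cue2 alone
r_pn = hyperbolic_ratio(amps, 10.0, 24.0, 2.4, 2.0)   # cue1 + non-preferred cue2

def fit_gain(rates):
    """Fit the amplitude-response curve; return the gain parameter G."""
    p0 = [rates[0], rates.max() - rates[0], 3.0, 2.0]
    popt, _ = curve_fit(hyperbolic_ratio, amps, rates, p0=p0,
                        bounds=(0.0, np.inf))
    return popt[1]

g_p0, g_0n, g_pn = fit_gain(r_p0), fit_gain(r_0n), fit_gain(r_pn)

print(g_pn / g_p0 < 1)   # combined gain below cue1 gain: cross-modal suppression
print(g_0n / g_p0 > 0)   # cue2 is activating on its own
```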
Figure 4. Quantification and visualization of cross-modal suppression effects.
(a) Amplitude-response functions for an example MSTd neuron, along with best-fitting hyperbolic-ratio functions (Eqn. 1). Here, both visual (cue1) and vestibular (cue2) stimuli were presented at the preferred headings, and cross-modal enhancement occurs. G(p,p): both cues at respective preferred headings; G(p,0): preferred cue1 only; G(0,p): preferred cue2 only. (b) Data and fits from the same neuron when the vestibular stimulus (cue2) was offset by 90° from the vestibular heading preference. G(p,n): preferred cue1 and non-preferred cue2; G(0,n): non-preferred cue2 only. (c) Combined response gains for the preferred-preferred (purple, G(p,p)) and preferred-nonpreferred (green, G(p,n)) stimulus combinations are plotted against the corresponding response gains for cue2 (G(0,p) or G(0,n)). Values on both axes are normalized by the response gain for cue1, G(p,0). Multiple data points are plotted for each neuron: one for the preferred-preferred combination (purple) and one or more for preferred-nonpreferred heading combinations (green). Filled green symbols represent cases with significant cross-modal suppression and significant activation by the non-preferred cue2. Filled purple symbols indicate the preferred-preferred stimulus combinations that correspond to the filled green symbols. Purple and green stars correspond to the data shown in (a) and (b), respectively. Solid black curve: second-order polynomial fit. Data points for a few cases with outlier values are plotted at the maximum values on the x- and y-axes.
For the example neuron of Fig. 4, data obtained with stimuli at the preferred heading values for both modalities map onto a point in the upper-right quadrant, indicating cross-modal enhancement (Fig. 4c, filled purple star). In contrast, data obtained with one cue at a non-preferred heading (Δ = 90°) map to a point in the lower-right quadrant (filled green star), the critical region in which cross-modal suppression is produced by a non-preferred stimulus that activates the neuron on its own.
Overall, population data from multisensory MSTd neurons provide strong support for the predictions of the normalization model. For combined stimuli having preferred heading values for both modalities, most data points (72/86) reside in the upper-right quadrant, corresponding to cross-modal enhancement (Fig. 4c, purple). In stark contrast, many of the data points (55/97) corresponding to combinations of preferred and non-preferred headings for the two modalities reside in the lower-right quadrant, thus showing cross-modal suppression combined with activation by the non-preferred stimulus. Among these 55 data points in the lower-right quadrant, 25 cases (filled green symbols) show both significant cross-modal suppression (G(p,n)/G(p,0) < 1; p < 0.05, bootstrap) and significant activation in response to the non-preferred heading stimulus (G(0,n)/G(p,0) > 0, p < 0.05). These neurons clearly exhibited the diagnostic form of cross-modal suppression that is predicted by divisive normalization acting at the level of multisensory integration.
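The bootstrap significance test can be sketched as follows. For simplicity, this toy resamples trial-by-trial firing rates at the largest amplitude in place of refitted gains (the two approaches give nearly identical results in this data set, per Fig. S3a); all rate values and trial counts are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic trial-by-trial firing rates (spikes/s) at the largest amplitude;
# the values are invented for illustration, not taken from the recordings
r_cue1_trials = rng.normal(45.0, 4.0, size=12)   # preferred cue1 alone
r_comb_trials = rng.normal(30.0, 4.0, size=12)   # cue1 plus non-preferred cue2

n_boot = 2000
ratios = np.empty(n_boot)
for b in range(n_boot):
    # resample trials with replacement and recompute the response ratio
    ratios[b] = (rng.choice(r_comb_trials, r_comb_trials.size).mean()
                 / rng.choice(r_cue1_trials, r_cue1_trials.size).mean())

p_value = np.mean(ratios >= 1.0)   # one-sided bootstrap p for suppression
print(p_value < 0.05)              # significant cross-modal suppression
```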
A polynomial fit to the data of Fig. 4c (black line) has a y-intercept of 0.85 (S.E. = 0.026), which is significantly less than 1.0 (p = 2.0 × 10⁻⁸), indicating a significant shift toward the lower-right quadrant. Nearly identical results were obtained when firing rates were computed within narrower (200 ms or 400 ms) time windows centered around the peak population response (Fig. S2a,b), thus indicating that the results are robust to variations in the time window over which responses are measured. We also carried out the same analysis separately for experiments in which the vestibular or the visual modality was the non-preferred stimulus. We find similar effects in both cases (Fig. S2c,d), although the strength of cross-modal suppression was greater when the visual cue was non-preferred, which was the case in the minority of our experiments.
We have previously shown that multisensory neurons in area MSTd can have visual and vestibular heading preferences that are closely matched (‘congruent’ cells) or grossly mismatched (‘opposite’ cells) (Gu et al., 2008; Gu et al., 2006). We found that the strength and incidence of cross-modal suppression was similar for both types of neurons (Fig. S2e, f), thus suggesting that divisive normalization is a common property of MSTd neurons and is not likely related to the functional roles that these different cell types may play in cue integration or cue weighting (Fetsch et al., 2012; Gu et al., 2008; Morgan et al., 2008).
To rule out possible biases associated with systematic errors in curve fitting, we also performed similar analyses using the firing rate measured at the largest stimulus amplitude in lieu of the gain parameter of the sigmoidal function. This produced nearly identical results (Fig. S3a). We also examined the distribution of heading offsets (Δ) that produced the diagnostic form of cross-modal suppression (Fig. S3b). As expected from the model, cells showing significant cross-modal suppression and significant activation by cue2 were generally those tested with Δ values in a narrow range (60° < |Δ| < 90°). Overall, data from multisensory MSTd neurons largely conform to predictions of the divisive normalization model.
Cross-modal suppression in unisensory MSTd neurons
The literature on multisensory integration describes cases in which neurons that appear to be unisensory are suppressed by adding a non-effective stimulus of one modality to an effective stimulus of another modality (Avillac et al., 2007; Meredith and Stein, 1986; Sugihara et al., 2006). The normalization model accounts for this phenomenon because unisensory neurons have their responses normalized by the activity of a pool of neurons that includes many multisensory cells (Ohshiro et al., 2011). In area MSTd, a sub-population of neurons responds to only one modality, and almost all such neurons respond only to optic flow, with no vestibular response (Gu et al., 2006). Thus, we tested whether the responses of unisensory MSTd neurons (n = 33) would exhibit cross-modal suppression.
We used the same stimulation protocol as for multisensory MSTd neurons, except that the vestibular stimulus was presented at a somewhat arbitrary set of offsets (|Δ|) relative to the visual heading preference, most typically 0°, 90°, and 180° (see Methods for details). Unisensory MSTd neurons typically showed combined responses that were slightly suppressed relative to the visual response, consistent with predictions of the model (Fig. 5a). Across the population tested, most unisensory MSTd neurons showed cross-modal suppression (Fig. 5c, orange), with 40/56 cases having a gain ratio, Gcombined/Gvisual, that was significantly less than 1 (p < 0.05, bootstrap, filled symbols). The median gain ratio was 0.86, which was significantly less than 1 (p = 1.2 × 10⁻⁸, Wilcoxon signed rank test). The average gain ratio did not depend significantly on the value of |Δ| (three ranges: 0-60°, 60-120°, and 120-180°, p = 0.87, one-way ANOVA), consistent with the idea that the normalization pool contains neurons with a broad range of vestibular heading preferences.
Figure 5. Cross-modal suppression is exhibited by unisensory MSTd neurons but not MT neurons.
(a) Data from a unisensory (visual only) MSTd neuron, along with best-fitting hyperbolic-ratio functions. Format as in Fig. 3, except that data are superimposed for two values of Δ: 135° (solid curves) and 160° (dashed curves). Significant cross-modal suppression occurs for the largest stimulus amplitude ([*], p < 0.05; [**], p < 0.01; Wilcoxon rank-sum test). (b) Data from a typical MT neuron tested with Δ values of 0° and 90°. Smooth curves show the best-fitting hyperbolic-ratio functions. (c) The ratio of combined:visual response gains is plotted against the ratio of vestibular:visual response gains. Data are shown for unisensory MSTd neurons (orange symbols; 56 observations from 33 neurons) and MT neurons (green symbols; 127 observations from 43 neurons). Multiple data points may be plotted for each neuron, corresponding to the multiple vestibular headings tested. Filled symbols indicate cases for which Gcombined/Gvisual is significantly different from 1.0. A histogram of the ratio, Gcombined/Gvisual, is shown on the right margin.
Previous studies in visual cortex have shown that divisive normalization predicts a lateral shift of intensity response functions (also known as input gain control) when a non-effective stimulus is superimposed on an effective stimulus (Carandini and Heeger, 2012; Carandini et al., 1997; Heeger, 1992b). Thus, for a subset of unisensory MSTd neurons (n = 8), we also examined the effect of the amplitude of vestibular stimulation on cross-modal suppression. We found that cross-modal suppression became stronger with larger vestibular movement amplitudes, and that data from the majority of neurons were significantly better described by a rightward shift of the amplitude-response function than by a change in response gain (Fig. S4). This finding is reminiscent of the lateral shift of stimulus-response functions of sensory neurons when an ineffective mask stimulus is superimposed on an effective stimulus (Bonds, 1989; Carandini et al., 1997; Freeman et al., 2002; Olsen et al., 2010). These observations further support the notion that divisive normalization takes place following multisensory integration in MSTd.
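The distinction between a change in response gain and a lateral (input-gain) shift can be illustrated with a toy model comparison. Here a "combined" curve is synthesized with a scaled semi-saturation constant, which is a pure rightward shift on a log-amplitude axis, and two one-parameter models are fit to it. The functional forms and all numbers are our own illustration, not the fits reported in Fig. S4.

```python
import numpy as np
from scipy.optimize import curve_fit

def hr(a, r0, g, a50, n):
    """Hyperbolic-ratio amplitude-response function (assumed form)."""
    return r0 + g * a ** n / (a ** n + a50 ** n)

amps = np.array([0.0, 1.2, 2.4, 4.4, 11.8])
r_vis = hr(amps, 10.0, 30.0, 2.4, 2.0)           # visual alone (synthetic)
r_comb = hr(amps, 10.0, 30.0, 2.4 * 1.6, 2.0)    # + vestibular: a50 scaled up

base, _ = curve_fit(hr, amps, r_vis, p0=[10.0, 30.0, 3.0, 2.0],
                    bounds=(0.0, np.inf))

def gain_model(a, k):    # response gain: scale the driven component by k
    return base[0] + k * (hr(a, *base) - base[0])

def shift_model(a, s):   # input gain: evaluate the visual curve at a / s
    return hr(a / s, *base)

k_fit, _ = curve_fit(gain_model, amps, r_comb, p0=[0.8])
s_fit, _ = curve_fit(shift_model, amps, r_comb, p0=[1.5])

sse_gain = np.sum((r_comb - gain_model(amps, *k_fit)) ** 2)
sse_shift = np.sum((r_comb - shift_model(amps, *s_fit)) ** 2)

print(sse_shift < sse_gain)   # the lateral-shift model fits this example better
```

In practice the two nested fits would be compared with a sequential F-test on real, noisy responses rather than by raw error on noise-free curves.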
Lack of cross-modal suppression in MT neurons
If cross-modal suppression is produced by divisive normalization acting at the level of visual-vestibular integration in area MSTd, then we would not expect to see cross-modal suppression in the unisensory inputs to MSTd. Alternatively, cross-modal suppression might be present in the inputs to MSTd and might not reflect normalization at the level of multisensory integration. MSTd neurons receive their predominant visual input from area MT (Born and Bradley, 2005; Maunsell and Van Essen, 1983; Ungerleider and Desimone, 1986), which is not thought to exhibit vestibular responses (Chowdhury et al., 2009; Smith et al., 2012). We found that vestibular stimulation caused no appreciable cross-modal suppression of MT responses (e.g., Fig. 5b). Across the population of MT neurons tested, the median gain ratio (Gcombined/Gvisual) was 0.99 (Fig. 5c), which was not significantly different from unity (p = 0.29, Wilcoxon signed rank test). Moreover, the median gain ratio for MT was significantly different from that for the unisensory MSTd neurons (p = 6.1 × 10⁻¹³, Wilcoxon rank-sum test). Similar analyses using the firing rate measured at the largest stimulus amplitude in lieu of the gain parameter of the sigmoidal function produced nearly identical results (Fig. S3c). These results suggest strongly that the cross-modal suppression exhibited by MSTd neurons is not inherited from area MT, consistent with the idea of a stage of divisive normalization acting at the level of multisensory integration (Ohshiro et al., 2011).
Note that vestibular stimulation caused a weak response in some MT neurons (Fig. 5c). Specifically, the median ratio of vestibular and visual response gains, Gvestibular/Gvisual, was 0.06, which is very small but significantly greater than zero (p = 3.8 × 10⁻¹⁰, Wilcoxon signed rank test). We believe that this apparent vestibular activation is not a genuine vestibular response to otolithic input, but rather arises due to retinal slip that is caused by imperfect VOR suppression (Chowdhury et al., 2009). This effect is likely exacerbated in the present study because the ‘vestibular’ stimulus contained stationary background dots (see Methods). Consistent with this interpretation, the apparent vestibular response of a subset of MT neurons was strongly reduced when the background dots were removed (Fig. S5c). In addition, the apparent vestibular response was only seen for MT neurons with receptive fields near the fovea (Fig. S5d), which have stronger responses to slow speeds of motion (Fig. S5e). A true vestibular response would not depend on receptive field location or visual speed selectivity. Critically, MT neurons failed to show any cross-modal suppression regardless of whether they exhibited an apparent vestibular response (Fig. 5c), indicating that cross-modal suppression arises downstream of MT.
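The gain-ratio statistics used above can be sketched as follows. The gains are simulated and the variable names are hypothetical; the real gains come from the sigmoidal fits described in Methods:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated per-neuron response gains from sigmoidal fits (arbitrary units).
g_visual = rng.uniform(20.0, 60.0, size=48)
g_combined = g_visual * rng.normal(1.0, 0.05, size=48)  # ratios near 1, as in MT

gain_ratio = g_combined / g_visual

# Test whether the median ratio differs from unity: a Wilcoxon signed-rank
# test on log-ratios, which are symmetric about 0 under the null hypothesis.
stat, p = stats.wilcoxon(np.log(gain_ratio))
print(f"median gain ratio = {np.median(gain_ratio):.2f}, p = {p:.3f}")
```

Working on log-ratios makes "ratio = 1" correspond to a symmetric null around zero, which is the assumption the signed-rank test requires.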
Normalization fits MSTd population responses better than alternative model
We showed in Figure 1 that the normalization model exhibits the diagnostic form of cross-modal suppression, whereas the alternative (subtractive) model does not. This suggests that the normalization model should better fit the responses of a population of MSTd neurons than the alternative model. To test this directly, we fit both models to the amplitude response curves for our population of MSTd neurons (see Methods for details). Because the response of any one neuron in each model depends on the activity of all other neurons, it was necessary to fit the models simultaneously to data from all neurons and all stimulus conditions. This involved solving a large-scale optimization problem with 833 free parameters (for each model), and required over 25,000 hours of computing time.
Fits of both models to data from an example multisensory MSTd neuron are shown in Figure 6a, b. The normalization model accounts quite well for the responses of this neuron across the various stimulus conditions (R² = 0.998, Figure 6a), as does the alternative (subtractive) model, which has the same number of parameters (R² = 0.995, Figure 6b). Note, however, that the normalization model predicts cross-modal suppression at large motion amplitudes, whereas the alternative model does not (Fig. 6a,b). Specifically, for the alternative model, the predicted response in the Combined condition with a non-preferred cue2 (vestibular) never falls below the response to the preferred cue1 (visual). Despite failing to capture the cross-modal suppression effect, the alternative (subtractive) model still fits the data quite well because the cross-modal suppression effect is rather subtle for most neurons and often only appears for the largest stimulus amplitudes.
Figure 6. Summary of model fits to the population of MSTd neurons.
Amplitude-response functions for an example MSTd neuron, along with best-fitting curves based on the normalization model (a), or the alternative (subtractive) model (b). The vestibular stimulus (Cue2) was presented at both preferred and non-preferred headings. Note that the diagnostic cross-modal suppression effect is captured well in the normalization model fit (green curve below black curve), but not in the alternative model fit. The partial correlation coefficient between data and model fit is 0.76 for the normalization model, and 0.30 for the alternative model. (c) Fisher z-transformed partial correlation coefficient for the normalization model (ordinate) is plotted against that for the alternative model (abscissa). Data from multisensory MSTd neurons are shown. Filled symbols correspond to cases that exhibit the diagnostic form of cross-modal suppression as defined in Figure 4. The red filled symbol corresponds to the data shown in (a) and (b). The plot is divided into three areas by the dashed lines; data points in the top-left region are significantly better fit by the normalization model, and those in the bottom-right region are significantly better fit by the alternative model.
To better quantify these subtle differences in performance between the two models, partial correlation coefficients between the neural data and each model fit were computed (see Methods). The Fisher z-transformation was applied to normalize the partial correlation coefficients such that they can be compared effectively (Figure 6c). The median z-transformed partial correlation coefficient for the normalization model (3.34) was significantly greater than that for the alternative (subtractive) model (1.74) (Figure 6c, p = 0.008, signed-rank test), indicating that the normalization model provides a better overall fit to the data.
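The Fisher z-transform used here is simply the inverse hyperbolic tangent of the correlation coefficient, which makes the sampling distribution approximately normal with variance independent of r. A minimal sketch, using the example values from the Figure 6 caption:

```python
import numpy as np

def fisher_z(r):
    """Fisher z-transform: z = arctanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return np.arctanh(r)

# Partial correlations for the example neuron (values from the Figure 6 caption).
r_norm, r_alt = 0.76, 0.30
z_norm, z_alt = fisher_z(r_norm), fisher_z(r_alt)
print(z_norm, z_alt)  # z_norm ≈ 1.00, z_alt ≈ 0.31
```

Because the transform is monotonic, the model with the larger partial correlation also has the larger z value; the transform only stabilizes the variance so that differences can be compared across neurons.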
Importantly, this difference between models was more pronounced for the subset of neurons that showed significant cross-modal suppression (Figure 6c, filled symbols). For this subset, the median z-transformed partial correlation coefficient for the normalization model (4.94) was substantially greater than that for the alternative (subtractive) model (1.22, p = 0.001, signed-rank test). Among the 25 cases with significant cross-modal suppression, 18 were significantly better fit by the normalization model (filled symbols in the top-left region of Fig. 6c), whereas only 4 neurons were significantly better fit by the alternative model (filled symbols in the bottom-right region). Data and fits for these four cases are shown in Fig. S6c-f. Critically, the alternative model was not favored for these neurons because it captured cross-modal suppression. Rather, in most of these cases, neither model fits some features of the data very well, especially for the weaker stimulus modality. This may result from inaccurate estimates of the modality dominance weights for these neurons, which were not free parameters. Note that the data from unisensory MSTd neurons were excluded from this model comparison because data from these neurons cannot differentiate the two models. Indeed, partial correlation coefficients from the unisensory MSTd neurons were not significantly different between the two models (p = 0.611, signed-rank test).
To further illustrate the differential model predictions with regard to the diagnostic form of cross-modal suppression, we computed response gain ratios from the fitted curves and plotted them as in Figure 4c. For the normalization model, many data points lie in the lower-right quadrant, illustrating the diagnostic cross-modal suppression effect (Figure 7a). A polynomial fit to the data has a y-intercept of 0.91 ± 0.019 (S.E.), which is significantly less than 1 (p = 7.2 × 10⁻⁶). In stark contrast, gain ratios from fits of the alternative (subtractive) model avoid the lower-right quadrant (Figure 7b), and the y-intercept of the polynomial fit (0.98 ± 0.015 S.E.) is not significantly different from 1 (p = 0.11). Although the overall scatter of data around the fit of the alternative model is reduced, the key feature is that this model does not predict observations in the lower-right quadrant, inconsistent with data from MSTd (Figure 4c). Thus, the alternative model fails to capture the diagnostic form of cross-modal suppression exhibited by many MSTd neurons.
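The intercept analysis can be sketched with simulated gain ratios (the real points are the normalized gains plotted in Figure 7; the values below are fabricated for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated normalized gains: cue2 gain on x, combined gain on y.
x = rng.uniform(0.0, 0.6, size=60)
y = 0.91 + 0.4 * x + rng.normal(0.0, 0.02, size=60)

# Third-order polynomial fit; the y-intercept is the fit's value at x = 0,
# i.e. the predicted combined gain when the non-preferred cue2 gain
# vanishes. An intercept below 1 indicates cross-modal suppression.
coeffs = np.polyfit(x, y, deg=3)
y_intercept = np.polyval(coeffs, 0.0)
print(round(float(y_intercept), 2))
```

Extrapolating the polynomial to x = 0 isolates what the combined response gain would be with no activation by the non-preferred cue, which is why the intercept is the quantity tested against 1.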
Figure 7. Summary of cross-modal suppression in fits of the normalization and alternative models.
Response gains were obtained from model fits to the data from each neuron, and are plotted in the same format as Figure 4c. (a) Results for the normalization model; (b) results for the alternative (subtractive) model. In each panel, Combined response gains are plotted against the corresponding response gains for cue2. Values on both axes are normalized by the response gain for cue1. Filled symbols represent cases with significant cross-modal suppression and significant activation by the non-preferred cue2, as identified by the analysis of actual firing rate data shown in Figure 4. Solid black curve: third-order polynomial fit to the data.
Together, these results demonstrate that the divisive normalization model better accounts for the multisensory responses of MSTd neurons than the alternative (subtractive) model.
Discussion
We tested a key diagnostic prediction of the divisive normalization model of multisensory integration and found that the activity of many neurons in macaque area MSTd exhibits cross-modal suppression by a non-preferred stimulus that activates the neuron on its own. This finding is incompatible with a class of alternative models of multisensory integration that involve subtractive inhibition, as discussed further below. We also find that unisensory MSTd neurons exhibit cross-modal suppression, whereas unisensory MT neurons do not. Our findings, including the horizontal shift of amplitude response functions induced by cross-modal suppression, are consistent with the action of a divisive mechanism operating at the level of multisensory integration in MSTd. Together, our findings provide strong experimental support for the proposal that basic properties of multisensory integration by single neurons can be accounted for by simple models that include mechanisms functionally equivalent to divisive normalization (Ohshiro et al., 2011).
Normalization as feedback
The divisive normalization operation is often schematized in a form such that the activity of a neuron after linear combination of inputs is divisively normalized by the summed “pre-normalized” activity of all other neurons (Heeger, 1992b; Ohshiro et al., 2011). However, it is biologically implausible that a real neuron has access to the “prenormalized” activity of other neurons. Fortunately, the original formula for the normalization model (Eqn. 5) can be transformed into a mathematically equivalent form (Eqn. 6) in which each neuron's response depends on the spiking activity of other neurons (see Methods).
The more intuitive formulation of the model in Eqn. 6 can be implemented in the form of “feed-back normalization” (Heeger, 1992b), and our multisensory normalization model (Figure 1) is also schematized in this form. In this scenario, the membrane potential of a neuron is transformed into spiking activity after a gain-modulation step involving a variable membrane conductance that depends on the pooled spiking activity of other neurons (Anderson et al., 2000; Borg-Graham et al., 1998; Carandini et al., 1997). However, a membrane-conductance mechanism is just one possibility. A pre-synaptic modulation of excitatory input (Boehm and Betz, 1997) is also compatible with the feedback normalization scheme. Consistent with this idea, withdrawal of excitation was recently demonstrated as a potential mechanism of divisive normalization in mouse V1 (Sato et al., 2016). Indeed, our findings are consistent with any mechanism that is operationally similar to divisive normalization.
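A minimal sketch of such a feed-back scheme can be written as a fixed-point iteration in which each unit's drive is divided by the pooled output of the population. The squaring nonlinearity, the semi-saturation constant, and uniform pooling are illustrative assumptions, not the parameters of Eqn. 6:

```python
import numpy as np

def feedback_normalize(drive, n=2.0, alpha=1.0, n_iter=100):
    """Iterate r_i <- drive_i**n / (alpha**n + mean(r)) to a fixed point,
    so each unit is normalized by the pooled *output* of the population."""
    r = np.zeros_like(drive)
    for _ in range(n_iter):
        r = drive**n / (alpha**n + r.mean())
    return r

drive = np.array([0.5, 1.0, 2.0, 4.0])
r = feedback_normalize(drive)

# At the fixed point the feedback equation is self-consistent.
print(np.allclose(r, drive**2 / (1.0 + r.mean())))  # True
```

The loop converges because the mapping from pooled activity to pooled activity is a contraction here; biologically, the same self-consistent state could be reached by a conductance or presynaptic mechanism, as discussed above.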
The “feed-back normalization” formulation (Eqn. 6) allowed us to fit the model to data from our population of MSTd neurons (Figures 6 and 7). In previous work, a simplifying assumption—that the net unnormalized activity of a population of V1 neurons could be approximated by the local average image contrast—allowed a normalization model to be fit to responses of individual neurons (Busse et al., 2009; Carandini and Heeger, 2012; Carandini et al., 1997; Heeger, 1992b; Sato et al., 2016). Since we were not able to make a comparable simplifying assumption for MSTd responses to self-motion, and given that the response of each neuron depends on that of all other neurons, it was necessary to fit the models to data from all neurons simultaneously. This required solving a very large optimization problem.
Relationship to previous studies
We are not the first to describe cross-modal suppression in single-unit activity. Indeed, a number of examples of cross-modal suppression by a non-optimal stimulus from one modality have been demonstrated in the literature (Kadunce et al., 1997; Meredith and Stein, 1996; Wallace et al., 1996). However, in these cases, the response to the non-preferred input is generally very weak or absent, and it is not clear whether the non-preferred input activates or suppresses the neuron on its own. If the non-preferred input is suppressive by itself, then cross-modal suppression is rather trivial and does not distinguish different mechanisms. In contrast, cross-modal suppression by an activating input is a diagnostic feature of multisensory divisive normalization (Ohshiro et al., 2011). We have demonstrated conclusively, for the first time, that many multisensory neurons exhibit the diagnostic form of cross-modal suppression predicted by divisive normalization. Note that the diagnostic form of cross-modal suppression is only expected in a narrow stimulus regime (Figure 1). In most cases, one has to hunt through the stimulus space carefully to find such a stimulus regime (e.g., Figure 2). For many cells, we probably missed the optimal stimulus conditions to elicit the effect; thus, the percentage of neurons that show the diagnostic form of cross-modal suppression (filled green symbols, Fig. 4c) is almost certainly underestimated by a substantial amount.
The literature also contains some examples of cross-modal suppression in response to stimuli that may be preferred for both modalities tested (Avillac et al., 2007; Diehl and Romanski, 2014; Sugihara et al., 2006). These are generally cases in which the response of one modality is much weaker than the other, and we have previously shown that normalization can also produce cross-modal suppression in such cases (Ohshiro et al., 2011). In our sample of MSTd neurons, a few cells showed cross-modal suppression when tested with preferred heading stimuli for both modalities (purple symbols in the lower-right quadrant of Fig. 4c), and these were neurons with rather imbalanced visual and vestibular responses.
Our findings are analogous to stimulus interactions reported within sensory modalities. For example, an analogous form of suppression by a weakly activating stimulus has been demonstrated in responses of cat V1 neurons (Cavanaugh et al., 2002). Our results are also closely related to experimental observations of response averaging, a property widely observed in visual neurons that are tested with multiple stimuli (Alvarado et al., 2007; Britten and Heuer, 1999; Busse et al., 2009; Carandini et al., 1997; Recanzone et al., 1997; Zoccolan et al., 2005). Since the response of these neurons to multiple visual stimuli is typically intermediate between responses to the individual stimuli (similar to averaging), the response to the more effective stimulus is suppressed by simultaneous presentation of the less effective stimulus (which may activate the neuron when presented alone). Divisive normalization has also been shown to successfully account for these types of averaging phenomena within a single sensory modality (Busse et al., 2009; Carandini and Heeger, 2012; Carandini et al., 1997). This suggests that there are multiple stages of divisive normalization, and our findings provide novel evidence for stages of normalization following multisensory combination of inputs from different sensory systems.
Comparison to other models and mechanisms
Groh (2001) presented a “summation/saturation” model that has a similar architecture to the normalization model: a layer of “pre-normalized” units is followed by an inhibitory neuron that sums this “pre-normalized” activity and modulates neural responses in the subsequent target layer (Figure 1C of Groh, 2001). Although this model was not intended as an account of multisensory integration, it does predict interactions between multiple stimuli that are qualitatively similar to our normalization model. Specifically, it predicts that adding a weak stimulus to a strong stimulus produces response enhancement for low intensities but response suppression for high intensities (Figure 3C, Groh 2001), which is analogous to our diagnostic cross-modal suppression.
Other published models of multisensory integration have assumed a subtractive inhibition mechanism to account for cross-modal suppression (Cuppini et al., 2010; Ursino et al., 2009). We have shown previously (Ohshiro et al., 2011) that a generic form of these alternative models based on subtractive inhibition does not exhibit the diagnostic form of cross-modal suppression that is predicted by divisive normalization. In the alternative models (Cuppini et al., 2010; Ursino et al., 2009), a measure of total population activity is subtracted from the multisensory input to a neuron, and the resulting difference is transformed into firing rate by a sigmoidal static non-linearity. Indeed, this basic architecture involving subtractive inhibition is commonly used in neural network models (Dayan and Abbott, 2001). In contrast, the multisensory input is divided by a measure of total population activity in normalization models (Carandini et al., 1997; Ohshiro et al., 2011).
We provide a mathematical argument (see Methods) that alternative models based on subtractive inhibition only exhibit cross-modal suppression due to an excitatory non-preferred input if the sigmoidal static non-linearity, h(x), has a sharp exponential rising phase and the network is activated with weak stimuli. This prediction lies in marked contrast to that of the normalization model, which shows the diagnostic form of cross-modal suppression more robustly for strong stimuli than weak stimuli (see Fig. 1b). The subtractive inhibition term in our alternative model directly scales with population activity (Eqn. 10). Applying a threshold to this term would produce stronger inhibition for the combined stimuli than for the component unisensory stimuli, potentially producing the diagnostic cross-modal suppression. However, in separate simulations (data not shown), we found that this variant of the alternative model only produced the diagnostic cross-modal suppression effect for intermediate stimulus intensities, and that it was accompanied by a negative slope in the intensity-response function for the non-preferred cue. Neither of these features is predicted by the normalization model or observed in our data. Thus, the thresholded variant of the alternative model is less likely to account for our findings.
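The contrast between the two architectures can be sketched with toy single-unit models (all parameters and weights below are illustrative assumptions, not the fitted models): in the divisive case an excitatory non-preferred input enlarges the normalization pool and can pull the combined response below the preferred-alone response, whereas in the subtractive case with a saturating sigmoid it does not.

```python
import numpy as np

def divisive(e_self, e_pool, alpha=1.0, n=2.0):
    # Response = own drive raised to power n, divided by pooled drive.
    return e_self**n / (alpha**n + e_pool**n)

def subtractive(e_self, e_pool, beta=0.1, k=4.0, x0=1.0):
    # Response = sigmoid of own drive minus scaled pooled drive.
    return 1.0 / (1.0 + np.exp(-k * (e_self - beta * e_pool - x0)))

e1, e2 = 3.0, 1.0          # preferred drive, weaker (but excitatory) drive
w2 = 0.2                   # small weight of the non-preferred input

# Divisive: the combined response falls below the response to e1 alone.
r1_div = divisive(e1, e1)
r12_div = divisive(e1 + w2 * e2, e1 + e2)
print(r12_div < r1_div)    # True: diagnostic cross-modal suppression

# Subtractive: the excitatory input raises the combined response instead.
r1_sub = subtractive(e1, e1)
r12_sub = subtractive(e1 + w2 * e2, e1 + e2)
print(r12_sub >= r1_sub)   # True: no suppression for these strong inputs
```

In the divisive case the non-preferred input adds little to the numerator but substantially to the denominator; in the subtractive case both the excitatory drive and the inhibition grow together, so the net drive to the sigmoid does not fall.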
Another recent subtraction-based model shows response normalization for combinations of stimuli (Rubin et al., 2015). This model consists of separate networks of excitatory and inhibitory neurons, and exhibits richer network dynamics than our normalization model. This includes simultaneous reductions in both excitation and inhibition during strong recurrent activation of the local network (Ozeki et al., 2009), which has been observed experimentally in mouse V1 (Sato et al., 2016). It remains to be examined if the model of Rubin et al. (2015) can demonstrate cross-modal suppression for strong stimuli, as observed for real MSTd neurons.
Schwabe et al. (2006) proposed a subtraction-based recurrent network model of V1 neurons. This model accounts for response facilitation/suppression induced by stimuli presented in the surround of V1 receptive fields (Ichida et al., 2007), which critically depends on the intensity of stimuli presented simultaneously to the receptive field center. These effects can resemble our diagnostic cross-modal suppression effect. However, no positive activating response to the surround stimulus alone was demonstrated in the study by Ichida et al. (2007) and a related study (Polat et al., 1998), nor in the recurrent model of Schwabe et al. (2006). Thus, the link between these studies and our cross-modal suppression effect is unclear. Importantly, our findings do not strongly constrain possible neural mechanisms of normalization; any mechanism that is operationally equivalent to division might account for the cross-modal suppression effects we observe.
Unisensory inputs to MSTd
MSTd receives its major visual input from the adjacent area MT (Maunsell and Van Essen, 1983; Ungerleider and Desimone, 1986). Our normalization model (Fig. 1a) assumes that MSTd neurons receive purely visual inputs from area MT, which is consistent with previous studies that did not find vestibular signals in MT (Chowdhury et al., 2009; Smith et al., 2012). In contrast to the well characterized pathway by which visual signals propagate to area MSTd, the source of vestibular inputs to MSTd remains unclear, and it is likely that MSTd receives vestibular input through other cortical areas (Chen et al., 2010, 2011a, b, c, 2013; Fetsch et al., 2010). Although our normalization model assumes that MSTd neurons receive visual and vestibular inputs from separate sources (Fig. 1a), we cannot rule out the possibility that vestibular inputs to MSTd have already been integrated with visual signals. Identification and detailed characterization of the vestibular input to MSTd awaits further study.
In our formulation of the multisensory integration model (Eqn. 2), a unisensory neuron is simply a cell for which either of the modality dominance weights is set to zero. Consistent with this idea of a continuum between unisensory and multisensory neurons, the gain ratio data from both types of MSTd neurons appear to fall along a single trend line (Fig. S3d). This observation suggests that unisensory MSTd neurons are not a distinct class of neurons; rather, their response properties are as expected from a normalization model in which one modality dominance weight (typically, the vestibular weight in MSTd) is close to zero.
In conclusion, our findings provide strong evidence for a stage of divisive normalization operating at the level of multisensory integration. These findings suggest that a diverse array of experimental findings in multisensory integration may be accounted for by a small number of nonlinear mechanisms. Together with previous results demonstrating a role for divisive normalization in diverse neural phenomena including attention, decision-making, and motor control, our results support the idea that normalization is a canonical neural computation that is repeated at multiple stages in the brain.
STAR★Methods
Contact For Reagent and Resource Sharing
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Gregory C. DeAngelis (gdeangelis@cvs.rochester.edu).
Experimental Model and Subject Details
Macaca mulatta
Physiological experiments were performed with two male rhesus monkeys (Macaca mulatta) weighing 13 and 16 kg. Both animals were aged 6-8 years during the course of the studies, and were pair-housed in a vivarium with a 12 hr light cycle (6AM to 6PM). All animal surgeries and experimental procedures were approved by the University Committee on Animal Resources at the University of Rochester and were in accordance with National Institutes of Health guidelines.
Method Details
Surgery
Under sterile conditions, monkeys were chronically implanted with a circular Delrin ring (diameter: 7cm) for head stabilization, as described previously (Gu et al., 2006), as well as a scleral search coil for measuring eye movements. After recovery, animals were trained to fixate visual targets for fluid reward using standard operant conditioning techniques. For electrophysiological recording, a Delrin grid (2.5 × 4.5 × 0.5 cm) containing rows of holes was stereotaxically secured to the skull inside the head-restraint ring and was positioned in the horizontal plane. The holes in the grid (0.8 mm spacing) allowed vertical penetration of microelectrodes into the brain via transdural guide tubes that were inserted through a small burr hole in the skull. Burr holes were made surgically under aseptic conditions while the subjects were anesthetized. The recording grid extended bilaterally from the midline to regions overlying areas MST and MT in both hemispheres.
Motion platform and visual stimuli
During experiments, monkeys were seated comfortably in a primate chair with their head restrained. The chair was securely attached to a 6-degree-of-freedom motion platform (MOOG 6DOF2000E; Moog, East Aurora, NY) and resided within a field-coil frame used to track eye movements. The motion platform allowed physical translation along any axis in 3D (Gu et al., 2006). Platform motion and optic flow stimuli could be presented either together or separately, and the directions of self-motion indicated by optic flow and platform motion could be either congruent or disparate.
Inertial motion (“vestibular”) stimuli were generated by physically translating the subject with the motion platform (whole body translation). Each movement followed a Gaussian velocity profile with duration of 2 s and a standard deviation of 1/3 s (Gu et al., 2006) (Figure S1). The command signal to the motion platform had a peak velocity at the middle (+1 s) of the stimulus presentation period; however, the actual motion of the platform is delayed by ∼100 ms from the command signal due to platform dynamics. The amplitude of translation varied from ∼0 to 12 cm of total displacement. At the maximum translation amplitude, the peak velocity and peak acceleration of the movement were 28.7 cm/s and 104.5 cm/s2 (∼0.1 g), respectively.
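The stimulus kinematics can be sketched as follows. The peak velocity here is an arbitrary illustrative value, not the platform command; the point is only that, for a Gaussian velocity profile, total displacement equals v_peak · σ · √(2π) up to a small truncation loss over the 2 s window:

```python
import numpy as np

sigma = 1.0 / 3.0                  # s, standard deviation of the profile
t = np.linspace(0.0, 2.0, 2001)    # 2 s stimulus window, peak at t = 1 s
dt = t[1] - t[0]

v_peak = 1.0                       # arbitrary units; illustrative only
v = v_peak * np.exp(-(t - 1.0)**2 / (2.0 * sigma**2))

# Displacement = integral of velocity; analytically v_peak*sigma*sqrt(2*pi)
# for the untruncated Gaussian (the 2 s window spans +/-3 sigma, so the
# truncation loss is under 0.3%).
displacement = np.sum(v) * dt
print(displacement, v_peak * sigma * np.sqrt(2.0 * np.pi))
```

Because the window spans ±3σ, essentially the entire bell is delivered within the 2 s trial, which is why velocity is effectively zero at stimulus onset and offset.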
Optic flow stimuli that simulated self-motion were rear-projected onto a 60 × 60 cm tangent screen that was positioned 30 cm in front of the monkey, thus subtending 90° × 90° of visual angle. Visual stimuli were presented using a three-chip digital light projector (Christie Mirage S+3K, Christie Digital, Cypress, CA) mounted on top of the motion platform. Optic flow stimuli were generated in Visual C++ using the OpenGL graphics libraries, and were rendered by a Quadro FX 1400 graphics board. Self-motion was simulated by moving the OpenGL “cameras” through a virtual environment (Gu et al., 2006). The virtual environment consisted of a random 3D cloud of “stars” that was 100 cm wide, 100 cm tall, and 40 cm deep. The star field had a density of 0.01 stars per cm3, with each star being a 0.15 × 0.15 cm triangle. To provide stereoscopic depth cues, the stimulus was rendered as a red-green anaglyph and was viewed through custom red-green goggles (Kodak Wratten filters; red #29, green #61). To avoid extremely large (near) stars from appearing in the display, stimulus elements within 5 cm of the eyes were not rendered by virtue of the near clipping plane. Precise synchronization of optic flow and platform motion was achieved through a dynamical systems analysis of the motion platform and a delay parameter was adjusted to achieve synchronous movement within a resolution of a few ms (Gu et al., 2006).
For all stimulus conditions, the animal was required to fixate a central visual target (0.2° in diameter) for 200 ms before stimulus onset (fixation window spanned 1.7° × 1.7° of visual angle). The animal was rewarded at the end of each trial for maintaining fixation throughout stimulus presentation. To help the animal distinguish the fixation target from the star field and to reduce visual tracking of the optic flow stimuli, the fixation target was surrounded by a textured circular patch containing an orthogonal grid pattern of fine lines (width: 1 pixel) spaced 0.5° apart. The diameter of this fixation patch was gradually reduced to 1° to minimize occlusion of the optic flow stimulus. The sides and top of the search coil frame were covered with a black enclosure to prevent the animal from experiencing any visual motion other than the optic flow presented on the display screen.
Electrophysiological recordings
We recorded extracellular activity from single neurons in four hemispheres of two monkeys. A tungsten microelectrode (Frederic Haer Company, Bowdoinham, ME; tip diameter 3 μm, impedance 1-2 MOhm at 1 kHz) was advanced into the cortex through a transdural guide tube, using a hydraulic micromanipulator (Narishige) mounted on top of the head-restraint ring. Action potentials were amplified and isolated using a head-stage pre-amplifier, a band-pass eight-pole filter (Krohn-Hite, model 3384; 400-5000 Hz), and a dual voltage-time window discriminator (Bak Electronics, model RP-1). The times of occurrence of action potentials and all behavioral events were recorded with 1 ms resolution by the data acquisition computer. Raw neural signals were also digitized at 25 kHz and stored to disk for offline spike sorting and additional analyses. Experimental control and data acquisition were coordinated by scripts written with TEMPO software (Reflective Computing).
Area MSTd was localized with aid from structural MRI scans and a standard macaque atlas (Van Essen et al., 2001). MSTd was typically identified as a region centered ∼15 mm lateral to the midline and ∼3-6 mm posterior to the interaural plane. Electrode penetrations were also guided by the pattern of background activity as the electrode traversed through gray and white matter, as well as the response properties of neurons to visual stimuli. MSTd was usually the first gray matter encountered, ∼6 to 10 mm below the cortical surface, that exhibited prominent response modulation to flashing random-dot stimuli and direction-selective responses to motion of the dots. Once action potentials were satisfactorily isolated from a single neuron, we first mapped the receptive field (RF) of the neuron manually by manipulating the size, location, and motion of a patch of drifting random dots using a custom graphical interface. MSTd neurons typically had large RFs that occupied most of a quadrant, if not more, in the contralateral hemi-field. The RF often extended into the ipsilateral visual field and included the fovea.
Recordings were also made from the middle temporal (MT) area by further advancing the electrode into the posterior bank of the superior temporal sulcus (STS). MT neurons were identified by their much smaller receptive fields (diameter ∼eccentricity) and strong direction selectivity in response to random-dot stimuli. We also took advantage of the known retinotopic organization of MT receptive fields to identify this area: receptive field centers shifted gradually from foveal to peripheral as the recording sites moved from antero-lateral to postero-medial within the posterior bank of the STS (Kolster et al., 2009; Komatsu and Wurtz, 1988; Maunsell and van Essen, 1983). We were careful to distinguish area MT from the lateral subdivision of area MST (MSTl), which is located anterior to area MT in the STS (Kolster et al., 2009; Komatsu and Wurtz, 1988; Nelissen et al., 2006). We mainly targeted regions of area MT that were located postero-medially, so that we could confidently avoid MSTl. Most MT neurons had receptive fields centered near the horizontal meridian, with a median eccentricity of 13° (Figure S5). Occasionally, we encountered neurons with RFs that covered the fovea but were fairly large in size (diameter ∼10°), or RFs that extended considerably into the ipsilateral visual field. As these might have been MSTl neurons (Komatsu and Wurtz, 1988), they were excluded from the sample out of caution. The direction tuning of MT neurons was examined using the same optic flow stimulus used in MSTd recordings (see below), except that headings were restricted to the fronto-parallel plane. MT neurons that did not respond to our optic-flow stimulus due to strong surround suppression (N = 16) were not recorded.
Experimental protocol
After a basic characterization of visual response properties was obtained, we next examined visual and vestibular heading selectivity in three-dimensional space while monkeys performed a fixation task. Self-motion stimuli, either vestibular or optic flow, were presented along 26 headings corresponding to all combinations of azimuth and elevation angles in increments of 45° (Gu et al., 2006). The visual and vestibular conditions were randomly interleaved within a block of trials. The amplitude of translation was 10 cm. In the vestibular condition of this preliminary test, the display was blank, except for a head-fixed fixation target and a small patch of texture around the fixation target. In the visual condition, the motion platform was stationary and optic flow simulated self-motion.
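The 26 headings arise from 45° sampling of the sphere: eight azimuths at each of three non-polar elevations, plus the two poles, where azimuth is degenerate. A quick enumeration (a sketch using the obvious spherical convention):

```python
headings = set()
for az in range(0, 360, 45):                 # eight azimuth angles
    for el in (-90, -45, 0, 45, 90):         # five elevation angles
        # At the poles (el = +/-90 deg) every azimuth gives the same
        # direction, so collapse the azimuth to a single value.
        headings.add((0 if abs(el) == 90 else az, el))

print(len(headings))  # 26
```

Eight azimuths at elevations -45°, 0°, and +45° give 24 headings, and straight up plus straight down complete the set of 26 unique directions.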
Heading tuning profiles were displayed in an on-line graphical user interface, similar to Figure 2A. We transformed the data using the Lambert cylindrical equal-area projection (Gu et al., 2006). The heading preference of a neuron was estimated for each stimulus modality by visually inspecting the heading tuning profile after two to five stimulus repetitions were complete. An initial classification of neuron type (multisensory or unisensory) was also made based on the heading tuning profiles: a neuron with clear heading selectivity for both modalities was labeled as multisensory. In a substantial minority of cases, the vestibular heading preference was not determined confidently due to weak responses and the appearance of multiple peaks in the heading tuning profile. In these cases, the stimulus that produced the largest response was chosen as the vestibular heading preference. Neurons that responded only to stimuli of one modality were provisionally classified as unisensory, and all unisensory MSTd neurons were visual (n = 32). Some neurons were later reclassified based on more extensive data from the main experimental protocol, as described below. The experimental protocol for unisensory MSTd neurons was slightly different from that for multisensory neurons.
For multisensory MSTd neurons, we proceeded to perform a search protocol to identify the range of heading offsets for one modality that might produce cross-modal suppression. The heading for one modality (e.g., vestibular) was fixed at the neuron's preferred heading, while the heading for the other modality (e.g., visual) varied in 30° steps of elevation around the clock from the neuron's preference. Motion amplitude was fixed at 10 cm for both modalities. The combined response, the two single-modality responses, and baseline activity were plotted on-line, as in Figure 2B. We sought a range of headings of the variable modality that produced: (1) a response greater than baseline activity, and (2) a suppression of the combined response relative to the fixed unimodal stimulus.
Note that these criteria for the diagnostic form of cross-modal suppression were not satisfied during the search protocol for all neurons. In such cases, we still chose multiple non-preferred headings that partially met the requirements, and proceeded to perform the main experimental protocol. Based on outcomes from the first animal, we generally did not perform the search protocol in monkey 2. Instead, one or two non-preferred headings, located along a flank of the heading tuning profile, were selected by visual inspection. We found that this time-saving step did not reduce the frequency with which we were able to identify neurons that showed the diagnostic form of cross-modal suppression. For some multisensory neurons, we explored heading offsets as large as 180° (Figure S3). These were typically cases in which some activating response was produced even at the anti-preferred heading. Regardless of the approach, the end result of these search procedures was a set of non-preferred headings that was likely to produce cross-modal suppression.
Next, we ran the main experimental protocol to measure neural responses to various stimulus amplitudes. We presented vestibular and visual stimuli separately and together at the following motion amplitudes: 0.0, 0.6, 1.6, 4.4 and 11.8 cm. For the combined cue condition, the amplitudes of both stimulus modalities were varied together (yoked). We fixed the heading of one modality to the neuron's preferred direction, whereas the heading of the other modality was specified as either the neuron's preferred heading or one of the non-preferred headings that were identified in the search process described above. We more frequently fixed the heading for the dominant modality (typically visual) of the neuron (81% of cells), but we also sometimes fixed the heading of the non-dominant modality (19%). All stimuli for the vestibular, visual, and combined conditions were tested in a block of randomly interleaved trials. When isolation remained satisfactory, we repeated the search protocol after switching the roles of the two modalities (fixed versus varying heading), and then ran the main experimental protocol again with the other modality fixed at the preferred heading (n = 9 neurons).
For MSTd neurons that were initially classified as unisensory (n = 32, all visual cells), we skipped the search protocol and proceeded directly to the main experimental protocol. In these cases, the visual stimulus was fixed at the preferred heading and was paired with vestibular stimuli presented at 0°, 90°, and/or 180° offsets from the visual heading preference. Motion amplitude was varied over the same range (0.0, 0.6, 1.6, 4.4 and 11.8 cm) for most neurons, although a subset of neurons (n = 12) was tested with a different range (0.0, 0.1, 1.0, 2.0, 4.0 and 8.0 cm).
MT neurons were tested using largely the same experimental protocol, with two exceptions. First, the initial characterization of heading selectivity for MT neurons was restricted to the fronto-parallel plane. Second, as for unisensory MSTd neurons, we did not carry out a search procedure for MT neurons; rather, the heading offsets for the variable modality were generally chosen to be 0°, 90°, and 180°. Motion amplitude was varied over the same range used for most MSTd neurons: 0.0, 0.6, 1.6, 4.4 and 11.8 cm.
Neurons were included in our main sample if each distinct stimulus was successfully repeated at least five times in the main experiment (median repetition number: 8). Data from 55 MSTd neurons were excluded based on this criterion.
Baseline activity of neurons was measured while the animal fixated on a central target, and a cloud of stationary stars was presented on the screen while the motion platform remained stationary. This baseline condition was included in the block of randomly interleaved trials with the other stimulus conditions. All evoked neural activity elicited by moving stimuli was referenced to this baseline. We chose this baseline such that any deviation from baseline response is attributable to motion energy contained in the stimulus. In the vestibular condition (i.e., optic flow motion amplitude = 0; vestibular amplitude ≠ 0), the cloud of stationary stars was also presented while the body was translated. We also tested 24 neurons for cross-modal suppression using a vestibular condition in which there were no stationary dots present on the display, and results (in the format of Figure 4C) were similar to those of our standard protocol (y-intercept = 0.78; 95% CI = [0.63, 0.92]).
Quantification of amplitude-response functions
Amplitude-response functions (e.g., Figures 4A and 4B) were fit with a hyperbolic ratio function (Albrecht and Hamilton, 1982):
(Equation 1) r = G(cue1, cue2) * c^n / (c^n + (c50)^n) + b
where r represents the mean firing rate of the neuron, c represents the stimulus motion amplitude, c50 is a semi-saturation constant that determines the motion amplitude at which the function begins to saturate, n is an exponent that controls the steepness of the curve, and b represents the baseline response. G(cue1,cue2) indicates the response gain parameter that was used to quantify response strength for a particular combination of cue1 and cue2. In this notation, G(p,p) represents the response gain when both cue1 and cue2 have preferred directions, G(p,n) denotes the gain for a preferred cue1 and a non-preferred cue2, G(p,0) indicates the gain for a unisensory preferred cue1 stimulus, and G(0,p) or G(0,n) represent gains for unisensory stimuli with preferred or non-preferred headings for cue2, respectively. Curve fitting was performed simultaneously for all of the amplitude-response functions measured during a single block of trials (e.g., Figures 4A and 4B). c50, n and b are free parameters that are common to all of the response functions that were fit simultaneously. In other words, all fitted curves for a neuron were constrained to have the same shape and baseline response.
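As a concrete illustration of Equation 1, the sketch below (in Python, with made-up parameter values rather than fitted ones) evaluates the hyperbolic ratio function for two conditions that share c50, n and b but differ in gain, mirroring the simultaneous fit:

```python
def hyperbolic_ratio(c, G, c50, n, b):
    """Albrecht-Hamilton function: r = G * c**n / (c**n + c50**n) + b."""
    return G * c**n / (c**n + c50**n) + b

# Shape parameters (c50, n, b) shared across conditions; only the gain differs,
# as in the simultaneous fit in which all curves for a neuron share one shape.
shared = dict(c50=2.0, n=1.5, b=5.0)  # illustrative values, not fitted ones
r_pref = hyperbolic_ratio(2.0, G=40.0, **shared)  # unisensory preferred, G(p,0)
r_comb = hyperbolic_ratio(2.0, G=60.0, **shared)  # congruent combined, G(p,p)
print(r_pref, r_comb)  # 25.0 35.0
```

At c = c50 the response rises exactly G/2 above baseline, a built-in property of the hyperbolic ratio form.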
In a separate analysis, amplitude-response functions were also fit with a simple linear function including a threshold, to determine whether a saturating function is needed to fit the data well. The hyperbolic ratio model and the linear model were compared using the corrected form of Akaike's information criteria (AICc), and the hyperbolic ratio model was preferred for 111/119 neurons. Across the population, the median AICc value was significantly less for the hyperbolic ratio model than the linear model (p= 1.8 × 10−20, Wilcoxon signed-rank test).
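AICc for least-squares fits can be computed directly from the SSE; the sketch below uses one common form of the formula (an assumption on our part, since the exact implementation is not spelled out here), where n_pts is the number of data points and k the number of free parameters:

```python
import math

def aicc(sse, n_pts, k):
    """One common least-squares form of corrected AIC:
    AICc = n*ln(SSE/n) + 2k + 2k(k+1)/(n - k - 1)."""
    aic = n_pts * math.log(sse / n_pts) + 2 * k
    return aic + 2 * k * (k + 1) / (n_pts - k - 1)

# A saturating model (more parameters) beats a linear model here despite the
# penalty, because it reduces the SSE enough; the numbers are made up.
better_saturating = aicc(sse=120.0, n_pts=60, k=4) < aicc(sse=180.0, n_pts=60, k=2)
print(better_saturating)  # True
```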
A pair of response gain ratios, G(0,p)/G(p,0) and G(p,p)/G(p,0), was computed from the fits to stimulus conditions in which both modalities were presented at the heading preferences of the neuron (e.g., Figure 4A). An analogous pair of response gain ratios, G(0,n)/G(p,0) and G(p,n)/G(p,0), was computed from conditions in which one modality was presented at the preferred heading and the other modality at a non-preferred heading (e.g., Figure 4B). These ratio pairs were used to determine whether a neuron exhibited the diagnostic form of cross-modal suppression predicted by the normalization model (Figures 1C and 4C). For Figure 5C, Gvestibular and Gvisual denote the gains for the unisensory vestibular cue and the preferred (fixed) visual cue, respectively. Gcombined indicates the response gain for the combined cues.
Post hoc classification of unisensory neurons
During preliminary testing, each MSTd neuron was classified as multisensory or unisensory, as described above. However, in some cases, this classification was tenuous based on the limited data in our screening test. Thus, for all MSTd neurons, we performed a post hoc analysis to test whether the response gain ratio Gvestibular/Gvisual for the preferred vestibular heading was significantly greater than zero. From this analysis, 8 neurons that were provisionally classified as multisensory were not found to be significantly activated by vestibular stimuli at the preferred heading. These neurons were thus re-classified as unisensory neurons. On the other hand, 7 neurons that were provisionally classified as unisensory were found to be significantly activated at some of the headings tested. These neurons were re-classified as multisensory neurons. Thus, there were 33 unisensory MSTd neurons in our sample (Figure 5C), and the remaining 68 neurons were multisensory. This reclassification had no substantive effect on our main findings.
Divisive normalization model of multisensory integration in MSTd
We used a modified version of the MSTd model described previously (Ohshiro et al., 2011). Briefly, the multisensory input to the jth model MSTd neuron is expressed as a weighted linear sum of the vestibular and visual inputs:
(Equation 2) Lj = dvest.j * Svest.j * (cvest)^n1.j + dvis.j * Svis.j * (cvis)^n2.j + basej
In this equation, dvest.j and dvis.j represent the modality dominance weights of each neuron. The modality dominance weights, which take non-negative values, determine the relative strengths of the vestibular and visual responses of a model neuron. cvest and cvis denote the vestibular and visual stimulus amplitudes, whereas n1.j and n2.j are exponents for the power-law non-linearity relating these physical quantities with neural activity. basej is a positive constant, which was not included in the previously published version (Ohshiro et al., 2011), to model the non-zero baseline activity of MSTd neurons. It is assumed to be constant over the experiment and unique to each neuron. Svest.j and Svis.j represent the tuning functions of vestibular and visual inputs to the multisensory neuron (Figure 1A). The heading tuning function of the vestibular input is modeled as
(Equation 3) Svest.j = ((1 + cos ∅vest.j) / 2)^n0.j
where ∅vest.j represents the angle (in 3D) between the vestibular heading preference of the neuron and the vestibular stimulus heading. Svest.j is a bell-shaped function, symmetric around its peak at ∅vest.j = 0°. ∅vest.j can be expressed in terms of azimuth (φ̂vest.j) and elevation (θ̂vest.j) components of the heading preference, as well as azimuth (φvest) and elevation (θvest) components of the stimulus:
(Equation 4) cos ∅vest.j = Ĥvest.j • Hvest
where Ĥvest.j = [cosθ̂vest.j * cosφ̂vest.j, cosθ̂vest.j * sinφ̂vest.j, sinθ̂vest.j] and Hvest = [cosθvest * cosφvest, cosθvest * sinφvest, sinθvest]. The dot operator ‘•’ denotes the inner product of the two vectors. Svis.j, φ̂vis.j and θ̂vis.j were defined analogously. n0.j controls the width of the tuning surface around its peak without changing the height of the peak: larger values produce narrower tuning, and smaller values produce wider tuning. For the simulation results shown in Figure 1, we assumed a homogeneous population of neurons: base was fixed at 0.3, n1 and n2 were set to 1, and n0 was set to 2 for all neurons. Each dominance weight parameter could take one of five possible values [1.0, 0.75, 0.5, 0.25 or 0.0].
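The heading vectors and the 3D angle of Equation 4 translate directly into code; a minimal Python sketch of the dot-product computation described here (the model itself was implemented in MATLAB):

```python
import math

def heading_vector(azimuth_deg, elevation_deg):
    """Unit vector [cos(th)*cos(ph), cos(th)*sin(ph), sin(th)] for a heading."""
    ph, th = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(th) * math.cos(ph), math.cos(th) * math.sin(ph), math.sin(th))

def angle_between(pref, stim):
    """3D angle (deg) between preferred and stimulus headings via the dot product."""
    h1, h2 = heading_vector(*pref), heading_vector(*stim)
    dot = sum(a * b for a, b in zip(h1, h2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))  # clamp for safety

print(round(angle_between((90, 0), (90, 60))))  # 60
print(round(angle_between((0, 0), (180, 0))))   # 180
```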
Other than these parameters, our model was designed to roughly mimic known neurobiological properties of MSTd neurons. We incorporated the experimental observations that vestibular and visual heading preferences are often mismatched, and that there are more neurons tuned to lateral self-motion than fore-aft motion (Gu et al., 2006). Specifically, two random vector variables, (φ̂vest, θ̂vest) and (φ̂vis, θ̂vis) were generated to mimic the experimentally observed distributions of heading preferences, then these preference vectors were paired randomly (1024 pairs). We subsequently added 265 pairs with congruent heading tuning and 256 pairs with opposite tuning to better mimic the experimentally observed joint distribution of heading preferences in MSTd neurons (Gu et al., 2006). For each distinct combination of vestibular and visual heading preferences, there were 5 × 5 = 25 possible combinations of modality dominance weights. Combining these factors, a population of 38625 units constituted the model of Figure 1 (1545 heading preference combinations × 25 dominance weight combinations).
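The bookkeeping above can be checked with a few lines of Python; the counts are those stated in the text:

```python
# Heading-preference pairs: 1024 randomly paired + 265 congruent + 256 opposite.
heading_pairs = 1024 + 265 + 256          # 1545

# Each modality dominance weight independently takes one of five values.
weights = [1.0, 0.75, 0.5, 0.25, 0.0]
weight_combos = len(weights) ** 2         # 25 combinations across two modalities

population_size = heading_pairs * weight_combos
print(population_size)  # 38625
```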
The output of each model neuron was a divisively normalized version of its activity (Heeger, 1992b; Ohshiro et al., 2011), given by
(Equation 5) rj = Rmax.j * (Lj)^n3.j / (αj + (ej/N) * Σk (Lk)^n3.k)
In this equation, N (= 38625) denotes the number of neurons in the population, Rmax.j is the maximum firing rate of the jth model neuron, n3.j is the exponent for the power-law nonlinearity that relates membrane potential and firing rate, and αj is the semi-saturation constant. Note that αj causes responses to rise gradually and saturate as a function of stimulus intensity. This semi-saturation constant is a common feature of divisive normalization models going back to Heeger (1992b). Finally, ej is a parameter introduced here to make the formulation of the normalization model consistent with the alternative (subtractive) model described below; it determines how much normalizing effect the population activity has on each model neuron.
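The divisive operation described here can be illustrated with a toy population in Python (a sketch with illustrative parameter values, not the simulation code in Data S1): each unit's powered drive is divided by a semi-saturation constant plus the scaled, pooled activity of the population:

```python
def normalize(drives, r_max=10.0, n3=1.0, alpha=0.3, e=1.0):
    """Divide each unit's powered drive by a semi-saturation constant plus the
    scaled mean of the population's powered drives (the normalization pool)."""
    pool = e * sum(d ** n3 for d in drives) / len(drives)
    return [r_max * d ** n3 / (alpha + pool) for d in drives]

weak = normalize([0.2, 0.1, 0.1])    # weak stimulus: small pool
strong = normalize([2.0, 1.0, 1.0])  # 10x the drive: the pool grows too
print(round(weak[0], 2), round(strong[0], 2))  # 4.62 12.24
```

Because the pool grows with stimulus intensity, the response per unit of drive shrinks for strong stimuli (from about 23 to about 6 in this toy example), which is what gives rise to saturation and, when a non-preferred input adds to the pool, to suppression.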
After a mathematical transformation of Equation 5, we obtain the following alternative description of the normalization model:
(Equation 6) rj = Rmax.j * (Lj)^n3.j / (αj + (ej/N) * Σk rk)
Importantly, the output of the jth neuron is now expressed in terms of the final (normalized) output of other neurons (rk), as opposed to being a function of the pre-normalized activity, Lk, of other neurons as it was in Equation 5. This transformation allows us to model the observable responses of one neuron in terms of the observable responses of other neurons in the population.
When Rmax.j and αj are assumed to be common to all neurons in the population, and ej is set to 1, Equation 6 can be simplified into a form that provides further intuition:
(Equation 7) rj = Rmax * (Lj)^n3.j / (α + (1/N) * Σk rk)
From this form, one can easily recognize that the pre-normalized neural activity, (Lj)n3.j, is gain modulated by a term that decreases with the total population activity, Σkrk (Carandini et al., 1997). Somatostatin-positive inhibitory interneurons may play a critical role in pooling local spiking activity and exerting suppression on nearby pyramidal neurons (Adesnik et al., 2012).
For the simulations in Figure 1B, Rmax.j is fixed somewhat arbitrarily at 10 spikes/s for j = 1 to N, n3.j is fixed at 1.0, αj is fixed to 0.3, and ej is 1.0. All of our conclusions are robust to substantial variations in these parameters. Data are shown for a model neuron with equal dominance weights for both modalities (dvest = dvis = 1.0) and congruent heading preferences (φ̂vest = φ̂vis = 90°; θ̂vest = θ̂vis = 0°). The visual stimulus for the simulation was fixed at the neuron's preferred heading (φvis, θvis) = (90°, 0°), while the vestibular stimulus heading was offset from the preferred heading by 0°, 60°, and 120°: (φvest, θvest) = (90°, 0°), (90°, 60°), and (90°, 120°). As in the actual experiments, vestibular, visual and combined responses were simulated for the following stimulus amplitudes: cvest = cvis = 0.0, 0.6, 1.6, 4.4 and 11.8. For Figure 1C, the sampling interval of the vestibular stimulus heading was decreased to 15°, and stimulus amplitude was fixed at the maximum value: cvest = cvis = 11.8.
Alternative (subtractive inhibition) model
We contrasted predictions of the divisive normalization model with those made by a general family of alternative models that involve subtractive inhibitory interactions among neurons. The temporal dynamics of neural responses in the alternative model are given by (Ohshiro et al., 2011)
(Equation 8) dVj/dt = −Vj + Lj + ej * Σk wj,k * h(Vk)
In this equation, Vj represents the membrane potential of the jth neuron, and Lj denotes the linear combination of sensory inputs defined in Equation 2. ej has a similar role to that included in the normalization model, and is necessary to couple two physically distinct quantities, the multisensory inputs and the population firing rate. hj(x) (hj(x) ≥ 0 for −∞ < x < ∞) is a static non-linearity which relates the membrane potential to the instantaneous firing rate of the neuron. For the static non-linearity, we used a sigmoidal function: hj(x) = Rmax.j * (x)^n3.j / (αj + (x)^n3.j) for x ≥ 0, and hj(x) = 0 otherwise. αj and n3.j in the denominator have similar roles to those in Equation 5. wj,k (= wk,j for any k and j) represents the synaptic weights for lateral interactions between the jth and kth multisensory neurons. Therefore, the term Σkwj,k * h(Vk) represents the total input to the jth neuron from all other neurons via lateral connections. This term should be negative for the lateral interactions to be inhibitory.
For simplicity, we assumed equal weights among neurons, irrespective of their heading selectivity, modality dominance, and congruency (i.e., wj,k = −1 for all k and j; Ohshiro et al., 2011). We note that equal weights are also assumed in the normalization pool, ΣLk (Equation 5), of the normalization model. Note that the number of model parameters (Rmax.j, n1.j, n2.j, n3.j, αj, basej, and ej) is the same as in the normalization model (Equation 5), simplifying comparison of fitting performance between the two models, as described further below.
Neural activity obtained at steady-state equilibrium (dV/dt = 0 in Equation 8) was used in all quantitative analyses of the alternative (subtractive) model:
(Equation 9) rj = hj(Lj + ej * Σk wj,k * rk)
Then, we obtain the final simplified formula for the alternative model by substituting wj,k = −1:
(Equation 10) rj = hj(Lj − ej * Σk rk)
Note here that a quantity proportional to the net population activity is subtracted from the multisensory input, Lj, prior to transformation by the static non-linearity. This lies in stark contrast to the multiplicative effect of population activity in the normalization model (Equation 7). Equation 10 was solved using the MATLAB fsolve command. As in simulations of the normalization model, Figure 1C shows data for an alternative model neuron with modality dominance weights equal to 1.0 and congruent heading preferences. Stimulus amplitude was fixed at the maximum value: cvest = cvis = 11.8. basej was fixed at 5. ej, αj and n3.j were set to 1, 8.7 and 1 (j = 1 to N), respectively.
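A steady-state system of this kind (each response depending on the summed output of the population) can also be solved by damped fixed-point iteration rather than fsolve; below is an illustrative pure-Python sketch with made-up parameters, in which the inhibitory pool is scaled by 1/N as in the weight normalization described in the next section:

```python
def h(x, r_max=10.0, alpha=1.0, n3=1.0):
    """Sigmoidal static non-linearity: zero for x <= 0, saturating for x > 0."""
    return r_max * x ** n3 / (alpha + x ** n3) if x > 0 else 0.0

def solve_subtractive(L, e=1.0, damping=0.2, iters=500):
    """Damped fixed-point iteration for r_j = h(L_j - (e/N) * sum_k r_k)."""
    N = len(L)
    r = [0.0] * N
    for _ in range(iters):
        pool = e * sum(r) / N
        target = [h(Lj - pool) for Lj in L]
        r = [rj + damping * (tj - rj) for rj, tj in zip(r, target)]
    return r

r = solve_subtractive([5.0, 2.0, 1.0])  # made-up multisensory drives
pool = sum(r) / len(r)
# At the fixed point, every response is self-consistent with the pooled output;
# units whose drive falls below the pool are silenced by the subtraction.
print(all(abs(rj - h(Lj - pool)) < 1e-6 for rj, Lj in zip(r, [5.0, 2.0, 1.0])))  # True
```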
Cross-modal suppression in the alternative model
Here, we examine the mathematical conditions necessary for the alternative model (Equation 9) to produce the diagnostic form of cross-modal suppression. For the treatment below, we assume (arbitrarily) that the vestibular stimulus is fixed at the preferred heading and that the visual stimulus is allowed to deviate from the cell's preference. We also assume that ej is constant across neurons (ej=e) and that the lateral connection weights wj,k are normalized by (e/N) for simplicity. To determine whether the visual input is activating or suppressive on its own, we compute the difference between the visual-only response (cvest = 0, cvis ≠ 0 in Equation 2) and baseline activity (cvest = 0, cvis = 0):
(Equation 11) rj(0, cvis) − rj(0, 0) = hj(Lj(0, cvis) + Σk wj,k * rk(0, cvis)) − hj(Lj(0, 0) + Σk wj,k * rk(0, 0))
Because h() is a monotonically increasing function, index1 defined below should be positive if the visual input is excitatory (i.e., Equation 11 > 0):
(Equation 12) index1 = [Lj(0, cvis) − Lj(0, 0)] + Σk wj,k * [rk(0, cvis) − rk(0, 0)]
Similarly, index2 defined below should be negative if the combined cue response rj(cvest, cvis) is suppressed below the vestibular-only response (cvest ≠ 0, cvis = 0):
(Equation 13) index2 = [Lj(cvest, cvis) − Lj(cvest, 0)] + Σk wj,k * [rk(cvest, cvis) − rk(cvest, 0)]
Importantly, index2 is related to index1 according to
(Equation 14) index2 = index1 + Σk wj,k * addk(cvest, cvis)
where the additivity term, addk, represents the difference between the combined response and the sum of unisensory responses (after removal of baseline activity):
(Equation 15) addk(cvest, cvis) = rk(cvest, cvis) − rk(cvest, 0) − rk(0, cvis) + rk(0, 0)
The term Σkwj,k * addk(cvest, cvis) in Equation 14 has a critical influence upon whether the alternative model exhibits the diagnostic form of cross-modal suppression. If Σkwj,k * addk(cvest, cvis) is positive, then the diagnostic type of cross-modal suppression (index1 > 0 and index2 < 0) cannot occur. We show below that this term is very likely to be positive in the alternative model under the conditions in which MSTd neurons show the diagnostic form of cross-modal suppression.
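The relation among index1, index2 and the additivity terms follows from the additive separability of Lj across modalities (Equation 2): the visual-driven change in Lj is the same with or without the vestibular stimulus. The identity can therefore be verified numerically with arbitrary response values; a Python sketch with made-up numbers:

```python
# Made-up responses of two pooled neurons at the four stimulus conditions:
# r[k] = (baseline, vestibular-only, visual-only, combined).
r = {
    0: (2.0, 8.0, 5.0, 9.5),
    1: (1.0, 6.0, 4.0, 8.0),
}
w = {0: -1.0, 1: -0.5}  # lateral weights onto neuron j (net inhibitory)
dL_vis = 3.0            # visual-driven change in L_j; identical with or without
                        # the vestibular stimulus, because L_j is separable

index1 = dL_vis + sum(w[k] * (r[k][2] - r[k][0]) for k in r)
index2 = dL_vis + sum(w[k] * (r[k][3] - r[k][1]) for k in r)
add = {k: r[k][3] - r[k][1] - r[k][2] + r[k][0] for k in r}  # additivity terms

# index2 equals index1 plus the weighted sum of additivity terms.
print(index2 == index1 + sum(w[k] * add[k] for k in r))  # True
```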
The additivity term, addk, determines whether the neuron integrates the multisensory cues super-additively (addk > 0), additively (addk = 0), or sub-additively (addk < 0). To be consistent with the responses of real multisensory neurons and the principle of inverse effectiveness (Stein and Stanford, 2008), the model needs to show subadditive integration for strong inputs; thus, it must be the case that addk(cvest, cvis)≤0 for strong stimuli.
Given that the alternative model involves subtractive inhibition, we expect that the weights, wj,k, will be largely negative. Indeed, these weights would have to be largely negative for model neurons to show cross-modal suppression at all. Combining this observation with the fact that responses should be subadditive for strong stimuli, we conclude that Σkwj,k * addk(cvest, cvis) must be positive for the alternative model when stimulus intensities are relatively strong. Thus, the alternative model should not be able to exhibit the diagnostic form of cross-modal suppression in this regime. The simplest possibility is that all weights are non-positive (wj,k ≤ 0 for all j and k); however, in simulations we have found that this result is robust for a variety of weight structures, as long as weights are negative on balance.
It should be noted that the above argument is true only for strong stimuli. The alternative model can show super-additive cue integration (addk > 0) for weak stimuli, consistent with some multisensory neurons (Stein and Stanford, 2008). Therefore, the diagnostic form of cross-modal suppression may occur in the alternative model for weak stimuli. However, this prediction lies in stark contrast to the prediction of the normalization model (Figure 1B) that cross-modal suppression should occur most clearly for strong stimuli. In addition, MSTd neurons generally show the diagnostic form of cross-modal suppression only for strong stimuli, which is incompatible with the prediction of the alternative model and in line with the divisive normalization model. Lastly, we note that whether the alternative model shows super-additive integration for weak stimuli depends on the form of the static non-linearity, h(x), in the model. The nonlinearity must be expansive for weak stimuli to produce super-additivity in that range of intensities.
This mathematical argument demonstrates that the alternative model should not be able to account for cross-modal suppression for strong stimuli. This argument is borne out by the fits of both models to the neural population data (Figure 7), which show that the alternative model did not produce the diagnostic form of cross-modal suppression.
Population curve-fit analysis
To more directly assess whether data from MSTd are better explained by divisive than subtractive mechanisms, responses of MSTd neurons to congruent and incongruent multisensory cues (e.g., Figures 4A and 4B) were fit by both the normalization model (Equation 6) and the alternative (subtractive) model (Equation 10). The goal of this analysis was to find a set of model parameters that minimize the sum squared error across datasets from all of the recorded MSTd neurons:
(Equation 16) SSE = Σj Σm [rj,m − r(βj, Constj, Stimm)]^2
Here, βj denotes model parameters for the jth neuron, rj,m denotes the measured response of the jth neuron to the stimulus condition indexed by m, and r (βj, Constj, Stimm) is the corresponding model prediction for the same stimulus condition. Constj represents the neuron-specific constants including the preferred headings and dominance weights (see below). Stimm indicates the stimulus parameters for the mth stimulus condition, including the heading and amplitude of motion.
For both the normalization and alternative models, there are nine model parameters and two tuning functions to be estimated for each MSTd neuron in the population: dvest.j, dvis.j, Svest.j, Svis.j, n1.j, n2.j, basej (Equation 2), Rmax.j, n3.j, αj, and ej (Equations 6 and 10). Among these, the modality dominance weights, dvest.j and dvis.j, and the two heading tuning functions, Svest.j and Svis.j, were estimated directly for each neuron by fitting the visual and vestibular heading tuning profiles (e.g., Figure 2A) with Equation 2. Thus, these four quantities were treated as fixed constants that are unique to each neuron (Constj). The remaining seven parameters, Rmax.j, n1.j, n2.j, n3.j, αj, basej, and ej, were free parameters to be optimized (βj) by fitting the models to MSTd response curves obtained as a function of self-motion amplitude (e.g., Figures 4A and 4B). In total, 119 datasets from 101 neurons (68 multisensory neurons and 33 unisensory neurons) were fit with the models, as some cells contributed more than one dataset resulting from different combinations of visual and vestibular headings. As a result, there were 7 (free parameters) × 119 (datasets) = 833 total parameters that needed to be simultaneously optimized at each step in the fitting process. These 833 parameters were used to fit a total of 3044 data points from all neurons in the population.
The optimization was performed using the MATLAB lsqcurvefit command with the parameter boundary constraint option (all free parameters were bounded between zero and positive infinity). During each optimization step, r (βj, Constj, Stimm) was computed by solving the system of non-linear equations (Equation 6 or 10) using the MATLAB fsolve command. We started the optimization with all parameters initialized to 1.0. The optimization was stopped when the lsqcurvefit optimizer returned the same sum squared error (SSE) for more than twenty optimization steps (tolerance: 0.0001%). To avoid stalling in a local minimum, the optimization was then resumed after adding 0.1 to 1% random noise to all of the free parameters. This noise treatment typically resulted in ∼0.1% increase in SSE. Twenty such noise treatments were independently performed and the optimization proceeded to convergence for each noise treatment (same stopping criteria as above). If no improvement in SSE was seen in these 20 noise treatments (i.e., all resulted in larger SSEs than before the noise treatment), we decided that we had reached an optimal solution. Otherwise, we took the optimization with the lowest resulting SSE, performed 20 new noise treatments, and continued the optimization.
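The perturb-and-restart logic can be sketched generically in Python; the local optimizer below is a simple stand-in for lsqcurvefit, and the objective is a toy function, not the population fit:

```python
import random

def local_fit(params, objective, step=0.05, iters=2000):
    """Stand-in local optimizer: greedy random coordinate search."""
    best = objective(params)
    for _ in range(iters):
        trial = list(params)
        i = random.randrange(len(trial))
        trial[i] += random.uniform(-step, step)
        err = objective(trial)
        if err < best:
            params, best = trial, err
    return params, best

def fit_with_noise_restarts(params, objective, n_restarts=20, noise=0.01):
    """Optimize, then repeatedly jitter all parameters by up to ~1% and
    re-optimize, keeping a restart only if it lowers the best SSE found."""
    params, best = local_fit(params, objective)
    for _ in range(n_restarts):
        jittered = [p * (1 + random.uniform(-noise, noise)) for p in params]
        cand, sse = local_fit(jittered, objective)
        if sse < best:
            params, best = cand, sse
    return params, best

random.seed(0)
sse_fn = lambda p: (p[0] - 2.0) ** 2 + (p[1] - 3.0) ** 2  # toy SSE surface
params, sse = fit_with_noise_restarts([1.0, 1.0], sse_fn)
print(sse < 1e-2)  # True
```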
Ultimately, five cycles of noise treatments were necessary for the normalization model to converge. In the final cycle, the SSEs after each noise treatment barely decreased with iterations of the optimization, and were never below the lowest SSE obtained in previous cycles. Therefore, we adopted the result from the fourth cycle as the final fit (SSE = 9829.08). For the alternative (subtractive) model, nine cycles of noise treatments were needed to reach the final fit convergence (SSE = 12919.30). The optimization was performed using the MATLAB parallel-computing toolbox installed on three multi-core (8 or 4 cores) computers, running continuously for a total of 26,280 hr.
Partial correlation coefficients between the fits of these two models were computed and z-scored using Fisher's r-to-z transform (Angelaki et al., 2004). The resulting amplitude-response functions for each neuron in each model were fit to a hyperbolic ratio function (Equation 1), and the gain ratios were plotted for each model neuron in Figure 7.
Quantification and Statistical Analysis
Action potentials recorded in the interval from 500-2000 ms after stimulus onset were used to compute firing rates because most of the evoked neural activity occurred during this period (Figure S1) (Gu et al., 2006). All data analyses and statistical tests were done using custom scripts written in MATLAB.
Analyses of population data were performed using appropriate parametric and non-parametric statistical tests (as described in the main text), including Wilcoxon rank sum tests, Wilcoxon signed rank tests, and ANOVA. Error bars in Figures 2, 3, 4, and 5 denote s.e.m. Exact p values and numbers of neurons for each statistical test are provided in the Results section.
Bootstrap analysis was also used to test whether specific values (e.g., response gain ratios for particular neurons in Figures 4C and 5C) were significantly different from reference values. For assessing significance of gain ratios, bootstrap amplitude-response functions were generated by randomly resampling (with replacement) the error residuals and adding them to the fitted curves (Efron and Tibshirani, 1993). The bootstrap amplitude-response functions were curve-fit and the gain parameters were estimated. This was repeated 3000 times to produce bootstrap distributions of each gain ratio, from which the 5th and 95th percentiles were determined. A ratio was considered significantly greater (less) than a particular value (e.g., 0 or 1) if its 5th percentile (95th percentile) was above (below) that value.
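The residual-resampling bootstrap can be sketched in Python; linear_fit below is a deliberately simple stand-in for the hyperbolic-ratio fit, and the data are made up:

```python
import random

def linear_fit(x, y):
    """Stand-in for the hyperbolic-ratio fit: least-squares line through origin."""
    g = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return [g * a for a in x], g

def bootstrap_gain_ci(x, y, fit_fn, n_boot=3000, seed=1):
    """Resample residuals with replacement, add them back to the fitted curve,
    refit, and return the 5th and 95th percentiles of the refit gains."""
    random.seed(seed)
    fitted, _ = fit_fn(x, y)
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    gains = sorted(fit_fn(x, [f + random.choice(residuals) for f in fitted])[1]
                   for _ in range(n_boot))
    return gains[int(0.05 * n_boot)], gains[int(0.95 * n_boot)]

lo, hi = bootstrap_gain_ci([1, 2, 4, 8], [2.1, 3.9, 8.2, 15.8], linear_fit)
print(lo < 2.0 < hi)  # the generating slope (~2) lies inside the 90% interval
```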
A stepwise regression analysis of the population gain-ratio data (Figures 4C, 7A, 7B, S2, and S3) was performed using the MATLAB stepwiselm command, with a 3rd order polynomial as a starting model. Fits of the gain-ratio data that are shown in Figures 4C, 7A, 7B, S2, and S3 correspond to the lowest-order polynomial that was justified by the stepwise regression (with entry and exit p values of 0.001 and 0.01, respectively).
Partial correlation coefficients between model fits and the amplitude-response curves of individual neurons were computed and transformed to z-scores using Fisher's r-to-z transform (Angelaki et al., 2004). This allowed us to define regions in Figure 6C for which one model fit is significantly superior to the other.
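The transform is z = 0.5 * ln((1 + r) / (1 - r)); a minimal Python sketch with illustrative correlation values:

```python
import math

def fisher_z(r):
    """Fisher r-to-z transform: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

# z-scored partial correlations for two model fits to one neuron (made-up r's);
# the difference of z-scores is what defines the significance regions.
z_norm, z_alt = fisher_z(0.9), fisher_z(0.6)
print(round(z_norm - z_alt, 3))  # 0.779
```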
No statistical methods were used to pre-determine appropriate sample sizes for the neural recordings, but our sample size is comparable to those generally employed in similar studies in the field. Experimenters were not blind to the purposes of the study, but all data collection was automated by computer. All stimulus conditions in the main experimental test were randomly interleaved.
Supplementary Material
Data File S1. Zip file of program code for model simulations, Related to Figure 1
Key Resources Table.
Reagent or Resource | Source | Identifier
---|---|---
Experimental Models: Organisms/Strains | |
Macaca mulatta | Primate Products | N/A
Software and Algorithms | |
MATLAB 2016a statistics and machine learning toolbox | Mathworks | https://www.mathworks.com
MATLAB 2016a optimization toolbox | Mathworks | https://www.mathworks.com
MATLAB 2016a parallel-computing toolbox | Mathworks | https://www.mathworks.com
MATLAB model simulation code | This paper | N/A
TEMPO experiment control system | Reflective Computing | http://reflectivecomputing.com/
Visual stimulus generation code, written in Visual C++ using the OpenGL graphics libraries | This paper; Gu et al., 2006 | N/A
Other | |
Tungsten microelectrodes, Epoxylite insulated | Frederic Haer Company | https://www.fh-co.com
MOOG 6DOF2000E, 6-degree-of-freedom motion platform | Moog, Inc. | http://www.moog.com
Christie Mirage S+3K, digital light projector | Christie Digital | http://www.christiedigital.com
Highlights.
Many neurons in macaque area MSTd show a specific form of cross-modal suppression
Cross-modal suppression is predicted by a divisive normalization mechanism
Neurons in area MT, which project to MSTd, do not show cross-modal suppression
Multisensory integration in area MSTd likely involves divisive normalization
Acknowledgments
This work was supported by NIH grant EY016178 (to GCD) and by a CORE grant (EY001319) from the National Eye Institute. DEA was supported by EY022538. TO acknowledges support by JSPS KAKENHI (16K06984) and by a Grant-in-Aid for Scientific Research on Innovative Areas (“Non-linear Neuro-oscillology”). We thank Dina Knoedl and Swati Shimpi for excellent animal care and technical support, and Johnny Wen for programming support.
Footnotes
Author Contributions: TO and GCD designed the experiments; TO collected and analyzed the data and performed the simulations; TO, GCD, and DEA wrote the manuscript and consulted on revisions.
Data and Software Availability: Program code (written in MATLAB) for reproducing the simulation results of Figures 1B and 1C is provided in Data S1. Other custom code for analyses will be provided upon request to the Lead Contact.
References
- Adesnik H, Bruns W, Taniguchi H, Huang ZJ, Scanziani M. A neural circuit for spatial summation in visual cortex. Nature. 2012;490:226–231. doi: 10.1038/nature11526.
- Alais D, Newell FN, Mamassian P. Multisensory processing in review: from physiology to behaviour. Seeing and perceiving. 2010;23:3–38. doi: 10.1163/187847510X488603.
- Albrecht DG, Hamilton DB. Striate cortex of monkey and cat: contrast response function. Journal of neurophysiology. 1982;48:217–237. doi: 10.1152/jn.1982.48.1.217.
- Alvarado JC, Vaughan JW, Stanford TR, Stein BE. Multisensory versus unisensory integration: contrasting modes in the superior colliculus. Journal of neurophysiology. 2007;97:3193–3205. doi: 10.1152/jn.00018.2007.
- Anderson JS, Carandini M, Ferster D. Orientation tuning of input conductance, excitation, and inhibition in cat primary visual cortex. Journal of neurophysiology. 2000;84:909–926. doi: 10.1152/jn.2000.84.2.909.
- Angelaki DE, Shaikh AG, Green AM, Dickman JD. Neurons compute internal models of the physical laws of motion. Nature. 2004;430:560–564. doi: 10.1038/nature02754.
- Avillac M, Ben Hamed S, Duhamel JR. Multisensory integration in the ventral intraparietal area of the macaque monkey. The Journal of Neuroscience. 2007;27:1922–1932. doi: 10.1523/JNEUROSCI.2646-06.2007.
- Boehm S, Betz H. Somatostatin inhibits excitatory transmission at rat hippocampal synapses via presynaptic receptors. The Journal of Neuroscience. 1997;17:4066–4075. doi: 10.1523/JNEUROSCI.17-11-04066.1997.
- Bonds AB. Role of inhibition in the specification of orientation selectivity of cells in the cat striate cortex. Vis Neurosci. 1989;2:41–55. doi: 10.1017/s0952523800004314.
- Borg-Graham LJ, Monier C, Fregnac Y. Visual input evokes transient and strong shunting inhibition in visual cortical neurons. Nature. 1998;393:369–373. doi: 10.1038/30735.
- Born RT, Bradley DC. Structure and function of visual area MT. Annual review of neuroscience. 2005;28:157–189. doi: 10.1146/annurev.neuro.26.041002.131052.
- Britten KH, Heuer HW. Spatial summation in the receptive fields of MT neurons. J Neurosci. 1999;19:5074–5084. doi: 10.1523/JNEUROSCI.19-12-05074.1999.
- Britten KH, Van Wezel RJ. Area MST and heading perception in macaque monkeys. Cerebral cortex. 2002;12:692–701. doi: 10.1093/cercor/12.7.692.
- Busse L, Wade AR, Carandini M. Representation of concurrent stimuli by population activity in visual cortex. Neuron. 2009;64:931–942. doi: 10.1016/j.neuron.2009.11.004.
- Carandini M, Ferster D. Membrane potential and firing rate in cat primary visual cortex. The Journal of Neuroscience. 2000;20:470–484. doi: 10.1523/JNEUROSCI.20-01-00470.2000.
- Carandini M, Heeger DJ. Normalization as a canonical neural computation. Nature reviews Neuroscience. 2012;13:51–62. doi: 10.1038/nrn3136.
- Carandini M, Heeger DJ, Movshon JA. Linearity and normalization in simple cells of the macaque primary visual cortex. J Neurosci. 1997;17:8621–8644. doi: 10.1523/JNEUROSCI.17-21-08621.1997.
- Cavanaugh JR, Bair W, Movshon JA. Nature and interaction of signals from the receptive field center and surround in macaque V1 neurons. Journal of neurophysiology. 2002;88:2530–2546. doi: 10.1152/jn.00692.2001.
- Chen A, DeAngelis GC, Angelaki DE. Macaque parieto-insular vestibular cortex: responses to self-motion and optic flow. J Neurosci. 2010;30:3022–3042. doi: 10.1523/JNEUROSCI.4029-09.2010.
- Chen A, DeAngelis GC, Angelaki DE. A comparison of vestibular spatiotemporal tuning in macaque parietoinsular vestibular cortex, ventral intraparietal area, and medial superior temporal area. J Neurosci. 2011a;31:3082–3094. doi: 10.1523/JNEUROSCI.4476-10.2011.
- Chen A, DeAngelis GC, Angelaki DE. Convergence of vestibular and visual self-motion signals in an area of the posterior sylvian fissure. J Neurosci. 2011b;31:11617–11627. doi: 10.1523/JNEUROSCI.1266-11.2011.
- Chen A, DeAngelis GC, Angelaki DE. Representation of vestibular and visual cues to self-motion in ventral intraparietal cortex. J Neurosci. 2011c;31:12036–12052. doi: 10.1523/JNEUROSCI.0395-11.2011.
- Chen A, DeAngelis GC, Angelaki DE. Functional specializations of the ventral intraparietal area for multisensory heading discrimination. J Neurosci. 2013;33:3567–3581. doi: 10.1523/JNEUROSCI.4522-12.2013.
- Chowdhury SA, Takahashi K, DeAngelis GC, Angelaki DE. Does the middle temporal area carry vestibular signals related to self-motion? J Neurosci. 2009;29:12020–12030. doi: 10.1523/JNEUROSCI.0004-09.2009.
- Cuppini C, Ursino M, Magosso E, Rowland BA, Stein BE. An emergent model of multisensory integration in superior colliculus neurons. Front Integr Neurosci. 2010;4:6. doi: 10.3389/fnint.2010.00006.
- Dayan P, Abbott LF. Theoretical Neuroscience. Cambridge, MA: MIT Press; 2001.
- Diehl MM, Romanski LM. Responses of prefrontal multisensory neurons to mismatching faces and vocalizations. J Neurosci. 2014;34:11233–11243. doi: 10.1523/JNEUROSCI.5168-13.2014.
- Driver J, Noesselt T. Multisensory interplay reveals crossmodal influences on sensory-specific brain regions, neural responses, and judgments. Neuron. 2008;57:11–23. doi: 10.1016/j.neuron.2007.12.013.
- Duffy CJ. MST neurons respond to optic flow and translational movement. Journal of neurophysiology. 1998;80:1816–1827. doi: 10.1152/jn.1998.80.4.1816.
- Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall; 1993.
- Ernst MO, Bulthoff HH. Merging the senses into a robust percept. Trends in cognitive sciences. 2004;8:162–169. doi: 10.1016/j.tics.2004.02.002.
- Ferster D. Linearity of synaptic interactions in the assembly of receptive fields in cat visual cortex. Current opinion in neurobiology. 1994;4:563–568. doi: 10.1016/0959-4388(94)90058-2.
- Fetsch CR, Pouget A, DeAngelis GC, Angelaki DE. Neural correlates of reliability-based cue weighting during multisensory integration. Nature neuroscience. 2012;15:146–154. doi: 10.1038/nn.2983.
- Fetsch CR, Rajguru SM, Karunaratne A, Gu Y, Angelaki DE, DeAngelis GC. Spatiotemporal properties of vestibular responses in area MSTd. Journal of neurophysiology. 2010;104:1506–1522. doi: 10.1152/jn.91247.2008.
- Freeman TC, Durand S, Kiper DC, Carandini M. Suppression without inhibition in visual cortex. Neuron. 2002;35:759–771. doi: 10.1016/s0896-6273(02)00819-x.
- Groh JM. Converting neural signals from place codes to rate codes. Biological cybernetics. 2001;85:159–165. doi: 10.1007/s004220100249.
- Gu Y, Angelaki DE, DeAngelis GC. Neural correlates of multisensory cue integration in macaque MSTd. Nature neuroscience. 2008;11:1201–1210. doi: 10.1038/nn.2191.
- Gu Y, DeAngelis GC, Angelaki DE. A functional link between area MSTd and heading perception based on vestibular signals. Nature neuroscience. 2007;10:1038–1047. doi: 10.1038/nn1935.
- Gu Y, DeAngelis GC, Angelaki DE. Causal links between dorsal medial superior temporal area neurons and multisensory heading perception. J Neurosci. 2012;32:2299–2313. doi: 10.1523/JNEUROSCI.5154-11.2012.
- Gu Y, Watkins PV, Angelaki DE, DeAngelis GC. Visual and nonvisual contributions to three-dimensional heading selectivity in the medial superior temporal area. J Neurosci. 2006;26:73–85. doi: 10.1523/JNEUROSCI.2356-05.2006.
- Heeger DJ. Half-squaring in responses of cat striate cells. Vis Neurosci. 1992a;9:427–443. doi: 10.1017/s095252380001124x.
- Heeger DJ. Normalization of cell responses in cat striate cortex. Vis Neurosci. 1992b;9:181–197. doi: 10.1017/s0952523800009640.
- Ichida JM, Schwabe L, Bressloff PC, Angelucci A. Response facilitation from the "suppressive" receptive field surround of macaque V1 neurons. Journal of neurophysiology. 2007;98:2168–2181. doi: 10.1152/jn.00298.2007.
- Jagadeesh B, Wheat HS, Ferster D. Linearity of summation of synaptic potentials underlying direction selectivity in simple cells of the cat visual cortex. Science. 1993;262:1901–1904. doi: 10.1126/science.8266083.
- Kadunce DC, Vaughan JW, Wallace MT, Benedek G, Stein BE. Mechanisms of within- and cross-modality suppression in the superior colliculus. Journal of neurophysiology. 1997;78:2834–2847. doi: 10.1152/jn.1997.78.6.2834.
- Kolster H, Mandeville JB, Arsenault JT, Ekstrom LB, Wald LL, Vanduffel W. Visual field map clusters in macaque extrastriate visual cortex. J Neurosci. 2009;29:7031–7039. doi: 10.1523/JNEUROSCI.0518-09.2009.
- Komatsu H, Wurtz RH. Relation of cortical areas MT and MST to pursuit eye movements. I. Localization and visual properties of neurons. Journal of neurophysiology. 1988;60:580–603. doi: 10.1152/jn.1988.60.2.580.
- Lee J, Maunsell JH. A normalization model of attentional modulation of single unit responses. PloS one. 2009;4:e4651. doi: 10.1371/journal.pone.0004651.
- Louie K, Grattan LE, Glimcher PW. Reward value-based gain control: divisive normalization in parietal cortex. J Neurosci. 2011;31:10627–10639. doi: 10.1523/JNEUROSCI.1237-11.2011.
- Maunsell JH, Van Essen DC. The connections of the middle temporal visual area (MT) and their relationship to a cortical hierarchy in the macaque monkey. J Neurosci. 1983;3:2563–2586. doi: 10.1523/JNEUROSCI.03-12-02563.1983.
- Meredith MA, Nemitz JW, Stein BE. Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. J Neurosci. 1987;7:3215–3229. doi: 10.1523/JNEUROSCI.07-10-03215.1987.
- Meredith MA, Stein BE. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of neurophysiology. 1986;56:640–662. doi: 10.1152/jn.1986.56.3.640.
- Meredith MA, Stein BE. Spatial determinants of multisensory integration in cat superior colliculus neurons. Journal of neurophysiology. 1996;75:1843–1857. doi: 10.1152/jn.1996.75.5.1843.
- Miller KD, Troyer TW. Neural noise can explain expansive, power-law nonlinearities in neural response functions. Journal of neurophysiology. 2002;87:653–659. doi: 10.1152/jn.00425.2001.
- Morgan ML, DeAngelis GC, Angelaki DE. Multisensory integration in macaque visual cortex depends on cue reliability. Neuron. 2008;59:662–673. doi: 10.1016/j.neuron.2008.06.024.
- Nelissen K, Vanduffel W, Orban GA. Charting the lower superior temporal region, a new motion-sensitive region in monkey superior temporal sulcus. J Neurosci. 2006;26:5929–5947. doi: 10.1523/JNEUROSCI.0824-06.2006.
- Ohshiro T, Angelaki DE, DeAngelis GC. A normalization model of multisensory integration. Nature neuroscience. 2011;14:775–782. doi: 10.1038/nn.2815.
- Olsen SR, Bhandawat V, Wilson RI. Divisive normalization in olfactory population codes. Neuron. 2010;66:287–299. doi: 10.1016/j.neuron.2010.04.009.
- Ozeki H, Finn IM, Schaffer ES, Miller KD, Ferster D. Inhibitory stabilization of the cortical network underlies visual surround suppression. Neuron. 2009;62:578–592. doi: 10.1016/j.neuron.2009.03.028.
- Perrault TJ, Jr, Vaughan JW, Stein BE, Wallace MT. Neuron-specific response characteristics predict the magnitude of multisensory integration. Journal of neurophysiology. 2003;90:4022–4026. doi: 10.1152/jn.00494.2003.
- Perrault TJ, Jr, Vaughan JW, Stein BE, Wallace MT. Superior colliculus neurons use distinct operational modes in the integration of multisensory stimuli. Journal of neurophysiology. 2005;93:2575–2586. doi: 10.1152/jn.00926.2004.
- Polat U, Mizobe K, Pettet MW, Kasamatsu T, Norcia AM. Collinear stimuli regulate visual responses depending on cell's contrast threshold. Nature. 1998;391:580–584. doi: 10.1038/35372.
- Priebe NJ, Ferster D. Inhibition, spike threshold, and stimulus selectivity in primary visual cortex. Neuron. 2008;57:482–497. doi: 10.1016/j.neuron.2008.02.005.
- Raposo D, Sheppard JP, Schrater PR, Churchland AK. Multisensory decision-making in rats and humans. J Neurosci. 2012;32:3726–3735. doi: 10.1523/JNEUROSCI.4998-11.2012.
- Recanzone GH, Wurtz RH, Schwarz U. Responses of MT and MST neurons to one and two moving objects in the receptive field. Journal of neurophysiology. 1997;78:2904–2915. doi: 10.1152/jn.1997.78.6.2904.
- Reynolds JH, Heeger DJ. The normalization model of attention. Neuron. 2009;61:168–185. doi: 10.1016/j.neuron.2009.01.002.
- Rubin DB, Van Hooser SD, Miller KD. The stabilized supralinear network: a unifying circuit motif underlying multi-input integration in sensory cortex. Neuron. 2015;85:402–417. doi: 10.1016/j.neuron.2014.12.026.
- Sato TK, Haider B, Hausser M, Carandini M. An excitatory basis for divisive normalization in visual cortex. Nature neuroscience. 2016;19:568–570. doi: 10.1038/nn.4249.
- Schwabe L, Obermayer K, Angelucci A, Bressloff PC. The role of feedback in shaping the extra-classical receptive field of cortical neurons: a recurrent network model. The Journal of Neuroscience. 2006;26:9117–9129. doi: 10.1523/JNEUROSCI.1253-06.2006.
- Skaliora I, Doubell TP, Holmes NP, Nodal FR, King AJ. Functional topography of converging visual and auditory inputs to neurons in the rat superior colliculus. Journal of neurophysiology. 2004;92:2933–2946. doi: 10.1152/jn.00450.2004.
- Smith AT, Wall MB, Thilo KV. Vestibular inputs to human motion-sensitive visual cortex. Cerebral cortex. 2012;22:1068–1077. doi: 10.1093/cercor/bhr179.
- Spence C. Crossmodal spatial attention. Annals of the New York Academy of Sciences. 2010;1191:182–200. doi: 10.1111/j.1749-6632.2010.05440.x.
- Spence C. Crossmodal correspondences: a tutorial review. Attention, perception & psychophysics. 2011;73:971–995. doi: 10.3758/s13414-010-0073-7.
- Stanford TR, Quessy S, Stein BE. Evaluating the operations underlying multisensory integration in the cat superior colliculus. J Neurosci. 2005;25:6499–6508. doi: 10.1523/JNEUROSCI.5095-04.2005.
- Stein BE, Stanford TR. Multisensory integration: current issues from the perspective of the single neuron. Nature reviews Neuroscience. 2008;9:255–266. doi: 10.1038/nrn2331.
- Sugihara T, Diltz MD, Averbeck BB, Romanski LM. Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. J Neurosci. 2006;26:11138–11147. doi: 10.1523/JNEUROSCI.3550-06.2006.
- Ungerleider LG, Desimone R. Cortical connections of visual area MT in the macaque. The Journal of comparative neurology. 1986;248:190–222. doi: 10.1002/cne.902480204.
- Ursino M, Cuppini C, Magosso E, Serino A, di Pellegrino G. Multisensory integration in the superior colliculus: a neural network model. J Comput Neurosci. 2009;26:55–73. doi: 10.1007/s10827-008-0096-4.
- Van Essen DC, Drury HA, Dickson J, Harwell J, Hanlon D, Anderson CH. An integrated software suite for surface-based analyses of cerebral cortex. Journal of the American Medical Informatics Association: JAMIA. 2001;8:443–459. doi: 10.1136/jamia.2001.0080443.
- Wallace MT, Wilkinson LK, Stein BE. Representation and integration of multiple sensory inputs in primate superior colliculus. Journal of neurophysiology. 1996;76:1246–1266. doi: 10.1152/jn.1996.76.2.1246.
- Zoccolan D, Cox DD, DiCarlo JJ. Multiple object response normalization in monkey inferotemporal cortex. J Neurosci. 2005;25:8150–8164. doi: 10.1523/JNEUROSCI.2058-05.2005.