Abstract
To increase computational flexibility, sensory processing changes with behavioral context. In the visual system, active behavioral states characterized by motor activity and pupil dilation [1, 2] enhance sensory responses but typically leave the preferred stimuli of neurons unchanged [2–9]. Here we find that behavioral state also modulates stimulus selectivity in mouse visual cortex in the context of colored natural scenes. Using population imaging in behaving mice, pharmacology, and deep neural network modeling, we identified a rapid shift of color selectivity towards ultraviolet stimuli during an active behavioral state. This was exclusively caused by pupil dilation, resulting in a dynamic switch from rod to cone photoreceptors, thereby extending their role beyond night and day vision. The change in tuning facilitated the decoding of ethological stimuli, such as aerial predators against the twilight sky [10]. In contrast to previous studies that have used pupil dilation as an indirect measure of brain state, our results suggest that state-dependent pupil dilation itself differentially recruits rods and cones on fast timescales to tune visual representations to behavioral demands.
Introduction
Neural responses are modulated by the animal’s behavioral and internal state to flexibly adjust information processing to different behavioral contexts. This phenomenon is well-described across animal species, from invertebrates [11, 12] to primates [4, 9]. In the mammalian visual cortex, neural activity is desynchronized and sensory responses are enhanced during an active behavioral state [1–3, 5, 7, 8], characterized by pupil dilation [1] and locomotion activity [2]. Mechanistically, these effects have been linked to neuromodulators like acetylcholine and norepinephrine [reviewed in 13, 14]. Other than changes in response gain, the tuning of visual neurons, such as orientation selectivity, typically does not change across quiet and active states [2, 5, 9]. So far, however, this has largely been studied in non-ecological settings using simple synthetic stimuli.
In this work, we study how behavioral state modulates cortical visual tuning in mice in the context of naturalistic scenes. Critically, these scenes include the color domain of the visual input due to its ethological relevance across species [reviewed in 15]. Mice, like most mammals, are dichromatic and have two types of cone photoreceptors, expressing ultraviolet (UV)- and green-sensitive S- and M-opsin [16]. These UV- and green-sensitive cone photoreceptors predominantly sample upper and lower visual field, respectively, through an uneven opsin distribution across the retina [16, 17].
To systematically study the relationship between neural tuning and behavioral state in the context of naturalistic scenes, we combined in-vivo population calcium imaging of primary visual cortex (V1) in awake, head-fixed mice with deep convolutional neural network (CNN) modeling. We extended a recently described model [18, 19] to predict neural responses jointly based on visual input and the animal’s behavior. This enabled us to characterize the relationship between neural tuning and behavior in extensive in-silico experiments without the need to control the behavior experimentally. Finally, we experimentally confirmed the in-silico model predictions in-vivo [18, 20].
Using this approach, we find that color tuning of mouse V1 neurons rapidly shifts towards higher UV sensitivity during an active behavioral state. By pharmacologically manipulating the pupil, we show that this is solely caused by pupil dilation. Dilation during active behavioral states increases the amount of light entering the eye sufficiently to cause a dynamic switch between rod- and cone-dominated vision – even for constant ambient light levels. Finally, we show that the increased UV sensitivity during active periods may tune the mouse visual system to improved detection of predators against the UV background of the sky. Our results identify a novel functional role of state-dependent pupil dilation: To rapidly tune visual feature representations to changing behavioral requirements in a bottom-up manner.
Results
CNNs identify optimal colored stimuli
Here, we study the relationship of neural tuning in mouse V1 and the animal’s behavior, specifically focusing on color processing because of its behavioral relevance [reviewed in 15]. We presented colored naturalistic images (Extended Data Fig. 1) to awake, head-fixed mice positioned on a treadmill (Fig. 1a), while recording the calcium activity of L2/3 neurons in V1 using two-photon imaging (Fig. 1c,d). We simultaneously recorded locomotion activity, pupil size, as well as instantaneous change of pupil size, which all have been associated with distinct behavioral states [1, 2]. Visual stimuli were presented using a projector with UV and green LEDs [Fig. 1b; 21], allowing us to differentially activate UV- and green-sensitive mouse photoreceptors. We recorded neural responses along the posterior-anterior axis of V1 (Fig. 1c), sampling from various elevations across the visual field. This choice was motivated by the gradient of spectral sensitivity of mouse cone photoreceptors across the retina [16, 17].
We used a deep convolutional neural network to learn an in-silico model of the recorded neuron population as a function of the visual input and the animal’s behavior [Fig. 1e; 18]. The CNN had the following input channels: (i) UV and green channels of the visual stimulus, (ii) three channels set to the recorded behavioral parameters (i.e. pupil size, instantaneous change of pupil size, and locomotion speed), and (iii) two channels that were shared across all inputs encoding the x and y pixel positions of the stimulus image. The latter was previously shown to improve CNN model performance in cases where feature representations depend on image position [22], similar to the gradient in mouse color sensitivity across visual space. Our neural predictive models also included a shifter network [18] that spatially shifted the model neurons’ receptive fields according to the recorded pupil position traces. For each dataset, we trained an ensemble of 4-layer CNN models end-to-end [19] to predict the neuronal responses to individual images and behavioral parameters. The prediction performance of the resulting ensemble model (Extended Data Fig. 2) was comparable to state-of-the-art predictive models of mouse V1 [19].
Using our CNN ensemble model as a “digital twin” of the visual cortex, we synthesized maximally exciting inputs (MEIs) for individual neurons (Fig. 1f, Extended Data Fig. 3a). To this end, we optimized the UV and green color channels of a contrast-constrained image to produce the highest activation in the given model neuron using regularized gradient ascent [18, 20]. For the vast majority of neurons, MEI color channels were positively correlated, suggesting that color-opponency is rare given our stimulus paradigm (Extended Data Figs. 3, 4). We confirmed that the computed MEIs indeed strongly drive the recorded neurons by performing inception loop experiments [18]. For that, we randomly selected MEIs of 150 neurons above a response reliability threshold for presentation on the next day (Fig. 1g). For most neurons, the MEIs were indeed the most exciting stimuli: >65% of neurons exhibited the strongest activation to their own MEI while showing little activity to the MEIs of other neurons (Fig. 1h). Together, this demonstrates that our modelling approach accurately captures tuning properties of mouse V1 neurons in the context of colored naturalistic scenes.
V1 color tuning changes with behavior
To study how cortical color tuning changes with behavioral state, we performed detailed in-silico characterizations using the trained CNN model described above. For that, we focused on two well-described and spontaneously occurring behavioral states [1, 2]: A quiet state with no locomotion and a small pupil (3rd percentile of locomotion and pupil size across all trials) and an active state indicated by locomotion and a larger pupil (97th percentile). For each neuron and distinct behavioral state, we optimized an MEI and then generated a color tuning curve by predicting the neuron’s activity to varying color contrasts of this MEI (Fig. 2a, Extended Data Fig. 5).
For both behavioral states, we found that the neurons’ optimal spectral contrast systematically varied along the anterior-posterior axis of V1 (Fig. 2b; for statistics, see figure legends and Suppl. Statistical Analysis): The UV-sensitivity significantly increased from anterior to posterior V1, in line with the distribution of cone opsins across the retina [16, 17], and previous work from V1 [23] and dLGN [24]. Nevertheless, for quiet behavioral periods, nearly all neurons preferred a green-biased stimulus (Fig. 2b, left) – even the ones positioned in posterior V1 receiving input from the ventral retina, where cones are largely UV-sensitive [17]. This distribution of V1 color preferences indicates that visual responses during the quiet state are largely driven by rod photoreceptors which are green-sensitive [25].
In contrast, during active periods we found that the neurons’ color tuning systematically shifted towards higher UV-sensitivity (Fig. 2b-d). This was accompanied by an overall increase in neuronal activation predicted by the model (Fig. 2c, Extended Data Fig. 6a,d), in line with previous results [2, 5]. The shift in color selectivity was observed across animals for both posterior and anterior V1 (Fig. 2e). As a result, neurons in posterior V1 exhibited UV-biased MEIs, while neurons in anterior V1 largely maintained their preference for green-biased stimuli, consistent with a cortical distribution of color tuning expected from a shift from rod- to cone-dominated visual responses [25]. Notably, the spatial structure of the MEIs was largely unchanged across behavioral states (Fig. 2c and Extended Data Fig. 5).
We found that the shift in color selectivity with behavioral state was fast, operating on the timescale of seconds (Fig. 2f). To test the temporal dynamics of the shift in tuning, we identified state changes from quiet to active periods by detecting rapid increases in pupil size after a prolonged quiet period. Then, we sampled active trials within different time bins after the state change, trained CNN models on this sub-selection of active trials and all quiet trials and optimized MEIs as described above. The shift in color selectivity with behavioral state was evident for a 10 second read-out window for all animals tested. Notably, for the majority of animals (n=4/6), the shift was already present when training a model based on active trials sampling just one second following the state change.
We confirmed the above prediction from our in-silico analysis that mouse V1 color tuning rapidly shifts towards higher UV-sensitivity during active periods using a well established sparse noise paradigm for mapping the receptive fields of visual neurons (Extended Data Fig. 7a). Trials were separated into quiet (<50th percentile) and active periods (>75th percentile) using the simultaneously recorded pupil size trace. For each neuron and behavioral state, we estimated a “spike”-triggered average (STA) representing the neuron’s preferred stimulus in the context of the sparse noise input (Extended Data Fig. 7b). Consistent with the in-silico analysis, we observed (i) that most V1 neurons preferred a green-biased stimulus during the quiet behavioral state (Extended Data Fig. 7c) and that (ii) neurons in posterior and medial V1 showed increased UV-sensitivity during active periods (Extended Data Fig. 7c,d). The UV-shift was also present in anterior V1, however, only for more extreme pupil size thresholds (20th and 85th percentile; Extended Data Fig. 7e). Finally, we confirmed that V1 color preference is shifted within a few seconds after onset of an active behavioral state (Extended Data Fig. 7e). Together, these results confirm the CNN model’s prediction that mouse V1 color tuning rapidly changes with behavioral state, particularly for neurons sampling the upper visual field.
Pupil dilation shifts neural tuning
Next, we investigated the mechanism underlying the observed behavior-related changes in color tuning of mouse V1 neurons. On the one hand, the animal’s behavioral state affects neural activity through neuromodulation acting at multiple stages of the visual system [6, 8, 26–28]. On the other hand, state-dependent pupil dilation results in higher light intensities at the level of the retina which might also affect visual processing [e.g. 29, 30].
To experimentally test the relative contribution of these two mechanisms, we dissociated state-dependent neuromodulatory effects from changes in pupil size by pharmacologically dilating and constricting the pupil with atropine and carbachol eye drops (Fig. 3a,f), respectively. We recorded visual responses to naturalistic scenes during control and pharmacology conditions and trained separate CNN models (Extended Data Fig. 2c).
Pupil dilation with atropine eye drops was sufficient to shift the neurons’ color tuning towards higher UV-sensitivity, while locomotion activity was not necessary: During a quiet state with no locomotion, MEI color tuning systematically shifted towards higher UV-sensitivity for the dilated pupil compared to the control condition (Fig. 3b-d). We confirmed the role of pupil size in modulating color tuning of mouse V1 neurons by also recording visual responses to the sparse noise stimulus after dilating the pupil with atropine (Extended Data Fig. 8).
To test whether pupil dilation is necessary for the behavioral shift of color tuning, we next dissociated pupil dilation from neuromodulation during active periods by temporarily constricting the pupil with carbachol eye drops (Extended Data Fig. 2f). We found that the gain increase of neural responses with locomotion persisted under these pharmacological manipulations of the pupil (Fig. 3e,j) [6, 26, 28], suggesting that this well-known effect of neuromodulation was unaffected. For quiet periods, pupil constriction resulted in a systematic shift towards higher green-sensitivity compared to the control condition (Fig. 3g,h). Importantly, we did not observe a significant shift towards higher UV sensitivity during active periods for the constricted condition, while the shift was evident in the control condition (Fig. 3i). This suggests that neuromodulation or other internal state-dependent mechanisms during active behavioral periods are not sufficient to drive the shift in color tuning with behavior, while state-dependent pupil dilation is necessary for the effect.
Tuning shift is caused by photoreceptors
Previous studies have shown that in mice, pupil size regulates retinal illuminance levels by more than one order of magnitude [e.g. 31]. This affects the relative activation levels of the green-sensitive rods and UV- and green-sensitive cones, thereby changing cortical color preferences in anaesthetized mice [25]. To test whether our data could be explained by a shift from rod to cone photoreceptors during active behavioral periods due to a larger pupil (Fig. 4a), we estimated activation levels of mouse photoreceptors as a function of pupil size [10]. For our experiments, we observed up to a 10-fold increase in pupil area and an equal increase in estimated photoisomerization rate for an active compared to a quiet behavioral state (Fig. 4a, bottom). Therefore, the change in retinal light level due to pupil dilation during an active state is likely sufficient to dynamically shift the mouse visual system from a rod- to cone-dominated operating regime.
If this were true, we would expect that the shift in color selectivity can be reproduced for constant pupil sizes by changing ambient light levels. We confirmed this prediction experimentally by reducing the light intensity of the visual stimulus by 1.5 orders of magnitude, while keeping the pupil size constant across recordings via pharmacological dilation with atropine (Fig. 4b). The low light intensity condition is expected to predominantly activate rod photoreceptors which are green sensitive. Indeed, V1 neurons exhibited more green-biased MEIs for the low compared to the high light condition. Together with our pupil dilation and constriction experiments, this strongly suggests that pupil dilation during active states results in a dynamic shift from rod- to cone-driven visual responses and a corresponding shift of spectral sensitivity.
Tuning shift affects population decoding
Next, we tested whether the shift of color tuning during an active state might increase visual performance at the level of large neuronal populations in response to naturalistic stimuli. First, we applied an image reconstruction paradigm with a contrast constraint on the image [32] using the trained CNN model described above (Extended Data Fig. 9a). Stimulus reconstruction from neural activity has previously been used to infer the most relevant visual features encoded by the neuron population [33], like the neurons’ color sensitivity. We found that most reconstructed images for a quiet behavioral state exhibited higher contrast in the green channel, while the contrast was shifted towards the UV channel during active states (Extended Data Fig. 9b,c). This suggests that the increase in UV-sensitivity during active periods observed at the single cell level might contribute to specific visual tasks like stimulus discrimination performed by populations of neurons in mouse V1.
We experimentally confirmed this prediction by showing that the neural decoding of UV objects selectively improves during active periods. For that, we modified a recent object decoding paradigm [34]: Mice passively viewed movie clips with two different objects presented in either UV or green (Fig. 5b), while we recorded the population calcium activity in posterior V1 as described above. We estimated the discriminability of object identity of UV and green objects from the recorded neural responses using a non-linear support vector machine decoder (SVM; Fig. 5a). Consistent with previous reports [1, 35, 36], we found that decoding discriminability was higher during active compared to quiet behavioral periods (Fig. 5c). However, the increase in decoding discriminability of UV objects was larger than for green objects, in agreement with an increased UV-sensitivity during active behavioral periods. This difference was statistically significant, as assessed by a permutation test that shuffled quiet and active trials. The selective increase in decoding discriminability of UV objects was also present for some recordings with modified stimuli, such as with lower object contrast or different object polarity (Extended Data Fig. 10).
What might be the behavioral relevance of this increased UV-sensitivity during an active state for mice? It has recently been shown that during dusk and dawn, aerial predators in the natural environment of mice are much more visible in the UV than the green wavelength range (Fig. 5d; [10]). Therefore, an increase in UV-sensitivity of mouse visual neurons for an alert behavioral state might facilitate the detection of predators visible as dark silhouettes in the sky. To investigate this hypothesis on the neural population level, we presented passively viewing mice with parametric stimuli inspired by these natural scenes, that either contained only noise or an additional dark object in the green or UV image channel (Fig. 5e). This revealed that decoding detection of the behaviorally relevant stimulus – corresponding to the dark object being presented in the UV channel – substantially increased for an active behavioral state. Decoding detection of the green objects also increased significantly, but to a lesser extent (Fig. 5f). This suggests that, on the population level, the shift towards higher UV-sensitivity might be behaviorally relevant, as it selectively improves the decoding detection of dark objects in the UV channel, analogous to a predatory bird flying in a UV-bright sky.
Discussion
Our work identifies a novel mechanism, by which state-dependent pupil dilation dynamically tunes the feature selectivity of the mouse visual system to behaviorally relevant stimuli.
The fact that sensory responses are modulated by the animal’s motor activity and internal state was first demonstrated by elegant studies on invertebrates many decades ago [11, 37]. Since then, modulation of sensory responses as a function of behavioral and internal state, such as attention, has been described in many animals [e.g. 2, 4, 38, 39]. Across animal species, state-dependent modulation predominantly affects neural responsiveness [2, 9, 27, 28], resulting in better behavioral performance [7, 35, 36, 40]. In a few cases, however, the tuning properties of sensory circuits are also affected by this modulation. In the visual system, this has been reported, for instance, for temporal tuning in Drosophila [12], rabbits [39], and mice [41], as well as for direction selectivity in primates [4]. In these cases, the visual system might bias processing towards visual features relevant for current behavioral goals, such as higher temporal frequencies during walking, running, and flying periods.
Here, we report a shift of neural tuning with behavioral state in mice, focusing on the color domain which has been rarely studied in the context of behavioral modulation. Our results suggest that the shift towards higher UV-sensitivity during active behavioral periods may help support ethological tasks, like the detection of predators in the sky. In particular, UV vision has been implicated in predator and prey detection in a number of animal species as an adaptation to living in different natural environments [reviewed in 42]. This is related to stronger scattering of short wavelength light as well as ozone absorption [43] in the sky, facilitating the detection of objects as dark silhouettes against a UV-bright background [10]. However, it will be important to directly test the behavioral relevance of the described shift in color tuning during an active state for mouse predator detection. For example, combining an overhead detection task of a looming stimulus presented in UV or green [44] with pharmacological pupil manipulations or careful tracking of pupil dynamics [45] will reveal whether pupil dilation indeed results in better behavioral detection of UV-stimuli, as suggested by our results.
Mechanistically, state-dependent modulation of visual responses has been linked to neuromodulators like acetylcholine and norepinephrine [reviewed in 13, 14], released with active behavioral states and alert internal states. Our results demonstrate that in addition to internal brain state mechanisms, dynamic changes in pupil size are both sufficient and necessary to affect cortical tuning (see also Suppl. Discussion). We propose that this mechanism changes color sensitivity by differential rod versus cone activation – reminiscent of the Purkinje shift described in humans [46], although acting on faster timescales. A recent neurophysiological study on anaesthetized mice demonstrates that pharmacological pupil dilation at constant ambient light levels is sufficient to induce a shift from rod- to cone-driven visual responses in V1 [25]. Our data indicates that a switch between the rod and cone system can also happen dynamically on the timescale of seconds in behaving mice as a consequence of changes in pupil size across distinct behavioral states. As rod and cone photoreceptors differ with respect to spatial distribution, temporal resolution, and degree of non-linearity [discussed in 47], dynamically adjusting their relative activation will likely influence the sensory representation of the visual scene far beyond the color domain of the visual input.
Changes in pupil size driven by the animal’s behavioral and internal state are a common feature shared across most vertebrate species studied so far [reviewed in 48], including amphibians, birds and mammals (see also Suppl. Discussion). Interestingly, pupil dilation is likely under voluntary control for some animals such as birds and reptiles [discussed in 49], and potentially even for some humans [50]. We propose that state-dependent pupil size changes might act as a general mechanism across species to rapidly switch between the rod- and cone-driven operating regime, thereby tuning the visual system to different features – as suggested here for predator detection in mice during dusk and dawn. To our knowledge, our findings provide the first functional explanation for the long-standing question of why pupil size is modulated with internal and behavioral state.
Materials and Methods
Neurophysiological experiments
All procedures were approved by the Institutional Animal Care and Use Committee of Baylor College of Medicine. Owing to the exploratory nature of our study, we did not use randomization and blinding. No statistical methods were used to predetermine sample size.
Mice of either sex (Mus musculus, n=13; 6 weeks to five months of age) expressing GCaMP6s in excitatory neurons via Slc17a7-Cre and Ai162 transgenic lines (stock number 023527 and 031562, respectively; The Jackson Laboratory) were anesthetized and a 4 mm craniotomy was made over the visual cortex of the right hemisphere as described previously [1, 51]. For functional recordings, awake mice were head-mounted above a cylindrical treadmill and calcium imaging was performed using a Ti-Sapphire laser tuned to 920 nm and a two-photon microscope equipped with resonant scanners (Thorlabs) and a 25x objective (MRD77220, Nikon). Laser power after the objective was kept below 60mW. The rostro-caudal treadmill movement was measured using a rotary optical encoder with a resolution of 8,000 pulses per revolution. We used light diffusing from the laser through the pupil to capture eye movements and pupil size. Images of the pupil were reflected through a hot mirror and captured with a GigE CMOS camera (Genie Nano C1920M; Teledyne Dalsa) at 20 fps at a 1,920 × 1,200 pixel resolution. The contour of the pupil for each frame was extracted using DeepLabCut [52] and the center and major radius of a fitted ellipse were used as the position and dilation of the pupil.
For image acquisition, we used ScanImage. To identify V1 boundaries, we used pixelwise responses to drifting bar stimuli of a 2,400 × 2,400 μm scan at 200 μm depth from cortical surface [53], recorded using a large field of view mesoscope [54] not used for other functional recordings. In V1, imaging was performed using 512 × 512 pixel scans (650 × 650 μm) recorded at approx. 15 Hz and positioned within L2/3 at around 200 μm from the surface of the cortex. Imaging data were motion-corrected, automatically segmented and deconvolved using the CNMF algorithm [55]; cells were further selected by a classifier trained to detect somata based on the segmented masks. In addition, we excluded cells with low stimulus correlation. For that, we computed the first principal component (PC) of the response matrix of size number of neurons x number of trials and then for each neuron estimated the linear correlation of its responses to the first PC, as the first PC captured unrelated background activity. We excluded neurons with a correlation lower or higher than −0.25 or 0.25, respectively. This resulted in 450–1,100 selected soma masks per scan depending on response quality and blood vessel pattern. A structural stack encompassing the scan plane and imaged at 1.6 × 1.6 × 1 μm xyz resolution with 20 repeats per plane was used to register functional scans into a shared xyz frame of reference. Cells registered to the same three-dimensional stack were then matched for distances of <10 μm. For inception loop experiments, we confirmed the anatomical matching with a functional matching procedure, using the cells’ responses to the same set of test images (see also [18]) and only included matched neurons with a response correlation of >0.5 for further analysis. To bring different recordings across the posterior-anterior axis of V1 into the same frame of reference, we manually aligned the mean image of each functional recording to the mean image of the 2,400 × 2,400 μm scan acquired at the mesoscope (see above) using the blood vessel pattern. Then, each cell within the functional scan was assigned a new xy coordinate (in μm) in the common frame of reference. To illustrate coarse differences across visual space, scan fields were manually assigned into three broad location categories within V1 (posterior, medial and anterior) depending on their position relative to V1 boundaries.
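For illustration, a minimal Python sketch of the PC-based cell exclusion described above; function and variable names are ours, and only the ±0.25 correlation threshold is taken from the text:

```python
import numpy as np

def select_cells_by_pc_correlation(responses, threshold=0.25):
    """responses: array (n_neurons, n_trials) of deconvolved activity.
    Returns a boolean mask of neurons to keep and the per-neuron correlations."""
    # The first principal component across trials captures shared background activity.
    centered = responses - responses.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    first_pc = vt[0]  # trial time course of the first PC
    # Correlate each neuron's trial responses with the first PC.
    corrs = np.array([np.corrcoef(r, first_pc)[0, 1] for r in centered])
    # Keep neurons only weakly correlated with the background component.
    keep = np.abs(corrs) < threshold
    return keep, corrs
```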
Visual stimulation
Visual stimuli were presented to the left eye of the mouse on a 42 × 26 cm light-transmitting teflon screen (McMaster-Carr) positioned 12 cm from the animal, covering approx. 120 × 90 degree visual angle. Light was back-projected onto the screen by a DLP-based projector [EKB Technologies Ltd; 21] with UV (395 nm) and green (460 nm) LEDs that differentially activated mouse S- and M-opsin. LEDs were synchronized with the microscope’s scan retrace. Please note that the UV LED not only drives UV-sensitive S-opsin, but also slightly activates green-sensitive M-opsin and Rhodopsin, due to their sensitivity tail for shorter wavelengths (beta-band). This cross-activation could be addressed by using a silent substitution protocol, where one type of photoreceptor is selectively stimulated by presenting a steady excitation to all other photoreceptor types using a counteracting stimulus. However, this comes at a cost of overall contrast. We believe that our ‘imperfect’ spectral separation of cone types is suitable to investigate most questions concerning chromatic processing in the visual system [discussed in 21] — especially as there rarely is photoreceptor type-isolating stimulation in natural scenes.
Light intensity (as estimated photoisomerization rate, P* per second per cone) was calibrated using a spectrometer (USB2000+, Ocean Optics) to result in equal activation rates for mouse M- and S-opsin [for details see 21]. In brief, the spectrometer output was divided by the integration time to obtain counts/s and then converted into electrical power (in nW) using the calibration data (in μJ/count) provided by Ocean Optics. The intensity (in μW) of the whole screen (255 pixel values) was approx. 1.28 and 1.39 for green and UV LED, respectively. To obtain the estimated photoisomerization rate per photoreceptor type, we first converted electrical power into energy flux (in eV/s) and then calculated the photon flux (in photons/s) using the photon energy (in eV). The photon flux density (in photons/s/μm2) was then computed and converted into photoisomerization rate using the effective activation of mouse cone photoreceptors by the LEDs and the light collection area of cone outer segments. In addition, we considered both the wavelength-specific transmission of the mouse optical apparatus [56] and the ratio between pupil size and retinal area [57]. Please see the calibration iPython notebook provided online for further details. For a pupil area of 0.2 mm2 during quiet trials and maximal stimulus intensities (255 pixel values), this resulted in 400 P* per second and cone type corresponding to the mesopic range. During active periods, the pupil area increased to 1.9 mm2 resulting in 4,000 P* per second and cone type corresponding to the low photopic regime.
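The conversion chain outlined above can be sketched as follows; this is a simplified illustration in which the transmission, collection area, and retinal area values are placeholders rather than the calibrated numbers from the referenced notebook:

```python
import numpy as np

PLANCK = 6.626e-34   # J*s
C_LIGHT = 2.998e8    # m/s

def photoisomerization_rate(power_uW, wavelength_nm, screen_area_um2,
                            collection_area_um2=0.2, transmission=0.5,
                            pupil_area_mm2=0.2, retina_area_mm2=18.0):
    """Rough estimate of P* per photoreceptor per second (schematic, not the calibrated pipeline)."""
    photon_energy_J = PLANCK * C_LIGHT / (wavelength_nm * 1e-9)
    photon_flux = (power_uW * 1e-6) / photon_energy_J        # photons/s emitted onto the screen
    flux_density = photon_flux / screen_area_um2              # photons/s/um^2 at the screen
    # Scale by ocular transmission and the pupil-to-retina area ratio,
    # then by the effective collection area of a photoreceptor outer segment.
    retinal_flux_density = flux_density * transmission * (pupil_area_mm2 / retina_area_mm2)
    return retinal_flux_density * collection_area_um2
```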
Prior to functional recordings, the screen was positioned such that the population receptive field across all neurons, estimated using an achromatic sparse noise paradigm, was within the center of the screen. Screen position was fixed and kept constant across recordings of the same neurons. We used Psychtoolbox in Matlab for stimulus presentation and showed the following light stimuli:
Natural images:
We presented naturalistic scenes from an available online database [58]. We selected images based on two criteria (see also Extended Data Fig. 1). First, to avoid an intensity bias in the stimulus, we selected images with no significant difference in mean intensity of blue and green image channel across all images. Second, we selected images with high pixelwise mean squared error (MSE >85) across color channels to increase chromatic contrast, resulting in lower pixel-wise correlation across color channels compared to a random selection. Then, we presented the blue and green image channel using the UV and green LED of the projector, respectively. For a single scan, we presented 4,500 unique colored and 750 monochromatic images in UV and green, respectively. We added monochromatic images to the stimulus to include images without correlations across color channels, thereby diversifying the input to the model. As test set, we used 100 colored and 2 × 25 monochromatic images that were repeated 10 times uniformly spread throughout the recording. Each image was presented for 500 ms, followed by a gray screen (UV and green LED at 127 pixel value) for 300 to 500 ms, sampled uniformly from that range. The mean intensity of presented natural images across green and UV color channel varied between 5 and 204 (8-bit, gamma corrected). For a small pupil during quiet states, this corresponds to approx. 8 and 320 photoisomerizations (R*) per cone and second (R*/cone x s). Each natural image was preceded by a gray blank period (all pixel values set to 127), which reduced the range of monitor intensities to approx. 57.2 to 213 R*/cone x s when integrating over one second, spanning less than one order of magnitude. For the low photopic light intensities we are using, previous studies have found that pupil size is relatively constant for changes in ambient light intensities below one order of magnitude [31, 59]. Indeed, we found that ambient monitor intensity does not contribute strongly to the recorded changes in pupil size (see Extended Data Fig. 1).
Sparse noise:
To map the receptive fields of V1 neurons, we used a sparse noise paradigm. UV and green bright (pixel value 255) and dark (pixel value 0) dots of approx. 10° visual angle were presented on a gray background (pixel value 127) in randomized order. Dots were presented at 8 and 5 positions along the horizontal and vertical axis of the screen, respectively, excluding screen margins. Each presentation lasted 200 ms and each condition (e.g. UV bright dot at position x=1 and y=1) was repeated 50 times. For a subset of recordings (n=2 animals, n=3 scan fields; cf. Fig. ??), each condition was repeated 150 times to increase the number of trials for more extreme behavioral states.
Full-field binary white noise:
We used a full-field binary white noise stimulus of the UV and green LED to estimate temporal kernels of V1 neurons. For that, the intensity of UV and green LED was determined independently by a balanced 15-minute random sequence updated at 10 Hz. A similar stimulus was recently used in recordings of mouse [60] and zebrafish retina [61].
Colored objects:
To test for object discrimination, we used two synthesized objects rendered in Blender (www.blender.org) as described recently [34]. In brief, we smoothly varied object position, size, tilt, and axial rotation. For bright objects, we also varied either the location or energy of 4 light sources. Stimuli were either rendered as bright objects on a black screen and Gaussian noise in the other color channel (condition 1), bright and dark objects on a gray screen and Gaussian noise in the other color channel (condition 2 and 3) or as bright objects on a black screen without Gaussian noise (condition 4). Per object and condition, we rendered movies of 875 seconds, which we then divided into 175 5-second clips. We presented the clips with different conditions and objects in random order.
Images with dark objects:
For the object detection task, we generated images with independent Perlin noise [62] in each color channel using the perlin-noise package for Python. Then, for all images except the noise images, we added a dark ellipse (pixel value 0) of varying size, position, and angle to one of the color channels. We adjusted the contrast of all images with a dark object to match the contrast of noise images, such that the distribution of image contrasts did not differ between noise and object images. We presented 2,000 unique noise images and 2,000 unique images with a dark object in the UV and green image channel, respectively. Each image was presented for 500 ms, followed by a gray screen (UV and green LED at 127 pixel value) for 300 to 500 ms, sampled uniformly from that range.
For the presentation of naturalistic scenes and object movies and images, we applied a gamma function of 1.9 to the 8-bit pixel values of the monitor.
Preprocessing of neural responses and behavioral data
Neural responses were first deconvolved using constrained non-negative calcium deconvolution [55]. For all stimulus paradigms except the full-field binary white noise stimulus, we subsequently extracted the accumulated activity of each neuron between 50 ms after stimulus onset and offset using a Hamming window. For the presentation of objects, we segmented the 5-second clips into 9 bins of 500 ms, starting 250 ms after stimulus onset. Behavioral traces were extracted using the same temporal offset and integration window as deconvolved calcium traces. To train our models, we isotropically downsampled stimulus images to 64 × 36 pixels. Input images, the target neuronal activities, behavioral traces and pupil positions were normalized across the training set during training.
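For illustration, a simplified sketch of the response-extraction step with our own function and variable names:

```python
import numpy as np

def accumulate_responses(deconvolved, frame_times, onsets, offsets, delay=0.05):
    """Sum deconvolved activity between (onset + 50 ms) and offset, weighted by a Hamming window.
    deconvolved: (n_neurons, n_frames); frame_times, onsets, offsets in seconds."""
    responses = np.zeros((deconvolved.shape[0], len(onsets)))
    for i, (t0, t1) in enumerate(zip(onsets, offsets)):
        mask = (frame_times >= t0 + delay) & (frame_times < t1)
        window = np.hamming(mask.sum())
        responses[:, i] = (deconvolved[:, mask] * window).sum(axis=1)
    return responses
```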
Pharmacological manipulations
To dilate and constrict the pupil pharmacologically, we applied 1–3% atropine and carbachol eye drops, respectively, to the left eye of the animal facing the screen for visual stimulation. Functional recordings started after the pupil was dilated or constricted. Pharmacological pupil dilation lasted >2 hours, allowing us to use all data for further analysis. In contrast, carbachol eye drops constricted the pupil for approx. 30 minutes and were re-applied once during the scan. For analysis, we then only selected trials with constricted pupil and matched data analyzed in the control scans to the same trial numbers.
Sparse noise spatial receptive field mapping
We estimated spatial spike-triggered averages (STAs) of V1 neurons in response to the sparse noise stimulus by multiplying the stimulus matrix with the response matrix of each neuron [63], separately for each stimulus color and polarity as well as behavioral state. For the latter, we separated trials into small (< 50th percentile) and large pupil trials (> 75th percentile). We used different pupil size thresholds for the two behavioral states compared to the model due to shorter recording time. For recordings with pupil dilation, we used locomotion speed instead of pupil size to separate trials into two behavioral states. For each behavioral state, STAs computed based on On and Off dots were averaged to yield one STA per cell and stimulus color. Green and UV STAs of the same behavioral state were peak-normalized to the same maximum. To assess STA quality, we generated response predictions by multiplying the flattened STA of each neuron with the flattened stimulus frames and compared the predictions to the recorded responses by estimating the linear correlation coefficient. For analysis, we only included cells where correlation >0.2 for at least one of the stimulus conditions.
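A compact sketch of the STA estimation and the prediction-based quality check described above (our own names and array shapes):

```python
import numpy as np

def compute_sta(stimulus, responses):
    """stimulus: (n_trials, n_pixels); responses: (n_neurons, n_trials).
    Returns response-weighted average stimuli of shape (n_neurons, n_pixels)."""
    return responses @ stimulus

def sta_quality(sta, stimulus, responses):
    """Correlation between STA-based linear predictions and the recorded responses;
    cells with correlation > 0.2 in at least one condition were kept."""
    predictions = stimulus @ sta.T  # (n_trials, n_neurons)
    return np.array([np.corrcoef(predictions[:, i], responses[i])[0, 1]
                     for i in range(responses.shape[0])])
```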
In contrast to the modelling results, STA spectral contrast for a quiet state varied only slightly across the anterior-posterior axis of V1. We verified that this is due to different pupil size thresholds by using the data in response to natural images (cf. Fig. 2) to train a separate model without behavior as input channels on trials with small pupil (<50th percentile) and subsequently optimized MEIs – a procedure more similar to the STA paradigm. When looking at the spectral contrast of the resulting MEIs, we indeed observed a smaller variation of color preference across the anterior-posterior axis of V1, confirming our prediction.
To confirm that the shift in color preference with behavior in response to the sparse noise is not dependent on the specific pupil size thresholds we used, we presented 150 instead of 50 repeats per stimulus condition in a subset of experiments. The larger number of trials for more extreme behavioral states allowed us to compute STAs for behavioral states more similar to the model (<20th versus >85th percentile). We found that this resulted in a stronger shift in color preference during active periods compared to the less extreme pupil size thresholds, suggesting that we have likely underestimated the effect for the shorter recordings shown in Extended Data Fig. 3a-d.
Full-field binary white noise temporal receptive field mapping
We used the responses to the 10 Hz full-field binary white noise stimulus of UV and green LED to compute temporal STAs of V1 neurons. Specifically, we upsampled both stimulus and responses to 60 Hz and then multiplied the stimulus matrix with the response matrix of each neuron. Per cell, this resulted in a temporal STA in response to UV and green flicker, respectively. Kernel quality was measured by comparing the variance of each temporal STA with the variance of the baseline, defined as the first 100 ms of the STA. Only cells with at least 5-times more variance of the kernel compared to baseline were considered for further analysis.
Simulated data using Gabor neurons
We simulated neurons with Gabor receptive fields with varying Gabor parameters across the two color channels. Then, we normalized each Gabor receptive field to have a background of 0 and an amplitude range between −1 and 1. To generate responses of simulated neurons, we used the same set of training images presented during functional recordings. First, we subtracted the mean across all images from the training set, multiplied each Gabor receptive field with each training image and computed the sum of each multiplication across the two color channels. We then passed the resulting scalar response per neuron through a rectified linear unit (ReLU) to obtain the simulated response $r_n$, such that
$r_n = \mathrm{ReLU}\left( \sum_{c} \sum_{x,y} G_{n,c}(x,y) \left( I_c(x,y) - \bar{I}_c(x,y) \right) \right)$,
where $I_c$ denotes color channel $c$ of the training image, $\bar{I}_c$ the mean over all training images, and $G_{n,c}$ the Gabor receptive field of neuron $n$ in channel $c$,
$G_{n,c}(x,y) = a_{n,c} \exp\left( -\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2} \right) \cos\left( \frac{2\pi}{\lambda} x' + \phi \right)$,
with $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$. We varied orientation $\theta$, size $\sigma$, spatial aspect ratio $\gamma$, phase $\phi$, and color preference (the per-channel amplitude $a_{n,c}$) independently for each color channel and neuron, while keeping the spatial frequency $1/\lambda$ constant across all neurons. Finally, we passed the simulated responses through a Poisson process and normalized the responses by the respective standard deviation of the responses across all images. We used the responses of the simulated Gabor neurons together with the natural images to train the model (see below). Our model recovered both color-opponency and color preference of simulated neurons. Only extreme color preferences were slightly underestimated by our model, which is likely due to high correlations across color channels of natural scenes.
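A self-contained sketch of this simulation under the parametrization given above; parameter values and function names are illustrative:

```python
import numpy as np

def gabor(h, w, theta, sigma, gamma, phase, spatial_freq, amplitude=1.0):
    """Gabor receptive field with background ~0 and amplitude in [-1, 1]."""
    ys, xs = np.mgrid[0:h, 0:w]
    x, y = xs - w / 2.0, ys - h / 2.0
    xp = x * np.cos(theta) + y * np.sin(theta)
    yp = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xp**2 + gamma**2 * yp**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * spatial_freq * xp + phase)
    return amplitude * g / np.abs(g).max()

def simulate_responses(images, rfs, rng=np.random.default_rng(0)):
    """images: (n_images, 2, H, W); rfs: (n_neurons, 2, H, W) Gabor receptive fields."""
    images = images - images.mean(axis=0, keepdims=True)     # subtract mean training image
    drive = np.einsum('ichw,nchw->in', images, rfs)           # sum over channels and pixels
    rates = np.maximum(drive, 0)                              # ReLU
    spikes = rng.poisson(rates)                               # Poisson process
    return spikes / (spikes.std(axis=0, keepdims=True) + 1e-8)
```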
In-Silico tuning characterization
Our main interest was to investigate the change of tuning properties with the behavioral state of the animal. Ideally, this includes manipulating the animal’s behavior and investigating the resulting effect on different visual tuning properties. While this is experimentally very challenging and time consuming, it is straightforward with a deep learning based neural predictive model emulating the biological circuit. This allowed us to selectively study how tuning to color or spatial features changes with behavior. To perform our in-silico tuning characterization, we created a convolutional neural network (CNN) model, which was split into two parts: the core and the readout. The core computed latent features from the inputs, which were shared among all neurons. The readout was learned per neuron and mapped the output features of the core onto the neuronal responses via regularized regression.
Representation/Core:
We based our model on the work of [19], as it was demonstrated to set the state of the art for predicting the responses of a population of mouse V1 neurons. In brief, we modelled the core as a four-layer CNN, with 64 feature channels per layer. Each layer consisted of a 2d-convolutional layer followed by a batch normalization layer and an ELU nonlinearity [64, 65]. Except for the first layer, all convolutional layers were depth-separable convolutions [66], which led to better performance while reducing the number of core parameters. Each depth-separable layer consisted of a 1×1 pointwise convolution, followed by a 7×7 depthwise convolution, again followed by a 1×1 pointwise convolution. Without stacking the outputs of the core, the output tensor of the last layer was passed on to the readout.
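A schematic PyTorch sketch of such a core; layer details such as the first-layer kernel size, padding, and regularization are assumptions and differ from the actual implementation (see Code Availability):

```python
import torch.nn as nn

class DepthSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=7):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1),                              # 1x1 pointwise
            nn.Conv2d(out_ch, out_ch, kernel_size,
                      padding=kernel_size // 2, groups=out_ch),       # 7x7 depthwise
            nn.Conv2d(out_ch, out_ch, 1),                             # 1x1 pointwise
        )

    def forward(self, x):
        return self.block(x)

def build_core(in_channels=7, hidden_channels=64, layers=4):
    """Four convolutional layers, each followed by batch norm and ELU."""
    modules = []
    for i in range(layers):
        conv = (nn.Conv2d(in_channels, hidden_channels, 9, padding=4) if i == 0
                else DepthSeparableConv(hidden_channels, hidden_channels))
        modules += [conv, nn.BatchNorm2d(hidden_channels), nn.ELU()]
    return nn.Sequential(*modules)
```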
Readouts:
To get the scalar neuronal firing rate for each neuron, we computed a linear regression between the core output tensor of dimensions (width, height, channels) and a linear weight tensor $w_n$ per neuron, followed by an ELU offset by one (ELU+1), to keep the response positive. We made use of the recently proposed Gaussian readout [19], which simplifies the regression problem considerably. Our Gaussian readout learned the parameters of a 2D Gaussian distribution $\mathcal{N}(\mu_n, \sigma_n^2)$ and sampled a location in height and width of the core output tensor in each training step, for every image and neuron. Given a large enough initial $\sigma_n$ to ensure gradient flow, $\sigma_n$, i.e. the uncertainty about the readout location, decreased during training for more reliable estimates of the mean location $\mu_n$, which represented the center of a neuron’s receptive field. At inference time (i.e. when evaluating our model), we set the readout to be deterministic and to use the fixed position $\mu_n$. We thus learned the position of a single point in core feature space for each neuron. In parallel to learning the position, we learned the weights of the weight tensor $w_n$ of the linear regression, with one weight per feature channel for each neuron. Furthermore, we made use of the retinotopic organization of V1, by coupling the recorded cortical 2d-coordinates of each neuron with the estimation of the receptive field position of the readout. We achieved this by learning a common function $f$, shared by all neurons, that maps cortical coordinates to readout positions. We set $f$ to be a randomly initialized linear fully connected network of size 2→2 followed by a tanh nonlinearity.
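A simplified PyTorch sketch of a Gaussian readout in this spirit; the actual implementation (including the coupling function $f$) lives in the referenced code base and differs in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianReadout(nn.Module):
    def __init__(self, n_neurons, n_channels, init_sigma=1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_neurons, 2))        # RF centers in [-1, 1]^2
        self.sigma = nn.Parameter(torch.full((n_neurons, 2), init_sigma))
        self.weights = nn.Parameter(torch.randn(n_neurons, n_channels) * 0.01)
        self.bias = nn.Parameter(torch.zeros(n_neurons))

    def forward(self, core_output, shift=None):
        # core_output: (batch, channels, height, width)
        # Sample the readout position during training, use the mean at inference time.
        pos = self.mu + self.sigma * torch.randn_like(self.mu) if self.training else self.mu
        if shift is not None:                                    # per-trial shift from the shifter network
            pos = pos.unsqueeze(0) + shift.unsqueeze(1)          # (batch, neurons, 2)
        else:
            pos = pos.unsqueeze(0).expand(core_output.shape[0], -1, -1)
        grid = pos.unsqueeze(2)                                  # (batch, neurons, 1, 2)
        feats = F.grid_sample(core_output, grid, align_corners=True)  # (batch, channels, neurons, 1)
        feats = feats.squeeze(-1).permute(0, 2, 1)               # (batch, neurons, channels)
        return F.elu((feats * self.weights).sum(-1) + self.bias) + 1   # ELU + 1 keeps responses positive
```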
Shifter network:
Because we used a free viewing paradigm when presenting the visual stimuli to the head-fixed mice, the RF positions of the neurons with respect to the presented images had considerable trial-to-trial variability. To inform our model of the trial-dependent shift of the neurons’ receptive fields, we shifted $\mu_n$, the model neuron’s receptive field center, using the estimated pupil center (see section Neurophysiological experiments above). We accomplished this by passing the pupil center through a small shifter network, a three-layer fully connected network followed by a tanh nonlinearity, that calculates a shift $\Delta x$ and $\Delta y$ per trial. The shift is then added to $\mu_n$ of each model neuron.
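A minimal sketch of such a shifter network; the number of hidden features is an assumption:

```python
import torch.nn as nn

def build_shifter(hidden_features=5):
    """Maps the tracked pupil center (x, y) to a per-trial readout shift (dx, dy)."""
    return nn.Sequential(
        nn.Linear(2, hidden_features), nn.Tanh(),
        nn.Linear(hidden_features, hidden_features), nn.Tanh(),
        nn.Linear(hidden_features, 2), nn.Tanh(),   # final tanh keeps the shift in [-1, 1]
    )
```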
Input of behavior and image position encoding:
In addition to the green and UV channel of the visual stimulus, we appended five extra channels to each input to the model. We added three channels of the recorded behavioral parameters in each given trial (pupil size, instantaneous change of pupil size, and running speed), such that each channel simply consisted of the scalar for the respective behavioral parameter, transformed into stimulus dimensions. This enabled the model to predict neural responses as a function of both visual input and behavior and thus to learn the relationship between behavioral states and neuronal activity. This modification allowed us to investigate the effect of behavior by selecting different inputs in the behavioral channels while keeping the image unchanged. Furthermore, we added a positional encoding to the inputs, consisting of two channels, encoding the horizontal and vertical pixel positions of the visual stimulus. These encodings can be thought of as simple grayscale gradients in either direction, with values increasing linearly along the respective axis. Appending position encodings of this kind has been shown to improve the ability of CNNs to learn spatial relationships between pixel positions of the input image and high level feature representations [22]. We found that including the position embedding increased the performance of our model (Extended Data Fig. 2b). We also observed a smoother gradient of color tuning across the different scan fields (Fig. 2b, Extended Data Fig. 6b) when adding the position encoding, indicating that indeed the model learned the color sensitivity tuning of mouse V1 more readily.
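For illustration, how the seven input channels could be assembled; shapes and the [−1, 1] gradient range are our own choices:

```python
import torch

def build_model_input(stimulus, behavior):
    """stimulus: (batch, 2, H, W) green/UV channels;
    behavior: (batch, 3) pupil size, change of pupil size, running speed."""
    b, _, h, w = stimulus.shape
    # Broadcast each scalar behavioral parameter into a full image channel.
    beh = behavior.view(b, 3, 1, 1).expand(b, 3, h, w)
    # Positional encoding: horizontal and vertical pixel-position gradients.
    xs = torch.linspace(-1, 1, w).view(1, 1, 1, w).expand(b, 1, h, w)
    ys = torch.linspace(-1, 1, h).view(1, 1, h, 1).expand(b, 1, h, w)
    return torch.cat([stimulus, beh, xs, ys], dim=1)  # (batch, 7, H, W)
```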
Model training and evaluation
We first split the unique training images into the training and validation set, using a split of 90% to 10%, respectively. Then we trained our networks with the training set by minimizing the Poisson loss $\mathcal{L} = \sum_{n=1}^{N} \left( \hat{r}_n - r_n \log \hat{r}_n \right)$, where $N$ denotes the number of neurons, $\hat{r}_n$ the predicted neuronal response and $r_n$ the observed response. After each full pass through the training set (i.e. epoch), we calculated the correlation between predicted and measured neuronal responses across all neurons on the validation set: if the correlation failed to increase during a fixed number of epochs, we stopped the training and restored the model to its state after the best performing epoch. After each stopping, we either decreased the learning rate or stopped training altogether, if the number of learning-rate decay steps was reached. We optimized the network’s parameters via stochastic gradient descent using the Adam optimizer [67]. Furthermore, we performed an exhaustive hyper-parameter selection using Bayesian search on a held-out dataset. All parameters and hyper-parameters can be found in our GitHub repository (see Code Availability). When evaluating our models on the test set (Extended Data Fig. 2a-c), we used two different types of correlation. Firstly, referred to as test correlation, we computed the correlation between the model’s prediction and neuronal responses across single trials, including the trial-by-trial variability across repeats. Secondly, we computed the correlation of the predicted responses with the average responses across repeats and refer to it here as correlation to average. We also computed the fraction of explainable variance explained (FEVe) proposed by [68], which provides an unbiased estimate of the variance explained based on the expected neuronal response across image repetitions. However, our model computed different predictions for each repetition of a given test set image, because we also fed the behavioral parameters of each trial into the model. We thus simply averaged the model responses across repetitions and calculated the FEVe accordingly. When evaluating the model performance for the pharmacology conditions (Extended Data Fig. 2c), we found that they led to a lower model performance compared to the control condition. This could be due to the fact that, for the dilated condition, we did not incorporate pupil-related behavioral parameters into the model due to difficulties in pupil tracking for this pharmacological condition. For the drug condition with carbachol, we selected a subset of trials where the pupil was constricted (see pharmacological manipulations section above), which led to fewer trials to train the models with. Finally, for some of our datasets that had either a low number of trials or a low yield of neurons, we trained a single model on multiple datasets [19], such that the convolutional core of the model is trained with more examples. The training of the per-neuron readout is unaffected by this joint training of datasets. We assigned a model ID to each trained model, which can be found in Suppl. Table 3, such that datasets that were trained together in one model can be easily identified.
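A condensed sketch of this training procedure, with Poisson loss, early stopping on validation correlation, and learning-rate decay; the schedule values and the model/validation interfaces are placeholders:

```python
import copy
import torch

def poisson_loss(pred, target, eps=1e-8):
    return (pred - target * torch.log(pred + eps)).sum()

def train(model, train_loader, val_fn, lr=1e-3, patience=5, decay_steps=3, decay_factor=0.3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_state, best_val = None, -float('inf')
    for _ in range(decay_steps):
        epochs_without_improvement = 0
        while epochs_without_improvement < patience:
            for stimuli, behavior, responses in train_loader:
                opt.zero_grad()
                loss = poisson_loss(model(stimuli, behavior), responses)
                loss.backward()
                opt.step()
            val_corr = val_fn(model)                 # correlation on the validation set
            if val_corr > best_val:
                best_val = val_corr
                best_state = copy.deepcopy(model.state_dict())
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
        model.load_state_dict(best_state)            # restore best epoch, then decay the learning rate
        for group in opt.param_groups:
            group['lr'] *= decay_factor
    return model
```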
Ensemble models
For all analyses as well as for the generation of MEIs, we used an ensemble of models, rather than individual models. Instead of training just one model for each dataset, we trained 10 individual models that were initialized with different random seeds. We then selected the best 5 models as measured by their performance on the validation set to be part of a model ensemble. The inputs to the ensemble model were passed to each member, and the resulting predictions were averaged to obtain the final model prediction.
Generation of maximally exciting inputs
We used a variant of regularized gradient ascent on our trained deep neural network models to obtain a maximally exciting input image (MEI) for each neuron, given by an image $x \in \mathbb{R}^{h \times w \times c}$ with height $h$, width $w$, and $c$ channels. Because of our particular model inputs (see section Input of behavior and image position encoding), each MEI, like the natural images used for training, had seven channels, of which we optimized only the first two: the green and UV color channels. To obtain MEIs, we initialized a starting image with Gaussian white noise. We set the behavioral channels of the starting image to the desired behavioral values (usually <3rd and >97th percentile for quiet and active states, respectively). In addition, we set the position channels to the default position encoding. Then, in each iteration of our gradient ascent method, we presented the image to the model and computed the gradients of a single target neuron w.r.t. the two image channels (green and UV). We smoothed the gradient in each iteration by applying Gaussian blur with a $\sigma$ of 1 pixel. To constrain the contrast of the image, we calculated the Euclidean (L2) norm of the resulting MEI across all pixels of the two color channels and compared the L2 norm to a fixed norm budget $b$, which we set to 10. The norm budget can be effectively thought of as a contrast constraint. An L2 norm of 10, calculated across all pixel intensities of the image, proved to be optimal such that the resulting MEI had minimal and maximal values similar to those found in our training natural image distribution. If the image exceeded the norm budget during optimization, we divided the entire image by the factor $\lVert x \rVert_2 / b$, such that its norm matched the budget. Additionally, we made sure that the MEI could not contain values outside of the 8-bit pixel range by clipping the MEI outside of these bounds, corresponding to 0 or 255 pixel-intensity. As an optimizer, we used stochastic gradient descent with a learning rate of 3. We ran each optimization for 1000 iterations, without an option for early stopping. Our analyses showed that the resulting MEIs were highly correlated across behavioral states (Extended Data Fig. 5a-c). To validate this finding, we performed a control experiment using two separate models exclusively trained on trials from active or quiet states. We again split the trials into quiet and active periods using pupil size (quiet: <50th percentile, active: >75th percentile). When inspecting the MEIs generated from these two models, we found that the MEIs were again highly correlated across color channels, albeit less than for the model that was trained on the entire data. This can partly be explained by the limited amount of data for the model trained with trials from the active state, which occurred less frequently in our data. Furthermore, we found that the spatial structure of MEIs of anatomically matched neurons across control and pharmacology condition was very similar, suggesting that the two models trained separately both converged on the same tuning properties, despite differences in prediction performance (Extended Data Fig. 2).
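A simplified sketch of the MEI optimization loop; the blur width, step size, norm budget, and iteration count follow the text, while the model interface and the clipping bounds are placeholders:

```python
import torch
import torchvision.transforms.functional as TF

def optimize_mei(model, neuron_idx, behavior, shape=(2, 36, 64),
                 norm_budget=10.0, lr=3.0, n_steps=1000):
    """Gradient ascent on the green/UV channels; behavior channels are held fixed."""
    mei = torch.randn(1, *shape, requires_grad=True)
    for _ in range(n_steps):
        activation = model(mei, behavior)[0, neuron_idx]
        grad, = torch.autograd.grad(activation, mei)
        grad = TF.gaussian_blur(grad, kernel_size=5, sigma=1.0)   # smooth the gradient
        with torch.no_grad():
            mei += lr * grad                                      # ascent step
            norm = mei.norm()
            if norm > norm_budget:                                # project back onto the norm budget
                mei *= norm_budget / norm
            mei.clamp_(-1.7, 1.7)   # placeholder bounds standing in for the 8-bit pixel range
    return mei.detach()
```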
Spectral contrast
For estimating the chromatic preference of the recorded neurons, we used spectral contrast ($SC$). It is estimated as a Michelson contrast ranging from −1 to 1 for a neuron responding solely to UV and green image contrast, respectively. We decided to quantify spectral sensitivity in relative terms for each behavioral state, because visual responses to both green and UV stimuli are gain modulated in an active state. Therefore, interpretation of absolute response amplitudes to UV and green stimuli across behavioral states can be challenging. See Extended Data Fig. 6a,d for an illustration of how responses to stimuli of diverse spectral contrasts are gain modulated during an active state. We define $SC$ as
$SC = \frac{A_{green} - A_{UV}}{A_{green} + A_{UV}}$,
where $A_{green}$ and $A_{UV}$ correspond to (i) the norm of the green and UV MEI channel to estimate the neurons’ chromatic preference in the context of naturalistic scenes, (ii) the amplitude (mean of all pixels >90th percentile) of the green and UV spatial STA to estimate the neurons’ chromatic preference in the context of the sparse noise paradigm, (iii) the norm of the green and UV channel of reconstructed images to quantify chromatic preference at a population level and (iv) the norm of the green and UV channel of simulated Gabor RFs to obtain each simulated neuron’s chromatic preference.
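For completeness, the metric as a one-line helper:

```python
def spectral_contrast(a_green, a_uv):
    """Michelson-style contrast: +1 = responds only to green, -1 = responds only to UV."""
    return (a_green - a_uv) / (a_green + a_uv)
```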
In silico color tuning curves
To generate in silico color tuning curves for recorded V1 neurons, we systematically varied the L2 norm of the green and UV MEI channels while keeping the overall norm across both color channels constant (at norm = 10). We used n=50 spectral contrast levels, ranging from all contrast in the UV channel to all contrast in the green channel. We then showed the modified MEIs to the model and plotted the predicted responses across all n=50 spectral contrast levels. Modified MEIs were presented to the model with the behavioral channels set to either a quiet or an active state (see above).
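A minimal sketch of this procedure, under the assumption that a helper `predict(green, uv)` returns the model response for a given pair of color channels at a fixed behavioral state (all names illustrative):

```python
import numpy as np

def color_tuning_curve(predict, mei_green, mei_uv, total_norm=10.0, n_levels=50):
    responses = []
    for w_uv in np.linspace(0.0, 1.0, n_levels):          # fraction of contrast in the UV channel
        green = mei_green / np.linalg.norm(mei_green) * np.sqrt(1.0 - w_uv) * total_norm
        uv = mei_uv / np.linalg.norm(mei_uv) * np.sqrt(w_uv) * total_norm
        responses.append(predict(green, uv))              # overall norm across channels stays at total_norm
    return np.array(responses)
```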
Temporal dynamics of shift in color tuning with behavior
To investigate the timescale of the shift in color selectivity with behavior, we tested how quickly the shift could be observed after a transition from a quiet to an active behavioral state. For that, we identified state changes from quiet to active periods by detecting rapid increases in pupil size above a certain threshold (>95th percentile of the differentiated pupil size trace) following a prolonged quiet period (>5 seconds below the 50th percentile of pupil size). Results were consistent across varying thresholds (data not shown). We then sampled active trials (pupil size >75th percentile) within varying read-out windows (1, 2, 3, 5 and 10 seconds) after each state change. Model training was performed on all quiet trials (<50th percentile of pupil size) and the selected active trials. MEIs and STAs were then estimated as described above.
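The transition detection can be sketched as follows; the sampling rate and variable names are assumptions:

```python
import numpy as np

def find_transitions(pupil, fs=30.0, quiet_s=5.0):
    """Return sample indices of quiet-to-active transitions in a pupil size trace."""
    dpupil = np.diff(pupil, prepend=pupil[0])
    rise_thresh = np.percentile(dpupil, 95)          # rapid pupil dilation
    quiet_thresh = np.percentile(pupil, 50)          # quiet-state pupil size
    n_quiet = int(quiet_s * fs)
    transitions = []
    for t in np.flatnonzero(dpupil > rise_thresh):
        # keep only dilations preceded by >= quiet_s seconds of small pupil
        if t >= n_quiet and np.all(pupil[t - n_quiet:t] < quiet_thresh):
            transitions.append(t)
    return np.array(transitions)
```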
Reconstruction analysis
We visualized which image features the population of model neurons is sensitive to by using a novel resource-constrained image reconstruction method based on the responses of a population of model neurons [32]. The idea behind a resource-constrained reconstruction is to recreate the responses of a population of neurons to a target image by optimizing a novel image such that the responses it elicits match the responses to the target image as closely as possible. By limiting the image contrast of the reconstructed image during optimization, the reconstruction will only contain the image features that are most relevant for recreating the population responses, thereby visualizing the sensitivities and invariances of the population of neurons. As target images for our reconstruction, we chose natural images from our test set. For each reconstruction, we first calculated the responses of all model neurons when presented with the target image $x$. We then initialized an image $\hat{x}$ with Gaussian white noise and reconstructed the target image by minimizing the squared loss between the target responses $r(x)$ and the responses $r(\hat{x})$ to the reconstructed image, subject to a norm constraint. In this work, we set the contrast (i.e., L2 norm; see section Generation of maximally exciting inputs for details) of the reconstructions to 40, which corresponds to ~60% of the average norm of our natural image stimuli. We chose this value to be high enough to still allow for qualitative resemblance between the reconstructed image and the target, while keeping the constraint tight enough to avoid an uninformative trivial solution, i.e., an identical reconstruction of the target. We improved the quality of the reconstructions by using an augmented version of our model, which reads out each neuronal response not from the model neuron's actual receptive field position (see Readouts for details), but instead from all height-times-width positions in feature space, except the n=10 pixels around each border to avoid padding artefacts. This yielded 18 ∗ 46 = 828 copies per neuron and, with the N=478 original model neurons, resulted in overall n=395,784 augmented neurons for our reconstruction analyses. We found stochastic gradient descent with a learning rate of 1000 to yield the qualitatively best reconstructions, resulting in images with the least amount of noise. We always optimized for 5000 steps per image, without early stopping.
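The core of this reconstruction can be sketched as follows, assuming `model` maps a color image to the responses of the (augmented) neuron population; names and optimization details are illustrative rather than the exact implementation:

```python
import torch

def reconstruct(model, target_img, norm_budget=40.0, n_iter=5000, lr=1000.0):
    with torch.no_grad():
        target_resp = model(target_img)                      # responses to the target image
    recon = torch.randn_like(target_img, requires_grad=True) # white-noise initialization
    optimizer = torch.optim.SGD([recon], lr=lr)
    for _ in range(n_iter):
        optimizer.zero_grad()
        loss = ((model(recon) - target_resp) ** 2).mean()    # squared loss on population responses
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            norm = recon.norm(p=2)
            if norm > norm_budget:                           # enforce the contrast budget
                recon.div_(norm / norm_budget)
    return recon.detach()
```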
Decoding analysis
We used a support vector machine (SVM) classifier with a radial basis function kernel to estimate decoding accuracy between the neural representations of two stimulus classes: either object 1 versus object 2 (object discrimination) or dark object versus no object (object detection). For that, we used all neurons recorded within one scan and built four separate decoders, one each for UV and green stimuli during small- and large-pupil trials. We then trained each decoder on randomly selected training trials (usually 176 trials, but only 60–126 trials for n=3 scans due to a lower number of trials with locomotion activity), tested its accuracy on randomly selected test trials (15% of the training trials), and computed the mean accuracy across n=10 different training/test splits. Finally, we converted the decoding accuracy into discriminability, the mutual information between the true class and its estimate, using
$$I(s;\hat{s}) = \sum_{s,\hat{s}} p(s,\hat{s}) \,\log_2 \frac{p(s,\hat{s})}{p(s)\,p(\hat{s})},$$

where $p(s,\hat{s})$ is the probability of observing the true class $s$ together with the predicted class $\hat{s}$, and $p(s)$ and $p(\hat{s})$ denote the respective marginal probabilities.
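As an illustration (not the exact analysis code), a single decoder and the conversion of its predictions into discriminability could look as follows; the toy data stand in for trial-wise neural responses:

```python
import numpy as np
from sklearn.svm import SVC

def discriminability(y_true, y_pred):
    """Mutual information (bits) between true and predicted class labels."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    joint = np.zeros((len(classes), len(classes)))
    for i, s in enumerate(classes):
        for j, s_hat in enumerate(classes):
            joint[i, j] = np.mean((y_true == s) & (y_pred == s_hat))
    p_s = joint.sum(axis=1, keepdims=True)       # marginal over true classes
    p_hat = joint.sum(axis=0, keepdims=True)     # marginal over predicted classes
    nonzero = joint > 0
    return np.sum(joint[nonzero] * np.log2(joint[nonzero] / (p_s @ p_hat)[nonzero]))

# toy usage with random data standing in for neural responses
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 50)), rng.integers(0, 2, 200)
decoder = SVC(kernel="rbf").fit(X[:170], y[:170])
print(discriminability(y[170:], decoder.predict(X[170:])))
```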
To quantify significance for each animal, we compared the observed shift in decoding performance of UV versus green objects across behavioral states with a distribution of shifts (n=500) obtained when shuffling the labels of quiet and active trials. More specifically, we sampled half of the training and test data from quiet trials and the other half from active trials at random, and trained SVMs to compute the decoding accuracy for this particular shuffling. We repeated this n=500 times and obtained a p-value as the upper-tail quantile of the observed shift within the distribution of shuffled shifts.
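A simplified sketch of this shuffle test is given below; `fit_and_score` is an assumed helper that trains a decoder on a given set of trials and returns its accuracy, and the statistic is reduced to a generic quiet-versus-active shift for brevity:

```python
import numpy as np

def shuffle_p_value(quiet_trials, active_trials, observed_shift, fit_and_score, n_shuffles=500):
    shifts = []
    for _ in range(n_shuffles):
        q = np.random.permutation(quiet_trials)                  # scramble state labels
        a = np.random.permutation(active_trials)
        fake_quiet = np.concatenate([q[:len(q) // 2], a[:len(a) // 2]])
        fake_active = np.concatenate([q[len(q) // 2:], a[len(a) // 2:]])
        shifts.append(fit_and_score(fake_active) - fit_and_score(fake_quiet))
    return np.mean(np.array(shifts) >= observed_shift)           # upper-tail p-value
```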
Response reliability
We calculated the signal-to-noise ratio (SNR, [68]) as our measure of response reliability. It is defined as follows:

$$\mathrm{SNR} = \frac{\frac{1}{S-1}\sum_{s}\left(\mu_s - \bar{\mu}\right)^2}{\sigma^2}$$
The SNR expresses the ratio of the variance in the expected responses to the trial-by-trial variability across repeats. Here, $\mu_s = \frac{1}{R}\sum_{r=1}^{R} \tilde{r}_{sr}$ corresponds to the expected response to stimulus $s$, averaged over its $R$ repeats, with the average expected response given as

$$\bar{\mu} = \frac{1}{S}\sum_{s=1}^{S} \mu_s.$$
The trial-by-trial variance $\sigma^2$ is computed by averaging the variance across repeats over all stimuli, $\sigma^2 = \frac{1}{S}\sum_{s=1}^{S} \operatorname{Var}_r\!\left[\tilde{r}_{sr}\right]$. We assume that $\sigma^2$ is constant across all responses to different stimuli. This is achieved by a variance-stabilizing transform of the responses $r_{sr}$, for which we use the Anscombe transformation. We thus obtain the transformed responses as follows:

$$\tilde{r}_{sr} = 2\sqrt{r_{sr} + \tfrac{3}{8}}.$$
The SNR has been shown to be a reliable estimate of data quality for neuronal responses across diverse recording modalities and brain regions [68].
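In practice, the SNR can be computed from a stimulus-by-repeat response matrix as in the following sketch; the array layout and function name are assumptions:

```python
import numpy as np

def response_snr(responses):
    """SNR of a response matrix of shape (n_stimuli, n_repeats)."""
    r = 2.0 * np.sqrt(responses + 3.0 / 8.0)       # Anscombe variance-stabilizing transform
    mu = r.mean(axis=1)                            # expected response per stimulus
    signal_var = mu.var(ddof=1)                    # variance of expected responses across stimuli
    noise_var = r.var(axis=1, ddof=1).mean()       # trial-to-trial variance, averaged over stimuli
    return signal_var / noise_var
```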
Statistical analysis
We used Generalized Additive Models (GAMs) to analyze the relationship between MEI spectral contrast, cortical position and behavioral state (see Suppl. Statistical Analysis for details). GAMs extend the generalized linear model by allowing the linear predictors to depend on arbitrary smooth functions of the underlying variables [69]. In practice, we used the mgcv package for R to implement GAMs and perform statistical testing. For all other statistical tests, we used the Wilcoxon signed-rank test and two- or one-sample t-tests.
Extended Data
Supplementary Material
Acknowledgements:
We thank Greg Horwitz, Thomas Euler, Mackenzie Mathis, Tom Baden, Lara Höfling and Yongrong Qiu for feedback on the manuscript and Donnie Kim, Daniel Sitonic, Dat Tran, Zhuokun Ding, Konstantin Lurz, Mohammad Bashiri, Christoph Blessing and Edgar Walker for technical support and helpful discussions. This work was supported by the Carl-Zeiss-Stiftung (FS), the DFG Cluster of Excellence “Machine Learning – New Perspectives for Science” (FS; EXC 2064/1, project number 390727645), the AWS Machine Learning research award (FS), and the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DoI/IBC) contract number D16PC00003 (AST). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/IBC, or the U.S. Government. Also supported by R01 EY026927 (AST), NEI/NIH Core Grant for Vision Research (T32-EY-002520-37) and NSF NeuroNex grant 1707400 (AST).
Footnotes
Code availability
Our coding framework uses general tools like PyTorch, Numpy, scikit-image, matplotlib, seaborn, DataJoint, Jupyter, and Docker. We also used the following custom libraries and code: neuralpredictors (https://github.com/sinzlab/neuralpredictors) for torch-based custom functions for model implementation, nnfabrik (https://github.com/sinzlab/nnfabrik) for automatic model training pipelines using DataJoint, nndichromacy (https://github.com/sinzlab/nndichromacy) for utilities, and mei (https://github.com/sinzlab/mei) for stimulus optimization.
Competing interests: The authors declare no competing interests.
Data availability
The stimulus images and neural data used in this paper are stored at .
References
- [1]. Reimer J. et al. Pupil fluctuations track fast switching of cortical states during quiet wakefulness. Neuron 84 (2), 355–362 (2014).
- [2]. Niell CM & Stryker MP. Modulation of visual responses by behavioral state in mouse visual cortex. Neuron 65 (4), 472–479 (2010).
- [3]. Vinck M, Batista-Brito R, Knoblich U. & Cardin JA. Arousal and locomotion make distinct contributions to cortical activity patterns and visual encoding. Neuron 86 (3), 740–754 (2015).
- [4]. Treue S. & Maunsell JH. Attentional modulation of visual motion processing in cortical areas MT and MST. Nature 382 (6591), 539–541 (1996).
- [5]. Erisken S. et al. Effects of locomotion extend throughout the mouse early visual system. Curr. Biol. 24 (24), 2899–2907 (2014).
- [6]. Reimer J. et al. Pupil fluctuations track rapid changes in adrenergic and cholinergic activity in cortex. Nat. Commun. 7, 13289 (2016).
- [7]. Bennett C, Arroyo S. & Hestrin S. Subthreshold mechanisms underlying state-dependent modulation of visual responses. Neuron 80 (2), 350–357 (2013).
- [8]. Liang L. et al. Retinal inputs to the thalamus are selectively gated by arousal. Curr. Biol. 30 (20), 3923–3934.e9 (2020).
- [9]. McAdams CJ & Maunsell JH. Effects of attention on orientation-tuning functions of single neurons in macaque cortical area V4. J. Neurosci. 19 (1), 431–441 (1999).
- [10]. Qiu Y. et al. Natural environment statistics in the upper and lower visual field are reflected in mouse retinal specializations. Curr. Biol. (2021).
- [11]. Rowell CH. Variable responsiveness of a visual interneurone in the free-moving locust, and its relation to behaviour and arousal. Journal of Experimental Biology (1971).
- [12]. Chiappe ME, Seelig JD, Reiser MB & Jayaraman V. Walking modulates speed sensitivity in Drosophila motion vision. Curr. Biol. 20 (16), 1470–1475 (2010).
- [13]. Busse L. The influence of locomotion on sensory processing and its underlying neuronal circuits. eNeuroforum 24 (1), A41–A51 (2018).
- [14]. Schneider DM. Reflections of action in sensory cortex. Curr. Opin. Neurobiol. 64, 53–59 (2020).
- [15]. Gerl EJ & Morris MR. The causes and consequences of color vision. Evolution: Education and Outreach 1 (4), 476–486 (2008).
- [16]. Szél A. et al. Unique topographic separation of two spectral classes of cones in the mouse retina. J. Comp. Neurol. 325 (3), 327–342 (1992).
- [17]. Baden T. et al. A tale of two retinal domains: near-optimal sampling of achromatic contrasts in natural scenes through asymmetric photoreceptor distribution. Neuron 80 (5), 1206–1217 (2013).
- [18]. Walker EY et al. Inception loops discover what excites neurons most using deep predictive models. Nat. Neurosci. 22 (12), 2060–2065 (2019).
- [19]. Lurz K-K et al. Generalization in data-driven models of primary visual cortex (2021).
- [20]. Bashivan P, Kar K. & DiCarlo JJ. Neural population control via deep image synthesis. Science 364 (6439) (2019). 10.1126/science.aav9436
- [21]. Franke K. et al. An arbitrary-spectrum spatial visual stimulator for vision research. Elife 8 (2019).
- [22]. Liu R. et al. An intriguing failing of convolutional neural networks and the CoordConv solution (2018). arXiv:1807.03247 [cs.CV].
- [23]. Rhim I, Coello-Reyes G, Ko H-K & Nauhaus I. Maps of cone opsin input to mouse V1 and higher visual areas. J. Neurophysiol. 117 (4), 1674–1682 (2017).
- [24]. Denman DJ, Siegle JH, Koch C, Reid RC & Blanche TJ. Spatial organization of chromatic pathways in the mouse dorsal lateral geniculate nucleus. J. Neurosci. 37 (5), 1102–1116 (2017).
- [25]. Rhim I, Coello-Reyes G. & Nauhaus I. Variations in photoreceptor throughput to mouse visual cortex and the unique effects on tuning. Sci. Rep. 11 (1), 1–21 (2021).
- [26]. Fu Y. et al. A cortical circuit for gain control by behavioral state. Cell 156 (6), 1139–1152 (2014).
- [27]. Schröder S. et al. Arousal modulates retinal output. Neuron 107 (3), 487–495.e9 (2020).
- [28]. Eggermann E, Kremer Y, Crochet S. & Petersen CCH. Cholinergic signals in mouse barrel cortex during active whisker sensing. Cell Rep. 9 (5), 1654–1660 (2014).
- [29]. Tikidji-Hamburyan A. et al. Retinal output changes qualitatively with every change in ambient illuminance. Nat. Neurosci. 18 (1), 66–74 (2015).
- [30]. Grimes WN, Schwartz GW & Rieke F. The synaptic and circuit mechanisms underlying a change in spatial encoding in the retina. Neuron 82 (2), 460–473 (2014).
- [31]. Pennesi ME, Lyubarsky AL & Pugh EN Jr. Extreme responsiveness of the pupil of the dark-adapted mouse to steady retinal illumination. Invest. Ophthalmol. Vis. Sci. 39 (11), 2148–2156 (1998).
- [32]. Safarani S. et al. Towards robust vision by multi-task learning on monkey visual cortex (2021). arXiv:2107.14344.
- [33]. Bialek W, Rieke F, de Ruyter van Steveninck RR & Warland D. Reading a neural code. Science 252 (5014), 1854–1857 (1991).
- [34]. Froudarakis E. et al. Object manifold geometry across the mouse cortical visual hierarchy (2020).
- [35]. Dadarlat MC & Stryker MP. Locomotion enhances neural encoding of visual stimuli in mouse V1. J. Neurosci. 37 (14), 3764–3775 (2017).
- [36]. Spitzer H, Desimone R. & Moran J. Increased attention enhances both behavioral and neuronal performance. Science 240 (4850), 338–340 (1988).
- [37]. Wiersma CA & Oberjat T. The selective responsiveness of various crayfish oculomotor fibers to sensory stimuli. Comp. Biochem. Physiol. 26 (1), 1–16 (1968).
- [38]. Maimon G, Straw AD & Dickinson MH. Active flight increases the gain of visual motion processing in Drosophila. Nat. Neurosci. 13 (3), 393–399 (2010).
- [39]. Bezdudnaya T. et al. Thalamic burst mode and inattention in the awake LGNd. Neuron 49 (3), 421–432 (2006).
- [40]. de Gee JW et al. Mice regulate their attentional intensity and arousal to exploit increases in task utility (2022).
- [41]. Andermann ML, Kerlin AM, Roumis DK, Glickfeld LL & Reid RC. Functional specialization of mouse higher visual cortical areas. Neuron 72 (6), 1025–1039 (2011).
- [42]. Cronin TW & Bok MJ. Photoreception and vision in the ultraviolet. J. Exp. Biol. 219 (Pt 18), 2790–2801 (2016).
- [43]. Hulburt EO. Explanation of the brightness and color of the sky, particularly the twilight sky. J. Opt. Soc. Am. 43 (2), 113–118 (1953).
- [44]. Storchi R. et al. Measuring vision using innate behaviours in mice with intact and impaired retina function. Sci. Rep. 9 (1), 10396 (2019).
- [45]. Meyer AF, Poort J, O'Keefe J, Sahani M. & Linden JF. A head-mounted camera system integrates detailed behavioral monitoring with multichannel electrophysiology in freely moving mice. Neuron 100 (1), 46–60.e7 (2018).
- [46]. Wald G. Human vision and the spectrum. Science 101 (2635), 653–658 (1945).
- [47]. Lamb TD. Why rods and cones? Eye 30 (2), 179–185 (2016).
- [48]. Larsen RS & Waters J. Neuromodulatory correlates of pupil dilation. Front. Neural Circuits 12, 21 (2018).
- [49]. Douglas RH. The pupillary light responses of animals; a review of their distribution, dynamics, mechanisms and functions. Prog. Retin. Eye Res. 66, 17–48 (2018).
- [50]. Eberhardt LV, Grön G, Ulrich M, Huckauf A. & Strauch C. Direct voluntary control of pupil constriction and dilation: Exploratory evidence from pupillometry, optometry, skin conductance, perception, and functional MRI. Int. J. Psychophysiol. 168, 33–42 (2021).
- [51]. Froudarakis E. et al. Population code in mouse V1 facilitates readout of natural scenes through increased sparseness. Nat. Neurosci. 17 (6), 851–857 (2014).
- [52]. Mathis A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21 (9), 1281–1289 (2018).
- [53]. Garrett ME, Nauhaus I, Marshel JH & Callaway EM. Topography and areal organization of mouse visual cortex. J. Neurosci. 34 (37), 12587–12600 (2014).
- [54]. Sofroniew NJ, Flickinger D, King J. & Svoboda K. A large field of view two-photon mesoscope with subcellular resolution for in vivo imaging. Elife 5, e14472 (2016).
- [55]. Pnevmatikakis EA et al. Simultaneous denoising, deconvolution, and demixing of calcium imaging data. Neuron 89 (2), 285–299 (2016).
- [56]. Henriksson JT, Bergmanson JPG & Walsh JE. Ultraviolet radiation transmittance of the mouse eye and its individual media components. Exp. Eye Res. 90 (3), 382–387 (2010).
- [57]. Schmucker C. & Schaeffel F. A paraxial schematic eye model for the growing C57BL/6 mouse. Vision Res. 44 (16), 1857–1867 (2004).
- [58]. Russakovsky O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115 (3), 211–252 (2015).
- [59]. Grozdanic S. et al. Characterization of the pupil light reflex, electroretinogram and tonometric parameters in healthy mouse eyes. Curr. Eye Res. 26 (6), 371–378 (2003).
- [60]. Szatko KP et al. Neural circuits in the mouse retina support color vision in the upper visual field. Nat. Commun. 11 (1), 3481 (2020).
- [61]. Yoshimatsu T, Schröder C, Nevala NE, Berens P. & Baden T. Fovea-like photoreceptor specializations underlie single UV cone driven prey-capture behavior in zebrafish. Neuron 107 (2), 320–337.e6 (2020).
- [62]. Perlin K. An image synthesizer. SIGGRAPH Comput. Graph. 19 (3), 287–296 (1985).
- [63]. Schwartz O, Pillow JW, Rust NC & Simoncelli EP. Spike-triggered neural characterization. J. Vis. 6 (4), 484–507 (2006).
- [64]. Ioffe S. & Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. ICML'15, 448–456 (JMLR.org, 2015).
- [65]. Clevert D-A, Unterthiner T. & Hochreiter S. Fast and accurate deep network learning by exponential linear units (ELUs) (2015). arXiv:1511.07289 [cs.LG].
- [66]. Chollet F. Xception: Deep learning with depthwise separable convolutions (2017).
- [67]. Kingma DP & Ba J. Adam: A method for stochastic optimization. In Bengio Y. & LeCun Y. (eds) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (2015).
- [68]. Pospisil DA & Bair W. The unbiased estimation of the fraction of variance explained by a model (2020). 10.1101/2020.10.30.361253.
- [69]. Wood SN. Generalized additive models: an introduction with R (Chapman and Hall/CRC, 2006).
- [70]. Tan Z, Sun W, Chen T-W, Kim D. & Ji N. Neuronal representation of ultraviolet visual stimuli in mouse primary visual cortex. Scientific Reports 5 (1) (2015). 10.1038/srep12597.
- [71]. Mouland JW et al. Extensive cone-dependent spectral opponency within a discrete zone of the lateral geniculate nucleus supporting mouse color vision. Curr. Biol. 31 (15), 3391–3400.e4 (2021).