Abstract
Visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: they are either innate to the visual system or arise through reentry from frontal emotion-processing structures such as the amygdala. We examined this problem by combining a convolutional neural network (CNN) model of the human ventral visual cortex pre-trained on ImageNet with two datasets of affective images. Our results show that (1) in all layers of the CNN model, there were artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images and (2) lesioning these neurons by setting their output to 0, or enhancing them by increasing their gain, led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have the innate ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
Introduction
Human emotions are complex and multifaceted, shaped by many factors, including individual differences, cultural influences, and the context in which the emotion is experienced (1–5). Still, a large number of people, across different cultures, levels of knowledge, and backgrounds, experience similar feelings when viewing images of varying affective content (6–9). Which fundamental principles of human visual function underlie such universality remains to be elucidated.
Previous studies of emotion perception have primarily relied on empirical cognitive experiments (10–12). Some have focused on capturing human behavioral valence or arousal judgments on affective images (13–16), while others have recorded brain activity to look for neural correlates of affective stimulus processing (17–21). Despite decades of effort, how the brain transforms visual stimuli into subjective emotion judgments (e.g., happy, neutral, or unhappy) remains poorly understood. The advent of machine learning, especially artificial neural networks (ANNs), opens the possibility of attacking this problem with a modeling approach.
In machine learning, artificial neural networks project raw visual information into a feature space and encode features as the activation patterns of hidden layers for object recognition. One type of artificial neural network, the convolutional neural network (CNN), owing to its hierarchical organization resembling that of the visual system, is increasingly used as a model of visual processing in the primate brain (22–26). CNNs trained to recognize visual objects can achieve performance levels rivaling or even exceeding that of humans. Interestingly, CNNs trained on images from databases such as ImageNet (27) have been found to demonstrate neural selectivity to stimuli that are not in the training data. For instance, (28) showed that neurons in a CNN trained on ImageNet became selective for numbers without having been trained on any “number” datasets. (29) demonstrated that face-selective neurons could arise in a CNN model trained to recognize nonface objects. These studies raise the natural question of whether the brain might use similar neural mechanisms to extract relevant features from stimuli and selectively use them to guide emotion judgments.
The role of the visual cortex in visual emotion processing is debated (30, 31). (32) argued that emotion representation is an intrinsic property of the visual cortex. They used a CNN pre-trained on ImageNet to show that the model can accurately predict the emotion categories of affective images. (20), on the other hand, showed that the affective representations found in the visual cortex during affective scene processing might arise as the result of reentry from anterior emotion modulating structures such as the amygdala. The goal of this study is to further examine this question using CNN models.
CNN models are well suited for addressing questions related to the human visual system. Among the many well-established CNN models, VGG-16 (33) has an intermediate level of complexity and has been shown to have strong object recognition performance (34). Using VGG-16, recent cognitive neuroscience studies have explored how encoding and decoding of sensory information are hierarchically processed in the brain (23, 35, 36). (23) used VGG-16 to quantitatively demonstrate an explicit gradient of feature complexity encoded in the ventral visual pathway. (35) used VGG-16 to model the visual cortical activity of human participants viewing images of objects and demonstrated that different layers of the model correlate strongly with brain activity in different visual areas. (36) investigated qualitative similarities and differences between VGG-16 and other feed-forward CNNs in the representation of visual objects and showed that these CNNs exhibit multiple perceptual and neural phenomena such as the Thatcher effect (37), Weber’s law (38), relative size, image object selectivity, and so on.
In this study, we adopted VGG-16 pre-trained on ImageNet as the model of the human visual system. Using two well-established affective image datasets, the International Affective Picture System (IAPS) (15) and the Nencki Affective Picture System (NAPS) (16), we tested whether emotion selectivity can spontaneously emerge in such systems. Emotion selectivity was established by computing neural responses to three broad classes of images: pleasant, neutral, and unpleasant (tuning curves). A neuron is considered selective for a particular emotion if it exhibits the strongest responses to images of that category from both datasets. To test whether these neurons have a functional role, we replaced the last layer of the VGG-16 with two units and trained the connections to this layer to decode pleasant versus non-pleasant, neutral versus non-neutral, and unpleasant versus non-unpleasant images. Two neural manipulations were carried out: lesion and feature attention enhancement. Lesioning the neurons selective for a specific emotion is expected to degrade the network’s performance in recognizing that emotion, whereas applying attention enhancement to the selective neurons is expected to increase the network’s performance in recognizing that emotion.
Results
We tested whether emotion selectivity can naturally arise in a CNN model trained to recognize non-emotional natural images. VGG-16 pre-trained on ImageNet data (27) was used for this purpose (see Figure 1). Selectivity for pleasant, neutral, and unpleasant emotions was defined for each neuron based on its response profiles to images from two affective picture sets (IAPS and NAPS). The functional significance of these neurons was then assessed using lesion and attention enhancement methods.
Fig. 1. The architecture of VGG-16.
We used the VGG-16 pre-trained on ImageNet to model the visual system. VGG-16 has 13 convolutional layers and three fully connected (FC) layers. Each convolutional layer (light yellow color) is followed by a ReLU activation layer (yellow color) and a max-pooling layer (red color). Each FC layer (light purple color) is followed by a ReLU layer (purple color). The last FC layer is followed by a ReLU and a SoftMax layer (dark purple color). Affective images from two datasets (IAPS and NAPS) were presented to the model to define emotion-selectivity of neurons in the convolutional layers (see details in the Methods section). For attention enhancement and lesion experiments, we replaced the last FC layer with 2 units to represent different emotion categories: (1) pleasant vs. non-pleasant; (2) neutral vs. non-neutral; (3) unpleasant vs. non-unpleasant.
Emotion selectivity of neurons in different convolutional layers of VGG-16
The tuning curve for a neuron is defined as the normalized mean response to pleasant, neutral, and unpleasant images in a given dataset, plotted as a function of the emotion category. The maximum of the tuning curve indicates the neuron’s preferred emotion category for that picture set. Given that the two picture sets (IAPS and NAPS) each contain their own inevitable idiosyncrasies, to guard against noise and spurious effects, a neuron is defined to be selective for a given emotion if it exhibits the maximum response to the same emotion category for both datasets.
Figure 2A (top) shows the tuning curves of three neurons from Convolutional Layer 3 (an early layer). According to the definition above, these neurons are selective for pleasant, neutral, and unpleasant categories, respectively. For the top 100 images from IAPS and NAPS that elicited the strongest responses in these neurons, Figure 2A (bottom) shows the valence distribution of these images. As can be seen, for these early-layer neurons, while the pleasant neuron is more activated by images with high valence ratings (pleasant), for the neutral and unpleasant neurons the patterns are less clear. For the neurons in Convolutional Layer 6 (a middle layer), however, as shown in Figure 2B, their emotion selectivity and the category of images they prefer show greater agreement. Namely, the pleasant neuron prefers predominantly images with high valence (pleasant), the neutral neuron prefers predominantly images with intermediate valence (neutral), and the unpleasant neuron prefers predominantly images with low valence (unpleasant). The results for the three neurons from Convolutional Layer 13 (a deep layer) are similar to those from Layer 6; see Figure 2C.
Fig. 2. Tuning curves and emotion selectivity.

(A-C) Tuning curves of example neurons from different convolutional layers (top panel) along with the valence distribution of the top 100 images that elicited the strongest responses from a given neuron. (D) Tuning quality as a function of layers, calculated on the VGG-16 pre-trained on ImageNet (left) and on a randomly initialized network (right), respectively. The neurons in deeper layers of the pre-trained network show stronger tuning quality, especially for the IAPS dataset, suggesting that they are better able to differentiate different emotional categories, while no such effects are found for the randomly initialized network.
Do different emotion categories become better differentiated as we go from earlier to deeper layers? We used tuning quality, defined as the maximum absolute tuning value of a neuron (39), to address this question. As shown in Figure 2D, emotional tuning became stronger as one ascended from early to deep layers, an effect that is especially noticeable for the IAPS dataset, supporting the notion that emotion differentiability increases from earlier to deeper layers. One of the computational principles embodied in CNN models of the brain is that earlier-layer neurons encode lower-level stimulus properties (e.g., colors and edges), whereas deeper-layer neurons encode higher-level properties such as semantic meaning (e.g., object identities) (40–42). The results in Figure 2, suggesting that from earlier to deeper layers (1) the emotion selectivity of a neuron becomes more consistent with the valence of its preferred images and (2) different emotion categories become better differentiated, are in general agreement with this principle and add emotion as another higher-level concept encoded in deeper layers of a CNN model.
Generalizability of emotion-selective neurons
In the foregoing, a neuron was considered selective for a given emotion only if it was selective consistently for both the IAPS and NAPS datasets. A natural question is whether such neurons arise as the result of random chance or as an emergent property of the network that is generalizable across datasets. To address this question, we quantified the overlap between two groups of randomly selected neurons using the Jaccard Index (JI) (43–46) and compared it to the overlap between the two groups of neurons defined to be selective for a given emotion by IAPS and by NAPS. The value of the Jaccard Index ranges from 0.0 to 1.0, with 0 indicating no overlap and 1 total overlap. If emotion selectivity arises randomly, then the JI of two random sets of neurons will not differ from that derived from the overlap between IAPS-identified and NAPS-identified neurons selective for the same emotion. As shown in Figure 3 (left), for the VGG-16 pre-trained on ImageNet, the overlap between IAPS-identified and NAPS-identified emotion-selective neurons is significantly higher than that between two randomly selected groups of neurons in most layers, and the effect is more prominent in deeper than in earlier layers. These results suggest that emotion selectivity is unlikely to be the result of random chance but is instead an emergent property of the trained network. For the randomly initialized VGG-16, shown in Figure 3 (right), the overlap between emotion-selective neurons defined by IAPS and by NAPS and that between two randomly selected groups of neurons is largely the same, except for a small difference for the pleasant neurons.
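The overlap statistic used above is straightforward to compute. A minimal sketch (the neuron indices below are hypothetical, for illustration only):

```python
def jaccard_index(set_a, set_b):
    """Jaccard Index: |A ∩ B| / |A ∪ B|; 0 = disjoint, 1 = identical."""
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

# Hypothetical indices of neurons deemed pleasant-selective by each dataset
iaps_pleasant = {3, 17, 42, 88, 104}
naps_pleasant = {3, 42, 88, 120}
print(jaccard_index(iaps_pleasant, naps_pleasant))  # 0.5
```

The same function serves for the random-control comparison: draw two random index sets of matching sizes and compare their JI with that of the dataset-defined sets.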
Fig. 3. Generalizability of emotion selectivity across the two datasets.
Blue: the overlap between the group of neurons defined to be selective for a given emotion category by IAPS and that by NAPS was assessed using Jaccard Index in each layer of the pre-trained VGG-16 (left column) and a randomly initialized network (right column). Orange: the overlap between two groups of randomly selected neurons.
The functionality of emotion-selective neurons
To test whether emotion-selective neurons have a functional role, we followed (39) and replaced the last layer of the VGG-16, which originally contained 1,000 units for recognizing 1,000 different object categories, with a fully connected layer containing two units for recognizing two types of emotion. Three models were trained and tested for each of the two datasets: Model 1: pleasant versus non-pleasant; Model 2: neutral versus non-neutral; and Model 3: unpleasant versus non-unpleasant. Once these models were shown to have adequate emotion recognition performance, two neural manipulations were considered: feature attention enhancement and lesion. For feature attention enhancement (47–49), the gain of the neurons selective for a given emotion was increased by increasing the slope of the ReLU activation function (see Methods) (50–53), whereas for lesion, the output of the neurons selective for a given emotion was set to 0, which effectively removes the contribution of these neurons, i.e., they are lesioned. We hypothesized that (1) with attention enhancement, the network’s ability to recognize the corresponding emotion would increase and (2) with lesion, it would decrease.
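The gain manipulation can be sketched with a PyTorch forward hook that scales the post-ReLU output of selected feature maps; because ReLU outputs are nonnegative, multiplying them by a factor is equivalent to raising the ReLU slope for those channels. The toy layer and channel indices below are illustrative assumptions, not the study's actual configuration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def gain_hook(selected_channels, slope):
    """Forward hook: scale the post-ReLU output of selected feature maps
    by `slope`, emulating an increased ReLU slope (neuronal gain) for
    those channels only."""
    def hook(module, inputs, output):
        out = output.clone()
        out[:, selected_channels] *= slope
        return out
    return hook

# Toy stand-in for one convolutional block of VGG-16 (illustrative only)
layer = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
x = torch.randn(1, 3, 16, 16)

handle = layer[1].register_forward_hook(gain_hook([2, 5], slope=1.5))
enhanced = layer(x)   # channels 2 and 5 scaled by 1.5
handle.remove()
baseline = layer(x)   # same input, hook removed
```

In the study's setup, the hypothetical channels 2 and 5 would correspond to the emotion-selective feature maps of the layer under manipulation.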
Feature attention enhancement:
For IAPS images, Figure 4A compares the performance changes after enhancing the emotion-selective neurons with those after enhancing the same number of randomly sampled neurons. The optimal tuning strength, at which the best performance enhancement was achieved, was chosen for each layer in the plot. As one can see, for pleasant versus non-pleasant, neutral versus non-neutral, and unpleasant versus non-unpleasant emotions, enhancing the gain of the neurons selective for a specific emotion significantly improved the emotion recognition performance of the CNN model for that emotion. Moreover, deeper-layer gain enhancement tends to yield greater performance improvements than earlier-layer gain enhancement. Increasing the gain of randomly selected neurons, however, produced only marginal performance changes, with no systematic trends across layers. The feature-attention advantage of emotion-selective neurons over random neurons is highly statistically significant in the middle and deeper layers. Figure 4A (right) shows the performance changes across layers as the tuning strength varied from 0 to 5. Again, it is apparent that deeper-layer tuning yields a larger performance increase than earlier-layer tuning. Interestingly, for a given layer, the performance change is not necessarily a monotonic function of tuning strength. When the tuning strength became too large, performance started to decline. This could be because linear tuning may not fully reflect the non-linear properties of the neuron. It might be instructive to recall the Yerkes-Dodson law, which posits that performance can improve as arousal (gain) increases up to a certain point, after which further increases in arousal can lead to a decline in performance (54, 55).
Fig. 4. Effects of enhancing emotion-selective neurons and randomly selected neurons.
(A) IAPS dataset. (B) NAPS dataset. (C) Model trained on NAPS dataset tested on IAPS data.
We carried out the same analysis for the NAPS dataset in Figure 4B. The results largely replicated those in Figure 4A for the IAPS dataset. One thing worth noting is that enhancing unpleasant neurons did not result in significant performance improvement over enhancing randomly selected neurons in some of the layers. One possibility is that the emotion recognition performance of the original model was close to the ceiling for NAPS, leaving little room for improvement. When we tested the NAPS-trained model on IAPS, as shown in Figure 4C, a clear performance improvement was seen in deeper layers.
Lesion analysis:
The functional importance of the emotion-selective neurons can be further assessed through lesion analysis (56–59). As shown in Figure 5, we compared the emotion recognition performance changes from setting the output of emotion-selective neurons to 0 with those from setting the output of an equal number of randomly chosen neurons to 0. As can be seen, lesioning the emotion-selective neurons led to significant performance declines, especially in the deeper layers, where the decline can be as high as 85%. In contrast, lesioning randomly selected neurons produced almost no performance change. This result supports the hypothesis that the emotion-selective neurons are important for emotion recognition, more so in deeper layers than in earlier layers.
Fig. 5. Lesion Analysis.
Performance changes were compared between lesioning emotion-selective neurons and randomly selected neurons.
Discussion
It has been argued that the human visual system has the innate ability to recognize the motivational significance of environmental inputs (60). We examined this problem using convolutional neural networks (CNNs) as models of the human visual system (61–66). Selecting the VGG-16 pre-trained on non-emotional images from ImageNet as our model (67–69) and using two sets of affective images (IAPS and NAPS) as test stimuli, we found emotion-selective neurons in all layers of the model, even though the model had never been explicitly exposed to emotional content. Additionally, emotion selectivity becomes stronger and more consistent in the deeper layers, in agreement with prior literature demonstrating that the deeper layers of CNNs encode higher-level semantic information. Applying two manipulations, feature attention enhancement and lesion, we further showed that the emotion-selective neurons are functionally significant. Specifically: (1) after increasing the gain of emotion-selective neurons (i.e., feature attention enhancement), the network’s emotion recognition performance was enhanced relative to increasing the gain of randomly selected neurons, and (2) conversely, after lesioning the emotion-selective neurons, the network’s emotion recognition performance was degraded relative to lesioning randomly selected neurons. These performance differences are stronger and more noticeable in deeper layers than in earlier layers. Together, these findings indicate that emotion selectivity can spontaneously emerge in CNN models trained to recognize visual objects, and that these emotion-selective neurons play a significant role in recognizing the emotion in natural images, lending credence to the notion that the visual system’s ability to represent affective information may be innate.
Affective processing in the visual cortex
The perception of opportunities and threats in complex visual scenes represents one of the main functions of the human visual system. The underlying neurophysiology is often studied by having observers view pictures varying in affective content. (70) reported greater functional activity in the visual cortex when subjects viewed pleasant and unpleasant pictures than when they viewed neutral images. (71) showed that the visual cortex has differential sensitivities to emotional stimuli compared with the amygdala. (72) demonstrated that emotional significance (e.g., valence or arousal) can modulate perceptual encoding in the visual cortex. Two competing but not mutually exclusive groups of hypotheses have been advanced to account for emotion-specific modulations of activity in the visual cortex. The so-called reentry hypothesis states that the increased visual activation evoked by affective pictures results from reentrant feedback, meaning that signals arising in subcortical emotion-processing structures such as the amygdala propagate to the visual cortex to facilitate the processing of motivationally salient stimuli (73–75). Recent work (20) provides support for this view. Using multivariate pattern analysis and functional connectivity, these authors showed that (1) different emotion categories (e.g., pleasant versus neutral and unpleasant versus neutral) are decodable based on the multivoxel patterns in the visual cortex and (2) the decoding accuracy is positively associated with reentry signals from anterior emotion-modulating regions. A second group of hypotheses states that the visual cortex may itself code for emotional qualities of a stimulus, without the necessity for recurrent processing (see (76) for a review).
Evidence supporting this hypothesis comes from empirical studies in experimental animals (77, 78) as well as in human observers (79), in which extensive pairing of simple sensory cues such as tilted lines or sinusoidal gratings with emotionally relevant outcomes shapes early sensory responses (80). Beyond simple visual cues, recent computational work using deep neural networks has also suggested that the visual cortex may intrinsically represent emotional value as contained in complex visual media such as video clips of varying affective content (32). Our results, by showing that emotion-selective neurons exist in all layers of a CNN model of the visual system and that these neurons play an important role in emotion recognition, appear to support the view that the visual cortex has the innate ability to code the emotional qualities of visual stimuli.
Neural selectivity in ANNs and the brain
That CNNs, or more generally ANNs, can be trained to recognize a large variety of visual objects has long been recognized. Remarkably, recent studies note that ANNs trained to recognize visual objects can spontaneously develop selectivity for other types of input, including visual numbers and faces (81). The number sense is considered an inherent ability of the brain to estimate the quantity of items in a visual set (82, 83). There is significant evidence that the number sense exists in both humans (e.g., adults and infants) (84–86) and non-human primates (e.g., numerically naïve monkeys) (87–89). (90) found that number-selective units spontaneously emerged in a deep artificial neural network trained on ImageNet for object recognition. (91) demonstrated that number selectivity can even arise spontaneously in randomly initialized deep neural networks without any training. Both studies focused on the last convolutional layers, where the number-selective units were found, and both demonstrated that the emergence of number-selective units could result from the weighted summation of units with increasing and decreasing activity. In addition, it is well known that face-selective neurons exist in humans (92) and non-human primates. (81) showed that neurons in a randomly initialized deep neural network without training could selectively respond to faces, and that neurons in the deeper layers are more selective. (93) demonstrated that brain-like functional segregation can emerge spontaneously in deep neural networks trained on object recognition and face perception, and proposed that the development of functional segregation for face recognition in the brain is a result of computational optimization in the cortex.
Augmenting this rapidly growing literature, our study is the first to demonstrate that emotion selectivity can spontaneously emerge in a deep artificial neural network model of the human visual system trained to recognize non-emotional objects.
Layer dependence
Like the brain, the CNN model has a layered structure that allows information to be processed in a hierarchical fashion. Our layer-wise analysis showed that the extent and strength of emotion selectivity are a function of the model layers. Compared to the early layers, the deeper layers have larger proportions of neurons that show emotion selectivity, and the selectivity is stronger, consistent with previous observations that deeper layers of CNN models encode more abstract concepts. For example, (40, 94) examined the internal representations of different layers in a CNN and found that deeper layers of the network tend to encode more abstract concepts, such as object parts and textures. The layered processing of emotional information may have several functional benefits. First, by processing visual information in hierarchical stages, the brain can quickly and efficiently respond to stimuli without the need for a complete and detailed analysis of the entire stimulus at once (95–97). This is especially important for the processing of emotionally salient stimuli, as quick and accurate emotional responses can be crucial for survival. Second, it offers more flexibility for processing emotion at different levels of detail, which may depend on the perceptual task and the environmental context. For example, if the stimulus is perceived as significant or crucial, it elicits a stronger and more widespread neural response, engaging multiple regions and processing stages. On the other hand, if the stimulus is not significant, it elicits a weaker and more limited neural response involving fewer regions or layers and processing stages (98–100). Third, the integration of information from different levels allows for a more complete and nuanced representation of the visual stimulus and emotional response.
This allows for the creation of a final representation that takes into account not just the visual properties of the stimulus but also its emotional significance and its impact on the individual (101–103). Lastly, by processing information in a layer-dependent manner, the brain can adapt and change the processing of information based on experience and learning (104). This allows the brain to refine its processing strategies and improve its performance over time (105).
Limitations
Several limitations of our study should be noted. First, we tested our hypothesis on a VGG-16 architecture. Although VGG-16 is a common architecture and widely used to model the human visual system, it remains possible that the neuronal selectivity reported here is model specific. Future work can address this problem by examining a broader range of CNN models as well as other deep neural network architectures to test the generality of the results. Second, the present study considered whether neuronal selectivity for emotion could arise in neurons of a visual system model that has not been exposed to affective content. It remains to be determined whether the selectivity could arise inherently from the hierarchical network structure, even in the absence of any training. Third, in our study, emotion was divided into three broad categories: pleasant, unpleasant, and neutral. While this is in line with many neurophysiological studies in humans, future work should examine finer differentiations of emotion, e.g., joy, sadness, horror, disgust, and so on, and their neural representations in the brain. Lastly, despite repeated demonstrations that CNNs provide good models of human visual processing, caution must be exercised when extending findings made on CNN models to the human brain.
To conclude, the present study shows that emotion selectivity can spontaneously emerge in a deep neural network trained to recognize non-emotional visual objects, and that this selectivity becomes stronger as we move deeper into the network. Two manipulations, attention enhancement and lesion, further demonstrate the functional significance of the emotion-selective neurons. In addition to offering support to the idea that the visual system may have an innate ability to represent the motivational value of sensory input, our findings also suggest that CNNs offer a valuable platform for testing neuroscience ideas in a way that is not practical in empirical studies.
Materials and Methods
Affective picture sets
Two sets of widely used affective images were used in this study. The IAPS library includes 1,182 images covering approximately 20 subclasses of emotion such as joy, surprise, entrancement, sadness, romance, disgust, and fear. The NAPS library has 1,356 images that can be divided into similar subclasses. For both libraries, each image has a normative valence rating, ranging from 1 to 9, indicating whether the image expresses unpleasant, neutral, or pleasant emotions; the distributions of the valence ratings for the two datasets are given in Fig. S1-C (right). In this study, for simplicity and in accordance with common practice in human imaging studies of emotion (20, 32, 106–108), we classified images into three main categories based on their valence scores: “pleasant,” “neutral,” and “unpleasant.” For images that fell near the boundary between categories, we used soft thresholds of 4.3±0.5 and 6.0±0.5 to determine their classification as either “unpleasant” or “neutral,” or “neutral” or “pleasant.” We also visually examined each image to confirm its category. Finally, any images that we could not confidently classify were marked as “unknown” and removed from the analysis. This process resulted in some differences in the number of images in each category relative to the original datasets. After this categorization, the IAPS images were divided into 296 pleasant, 390 neutral, and 341 unpleasant images, and the NAPS images into 352 pleasant, 477 neutral, and 281 unpleasant images (see Fig. S1-B).
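The categorization rule above can be sketched as follows; here the `boundary` value stands in for the manual inspection step (in the study, boundary images were visually inspected and unresolvable ones dropped as "unknown"):

```python
def categorize_valence(v, low=4.3, high=6.0, margin=0.5):
    """Map a 1-9 valence rating to a coarse emotion category using the
    soft thresholds 4.3±0.5 and 6.0±0.5 described above. Ratings inside
    a threshold band are flagged as 'boundary' for manual inspection."""
    if v < low - margin:
        return "unpleasant"
    if v > high + margin:
        return "pleasant"
    if low + margin <= v <= high - margin:
        return "neutral"
    return "boundary"
```

For example, a rating of 5.0 falls squarely in the neutral band, while a rating of 4.3 sits on the unpleasant/neutral boundary and would be resolved by visual inspection.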
The convolutional neural network model
VGG-16, a well-tested deep convolutional neural network for natural image recognition, was used in this study to evaluate emotion selectivity. It has 13 convolutional layers followed by three fully connected layers, with the last fully connected layer containing 1,000 units for recognizing 1,000 different types of visual objects. Each layer of VGG-16 contains a large number of artificial neurons (also referred to as filters, channels, or feature maps). Each neuron is characterized by a ReLU activation function (see Fig. S1-A). Through this function, neurons within a given layer, upon receiving and processing the input from the previous layer, yield activation maps called feature maps, which become the input to the next layer. Previous studies have compared the activation patterns of the VGG-16 model with experimental recordings from both humans and non-human primates and found that early layers of the model behave similarly to early visual areas such as V1, whereas deeper layers of the model are more analogous to higher-order visual areas such as the object-selective lateral occipital areas (22, 109–111).
In this study, VGG-16 was used in two ways. First, to examine whether emotion selectivity emerges in neurons trained to recognize non-emotional objects, we took the VGG-16 model pre-trained on 1.2 million natural images from ImageNet, presented affective pictures from the two aforementioned datasets to the model, and analyzed the activation profiles of neurons in each layer. The emotional selectivity of each neuron was determined from these activation profiles (see below). Second, to test the functionality of the emotion-selective neurons, we replaced the last layer of the VGG-16 with a two-unit fully connected layer and trained the connections to this two-unit layer to recognize two categories of emotion: pleasant versus non-pleasant, neutral versus non-neutral, or unpleasant versus non-unpleasant. The training of the two-unit last layer used cross-entropy as the objective function. The other weights in the network remained the same as those trained on the ImageNet data (i.e., they were frozen).
Emotional selectivity calculation
The output from each unit in a feature map (see Fig.S1-A) can be written as:

$$a_k^l = \beta \cdot \max\!\left(0,\; W_k^l * a^{l-1}\right) \qquad (1)$$

where $W_k^l$ indicates the kernel weight of the $k$th filter in the $l$th convolutional layer, and $*$ indicates mathematical convolution, which applies matrix multiplication between $W_k^l$ and the outputs $a^{l-1}$ from the $(l-1)$th layer. Of note in Eq. (1) is that the ReLU activation function typically has a slope of 1 ($\beta = 1$). Here in this work the slope $\beta$ is a tunable parameter. By tuning the slope of the ReLU function, we change the gain of the neuron, simulating the effect of feature-based attention control (39, 58).
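The tunable-slope ReLU in Eq. (1) can be sketched in a few lines of Python (a minimal illustration of the gain mechanism, not the study's code):

```python
def relu_with_gain(x, beta=1.0):
    """ReLU whose positive-part slope beta acts as a tunable neuronal gain.

    beta = 1 is the conventional ReLU; beta = 0 silences (lesions) the unit,
    and beta > 1 enhances it, mimicking feature-based attention.
    """
    return beta * max(0.0, x)

# Conventional ReLU, lesioned unit, and gain-enhanced unit on the same input:
print(relu_with_gain(2.0))             # 2.0
print(relu_with_gain(2.0, beta=0.0))   # 0.0 (lesion)
print(relu_with_gain(2.0, beta=1.5))   # 3.0 (enhanced gain)
print(relu_with_gain(-1.0, beta=1.5))  # 0.0 (negative inputs still clipped)
```

The same parameter thus serves both manipulations described below: lesioning (β = 0) and attention enhancement (β > 1).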
Let $a_k^l(i,j,s)$ represent the response of the unit located at coordinates $(i,j)$ in the $k$th filter in layer $l$ to image $s$. Then

$$A_k^l(s) = \frac{1}{W \times H}\sum_{i=1}^{W}\sum_{j=1}^{H} a_k^l(i,j,s) \qquad (2)$$

is the response to the image averaged across the entire filter. Here $W$ and $H$ represent the width and height of the feature map. Thus, the mean activity of the $k$th filter in layer $l$ in response to all images in a dataset can be formulated as:

$$\bar{A}_k^l = \frac{1}{N}\sum_{s=1}^{N} A_k^l(s) \qquad (3)$$

where $N$ represents the total number of images in a given set. Emotional selectivity of the filter is calculated according to

$$\tilde{A}_k^l(e) = \frac{\bar{A}_k^l(e)}{\sum_{e'} \bar{A}_k^l(e')} \qquad (4)$$

where $\tilde{A}_k^l(e)$ represents the normalized activation of filter $k$ in layer $l$ in response to all images of emotion category $e$, where $e \in \{\text{pleasant}, \text{neutral}, \text{unpleasant}\}$. A neuron is considered selective for a specific emotion if the normalized activation for the images within that emotion category is the highest among the three values. For example, if $\tilde{A}_k^l(\text{unpleasant}) > \tilde{A}_k^l(\text{pleasant})$ and $\tilde{A}_k^l(\text{unpleasant}) > \tilde{A}_k^l(\text{neutral})$, the artificial neuron is considered selective for “unpleasant images”.
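The selectivity computation in Eqs. (2)–(4) can be illustrated with a minimal pure-Python sketch. The toy feature maps below are hypothetical; in the actual analysis they come from VGG-16 activations to the affective pictures.

```python
# Hypothetical toy data: per-image 2x2 feature maps for one filter, grouped by
# emotion category (real maps are VGG-16 activations and much larger).
feature_maps = {
    "pleasant":   [[[0.2, 0.4], [0.0, 0.6]], [[0.1, 0.3], [0.2, 0.4]]],
    "neutral":    [[[0.1, 0.1], [0.1, 0.1]], [[0.0, 0.2], [0.1, 0.1]]],
    "unpleasant": [[[0.8, 1.0], [0.6, 1.2]], [[0.9, 0.7], [1.1, 0.9]]],
}

def spatial_mean(fmap):
    # Eq. (2): average the unit responses over the W x H feature map.
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))

def dataset_mean(maps):
    # Eq. (3): average the per-image responses over all N images in the set.
    return sum(spatial_mean(m) for m in maps) / len(maps)

# Eq. (4): normalize the three category means so they sum to 1.
means = {e: dataset_mean(maps) for e, maps in feature_maps.items()}
total = sum(means.values())
normalized = {e: m / total for e, m in means.items()}

# The filter is labeled with the category whose normalized activation is highest.
preferred = max(normalized, key=normalized.get)
print(preferred)  # unpleasant
```

With these toy numbers the unpleasant-category mean dominates, so the filter would be labeled unpleasant-selective.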
To mitigate the possible influence of the idiosyncrasies within each picture dataset on the definition of emotion selectivity, we further specified that a neuron is considered selective for a given emotion only if it is selective for the same emotion as defined by both the IAPS dataset and the NAPS dataset.
Testing the functionality of the emotion-selective neurons
Do the emotion-selective neurons defined above have a functional role? We applied two different approaches to examine this question: lesion and attention enhancement.
Lesion.
If the emotion-selective neurons are functionally important, then lesioning them should lead to degraded performance in recognizing the emotion of a given image. Here the lesion of a specific neuron is achieved by setting its output to 0 (namely, setting $\beta = 0$ in Eq. (1)). In our experiments, we lesioned the neurons selective for a given emotion, as well as randomly selected neurons in a particular layer, and observed the changes in the emotion recognition performance of the model.
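The lesion manipulation amounts to zeroing the outputs of a chosen set of filters in one layer while leaving the rest untouched. A minimal sketch, using hypothetical per-filter activations rather than real VGG-16 outputs:

```python
def lesion_filters(layer_outputs, lesioned):
    """Zero out the outputs of lesioned filters in one layer.

    layer_outputs: {filter_index: activation}; lesioned: set of filter indices.
    """
    return {k: (0.0 if k in lesioned else v) for k, v in layer_outputs.items()}

# Hypothetical activations; filters 1 and 3 are (say) unpleasant-selective.
outputs = {0: 0.7, 1: 1.2, 2: 0.3, 3: 0.9}
print(lesion_filters(outputs, {1, 3}))  # {0: 0.7, 1: 0.0, 2: 0.3, 3: 0.0}
```

Comparing performance after lesioning emotion-selective filters versus the same number of randomly chosen filters isolates the contribution of the selective population.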
Attention enhancement.
We further tested whether enhancing the activity of an emotion-selective neuron can lead to improved emotion recognition performance. Following (39), the gain $\beta$ was increased from 0 to 5 in steps of 0.1, where $\beta = 1$ is the conventional choice and $\beta > 1$ represents increased neuronal gain (i.e., enhanced feature attention). According to the feature similarity gain theory, increasing the gain of a neuron enhances its performance in perceiving stimuli with the relevant features. In our experiments, we enhanced the neurons selective for a given emotion, as well as randomly selected neurons in a particular layer, and observed the changes in the emotion recognition performance of the model (39) (see Fig.S2-A,B).
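The gain sweep can be sketched as follows (an assumed setup: in the actual experiments, each β value would be followed by a full evaluation of emotion recognition performance, which is omitted here):

```python
def gain_sweep(activation, betas=None):
    """Return the rescaled output of one unit at each gain value beta.

    By default, sweep beta from 0.0 to 5.0 in steps of 0.1, as in the text.
    """
    if betas is None:
        betas = [round(0.1 * k, 1) for k in range(51)]  # 0.0, 0.1, ..., 5.0
    # beta * max(0, x) is the tunable-slope ReLU of Eq. (1).
    return {beta: beta * max(0.0, activation) for beta in betas}

sweep = gain_sweep(2.0)
print(sweep[0.0], sweep[1.0], sweep[5.0])  # 0.0 2.0 10.0
```

β = 0 reproduces the lesion condition and β = 1 the unmodified network, so both manipulations fall out of the same parameter sweep.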
The generalization and overlap of the emotion-selective neurons
To test whether a neuron’s selectivity for a particular emotion is innate and independent of the idiosyncrasies of different picture sets, we compared neurons determined to be selective for a certain type of emotion by the IAPS dataset and by the NAPS dataset. The overlap between the two groups of neurons was quantified by the Jaccard index as follows:
$$J_e^l = \frac{\left|S_e^{l,\mathrm{IAPS}} \cap S_e^{l,\mathrm{NAPS}}\right|}{\left|S_e^{l,\mathrm{IAPS}} \cup S_e^{l,\mathrm{NAPS}}\right|} \qquad (5)$$

where $J_e^l$ represents the Jaccard index of the overlapping neurons in layer $l$ for emotion category $e$. The numerator is the number of neurons in layer $l$ selective for emotion category $e$ on both datasets (IAPS and NAPS), and the denominator is the total number of neurons in layer $l$ selective for emotion category $e$ on either dataset. To validate that the results were not due to chance, we randomly selected the same numbers of neurons as were identified on the IAPS and NAPS datasets separately and computed the Jaccard index of the overlap between the two groups of randomly selected neurons.
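Eq. (5) is a standard set operation: intersection size over union size. A minimal sketch with hypothetical filter indices:

```python
def jaccard(selective_iaps, selective_naps):
    """Jaccard index of two sets of selective-filter indices, per Eq. (5)."""
    a, b = set(selective_iaps), set(selective_naps)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical indices of filters selective for the same emotion under each dataset:
iaps = {3, 17, 42, 64}
naps = {3, 42, 64, 90, 101}
print(jaccard(iaps, naps))  # 3 shared out of 6 total -> 0.5
```

The chance baseline is obtained by drawing the same set sizes (here 4 and 5) at random from the layer's filters and recomputing the index.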
Supplementary Material
Teaser:
A convolutional neural network (CNN) model of the human ventral visual cortex shows that artificial neurons respond selectively to emotional images, supporting the idea of an innate ability to represent affective significance of visual input.
Acknowledgments
Funding:
This work was supported by the National Institutes of Health/National Institute of Mental Health grants MH112558 and MH125615, the National Science Foundation grant 1908299, the University of Florida Artificial Intelligence Research Catalyst Fund, the University of Florida Informatics Institute Graduate Student Fellowship, the University of Florida McKnight Brain Institute, the University of Florida Center for Cognitive Aging and Memory, and the McKnight Brain Research Foundation.
Footnotes
Competing interests: The authors declare that they have no competing interests.
Data and materials availability:
All data are publicly available in the main text or the supplementary materials. The analysis code will be made available upon request from the authors.
References
- 1.Kitayama S., Emotion and culture: Empirical studies of mutual influence (American Psychological Association, Washington, DC, US, 1994). [Google Scholar]
- 2.McCarthy E. D., The Social Construction of Emotions: New Directions from Culture Theory. Sociology Faculty Publications (1994) (available at https://research.library.fordham.edu/soc_facultypubs/4). [Google Scholar]
- 3.Banks S. J., Eddy K. T., Angstadt M., Nathan P. J., Phan K. L., Amygdala–frontal connectivity during emotion regulation. Social Cognitive and Affective Neuroscience. 2, 303–312 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Gross J. J., “Emotion regulation: Conceptual and empirical foundations” in Handbook of emotion regulation, 2nd ed (The Guilford Press, New York, NY, US, 2014), pp. 3–20. [Google Scholar]
- 5.Barrett L. F., Lewis M., Haviland-Jones J. M., Handbook of Emotions (Guilford Publications, 2016; https://books.google.com/books?id=cbKhDAAAQBAJ). [Google Scholar]
- 6.Elfenbein H. A., Ambady N., On the universality and cultural specificity of emotion recognition: a meta-analysis. Psychol Bull. 128, 203–235 (2002). [DOI] [PubMed] [Google Scholar]
- 7.Hareli S., Kafetsios K., Hess U., A cross-cultural study on emotion expression and the learning of social norms. Frontiers in Psychology. 6 (2015) (available at 10.3389/fpsyg.2015.01501). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Ford B. Q., Mauss I. B., Culture and emotion regulation. Curr Opin Psychol. 3, 1–5 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Olderbak S., Wilhelm O., Emotion perception and empathy: An individual differences test of relations. Emotion. 17, 1092–1106 (2017). [DOI] [PubMed] [Google Scholar]
- 10.Lazarus R. S., Emotion and Adaptation (Oxford University Press, 1991). [Google Scholar]
- 11.Coan J. A., Handbook of emotion elicitation and assessment (Oxford University Press, New York, NY, US, 2007), Handbook of emotion elicitation and assessment. [Google Scholar]
- 12.LoBue V., Behavioral evidence for a continuous approach to the perception of emotionally valenced stimuli. Behavioral and Brain Sciences. 38, e79 (2015). [DOI] [PubMed] [Google Scholar]
- 13.Greenwald M. K., Cook E. W., Lang P. J., Affective judgment and psychophysiological response: Dimensional covariation in the evaluation of pictorial stimuli. Journal of Psychophysiology. 3, 51–64 (1989). [Google Scholar]
- 14.Bradley M. M., Lang P. J., Measuring emotion: The self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry. 25, 49–59 (1994). [DOI] [PubMed] [Google Scholar]
- 15.Lang P., International affective picture system (IAPS): affective ratings of pictures and instruction manual (2005) (available at https://www.semanticscholar.org/paper/International-affective-picture-system-(IAPS)-%3A-of-Lang/788e2f5a24784ce952eec8a57902a6f03cd9318c). [Google Scholar]
- 16.Marchewka A., Żurawski Ł., Jednoróg K., Grabowska A., The Nencki Affective Picture System (NAPS): Introduction to a novel, standardized, wide-range, high-quality, realistic picture database. Behav Res. 46, 596–610 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Canli T., Zhao Z., Desmond J. E., Kang E., Gross J., Gabrieli J. D. E., An fMRI study of personality influences on brain reactivity to emotional stimuli. Behavioral Neuroscience. 115, 33–42 (2001). [DOI] [PubMed] [Google Scholar]
- 18.Vrticka P., Simioni S., Fornari E., Schluep M., Vuilleumier P., Sander D., Neural Substrates of Social Emotion Regulation: A fMRI Study on Imitation and Expressive Suppression to Dynamic Facial Signals. Frontiers in Psychology. 4 (2013) (available at 10.3389/fpsyg.2013.00095). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Résibois M., Verduyn P., Delaveau P., Rotgé J.-Y., Kuppens P., Van Mechelen I., Fossati P., The neural basis of emotions varies over time: different regions go with onset- and offset-bound processes underlying emotion intensity. Social Cognitive and Affective Neuroscience. 12, 1261–1271 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Bo K., Yin S., Liu Y., Hu Z., Meyyappan S., Kim S., Keil A., Ding M., Decoding Neural Representations of Affective Scenes in Retinotopic Visual Cortex. Cerebral Cortex. 31, 3047–3063 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Saarimäki H., Naturalistic Stimuli in Affective Neuroimaging: A Review. Frontiers in Human Neuroscience. 15 (2021) (available at 10.3389/fnhum.2021.675068). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Yamins D. L. K., Hong H., Cadieu C. F., Solomon E. A., Seibert D., DiCarlo J. J., Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences. 111, 8619–8624 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Güçlü U., van Gerven M. A. J., Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream. J. Neurosci. 35, 10005–10014 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Yamins D. L. K., DiCarlo J. J., Using goal-driven deep learning models to understand sensory cortex. Nat Neurosci. 19, 356–365 (2016). [DOI] [PubMed] [Google Scholar]
- 25.Marblestone A. H., Wayne G., Kording K. P., Toward an Integration of Deep Learning and Neuroscience. Front. Comput. Neurosci. 10 (2016), doi: 10.3389/fncom.2016.00094. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Richards B. A., Lillicrap T. P., Beaudoin P., Bengio Y., Bogacz R., Christensen A., Clopath C., Costa R. P., de Berker A., Ganguli S., Gillon C. J., Hafner D., Kepecs A., Kriegeskorte N., Latham P., Lindsay G. W., Miller K. D., Naud R., Pack C. C., Poirazi P., Roelfsema P., Sacramento J., Saxe A., Scellier B., Schapiro A. C., Senn W., Wayne G., Yamins D., Zenke F., Zylberberg J., Therien D., Kording K. P., A deep learning framework for neuroscience. Nat Neurosci. 22, 1761–1770 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Deng J., Dong W., Socher R., Li L., Li Kai, Fei-Fei Li, “ImageNet: A large-scale hierarchical image database” in 2009 IEEE Conference on Computer Vision and Pattern Recognition (2009), pp. 248–255. [Google Scholar]
- 28.Nasr K., Viswanathan P., Nieder A., Number detectors spontaneously emerge in a deep neural network designed for visual object recognition. Science Advances. 5, eaav7903 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Dobs K., Kell A., Martinez J., Cohen M., Kanwisher N., Why Are Face and Object Processing Segregated in the Human Brain? Testing Computational Hypotheses with Deep Convolutional Neural Networks (2020). [Google Scholar]
- 30.Vuilleumier P., Richardson M. P., Armony J. L., Driver J., Dolan R. J., Distant influences of amygdala lesion on visual cortical activation during emotional face processing. Nat Neurosci. 7, 1271–1278 (2004). [DOI] [PubMed] [Google Scholar]
- 31.Shuler M. G., Bear M. F., Reward Timing in the Primary Visual Cortex. Science. 311, 1606–1609 (2006). [DOI] [PubMed] [Google Scholar]
- 32.Kragel P. A., Reddan M. C., LaBar K. S., Wager T. D., Emotion schemas are embedded in the human visual system. Science Advances. 5, eaaw4358 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Simonyan K., Zisserman A., Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs] (2015) (available at http://arxiv.org/abs/1409.1556). [Google Scholar]
- 34.Russakovsky O., Deng J., Su H., Krause J., Satheesh S., Ma S., Huang Z., Karpathy A., Khosla A., Bernstein M., Berg A. C., Fei-Fei L., ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis. 115, 211–252 (2015). [Google Scholar]
- 35.Seeliger K., Fritsche M., Güçlü U., Schoenmakers S., Schoffelen J.-M., Bosch S. E., van Gerven M. A. J., Convolutional neural network-based encoding and decoding of visual object recognition in space and time. NeuroImage. 180, 253–266 (2018). [DOI] [PubMed] [Google Scholar]
- 36.Jacob G., Pramod R. T., Katti H., Arun S. P., Qualitative similarities and differences in visual object representations between brains and deep networks. Nat Commun. 12, 1872 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Thompson P., Margaret Thatcher: A New Illusion. Perception. 9, 483–484 (1980). [DOI] [PubMed] [Google Scholar]
- 38.Sowden P. T., “Psychophysics” in APA handbook of research methods in psychology, Vol 1: Foundations, planning, measures, and psychometrics (American Psychological Association, Washington, DC, US, 2012), APA handbooks in psychology®, pp. 445–458. [Google Scholar]
- 39.Lindsay G. W., Miller K. D., How biological attention mechanisms improve task performance in a large-scale visual system model. eLife. 7, e38105 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Zeiler M. D., Fergus R., “Visualizing and Understanding Convolutional Networks” in Computer Vision – ECCV 2014, Fleet D., Pajdla T., Schiele B., Tuytelaars T., Eds. (Springer International Publishing, Cham, 2014), pp. 818–833. [Google Scholar]
- 41.Lee G., Tai Y.-W., Kim J., Deep Saliency with Encoded Low level Distance Map and High Level Features. arXiv:1604.05495 [cs] (2016) (available at http://arxiv.org/abs/1604.05495). [Google Scholar]
- 42.Lindsay G. W., Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future. Journal of Cognitive Neuroscience, 1–15 (2020). [DOI] [PubMed] [Google Scholar]
- 43.Pimashkin A., Kastalskiy I., Simonov A., Koryagina E., Mukhina I., Kazantsev V., Spiking Signatures of Spontaneous Activity Bursts in Hippocampal Cultures. Frontiers in Computational Neuroscience. 5 (2011) (available at 10.3389/fncom.2011.00046). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Winnubst J., Bas E., Ferreira T. A., Wu Z., Economo M. N., Edson P., Arthur B. J., Bruns C., Rokicki K., Schauder D., Olbris D. J., Murphy S. D., Ackerman D. G., Arshadi C., Baldwin P., Blake R., Elsayed A., Hasan M., Ramirez D., Dos Santos B., Weldon M., Zafar A., Dudman J. T., Gerfen C. R., Hantman A. W., Korff W., Sternson S. M., Spruston N., Svoboda K., Chandrashekar J., Reconstruction of 1,000 Projection Neurons Reveals New Cell Types and Organization of Long-Range Connectivity in the Mouse Brain. Cell. 179, 268–281.e13 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Palmateer C. M., Moseley S. C., Ray S., Brovero S. G., Arbeitman M. N., Analysis of cell-type-specific chromatin modifications and gene expression in Drosophila neurons that direct reproductive behavior. PLOS Genetics. 17, e1009240 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Schmitz M. T., Sandoval K., Chen C. P., Mostajo-Radji M. A., Seeley W. W., Nowakowski T. J., Ye C. J., Paredes M. F., Pollen A. A., The development and evolution of inhibitory neurons in primate cerebrum. Nature. 603, 871–877 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Maunsell J. H. R., Treue S., Feature-based attention in visual cortex. Trends in Neurosciences. 29, 317–322 (2006). [DOI] [PubMed] [Google Scholar]
- 48.Lindsay G. W., Feature-based Attention in Convolutional Neural Networks. arXiv:1511.06408 [cs] (2015) (available at http://arxiv.org/abs/1511.06408). [Google Scholar]
- 49.Yeh C.-H., Lin M.-H., Chang P.-C., Kang L.-W., Enhanced Visual Attention-Guided Deep Neural Networks for Image Classification. IEEE Access. 8, 163447–163457 (2020). [Google Scholar]
- 50.Cardin J. A., Palmer L. A., Contreras D., Cellular mechanisms underlying stimulus-dependent gain modulation in primary visual cortex neurons in vivo. Neuron. 59, 150–160 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Eldar E., Cohen J. D., Niv Y., The effects of neural gain on attention and learning. Nat Neurosci. 16, 1146–1153 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Jarvis S., Nikolic K., Schultz S. R., Neuronal gain modulability is determined by dendritic morphology: A computational optogenetic study. PLOS Computational Biology. 14, e1006027 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Bos H., Oswald A.-M., Doiron B., Untangling stability and gain modulation in cortical circuits with multiple interneuron classes (2020), p. 2020.06.15.148114, , doi: 10.1101/2020.06.15.148114. [DOI] [Google Scholar]
- 54.Broadhurst P. L., Emotionality and the Yerkes-Dodson Law. Journal of Experimental Psychology. 54, 345–352 (1957). [DOI] [PubMed] [Google Scholar]
- 55.Sörensen L. K. A., Bohté S. M., Slagter H. A., Scholte H. S., Arousal state affects perceptual decision-making by modulating hierarchical sensory processing in a large-scale visual system model. PLOS Computational Biology. 18, e1009976 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Aharonov R., Segev L., Meilijson I., Ruppin E., Localization of Function via Lesion Analysis. Neural Computation. 15, 885–913 (2003). [DOI] [PubMed] [Google Scholar]
- 57.Chareyron L. J., Amaral D. G., Lavenex P., Selective lesion of the hippocampus increases the differentiation of immature neurons in the monkey amygdala. Proceedings of the National Academy of Sciences. 113, 14420–14425 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Yang G. R., Joglekar M. R., Song H. F., Newsome W. T., Wang X.-J., Task representations in neural networks trained to perform many cognitive tasks. Nat Neurosci. 22, 297–306 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Cohen-Zimerman S., Khilwani H., Smith G. N. L., Krueger F., Gordon B., Grafman J., The neural basis for mental state attribution: A voxel-based lesion mapping study. Human Brain Mapping. 42, 65–79 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Lang P. J., Bradley M. M., Cuthbert B. N., “Motivated attention: Affect, activation, and action” in Attention and orienting: Sensory and motivational processes (Lawrence Erlbaum Associates Publishers, Mahwah, NJ, US, 1997), pp. 97–135. [Google Scholar]
- 61.Kriegeskorte N., Deep Neural Networks: A New Framework for Modeling Biological Vision and Brain Information Processing. Annual Review of Vision Science. 1, 417–446 (2015). [DOI] [PubMed] [Google Scholar]
- 62.Brachmann A., Barth E., Redies C., Using CNN Features to Better Understand What Makes Visual Artworks Special. Frontiers in Psychology. 8 (2017) (available at 10.3389/fpsyg.2017.00830). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Iigaya K., Yi S., Wahle I. A., Tanwisuth K., O’Doherty J. P., Aesthetic preference for art can be predicted from a mixture of low- and high-level visual features. Nat Hum Behav. 5, 743–755 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.van Dyck L. E., Kwitt R., Denzler S. J., Gruber W. R., Comparing Object Recognition in Humans and Deep Convolutional Neural Networks—An Eye Tracking Study. Frontiers in Neuroscience. 15 (2021) (available at 10.3389/fnins.2021.750639). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Singer J. J. D., Seeliger K., Kietzmann T. C., Hebart M. N., From photos to sketches - how humans and deep neural networks process objects across different levels of visual abstraction. Journal of Vision. 22, 4 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Lee J., Jung M., Lustig N., Lee J.-H., Neural representations of the perception of handwritten digits and visual objects from a convolutional neural network compared to humans. Human Brain Mapping. n/a (2023), doi: 10.1002/hbm.26189. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Kauramäki J., Jääskeläinen I. P., Sams M., Selective Attention Increases Both Gain and Feature Selectivity of the Human Auditory Cortex. PLOS ONE. 2, e909 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Moldakarimov S., Bazhenov M., Sejnowski T. J., Top-Down Inputs Enhance Orientation Selectivity in Neurons of the Primary Visual Cortex during Perceptual Learning. PLOS Computational Biology. 10, e1003770 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Pasternak T., Tadin D., Linking Neuronal Direction Selectivity to Perceptual Decisions About Visual Motion. Annu Rev Vis Sci. 6, 335–362 (2020). [DOI] [PubMed] [Google Scholar]
- 70.Lang P. J., Bradley M. M., Fitzsimmons J. R., Cuthbert B. N., Scott J. D., Moulder B., Nangia V., Emotional arousal and activation of the visual cortex: An fMRI analysis. Psychophysiology. 35, 199–210 (1998). [PubMed] [Google Scholar]
- 71.Rotshtein P., Malach R., Hadar U., Graif M., Hendler T., Feeling or Features: Different Sensitivity to Emotion in High-Order Visual Cortex and Amygdala. Neuron. 32, 747–757 (2001). [DOI] [PubMed] [Google Scholar]
- 72.Schupp H. T., Markus J., Weike A. I., Hamm A. O., Emotional Facilitation of Sensory Processing in the Visual Cortex. Psychol Sci. 14, 7–13 (2003). [DOI] [PubMed] [Google Scholar]
- 73.Sabatinelli D., Bradley M. M., Fitzsimmons J. R., Lang P. J., Parallel amygdala and inferotemporal activation reflect emotional intensity and fear relevance. Neuroimage. 24, 1265–1270 (2005). [DOI] [PubMed] [Google Scholar]
- 74.Lang P. J., Bradley M. M., Emotion and the motivational brain. Biol Psychol. 84, 437–450 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Pessoa L., Emotion and Cognition and the Amygdala: From “what is it?” to “what’s to be done?” Neuropsychologia. 48, 3416–3429 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Miskovic V., Anderson A. K., Modality general and modality specific coding of hedonic valence. Curr Opin Behav Sci. 19, 91–97 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Weinberger N. M., Specific long-term memory traces in primary auditory cortex. Nat Rev Neurosci. 5, 279–290 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Li Z., Yan A., Guo K., Li W., Fear-Related Signals in the Primary Visual Cortex. Curr Biol. 29, 4078–4083.e2 (2019). [DOI] [PubMed] [Google Scholar]
- 79.Thigpen N. N., Bartsch F., Keil A., The malleability of emotional perception: Short-term plasticity in retinotopic neurons accompanies the formation of perceptual biases to threat. Journal of Experimental Psychology: General. 146, 464–471 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Miskovic V., Keil A., Acquired fears reflected in cortical sensory processing: A review of electrophysiological studies of human classical conditioning. Psychophysiology. 49, 1230–1241 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Baek S., Song M., Jang J., Kim G., Paik S.-B., Face detection in untrained deep neural networks. Nat Commun. 12, 7328 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Burr D., Ross J., A Visual Sense of Number. Current Biology. 18, 425–428 (2008). [DOI] [PubMed] [Google Scholar]
- 83.Nieder A., The neuronal code for number. Nat Rev Neurosci. 17, 366–382 (2016). [DOI] [PubMed] [Google Scholar]
- 84.Xu F., Spelke E. S., Large number discrimination in 6-month-old infants. Cognition. 74, B1–B11 (2000). [DOI] [PubMed] [Google Scholar]
- 85.Xu F., Spelke E. S., Goddard S., Number sense in human infants. Dev Sci. 8, 88–101 (2005). [DOI] [PubMed] [Google Scholar]
- 86.Santens S., Roggeman C., Fias W., Verguts T., Number Processing Pathways in Human Parietal Cortex. Cerebral Cortex. 20, 77–88 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Hauser M. D., Carey S., Hauser L. B., Spontaneous number representation in semi-free-ranging rhesus monkeys. Proc Biol Sci. 267, 829–833 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Sawamura H., Shima K., Tanji J., Numerical representation for action in the parietal cortex of the monkey. Nature. 415, 918–922 (2002). [DOI] [PubMed] [Google Scholar]
- 89.Hauser M. D., Tsao F., Garcia P., Spelke E. S., Evolutionary foundations of number: spontaneous representation of numerical magnitudes by cotton–top tamarins. Proceedings of the Royal Society of London. Series B: Biological Sciences. 270, 1441–1446 (2003). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Nasr K., Viswanathan P., Nieder A., Number detectors spontaneously emerge in a deep neural network designed for visual object recognition. Science Advances. 5, eaav7903 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Kim G., Jang J., Baek S., Song M., Paik S.-B., Visual number sense in untrained deep neural networks. Science Advances. 7, eabd6127 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Kanwisher N., McDermott J., Chun M. M., The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception. J. Neurosci. 17, 4302–4311 (1997). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Dobs K., Martinez J., Kell A. J. E., Kanwisher N., Brain-like functional specialization emerges spontaneously in deep neural networks. Science Advances. 8, eabl8913 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Zhang C., Bengio S., Hardt M., Recht B., Vinyals O., Understanding deep learning requires rethinking generalization. arXiv:1611.03530 [cs] (2017) (available at http://arxiv.org/abs/1611.03530). [Google Scholar]
- 95.VanRullen R., Thorpe S. J., The time course of visual processing: from early perception to decision-making. J Cogn Neurosci. 13, 454–461 (2001). [DOI] [PubMed] [Google Scholar]
- 96.Srinivasan N., Gupta R., Rapid communication: Global-local processing affects recognition of distractor emotional faces. Q J Exp Psychol (Hove). 64, 425–433 (2011). [DOI] [PubMed] [Google Scholar]
- 97.Cabral L., Stojanoski B., Cusack R., Rapid and coarse face detection: With a lack of evidence for a nasal-temporal asymmetry. Atten Percept Psychophys. 82, 1883–1895 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 98.Zipser K., Lamme V. A. F., Schiller P. H., Contextual Modulation in Primary Visual Cortex. J. Neurosci. 16, 7376–7389 (1996). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99.Tschechne S., Neumann H., Hierarchical representation of shapes in visual cortex—from localized features to figural shape segregation. Front Comput Neurosci. 8, 93 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100.Willems R. M., Peelen M. V., How context changes the neural basis of perception and language. iScience. 24, 102392 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.Bradley M. M., Lang P. J., Affective reactions to acoustic stimuli. Psychophysiology. 37, 204–215 (2000). [PubMed] [Google Scholar]
- 102.Harmon-Jones E., Gable P. A., Peterson C. K., The role of asymmetric frontal cortical activity in emotion-related phenomena: A review and update. Biological Psychology. 84, 451–462 (2010). [DOI] [PubMed] [Google Scholar]
- 103.Niedenthal P. M., Wood A., Does emotion influence visual perception? Depends on how you look at it. Cognition and Emotion. 33, 77–84 (2019). [DOI] [PubMed] [Google Scholar]
- 104.Li G., Forero M. G., Wentzell J. S., Durmus I., Wolf R., Anthoney N. C., Parker M., Jiang R., Hasenauer J., Strausfeld N. J., Heisenberg M., Hidalgo A., A Toll-receptor map underlies structural brain plasticity. eLife. 9, e52743 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105.Tierney A. L., Nelson C. A., Brain Development and the Role of Experience in the Early Years. Zero Three. 30, 9–13 (2009). [PMC free article] [PubMed] [Google Scholar]
- 106.Sato W., Kochiyama T., Yoshikawa S., Naito E., Matsumura M., Enhanced neural activity in response to dynamic facial expressions of emotion: an fMRI study. Brain Res Cogn Brain Res. 20, 81–91 (2004). [DOI] [PubMed] [Google Scholar]
- 107.Cichy R. M., Pantazis D., Oliva A., Resolving human object recognition in space and time. Nat Neurosci. 17, 455–462 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108.Putnam P. T., Gothard K. M., Multidimensional Neural Selectivity in the Primate Amygdala. eNeuro. 6 (2019), doi: 10.1523/ENEURO.0153-19.2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 109.Cadieu C. F., Hong H., Yamins D. L. K., Pinto N., Ardila D., Solomon E. A., Majaj N. J., DiCarlo J. J., Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition. PLOS Computational Biology. 10, e1003963 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110.Cichy R. M., Khosla A., Pantazis D., Torralba A., Oliva A., Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci Rep. 6, 27755 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 111.Eickenberg M., Gramfort A., Varoquaux G., Thirion B., Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage. 152, 184–194 (2017). [DOI] [PubMed] [Google Scholar]