Significance
The present study reveals primary visual cortex responses modulated by musical information: tonal dissonance recruits early visual processing via feedback interactions from the auditory ventral pathway to the primary visual cortex. We demonstrate that the auditory “what” ventral stream plays a role in assigning meaning to nonverbal sound cues, such as dissonant music conveying negative emotions, providing an interpretative framework for processing the audio-visual experience. The findings substantiate the critical role of audio-visual integration in shaping higher-order functions such as social cognition.
Keywords: controlled naturalistic stimuli, dissonant music, theory of mind, auditory ventral pathway, primary visual cortex
Abstract
The neuroscientific examination of music processing in audio-visual contexts offers a valuable framework to assess how auditory information influences the emotional encoding of visual information. Using fMRI during naturalistic film viewing, we investigated the neural mechanisms underlying the effect of music on valence inferences during mental state attribution. Thirty-eight participants watched the same short film accompanied by systematically controlled consonant or dissonant music. Subjects were instructed to think about the main character’s intentions. The results revealed that increasing levels of dissonance led to more negatively valenced inferences, demonstrating the profound emotional impact of musical dissonance. Crucially, at the neural level and despite music being the sole manipulation, dissonance evoked responses in the primary visual cortex (V1). Functional/effective connectivity analyses showed a stronger coupling between the auditory ventral stream (AVS) and V1 in response to tonal dissonance and demonstrated the modulation of early visual processing via top-down feedback inputs from the AVS to V1. These V1 signal changes indicate the influence of high-level contextual representations associated with tonal dissonance on early visual cortices, serving to facilitate the emotional interpretation of visual information. Our results highlight the significance of employing systematically controlled music, which can isolate emotional valence from the arousal dimension, to elucidate the brain’s sound-to-meaning interface and its distributed crossmodal effects on early visual encoding during naturalistic film viewing.
We share with our ancestors the ability to identify others’ intentions based on the available sensory information. The concept of neocortical operations being multisensory is widely accepted and emphasizes the importance of integration in sensory encoding (1). This distributed aspect of sensory processing not only grants evolutionary advantages, such as prompt reactions to threats, but also serves to reduce the uncertainty derived from individual sensory estimates and to endure brain damage or sensory loss (2).
Previous research on audio-visual integration in humans has primarily relied on task-based paradigms employing simple stimuli (e.g., checkerboards or flashes as visual stimuli, and tones or noises as auditory information) (for a review see ref. 3), limiting the generalizability of findings to complex, real-life situations (4–6). Functional MRI (fMRI) experiments during naturalistic film viewing provide an alternative approach, enabling the study of audio-visual integration in ecologically valid contexts and facilitating investigations into high-level processes, such as social cognition (7).
During film watching, viewers effortlessly recognize and follow the mental states of onscreen characters, a process associated with theory-of-mind function (8–10). This seemingly undemanding task, however, can be complex, especially when visual cues are ambiguous. In these cases, viewers rely strongly on contextual cues, such as the accompanying music, to aid their comprehension (11–13). When integrated into a visual context such as film, music’s influence goes beyond mirroring a meaning portrayed by the visual images (14, 15). Music can shape our experience of the film’s narrative, affecting perceptual judgments, emotion, and memory of the events (11–13, 16–22).
The influence of affect on cognitive processes has been widely recognized in past literature. Theorists such as Isen (23) and Bower (24) proposed that affective states could operate like category names to prime related material in the cognitive system: any event stored in memory is also related to its associated mood, and events with similar affect are bound together into higher-order categories. A specific affective state of an individual or context may enhance the accessibility of this material within the cognitive system, as a consequence of spreading activation in the semantic network (25). This phenomenon, known as a “congruency effect” or “priming effect,” has been consistently observed [e.g., the effect of induced mood on the recall of previously learned personality trait words (26), the effect of depressed mood on the accessibility of autobiographical memories (27), the effect of induced mood on word associations (28)]. Priming effects have also long been acknowledged within the music cognition field (29–31). In the film music research domain, mood congruency effects have been explained by means of schema theory (11), which holds a strong connection with current views of the brain as a prediction machine, continuously generating inferences that approximate the relevant future (32, 33). Schemas (34) are memory associations built by extracting statistical regularities from our environment, which are thought to be assembled into memory structures that function as the building blocks for predictions (33, 35–37). Most researchers agree that one of the primary functions of schemas is to provide an interpretative framework that serves to lessen the amount of attentional effort required for perception and comprehension. According to this perspective, film soundtracks exert a schematic influence on the cognitive processing of visual events (11, 38): the perceptual content is seen as resulting from knowledge-driven predictions that can be modulated by musical information.
In other words, film music encourages viewers to predict scenarios that are consistent with its implied mood (11).
Despite the significant impact of music on film, there is limited neuroscientific research investigating the joint processing of music and visual information. Previous studies have shown that the combined presentation of music and visual information increases activation in brain structures associated with emotion (39, 40). However, the lack of control over specific musical structure variables in these studies has made it challenging to determine how such features modulate brain function.
In the present study, we employed fMRI during naturalistic film viewing to examine the neural mechanisms underlying the emotional responses to musical dissonance (41–44). Although tonal dissonance is one of the most common harmonic manipulations in film music (e.g., Spielberg’s “Jaws” or Kubrick’s “The Shining”), its systematically controlled transformations have not yet been studied neuroscientifically with regard to affective processing biases in visual contexts. By strictly controlling musical variables such as instrumental timbre, dynamics, rhythm, textural density, and melodic contour, we aimed to investigate the effects of tonal dissonance on the neural processing of emotional valence during mental state inferences (45, 46).
Why certain tone combinations are perceived as more attractive, harmonious, and positively valenced than others has been debated for centuries. The concept of pleasant-sounding (consonant) intervals entailing special numerical properties has been attributed to Pythagoras (47). Empirical evidence indeed shows that musical intervals with simple frequency ratios confer perceptual processing advantages (48, 49). Intervals such as the octave (2:1) and perfect fifth (3:2) are also judged as more consonant (i.e., more pleasant, smooth) than intervals with more complex ratios, such as the minor second (16:15) and tritone (45:32), which have been consistently appraised as more dissonant (i.e., more unpleasant and less smooth) (50–53). Theorists have argued that the singular perceptual status of consonant intervals could derive from exposure to particular musical cultures or styles [psychocultural hypothesis: (54, 55)] or from their presence in naturally occurring sounds including those of speech [vocal similarity hypothesis: (56–58)]. Recent studies on auditory roughness, a psychoacoustic factor originally studied by Helmholtz (59), suggest that roughness-mediated aversion mechanisms could underlie the vocal similarity model. Multidisciplinary research on human and animal vocalizations has further revealed that auditory roughness is associated with distress and danger, and that it can therefore induce defensive behavioral and neural responses (for a review see ref. 60). Substantial evidence indicates that the degree of consonance/dissonance is strongly associated with the percept of valence (51, 52, 61). Valence judgments have been shown to reliably index the evaluation of consonance/dissonance level in Western musicians and nonmusicians (50, 51, 62), and this association has also been observed in listeners never exposed to Western music [(53, 63–71) although see also (72, 73)].
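The ratio-based ordering described above can be illustrated with a toy simplicity index (a didactic sketch only; the index and the interval set are our own choices, not an analysis from the study):

```python
from fractions import Fraction

# Just-intonation frequency ratios for the intervals discussed in the text.
INTERVALS = {
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "minor second": Fraction(16, 15),
    "tritone": Fraction(45, 32),
}

def ratio_complexity(r: Fraction) -> int:
    """Crude simplicity index: numerator + denominator of the reduced ratio.

    Simple ratios (octave: 2 + 1 = 3) score low; complex ratios
    (tritone: 45 + 32 = 77) score high, mirroring the reported
    consonance-to-dissonance ordering.
    """
    return r.numerator + r.denominator

# Rank intervals from most consonant (simplest ratio) to most dissonant.
ranked = sorted(INTERVALS, key=lambda name: ratio_complexity(INTERVALS[name]))
```

As the psychocultural and vocal-similarity accounts above make clear, perceived consonance also depends on roughness, enculturation, and context, so a purely numerical index like this is only a first approximation.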
The present experiment focused on valence judgments during mental state attribution to the main character in an animated short film (Materials and Methods and Fig. 4). The hypothesis was that although participants would watch the same visual scene, tonal dissonance would bias their interpretative framework (11), leading to more negatively valenced mental state inferences. Furthermore, we predicted that early visual processing would interact with higher-level auditory association areas, indicating the influence of music on the visual experience. Accordingly, we hypothesized that the auditory ventral stream (AVS) (74–76), a brain pathway involved in mapping contextual sound cues to their associated semantic attributes, would modulate low-level visual encoding via feedback projections.
Fig. 4.
Experimental stimuli and design: After 180 s of fixation, an instruction to attend specifically to mental states is given (10 s); the film clip [“Man with pendulous arms” (Laurent Gorgiard, 1997); duration: 125 s] follows with either music condition 1 (consonance) or condition 2 (dissonance) in randomized order. Following the clip, participants are asked to rate the valence of the movie character’s intentions (10 s) on a scale ranging from 1 (good intentions) to 4 (bad intentions). Two control conditions are included: i) a visual alone category (i.e., film clip with no soundtrack but the same instruction as above) and ii) the film clip with an instruction to describe the physical appearance of the character, to control for multimodal sensory processing, working memory, and attentional demands of the task.
To test our hypotheses, we designed an experimental paradigm that instructed participants to ascribe mental states to the main character in the film clip. Through investigating the recruitment of theory-of-mind neural substrates, the influence of tonal dissonance on mental state inferences, the interaction between early visual processes and sound-to-meaning systems, and the potential impact of top–down contextual influences on visual encoding in V1, we aimed to uncover the cognitive mechanisms and the neural underpinnings of the emotional responses to musical dissonance during naturalistic film viewing.
Results
Behavioral Results.
Impact of tonal consonance/dissonance on valence judgments.
During the present fMRI study, 38 subjects watched the same short film with controlled consonant or dissonant music, in randomized order. Prior to the start of the audio-visual film clip, subjects were instructed to think about the intentions of the main character in the following film clip. Following the clip, participants were asked to rate the valence of the movie character’s intentions on a scale ranging from one (good intentions) to four (bad intentions). Two control conditions were included: i) a visual alone category (i.e., film clip with no soundtrack but the same instruction as above) and ii) the film clip with an instruction to describe the “physical” appearance of the character, to control for multimodal sensory processing, working memory, and attentional demands of the task, without cueing subjects to attend specifically to mental states (see Materials and Methods and Fig. 4 for details).
Behavioral results revealed significant differences between the consonant, dissonant, and visual alone conditions (Table 1). A repeated measures MANOVA showed a significant effect of condition on the judgments of valence regarding the character’s intentions (Wilks’ Lambda = 0.509, F(2, 36) = 17.348, P < 0.001). [The data did not violate the sphericity assumption; Mauchly’s test of sphericity was not significant (P = 0.345)]. Post hoc tests (Bonferroni corrected) indicated that the average valence rating for the dissonant condition was significantly more negative than the valence rating for the consonant condition (P < 0.001, d = 0.895) and the visual alone condition (P = 0.013, d = 0.368). There was also a significant difference in valence ratings between the consonant and the visual alone conditions (P < 0.001, d = 0.526). Concretely, subjects ascribed significantly more negative (positive) intentions to the character on-screen when the film clip was paired with dissonant (consonant) music. Polynomial contrasts on the mean ratings for the three categories indicated a significant linear trend (F (1, 5.263) = 15.289, P < 0.001), dissonance > visual alone > consonance (Table 1).
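The pairwise statistics reported above can be reproduced in outline with a short stdlib sketch (synthetic ratings, not the study’s data; the full repeated-measures MANOVA and Bonferroni-corrected P-values would require a dedicated statistics package):

```python
import math
from statistics import mean, stdev

def paired_cohens_d(a, b):
    """Cohen's d for paired samples: mean difference / SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / stdev(diffs)

def paired_t(a, b):
    """Paired t statistic (df = n - 1); compare against t critical values."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical per-subject mean valence ratings (1 = good, 4 = bad intentions).
dissonant = [3.2, 3.0, 3.5, 2.8, 3.1]
consonant = [2.1, 2.4, 2.0, 2.3, 1.9]

d = paired_cohens_d(dissonant, consonant)  # positive: dissonance rated worse
t = paired_t(dissonant, consonant)
```

With within-subject designs such as this one, the effect size is computed on the per-subject differences, which is why the same contrast can yield a larger d than a between-groups comparison with identical means.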
Table 1.
Valence means with SD and 95% CI for each condition (consonance: movie with consonant music; dissonance: movie with dissonant music; visual alone: movie without music)
| Condition | Mean | SD | 95% CI (adjusted) |
|---|---|---|---|
| Dissonance | 3.052 | 0.804 | [2.892, 3.213] |
| Visual alone | 2.684 | 0.620 | [2.543, 2.825] |
| Consonance | 2.157 | 0.973 | [1.984, 2.332] |
The range for valence ratings was one (good intentions) to four (bad intentions).
The results support previous research examining the affective reactions induced by consonance/dissonance (50–52, 61, 66, 77, 78) and specifically converge with studies assessing the impact of consonance/dissonance on valence judgments during mental state attribution, in which increasing levels of dissonance induce more negatively valenced inferences (79–81).
fMRI Results: Subtractive Analysis.
Effects of the theory-of-mind task compared with the control condition (physical appearance).
The comparison between the audio-visual theory-of-mind (ToM) task (i.e., think about the intentions of the main character) against the audio-visual control condition (focus on the physical appearance of the main character) evidenced signal changes in bilateral supramarginal and inferior parietal gyrus (Table 2A and Fig. 1D). The results are in line with previous studies that have assessed the neural systems supporting implicit nonverbal ToM encoding (82) and corroborate that our task engaged mental state inference processing (45, 46).
Table 2.
(A) fMRI results [at threshold P < 0.001 voxel-level uncorrected, P < 0.05 cluster-level FWE-corrected; (Additionally, * markings show brain regions with significant signal changes at voxel-level FWE-corrected P < 0.05)] of group General Linear Model for the contrasts: dissonance > consonance, consonance > dissonance, consonance > visual alone, dissonance > visual alone, and audio-visual task (Theory of Mind) > audio-visual control (Physical appearance)
| Region | Cluster Peak MNI | Voxels | Max t-value (z-value) | Mean t (std.) | Cluster P (FWE) |
|---|---|---|---|---|---|
| (A) Subtractive analysis | |||||
| Dissonance > Consonance | |||||
| Occipital Mid L* | −15 −94 8 | 78 | 5.70 (4.80) | 4.05 (0.47) | <0.0001 |
| Occipital Sup L* | | 17 | | 4.07 (0.64) | Cl. |
| Occipital Inf L | | 15 | | 4.23 (0.58) | Cl. |
| Lingual L | | 8 | | 3.82 (0.49) | Cl. |
| Calcarine L | | 5 | | 3.85 (0.19) | Cl. |
| Calcarine R* | 12 −94 8 | 37 | 5.51 (4.80) | 3.72 (0.33) | 0.037 |
| Cuneus R | | 5 | | 4.72 (0.81) | Cl. |
| Occipital Sup R | | 5 | | 3.79 (0.45) | Cl. |
| Occipital Mid R | | 4 | | 3.60 (0.16) | Cl. |
| Consonance > Dissonance | |||||
| (No suprathreshold activations were observed) | |||||
| Consonance > Visual Alone | |||||
| Temporal Sup R* | 54 −16 5 | 237 | 8.31 (6.21) | 4.76 (1.05) | <0.0001 |
| Heschl R* | | 44 | | 4.69 (1.18) | Cl. |
| Temporal Sup L* | −48 −25 8 | 177 | 7.05 (5.58) | 4.58 (0.87) | <0.0001 |
| Heschl L* | | 25 | | 5.02 (1.09) | Cl. |
| Dissonance > Visual Alone | |||||
| Temporal Sup L* | −48 −16 5 | 296 | 8.54 (6.31) | 4.74 (0.99) | <0.0001 |
| Heschl L* | | 42 | | 5.29 (1.34) | Cl. |
| Temporal Sup R* | 60 −13 5 | 399 | 8.14 (6.12) | 4.78 (1.00) | <0.0001 |
| Heschl R* | | 56 | | 5.19 (1.01) | Cl. |
| Occipital Inf L* | −33 −88 −10 | 96 | 6.76 (5.42) | 4.34 (0.74) | <0.0001 |
| Occipital Mid L* | | 86 | | 4.33 (0.81) | Cl. |
| Occipital Sup L | | 10 | | 3.78 (0.28) | Cl. |
| Calcarine L | | 3 | | 4.53 (0.20) | Cl. |
| Precentral L* | −39 −7 62 | 144 | 6.63 (5.35) | 4.20 (0.74) | <0.0001 |
| Supp Motor Area L | | 84 | | 3.93 (0.51) | Cl. |
| Occipital Inf R | 45 −73 −7 | 38 | 5.47 (4.65) | 3.81 (0.37) | <0.0001 |
| Calcarine R | | 37 | | 3.68 (0.29) | Cl. |
| Occipital Mid R | | 20 | | 3.58 (0.20) | Cl. |
| Occipital Sup R | | 2 | | 3.72 (0.06) | Cl. |
| Parietal Sup L | −30 −52 53 | 69 | 5.32 (4.56) | 3.79 (0.37) | <0.0001 |
| Parietal Inf L | | 30 | | 4.13 (0.46) | Cl. |
| Frontal Inf Tri L | −42 17 26 | 34 | 5.01 (4.38) | 3.94 (0.44) | 0.006 |
| Frontal Inf Oper L | | 10 | | 3.53 (0.18) | Cl. |
| Insula L | | 5 | | 3.76 (0.37) | Cl. |
| Frontal Inf Tri R | 54 32 11 | 52 | 4.82 (4.22) | 3.74 (0.36) | 0.008 |
| Frontal Inf Oper R | | 6 | | 3.95 (0.47) | Cl. |
| Insula R | | 1 | | 3.35 (-) | Cl. |
| Mid Prefrontal R | 27 41 20 | 17 | 4.79 (4.20) | 3.75 (0.43) | <0.0001 |
| Cingulum Ant R | | 13 | | 3.88 (0.34) | Cl. |
| Task (Theory of Mind) > Control (Physical appearance) - ROI analysis: TPJ, mPFC, ACC | |||||
| Parietal Inf L* | −57 −28 29 | 100 | 6.36 (5.20) | 4.17 (0.58) | <0.0001 |
| SupraMarginal L* | | 43 | | 4.16 (0.72) | Cl. |
| Postcentral L | | 19 | | 3.92 (0.40) | Cl. |
| Parietal Inf R | 33 −37 41 | 25 | 5.44 (4.63) | 3.90 (0.38) | <0.0001 |
| (B) Functional connectivity analysis (PPI). Dissonance > consonance | |||||
| Seed region (6 mm radius sphere) located around highest peak for each subject within the right Calcarine | |||||
| Temporal Mid R* | 51 −70 −4 | 182 | 7.86 (5.99) | 4.45 (0.92) | <0.0001 |
| Temporal Inf R* | | 81 | | 4.76 (1.09) | Cl. |
| Occipital Mid R* | | 27 | | 4.39 (0.83) | Cl. |
| Occipital Mid L | −42 −76 −1 | 94 | 5.40 (4.61) | 4.16 (0.56) | <0.0001 |
| Temporal Mid L | | 60 | | 4.05 (0.55) | Cl. |
| Temporal Inf L | | 3 | | 3.59 (0.21) | Cl. |
| Seed region (6 mm radius sphere) located around highest peak for each subject within the left middle Occipital | |||||
| Temporal Mid R* | 51 −70 −1 | 88 | 6.33 (5.18) | 4.29 (0.82) | <0.0001 |
| Temporal Inf R | | 27 | | 4.01 (0.54) | Cl. |
| Occipital Mid R | | 9 | | 4.15 (0.50) | Cl. |
| Occipital Mid L* | −48 −76 5 | 62 | 5.77 (4.84) | 4.18 (0.64) | <0.0001 |
| Temporal Mid L* | | 60 | | 3.96 (0.55) | Cl. |
| Temporal Inf L | | 3 | | 3.79 (0.17) | Cl. |
| (C) Effective connectivity analysis. DCM – BMS (Effects of dissonance) Models | Expected posterior sk\|Y | Exceedance ψk | | | |
| 1) right middle temporal gyrus → left middle occipital gyrus | 0.6874 | 0.9104 | | | |
| 2) left middle occipital gyrus → right middle temporal gyrus | 0.1499 | 0.0350 | | | |
| 3) Bidirectional | 0.1627 | 0.0546 | | | |
(B) Results of group psychophysiological interactions analysis (PPI) with seed voxels (spheres with a 6 mm radius) located around highest peak for each subject within the right calcarine cortex and within the left middle occipital gyrus (contrast dissonance > consonance). The regions described showed stronger positive functional connectivity with either of these regions during the dissonant condition compared to the consonant condition. (C) Dynamic Causal Modeling (DCM) Bayesian model selection (BMS) results: conditional probability (expected posterior probability representing the probability of a model given the observed data) and exceedance probability (probability compared with other tested models). Model 1 specified a connection from the rMTG to the lMOG (hypothesized). Model 2 specified a connection from the lMOG to the rMTG. Model 3 specified bidirectional connections between both regions. Model 1 obtained the most evidence. Abbreviations: L, left; R, right; TPJ, temporo-parietal junction: supramarginal gyrus, angular gyrus, and superior temporal gyrus; mPFC, medial prefrontal cortex; ACC, anterior cingulate cortex; Cl., areas integrating the above-detailed cluster-level P-value.
Fig. 1.
Effects of tonal dissonance and theory-of-mind function during naturalistic film-viewing. fMRI results (FWE-corrected P < 0.05 for cluster-level inference). Colored areas (red) reflect statistical parametric maps (SPM) superimposed onto a standard brain in stereotactic MNI space. (A) consonance > visual alone and (B) dissonance > visual alone: similar patterns of brain response were observed bilaterally in primary and secondary auditory cortices (AC) including Heschl’s gyri. The contrast dissonance > visual alone further evidenced signal changes in occipital areas. (C) SPM showing voxels in the middle occipital gyrus and calcarine cortex in which the response was higher during the dissonant compared to consonant condition. (D) SPM showing activation in the supramarginal and inferior parietal gyrus (bilaterally) for the comparison between ToM > control conditions.
Effects of music in a ToM audio-visual context, compared to visual alone.
To establish the general effects of music compared to visual (alone) information, we contrasted consonance > visual alone and dissonance > visual alone. These comparisons showed similar patterns of brain response in primary and secondary auditory cortices (AC) (superior temporal gyrus, Heschl’s gyrus) bilaterally (Table 2A and Fig. 1 A and B). These findings converge with previous evidence from studies that have employed music listening paradigms containing harmonized chord progressions (83–85), in which signal changes were observed bilaterally in primary and secondary AC when comparing sound conditions against no-sound (or silent) conditions. The contrast dissonance > visual alone revealed additional activations in the visual cortex (inferior, middle, superior occipital gyrus, and bilateral calcarine cortices), which we interpret as effects driven by increased tonal dissonance, and not by sound information per se (see following subsections).
The analysis of the contrast dissonance > visual alone further evidenced signal differences in the left superior/inferior parietal, bilateral frontal inferior, bilateral insula, and right medial prefrontal and anterior cingulate cortex (ACC) (Table 2A). It should be noted that signal changes within these attentional (superior/inferior parietal), salience, and emotional evaluation regions (insula and medial prefrontal cortices) were only observed while participants watched the dissonant and not the consonant condition (compared to visual alone). These findings converge with previous empirical evidence that has reported activation of attentional, salience detection, and emotional appraisal networks in response to negatively valenced auditory stimuli (80, 86) or threat-related signals (87, 88).
Effects of music in a control (physical appearance) audio-visual context.
To examine whether the above-reported effect on the visual cortex for the contrast dissonance (ToM) > visual alone was specific to the ToM task, we contrasted control dissonance (physical appearance) > visual alone, which yielded no response in the visual pathway, only within primary and secondary AC bilaterally. The contrast control consonance (physical appearance) > visual alone yielded a similar response in primary and secondary AC bilaterally. The comparison control dissonance (physical appearance) > control consonance (physical appearance) did not show any suprathreshold signal changes (SI Appendix, Table S6). Overall, these results suggest that the previously reported signal changes in early visual processing for the contrast dissonance (ToM) > visual alone might reflect a specific cross-modal modulation effect driven by musical dissonance during intention attribution, an effect not observed during the control (physical appearance) task.
Effects of musical dissonance compared to musical consonance within an audio-visual context.
The contrast of interest, dissonance > consonance, revealed significant signal changes in the visual cortex (bilateral middle and superior occipital gyrus, left inferior occipital gyrus), including the primary visual cortex (bilateral calcarine cortex) (Table 2A and Fig. 1C). The opposite contrast (consonance > dissonance) did not show any suprathreshold signal changes. Given that the experimental design presented identical visual information in both conditions, the results substantiate our prediction of early visual systems modulated by contextual sound cues and indicate a role of the visual system beyond visual recognition (89–91). The present study thus reports activations encompassing early visual cortices (V1) modulated by musical information under systematically controlled manipulations of the tonal consonance/dissonance level.
It is important to note that in our previous work (86), we found no evidence for BOLD signal changes within the early visual cortex when examining the same (auditory only) musical stimuli for the contrast dissonance > consonance. In that experiment (86), we employed fMRI to investigate whether different brain regions would be involved in the processing of musical information with contrasting levels of tonal consonance/dissonance. Participants listened to the same musical materials as in the present experiment within a passive music listening paradigm. We aimed to assess whether, without an explicit instruction for cognitive processing of emotional responses, the brain areas underlying participants’ responses to consonance/dissonance would engage. By adopting this approach, the focus was oriented toward the neural substrates that could inform about the more concealed effects of dissonance, such as those exerted in audiovisual contexts. Increased activations were observed in the left medial prefrontal cortex (mPFC) and the left rostral ACC while participants listened to dissonant compared to consonant music, converging with studies that have proposed a key role of these regions in conflict monitoring (detection and resolution) and in the appraisal of negative emotion and fear-related information. The findings highlighted the involvement of the ACC and the mPFC in the appraisal of negatively valenced stimuli signaled by auditory information. The auditory cortex (bilaterally) showed stronger functional connectivity with the ACC during the dissonant portion of the task, implying a demand for greater information integration when processing negatively valenced musical stimuli. There was no evidence for suprathreshold responses within the early visual stream for the contrast dissonance > consonance (auditory only).
These previous results seem to exclude an alternative, more parsimonious explanation of our present findings (i.e., visual cortex activation for audiovisual-dissonance > audiovisual-consonance) as responses to the auditory stimulus alone, which could have been linked to the potentially higher attentional demands allocated during listening to dissonant sounds. (A detailed description of the methodology used in that experiment is included in SI Appendix, to facilitate a comparison with the present study.)
fMRI Results: Brain Connectivity.
Psychophysiological interaction analysis (PPI, functional connectivity).
The function of the primary visual cortex in visual recognition is known to be strongly modulated by multisensory information (2, 3, 92, 93). In particular, auditory feedback signals are the largest contributor (3, 94). To elucidate why dissonance recruited additional regions and to examine potential neuromodulatory influences on early visual processing, we measured functional connectivity with PPI (95). A PPI analysis identifies voxels that increase their relationship with a seed region of interest (ROI) in a given psychological context, such as when participants watch the film with dissonant compared to consonant music. Seed ROIs (spheres with a 6 mm radius) were defined around the highest activated peak for each subject in the left middle occipital gyrus and in the right calcarine cortex, based on the significantly activated clusters for the contrast dissonance > consonance. The results (displayed in Table 2B and Fig. 2) showed that, during the dissonant condition (compared with the consonant condition), a cluster peaking in the right middle (and extending into the right inferior) posterior temporal gyrus exhibited stronger functional connectivity with the right calcarine cortex and with the left middle occipital gyrus, indicating that tonal dissonance modulated the connection between early visual encoding and the ventral auditory system [represented by the right posterior middle temporal gyrus; (74, 76, 96, 97)], which is known to serve as a sound-to-meaning interface, mapping sound-based cues (e.g., distinct levels of dissonance) to their associated semantic attributes (e.g., emotional valence).
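Schematically, the PPI interaction regressor is the elementwise product of the centered psychological variable and the seed time course; a numpy sketch follows (hypothetical block timings, white noise standing in for the seed signal; SPM’s implementation additionally deconvolves the seed to the neural level before multiplying):

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans = 200

# Psychological regressor: +1 for dissonant blocks, -1 for consonant blocks
# (contrast coding), 0 elsewhere. Timings here are invented for illustration.
psych = np.zeros(n_scans)
psych[20:60] = 1.0
psych[100:140] = -1.0

# Seed time course extracted from the ROI (noise stand-in here).
seed = rng.standard_normal(n_scans)

# PPI term: product of the centered psychological variable and the seed.
ppi = (psych - psych.mean()) * seed

# Target-voxel GLM design matrix: [interaction, psych, seed, intercept];
# a significant beta on the first column indicates condition-dependent
# coupling over and above the main effects of task and seed activity.
X = np.column_stack([ppi, psych, seed, np.ones(n_scans)])
```

Including the psychological and seed main effects as covariates is what lets the interaction term isolate the condition-specific change in coupling.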
Fig. 2.
Functional connectivity modulated by tonal dissonance. fMRI results (FWE-corrected P < 0.05 for cluster-level inference). Psycho-physiological interaction analysis (PPI): blue color identifies voxels in the middle (extending to inferior) posterior temporal gyrus, which exhibited stronger functional connectivity with the primary visual cortex (PPI seed regions were 6 mm spheres around the highest activated peak for each subject within the right calcarine cortex and the left middle occipital gyrus, for the contrast dissonance > consonance).
The PPI analysis (contrast dissonance > consonance) also revealed a significant coupling between the above-described ROIs and voxels in the middle occipital gyrus. The middle occipital gyrus is considered a spatial processing region in the visual dorsal pathway (98, 99), which has also shown a functional preference for processing spatial properties of both auditory and tactile stimuli in early-blind subjects (100). These results suggest that tonal dissonance further reinforced the integration of information between the primary visual cortex and spatial processing in the dorsal visual stream.
Dynamic Causal Modeling (DCM, effective connectivity).
Methodologically, DCM is an approach used to infer causal relationships between activity patterns in different brain regions (101, 102); it therefore differs from functional connectivity measures, such as PPI, which explore nondirectional statistical dependencies (i.e., patterns of synchronization or correlation) between brain regions (95). To further examine the causal flow of information between early visual processing areas and the ventral auditory system and, specifically, to assess whether dissonance engages top–down or bottom–up interactions, we carried out an effective connectivity analysis using DCM (101). We hypothesized that distinct dissonance/consonance levels would exert differential influences on early visual encoding through feedback projections from the ventral auditory stream (103). To test this hypothesis, DCM analysis was conducted for each participant on two ROIs: i) the left middle occipital gyrus (lMOG), which showed the strongest activation in the subtractive analysis dissonance > consonance; and ii) the right middle temporal gyrus (rMTG), which exhibited the most robust functional coupling with the lMOG (see Materials and Methods for detailed information). The modulatory effect of interest was the dissonance condition, since this category appeared to strongly modulate the functional interaction (PPI) between the aforementioned areas. We modeled the direction of interaction. For each subject, three models were defined: Model 1 specified a connection from the rMTG to the lMOG (hypothesized); Model 2 specified a connection from the lMOG to the rMTG; Model 3 specified bidirectional connections between both regions. Model 1, in which the left middle occipital gyrus received information from the right middle temporal gyrus, obtained the most evidence.
Bayesian model averaging further indicated a negative modulatory effect of dissonance (−0.0002) on the connection from the rMTG to the lMOG, which changed the strength of the intrinsic connection (0.0189) (Table 2C). The results are consistent with feedback signals being sent from the AVS to the early visual cortex, modulated by increased tonal dissonance.
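For orientation, these quantities map onto the standard bilinear neural state equation of DCM (101, 102), in which intrinsic coupling enters the A matrix and condition-specific modulations enter the B matrices. The correspondence sketched below is illustrative; the parameter values are those reported in Table 2C:

```latex
\dot{x} = \Bigl(A + \sum_{j} u_j\, B^{(j)}\Bigr)x + Cu,
\qquad
A_{\mathrm{lMOG} \leftarrow \mathrm{rMTG}} = 0.0189,
\qquad
B^{(\mathrm{diss})}_{\mathrm{lMOG} \leftarrow \mathrm{rMTG}} = -0.0002
```

Under a unit modulatory input, the effective rMTG → lMOG coupling during dissonant passages is the sum of the intrinsic strength and the (negative) modulation.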
Discussion
Impact of Tonal Consonance/Dissonance on Valence Judgments during Social Cognition Processing.
The present study aimed to investigate the cognitive and neural mechanisms underlying the effect of music on valence inferences during mental state attribution. To assess the capacity of our task to recruit mentalizing neural substrates, we compared the audio-visual theory-of-mind condition with the audio-visual control condition (focused on physical appearance). This comparison revealed signal changes in the bilateral supramarginal and inferior parietal gyri, areas supporting implicit nonverbal ToM reasoning (82) as well as emotional and visual perspective taking, action observation, social attention, and encoding biases (104–107).
Following each film sequence, participants were asked to rate the valence of the movie character’s intentions (i.e., positive or negative). The results showed that musical dissonance led to significantly more negative mental state inferences, underscoring its strong emotional impact. While in line with previous research examining the negative affect elicited by dissonance (50–52, 61, 66, 77, 78), our findings extend the evidence for its strong influence on valence judgments during mental state attribution (79–81).
Activation of V1 by Auditory Cues with Increased Level of Tonal Dissonance.
Univariate whole-brain analysis revealed significant bilateral activation of the primary visual cortices (calcarine cortex), including the middle and superior occipital gyri, while participants were watching the film sequence accompanied by dissonant, compared to consonant, music. The results endorse our prediction of modulations in early visual systems driven by musical information and demonstrate the engagement of crossmodal low-level sensory substrates in response to tonal dissonance.
The findings are in agreement with previous models highlighting the adaptable properties of cortical neurons (89–91). Electrophysiological recordings show that V1 neurons function as spatiotemporal filters encoding elementary visual features, upon which feedforward connections with higher visual areas assist in representing progressively more complex aspects of the visual scene (108). Although the primary function of V1 is visual perception, its role extends beyond visual recognition and encompasses multisensory aspects (2, 3, 92, 93), with the largest contribution of crossmodal reentrant signals being auditory feedback (3). An illustration at relatively basic sensory processing levels was reported by Shams et al. (94), who showed that a single visual flash can be misperceived as multiple flashes when paired with multiple auditory beeps. Our results go beyond this previous work by demonstrating that dissonant sound cues can qualitatively bias higher-level mental state inference processes by exerting neuromodulatory effects on early visual encoding. The findings emphasize the role of the brain as a highly distributed processing engine in which areas across multiple hierarchies work in parallel to perform complex cognitive functions.
Top–Down Modulation of Visual Processing.
We investigated whether V1 could be subject to top–down contextual influences modulated by tonal dissonance. PPI analysis showed strong coupling between the right middle posterior temporal gyrus and V1 in response to dissonance. Effective connectivity analysis demonstrated that musical dissonance modulated visual processing via top–down feedback inputs from the right middle posterior temporal gyrus to V1.
The traditional view of sensory information processing is based on feedforward connections. Within the visual system, these connections characterize a hierarchy of cortical areas initiated in V1 and ascending via the ventral or dorsal pathways (109, 110). Each feedforward connection, however, coexists with a reciprocal feedback connection that carries contextual information. Cortical neurons have been described as “active blackboards” or “adaptive processors” (89, 90), which can modify their function and response according to the behavioral context or the specific demands of the task being carried out. It is well established that the visual cortex is subject to diverse top–down influences, including, among others, those of attention, reward, and emotional responses (111). Top–down signals represent influences from higher-order cognitive representations that impact earlier stages of information processing. These higher-order representations can carry contextual information that prepares the visual system to optimize behavioral responses (90, 112).
While prior research has examined the functional effects of auditory information in the primary visual cortex, most of these studies have employed basic stimuli or simplified displays to investigate the role of sound in assisting the spatial localization of visual inputs, consequently reducing the complexity of naturalistic interaction conditions (113). Only a few studies have assessed higher-order cognitive representations using naturalistic stimuli. In the study by Vetter et al. (114), 10 subjects wore a blindfold and were instructed to keep their eyes closed at all times, while the room lights were switched off. Participants listened to three distinct natural sound stimuli: traffic noise (a busy road with cars and motorbikes), a forest scene (birds singing and a stream), and a crowd scene (people talking without clear semantic information). Using fMRI in combination with multivariate pattern analysis (MVPA), the authors showed that category-specific information from complex natural sounds could be read out from early visual cortex activity, particularly V2 and V3, in the absence of feedforward visual stimulation. Using a univariate approach (even at a liberal threshold of P < 0.05, uncorrected), the authors did not find activation in early visual regions. Petro et al. (91) replicated the study by Vetter et al. (114) with an adjusted experimental design to examine whether contextual auditory information could be decoded from the primary visual cortex during concurrent visual stimulation. In this design variation, participants had their eyes open and the auditory stimuli were paired with uniform visual stimulation: a blank fixation screen. Also using MVPA, the authors showed that sound imagery could be decoded from the early visual cortex. However, as the authors stated, “it remains to be seen if auditory information can be read out from early visual cortex when this is also receiving a more driving visual stimulus” (91).
Here, we broaden these previous findings by using nonuniform, conspicuous visual stimuli (i.e., naturalistic film viewing) and demonstrating, via univariate whole-brain analyses, a strong response of the primary visual cortex to musical dissonance.
Linking schema theory with predictive coding approaches (36, 115, 116), we propose a potential mechanism to explain these modulatory influences. We argue that our manipulation of the musical stimuli triggered high-level contextual representations (i.e., the affective valence associated with the distinct levels of consonance/dissonance), which were fed back to early visual cortices to facilitate the interpretation of the visual information. Throughout an audio-visual experience such as watching a film, musical affect can direct the viewer’s attention toward visual aspects that portray a matching connotative meaning. This, in turn, can allow the spectator to develop certain inferences about the characters’ internal motivations, e.g., their intentions, thereby contributing to the comprehension of the narrative. At a behavioral level, this mechanism has been conceptualized as “mood congruency effects” and explained by means of schema theory (11). Schemas (34) [also described as scripts (117) or frames (118)] are memory associations built by extracting repeating patterns and statistical regularities from our environment. This “related” information refers to events that tend to be linked on some level (such as dissonance leading to negative valence) and is thought to be clustered in memory structures that serve as building blocks for predictions (35–37). From a sociocognitive perspective, schemas allow us to comprehend the social environment: They help us understand why people behave the way they do and predict what is most likely to happen next. We posit that the predictions set by these memory associations are probably manifest in the signal changes we observe in the primary visual cortex.
Feedback Influences from the Auditory “What” Ventral Stream (Mapping Sound Cues to Meaning) to V1.
The two-stream hypothesis is a model of the cortical organization of vision (119). The model proposes two information-processing pathways originating in the occipital cortex: a dorsal visual stream leading to the parietal lobe, which is involved in processing spatial relationships among objects (the “where” pathway), and a ventral visual stream that projects to the inferior temporal cortex and is crucial for the visual identification of objects (the “what” pathway).
Intersecting Goodale and Milner’s model with Wernicke’s research (120), which suggested that sensory representations of speech should interface with at least two systems (a motor-articulatory system and a conceptual system), Hickok and Poeppel (121) advanced an analogous dual-stream hypothesis for auditory language processing. The model entails a ventral stream supporting conceptual representations via projections to the middle posterior temporal gyrus, and a dorsal stream supporting motor representations via projections to temporal-parietal regions (75, 103). Although the specific role of the dorsal auditory stream is still debated, with hypotheses including localization, spectral processing, and auditory–motor integration, there is general agreement on the “what” processing function of the AVS (74–76). For instance, the prototypical verbal paradigm for targeting the ventral auditory pathway involves listening to meaningful speech versus meaningless pseudospeech (122). The ventral pathway serves as a sound-to-meaning interface by mapping sound-based representations to distributed conceptual representations (74, 76, 96, 97). In our study, functional/effective connectivity analysis revealed a causal flow of information from the right middle posterior temporal gyrus to V1 modulated by tonal dissonance. These findings suggest the involvement of the AVS in the encoding of the above-described schemas: building global representations for perceptual and semantic associated attributes by mapping nonverbal contextual sound cues to meaning (e.g., dissonant music leading to negative valence). These, in turn, provide an internal model or interpretative framework that helps to organize the visual experience into a comprehensible whole via feedback projections to the early visual cortices (Fig. 3).
Fig. 3.
Schematic representation of the findings. Left: The auditory what ventral stream plays a role in assigning meaning to nonverbal sound cues, such as dissonant music conveying negative emotions, providing an interpretative framework that serves to process the audio-visual experience. Right: Functional/effective connectivity analysis showed a coupling between the AVS and V1 in response to tonal dissonance and demonstrated the modulation of early visual processing via top–down feedback inputs from the AVS to V1.
Why Does Tonal Dissonance, Rather Than Consonance, Drive Stronger Modulatory Effects on V1?
It is well established that emotionally laden stimuli benefit from preferential processing given their adaptive significance (123–133). We have previously shown that the specific harmonic manipulation applied in this study can isolate emotional valence from the arousal and potency dimensions, providing novel access to the neural representation of negative emotion (79, 86). Negative valence can further modulate the interaction between visual processing and attentional systems implicated in the appraisal of behaviorally relevant, unexpected, and potentially threatening events (80, 86–88). We postulate that the negative valence associated with the dissonant musical background can explain the heightened weight on visual sensory evidence. Our findings conceptually expand previous work, which has focused exclusively on the affective value associated with visual stimuli, by providing evidence of crossmodal interactions and enhanced neural responses in early visual pathways signaled by nonverbal auditory information: music.
Future research could examine other ways of inducing negative valence to modulate the neural dynamics associated with mental state attribution. For example, Chen and Spence (134) reported that the valence associated with different odors can exert a strong crossmodal influence over judgments of facial attractiveness. Could this effect also impact intentional attribution if distinct fragrances were concurrently presented during film watching? Film sound design is another captivating area for further investigation. The term, introduced by Walter Murch (135), refers to an often unconventional use of nonverbal sound effects to shape the film narrative. Even very subtle changes to the spectral components of sound effects (e.g., footsteps on a fragile vs. solid wood floor) can elicit a robust emotional effect. Empirical research within this domain of film sound could therefore bring novel insights into alternative approaches to systematically manipulating sound information to modulate mental state inferences.
The ultimate purpose of our work is to develop advanced methods of applying musical information in naturalistic paradigms to characterize complex mental health disorders. Naturalistic stimuli, such as the film used in this study, are often more engaging than conventional, highly constrained tasks and therefore allow the observation of brain activity that more closely resembles freeform cognition. Film watching provides an effective means to study multiplexed neural responses, from low-level sensory processes to high-level components: We showed that a simple, but controlled, manipulation of musical dissonance can elucidate the brain’s sound-to-meaning interface and its distributive effects during social cognition. We envision the use of nonverbal music-based fMRI paradigms for identifying behavioral signatures and measurable neural markers associated with severe psychopathology.
Conclusions
In the present study, we employed fMRI to assess the effects of musical dissonance upon the affective processing of visual information during naturalistic film viewing. Participants watched the same short-film with either dissonant or consonant music. Compared to consonant music, dissonance led to more negative mental state attributions. Neuroscientific analyses revealed that tonal dissonance modulated early visual processing via top–down cortical feedback inputs from the auditory what ventral stream to the primary visual cortex. We offer evidence for the involvement of the AVS in mapping nonverbal sound cues to meaning, providing an internal model that serves to organize the audio-visual experience. Taken together, these findings demonstrate the critical role of multisensory processing, particularly audio-visual integration, in shaping higher-order functions such as social cognition.
Materials and Methods
Subjects.
We conducted an independent prestudy (n = 8) in the same scanner to test the study design and modeling strategy and to perform power calculations. The target sample size was selected based on an fMRI power analysis conducted following ref. 136. The power calculation was based on the primary visual and auditory cortices as ROIs, with a P-value threshold of 0.005 for a one-sided hypothesis test. Results showed that a sample size of 34 would be able to detect large effect sizes (Cohen’s d > 0.8) at 80% power.
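As a rough cross-check of these figures, the power of a one-sided one-sample test can be approximated analytically. The sketch below uses a simple normal approximation and is not the fMRI-specific procedure of ref. 136, so it should be read as an order-of-magnitude illustration only.

```python
from statistics import NormalDist

def approx_power(n, d, alpha=0.005):
    """Normal-approximation power of a one-sided one-sample test:
    power ~ Phi(d * sqrt(n) - z_{1 - alpha})."""
    z_crit = NormalDist().inv_cdf(1 - alpha)        # one-sided critical value
    return NormalDist().cdf(d * n ** 0.5 - z_crit)  # P(reject | effect size d)

# Under this approximation, a large effect (Cohen's d = 0.8) at
# alpha = 0.005 is comfortably detectable with n = 34.
print(approx_power(34, 0.8))
```

The approximation confirms that n = 34 exceeds the 80% power target for large effects at this threshold.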
Data for the fMRI experiment were obtained from thirty-eight healthy volunteers. The participants (18 females and 20 males; mean age = 27 ± 4 y) were native German speakers with no history of neurological or psychiatric illness or use of psychotropic medication. They reported no long-term hearing impairment. All participants were right-handed. The sample did not include any professional musicians. Five participants (two males and three females) reported having received informal musical training for less than 3 y. All subjects gave written informed consent. The study received ethical approval from the Ethics Commission at TU Dresden (Reference Number: EK 305072016).
Stimulus Material.
The materials employed in the present study were created for, and tested in, two previous experiments (79, 86). The musical stimuli comprised two soundtracks (i.e., consonant and dissonant) composed for the same short film. In summary, a choral piece, written in the form of musical variations, was made to sound consonant or dissonant by modifying its harmonic structure (i.e., an interval-content manipulation), producing two otherwise identical versions of the same musical piece. The consonant condition consisted of a theme followed by three variations written by Fernando Bravo in a romantic musical style. The dissonant condition was achieved by lowering by a semitone the second violin, viola, and violoncello lines (of the consonant piece), while maintaining the other instruments (i.e., first violin and double bass) at their original pitch (SI Appendix, Fig. S2). The level of tonal dissonance (41, 42) was the manipulated variable, while other factors that are also known to contribute to the building and release of musical tension, such as instrumental timbre, dynamics, rhythm, textural density, and melodic contour, were strictly controlled (44, 84, 137, 138) (see SI Appendix for a detailed technical description of the audiovisual stimuli, which can be downloaded from https://doi.org/10.6084/m9.figshare.24710442.v2) (139).
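The interval-content manipulation can be illustrated schematically. The MIDI pitch numbers below are hypothetical (they are not taken from the actual score) and simply show how lowering three inner voices by a semitone, while holding the outer voices fixed, transforms a consonant voicing into a dissonant one:

```python
# Hypothetical C-major voicing (MIDI note numbers), for illustration only.
consonant_chord = {
    "violin_1":    76,  # E5 (kept at original pitch)
    "violin_2":    67,  # G4 (lowered in the dissonant version)
    "viola":       60,  # C4 (lowered in the dissonant version)
    "violoncello": 48,  # C3 (lowered in the dissonant version)
    "double_bass": 36,  # C2 (kept at original pitch)
}

LOWERED = {"violin_2", "viola", "violoncello"}

# Lower the selected inner voices by one semitone (1 MIDI step),
# keeping the remaining parts unchanged.
dissonant_chord = {
    part: pitch - 1 if part in LOWERED else pitch
    for part, pitch in consonant_chord.items()
}
```

All other parameters of the two versions (timing, instrumentation, dynamics) stay identical, mirroring the control over nonharmonic tension factors described above.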
Procedure.
fMRI experiment.
Prior to the start of the audio-visual film clip, and based on established methods (140–143), participants were given an instruction designed to engage affective mental state inference processes: “Please, think about the intentions of the main character in the following film clip” (Fig. 4). The paradigm therefore required participants to ascribe mental states (e.g., intentions) to an actor on-screen (45, 46). After the instruction, the film clip followed with either music condition 1 (consonance) or music condition 2 (dissonance), in randomized order (Fig. 4). At the end of the respective clip, a valence inference question was presented (“What type of intentions does the main character have?”), to which subjects responded using a 4-point scale (ranging from 1 to 4) with extremes labeled “positive” or “negative” (order counterbalanced).
Two control conditions were included: i) a visual-alone category (i.e., film clip with no soundtrack) with identical instructions as above, to control for basic visual sensory processing; and ii) the same audio-visual film clips with an instruction to describe the physical appearance of the character on-screen (“Please, focus on the physical appearance of the main character in the following film clip”). This additional condition was aimed at controlling for multimodal sensory processing, working memory, and the attentional demands of the task, without cueing subjects to attend specifically to mental states (141, 143, 144).
Order of presentation: The two main conditions of interest (audio-visual ToM consonance and audio-visual ToM dissonance) were presented first in counterbalanced order, to ensure that each treatment appeared comparably often as the first (or second) viewed audio-visual clip. The visual alone control condition was always at the third position of the sequence, followed by the physical appearance control condition. There was only one presentation of the physical appearance control condition, paired with consonant music when the first condition was the audio-visual ToM consonance condition, and paired with dissonant music when the first condition was the audio-visual ToM dissonance condition.
Even though we did not find differences in the neural encoding when we tested different presentation orders during the independent pilot study (which was also conducted for power calculations), we decided to maintain the audio-visual ToM conditions fixed in the first position as described above, since film watching habits rarely resemble repeated measures designs.
The experiment was run in the Neuroimaging Center (NIC) at TU Dresden. Participants were asked to arrive 45 min before the fMRI scanning in order to undertake a 10-min preparatory session in a separate room (contiguous to the scanner room). Subjects were familiarized with the task and trained on the procedure, including the use of the 4-point scale response box system. In the fMRI setup, the audio-visual stimulus was projected onto a screen and presented to the subject via a 45° angled mirror positioned above the participant’s head. The auditory stimuli were delivered via MRI-compatible headphones. Sound pressure levels were measured with a Galaxy Audio CM130 meter; the output volume was set to 70 dB. Participants were instructed to avoid head movements throughout the functional scan, to keep their eyes on the screen throughout the session, and to focus on the presented audio-visual stimuli. Following the scanning session, each subject completed a questionnaire to collect subject-specific sociodemographic information.
MRI data acquisition.
Scanning was performed with a 3.0 T system (General Electric, Signa). Prior to the functional magnetic resonance measurements, high-resolution (1 × 1 × 1 mm) T1-weighted anatomical images were acquired from each participant using a three-dimensional fast spoiled gradient-echo (3D-FSPGR) sequence. Continuous echo planar imaging (EPI) with BOLD contrast was employed (TR = 2,000 ms, TE = 25 ms, in-plane resolution = 3 × 3 mm). Each functional volume contained 34 contiguous 3.2-mm-thick axial slices separated by a 0.8 mm interslice gap.
fMRI data analysis.
Data were processed using SPM, version 12 (www.fil.ion.ucl.ac.uk/spm). Following correction for the temporal difference in acquisition between slices, EPI volumes were realigned and resliced to correct for within-subject movement. A mean EPI volume was obtained during realignment, and the structural MRI was coregistered with that mean volume. The coregistered structural scan was normalized to the Montreal Neurological Institute (MNI) T1 template (145). The deformation parameters obtained from the structural image were applied to the realigned EPI volumes, which were resampled into MNI space with 3 mm isotropic voxels. The normalized images were smoothed using a 3D Gaussian kernel of 6 mm FWHM. A temporal high-pass filter with a cutoff period of 256 s was applied to remove scanner-attributable low-frequency drifts from the fMRI time series.
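High-pass filtering of this kind is commonly implemented as regression against a discrete cosine basis, which is the approach SPM uses internally. The sketch below is a minimal numpy illustration (the basis-order formula is an approximation of SPM's convention, not its exact code):

```python
import numpy as np

def dct_highpass(y, tr, cutoff=256.0):
    """Remove slow drifts from a time series by regressing out a discrete
    cosine basis whose components are slower than the cutoff period (s)."""
    n = len(y)
    # Number of cosine components with period longer than the cutoff
    # (approximation of SPM's convention).
    order = int(np.floor(2 * n * tr / cutoff))
    t = np.arange(n)
    X = np.column_stack(
        [np.cos(np.pi * k * (2 * t + 1) / (2 * n)) for k in range(order + 1)]
    )  # k = 0 is the constant term
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta  # residuals: drift-free time series
```

Slow components (period above 256 s) are absorbed by the basis and subtracted, while faster task-related fluctuations pass through largely unchanged.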
An event-related design was employed (146), modeling each of the thirty-seven chord presentations per music category using a canonical hemodynamic response function (147) (model specification timings are available together with the fMRI data in the figshare repository). The design matrix for the first-level analysis included the following four regressors: consonance (i.e., movie with consonant music); dissonance (i.e., movie with dissonant music); visual alone (i.e., movie without music); and control (i.e., movie with dissonant or consonant music and a control task: physical appearance). Parameter estimate images were generated. Seven contrast images per individual were calculated: dissonance > consonance, consonance > dissonance, consonance > visual alone, dissonance > visual alone, consonance/dissonance task > consonance/dissonance control, control dissonance (physical appearance) > visual alone, and control consonance (physical appearance) > visual alone. The second-level group analysis was carried out using one-sample t tests. The significance map for the group random-effects analysis was thresholded at P < 0.001 for voxel-level inference, with a cluster-level threshold of P < 0.05 corrected for the whole brain volume using family-wise error (FWE), which controls the probability of false-positive clusters. (See SI Appendix for further details, including information about psycho-physiological interactions and dynamic causal modeling analyses.)
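The construction of each event-related regressor can be sketched as stick functions placed at the chord-onset times and convolved with a canonical double-gamma HRF. The code below is an illustrative approximation (SPM's canonical HRF additionally uses microtime-resolution and scaling conventions not reproduced here):

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF (response peak ~5 s, undershoot ~15 s),
    an approximation of the SPM canonical shape."""
    t = np.arange(0.0, duration, tr)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def event_regressor(onsets_s, n_scans, tr):
    """Stick functions at event onsets (seconds) convolved with the HRF,
    truncated to the length of the scan series."""
    sticks = np.zeros(n_scans)
    for onset in onsets_s:
        sticks[int(round(onset / tr))] = 1.0
    return np.convolve(sticks, canonical_hrf(tr))[:n_scans]
```

One such regressor per condition (consonance, dissonance, visual alone, control), built from the respective chord-onset timings, would populate the columns of the first-level design matrix.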
Supplementary Material
Appendix 01 (PDF)
Video_consonance (Audiovisual stimuli employed for the consonant condition).
Video_dissonance (Audiovisual stimuli employed for the dissonant condition).
Acknowledgments
We are very grateful to F. Franco. We thank I. Cross, L. N. L. Uelze, P. Heaton, N. Gonzalez, T. Popescu, S. Koelsch, C. Hopkins, and M. Rohrmeier for helpful discussions. We also thank two anonymous reviewers for insightful comments. We thank Christine Ahrends, Katharina Pitt, and Aya Keller for help with data collection. Fernando Bravo is funded by a Research Fellowship from the University of Tübingen. This project was made possible through the support from Queens’ College Cambridge (Walker Studentship to F.B.), the Andrea von Braun Stiftung, and Zukunftskonzept at TU Dresden (Exzellenzinitiative of the Deutsche Forschungsgemeinschaft). The funding sources had no involvement in the study design or the collection, analysis, and interpretation of the data.
Author contributions
F.B. designed research; F.B. and J.G. performed research; F.B. analyzed data; F.B. conceptualized research; and F.B., E.A.S., and K.H. wrote the paper.
Competing interests
The authors declare no competing interest.
Footnotes
This article is a PNAS Direct Submission.
Data, Materials, and Software Availability
fMRI data have been deposited in Figshare (https://doi.org/10.6084/m9.figshare.21345240.v1) (139).
Supporting Information
References
- 1.Ghazanfar A. A., Schroeder C. E., Is neocortex essentially multisensory? Trends Cogn. Sci. 10, 278–285 (2006). [DOI] [PubMed] [Google Scholar]
- 2.van Atteveldt N., Murray M. M., Thut G., Schroeder C. E., Multisensory integration: Flexible use of general operations. Neuron 81, 1240–1253 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Murray M. M., et al. , The multisensory function of the human primary visual cortex. Neuropsychologia 83, 161–169 (2016). [DOI] [PubMed] [Google Scholar]
- 4.Hari R., Henriksson L., Malinen S., Parkkonen L., Centrality of social interaction in human brain function. Neuron 88, 181–193 (2015). [DOI] [PubMed] [Google Scholar]
- 5.Khosla M., Ngo G. H., Jamison K., Kuceyeski A., Sabuncu M. R., Cortical response to naturalistic stimuli is largely predictable with deep neural networks. Sci. Adv. 7, eabe7547 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Varoquaux G., Poldrack R. A., Predictive models avoid excessive reductionism in cognitive neuroimaging. Curr. Opin. Neurobiol. 55, 1–6 (2019). [DOI] [PubMed] [Google Scholar]
- 7.Finn E. S., Is it time to put rest to rest? Trends Cogn. Sci. 25, 1021–1032 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Abell F., Happe F., Frith U., Do triangles play tricks? Attribution of mental states to animated shapes in normal and abnormal development. Cogn. Dev. 15, 1–16 (2000). [Google Scholar]
- 9.Castelli F., Happé F., Frith U., Frith C., Movement and mind: A functional imaging study of perception and interpretation of complex intentional movement patterns. NeuroImage 12, 314–325 (2000). [DOI] [PubMed] [Google Scholar]
- 10.Saxe R., Wexler A., Making sense of another mind: The role of the right temporo-parietal junction. Neuropsychologia 43, 1391–1399 (2005). [DOI] [PubMed] [Google Scholar]
- 11.Boltz M. G., Musical soundtracks as a schematic influence on the cognitive processing of filmed events. Music Percept. 18, 427–454 (2001). [Google Scholar]
- 12.Cohen A. J., “How music influences the interpretation of film and video” in Perspectives in Systematic Musicology, Kendall R. A., Savage R. W. H., Eds. (Dept. of Ethnomusicology, University of California, Los Angeles, CA, 2005), pp. 15–36.
- 13.Cohen A. J., “Film music and the unfolding narrative” in Language, Music and the Brain. Strüngmann Forum Reports, Arbib M., Ed. (MIT Press, Cambridge, MA, J. Lupp, series ed., 2013), vol. 10, pp. 173–201. [Google Scholar]
- 14.Chion M., Audio-Vision: Sound on Screen (Columbia University Press, 1994). [Google Scholar]
- 15.Pudovkin V. I., “Asynchronism as a principle of sound film” in Film Technique and Film Acting (Grove Press Inc., New York, 1978), pp. 183–193 (1929). [Google Scholar]
- 16.Bolivar V. J., Cohen A. J., Fentress J. C., Semantic and formal congruency in music and motion pictures: Effects on the interpretation of visual action. Psychomusicol. J. Res. Music Cogn. 13, 28–59 (1994). [Google Scholar]
- 17.Boltz M., Schulkind M., Kantra S., Effects of background music on the remembering of filmed events. Mem. Cognit. 19, 593–606 (1991). [DOI] [PubMed] [Google Scholar]
- 18.Bullerjahn C., Güldenring M., An empirical investigation of effects of film music using qualitative content analysis. Psychomusicol. J. Res. Music Cogn. 13, 99–118 (1994). [Google Scholar]
- 19.Cohen A. J., “Music as a source of emotion in film” in Music and Emotion: Theory and Research, Juslin P. N., Sloboda J. A., Eds. (Oxford University Press, Oxford, 2001), pp. 249–272.
- 20.Hoeckner B., Wyatt E. W., Decety J., Nusbaum H., Film music influences how viewers relate to movie characters. Psychol. Aesthet. Creat. Arts 5, 146–153 (2011). [Google Scholar]
- 21. Sirius G., Clarke E. F., The perception of audiovisual relationships: A preliminary study. Psychomusicol. J. Res. Music Cogn. 13, 119–132 (1994).
- 22. Tan S.-L., Spackman M. P., Bezdek M. A., Viewers’ interpretations of film characters’ emotions: Effects of presenting film music before or after a character is shown. Music Percept. Interdiscip. J. 25, 135–152 (2007).
- 23. Isen A. M., “Toward understanding the role of affect in cognition” in Handbook of Social Cognition, Wyer R. S. Jr., Srull T. K., Eds. (Lawrence Erlbaum Associates Publishers, 1984), vol. 3, pp. 179–236.
- 24. Bower G. H., Mood and memory. Am. Psychol. 36, 129–148 (1981).
- 25. Fazio R., Sanbonmatsu D., Powell M., Kardes F., On the automatic activation of attitudes. J. Pers. Soc. Psychol. 50, 229–238 (1986).
- 26. Teasdale J. D., Russell M. L., Differential effects of induced mood on the recall of positive, negative and neutral words. Br. J. Clin. Psychol. 22, 163–171 (1983).
- 27. Teasdale J. D., Fogarty S. J., Differential effects of induced mood on retrieval of pleasant and unpleasant events from episodic memory. J. Abnorm. Psychol. 88, 248–257 (1979).
- 28. Fisher V. E., Marrow A. J., Experimental study of moods. Charac. Pers. Q. Psychodiagn. Allied Stud. 2, 201–208 (1934).
- 29. Armitage J., Eerola T., Cross-modal transfer of valence or arousal from music to word targets in affective priming? Audit. Percept. Cogn. 5, 192–210 (2022).
- 30. Bharucha J. J., Stoeckig K., Reaction time and musical expectancy: Priming of chords. J. Exp. Psychol. Hum. Percept. Perform. 12, 403–410 (1986).
- 31. Bharucha J. J., Stoeckig K., Priming of chords: Spreading activation or overlapping frequency spectra? Percept. Psychophys. 41, 519–524 (1987).
- 32. Bar M., The proactive brain: Using analogies and associations to generate predictions. Trends Cogn. Sci. 11, 280–289 (2007).
- 33. Seth A. K., Interoceptive inference, emotion, and the embodied self. Trends Cogn. Sci. 17, 565–573 (2013).
- 34. Mandler J. M., Johnson N. S., Some of the thousand words a picture is worth. J. Exp. Psychol. Hum. Learn. 2, 529–540 (1976).
- 35. Bar M., Visual objects in context. Nat. Rev. Neurosci. 5, 617–629 (2004).
- 36. Bar M., Predictions: A universal principle in the operation of the human brain. Philos. Trans. R. Soc. B Biol. Sci. 364, 1181–1182 (2009).
- 37. Bar M., Ullman S., Spatial context in recognition. Perception 25, 343–352 (1996).
- 38. Boltz M. G., The cognitive processing of film and musical soundtracks. Mem. Cognit. 32, 1194–1205 (2004).
- 39. Eldar E., Ganor O., Admon R., Bleich A., Hendler T., Feeling the real world: Limbic response to music depends on related content. Cereb. Cortex 17, 2828–2840 (2007).
- 40. Baumgartner T., Lutz K., Schmidt C. F., Jäncke L., The emotional power of music: How music enhances the feeling of affective pictures. Brain Res. 1075, 151–164 (2006).
- 41. Bigand E., Parncutt R., Lerdahl F., Perception of musical tension in short chord sequences: The influence of harmonic function, sensory dissonance, horizontal motion, and musical training. Percept. Psychophys. 58, 124–141 (1996).
- 42. Bigand E., Parncutt R., Perceiving musical tension in long chord sequences. Psychol. Res. 62, 237–254 (1999).
- 43. Huron D., Sweet Anticipation: Music and the Psychology of Expectation (A Bradford Book, 2008).
- 44. Lerdahl F., Krumhansl C. L., Modeling tonal tension. Music Percept. Interdiscip. J. 24, 329–366 (2007).
- 45. Van Overwalle F., Social cognition and the brain: A meta-analysis. Hum. Brain Mapp. 30, 829–858 (2009).
- 46. Van Overwalle F., Van den Eede S., Baetens K., Vandekerckhove M., Trait inferences in goal-directed behavior: ERP timing and localization under spontaneous and intentional processing. Soc. Cogn. Affect. Neurosci. 4, 177–190 (2009).
- 47. Apel W., Harvard Dictionary of Music, Second Edition, Revised and Enlarged (Belknap Press, 1969).
- 48. Schellenberg E. G., Trainor L. J., Sensory consonance and the perceptual similarity of complex-tone harmonic intervals: Tests of adult and infant listeners. J. Acoust. Soc. Am. 100, 3321–3328 (1996).
- 49. Schellenberg E. G., Trehub S. E., Frequency ratios and the discrimination of pure tone sequences. Percept. Psychophys. 56, 472–478 (1994).
- 50. Blood A. J., Zatorre R. J., Bermudez P., Evans A. C., Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nat. Neurosci. 2, 382–387 (1999).
- 51. Plomp R., Levelt W. J. M., Tonal consonance and critical bandwidth. J. Acoust. Soc. Am. 38, 548–560 (1965).
- 52. Trainor L. J., Heinmiller B. M., The development of evaluative responses to music. Infant Behav. Dev. 21, 77–88 (1998).
- 53. Zentner M. R., Kagan J., Perception of music by infants. Nature 383, 29 (1996).
- 54. Dowling W. J., Harwood D. L., Music Cognition (Academic Press, 1981).
- 55. Serafine M. L., Cognition in music. Cognition 14, 119–183 (1983).
- 56. Bowling D. L., Purves D., A biological rationale for musical consonance. Proc. Natl. Acad. Sci. U.S.A. 112, 11155–11160 (2015).
- 57. Terhardt E., Psychoacoustic evaluation of musical sounds. Percept. Psychophys. 23, 483–492 (1978).
- 58. Terhardt E., The concept of musical consonance: A link between music and psychoacoustics. Music Percept. Interdiscip. J. 1, 276–295 (1984).
- 59. von Helmholtz H., Ellis A. J., On the Sensations of Tone as a Physiological Basis for the Theory of Music (Longmans, Green, and Co., London, UK/New York, NY, 1895).
- 60. Di Stefano N., Vuust P., Brattico E., Consonance and dissonance perception: A critical review of the historical sources, multidisciplinary findings, and main hypotheses. Phys. Life Rev. 43, 273–304 (2022).
- 61. Costa M., Bitti P. E. R., Bonfiglioli L., Psychological connotations of harmonic musical intervals. Psychol. Music 28, 4–22 (2000).
- 62. Bugg E. G., An Experimental Study of Factors Influencing Consonance Judgments (Johnson, 1970).
- 63. Bidelman G. M., Krishnan A., Neural correlates of consonance, dissonance, and the hierarchy of musical pitch in the human brainstem. J. Neurosci. 29, 13165–13171 (2009).
- 64. Chiandetti C., Vallortigara G., Chicks like consonant music. Psychol. Sci. 22, 1270–1273 (2011).
- 65. Fannin H. A., Braud W. G., Preference for consonant over dissonant tones in the albino rat. Percept. Mot. Skills 32, 191–193 (1971).
- 66. Fritz T., et al., Universal recognition of three basic emotions in music. Curr. Biol. 19, 573–576 (2009).
- 67. Fujisawa T. X., Cook N. D., The perception of harmonic triads: An fMRI study. Brain Imaging Behav. 5, 109–125 (2011).
- 68. Itoh K., Suwazono S., Nakada T., Central auditory processing of noncontextual consonance in music: An evoked potential study. J. Acoust. Soc. Am. 128, 3781–3787 (2010).
- 69. Izumi A., Japanese monkeys perceive sensory consonance of chords. J. Acoust. Soc. Am. 108, 3073–3078 (2000).
- 70. Peretz I., Blood A. J., Penhune V., Zatorre R., Cortical deafness to dissonance. Brain 124, 928–940 (2001).
- 71. Sugimoto T., et al., Preference for consonant music over dissonant music by an infant chimpanzee. Primates 51, 7–12 (2010).
- 72. McDermott J. H., Schultz A. F., Undurraga E. A., Godoy R. A., Indifference to dissonance in native Amazonians reveals cultural variation in music perception. Nature 535, 547–550 (2016), 10.1038/nature18635.
- 73. McDermott J. H., Hauser M., Are consonant intervals music to their ears? Spontaneous acoustic preferences in a nonhuman primate. Cognition 94, B11–B21 (2004).
- 74. Belin P., et al., “What”, “where” and “how” in auditory cortex. Nat. Neurosci. 3, 965–966 (2000).
- 75. Hickok G., Poeppel D., Towards a functional neuroanatomy of speech perception. Trends Cogn. Sci. 4, 131–138 (2000).
- 76. Kaas J. H., Hackett T. A., “What” and “where” processing in auditory cortex. Nat. Neurosci. 2, 1045–1047 (1999).
- 77. Fritz T. H., et al., Anatomical differences in the human inferior colliculus relate to the perceived valence of musical consonance and dissonance. Eur. J. Neurosci. 38, 3099–3105 (2013), 10.1111/ejn.12305.
- 78. Koelsch S., Fritz T., Cramon D. Y. V., Müller K., Friederici A. D., Investigating emotion with music: An fMRI study. Hum. Brain Mapp. 27, 239–250 (2006).
- 79. Bravo F., “The influence of music on the emotional interpretation of visual contexts: Designing interactive multimedia tools for psychological research” in From Sounds to Music and Emotions, Lecture Notes in Computer Science, Aramaki M., Barthet M., Kronland-Martinet R., Ystad S., Eds. (Springer, Berlin/Heidelberg, Germany, 2013), pp. 366–377.
- 80. Bravo F., et al., Neural mechanisms underlying valence inferences to sound: The role of the right angular gyrus. Neuropsychologia 102, 144–162 (2017).
- 81. Bravo F., Cross I., Stamatakis E. A., Rohrmeier M., Sensory cortical response to uncertainty and low salience during recognition of affective cues in musical intervals. PLoS ONE 12, e0175991 (2017).
- 82. Grosse Wiesmann C., Friederici A. D., Singer T., Steinbeis N., Two systems for thinking about others’ thoughts in the developing brain. Proc. Natl. Acad. Sci. U.S.A. 117, 6928–6935 (2020).
- 83. Brown S., Martinez M. J., Parsons L. M., Passive music listening spontaneously engages limbic and paralimbic systems. Neuroreport 15, 2033–2037 (2004).
- 84. Menon V., et al., Neural correlates of timbre change in harmonic sounds. NeuroImage 17, 1742–1754 (2002).
- 85. Ohnishi T., et al., Functional anatomy of musical perception in musicians. Cereb. Cortex 11, 754–760 (2001).
- 86. Bravo F., et al., Anterior cingulate and medial prefrontal cortex response to systematically controlled tonal dissonance during passive music listening. Hum. Brain Mapp. 41, 46–66 (2019).
- 87. Corbetta M., Patel G., Shulman G. L., The reorienting system of the human brain: From environment to theory of mind. Neuron 58, 306–324 (2008).
- 88. Corbetta M., Shulman G. L., Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 3, 215–229 (2002).
- 89. Bullier J., Feedback connections and conscious vision. Trends Cogn. Sci. 5, 369–370 (2001).
- 90. Gilbert C. D., Li W., Top-down influences on visual processing. Nat. Rev. Neurosci. 14, 350–363 (2013).
- 91. Petro L. S., Paton A. T., Muckli L., Contextual modulation of primary visual cortex by auditory signals. Philos. Trans. R. Soc. B Biol. Sci. 372, 20160104 (2017).
- 92. De Meo R., Murray M. M., Clarke S., Matusz P. J., Top-down control and early multisensory processes: Chicken vs. egg. Front. Integr. Neurosci. 9, 17 (2015).
- 93. Ghazanfar A. A., Maier J. X., Hoffman K. L., Logothetis N. K., Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. J. Neurosci. 25, 5004–5012 (2005).
- 94. Shams L., Kamitani Y., Shimojo S., What you see is what you hear. Nature 408, 788 (2000).
- 95. Friston K. J., et al., Psychophysiological and modulatory interactions in neuroimaging. NeuroImage 6, 218–229 (1997).
- 96. Rauschecker J. P., Cortical processing of complex sounds. Curr. Opin. Neurobiol. 8, 516–521 (1998).
- 97. Romanski L. M., et al., Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat. Neurosci. 2, 1131–1136 (1999).
- 98. Wandell B. A., Dumoulin S. O., Brewer A. A., Visual field maps in human cortex. Neuron 56, 366–383 (2007).
- 99. Dumoulin S. O., et al., A new anatomical landmark for reliable identification of human area V5/MT: A quantitative analysis of sulcal patterning. Cereb. Cortex 10, 454–463 (2000).
- 100. Renier L. A., et al., Preserved functional specialization for spatial processing in the middle occipital gyrus of the early blind. Neuron 68, 138–148 (2010).
- 101. Friston K. J., Harrison L., Penny W., Dynamic causal modelling. NeuroImage 19, 1273–1302 (2003).
- 102. Stephan K. E., et al., Ten simple rules for dynamic causal modeling. NeuroImage 49, 3099–3109 (2010).
- 103. Hickok G., Poeppel D., The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402 (2007).
- 104. Kanske P., Böckler A., Trautwein F.-M., Parianen Lesemann F. H., Singer T., Are strong empathizers better mentalizers? Evidence for independence and interaction between the routes of social cognition. Soc. Cogn. Affect. Neurosci. 11, 1383–1392 (2016).
- 105. Schurz M., et al., Clarifying the role of theory of mind areas during visual perspective taking: Issues of spontaneity and domain-specificity. NeuroImage 117, 386–396 (2015).
- 106. Silani G., Lamm C., Ruff C. C., Singer T., Right supramarginal gyrus is crucial to overcome emotional egocentricity bias in social judgments. J. Neurosci. 33, 15466–15476 (2013).
- 107. Steinbeis N., Bernhardt B. C., Singer T., Age-related differences in function and structure of rSMG and reduced functional connectivity with DLPFC explains heightened emotional egocentricity bias in childhood. Soc. Cogn. Affect. Neurosci. 10, 302–310 (2015).
- 108. DiCarlo J. J., Zoccolan D., Rust N. C., How does the brain solve visual object recognition? Neuron 73, 415–434 (2012).
- 109. Mishkin M., Ungerleider L. G., Macko K. A., Object vision and spatial vision: Two cortical pathways. Trends Neurosci. 6, 414–417 (1983).
- 110. Ungerleider L., “What” and “where” in the human brain. Curr. Opin. Neurobiol. 4, 157–165 (1994).
- 111. Petro L. S., Vizioli L., Muckli L., Contributions of cortical feedback to sensory processing in primary visual cortex. Front. Psychol. 5, 1223 (2014).
- 112. Desimone R., Duncan J., Neural mechanisms of selective visual attention. Annu. Rev. Neurosci. 18, 193–222 (1995).
- 113. Peelen M. V., Kastner S., Attention in the real world: Toward understanding its neural basis. Trends Cogn. Sci. 18, 242–250 (2014).
- 114. Vetter P., Smith F. W., Muckli L., Decoding sound and imagery content in early visual cortex. Curr. Biol. 24, 1256–1262 (2014).
- 115. Friston K., A theory of cortical responses. Philos. Trans. R. Soc. B Biol. Sci. 360, 815–836 (2005).
- 116. Friston K., The free-energy principle: A rough guide to the brain? Trends Cogn. Sci. 13, 293–301 (2009).
- 117. Schank R. C., “Using knowledge to understand” in Proceedings of the 1975 Workshop on Theoretical Issues in Natural Language Processing (Association for Computational Linguistics, 1975), pp. 117–121.
- 118. Minsky M., “A framework for representing knowledge” (Technical Report 306, Massachusetts Institute of Technology, 1974).
- 119. Goodale M. A., Milner A. D., Separate visual pathways for perception and action. Trends Neurosci. 15, 20–25 (1992).
- 120. Wernicke C., The symptom complex of aphasia: A psychological study on an anatomical basis. Boston Stud. Philos. Sci. 4, 34–97 (1874).
- 121. Hickok G., Poeppel D., Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition 92, 67–99 (2004).
- 122. Saur D., et al., Ventral and dorsal pathways for language. Proc. Natl. Acad. Sci. U.S.A. 105, 18035–18040 (2008).
- 123. Carretié L., Martín-Loeches M., Hinojosa J. A., Mercado F., Emotion and attention interaction studied through event-related potentials. J. Cogn. Neurosci. 13, 1109–1128 (2001).
- 124. Dolcos F., LaBar K. S., Cabeza R., Dissociable effects of arousal and valence on prefrontal activity indexing emotional evaluation and subsequent memory: An event-related fMRI study. NeuroImage 23, 64–74 (2004).
- 125. Dolcos F., et al., Neural correlates of emotion-attention interactions: From perception, learning, and memory to social cognition, individual differences, and training interventions. Neurosci. Biobehav. Rev. 108, 559–601 (2020).
- 126. Dolcos F., Cabeza R., Event-related potentials of emotional memory: Encoding pleasant, unpleasant, and neutral pictures. Cogn. Affect. Behav. Neurosci. 2, 252–263 (2002).
- 127. Kensinger E. A., Schacter D. L., Neural processes supporting young and older adults’ emotional memories. J. Cogn. Neurosci. 20, 1161–1173 (2008).
- 128. Kosslyn S. M., et al., Neural effects of visualizing and perceiving aversive stimuli: A PET investigation. Neuroreport 7, 1569–1576 (1996).
- 129. Lang P. J., et al., Emotional arousal and activation of the visual cortex: An fMRI analysis. Psychophysiology 35, 199–210 (1998).
- 130. Mickley Steinmetz K. R., Kensinger E. A., The effects of valence and arousal on the neural activity leading to subsequent memory. Psychophysiology 46, 1190–1199 (2009).
- 131. Pourtois G., Schettino A., Vuilleumier P., Brain mechanisms for emotional influences on perception and attention: What is magic and what is not. Biol. Psychol. 92, 492–512 (2013).
- 132. Vuilleumier P., Affective and motivational control of vision. Curr. Opin. Neurol. 28, 29–35 (2015).
- 133. Vuilleumier P., Richardson M. P., Armony J. L., Driver J., Dolan R. J., Distant influences of amygdala lesion on visual cortical activation during emotional face processing. Nat. Neurosci. 7, 1271–1278 (2004).
- 134. Chen Y.-C., Spence C., Investigating the crossmodal influence of odour on the visual perception of facial attractiveness and age. Multisens. Res. 35, 1–23 (2022).
- 135. Thom R., Designing a movie for sound. Iris 27, 9–20 (1999).
- 136. Mumford J. A., A power calculation guide for fMRI studies. Soc. Cogn. Affect. Neurosci. 7, 738–742 (2012).
- 137. Barthet M., Depalle P., Kronland-Martinet R., Ystad S., Acoustical correlates of timbre and expressiveness in clarinet performance. Music Percept. Interdiscip. J. 28, 135–154 (2010).
- 138. Paraskeva S., McAdams S., “Influence of timbre, presence/absence of tonal hierarchy and musical training on the perception of musical tension and relaxation schemas” in Proceedings of the International Computer Music Conference (ICMC) (1997).
- 139. Bravo F., We see what we hear. Figshare. Dataset. 10.6084/m9.figshare.21345240.v1. Deposited 17 October 2022.
- 140. Gallagher H. L., et al., Reading the mind in cartoons and stories: An fMRI study of “theory of mind” in verbal and nonverbal tasks. Neuropsychologia 38, 11–21 (2000).
- 141. Saxe R., “The right temporo-parietal junction: A specific brain region for thinking about thoughts” in Handbook of Theory of Mind, Leslie A., German T., Eds. (Psychology Press, Taylor & Francis Group, 2010), pp. 1–35.
- 142. Saxe R., Kanwisher N., People thinking about thinking people: The role of the temporo-parietal junction in “theory of mind”. NeuroImage 19, 1835–1842 (2003).
- 143. Völlm B. A., et al., Neuronal correlates of theory of mind and empathy: A functional magnetic resonance imaging study in a nonverbal task. NeuroImage 29, 90–98 (2006).
- 144. Happé F., An advanced test of theory of mind: Understanding of story characters’ thoughts and feelings by able autistic, mentally handicapped, and normal children and adults. J. Autism Dev. Disord. 24, 129–154 (1994).
- 145. Friston K. J., et al., Spatial registration and normalization of images. Hum. Brain Mapp. 3, 165–189 (1995).
- 146. Henson R., “Efficient experimental design for fMRI” in Statistical Parametric Mapping, Friston K., et al., Eds. (Elsevier, 2007), pp. 193–210.
- 147. Friston K. J., Zarahn E., Josephs O., Henson R. N., Dale A. M., Stochastic designs in event-related fMRI. NeuroImage 10, 607–619 (1999).
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Appendix 01 (PDF)
Video_consonance (Audiovisual stimuli employed for the consonant condition).
Video_dissonance (Audiovisual stimuli employed for the dissonant condition).
Data Availability Statement
fMRI data have been deposited in Figshare (https://doi.org/10.6084/m9.figshare.21345240.v1) (139).