Abstract
Auditory Scene Analysis (ASA) refers to the grouping of acoustic signals into auditory objects. Previously, we have shown that the perceived musicality of auditory sequences varies with high-level organizational features. Here, we explore the neural mechanisms mediating ASA and auditory object perception. Participants performed musicality judgments on randomly generated pure-tone sequences and on manipulated versions of each sequence containing low-level changes (amplitude; timbre). Low-level manipulations affected auditory object perception, as evidenced by changes in musicality ratings. fMRI was used to measure neural activation to the sequences rated most and least musical and to the altered versions of each sequence. Next, we generated two partially overlapping networks: (i) a music processing network (music localizer) and (ii) an ASA network (base sequences vs. ASA-manipulated sequences). Using Representational Similarity Analysis, we correlated the functional profile of each ROI with a model generated from behavioral musicality ratings as well as with models corresponding to low-level feature processing and music perception. Within the overlapping regions, areas near primary auditory cortex correlated with the low-level ASA models, whereas right IPS correlated with musicality ratings. Shared neural mechanisms that correlate with behavior and underlie both ASA and music perception suggest that low-level features of auditory stimuli play a role in auditory object perception.
Keywords: auditory scene analysis, auditory object perception, neuroimaging, RSA
Introduction
A central challenge of auditory neuroscience involves the identification of mechanisms responsible for transforming low-level acoustic features into perceptually stable and meaningful units of information. The difficulty of this task is compounded by the observation that the acoustic signal reaching the human ear typically contains a sum of pressure waves emanating from multiple environmental sources. Thus, for auditory perception to occur, this signal must be decomposed into its constituent features and grouped in a manner that approximates the surrounding auditory scene. The numerous processes associated with these transformations are referred to as Auditory Scene Analysis (ASA; van Noorden 1975; Bregman 1990; Darwin 1997). Successful ASA will typically result in the formation of perceptual units known as auditory objects. While the precise definition of an auditory object has generated considerable debate (Bregman 1990; Griffiths and Warren 2004; Bizley and Cohen 2013), the term is frequently used to indicate an auditory percept that corresponds to a stimulus in the environment (Bregman 1990; Shinn-Cunningham 2008; Winkler et al. 2009; Brefczynski-Lewis and Lewis 2017). While auditory objects can reference discrete events, such as a car horn or a bird chirp, successful auditory perception often relies on the grouping of multiple discrete events that unfold over time (e.g. footsteps or the notes of a melody). This grouping of sequential acoustic events is sometimes referred to as an “auditory stream”; however, such streams can also be considered auditory objects (Griffiths and Warren 2004; Bizley and Cohen 2013). Auditory objects are central to a coherent auditory experience and serve as input to higher level cognitive operations such as attention, memory, or motor systems (Shinn-Cunningham 2008; Shinn-Cunningham and Best 2008; Seydell-Greenwald et al. 2014; Zimmermann et al. 2016 but see Shamma et al. 2011).
In order for auditory perception to occur, acoustic properties of the impinging soundwave must be transformed into perceptual features and subsequently grouped into objects. The dual pathway model is one influential framework for understanding the neural bases of this process (Romanski et al. 1999; Alain et al. 2001; Maeder et al. 2001; Tian et al. 2001; Arnott et al. 2004; Griffiths 2008; Lomber and Malhotra 2008; Rauschecker and Scott 2009 but see Belin et al. 2000; Kaas and Hackett 2000). According to this model, neural activity originating in the primary auditory cortex is processed along two parallel cortical pathways, each marshaling the information toward different behavioral goals. Analogous to the influential two-stream model of the visual system (Ungerleider and Mishkin 1982), the dorsal pathway is thought to be involved in representing the location of incoming sound as well as preparing for appropriate motor behaviors. Conversely, the computations along the ventral pathway are geared toward representing the content and identity of the auditory stimulus. While there is anatomical connectivity between the two pathways (Webster et al. 1994; Kaas and Hackett 1999, 2000) and apparent crosstalk with behavioral consequences (Stecker and Middlebrooks 2003; Cloutman 2013), most research pertaining to ASA and auditory object perception has focused on processes occurring in the ventral (“what”) pathway.
Before auditory objects can be formed, perceptual features of sound must be extracted from the acoustic signal. A number of cortical regions have been found that show sensitivity to manipulations of low-level auditory features. For example, regions sensitive to pitch (Bendor and Wang 2005; Hall and Plack 2009; Griffiths et al. 2010; Bizley et al. 2013), timbre (Menon et al. 2002; Kumar et al. 2007; Allen et al. 2017), and loudness (Reiterer et al. 2008; Watkins and Barbour 2011; Röhl and Uppenkamp 2012; Uppenkamp and Röhl 2014) have all been discovered throughout the ventral auditory pathway. While some debate exists regarding whether the processing of auditory features is hierarchical or distributed (Bizley and Walker 2009; Bizley and Cohen 2013; Allen et al. 2017), or the role of subcortical structures in these computations (Pressnitzer et al. 2008), it is clear that multiple regions of the cortex are specialized for the extraction of low-level auditory information.
Other research has focused on the neural correlates of fully formed auditory objects. According to some studies, regions as early as primary auditory cortex may be involved in the representation of auditory objects or streams (Ohl et al. 2001; Selezneva et al. 2006). Beyond A1, areas in the belt region show sensitivity to categorical boundaries in monkeys (Tsunada et al. 2012), while neurons in human anterior superior temporal lobe show categorical responses to human speech and musical sounds (Binder et al. 2004; Chang et al. 2010; Leaver and Rauschecker 2010). As one moves further along the ventral hierarchy to higher level areas such as the ventrolateral prefrontal cortex (VLPFC), activity in response to auditory objects becomes increasingly abstract, invariant, and correlated with behavior (Romanski et al. 2004; Gifford et al. 2005; Russ et al. 2008; Cohen et al. 2009; Lee 2009). Thus, the body of research appears to depict a hierarchical transformation of acoustic properties into perceptual features and, finally, the grouping of those features into auditory objects that guide behavior and correlate with subjective experience. However, the relationship between ASA and auditory object perception remains poorly understood. For example, are there regions that are involved in both the extraction of low-level auditory features as well as the representation of fully formed auditory objects? If such areas exist, what types of computations do they perform?
Music constitutes one class of auditory stimuli that has been used in the study of auditory perception. Given its importance in organizing and managing human activities and its relation to higher level cognitive systems (e.g. memory, emotion, language), music has generated great interest amongst psychologists and neuroscientists (Zatorre 2005; Ashley et al. 2006; McDermott and Oxenham 2008; Fedorenko et al. 2012; Pearce and Rohrmeier 2012). Furthermore, given that music is an emergent property of sound, it can also be viewed as a compelling example of an auditory object. Music, as a perceptual experience, is contingent upon the grouping of disparate sounds into a temporally evolving, unified percept that is governed by a syntactic organization of notes into a hierarchical structure (Lerdahl and Jackendoff 1996; Rohrmeier 2007). Such formal descriptions of musical syntax have inspired “neurocognitive” models that attempt to link music perception with cognitive processes such as Gestalt grouping and ASA, as well as to the neural loci and computational processes underlying this cognitive phenomenon (Koelsch and Siebel 2005; Koelsch 2011). Neuroimaging work on music perception has revealed the involvement of numerous regions that have also been implicated in other nonmusical investigations of auditory object perception. For example, Broca’s area, which is a part of the VLPFC and ventral auditory pathway, seems to be involved in musical syntax processing in addition to its better-known role in speech function (Maess et al. 2001; Fadiga et al. 2009; Kunert et al. 2015). While these findings have contributed to a biological understanding of musicality, the attribution of perceptual processes to specific brain regions is a challenging scientific and methodological endeavor.
Our labs have recently begun using musicality in order to explore the formation of auditory objects (Randall and Greenberg 2016; Gurariy et al. 2021). In these studies, we introduced a novel stimulus set and methodological approach whereby participants are asked to provide musicality ratings in response to randomly generated sequences of pure tones. A high musicality rating ascribed to an auditory sequence implies that the organization of the constituent notes mimics, albeit serendipitously, key aspects of musical structure. It follows that such sequences should be viewed as strong exemplars of auditory objects. In contrast, a low musicality rating would imply the absence of such organizational integrity, resulting in the perception of poorly integrated sounds or a weaker object percept.
Using this approach affords a number of methodological advantages. First, our method does not rely on a priori definitions of musical vs. nonmusical categories. Rather, these categories emerge in a bottom-up manner (based on participant ratings) while simultaneously capturing variability and other idiosyncrasies that may exist between individuals. Second, by employing a simpler stimulus set, the researcher can exert greater control over low-level metrics of the auditory sequence. For example, musical vs. nonmusical stimuli can be compared while controlling for timbre, loudness, rhythm, length, pitch, and fade-in. Third, these stimuli can be used to explore how manipulations of low-level auditory features modulate the perceived auditory “Gestalt,” or objectness, as measured by changes in musicality ratings following low-level changes to the auditory sequence.
Randall and Greenberg (2016) first used these stimuli and task in order to investigate the factors which give rise to the perception of musical objects. Using Principal Component Analysis, they discovered several low-level metrics that are involved in the experience of musicality and auditory object perception. Gurariy et al. (2021) expanded upon these findings by incorporating low-level manipulations into this experimental paradigm. Specifically, participants rated altered versions of each auditory sequence in which a randomly chosen subset of notes was manipulated along one of three auditory dimensions (timbre, amplitude, or fade-in) at one of three strength levels (low, medium, or high). Incorporating these manipulations allowed for the exploration of low-level ASA cues and their effect on auditory object formation (by comparing changes in musicality ratings between altered and original sequences). The results of this study revealed that manipulations of low-level acoustic features can indeed modulate the perception of auditory objects, that asymmetries exist in the effects of different acoustic features on auditory object perception, and that there are individual differences in the impact of these features.
The current study aims to extend this research by exploring the relationship between ASA and auditory object perception using both behavioral and neural measures. In the first part of the study, participants provided perceived musicality ratings in response to randomly generated pure tone sequences. Furthermore, musicality ratings were also collected for altered versions of each sequence in which a subset of tones was manipulated along one of two auditory dimensions (timbre or amplitude) at one of two strength levels (low or high). In the second part of the study, the most musical and least musical sequences were selected (along with their manipulated counterparts) for use during neuroimaging based on each participant’s ratings. Next, fMRI was used to measure neural activation in response to the most musical and least musical sequences (and manipulated counterparts), while participants performed a 1-back memory task. Additionally, we ran a separate music localizer (real music vs. scrambled music) in order to independently identify the neural regions involved in music/auditory object perception. These data allowed us to identify a musicality/auditory object network (music localizer) as well as an ASA network (using a contrast between original and manipulated sequences). Our analysis revealed overlapping regions between the two aforementioned networks whose functional properties were explored using RSA (Kriegeskorte 2008).
Materials and methods
Participants
Fourteen participants (eight female, six male) were recruited with ages ranging from 19 to 38 (M = 23.71, SD = 5.28). All participants reported having normal hearing and normal or corrected-to-normal vision. The majority were not trained musicians given that the modal number of years spent in formal musical education was 0. All were native speakers of American English. Participants provided informed consent prior to the experiment in accordance with standards set forth by the Institutional Review Board.
Stimuli
The stimulus set consisted of pure-tone auditory sequences, each lasting 5 s and containing a total of 10 pure tones. This stimulus set was previously used by Randall and Greenberg (2016) and originally contained 50 unique sequences that were rated in terms of perceived musicality. Of the original 50 auditory sequences, the current experiment used the 11 most musical and the 11 least musical sequences identified by Randall and Greenberg (2016). Each of the 22 sequences was also manipulated along one of two low-level dimensions (timbre or amplitude), at one of two strength levels (low or high). Thus, each of the 22 original sequences (referred to as base sequences) had four manipulated counterparts (amplitude low, amplitude high, timbre low, and timbre high), for a total of 110 sequences. Manipulations occurred on a subset of tones (4 out of 10) that were chosen randomly, with the constraints that altered notes could not occur in the first or last position, or back-to-back (thus manipulations could occur on tones 2-4-6-8 or 3-5-7-9). Timbre manipulations consisted of changing the shape of the waveform (sinusoidal to sawtooth), while amplitude manipulations involved modulating the amplitude at which the tones were presented. Low manipulations were presented at threshold and high manipulations at 3× threshold (perceptual thresholds corresponding to each category of manipulation were measured via a psychophysical staircase paradigm for each participant prior to the start of the experiment).
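For concreteness, the logic of the stimulus construction can be sketched in MATLAB as follows. This is an illustrative sketch rather than the code used in the study: the sample rate, tone duration, tone frequencies, and manipulation gain are assumptions, whereas in the actual experiment manipulation strengths were tied to each participant's measured thresholds.

```matlab
% Minimal sketch of generating one manipulated sequence (parameters assumed).
fs = 44100;                                   % sample rate (Hz), assumed
toneDur = 0.5;                                % 10 tones x 0.5 s = 5 s sequence
t = 0:1/fs:toneDur - 1/fs;
freqs = 220 * 2.^(randi([0 12], 1, 10)/12);   % random pitches (illustrative)

patterns = {[2 4 6 8], [3 5 7 9]};            % the two legal manipulation sets
manipIdx = patterns{randi(2)};

seq = [];
for k = 1:10
    if ismember(k, manipIdx)
        tone = sawtooth(2*pi*freqs(k)*t);     % timbre: sine -> sawtooth
        % For an amplitude manipulation, one would instead keep the sine and
        % scale it, e.g. tone = 3 * sin(2*pi*freqs(k)*t) for a "high" (3x
        % threshold) manipulation; the gain here is illustrative only.
    else
        tone = sin(2*pi*freqs(k)*t);
    end
    seq = [seq tone]; %#ok<AGROW>
end
soundsc(seq, fs);                             % play the resulting 5-s sequence
```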
Behavioral task
The task involved rating each auditory sequence in terms of perceived musicality. In order to quantify musicality ratings, participants were instructed to use a Likert-type scale (Likert 1932) of 1–5, with 1 indicating the absence of musicality (i.e. nonmusical) and 5 indicating a sequence that is perceived to be highly musical. Participants were instructed to use the entire scale when making musicality judgments. Ratings were recorded using a keyboard and participants were given 1.5 s to respond. Trials in which the response time exceeded the 1.5-s window were discarded from analysis. Participants completed a minimum of six practice trials, and additional practice trials were available upon request. Once the participant reported comprehension of the task, they proceeded to the main experiment. The experiment consisted of 11 blocks; each block was composed of 38 trials. Each base sequence was presented once, while each manipulated sequence was presented twice, for a total of 418 trials. However, the behavioral portion of the study contained five other manipulations that were not used in this investigation. Thus, of the 418 trials, only the 198 corresponding to the experimental conditions described in the previous section were selected for further analysis. Stimuli were presented on an Apple Mac Mini via headphones (Sennheiser HD210) at a comfortable listening volume. The experiment was coded using MATLAB (MathWorks 2007) with Psychophysics Toolbox extensions (Brainard 1997; Pelli 1997).
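The response-collection logic for a single trial can be sketched as follows. The Psychtoolbox timing and keyboard functions (GetSecs, KbCheck, KbName) are standard, but the key parsing and the handling of multiple keypresses shown here are simplified assumptions.

```matlab
% Minimal sketch of rating collection within the 1.5-s response window.
tStart = GetSecs;
rating = NaN;
while GetSecs - tStart < 1.5                 % 1.5-s response window
    [keyDown, ~, keyCode] = KbCheck;
    if keyDown
        keyName = KbName(find(keyCode, 1));  % e.g. '1!' for the 1 key
        rating  = str2double(keyName(1));    % keep the digit, 1 through 5
        break
    end
end
% Trials where rating is still NaN (no response in time) were discarded.
```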
Analysis of musicality ratings
Ratings of perceived musicality in response to all stimuli were z-scored for each participant. Next, we selected the four sequences with the highest z-scores and the four sequences with the lowest z-scores, thereby splitting the data into “most musical” and “least musical” conditions. However, this selection was done separately for each individual; thus, the most musical and least musical categories could be composed of different sequences for different people (based on their ratings). That being said, our previous work has shown significant between-subject concordance in the ascription of musicality to auditory sequences (Randall and Greenberg 2016; Gurariy et al. 2021). In order to explore the impact of low-level manipulations on musicality, individual ratings were averaged for the base sequences as well as for the manipulated sequences; this was done separately for the most musical and least musical stimuli. Finally, ratings for each condition were averaged across participants.
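This selection procedure can be sketched as follows; all variable names are assumptions made for illustration.

```matlab
% Minimal sketch of the rating analysis for one participant.
% ratings: vector of Likert responses across trials;
% seqID:   integer label of the base sequence presented on each trial.
z = (ratings - mean(ratings)) / std(ratings);   % z-score within participant

meanZ = accumarray(seqID(:), z(:), [], @mean);  % mean z per base sequence
[~, order]   = sort(meanZ, 'descend');
mostMusical  = order(1:4);                      % four highest-rated sequences
leastMusical = order(end-3:end);                % four lowest-rated sequences
```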
fMRI task and stimuli
Of the 22 auditory sequences used in the behavioral task, we selected the four most musical and four least musical (as rated by individual participants), in addition to their manipulated counterparts (timbre, amplitude; low, high). During fMRI, participants were asked to perform a 1-back memory task on the sequences to encourage a deep level of stimulus processing. Each participant performed a total of 8 runs, each consisting of 40 trials. Every sequence was presented for 5 s followed by 5 s of silence, during which the participant could respond using a button response box.
Music network localizer
In addition to the auditory sequences described above, fMRI data were collected in response to a separate music network localizer which served to independently localize the network of brain regions specialized for music processing. The localizer included recorded piano music as well as scrambled versions of each musical piece. There were a total of 12 different piano pieces, each lasting 5 s. In order to generate the scrambled condition, each piano sequence was time-domain and multi-band scrambled. Time-domain scrambling involved segmenting the signal into short windows tapered with a cosine wave and then randomly shuffling the windows, while multi-band scrambling consisted of scrambling the signal within different frequency bands followed by recombination (Ellis and Lee 2004; Ellis 2010). This process resulted in 12 scrambled stimuli that lasted 5 s each. Participants performed a 1-back memory task during the localizer to encourage a deep level of stimulus processing. The presentation of each stimulus was followed by 5 s of silence during which participants could respond via a button response box. A total of two localizer runs were acquired for each participant.
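The time-domain scrambling stage can be sketched as follows; the window length is an assumption, and the multi-band stage (scrambling within frequency bands and recombining) is omitted for brevity.

```matlab
% Minimal sketch of time-domain scrambling. x is a mono audio column vector
% at sample rate fs (both assumed).
winLen = round(0.1 * fs);                        % 100-ms windows (assumed)
nWin   = floor(length(x) / winLen);
taper  = 0.5 - 0.5*cos(2*pi*(0:winLen-1)'/(winLen-1));  % raised-cosine taper

segs      = reshape(x(1:nWin*winLen), winLen, nWin) .* taper;
segs      = segs(:, randperm(nWin));             % randomly shuffle windows
scrambled = segs(:);                             % reassemble the signal
```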
fMRI acquisition
Data were acquired on a Siemens Verio scanner with an eight-channel parallel imaging head coil. High-resolution, whole brain anatomical data were acquired using a T1-weighted MPRAGE (1 × 1 mm in-plane resolution, matrix = 256 × 256, TE = 2.98 ms, TR = 2300 ms, flip angle = 9°). Whole-brain functional image volumes (2 × 2 × 3 mm resolution) were acquired using a T2*-weighted echoplanar imaging (EPI) pulse sequence (TR: 2500 ms; TE: 29 ms; flip angle: 79°; acquisition matrix 96 × 96; 38 transverse slices).
fMRI preprocessing
Functional data were analyzed with the AFNI/SUMA software package (Cox 1996; Saad et al. 2005) as well as custom scripts written in MATLAB. EPI slices were first deobliqued, then slice-time and motion corrected to the first functional run. Next, functional data were co-registered to a high-resolution anatomical volume that was collected individually for each participant. Following co-registration, individual subject data were mapped from EPI volume to a common standard mesh surface using methods described in Greenberg et al. (2010). Afterward, the mean of each EPI volume was computed and used to calculate the percent signal change. Spatial smoothing was conducted on the surface using a heat kernel with a full-width at half-maximum of 4 mm.
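The percent-signal-change conversion amounts to the following computation, applied to each vertex time series (the matrix layout and variable names here are assumptions).

```matlab
% Minimal sketch of the percent-signal-change step.
% Y: nTimepoints x nVertices matrix of preprocessed EPI data for one run.
mu  = mean(Y, 1);               % mean of each vertex time series
psc = 100 * (Y - mu) ./ mu;     % percent signal change per vertex
```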
ROI selection
For the music localizer, a canonical hemodynamic response function (Cohen 1997) was convolved with a model of the timing of the stimulation epochs during the experiment, yielding two regressors corresponding to the musical and scrambled conditions. For the ASA localizer, there were also two regressors (all base sequences and all sequences containing a high manipulation of either timbre or amplitude). For the ASA localizer, only the high manipulations were chosen as a contrast to the base sequences for two reasons: (i) to minimize the discrepancy in the number of trials between the two conditions and (ii) because the high manipulations showed the largest difference from the base sequences in the behavioral data (see Fig. 1). Both localizers (music and ASA) were separately submitted to a multiple regression GLM (Worsley and Friston 1995). Following a paired samples t-test across participants, T-statistics were extracted for every surface vertex. T-statistics that survived an alpha threshold of P = 0.05 were further submitted to a cluster analysis with a minimum threshold of 50 nodes. The resulting ROIs created two distinct but overlapping networks: a musicality/auditory object network (music vs. scrambled) and an ASA network (all base sequences vs. all high manipulated sequences). Projecting the ROIs from both localizers onto the cortical surface revealed regions of overlap between areas involved in musicality/auditory object perception (defined by the music localizer) and those involved in the processing of low-level acoustic differences (defined by the ASA localizer). These areas of overlap resulted in seven additional ROIs (see Fig. 2).
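The regressor construction and GLM fit can be sketched as follows for a single condition and vertex. The double-gamma HRF shown here is a common approximation standing in for the canonical HRF of Cohen (1997) used in the actual analysis, and nTR, onsetTRs, and y are assumed variables.

```matlab
% Minimal sketch of building one localizer regressor and fitting the GLM.
TR  = 2.5;
t   = 0:TR:30;                                    % HRF support in seconds
hrf = gampdf(t, 6, 1) - (1/6) * gampdf(t, 16, 1); % double-gamma HRF (approx.)

boxcar = zeros(nTR, 1);                           % stimulation epochs for one
boxcar(onsetTRs) = 1;                             % condition (onsets assumed)
reg = conv(boxcar, hrf');
reg = reg(1:nTR);                                 % trim to run length

% In the actual design there were two such regressors per localizer
% (e.g. music and scrambled); one is shown here for brevity.
X    = [reg ones(nTR, 1)];                        % design matrix + intercept
beta = X \ y;                                     % least-squares GLM fit
```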
Fig. 1.
Group average musicality ratings for all experimental conditions broken down into most and least musical categories for amplitude manipulations (left) and timbre manipulations (right). Error bars represent standard error of the mean.
Fig. 2.
Regions of interest. The above figure shows lateral and medial views of inflated cortical surfaces corresponding to the left and right hemispheres. Colors represent ROIs that were generated using the musicality localizer (green) and the ASA localizer (blue). Furthermore, regions of overlap between the ROIs produced by these two localizers are highlighted in red (abbreviations: PCS—postcentral sulcus; PT—planum temporale; STG—superior temporal gyrus). Membership in the dorsal or ventral processing stream is signified by (D) and (V), respectively.
fMRI univariate analysis
For each participant, the mean hemodynamic BOLD signal was extracted from each surface ROI (see previous section for details of ROI selection). EPI runs were concatenated, followed by a linear interpolation in order to convert the temporal dimension from TRs to seconds. Next, the data from each individual were averaged and grouped into 10 experimental conditions (musicality [most; least] × ASA condition [base; amplitude low; amplitude high; timbre low; timbre high]). The interpolated hemodynamic response for each of the conditions was then averaged across all participants within every ROI. The group-averaged peak value of the HRF response is plotted for the seven overlapping ROIs in Fig. 3.
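The TR-to-seconds conversion reduces to a linear interpolation of each ROI time course (variable names assumed).

```matlab
% Minimal sketch of resampling an ROI-averaged time course to a 1-s grid.
TR     = 2.5;
tTR    = (0:nTR-1) * TR;                 % acquisition times in seconds
tSec   = 0:1:tTR(end);                   % 1-s resampling grid
roiSec = interp1(tTR, roiMean, tSec, 'linear');
```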
Fig. 3.
Overlapping ROIs and univariate data. The ROIs highlighted in red depict regions that were activated by both the music and ASA localizers. For each region, bar graphs show the univariate BOLD response for every experimental condition used in the study (key: M—Most musical; L—Least musical; Tim—Timbre; amp—Amplitude). Error bars represent standard error of the mean. Results from paired samples t-tests conducted on all possible pairwise comparisons are shown using gray lines and asterisks that appear above the bars (*P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.0001). Comparisons that did not reach statistical significance are not marked in the graphs. An asterisk below a bar indicates that the percent signal change of the corresponding condition was significantly different from zero as measured by a one-sample t-test.
RSA analysis
To explore the representational content of the ROIs defined by the musicality and ASA localizers, we conducted an RSA analysis (Kriegeskorte 2008). First, a representational dissimilarity matrix (RDM) was computed for each ROI by extracting a vector of surface vertex data in response to each of the 10 conditions. Next, for each ROI, a matrix consisting of conditions × vertex values was converted into an RDM using Euclidean distance to quantify dissimilarity between conditions. This process was carried out separately in each participant for each ROI generated by the two networks as well as the seven overlapping ROIs. RDMs from each ROI were averaged together across participants. Next, we created RDMs corresponding to four a priori models, each representing a hypothetical auditory process, as well as an RDM based on participant behavior (musicality ratings) collected prior to the fMRI scan. All models are visually depicted in Fig. 4 and described in the following section. Lastly, in order to test which model best explains the variance of each neural region, we computed correlations (Kendall tau correlation coefficient; Kendall 1938) between group-averaged RDMs from all ROIs and each of the four models as well as the behavioral RDM. In order to compute the statistical significance of each correlation, we ran a permutation test, conducting the same analysis on shuffled data. This procedure was repeated 10,000 times, resulting in a null distribution of correlation coefficients which was subsequently used to identify values corresponding to a P of 0.05. To correct for multiple comparisons, we used the false discovery rate procedure, with a q-value of 0.1 (Benjamini and Hochberg 1995; Benjamini and Yekutieli 2001).
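The core of this pipeline for a single ROI can be sketched as follows (variable names are assumptions; the simple proportion-based p-value shown is one standard way to summarize a permutation null).

```matlab
% Minimal sketch of the RSA pipeline for one ROI.
% P: 10 x nVertices matrix of condition responses (10 conditions).
rdm = squareform(pdist(P, 'euclidean'));        % 10 x 10 dissimilarity matrix

% Compare lower triangles of the neural and model RDMs with Kendall tau.
mask   = tril(true(10), -1);
tauObs = corr(rdm(mask), modelRDM(mask), 'Type', 'Kendall');

% Permutation test: shuffle condition labels to build a null distribution.
nPerm   = 10000;
tauNull = zeros(nPerm, 1);
for i = 1:nPerm
    p = randperm(10);
    shuffled   = rdm(p, p);
    tauNull(i) = corr(shuffled(mask), modelRDM(mask), 'Type', 'Kendall');
end
pval = mean(tauNull >= tauObs);                 % one-tailed p-value
```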
Fig. 4.
RSA analysis. The results of the RSA analysis plotted on inflated brain maps (right). Highlighted regions depict ROIs within which the neural RDM was significantly correlated with the corresponding model (left). The colors assigned to each ROI highlight the manner in which that region was generated (green: Musicality localizer; blue: ASA localizer; red: Overlapping regions). ROIs that were not significantly correlated to any of the models are omitted from the figure. Models A–D are not based on actual data but were generated in an a priori manner. Model E was generated based on musicality ratings collected from participants prior to fMRI scanning.
Descriptions of a priori and behavioral RSA models
ASA model
A hypothetical model of a neural region that is sensitive to both timbre and amplitude manipulations, as well as the level of manipulation, but not sensitive to musicality. An ROI displaying this functional profile would show a different pattern of activation in response to manipulated sequences vs. base sequences. Furthermore, activation in response to timbre manipulations would differ from that of amplitude manipulations. Finally, the differences in activation between the base and manipulated sequences are modeled to increase with strength level (high > low). Unlike the musicality model (see below), the ASA model does not display sensitivity to auditory objects/musicality but rather to low-level manipulations only.
Timbre model
A hypothetical model of a neural region that is sensitive to manipulations of timbre (but not amplitude) as well as the strength of the timbre manipulation. An ROI displaying this functional profile would show a different pattern of activation only in response to timbre manipulated sequences. Furthermore, strong (high) manipulations elicit greater differences than weaker (low) manipulations relative to the base sequence.
Amplitude model
A hypothetical model of a neural region that is sensitive to manipulations of amplitude (but not timbre) as well as the strength of the amplitude manipulation. An ROI displaying this functional profile would show a different pattern of activation only in response to amplitude manipulated sequences. Furthermore, strong (high) manipulations elicit greater differences than weaker (low) manipulations relative to the base sequence.
Musicality model
A hypothetical model of a neural region that is sensitive only to the musicality of the auditory sequence. An ROI displaying this functional profile would show a different pattern of activation in response to most musical vs. least musical sequences, irrespective of any low-level ASA manipulations.
Behavioral RDM
Unlike the previous models, which are purely hypothetical and based on a priori speculation about computational processes that might be occurring within the defined ROIs, the behavioral RDM was generated on the basis of actual data (participant musicality ratings of auditory sequences).
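As an example of how the a priori models were structured, the musicality model RDM described above could be constructed as follows, under the assumption that the 10 conditions are ordered with the five most musical conditions first.

```matlab
% Minimal sketch of one a priori model RDM (the musicality model), assuming
% conditions 1-5 are "most musical" and conditions 6-10 are "least musical".
labels        = [ones(5,1); 2*ones(5,1)];        % 1 = most, 2 = least musical
musicalityRDM = double(labels ~= labels');       % 0 within, 1 across category
```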
MDS analysis
The multidimensional scaling (MDS) technique (Torgerson 1952) was used to generate a 2D representation of the neural relationships between all experimental conditions. An RDM of the neural activity in response to all conditions was generated for every ROI within the overlapping network. This RDM was subsequently converted into an MDS plot using the cmdscale command in MATLAB. In the plot, Euclidean distance between conditions serves as a proxy for similarity. Conditions closer together in the MDS plot elicit more similar patterns of neural activation compared with conditions that are further apart.
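In MATLAB, this step reduces to the following (reusing the RDM computed in the RSA analysis; condNames is an assumed cell array of 10 condition labels).

```matlab
% Minimal sketch of the MDS step for one ROI.
[coords, ~] = cmdscale(rdm);                     % classical MDS
scatter(coords(:,1), coords(:,2), 60, 'filled'); % 2D layout of conditions
text(coords(:,1) + 0.01, coords(:,2), condNames);
xlabel('Component 1'); ylabel('Component 2');
```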
Results
Behavioral ratings
In order to explore the effects of ASA manipulations on the perception of musical objects, we divided the auditory sequences into two categories: most musical (positive z-scores) and least musical (negative z-scores). Next, ratings were averaged within each level of manipulation (base, low, and high) for timbre and amplitude separately (Fig. 1). Within each of these groups, one-way ANOVAs were used to examine the effects of ASA manipulation level (base, low, and high) on musicality ratings. The effect of ASA level was significant for each of the four groups: Most-Musical Amplitude, F(2,42) = 10.24, P < 0.001, ηp2 = 0.33; Least-Musical Amplitude, F(2,42) = 5.22, P = 0.009, ηp2 = 0.19; Most-Musical Timbre, F(2,36) = 12.16, P < 0.001, ηp2 = 0.40; and Least-Musical Timbre, F(2,36) = 8.1, P = 0.001, ηp2 = 0.31. To further explore these findings, we conducted post hoc, paired-samples t-tests on the data within each group. For amplitude manipulations of the most musical base sequences (M = 1.28, SD = 0.29), we observed a significant decrease in musicality at both low (M = 0.85, SD = 0.48) and high (M = 0.51, SD = 0.58) levels of manipulation. The same trend was observed for timbre manipulations; however, the base and low (M = 0.90, SD = 0.34) categories did not differ statistically, whereas ratings for high manipulations (M = 0.64, SD = 0.46) were significantly lower relative to base. For manipulations of the least musical base sequences (M = −1.00, SD = 0.35), similar effects were observed for both timbre and amplitude; specifically, manipulations of the base sequences resulted in increased musicality ratings. For amplitude, there was a significant difference between base and low (M = −0.63, SD = 0.18), but not between base and high (M = −0.81, SD = 0.37). Timbre manipulations of least-musical base sequences resulted in significantly higher ratings at both low (M = −0.48, SD = 0.27) and high (M = −0.63, SD = 0.35) levels. No significant differences were observed between low and high strength levels for either timbre or amplitude, irrespective of musicality.
Regions of interest
The ROIs identified by both localizers (see “ROI Selection” in Methods above) are plotted in Fig. 2. The musicality localizer (generated from a contrast between piano music and scrambled stimuli) activated large swathes of cortex across both hemispheres. The ROIs generated by this localizer included anterior and posterior superior temporal gyri (STG), planum temporale (PT), regions in the occipital and parietal cortex, as well as regions across the frontal lobes, including ventrolateral, dorsolateral, and orbitofrontal cortex. While the ASA localizer did not generate as many ROIs as the music localizer, it also resulted in activation of the STG, parietal cortex, and some frontal regions, bilaterally. In order to explore shared neural mechanisms between ASA and auditory object processing, we isolated regions that were activated by both the musicality and ASA localizers. A total of seven such regions were identified. In the left hemisphere, overlapping regions included anterior STG, PT, postcentral sulcus, and a region in the anterior cingulate cortex (ACC). In the right hemisphere, overlapping regions were confined to PT, postcentral sulcus, and intraparietal sulcus (IPS).
Univariate analysis of the overlapping regions
Figure 3 shows the results of a univariate analysis of the fMRI data (BOLD signal) within overlapping regions of the musicality and ASA networks. Of the seven regions, three displayed significant activation in response to all of the auditory stimuli (as measured by a one-sample t-test vs. zero). These regions consisted of bilateral PT as well as right anterior STG. Within anterior STG, a statistically significant difference was observed between the most musical base and least musical timbre high conditions. Within bilateral PT, the least musical timbre high condition was significantly different from both the base conditions as well as the most musical amplitude condition. In the right PT, the activation in response to the most musical base condition was significantly different from the most musical timbre high and least musical timbre low conditions. The remaining four regions (bilateral postcentral sulci, right IPS, and left ACC) did not show significant activation in response to any experimental condition, nor significant differences between any two conditions. Ostensibly, these data suggest that only the overlapping regions along the STG are sensitive to low-level manipulations (especially those involving timbre), while the remaining regions do not make significant contributions to the processing of these stimuli. However, given that these regions were activated by both the ASA and musicality localizers, they likely play an important role in auditory processing. Univariate analyses often lack sensitivity and can be ill-equipped for exploring subtle patterns and representational structure that might exist within a neural region. Thus, in an effort to further explore the functional contributions and computational processes that may be occurring within these overlapping ROIs, we employed a multivariate RSA analysis.
RSA analysis
The results of the RSA analysis are plotted in Fig. 4. On the left are the a priori RSA models (A–D) as well as the behavioral model (E), depicted as RDMs. The inflated surfaces on the right highlight which ROIs showed significant correlations with each model. The colors assigned to each ROI indicate the manner in which that region was generated (green: musicality network; blue: ASA network; red: overlapping regions).
Generally speaking, the models can be divided into low-level models, which relate to the processing of auditory features, and high-level models, which relate to the processing of auditory objects. For the low-level category, there are models based on each of the ASA manipulations (timbre; amplitude) and the level of manipulation (low; high). The ASA model incorporates both types of ASA manipulation as well as the level of manipulation. All three low-level models showed significant correlations to regions throughout the auditory cortex as well as surrounding auditory areas of the temporal lobe, bilaterally. Additionally, the ASA model was correlated with ROIs in the left postcentral sulcus. Interestingly, the ASA model also correlated with regions across all three categories of ROIs (musicality network, ASA network, and overlapping regions). Between the timbre and amplitude models, timbre resulted in more localized activation, primarily of the overlapping ROIs in PT. Conversely, the amplitude model correlated with a substantially larger number of regions, including the temporal lobes as well as areas within the prefrontal cortex (from both the ASA and music localizers).
The higher level models included the musicality model as well as the RDM generated on the basis of participant behavior (musicality ratings). However, unlike the musicality model, which is sensitive only to most musical vs. least musical stimuli, the behavioral RDM also displays some sensitivity to low-level ASA features in addition to musicality. A clear categorical divide is seen between most musical vs. least musical sequences, irrespective of ASA manipulations. However, within the most and least musical categories, the type and level of ASA manipulation also exert an effect on participant ratings (as is further evidenced in Fig. 1). Thus, this can be thought of as a mixed model, given that it reflects both musicality and ASA manipulations. However, given that musicality seems to generate the most salient differences in the ratings, this model was considered high-level.
Significant correlations with the musicality model were observed in only one region belonging to the musicality network: the right VLPFC. The behavioral ratings showed significant correlations with medial orbitofrontal cortex, precentral gyrus, and occipital cortex of the left hemisphere. All three of these ROIs were defined using the music localizer. In the right hemisphere, the behavioral RDM was significantly correlated with a region in IPS which was one of the seven ROIs defined by the overlap between the music and ASA localizers.
MDS analysis of overlapping regions
The MDS technique, when applied to fMRI data, can be used to visualize the representational structure of stimuli based on neural activation within a particular region. Euclidean distance between conditions in a 2D MDS plot is viewed as a proxy for neural similarity. Figure 5 depicts the results of the MDS analysis for each of the overlapping ROIs. Colors (blue, green, and red) as well as symbols (stars and circles) are used to represent the 10 conditions used in the experiment. A qualitative analysis suggests that several regions within the overlapping ROIs are sensitive to low-level manipulations of amplitude and timbre. The most salient examples are found within PT, where clustering is observed amongst base sequences, timbre manipulated sequences, and amplitude manipulated sequences in both the left and right hemispheres. A similar, albeit much weaker, pattern can be seen in left ACC. In the left postcentral sulcus, some clustering is observable between the base and manipulated sequences. In right IPS, there appears to be some clustering along the musicality dimension (with the exception of most-musical amplitude-low, which clusters closer to the least-musical conditions). Another noteworthy observation relates to the clustering of conditions within each musical category. Specifically, the manipulated versions of the most-musical sequences appear to cluster further apart from each other relative to the manipulated versions of the least-musical sequences. This may arise from the fact that the most-musical sequences are more differentiable from one another compared with their least-musical counterparts, which are likely experienced as noise.
Fig. 5.
MDS analysis. The ROIs highlighted in red depict regions that were active in both the music and ASA localizers. For each region, an MDS plot is shown depicting the relationship between all experimental conditions.
Discussion
The aim of the current study was to explore the neural mechanisms underlying ASA and auditory object perception. Toward this end, we employed a methodological approach in which participants judged the perceived musicality (auditory object manipulation) of random pure-tone auditory sequences as well as versions of the same sequences in which low-level acoustic features were altered in a subset of tones (ASA manipulation). Using fMRI, we measured the neural activation in response to these experimental conditions. Additionally, a music localizer and an ASA localizer uncovered two separate, yet partially overlapping, networks: a musicality/auditory object network that is sensitive to fully formed auditory objects, as well as an ASA network sensitive to low-level changes of acoustic properties. The results of this analysis yielded a substantial number of brain regions spread diffusely across the cortical surface. The specific contribution of each region to auditory perception lies beyond the scope of this study. However, we did uncover areas of the brain that appear to be involved jointly in both neural processes: ASA and auditory object perception. The dual involvement in both categories of neural computation (ASA and auditory object perception) suggests important contributions from these regions to the transformation of acoustic input into a fully formed auditory percept, a principal aim of this study. Based on these considerations, we focused the bulk of our analyses and discussion on this network of overlapping ROIs.
Changes to low-level ASA features modulate auditory object perception
Our behavioral data suggest that low-level ASA manipulations modulate the perception of auditory objects. In our sequences, random subsets of tones were subject to low-level manipulations (timbre or amplitude, at different strength levels: low or high) without affecting the frequencies of the tones or overall contour of the sequence. The fact that this resulted in changes to musicality judgments suggests that our manipulations successfully modulated the perceptual organization of these auditory objects. We observed similar trends for manipulations of both amplitude and timbre. For sequences rated most-musical, low-level manipulations resulted in decreases of perceived musicality. The low-level manipulations may have caused the altered tones of the sequence to group into separate objects, hence adversely affecting the structural integrity of the auditory sequence and subsequent musicality rating. In the case of the least-musical sequences, ASA manipulations elicited higher musicality ratings for both timbre and amplitude, perhaps due to the grouping of the altered tones into a separate object. While these new results suggest close similarities between the domains of timbre and amplitude, our previous work (Gurariy et al. 2021) explores the differential effects of ASA features on the perception of auditory objects, as well as individual differences in the effects of these features on auditory perception.
Representational content of musicality and ASA networks
In order to better understand the representational content of the ROIs generated by the musicality (music vs. scrambled) and ASA (base vs. manipulated sequences) localizers, we employed RSA. Below is a discussion of the findings that emerged from this analysis as they relate to the musicality and ASA networks. The RSA results pertaining to the seven overlapping regions are discussed in a separate section below.
Low-level models of auditory processing
All three low-level RSA models (the ASA, timbre, and amplitude models) showed significant correlations with large swaths of auditory cortex and superior temporal cortex. These results are consistent with existing knowledge about the auditory cortex and its role in decomposing the incoming auditory signal into constituent acoustic features (Kumar et al. 2007; Rauschecker and Scott 2009; Bizley and Cohen 2013). Significant correlations of the timbre model were confined to bilateral regions of the PT (findings discussed in detail below). The amplitude model was significantly correlated with a large region of the STG in the right hemisphere as well as a smaller region in the PT of the left hemisphere. Additionally, the amplitude model was correlated with ROIs in the frontal lobe of both hemispheres.
Amplitude changes are utilized by the auditory system as an important cue regarding the distance and location of an acoustic source (Kopčo et al. 2012; Ahveninen et al. 2013; Middlebrooks 2015; Kolarik et al. 2016). For example, decreases in the amplitude of a soundwave might be indicative of increasing distance between the listener and the sound source. Conversely, increases in amplitude can be interpreted by the auditory system as evidence for the approach of an environmental stimulus. Given the fact that amplitude modulations carry information relevant to the spatial position of an auditory source, we might expect increased activation within regions dedicated to spatial processing. Consistent with this prediction, some of the ROIs in which activity was best explained by the amplitude model were located along the auditory dorsal “where” pathway (Rauschecker and Scott 2009; Arnott and Alain 2011) which begins in primary auditory cortex (A1) and terminates in the dorsolateral prefrontal cortex (Rauschecker and Scott 2009; Bizley and Cohen 2013).
High-level models of auditory processing
Musicality model
The musicality model was significantly correlated with one region defined by the music localizer—the right VLPFC. Initially, one may have expected this model to correlate with more regions defined by the music localizer, since the categorical distinction between most musical and least musical sequences is conceptually similar to the contrast between real music and scrambled music used in the localizer. However, there are important differences between these stimuli which might explain the restricted nature of the correlation. First, the music localizer consisted of real piano music belonging to an identifiable musical genre (classical music). In contrast, while the most musical sequences appeared to capture certain key aspects of musical structure (as evidenced by musicality ratings; also see Randall and Greenberg 2016; Gurariy et al. 2021), these sequences are unlikely to have been perceived as authentic exemplars of music. Second, due to the nature of the multi-band scrambling of the musical pieces, it may have been harder to perceive a distinct spectrotemporal pattern composed of differentiable tones. In contrast, the sequences that made up the most musical and least musical categories were controlled for timbre, loudness, rhythm, length, pitch, and fade-in. In other words, both the most musical and least musical categories contained auditory sequences composed of identifiable tones (differing only in frequency) unfolding over time. This implies that perceived differences in musicality were likely caused by the extent to which the syntactic organization of these sequences mimicked that of music. Thus, it could be argued that while the randomly generated auditory sequences may be pseudo-exemplars of music, they do a better job of controlling for extraneous differences that may exist between the music vs. scrambled contrasts used by some studies (including this one). Based on these considerations, it is likely that this region in right VLPFC, which was defined using the music localizer and exhibited a significant correlation with the musicality model, plays an important role in the processing of musical stimuli.
Behavioral model (participant musicality ratings)
The RDM generated from participant behavior was significantly correlated with four ROIs. In the left hemisphere, these ROIs include the precentral gyrus, medial orbitofrontal cortex, and occipital cortex (all three ROIs generated using the musicality localizer). In the right hemisphere, participant behavior was significantly correlated with an area in the IPS that was identified as one of the overlapping regions of both the music and ASA networks (the right IPS is discussed in a separate section).
We remind the reader that musicality judgments of auditory sequences were collected prior to the fMRI portion of the study, outside the scanner. While in the scanner, participants performed a 1-back working memory task which required neither attention to, nor perception of, the musical properties of the auditory sequences. Had participants been instructed to engage with the musical aspects of the stimuli during the scan, it is likely that we would have observed correlations with additional brain regions. However, given these task-based circumstances, it is notable that correlations were observed between the behavioral RDM and any of the examined ROIs.
Precentral gyrus
The precentral gyrus is traditionally considered to be a part of the motor cortex responsible for the control of voluntary movement (Wise 2001). However, multiple imaging studies have documented precentral gyrus activation in response to auditory stimuli, even in tasks that do not require motor planning (Schön et al. 2010; Yu et al. 2017). For example, Yu et al. (2017) found that melodic analysis was correlated with activity in bilateral precentral gyri. Other studies have found precentral gyrus to be involved in tasks requiring musical discrimination (Brown and Martinez 2006), the perception of rhythm (Chen et al. 2008), musical error detection (LaCroix et al. 2015), and passive listening to music (Zatorre et al. 1994; LaCroix et al. 2015). The activation of classically motoric areas in response to auditory stimuli may signify audio-motor networks that are responsible for generating actions guided by auditory information. Such networks likely contain areas capable of processing purely sensory stimuli, even in the absence of specific behavioral goals. Indeed, the existence of multimodal neurons in motor and premotor cortical areas has been previously documented in animal studies (Graziano et al. 1999; Graziano and Gandhi 2000).
Medial orbitofrontal cortex
The orbitofrontal cortex has dense connections to limbic regions (Zald and Kim 1996) and is typically associated with value judgments, emotion, and reward (Rolls 2000, 2004; Kringelbach 2005). In addition to abstract information (O’Doherty et al. 2001), the orbitofrontal cortex is capable of responding to sensory stimuli across numerous modalities, to the extent that those stimuli contain affective salience (Rolls 2000). Furthermore, the activity in orbitofrontal cortex can be modulated by sounds (Winkowski et al. 2018) as well as music (Blood and Zatorre 2001; Lehne et al. 2013). In general, this region exhibits sensitivity to consonance/dissonance, aesthetic value, and the degree of pleasure elicited by auditory stimuli such as music or chords (Blood and Zatorre 2001; Lehne et al. 2013). Given that our behavioral task utilized musicality ratings as a dependent measure, it is reasonable to assume that auditory sequences rated as highly musical could have been perceived as intrinsically more pleasurable or more familiar relative to their less musical counterparts. Thus, in addition to musicality and auditory objecthood, the differential ratings across experimental categories may also be tracking hedonic value associated with these stimuli, which might have resulted in the observed correlation with activity in orbitofrontal cortex.
Occipital cortex
The activation of occipital regions in response to the music localizer as well as the correlation to participant behavior both constitute unexpected findings. However, we are not the first to report occipital activity in response to purely auditory stimuli. Increasing evidence suggests that the integration of multisensory information is not restricted to higher level association areas but may begin as early as primary sensory cortices (Petro et al. 2017). Long range projections are known to exist between auditory areas and visual cortex (Clavagnier et al. 2004; Muckli and Petro 2013), while neural populations whose tuning properties include auditory stimuli have been discovered in the visual cortex of anesthetized cats (Morrell 1972; Fishman and Michael 1973). In humans, multisensory illusions demonstrate the modulation of visual perception by auditory stimuli (Shams et al. 2000) and the neural correlates of these modulations have been discovered as early as V1 (Watkins et al. 2006, 2007). Furthermore, visual cortex activity has been documented in response to various auditory stimuli and tasks (Maeder et al. 2001; Feng et al. 2014; Vetter et al. 2014; Green et al. 2018) including music discrimination (Janata et al. 2002), and some have even proposed a role for visual cortex in the extraction of spatial information from auditory stimuli (Zimmer et al. 2004). One interpretation of the occipital activation observed in our study is that sound sources located outside the field of view can prime peripheral visual areas to facilitate the detection of a yet unseen, but approaching, stimulus (Cate et al. 2009; Vetter et al. 2014; Petro et al. 2017).
ASA and auditory object perception recruit common neural mechanisms
The primary goal of this study was to identify the neural mechanisms involved in both low-level ASA computations and auditory object perception to better understand how these processes are jointly exploited to form perceptual objects. Our approach yielded two networks: one subserving ASA processing and the other auditory object perception. While these two networks were largely distinct, we observed some areas of overlap. Specifically, we uncovered a total of seven regions that appear to be involved in both ASA processing as well as the perception of auditory objects. A univariate analysis of these regions (Fig. 3) found that only the areas in bilateral temporal lobe showed a significant response to the auditory sequences as well as some significant differences between the experimental conditions. However, the existence of shared neural resources underlying ASA and auditory object perception suggests that these regions play an important role in transforming acoustic features into higher level auditory percepts. Therefore, we employed a more sensitive RSA analysis to better understand the functional properties of these overlapping regions. Below is a discussion of the representational content of these areas as revealed by the RSA analysis.
The planum temporale and ASA
The PT was found to be bilaterally activated by both the musicality and ASA localizers. The results of the RSA analysis suggest that the activity within bilateral PT is best explained by two models: the ASA model as well as the timbre model. The ASA model is not sensitive to higher level attributes, such as musicality, but does incorporate low-level manipulations, including the type of manipulation (timbre or amplitude) as well as the strength of the manipulation (low or high). The functional profile of PT can be visualized using the results of the MDS analysis, generated from neural activity extracted from these regions (Fig. 5). A qualitative analysis of the MDS plots for bilateral PT reveals the grouping of the stimulus categories into three main clusters: base sequences, amplitude manipulated sequences, and timbre manipulated sequences. Furthermore, the timbre and amplitude clusters show further separation between the high and low levels of manipulation in the MDS space. Moreover, sequences manipulated at lower strength levels group closer to the base sequences. This pattern of results bears resemblance to participant behavior (see Fig. 1) with the exception of the musicality dimension, which does not appear to strongly modulate the activity in these regions.
Activity in the PT was also significantly correlated with the timbre model. Unlike the ASA model, the timbre model represents sensitivity to timbre manipulations (as well as level of manipulation) exclusively, while disregarding all other stimulus categories. Using a very similar stimulus set and behavioral design, we have previously shown that asymmetries exist between different types of low-level features and their effects on auditory object perception (Gurariy et al. 2021). Specifically, of the low-level features that were examined (timbre, amplitude, and fade-in), timbre manipulations were most salient in terms of their effects on participant behavior (Gurariy et al. 2021). This is likely because timbre manipulations uniquely alter the harmonic content of the altered notes. Here, we show neural evidence of this asymmetry given that overlapping regions involved in ASA processing (bilateral PT) also correlate with the timbre model but not the amplitude model. A qualitative analysis of the MDS plots suggests that timbre manipulated sequences appear to group separately from the other conditions along the component 1 axis; this is especially evident in the right hemisphere. It has been previously proposed that a right lateralized auditory network is specialized for the processing of spectral features, while a left lateralized system is specialized for the processing of temporal features (Zatorre and Belin 2001). Although the amplitude model was not significantly correlated with bilateral PT, these regions also exhibit some sensitivity toward amplitude manipulations based on the MDS plots and the significant correlation with the ASA model (which incorporates sensitivity to amplitude manipulations).
Our data suggest that PT can differentiate between base sequences and manipulated sequences, as well as between types of manipulation (timbre vs. amplitude). The ability to represent complex spectrotemporal patterns is likely important for ASA as well as for the perception of music (or other auditory objects with complex spectrotemporal properties). Indeed, it has been proposed that the PT serves as a computational hub involved in the matching and segregation of spectrotemporal information (Griffiths and Warren 2002; van der Heijden et al. 2019). However, computations within this region are likely not sufficient for the evaluation of musical structure, which may be why the neural activity in these areas was not correlated with either the musicality or the behavioral RDM.
The IPS correlates with participant behavior
We observed right IPS involvement in both the musicality/auditory object network and the ASA network. Interestingly, this was the only region among the overlapping ROIs that showed a significant correlation with participant behavior. The involvement of the IPS in processes related to auditory object perception, as well as its correlation with participant behavior, may be somewhat puzzling given that IPS has not traditionally been considered part of the auditory system (Alain et al. 2001; Rauschecker and Scott 2009). Numerous functions have been ascribed to the IPS, including but not limited to suppression of task-irrelevant stimuli (Wojciulik and Kanwisher 1999), visuospatial attention (Corbetta 1998), salience mapping (Kusunoki et al. 2000; Itti and Koch 2001), saccade planning (Colby and Goldberg 1999; Heide et al. 2001; Sereno et al. 2001), working memory (Todd and Marois 2005), multiple object tracking (Howe et al. 2009), perceptual organization of visual stimuli (Xu and Chun 2009; Erlikhman et al. 2016), and sensorimotor organization (Grefkes and Fink 2005; Cui 2014). Since the majority of research into IPS function has been conducted within the domains of vision, visual attention, and motor planning, less is known about how this area might be involved in auditory processing.
One possibility is that the observed IPS activation reflects the perceptual organization of the auditory scene (Cusack 2005; Hill et al. 2011; Teki et al. 2011, 2016; Kondo et al. 2018). The computations related to perceptual organization in IPS may extend across sensory domains to include visual (Shafritz et al. 2002; Xu and Chun 2009), auditory (Cusack 2005), and possibly tactile (Kitada et al. 2003) stimuli. In fact, the numerous functions that have been ascribed to IPS (i.e. saccade planning, salience mapping, distractor filtering, working memory, etc.) may be, at least in part, carried out in the service of structuring the perceptual scene. Other evidence linking the IPS to amodal transformations of sensory data comes from behavioral studies documenting an association between visual rotation and melody reversals (Cupchik et al. 2001), as well as from neuropsychological studies showing impaired visual rotation abilities in persons with congenital amusia (Douglas and Bilkey 2007).
Our results provide additional evidence for the involvement of human IPS in the processing of auditory stimuli. Not only was right IPS activation observed in relation to both auditory object processing and ASA processing, but the representational content of this region was significantly correlated with participant behavior. Given these findings, one possibility is that the organizational processes occurring in the IPS play a role in transforming acoustic features processed by auditory cortex into behaviorally relevant outputs. The perception of music relies on the syntactic structure of its constituent elements; thus, the subjective experience of musicality elicited by a temporally evolving auditory sequence will be contingent on the organizational properties of that sequence. Therefore, to say that the manipulation of low-level features results in robust changes to perceived musicality is to say that these manipulations modulate the perceptual organization of the auditory sequence. At the very least, the attribution of musicality to a sequence of tones is contingent upon more basic sensory decisions, such as whether the sequence of sounds should be organized into one or more distinct objects, or how to segregate the figure from the auditory ground. These types of computations have been shown to take place in human IPS (Cusack 2005; Teki et al. 2011, 2016) and may explain the observed correlation between this region and participant behavior, as well as its activation in both ASA and musicality processes (as uncovered by the corresponding localizers). It has previously been suggested that the functional organization of the visual dorsal pathway can be understood as a gradient whereby information is transformed from perceptual (caudal-medial dorsal pathway) to motor (rostral-lateral dorsal pathway) representations (Freud et al. 2016). Thus, the activity in IPS (which belongs to both the auditory and visual dorsal pathways) may ultimately occur in the service of sensorimotor organization. An alternative explanation is that attention plays a central role in the binding of tones into an auditory object. Hence, if music is viewed as a special kind of auditory object whose strength can exist on a spectrum, then differential degrees of attentional allocation could also account for the observed correlation.
Other ROIs of the overlapping network
Two additional regions within the overlapping network showed significant correlations with the ASA model: the left postcentral sulcus and a small region in the left ACC. The ACC is thought to be involved in a number of computations, including conscious information processing, response formation (Posner and Rothbart 1992; Posner 1994), and attentional modulation (Davis et al. 2000; Paus 2001). Crottaz-Herbette and Menon (2006) showed attentional modulation by the ACC via enhanced connectivity between ACC and both Heschl's and superior temporal gyri during an auditory oddball task. This attentional modulation is likely mediated through top-down and bottom-up interactions between the ACC and sensory cortices. It is therefore possible that the functional similarities between the ACC and PT are explained in part by feed-forward auditory signals originating in auditory cortex and PT. The ACC has also been shown to play a role in conflict resolution (Weissman 2004). Thus, during the perception of the least-musical sequences, greater conflict resolution may be required in order to overcome ambiguity and settle upon a single percept. Conversely, the most-musical sequences likely entail a single auditory object that emerges as the dominant percept.
A region in the left postcentral sulcus also showed a significant correlation with the ASA model. A qualitative examination of the corresponding MDS plot suggests that manipulated sequences cluster separately from the base sequences. However, unlike the PT, this region does not appear to distinguish between the types of manipulation (timbre vs. amplitude). The correlation of this region with the ASA model is unexpected and warrants further exploration. Finally, two other regions of the overlapping network (a region in the right postcentral gyrus and the left anterior STG) were not significantly correlated with any of the RSA models or with participant behavior. While their involvement in both ASA and auditory object perception was revealed by the two localizers, the specific computations of these areas require further investigation.
Conclusions
In order to explore ASA and auditory object perception, we utilized a novel methodological approach in which participants rated the perceived musicality of randomly generated pure-tone sequences. Musicality ratings were taken as a proxy for the integrity of the auditory object, while changes to musicality ratings in response to low-level manipulations allowed us to explore the relationship between ASA and auditory object perception. Using fMRI, we uncovered a network of regions involved in auditory object/music perception, as well as another network involved in ASA processing. We report the following conclusions: (i) low-level auditory features can modulate the perceptual experience of an auditory object, as measured by changes in musicality ratings in response to ASA manipulations; (ii) ASA and auditory object perception are subserved by largely distinct, yet partially overlapping, brain regions; and (iii) regions involved in both ASA processing and auditory object perception show distinct functional profiles. Overlapping regions located in bilateral PT showed the strongest correlations with the ASA models, which deal with the extraction of auditory features (timbre; amplitude). An overlapping region located in the right IPS showed a significant correlation with participant behavior and may play an important role in the perceptual organization of auditory stimuli.
Acknowledgments
We would like to thank Monica Ly and Katherine Blakely for their assistance with data collection.
Contributor Information
Gennadiy Gurariy, Department of Biomedical Engineering, Medical College of Wisconsin and Marquette University, 8701 W Watertown Plank Rd, Milwaukee, WI 53233, United States.
Richard Randall, School of Music and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, United States.
Adam S Greenberg, Department of Biomedical Engineering, Medical College of Wisconsin and Marquette University, 8701 W Watertown Plank Rd, Milwaukee, WI 53233, United States.
Funding
This research was supported by a Rothberg Research Award in Human Brain Imaging (RR), National Institutes of Health (T32-MH19983; ASG), and the University of Wisconsin-Milwaukee Research Growth Initiative (ASG).
Conflict of interest statement
None declared.
Data availability
Data will be made available upon request.
References
- Ahveninen J, Kopčo N, Jääskeläinen IP. Psychophysics and neuronal bases of sound localization in humans. Hear Res. 2014:307:86–97.
- Alain C, Arnott SR, Hevenor S, Graham S, Grady CL. “What” and “where” in the human auditory system. Proc Natl Acad Sci U S A. 2001:98:12301–12306.
- Allen EJ, Burton PC, Olman CA, Oxenham AJ. Representations of pitch and timbre variation in human auditory cortex. J Neurosci. 2017:37:1284–1293.
- Arnott SR, Alain C. The auditory dorsal pathway: orienting vision. Neurosci Biobehav Rev. 2011:35:2162–2173.
- Arnott SR, Binns MA, Grady CL, Alain C. Assessing the auditory dual-pathway model in humans. NeuroImage. 2004:22:401–408.
- Ashley R, Hannon EJ, Honing H, Hutchins S, Large E, Palmer C. Music and cognition: what cognitive science can learn from music cognition. Proc Annu Conf Cogn Sci Soc. 2006:28.
- Belin P, Zatorre RJ, Romanski LM, Tian B, Fritz JB, Mishkin M, Goldman-Rakic PS, Rauschecker JP. “What”, “where” and “how” in auditory cortex. Nat Neurosci. 2000:3:965–966.
- Bendor D, Wang X. The neuronal representation of pitch in primate auditory cortex. Nature. 2005:436:1161–1165.
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995:57:289–300.
- Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat. 2001:29:1165–1188.
- Binder JR, Liebenthal E, Possing ET, Medler DA, Ward BD. Neural correlates of sensory and decision processes in auditory object identification. Nat Neurosci. 2004:7:295–301.
- Bizley JK, Cohen YE. The what, where and how of auditory-object perception. Nat Rev Neurosci. 2013:14:693–707.
- Bizley JK, Walker KMM. Distributed sensitivity to conspecific vocalizations and implications for the auditory dual stream hypothesis. J Neurosci. 2009:29:3011–3013.
- Bizley JK, Walker KMM, Nodal FR, King AJ, Schnupp JWH. Auditory cortex represents both pitch judgments and the corresponding acoustic cues. Curr Biol. 2013:23:620–625.
- Blood AJ, Zatorre RJ. Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proc Natl Acad Sci. 2001:98:11818–11823.
- Brainard DH. The psychophysics toolbox. Spat Vis. 1997:10:433–436.
- Brefczynski-Lewis JA, Lewis JW. Auditory object perception: a neurobiological model and prospective review. Neuropsychologia. 2017:105:223–242.
- Bregman AS. Auditory scene analysis: the perceptual organization of sound. Cambridge: The MIT Press; 1990.
- Brown S, Martinez MJ. Activation of premotor vocal areas during musical discrimination. Brain Cogn. 2007:63:59–69.
- Cate AD, Herron TJ, Yund EW, Stecker GC, Rinne T, Kang X, Petkov CI, Disbrow EA, Woods DL. Auditory attention activates peripheral visual cortex. PLoS One. 2009:4:e4645.
- Chang EF, Rieger JW, Johnson K, Berger MS, Barbaro NM, Knight RT. Categorical speech representation in human superior temporal gyrus. Nat Neurosci. 2010:13:1428–1432.
- Chen JL, Penhune VB, Zatorre RJ. Listening to musical rhythms recruits motor regions of the brain. Cereb Cortex. 2008:18:2844–2854.
- Clavagnier S, Falchier A, Kennedy H. Long-distance feedback projections to area V1: implications for multisensory integration, spatial awareness, and visual consciousness. Cogn Affect Behav Neurosci. 2004:4:117–126.
- Cloutman LL. Interaction between dorsal and ventral processing streams: where, when and how? Brain Lang. 2013:127:251–263.
- Cohen MS. Parametric analysis of fMRI data using linear systems methods. NeuroImage. 1997:6:93–103.
- Cohen YE, Russ BE, Davis SJ, Baker AE, Ackelson AL, Nitecki R. A functional role for the ventrolateral prefrontal cortex in non-spatial auditory cognition. Proc Natl Acad Sci. 2009:106:20045–20050.
- Colby CL, Goldberg ME. Space and attention in parietal cortex. Annu Rev Neurosci. 1999:22:319–349.
- Corbetta M. Frontoparietal cortical networks for directing attention and the eye to visual locations: identical, independent, or overlapping neural systems? Proc Natl Acad Sci. 1998:95:831–838.
- Cox RW. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res. 1996:29:162–173.
- Crottaz-Herbette S, Menon V. Where and when the anterior cingulate cortex modulates attentional response: combined fMRI and ERP evidence. J Cogn Neurosci. 2006:18:766–780.
- Cui H. From intention to action: hierarchical sensorimotor transformation in the posterior parietal cortex. eNeuro. 2014:1:ENEURO.0017-14.2014.
- Cupchik GC, Phillips K, Hill DS. Shared processes in spatial rotation and musical permutation. Brain Cogn. 2001:46:373–382.
- Cusack R. The intraparietal sulcus and perceptual organization. J Cogn Neurosci. 2005:17:641–651.
- Darwin CJ. Auditory grouping. Trends Cogn Sci. 1997:1:327–333.
- Davis KD, Hutchison WD, Lozano AM, Tasker RR, Dostrovsky JO. Human anterior cingulate cortex neurons modulated by attention-demanding tasks. J Neurophysiol. 2000:83:3575–3577.
- Douglas KM, Bilkey DK. Amusia is associated with deficits in spatial processing. Nat Neurosci. 2007:10:915–921.
- Ellis D, Lee K. Minimal-impact audio-based personal archives. In: Proceedings of the First ACM Workshop on Continuous Archiving and Recording of Personal Experiences (CARPE-04). New York; 2004. pp. 39–47.
- Ellis DP. Time-domain scrambling of audio signals in Matlab [WWW document]. 2010. http://www.ee.columbia.edu/~dpwe/resources/matlab/scramble/
- Erlikhman G, Gurariy G, Mruczek REB, Caplovitz GP. The neural representation of objects formed through the spatiotemporal integration of visual transients. NeuroImage. 2016:142:67–78.
- Fadiga L, Craighero L, D’Ausilio A. Broca’s area in language, action, and music. Ann N Y Acad Sci. 2009:1169:448–458.
- Fedorenko E, McDermott JH, Norman-Haignere S, Kanwisher N. Sensitivity to musical structure in the human brain. J Neurophysiol. 2012:108:3289–3300.
- Feng W, Stormer VS, Martinez A, McDonald JJ, Hillyard SA. Sounds activate visual cortex and improve visual discrimination. J Neurosci. 2014:34:9817–9824.
- Fishman MC, Michael CR. Integration of auditory information in the cat’s visual cortex. Vis Res. 1973:13:1415–1419.
- Freud E, Plaut DC, Behrmann M. ‘What’ is happening in the dorsal visual pathway. Trends Cogn Sci. 2016:20:773–784.
- Gifford GW, MacLean KA, Hauser MD, Cohen YE. The neurophysiology of functionally meaningful categories: macaque ventrolateral prefrontal cortex plays a critical role in spontaneous categorization of species-specific vocalizations. J Cogn Neurosci. 2005:17:1471–1482.
- Graziano MSA, Gandhi S. Location of the polysensory zone in the precentral gyrus of anesthetized monkeys. Exp Brain Res. 2000:135:259–266.
- Graziano MSA, Reiss LAJ, Gross CG. A neuronal representation of the location of nearby sounds. Nature. 1999:397:428–430.
- Green B, Jääskeläinen IP, Sams M, Rauschecker JP. Distinct brain areas process novel and repeating tone sequences. Brain Lang. 2018:187:104–114.
- Greenberg AS, Esterman M, Wilson D, Serences JT, Yantis S. Control of spatial and feature-based attention in frontoparietal cortex. J Neurosci. 2010:30:14330–14339.
- Grefkes C, Fink GR. The functional organization of the intraparietal sulcus in humans and monkeys. J Anat. 2005:207:3–17.
- Griffiths TD. Sensory systems: auditory action streams? Curr Biol. 2008:18:R387–R388.
- Griffiths TD, Kumar S, Sedley W, Nourski KV, Kawasaki H, Oya H, Patterson RD, Brugge JF, Howard MA. Direct recordings of pitch responses from human auditory cortex. Curr Biol. 2010:20:1128–1132.
- Griffiths TD, Warren JD. The planum temporale as a computational hub. Trends Neurosci. 2002:25:348–353.
- Griffiths TD, Warren JD. What is an auditory object? Nat Rev Neurosci. 2004:5:887–892.
- Gurariy G, Randall R, Greenberg AS. Manipulation of low-level features modulates grouping strength of auditory objects. Psychol Res. 2021:85:2256–2270.
- Hall DA, Plack CJ. Pitch processing sites in the human auditory brain. Cereb Cortex. 2009:19:576–585.
- Heide W, Binkofski F, Seitz RJ, Posse S, Nitschke MF, Freund HJ, Kömpf D. Activation of frontoparietal cortices during memorized triple-step sequences of saccadic eye movements: an fMRI study. Eur J Neurosci. 2001:13:1177–1189.
- Hill KT, Bishop CW, Yadav D, Miller LM. Pattern of BOLD signal in auditory cortex relates acoustic response to perceptual streaming. BMC Neurosci. 2011:12:85.
- Howe PD, Horowitz TS, Akos Morocz I, Wolfe J, Livingstone MS. Using fMRI to distinguish components of the multiple object tracking task. J Vis. 2009:9:10.1–10.11.
- Itti L, Koch C. Computational modelling of visual attention. Nat Rev Neurosci. 2001:2:194–203.
- Janata P, Tillmann B, Bharucha JJ. Listening to polyphonic music recruits domain-general attention and working memory circuits. Cogn Affect Behav Neurosci. 2002:2:121–140.
- Kaas JH, Hackett TA. “What” and “where” processing in auditory cortex. Nat Neurosci. 1999:2:1045–1047.
- Kaas JH, Hackett TA. Subdivisions of auditory cortex and processing streams in primates. Proc Natl Acad Sci. 2000:97:11793–11799.
- Kendall M. A new measure of rank correlation. Biometrika. 1938:30:81–89.
- Kitada R, Kochiyama T, Hashimoto T, Naito E, Matsumura M. Moving tactile stimuli of fingers are integrated in the intraparietal and inferior parietal cortices. Neuroreport. 2003:14:719–724.
- Koelsch S. Toward a neural basis of music perception—a review and updated model. Front Psychol. 2011:2:195.
- Koelsch S, Siebel WA. Towards a neural basis of music perception. Trends Cogn Sci. 2005:9:578–584.
- Kolarik AJ, Moore BCJ, Zahorik P, Cirstea S, Pardhan S. Auditory distance perception in humans: a review of cues, development, neuronal bases, and effects of sensory loss. Atten Percept Psychophys. 2016:78:373–395.
- Kondo HM, Pressnitzer D, Shimada Y, Kochiyama T, Kashino M. Inhibition-excitation balance in the parietal cortex modulates volitional control for auditory and visual multistability. Sci Rep. 2018:8:14548.
- Kopčo N, Huang S, Belliveau JW, Raij T, Tengshe C, Ahveninen J. Neuronal representations of distance in human auditory cortex. Proc Natl Acad Sci. 2012:109:11019–11024.
- Kriegeskorte N, Mur M, Bandettini P. Representational similarity analysis—connecting the branches of systems neuroscience. Front Syst Neurosci. 2008:2:4.
- Kringelbach ML. The human orbitofrontal cortex: linking reward to hedonic experience. Nat Rev Neurosci. 2005:6:691–702.
- Kumar S, Stephan KE, Warren JD, Friston KJ, Griffiths TD. Hierarchical processing of auditory objects in humans. PLoS Comput Biol. 2007:3:e100.
- Kunert R, Willems RM, Casasanto D, Patel AD, Hagoort P. Music and language syntax interact in Broca’s area: an fMRI study. PLoS One. 2015:10:e0141069.
- Kusunoki M, Gottlieb J, Goldberg ME. The lateral intraparietal area as a salience map: the representation of abrupt onset, stimulus motion, and task relevance. Vision Res. 2000:40:1458–1468.
- LaCroix AN, Diaz AF, Rogalsky C. The relationship between the neural computations for speech and music perception is context-dependent: an activation likelihood estimate study. Front Psychol. 2015:6:1138.
- Leaver AM, Rauschecker JP. Cortical representation of natural complex sounds: effects of acoustic features and auditory object category. J Neurosci. 2010:30:7604–7612.
- Lee JH. Prefrontal activity predicts monkeys’ decisions during an auditory category task. Front Integr Neurosci. 2009:3:16.
- Lehne M, Rohrmeier M, Koelsch S. Tension-related activity in the orbitofrontal cortex and amygdala: an fMRI study with music. Soc Cogn Affect Neurosci. 2013:9:1515–1523.
- Lerdahl F, Jackendoff R. A generative theory of tonal music. Cambridge: The MIT Press; 1996.
- Likert R. A technique for the measurement of attitudes. Arch Psychol. 1932:22:5–55.
- Lomber SG, Malhotra S. Double dissociation of “what” and “where” processing in auditory cortex. Nat Neurosci. 2008:11:609–616.
- Maeder PP, Meuli RA, Adriani M, Bellmann A, Fornari E, Thiran JP, Pittet A, Clarke S. Distinct pathways involved in sound recognition and localization: a human fMRI study. NeuroImage. 2001:14:802–816.
- Maess B, Koelsch S, Gunter TC, Friederici AD. Musical syntax is processed in Broca’s area: an MEG study. Nat Neurosci. 2001:4:540–545.
- MathWorks. MATLAB data analysis. Matlab. 2007.
- McDermott JH, Oxenham AJ. Music perception, pitch, and the auditory system. Curr Opin Neurobiol. 2008:18:452–463.
- Menon V, Levitin DJ, Smith BK, Lembke A, Krasnow BD, Glazer D, Glover GH, McAdams S. Neural correlates of timbre change in harmonic sounds. NeuroImage. 2002:17:1742–1754.
- Middlebrooks JC. Sound localization. Handb Clin Neurol. 2015:99–116. 10.1016/b978-0-444-62630-1.00006-8.
- Morrell F. Visual system’s view of acoustic space. Nature. 1972:238:44–46.
- Muckli L, Petro LS. Network interactions: non-geniculate input to V1. Curr Opin Neurobiol. 2013:23:195–201.
- O’Doherty J, Kringelbach ML, Rolls ET, Hornak J, Andrews C. Abstract reward and punishment representations in the human orbitofrontal cortex. Nat Neurosci. 2001:4:95–102.
- Ohl FW, Scheich H, Freeman WJ. Change in pattern of ongoing cortical activity with auditory category learning. Nature. 2001:412:733–736.
- Paus T. Primate anterior cingulate cortex: where motor control, drive and cognition interface. Nat Rev Neurosci. 2001:2:417–424.
- Pearce M, Rohrmeier M. Music cognition and the cognitive sciences. Top Cogn Sci. 2012:4:468–484.
- Pelli DG. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat Vis. 1997:10:437–442.
- Petro LS, Paton AT, Muckli L. Contextual modulation of primary visual cortex by auditory signals. Philos Trans R Soc Lond. 2017:372:20160104.
- Posner MI. Attention: the mechanisms of consciousness. Proc Natl Acad Sci. 1994:91:7398–7403.
- Posner MI, Rothbart MK. Attentional mechanisms and conscious experience. Neuropsychol Conscious. Academic Press; 1992. pp. 91–111.
- Pressnitzer D, Sayles M, Micheyl C, Winter IM. Perceptual organization of sound begins in the auditory periphery. Curr Biol. 2008:18:1124–1128.
- Randall R, Greenberg AS. Principal component analysis of musicality in pitch sequences. In: Proceedings of the 14th International Conference on Music Perception and Cognition. San Francisco, CA; 2016. pp. 112–118.
- Rauschecker JP, Scott SK. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat Neurosci. 2009:12:718–724.
- Reiterer S, Erb M, Grodd W, Wildgruber D. Cerebral processing of timbre and loudness: fMRI evidence for a contribution of Broca’s area to basic auditory discrimination. Brain Imaging Behav. 2008:2:1–10.
- Röhl M, Uppenkamp S. Neural coding of sound intensity and loudness in the human auditory system. J Assoc Res Otolaryngol. 2012:13:369–379.
- Rohrmeier MA. A generative grammar approach to diatonic harmonic structure. In: Proc SMC’07, 4th Sound Music Comput Conf; 2007.
- Rolls ET. The orbitofrontal cortex and reward. Cereb Cortex. 2000:10:284–294.
- Rolls ET. The functions of the orbitofrontal cortex. Brain Cogn. 2004:55:11–29.
- Romanski LM, Tian B, Fritz J, Mishkin M, Goldman-Rakic PS, Rauschecker JP. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat Neurosci. 1999:2:1131–1136.
- Romanski LM, Averbeck BB, Diltz M. Neural representation of vocalizations in the primate ventrolateral prefrontal cortex. J Neurophysiol. 2004:93:734–747.
- Russ BE, Ackelson AL, Baker AE, Cohen YE. Coding of auditory-stimulus identity in the auditory non-spatial processing stream. J Neurophysiol. 2008:99:87–95.
- Saad ZS, Reynolds RC, Argall B, Japee S, Cox RW. SUMA: an interface for surface-based intra- and inter-subject analysis with AFNI. In: 2004 2nd IEEE International Symposium on Biomedical Imaging: Nano to Macro (IEEE Cat No. 04EX821). 2004:2:1510–1513.
- Schön D, Gordon R, Campagne A, Magne C, Astésano C, Anton JL, Besson M. Similar cerebral networks in language, music and song perception. NeuroImage. 2010:51:450–461.
- Selezneva E, Scheich H, Brosch M. Dual time scales for categorical decision making in auditory cortex. Curr Biol. 2006:16:2428–2433.
- Sereno MI, Pitzalis S, Martinez A. Mapping of contralateral space in retinotopic coordinates by a parietal cortical area in humans. Science. 2001:294:1350–1354.
- Seydell-Greenwald A, Greenberg AS, Rauschecker JP. Are you listening? Brain activation associated with sustained nonspatial auditory attention in the presence and absence of stimulation. Hum Brain Mapp. 2014:35:2233–2252.
- Shafritz KM, Gore JC, Marois R. The role of the parietal cortex in visual feature binding. Proc Natl Acad Sci. 2002:99:10917–10922.
- Shamma SA, Elhilali M, Micheyl C. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 2011:34:114–123.
- Shams L, Kamitani Y, Shimojo S. Illusions: what you see is what you hear. Nature. 2000:408:788.
- Shinn-Cunningham BG. Object-based auditory and visual attention. Trends Cogn Sci. 2008:12:182–186.
- Shinn-Cunningham BG, Best V. Selective attention in normal and impaired hearing. Trends Amplif. 2008:12:283–299.
- Stecker GC, Middlebrooks JC. Distributed coding of sound locations in the auditory cortex. Biol Cybern. 2003:89:341–349.
- Teki S, Chait M, Kumar S, von Kriegstein K, Griffiths TD. Brain bases for auditory stimulus-driven figure-ground segregation. J Neurosci. 2011:31:164–171.
- Teki S, Barascud N, Picard S, Payne C, Griffiths TD, Chait M. Neural correlates of auditory figure-ground segregation based on temporal coherence. Cereb Cortex. 2016:26:3669–3680.
- Tian B, Reser D, Durham A, Kustov A, Rauschecker JP. Functional specialization in rhesus monkey auditory cortex. Science. 2001:292:290–293.
- Todd JJ, Marois R. Posterior parietal cortex activity predicts individual differences in visual short-term memory capacity. Cogn Affect Behav Neurosci. 2005:5:144–155.
- Torgerson WS. Multidimensional scaling: I. Theory and method. Psychometrika. 1952:17:401–419.
- Tsunada J, Lee JH, Cohen YE. Differential representation of auditory categories between cell classes in primate auditory cortex. J Physiol. 2012:590:3129–3139.
- Ungerleider L, Mishkin M. Two cortical visual systems. In: Ingle D, Goodale M, Mansfield RJ, editors. Analysis of visual behavior. Cambridge: MIT Press; 1982. pp. 549–586.
- Uppenkamp S, Röhl M. Human auditory neuroimaging of intensity and loudness. Hear Res. 2014:307:65–73.
- van der Heijden K, Rauschecker JP, de Gelder B, Formisano E. Cortical mechanisms of spatial hearing. Nat Rev Neurosci. 2019:20:609–623.
- van Noorden L. Temporal coherence in the perception of tone sequences. Unpublished doctoral dissertation, Eindhoven University of Technology; 1975.
- Vetter P, Smith FW, Muckli L. Decoding sound and imagery content in early visual cortex. Curr Biol. 2014:24:1256–1262.
- Watkins PV, Barbour DL. Level-tuned neurons in primary auditory cortex adapt differently to loud versus soft sounds. Cereb Cortex. 2011:21:178–190.
- Watkins S, Shams L, Tanaka S, Haynes JD, Rees G. Sound alters activity in human V1 in association with illusory visual perception. NeuroImage. 2006:31:1247–1256.
- Watkins S, Shams L, Josephs O, Rees G. Activity in human V1 follows multisensory perception. NeuroImage. 2007:37:572–578.
- Webster MJ, Bachevalier J, Ungerleider LG. Connections of inferior temporal areas TEO and TE with parietal and frontal cortex in macaque monkeys. Cereb Cortex. 1994:4:470–483.
- Weissman DH. Dorsal anterior cingulate cortex resolves conflict from distracting stimuli by boosting attention toward relevant events. Cereb Cortex. 2004:15:229–237.
- Winkler I, Denham SL, Nelken I. Modeling the auditory scene: predictive regularity representations and perceptual objects. Trends Cogn Sci. 2009:13:532–540.
- Winkowski DE, Nagode DA, Donaldson KJ, Yin P, Shamma SA, Fritz JB, Kanold PO. Orbitofrontal cortex neurons respond to sound and activate primary auditory cortex neurons. Cereb Cortex. 2018:28:868–879.
- Wise SP. Motor cortex. Int Encycl Soc Behav Sci. 2001:10137–10140.
- Wojciulik E, Kanwisher N. The generality of parietal involvement in visual attention. Neuron. 1999:23:747–764.
- Worsley KJ, Friston KJ. Analysis of fMRI time-series revisited—again. NeuroImage. 1995:2:173–181.
- Xu Y, Chun MM. Selecting and perceiving multiple visual objects. Trends Cogn Sci. 2009:13:167–174.
- Yu M, Xu M, Li X, Chen Z, Song Y, Liu J. The shared neural basis of music and language. Neuroscience. 2017:357:208–219.
- Zald DH, Kim SW. Anatomy and function of the orbital frontal cortex: I. Anatomy, neurocircuitry, and obsessive-compulsive disorder. J Neuropsychiatry Clin Neurosci. 1996:8:125–138.
- Zatorre R. Music, the food of neuroscience? Nature. 2005:434:312–315.
- Zatorre RJ, Belin P. Spectral and temporal processing in human auditory cortex. Cereb Cortex. 2001:11:946–953.
- Zatorre R, Evans A, Meyer E. Neural mechanisms underlying melodic perception and memory for pitch. J Neurosci. 1994:14:1908–1919.
- Zimmer U, Lewald J, Erb M, Grodd W, Karnath HO. Is there a role of visual cortex in spatial hearing? Eur J Neurosci. 2004:20:3148–3156.
- Zimmermann JF, Moscovitch M, Alain C. Attending to auditory memory. Brain Res. 2016:1640:208–221.