Abstract
Attentional modulation of cortical networks is critical for the cognitive flexibility required to process complex scenes. Current theoretical frameworks for attention are based almost exclusively on studies in visual cortex, where attentional effects are typically modest and excitatory. In contrast, attentional effects in auditory cortex can be large and suppressive. A theoretical framework for explaining attentional effects in auditory cortex is lacking, preventing a broader understanding of cortical mechanisms underlying attention. Here, we present a cortical network model of attention in primary auditory cortex (A1). A key mechanism in our network is attentional inhibitory modulation (AIM) of cortical inhibitory neurons. In this mechanism, top-down inhibitory neurons disinhibit bottom-up cortical circuits, a prominent circuit motif observed in sensory cortex. Our results reveal that the same underlying mechanisms in the AIM network can explain diverse attentional effects on both spatial and frequency tuning in A1. We find that a dominant effect of disinhibition on cortical tuning is suppressive, consistent with experimental observations. Functionally, the AIM network may play a key role in solving the cocktail party problem. We demonstrate how attention can guide the AIM network to monitor an acoustic scene, select a specific target, or switch to a different target, providing flexible outputs for solving the cocktail party problem.
Author summary
Selective attention plays a key role in how we navigate our everyday lives. For example, at a cocktail party, we can attend to a friend’s speech amidst other speakers, music, and background noise. In stark contrast, hundreds of millions of people with hearing impairment and other disorders find such environments overwhelming and debilitating. Understanding the mechanisms underlying selective attention may lead to breakthroughs in improving the quality of life for those negatively affected. Here, we propose a mechanistic network model of attention in primary auditory cortex based on attentional inhibitory modulation (AIM). In the AIM model, attention targets specific cortical inhibitory neurons, which then modulate local cortical circuits to emphasize a particular feature of sounds and suppress competing features. We show that the AIM model can account for experimental observations across different species and stimulus domains. We also demonstrate that the same mechanisms can enable listeners to flexibly switch between attending to specific target sounds and monitoring the environment in complex acoustic scenes, such as a cocktail party. The AIM network provides a theoretical framework that can work in tandem with new experiments to help unravel cortical circuits underlying attention.
Introduction
A hallmark of cortical processing is the capacity for generating flexible behaviors in a context-dependent manner. A striking example of a problem that requires such cognitive flexibility is the cocktail party problem, where a listener can selectively listen to a speaker amongst other speakers [1]. Listening in such settings can be highly flexible, depending on the goal of the listener. For example, a listener can monitor the entire auditory scene, select a particular target, or switch to another target. Recent theoretical and experimental studies have begun to propose model networks and cortical mechanisms for producing flexible behaviors [2–8], and top-down control of cortical circuits via attention is thought to be a critical component.
The influence of attention on cortical processing has been intensively investigated in vision, resulting in a prominent theoretical framework of attention [9,10]. In contrast, relatively little is known about attentional mechanisms in auditory cortex. After the early discovery of “attention units” in the auditory cortex [11], there has recently been renewed interest in attentional effects in auditory cortex [12–14]. In comparison to the effects of attention in primary visual cortex, which are relatively small and excitatory [15], attentional effects in primary auditory cortex (A1) can be much larger and suppressive [13,16,17]. However, a theoretical framework for cortical mechanisms underlying auditory attention is lacking.
The responses of neurons in A1 can change rapidly when an animal is actively engaged in a task [8,13,16,17]. For example, cortical neurons with broad spatial tuning curves can sharpen their tuning during attentive behavior [16], whereas the spectrotemporal receptive fields (STRFs) of cortical neurons with narrow frequency tuning can display either the emergence of entirely new excitatory regions [17] or suppressive effects [13]. Cortical network mechanisms underlying such diverse attentional changes in tuning remain poorly understood. Changes in cortical tuning can also be driven by competing auditory stimuli in cocktail-party settings, even when an animal is anesthetized [18,19], suggesting the involvement of both bottom-up and top-down mechanisms [1,12,20,21].
There is a growing literature on computational models of auditory attention in the context of auditory scene analysis, as discussed in a comprehensive review [22]. These models can be grouped into bottom-up and top-down models. Bottom-up models have used time-frequency representations of sound as an “auditory image” from which salience maps are computed using static or temporally evolving features, and have applied predictive coding theory to account for behavioral results in humans and animals processing auditory scenes [23,24]. Top-down models have formulated neural processing as spectrotemporal “filters” that extract features from the auditory image. In these models, attention adjusts the filter characteristics to optimize the detection and discrimination of targets, thereby accounting for changes in receptive field properties in behaving animals [25]. Subsequent models have extended the feature analysis framework to propose computational principles, e.g., temporal coherence for linking multiple features across time (“streaming”) [26], and have incorporated task structure to demonstrate that changes in receptive field properties during behavior can be specific to task demands [27], as observed experimentally [28]. These previous computational models are all formulated in terms of statistical, signal processing, or optimization principles. Thus, a key gap remains in current state-of-the-art models: mechanistic accounts of how cortical computations are implemented by underlying cortical circuits. Indeed, as the review points out, “The field is particularly challenged by the lack of theories that integrate our knowledge of cortical circuitry in the auditory pathway with adaptive and cognitive processes that shape behavior and perception of complex acoustic scenes” [22]. So far, circuit-level models of cortical processing underlying the cocktail party problem have largely focused on bottom-up mechanisms [29,30].
Specific cortical circuit mechanisms underlying top-down attentional changes in cortical responses, and their functional role in solving the cocktail party problem, remain unclear [22].
Here we propose a network model to explain how experimentally observed cortical response properties in A1 could arise from underlying network mechanisms, via the interplay between bottom-up and top-down processes. Central to our network model is attentional inhibitory modulation (AIM), i.e., attention-driven modulation of distinct populations of cortical inhibitory neurons. Specifically, this mechanism relies on disinhibition of bottom-up cortical circuits, mediated via top-down inhibitory neurons, a prominent motif observed in cortex [5–8,31]. We first use the AIM network to model attentional changes in spatial and spectral tuning in auditory cortex [16,17], and then illustrate its potential functional role in solving the cocktail party problem.
Results
The AIM network
We began by focusing on spatial processing of multiple sound sources in auditory cortex, extending previous models of bottom-up processing (Fig 1A and 1B). The bottom-up network is made up of integrate-and-fire neurons and implements two key operations: integration and competition. Integration is mediated by broad convergence across spatial channels onto the cortical neuron (C, Fig 1A), whereas competition is mediated by inhibition across spatial channels via I neurons (Fig 1B). These bottom-up mechanisms explain two key features observed experimentally in anesthetized or passive animals: broad spatial tuning of cortical neurons to single sounds, and sharpening of spatial tuning in the presence of multiple competing sounds [18,19,29,30].
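The minimal Python sketch below makes the integration operation concrete; it leaves competition out. It is a steady-state firing-rate caricature, not the integrate-and-fire network used here. The seven channels, the Gaussian input tuning, and all parameter values are hypothetical. Summing activity across channels onto the C neuron gives a tuning curve far broader than that of any single channel.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
CHANNELS = np.linspace(-90.0, 90.0, 7)   # preferred azimuth of each spatial channel (deg)
SIGMA = 30.0                             # width of each channel's Gaussian input tuning (deg)
PROBES = np.arange(-90.0, 91.0, 5.0)     # probe source azimuths (deg)

def channel_drive(source_az):
    """Bottom-up drive to every spatial channel from a single source."""
    return np.exp(-(CHANNELS - source_az) ** 2 / (2 * SIGMA ** 2))

def single_channel_tuning(idx):
    """Tuning curve of the E neuron in one channel."""
    return np.array([channel_drive(az)[idx] for az in PROBES])

def c_tuning():
    """Integration: the C neuron sums E activity across all channels."""
    return np.array([channel_drive(az).sum() for az in PROBES])

def half_max_width(curve):
    """Number of probe azimuths at which the response is at least half its peak."""
    return int(np.sum(curve >= 0.5 * curve.max()))

print(half_max_width(single_channel_tuning(3)), half_max_width(c_tuning()))  # 15 37
```

The 0° channel stays above half its peak over only 15 of the 37 probe azimuths. The convergent C neuron stays above half its peak over all of them.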
We extended the bottom-up network to model the effects of attention in the AIM network (Fig 1C). Previous studies have shown that bottom-up cortical representations can be modulated in the attentive state by distinct sub-types of inhibitory neurons. To model such attentional inhibitory modulation, we introduced an additional layer of inhibitory neurons (I2, Fig 1C). I2 neurons can control the spatial and spectral tuning of the cortical neuron in different attentive states, by modulating the activity of E and I neurons in the bottom-up network.
Attentional changes in spatial tuning
A previous experimental study in cat A1 demonstrated that the spatial tuning of cortical neurons sharpens during attentive behavior [16]. Specifically, the authors observed that A1 neurons exhibited broad spatial tuning when the animal was idle but sharpened their spatial tuning when the animal performed a spatial localization task, as demonstrated by the changes in the azimuth-dependent peristimulus time histograms (PSTHs) (Fig 2, 3rd column). We modeled this attention-induced sharpening effect using the AIM network.
In this simulation, the AIM network consisted of an array of spatial channels tuned to locations between -90° and 90° azimuth. We probed the network with broadband noise from various spatial locations and analyzed the C neuron’s response as a function of location. Based on the azimuth-dependent PSTHs, we found that when I2 neurons in all spatial channels were on, the cortical neuron in the AIM network exhibited broad tuning, similar to the idle condition in the experiment (Fig 2A). The release of acetylcholine (ACh) during behavioral task performance can suppress intracortical excitatory connections and strengthen thalamocortical connections [32–34]. When we simulated these effects in the AIM network, the spatial tuning of the cortical neuron sharpened, resembling the tuning in the behaving condition in the Lee and Middlebrooks study (Fig 2B). In that study, however, animals were not required to attend to a specific location during the task. We simulated selective attention to a specific location by inactivating the I2 neuron in a specific channel, e.g., 30° (Fig 2C, left). In this case, the spatial tuning of the cortical neuron also sharpened (Fig 2C, right). In the AIM network, this effect arises from two key mechanisms. First, inactivating the I2 neuron disinhibits the attended channel. Second, the disinhibited I neuron in that channel drives powerful inhibition of competing channels. Thus, selective attention produces focal disinhibition at the attended location and suppression at other locations in the network.
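A steady-state rate model can caricature the selective-attention mechanism. The Python sketch below rests on stated assumptions: the channel layout, the Gaussian inputs, and especially the tonic drive (TONIC) that keeps a disinhibited I neuron active are hypothetical choices, not parameters from the study. Inactivating the I2 neuron in the 30° channel lets that channel's I neuron suppress every other channel. The C neuron's tuning then shrinks from the broad monitor-mode curve to roughly the tuning of the attended channel.

```python
import numpy as np

# Hypothetical parameters for a firing-rate caricature of the AIM network.
CHANNELS = np.linspace(-90.0, 90.0, 7)   # spatial channels (deg)
SIGMA = 30.0                             # input tuning width (deg)
TONIC = 1.0                              # assumed tonic drive to a disinhibited I neuron
W_LAT = 1.0                              # I -> E inhibition onto competing channels
PROBES = np.arange(-90.0, 91.0, 5.0)

def c_rate(source_az, attended=None):
    """C-neuron rate for a source at `source_az`.

    attended=None: all I2 neurons on, so every I neuron is silenced (monitor mode).
    attended=k:    the I2 neuron in channel k is inactivated, disinhibiting I_k,
                   which then suppresses E neurons in all other channels.
    """
    drive = np.exp(-(CHANNELS - source_az) ** 2 / (2 * SIGMA ** 2))
    e = drive.copy()
    if attended is not None:
        i_att = TONIC + drive[attended]          # disinhibited I neuron
        inhibition = np.full(len(CHANNELS), W_LAT * i_att)
        inhibition[attended] = 0.0               # lateral: spares its own channel
        e = np.maximum(drive - inhibition, 0.0)
    return e.sum()                               # broad convergence onto C

def half_max_width(curve):
    return int(np.sum(curve >= 0.5 * curve.max()))

monitor = np.array([c_rate(az) for az in PROBES])
attend_30 = np.array([c_rate(az, attended=4) for az in PROBES])   # channel 4 is 30 deg
print(half_max_width(monitor), half_max_width(attend_30), PROBES[attend_30.argmax()])
```

In this toy model the tonic drive makes the suppression independent of where the probe sound is. Without it, a distant probe would barely drive the attended I neuron, and the tuning would not sharpen.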
Attentional changes in spectral tuning
Rapid changes in receptive fields during task performance, thought to arise from attentional mechanisms, have also been observed in the frequency domain, e.g., in the experiments by Fritz et al. (2003) [17] and Atiani et al. (2009) [35]. Here, we show that the attentional mechanisms in the AIM network can also account for these experimental observations.
We first constrained the AIM network to incorporate experimentally observed features of network connectivity in the frequency domain (Fig 3). Unlike the broad connectivity in the spatial domain (Figs 1 and 2), connectivity across frequency channels is localized to nearby frequencies, reflecting the tonotopic organization of the auditory cortex.
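The contrast between broad spatial connectivity and localized spectral connectivity can be written as two weight matrices. The sketch below is illustrative only. The number of channels, the Gaussian falloff, and the pruning threshold are assumptions, not values from the model.

```python
import numpy as np

def lateral_connectivity(n_channels, spread=None):
    """Channel-to-channel weight matrix (rows: target channel, cols: source).

    spread=None -> broad, all-to-all connectivity (spatial domain).
    spread=s    -> Gaussian falloff over s channels (tonotopic frequency domain).
    Both choices are illustrative assumptions, not the paper's fitted values.
    """
    idx = np.arange(n_channels)
    dist = np.abs(idx[:, None] - idx[None, :])
    if spread is None:
        w = np.ones((n_channels, n_channels))
    else:
        w = np.exp(-dist ** 2 / (2.0 * spread ** 2))
        w[w < 1e-3] = 0.0            # prune negligible long-range weights
    np.fill_diagonal(w, 0.0)         # no self-connections
    return w

spatial = lateral_connectivity(9)
spectral = lateral_connectivity(9, spread=1.0)
print((spatial[4] > 0).sum(), (spectral[4] > 0).sum())  # 8 6
```

Here the middle spatial channel connects to all eight other channels. The middle frequency channel connects only to its six nearest neighbours.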
We then simulated the spectral tuning of neurons in the network in the passive vs. attentive states. We probed the network with pure-tone stimuli in the frequency ranges shown in Fig 4. In the attentive state, the network attended to a specific target frequency, distinct from the best frequency of the neuron, as in the experiments by Atiani et al. [35]. Atiani et al. characterized neuronal responses using spectrotemporal receptive fields (STRFs). To compare our results with theirs, we used frequency-dependent PSTHs as an approximation to STRFs.
In the attentive state, the I2 neuron in the attended channel (i.e., the target frequency channel, fT, Fig 4) is suppressed, disinhibiting the E and I neurons in that channel (Fig 4A). We found that when the target frequency was close to the best frequency of the neuron, attention produced both an increase and a sharpening of the response near the best frequency, as observed experimentally (Fig 4B). In the AIM network, two effects combine when the target frequency is close to the best frequency. Excitation from the disinhibited E neuron in the target channel increases the peak response of the cortical neuron, while inhibition via the I neuron in the target channel sharpens the shape of the response. In contrast, when the target frequency is far from the best frequency of the neuron, the effect of inhibition dominates. This produces a net suppressive effect on the response, which is also observed experimentally (Fig 4C). Thus, the AIM network qualitatively explains salient features of the experimental observations by Atiani et al.
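A toy rate model can illustrate how the balance between excitation and inhibition depends on distance. The sketch is hypothetical throughout:

- Channels are spaced in quarter-octave steps around the BF.
- Each pure tone drives only its own channel (a simplifying assumption).
- Convergence onto C is narrower than the spread of I-neuron inhibition.
- The disinhibited I neuron carries a tonic drive.
- All parameters are illustrative.

Because a tone drives a single channel, the "near" case places the target exactly at the BF channel, as the limiting case of a nearby target.

```python
import numpy as np

# Hypothetical toy parameters (illustrative, not fitted to data).
OCT = np.arange(-2.0, 2.0 + 1e-9, 0.25)  # channel positions, octaves re: BF
BF = int(np.argmin(np.abs(OCT)))         # best-frequency channel of the C neuron
SIG_CONV, SIG_INH = 0.25, 0.75           # narrow E->C convergence, wider I->E spread
GAIN, TONIC, W_INH = 0.5, 1.0, 0.5

def gauss(dx, sig):
    return np.exp(-dx ** 2 / (2 * sig ** 2))

W_CONV = gauss(OCT - OCT[BF], SIG_CONV)  # E -> C convergence weights

def c_response(tone, target=None):
    """C-neuron rate for a pure tone in channel `tone`.

    target=None is the passive state (I2 on everywhere, I neurons silent).
    Otherwise the I2 neuron in channel `target` is suppressed, which
    disinhibits both the E and the I neuron of that channel.
    """
    drive = np.zeros(len(OCT))
    drive[tone] = 1.0                     # assumption: a tone drives one channel
    e = drive.copy()
    if target is not None:
        e[target] *= 1.0 + GAIN           # disinhibited E neuron: extra excitation
        i_t = TONIC + drive[target]       # disinhibited I neuron: tonic + stimulus
        inh = W_INH * i_t * gauss(OCT - OCT[target], SIG_INH)
        inh[target] = 0.0                 # I_t inhibits the other channels
        e = np.maximum(e - inh, 0.0)
    return float(W_CONV @ e)

def tuning(target=None):
    return np.array([c_response(k, target) for k in range(len(OCT))])

passive, near, far = tuning(), tuning(BF), tuning(BF + 4)  # target at BF / 1 octave up
```

When the target is at the BF, the peak response rises and the flanks shrink relative to the peak. When the target is one octave away, only the broader inhibition reaches the BF channel, so the response there drops.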
In addition to sharpening, strengthening, and weakening of STRF hotspots, the experimental results of Fritz et al. showed that a secondary hotspot may arise in the STRF when the animal is in the attentive state (Fig 5).
We hypothesized that this results from the strengthening of an intracortical connection between the target frequency and the best frequency in the attentive state (see Discussion for possible mechanisms). To test this hypothesis, we added an E-E connection between the target-frequency (fT) and best-frequency (fB) channels. In the passive state, the neuron responded to frequencies near its best frequency, showing a single hotspot. In the attentive state, however, the same neuron also responded to the target frequency, as seen by the emergence of a new excitatory region at the target frequency in Fig 5B. This effect occurs due to the strengthening of the synaptic connection between the target frequency and the best frequency in the attentive state. There is also a suppressive effect on the response to the best frequency, seen as a slight reduction in the amplitude of the tuning curve at the best frequency. This effect occurs due to inhibition driven by the I neuron in the target frequency channel. Both of these effects were observed in experimentally measured receptive fields [17]. Thus, the AIM network qualitatively explains salient features of the changes observed experimentally by Fritz et al.
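The proposed E-E mechanism can be added to the same kind of toy rate model. In this hypothetical sketch, the only change between states, besides disinhibition at fT, is the weight of the E(fT)-to-E(fB) synapse (W_EE). It is assumed to be weak in the passive state and stronger in the attentive state. These values, like the other parameters, are illustrative.

```python
import numpy as np

# Hypothetical parameters; the W_EE values for the two states are assumptions.
OCT = np.arange(-2.0, 2.0 + 1e-9, 0.25)  # channel positions, octaves re: BF
BF = int(np.argmin(np.abs(OCT)))
TARGET = BF + 4                           # target frequency one octave above BF
SIG_CONV, SIG_INH = 0.25, 0.75
GAIN, TONIC, W_INH = 0.5, 1.0, 0.5
W_EE = {"passive": 0.05, "attentive": 0.5}   # E(fT) -> E(fB) synapse

def gauss(dx, sig):
    return np.exp(-dx ** 2 / (2 * sig ** 2))

W_CONV = gauss(OCT - OCT[BF], SIG_CONV)   # narrow E -> C convergence around BF

def c_response(tone, state):
    drive = np.zeros(len(OCT))
    drive[tone] = 1.0                     # assumption: a tone drives one channel
    e = drive.copy()
    if state == "attentive":
        e[TARGET] *= 1.0 + GAIN           # disinhibited E neuron at fT
        inh = W_INH * (TONIC + drive[TARGET]) * gauss(OCT - OCT[TARGET], SIG_INH)
        inh[TARGET] = 0.0                 # I neuron at fT inhibits other channels
        e = np.maximum(e - inh, 0.0)
    e[BF] += W_EE[state] * e[TARGET]      # intracortical E-E link from fT to fB
    return float(W_CONV @ e)

passive = np.array([c_response(k, "passive") for k in range(len(OCT))])
attentive = np.array([c_response(k, "attentive") for k in range(len(OCT))])
```

In the attentive state, a new local peak appears at fT that is nearly as large as the BF response. The BF response itself is reduced by the I neuron at fT.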
Functional implications
We hypothesized that the attentional mechanisms in the AIM network play an important functional role in processing complex auditory scenes. Two highly effective mechanisms for sound segregation are spatial hearing and frequency selectivity. We first considered functional implications in the spatial domain.
When entering a cocktail party, one might want to monitor the entire scene, focus one’s attention on a conversation partner, or switch attention between conversation partners. How does top-down attentional control modulate bottom-up mechanisms to enable this flexible behavior? To illustrate the behavior of the AIM network in these different modes, we simulated several scenarios. In these simulations, we used three spatial channels corresponding to 0°, 90°, and -90° azimuth, presenting the network with different tokens of speech stimuli at these locations, either sequentially or simultaneously (see Methods).
To demonstrate passive listening (the “monitor” mode), we set all I2 neurons active, thereby silencing all I neurons (Fig 6A). In this case, when the speech tokens were presented sequentially, the network output resembled each individual speaker (Fig 6A). When the speech tokens were presented simultaneously, the network output resembled their mixture. Thus, in this mode, the network broadly monitors the acoustic scene across different spatial locations.
To simulate selective attention to a particular speech token, we first inactivated the I2 neuron in the 0° spatial channel, thereby activating lateral inhibition via the disinhibition of the I neuron in that channel (Fig 6B). In this case, the network output resembled the output for the 0° speaker, regardless of whether the speakers were presented sequentially or simultaneously. Finally, to simulate switching attention to a different location (90°), we turned off the I2 neuron at 90°, activating lateral inhibition via the disinhibition of the I neuron at that location (Fig 6C). In this case, the network output was more similar to the output for the speaker at 90° (Fig 6D).
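A deliberately simple sketch can mimic the three listening modes. None of its ingredients come from the study:

- Random envelopes stand in for real speech tokens.
- Each talker is assumed to drive only its own spatial channel.
- The disinhibited I neuron is assumed to receive enough tonic drive (TONIC) to silence competing channels.

Similarity is measured as the correlation between the network output and each talker's envelope.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
# Stand-in "speech envelopes" (random, hypothetical) for talkers at 0, 90, -90 deg.
AZ = [0, 90, -90]
talkers = rng.random((3, T))
TONIC, W_LAT = 2.0, 1.0   # hypothetical; strong enough to silence competitors

def network_output(attended=None):
    """Summed E-channel output over time; each talker drives only its own channel."""
    e = talkers.copy()
    if attended is not None:
        k = AZ.index(attended)
        i_att = TONIC + talkers[k]              # disinhibited I neuron (time-varying)
        for j in range(len(AZ)):
            if j != k:
                e[j] = np.maximum(talkers[j] - W_LAT * i_att, 0.0)
    return e.sum(axis=0)

def similarity(out):
    return [float(np.corrcoef(out, s)[0, 1]) for s in talkers]

for mode in (None, 0, 90):   # monitor, select 0 deg, switch to 90 deg
    print(mode, np.round(similarity(network_output(mode)), 2))
```

In monitor mode, the output correlates moderately with every talker, about 1/√3 ≈ 0.58 for three independent envelopes of equal variance. In the select and switch modes, it tracks one talker almost perfectly.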
Finally, we demonstrate the effect of speaker separation on network performance. First, we expanded the network to have seven spatial channels tuned from 0° to 90° azimuth in 15° increments. We then kept the first speaker at 0° while varying the location of the second speaker, and presented both speakers to the network simultaneously. We found that the network output became more representative of the attended target (either speaker 1 or speaker 2) as the two speakers became more separated in space, reflecting spatial release from masking [36] (Fig 6E). In summary, when an I2 neuron in a specific spatial channel is inactivated, it disinhibits the I neuron at that location, causing the network to selectively attend to that spatial location. Additionally, the ability of the network to attend to a target depends on the spatial separation between the target and its maskers.
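The same caricature produces spatial release from masking if each talker also excites neighbouring channels with a Gaussian spatial spread (SIGMA, a hypothetical value). With attention on the 0° channel, the output equals that channel's drive, s1 + g·s2, where g is the masker's leakage into the attended channel. Because g falls with separation, the correlation with the attended talker rises from about 1/√2 toward 1. The stand-in envelopes and the weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
CHANNELS = np.arange(0, 91, 15)          # seven channels, 0-90 deg in 15-deg steps
SIGMA = 20.0                             # hypothetical spatial spread of each talker
TONIC, W_LAT = 2.0, 1.0                  # hypothetical; silences unattended channels
s1, s2 = rng.random(T), rng.random(T)    # stand-in envelopes for two talkers

def spread(source_az):
    return np.exp(-(CHANNELS - source_az) ** 2 / (2 * SIGMA ** 2))

def output_attending_talker1(az2):
    """Talker 1 fixed at 0 deg, talker 2 at az2; attention on the 0-deg channel."""
    drive = np.outer(spread(0), s1) + np.outer(spread(az2), s2)   # channels x time
    k = 0                                                         # attended channel
    i_att = TONIC + drive[k]                                      # disinhibited I neuron
    e = np.maximum(drive - W_LAT * i_att, 0.0)                    # suppress competitors
    e[k] = drive[k]                                               # attended channel spared
    return e.sum(axis=0)

scores = [float(np.corrcoef(output_attending_talker1(az), s1)[0, 1]) for az in CHANNELS]
print(np.round(scores, 3))
```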
We next investigated the functional role of the AIM network in the frequency domain using the same network shown in Fig 4. In this simulation, two competing speech tokens, from a male and a female speaker, originate from the same spatial location. In this case, differences in spatial cues cannot be exploited for segregation, but differences in spectral features of the two speakers, e.g., the fundamental frequency (F0), are available. We simulated the network in three conditions: passive, attending to the male F0, or attending to the female F0. Attention was simulated (as in Fig 2) by inactivating the I2 neuron, which disinhibits the I neuron in the attended frequency channel. In the attentive conditions, network activity showed a sharp peak around the F0 of the attended speaker and a suppression of activity around the F0 of the competing speaker (Fig 7). Similar to the earlier results shown in Fig 3, spiking activity increased around the attended F0, while spiking activity at frequencies far from that F0 was suppressed (Fig 7A marginals and Fig 7B).
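A toy version of F0-based attention uses the same ingredients on a log-frequency axis. In this hypothetical sketch, channels are spaced at 1/12 octave starting from 50 Hz. The two talkers are reduced to narrow-band drives at illustrative F0s of 100 Hz and 200 Hz, and the gain, tonic drive, and inhibitory spread are assumptions.

```python
import numpy as np

# Hypothetical frequency network: 1/12-octave channels from 50 Hz to 400 Hz.
OCT = np.arange(37) / 12.0                # octaves above 50 Hz
FREQS = 50.0 * 2 ** OCT
F0_MALE, F0_FEMALE = 100.0, 200.0         # illustrative fundamental frequencies
SIG_IN, SIG_INH = 1 / 12, 0.75            # input bandwidth, I->E spread (octaves)
GAIN, TONIC, W_INH = 0.5, 1.0, 0.3

def gauss(dx, sig):
    return np.exp(-dx ** 2 / (2 * sig ** 2))

def chan(f):
    """Index of the channel closest to frequency f (Hz)."""
    return int(np.argmin(np.abs(FREQS - f)))

# Bottom-up drive from the two simultaneous talkers (F0 regions only).
DRIVE = gauss(OCT - np.log2(F0_MALE / 50), SIG_IN) + gauss(OCT - np.log2(F0_FEMALE / 50), SIG_IN)

def e_activity(attend_f0=None):
    e = DRIVE.copy()
    if attend_f0 is not None:
        t = chan(attend_f0)
        e[t] *= 1.0 + GAIN                       # disinhibited E neuron at attended F0
        inh = W_INH * (TONIC + DRIVE[t]) * gauss(OCT - OCT[t], SIG_INH)
        inh[t] = 0.0                             # I neuron spares its own channel
        e = np.maximum(e - inh, 0.0)
    return e

passive, att_male, att_female = e_activity(), e_activity(F0_MALE), e_activity(F0_FEMALE)
m, f = chan(F0_MALE), chan(F0_FEMALE)
print(att_male[m] / passive[m], att_male[f] / passive[f])
```

Attending to either F0 raises activity in that channel and sharpens the peak around it. Activity at the competing F0 falls.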
Discussion
The capacity for generating flexible behaviors in a context-dependent manner is central to many complex cognitive tasks. How cortical circuits achieve such flexible computations is a central area of investigation in both theoretical and experimental neuroscience. Recent theoretical studies have begun to propose model networks capable of producing flexible behaviors, e.g., gating mechanisms for flexible routing of information [2–4], and experimental studies have begun to reveal cortical mechanisms underlying flexible gating of information and attentional control [4,6,37]. However, such models are lacking for auditory cortex. In this study, we propose the AIM network, which describes a mechanism of interaction between top-down and bottom-up processes in auditory cortex that may underlie attention-driven changes in cortical processing and behavior.
Flexible cortical processing
The rapid flexibility of the AIM network is generated by top-down inputs, which control the state of the network by dictating the on/off state of specific I2 neurons. The top-down inhibition of I2 neurons disinhibits the I neuron of the attended channel, which then suppresses competing channels via top-down lateral inhibition, resulting in focused attention. On the other hand, when I2 neurons in all channels are active, the network integrates information from all input channels. In the spatial case, this produces broad spatial tuning and allows the network to monitor the entire scene. The ability to switch between these behaviors is important from a functional standpoint: a network that is always selective for a single channel may fail to detect important events at other locations in the scene. The different states of the I2 layer (e.g., all on vs. one off) allow for exploration (by detecting events across a broad range of locations) as well as exploitation (by selectively listening to a particular channel) in a dynamic manner.
Switching between exploration and exploitation is especially interesting in the context of findings that the “spotlight of attention” fluctuates in a rhythmic manner [38]. In our model, such fluctuations in the strength of the top-down input would cause the network to alternate between periods favoring broad detection of sounds across the entire acoustic scene and fine discrimination at a single spatial location [39]. Such periods may allow salient sounds in the background to capture the spotlight of attention, as demonstrated in a recent study in humans [40]. Moreover, changes in the location of the top-down input would promote switching attention to a different location [41]. These various behaviors may play important roles in how animals navigate complex environments.
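A rhythmic top-down input can be sketched by letting its strength oscillate and counting how many channels remain active. The 4 Hz rhythm, the uniform scene, and all weights below are hypothetical. The only point is that a single oscillating input can toggle the toy network between a broad, all-channel state and a single-channel state.

```python
import numpy as np

# Hypothetical parameters; the 4 Hz rhythm is an illustrative choice.
N_CH, ATT = 7, 3                          # seven channels, attend the middle one
DRIVE = np.full(N_CH, 0.5)                # broadband scene: equal drive everywhere
TONIC, W_I2, W_LAT = 1.0, 2.0, 1.0
t = np.arange(1000) / 1000.0              # 1 s at 1 kHz sampling
topdown = 0.5 + 0.5 * np.sin(2 * np.pi * 4.0 * t)   # strength of top-down input

def active_channels(u):
    """Number of E channels still active when top-down input has strength u."""
    i2_att = 1.0 - u                                          # top-down input inhibits I2
    i_att = max(TONIC + DRIVE[ATT] - W_I2 * i2_att, 0.0)      # I2 gates the I neuron
    e = np.maximum(DRIVE - W_LAT * i_att, 0.0)
    e[ATT] = DRIVE[ATT]                                       # attended channel spared
    return int(np.sum(e > 1e-9))

counts = np.array([active_channels(u) for u in topdown])
print(counts.min(), counts.max())
```

With these values, the toy network spends about half of each cycle in each state.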
Alternative/additional mechanisms
The AIM network describes the mechanistic interaction between top-down and bottom-up processes, and even though it can explain various experimental observations, alternative/additional mechanisms may be involved in some aspects of the experiments modeled here.
Neuromodulation
Neuromodulatory systems can exert powerful control over the global state of cortical networks, e.g., sleep, quiet arousal, and active attention [42]. Two key modulatory projections to cortex involve norepinephrine (NE) and acetylcholine (ACh), which have been implicated in arousal and attention, respectively. In our network, we assumed that when the network is in the “monitor” mode, the I2 layer is on. Such a global state may correspond to arousal and be NE-dependent. Top-down suppression of I2 neurons in our model could correspond to an attentive state and be ACh-dependent.
Indeed, we showed that global cholinergic effects on cortical circuits, i.e., suppression of intracortical connections and enhancement of thalamocortical connections, could also produce a sharpening in spatial tuning (Fig 2). In the study by Lee and Middlebrooks [16], animals were not required to attend to a fixed location when performing the localization task. Thus, sharpening of spatial tuning via the activation of global cholinergic mechanisms may be more consistent with that experimental design.
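The cholinergic account can be sketched by splitting the C neuron's input into two parts. One is a direct thalamocortical component from its own channel. The other is an intracortical component converging from the other channels. The gains below are hypothetical; only their direction (thalamocortical up, intracortical down) follows the effects described in the text.

```python
import numpy as np

# Hypothetical channel layout and gains, chosen for illustration only.
CHANNELS = np.linspace(-90.0, 90.0, 7)
OWN = 3                                    # channel providing the C neuron's thalamic input
SIGMA = 30.0
PROBES = np.arange(-90.0, 91.0, 5.0)

def c_tuning(g_thal, g_cortical):
    curve = []
    for az in PROBES:
        drive = np.exp(-(CHANNELS - az) ** 2 / (2 * SIGMA ** 2))
        thalamic = g_thal * drive[OWN]                       # direct thalamocortical input
        cortical = g_cortical * (drive.sum() - drive[OWN])   # convergence from other channels
        curve.append(thalamic + cortical)
    return np.array(curve)

def half_max_width(curve):
    return int(np.sum(curve >= 0.5 * curve.max()))

baseline = c_tuning(g_thal=1.0, g_cortical=1.0)
ach = c_tuning(g_thal=1.5, g_cortical=0.3)
print(half_max_width(baseline), half_max_width(ach))
```

Shifting weight from the broad intracortical convergence to the channel's own thalamic input narrows the half-maximum width. No location-specific attention is needed for this effect.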
Top-down inputs from frontal areas
Although cholinergic mechanisms are clearly important in attentional states, such mechanisms are thought to operate on slow timescales and can be long lasting [42], whereas switching between exploration and focused attention requires rapid, reversible changes in cortical outputs, potentially on sub-second timescales. Recent studies with EEG and fMRI in humans have suggested that top-down activation in the frontal areas modulates processing in auditory cortex on the time scale of hundreds of milliseconds [43,44]. Evidence from studies in mice also supports the idea that the frontoparietal network can modulate processing in the primary sensory cortices during selective attention [5,45–47]. Together, these studies suggest that the top-down signals responsible for modulating the AIM network may originate from the frontal areas.
Synapse-specific gating
Our model predicts that experiments where animals are required to selectively attend to a specific location should also produce a sharpening of spatial tuning in A1. For the experiments by Fritz et al. [17], we found that strengthening of the intra-cortical synaptic connection between the target frequency and the best frequency could explain the emergence of new excitatory regions at the target frequency. A possible mechanism for transient, reversible strengthening of intracortical synapses is synapse-specific gating [3,4], which may then promote long-term strengthening via classic Hebbian plasticity.
Cortical inhibitory neurons and function
Inhibitory neurons play key roles in the AIM network. There are several types of inhibitory neurons in cortex [48]. The majority of inhibitory neurons can be placed into three categories: those that express parvalbumin (PV), somatostatin (SOM), or vasoactive intestinal peptide (VIP). It is worth noting that most currently available information on specific classes of interneurons comes from studies in rodents in a variety of cortical areas, whereas key experimental observations modeled in this study were obtained in other species. Thus, it is difficult to directly map the functional groups of neurons in the model, e.g., I2 and I, to identified interneuron types, e.g., PV, SOM, and VIP neurons. Nevertheless, we suggest some hypotheses on a possible correspondence based on recent experiments in rodents, to motivate future experimental work.
VIP neurons
The top-down input in the model could correspond to inputs from VIP neurons. VIP-mediated inhibition is engaged under specific behavioral conditions, including attention [5,7]. It has been proposed that VIP cells “open holes in the blanket of inhibition” [49], generating the “spotlight of attention” [5]. Our results are consistent with this intuition, with the top-down input being critical for selecting a particular target and switching to a different target.
VIP input is often thought to favor excitation, due to the disinhibition of excitatory neurons [50]. In the model, top-down inhibition of an I2 neuron in a specific channel activates powerful inhibition via I neurons that suppress competing channels, leading to the selection of the target. Thus, the model also explains powerful suppressive effects of selective attention, which have been observed in auditory cortex [13]. The model predicts that silencing top-down inputs to a specific channel, via optogenetics or other methods, should block the effects of selective attention.
SOM and PV neurons
VIP neurons are known to inhibit SOM neurons, which in turn inhibit excitatory neurons [31]. This motif suggests that the I2 neurons in our model may correspond to SOM neurons, specifically Martinotti cells, which are strongly targeted by VIP neurons [51].
The I neurons mediate powerful and sustained inhibition of competing channels in the model. This type of inhibition should be distinguished from “classical” lateral inhibition, which is observed at multiple stages of sensory processing starting at the periphery. Classical lateral inhibition is activated by bottom-up stimulus-driven mechanisms, whereas the inhibition in our model is driven by top-down attentional mechanisms. To distinguish these two cases, we refer to the inhibition mediated by I neurons in the model as “top-down lateral inhibition”. This distinction is conceptually and functionally important because, unlike classical lateral inhibition, which is recruited automatically by the stimulus, top-down lateral inhibition can be recruited volitionally. Such top-down lateral inhibition can be activated by direct disinhibition of I neurons, disinhibition of E neurons which drives feedback lateral inhibition via I neurons (S1 Fig), or a combination of the two.
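The conceptual distinction can be stated as two recruitment rules for the inhibition that one channel sends to its neighbours. This hypothetical sketch models only direct disinhibition of the I neuron, not the E-neuron route shown in S1 Fig. The tonic drive and the I2 weight are illustrative assumptions.

```python
# Hypothetical sketch: inhibition that channel 1 sends to a neighbouring channel
# under two assumed recruitment rules. Parameter values are illustrative.
TONIC, W_I2 = 1.0, 3.0

def classical_lateral(stim_1, attend_1):
    """Bottom-up lateral inhibition: follows the stimulus in channel 1 only.
    `attend_1` is accepted for a uniform signature but has no effect."""
    return stim_1

def top_down_lateral(stim_1, attend_1):
    """I neuron in channel 1, gated by its I2 neuron. Top-down input silences
    I2 when channel 1 is attended, releasing tonic plus stimulus-driven activity."""
    i2 = 0.0 if attend_1 else 1.0
    return max(TONIC + stim_1 - W_I2 * i2, 0.0)

for stim in (0.0, 1.0):
    for attend in (False, True):
        print(f"stim={stim} attend={attend}: "
              f"classical={classical_lateral(stim, attend):.1f} "
              f"top-down={top_down_lateral(stim, attend):.1f}")
```

Under these assumptions, the classical rule is fixed by the stimulus. The top-down rule is off without attention, even when a stimulus is present, and is recruited by attention even when the stimulus is absent.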
In principle, top-down lateral inhibition could be mediated by any interneuron type. Although PV neurons are a possible candidate, the long-lasting inhibition required to suppress competing channels in our model should be distinguished from the fast and transient dynamics of inhibition typically associated with PV neurons [48]. SOM neurons can also mediate feedback lateral inhibition to generate a “winner-take-all” circuit and suppress competing channels [52], or modulate bottom-up inputs in specific layers [51,53]. Developing behavioral paradigms for investigating attention in rodents combined with optogenetic manipulations, and/or developing methods for selectively manipulating different interneuron types in other species, are promising future directions for identifying specific cell-types involved in mediating top-down lateral inhibition.
Space vs. frequency
We related the effects of attention in the AIM network to key experimental observations on changes in cortical spatial and frequency tuning in animals engaged in a behavioral task vs. passive animals [16,17]. Similar changes have also been reported in the primary auditory cortex of humans [54]. Assuming that attentional mechanisms are a key factor in driving such changes [12,16,55], the effects of attention appear very different in the spatial and frequency domains. In the spatial domain, broad tuning sharpens during task performance, whereas in the frequency domain, narrow tuning can be enhanced or suppressed depending on the target frequency and other parameters such as SNR. Our results suggest these apparent differences in the spatial vs. the frequency domain may share similar underlying attentional mechanisms.
A key difference between the spatial and frequency domains in our model is the convergence from the E neurons to the C neuron, which is broad in the spatial domain but narrow in the frequency domain. Previous studies in the auditory cortex have found a tonotopic organization, but no topographic organization for spatial tuning [56]. Therefore, local synaptic connections in a patch of cortex may result in convergence from neurons with similar tuning in frequency but a broad range of tuning in space, which is consistent with our model. Thus, our results suggest that the same cortical mechanisms underlying attention can produce diverse effects on stimulus tuning, due to differences in the cortical organization of stimulus features, e.g., space or frequency. For simplicity, we considered spatial and spectral networks separately. Future models should unify these two dimensions.
Domain-specific considerations
Spatial domain
Here, we did not explicitly model how spatially tuned inputs to the AIM network arise, an aspect that is likely to be species dependent. In the AIM network, spatial tuning is inherited from tuning for acoustic cues in pre-cortical areas [30], perhaps the simplest scenario consistent with experimental observations [57–59]. Additional mechanisms, e.g., forward suppression, may further sharpen or generate spatial tuning in cortex [60,61]. In rodents, spatially tuned responses covering a range of azimuths have been observed in cortical areas [62], and may emerge from excitatory-inhibitory interactions in the underlying network [63]. From a functional standpoint, it is interesting to note that sharp tuning is necessary only for the selective mode of the AIM network, not for the monitor mode. In some species, sharp tuning may emerge in the attentive state through state-dependent mechanisms and/or inputs from other brain areas, e.g., the superior colliculus, which contains a map of auditory space [64,65] and can modulate responses in A1 via the pulvinar [66]. These outstanding issues will require further experimental work, especially in attentive animals, as well as the development of species-specific models.
Spectral domain
The model has several simplifications and limitations that motivate future directions of work. For example, in the frequency network, we used pure tones to characterize the responses of neurons in the network and relate them to experimentally observed STRFs obtained with ripple noise stimuli. Although this approach captured salient attentional effects observed experimentally, future studies should probe non-linear components using complex stimuli, e.g., ripples and natural sounds.
In this study, we focused on the excitatory regions of STRFs, modeling three representative changes in these regions observed by Fritz et al. and Atiani et al. [35]. However, the balance between excitatory and inhibitory subregions is known to play an important functional role [67]. In preliminary simulations, we found that attention could also decrease an inhibitory subregion, as observed by Fritz et al. (S2 Fig). Modeling the effects of attention on inhibitory regions, and on complex STRFs with both excitatory and inhibitory regions, merits further investigation. For example, capturing the diverse effects of training paradigms on inhibitory regions [28] will require modeling the effects of training, reward, or punishment on cortical circuits in the model.
We used the AIM network to illustrate how top-down inhibition of I2 neurons can enhance the representation of sounds at an attended location, or an attended feature, e.g., F0 in speech, by suppressing competing sounds at a different location or F0. F0 is likely to be one of many potential features that contribute to speech segregation. However, a similar principle could be applied to more complex features, e.g., enhancing the representation of an attended harmonic, employing harmonic template neurons in the auditory cortex [68]. Such enhancement of an attended feature accompanied by suppression of competing features may contribute to speech segregation in settings where spatial cues are unavailable [69].
Temporal domain
Temporal dynamics likely play a large role in auditory processing. For example, the neurons observed by Lee and Middlebrooks [16] are highly sensitive to stimulus onset (Fig 2). Our work focused on the effects of attention in the spatial and spectral domains. We therefore did not include detailed models of temporal dynamics, e.g., adaptation or sensitivity to stimulus onset or offset. Nor did we investigate transient tuning properties during attention switching. Additionally, we did not investigate temporal phenomena that are likely to play an important role in speech segregation. These include temporal aspects of F0, tracking of slow spectrotemporal modulations [70], and auditory “streaming” for linking sound segments over time [71]. Thoroughly investigating these aspects at the cortical circuit level will require modeling rich temporal aspects of neuronal and network dynamics, e.g., adaptation, synaptic facilitation and depression, oscillations, synchrony, and coherence. This is outside the scope of the current study. Future extensions of the AIM network should incorporate these aspects to link mechanisms of neuronal and network dynamics to attentional dynamics.
A mechanistic model of attention
Previous studies have modeled the effects of attention on auditory cortical receptive fields using mathematical and computational principles such as temporal coherence [22,72]. In contrast, the AIM network is a circuit-level model of the cortical mechanisms underlying attentional effects. One recent study modeled differences between STRFs in attending and passive ferret A1 with a two-layer spiking network [73]. That study focused on producing detailed fits of STRFs in attending and passive animals. By contrast, this study focused on proposing general cortical circuit mechanisms, e.g., top-down disinhibition, that underlie the effects of attention on both spatial and spectral tuning. Another previous study modeled global cholinergic mechanisms underlying changes in STRFs [74], similar to the effects modeled in Fig 2B. However, that study did not include the selective top-down disinhibitory mechanism, which was unknown at the time and is a key mechanism in the AIM network.
Early models of attention in vision were also developed based on computational principles, e.g., biased competition or normalization [9,10]. At the time, little information on cortical circuits was available to guide and constrain circuit-based models. Subsequently, cortical circuit-based models of visual attention have been proposed [75]. Knowledge of specific cell types and circuitry in auditory cortex is emerging rapidly, and powerful optogenetic tools for cell type-specific perturbations are now available. In this context, the AIM network may help guide the design of new experiments to unravel the cortical circuits that underlie attention more generally.
Methods
Simulations and models were implemented in MATLAB (MathWorks, Natick, MA, USA).
Stimuli
Three sets of auditory stimuli were used, depending on the simulation:

- white Gaussian noise in the spatial tuning simulations;
- pure tones with frequencies approximately equal to the center frequencies of the gammatone filterbank (see Subcortical processing) in the spectral tuning simulations;
- speech tokens from the Coordinated Response Measure (CRM) corpus [76] in the functional demonstration simulations.

In spatial simulations where stimuli were placed along the azimuth, directionality was imparted to the stimuli by convolving them with the head-related transfer functions (HRTFs) of the Knowles Electronics Manikin for Acoustic Research (KEMAR) [77,78].
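The spatialization step can be sketched as follows. This is a minimal illustration that assumes time-domain HRTFs (head-related impulse responses, HRIRs) of equal length for the two ears are already available as arrays. Loading the KEMAR measurements is not shown, and the function names are hypothetical.

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono stimulus at one azimuth by convolving it with the
    left- and right-ear head-related impulse responses (assumed equal length)."""
    if len(hrir_left) != len(hrir_right):
        raise ValueError("left and right HRIRs must have the same length")
    left = np.convolve(mono, hrir_left)    # full convolution: length N + L - 1
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])          # shape (2, N + L - 1)

def mix_binaural(sources):
    """Sum binaural signals of shape (2, n_i) into one mixture,
    zero-padding shorter signals to the longest length."""
    n = max(s.shape[1] for s in sources)
    mix = np.zeros((2, n))
    for s in sources:
        mix[:, : s.shape[1]] += s
    return mix
```

For simultaneous presentation, each talker would be spatialized with the HRIR pair for its azimuth, and the results summed with `mix_binaural` before subcortical processing.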
Subcortical processing
Stimuli for each simulation were first processed and encoded by models of the auditory periphery and midbrain, and then presented to the network. The auditory periphery was modeled by a gammatone filterbank, implemented using the Auditory Toolbox [79]. The filterbank separated each stimulus (e.g., a sentence mixture) into 64 narrowband frequency channels. Their center frequencies ranged from 200 to 8000 Hz and were uniformly spaced on the equivalent rectangular bandwidth (ERB) scale.
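Uniform spacing on the ERB scale can be illustrated with a short sketch. It assumes the Glasberg and Moore ERB-number formula, E(f) = 21.4·log10(1 + 0.00437·f). The Auditory Toolbox's own spacing routine uses a slightly different parameterization and may order channels from high to low frequency, so the values here are illustrative rather than a reproduction of the model's filterbank.

```python
import numpy as np

def erb_number(f_hz):
    """ERB-number (Cams) of a frequency in Hz, Glasberg & Moore form."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def inverse_erb_number(e):
    """Frequency in Hz corresponding to an ERB-number."""
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) / 0.00437

def erb_space(f_low=200.0, f_high=8000.0, n=64):
    """n center frequencies, low to high, equally spaced in ERB-number."""
    e = np.linspace(erb_number(f_low), erb_number(f_high), n)
    return inverse_erb_number(e)
```

Channels produced this way are densely packed at low frequencies and sparse at high frequencies, which mirrors the widening of auditory filter bandwidths with frequency.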
We used a previously published midbrain model to perform spatial segregation of spatialized stimulus mixtures and to encode the stimuli. If a simulation did not use spatialized stimuli as input, the stimuli were presented diotically (identically to both ears). For details of the midbrain model, see Fischer et al. and Chou et al. [30,80].

Briefly, the midbrain model computed binaural features (i.e., interaural time and level differences) in each time-frequency tile (i.e., a narrow frequency band within a short time window). A model neuron encoded the stimulus in a given time-frequency tile if the binaural features of the stimulus matched that neuron's “preferred” binaural features, thereby performing spatial segregation. The preferred binaural features of each model neuron are specific to the frequency and spatial channel to which it belongs. The midbrain model had 64 frequency channels, one for each channel of the gammatone filterbank. The number of spatial channels depended on the simulation. The input neuron in each spatial channel was tuned to the azimuth corresponding to that channel, consistent with the spatial tuning to acoustic cues observed in subcortical areas [57–59]. The spiking responses of these model neurons were used as the input to the AIM network.
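The idea of encoding only those time-frequency tiles whose binaural cues match a neuron's preference can be sketched as a mask. This is a deliberately simplified, hypothetical stand-in for the published midbrain model. It uses interaural level difference (ILD) only, ignores interaural time differences, and assumes a fixed matching tolerance; the function names, the tolerance and the per-channel preferred ILDs are all illustrative assumptions.

```python
import numpy as np

def ild_db(left_tf, right_tf, eps=1e-12):
    """Interaural level difference (dB) in each time-frequency tile,
    from left/right energy arrays of shape (n_freq, n_time)."""
    return 10.0 * np.log10((left_tf + eps) / (right_tf + eps))

def spatial_channel_mask(left_tf, right_tf, preferred_ild_db, tol_db=3.0):
    """Boolean mask of tiles whose ILD lies within tol_db of the preferred ILD
    of each frequency channel (preferred_ild_db has shape (n_freq,)).
    Tiles marked True would drive the model neuron of this spatial channel."""
    ild = ild_db(left_tf, right_tf)
    return np.abs(ild - preferred_ild_db[:, None]) <= tol_db
```

Because the preferred ILD is given per frequency channel, the same azimuth can map to different preferred cue values in different bands. This is consistent with the text's statement that preferred features depend on both frequency and spatial channel.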
Attentional inhibitory modulation (AIM) network
The AIM network was implemented using the DynaSim package [81], and its structure is illustrated in Fig 1. For simplicity, only one frequency channel and three spatial channels are shown. A “spatial channel” refers to the subnetwork of neurons responsible for processing inputs from a specific spatial location (blue shading, Fig 1). The number of spatial and frequency channels in the network, and their connectivity, depended on the simulation.
Five neural populations were created within the network: excitatory input (IC), excitatory (E), inhibitory (I), output cortical (C), and a second inhibitory (I2) population. IC neurons represent the bottom-up inputs to the network from the subcortical model, and I2 neurons represent attentional top-down control. With the exception of the C population, each population contained one neuron for each spatial or frequency channel required by a given simulation. All five populations were implemented as leaky integrate-and-fire neurons whose dynamics are defined by the following differential equation [82]:
where V is the membrane potential, isyn is the synaptic input current, C is the membrane capacitance, gleak is the leak conductance, and Eleak is the leak reversal (resting) potential. Under the spike-and-reset mechanism, whenever V exceeds Vthresh, V is reset to Vreset, where Vthresh is the action potential threshold and Vreset is the reset voltage. Values for these parameters are listed in Table 1.
Table 1. Default parameters of cellular dynamics.
Parameter | Value |
---|---|
C (nF) | 1 |
gleak (μS) | 0.1 |
Eleak (mV) | -70 |
Vthresh (mV) | -55 |
Vspike (mV) | 50 |
Vreset (mV) | -75 |
iapp (μA) | 0 |
Noise | 0 |
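A minimal sketch of the cell model with the Table 1 defaults is given below. It assumes the standard leaky integrate-and-fire form C dV/dt = −gleak(V − Eleak) + I. Here a constant injected current I stands in for the synaptic and applied currents. Currents are taken in nA so that nF, μS, mV and ms form a consistent unit set; this unit choice and the forward-Euler integration are assumptions of the sketch. With these parameters the membrane time constant is C/gleak = 10 ms. The cell can fire only if I exceeds gleak(Vthresh − Eleak) = 1.5 nA.

```python
import numpy as np

def simulate_lif(i_inj, t_max=200.0, dt=0.01, C=1.0, g_leak=0.1,
                 E_leak=-70.0, V_thresh=-55.0, V_reset=-75.0):
    """Forward-Euler integration of C dV/dt = -g_leak (V - E_leak) + i_inj.

    Assumed units: C in nF, g_leak in uS, voltages in mV, time in ms,
    current in nA (nF*mV/ms = uS*mV = nA).
    Returns the voltage trace and the spike times (ms).
    """
    n_steps = int(round(t_max / dt))
    v = E_leak
    trace = np.empty(n_steps)
    spikes = []
    for k in range(n_steps):
        v += dt * (-g_leak * (v - E_leak) + i_inj) / C
        if v > V_thresh:              # spike-and-reset rule from the text
            spikes.append((k + 1) * dt)
            v = V_reset
        trace[k] = v
    return trace, np.array(spikes)
```

For example, `simulate_lif(1.0)` settles at −60 mV without spiking. `simulate_lif(3.0)` fires regularly with an interspike interval close to the analytical value 10·ln(35/15) ≈ 8.5 ms.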
The dynamics of the synaptic input current is defined by a double exponential:
where t is the time since the previous presynaptic spike, gsyn is the synaptic conductance, and τD and τR are the decay and rise time constants, respectively. The difference of exponentials represents the time course of the postsynaptic conductance (an EPSP-like waveform for excitatory synapses). u(t) is the unit step function, which ensures that this waveform is zero before the spike has occurred. Esyn is the synaptic reversal potential, ie is the externally applied current, and iapp is the externally applied current. netcon is a binary matrix of network connectivity that defines the connections between populations of neurons. Each row of netcon represents a presynaptic neuron and each column a postsynaptic neuron; an entry of 1 indicates a synaptic connection between the corresponding pair.

Inhibitory synapses have τR = 1 ms, τD = 10 ms, and Esyn = −80 mV. Excitatory synapses have τR = 0.4 ms, τD = 2 ms, and Esyn = 0 mV. The values of gsyn and iapp depend on the simulation and the connection, and are listed in Table 2. The network connections are illustrated in Figs 1 and 3.
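The double-exponential synapse and the role of netcon can be sketched as follows. The sketch assumes the unnormalized waveform exp(−t/τD) − exp(−t/τR) gated by u(t). It also assumes a current of the form gsyn·s·(V − Esyn), summed over presynaptic rows of netcon. The sign convention (positive = outward, entering the membrane equation with a minus sign) and all function names are assumptions, not the DynaSim implementation.

```python
import numpy as np

def double_exp(t, tau_r, tau_d):
    """exp(-t/tau_d) - exp(-t/tau_r), multiplied by the unit step u(t)."""
    t = np.asarray(t, dtype=float)
    t_pos = np.maximum(t, 0.0)   # avoid evaluating exp() at large positive arguments
    return np.where(t >= 0.0, np.exp(-t_pos / tau_d) - np.exp(-t_pos / tau_r), 0.0)

def synaptic_current(t_since_spike, v_post, netcon, g_syn, tau_r, tau_d, e_syn):
    """Synaptic current into each postsynaptic neuron.

    t_since_spike: (n_pre,) time since each presynaptic neuron's last spike (ms)
    v_post:        (n_post,) postsynaptic membrane potentials (mV)
    netcon:        (n_pre, n_post) binary matrix, rows = presynaptic neurons
    Positive values are outward (hyperpolarizing) under the assumed convention.
    """
    s = double_exp(t_since_spike, tau_r, tau_d)   # (n_pre,) waveform values
    drive = s @ netcon                             # sum over connected presynaptic rows
    return g_syn * drive * (v_post - e_syn)
```

With the excitatory constants (τR = 0.4 ms, τD = 2 ms), the waveform peaks at τRτD/(τD − τR)·ln(τD/τR) ≈ 0.80 ms. With Esyn = 0 mV, the resulting current at rest is inward (depolarizing). With the inhibitory constants and Esyn = −80 mV, a neuron at −65 mV receives an outward, hyperpolarizing current.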
Table 2. Simulation-specific parameters.
Connection or Neuron | Param | Lee | Fritz | Atiani a | Atiani b | Spatial Function | Freq Function |
---|---|---|---|---|---|---|---|
IC→E | gSYN | 2.5 | 4 | 4 | 3 | 2 | 4 |
E→C | gSYN | 2 | 1.25 | 1.25 | 3 | 2 | 1.5 |
E→I | gSYN | 2.5 | 3 | 3 | 3 | 2 | 3 |
E→E (Attend) | gSYN | - | 5 | - | - | - | - |
E→E (Passive) | gSYN | - | 3 | - | - | - | - |
I2→I | gSYN | 4 | 3 | 3 | 3 | 2.25 | 3 |
I2→E | gSYN | 2.8 | 4 | 4 | 3 | 1.25 | 3 |
I→E | gSYN | 3 | 4 | 4 | 4 | 3 | 3.5 |
I | iapp | 3 | - | - | - | 4 | - |
E | iapp | 1 | 3 | 3 | 3 | 0 | 3 |
I2 | iapp | 8 | 8 | 8 | 8 | 3.5 | 8 |
g EC | - | - | 1 | 1 | 1 | - | 1 |
g EC | - | - | 1 | 1 | 1 | - | 1 |
σIE (kHz) | - | - | 0.65 | 1.7 | 2 | - | 0.01 |
σEC (kHz) | - | - | 0.32 | 1.15 | 0.35 | - | 0.05 |
The default values of gsyn were chosen such that when the I neurons were silent, the inputs were relayed to and combined at the C neuron with a similar firing rate. When the I neurons were active, the E neurons were completely silenced.
Simulation-specific model configurations
Lee & Middlebrooks simulations
In this spatial tuning simulation, 80-ms bursts of white Gaussian noise were placed at azimuths from −80° to 80° in 10° increments. The spatialized stimuli were then processed and encoded with the subcortical model. The midbrain model in this simulation consisted of 19 spatial channels, from −90° to 90° azimuth in 10° increments, and 64 frequency channels. To reduce the computational demand of simulating the AIM network, a new set of spike trains was computed for each spatial channel. These spike trains were generated with a Poisson model based on the overall firing rate across all frequency channels, which effectively collapses the neural response over the frequency dimension. The AIM network for this simulation therefore consisted of 19 spatial channels and a single frequency channel. Each spatial channel processed the spike trains representing the average activity across all frequencies. Network connectivity between spatial channels is shown in Fig 2. Spatial tuning curves were then calculated from the response of the C neuron of the AIM network.
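The frequency-collapsing step can be sketched as follows. The sketch assumes that the midbrain output is a binary array of shape (spatial channels, frequency channels, time bins). It also assumes that "overall firing rate" means the time-varying rate averaged across frequency channels. An inhomogeneous Poisson process is approximated by an independent Bernoulli draw in each bin, which is accurate when the per-bin spike probability is small. The function name is hypothetical.

```python
import numpy as np

def collapse_to_poisson(spikes, dt_ms, rng=None):
    """spikes: (n_space, n_freq, n_bins) binary spike arrays from the midbrain model.
    Returns (n_space, n_bins) spike trains drawn from a Poisson process whose
    time-varying rate is the across-frequency mean rate of each spatial channel."""
    rng = np.random.default_rng() if rng is None else rng
    rate_hz = spikes.mean(axis=1) / (dt_ms * 1e-3)          # mean over frequency, spikes/s
    p_spike = np.clip(rate_hz * dt_ms * 1e-3, 0.0, 1.0)      # spike probability per bin
    return (rng.random(p_spike.shape) < p_spike).astype(np.uint8)
```

The output preserves each spatial channel's average temporal rate profile while discarding which frequency channels carried the spikes, so the downstream network needs only one frequency channel.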
The effects of neuromodulators were simulated by applying gains to the network connections. In the behaving state, a gain of 0 was applied to off-target E→C connections to simulate the effects of muscarinic receptors. A gain of 2.5 was applied to off-target IC→E connections to simulate the effects of nicotinic receptors. These gains were chosen to replicate the experimentally observed effects.
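Applying a gain to off-target connections can be sketched as scaling the rows of a connection-weight matrix. Rows are presynaptic spatial channels, as in netcon, and all rows except the attended ("target") channel are scaled. Treating "off-target" as "every presynaptic channel other than the attended one" is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np

def apply_off_target_gain(weights, target_channel, gain):
    """Scale the outgoing connections of every presynaptic channel except the
    attended one. Rows are presynaptic channels, columns postsynaptic neurons."""
    w = np.array(weights, dtype=float, copy=True)
    off_target = np.ones(w.shape[0], dtype=bool)
    off_target[target_channel] = False
    w[off_target, :] *= gain
    return w
```

With 19 spatial channels spanning −90° to 90° in 10° steps, channel index 9 corresponds to 0°. Applying a gain of 0 to a 19×1 E→C matrix leaves only the 0° channel connected to C. A gain of 2.5 on a one-to-one (identity) IC→E matrix strengthens all other channels' feedforward drive.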
Atiani et al. and Fritz et al. simulations
Pure tones were presented diotically to the subcortical model, which consisted of a single spatial channel, corresponding to 0° azimuth, and 64 frequency channels. Spike trains were passed directly to the AIM network, which also consisted of a single spatial channel and 64 frequency channels. Network connectivity between frequency channels is shown in Fig 3. Approximations to spectrotemporal receptive fields were calculated from the response of the best-frequency cortical neuron to each pure-tone stimulus.
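Frequency-dependent connectivity can be sketched as a Gaussian kernel over channel center frequencies. The σIE and σEC rows of Table 2 (in kHz) suggest connection widths of this kind. However, the Gaussian shape, its pairing with the gEC gain rows, and the function name are assumptions of this sketch rather than details confirmed by the text.

```python
import numpy as np

def gaussian_netcon(cf_khz, sigma_khz, gain=1.0):
    """Weight from presynaptic channel i (row) to postsynaptic channel j (column),
    falling off as a Gaussian of their center-frequency difference."""
    cf = np.asarray(cf_khz, dtype=float)
    d = cf[:, None] - cf[None, :]             # pairwise frequency differences (kHz)
    return gain * np.exp(-d ** 2 / (2.0 * sigma_khz ** 2))
```

A small σ yields nearly one-to-one (narrow) convergence, and a large σ spreads each channel's influence across neighboring frequencies. This is how different columns of Table 2 could produce sharper or broader spectral effects.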
Calculation of frequency- and azimuth-dependent peristimulus time histograms (PSTHs)
To approximate the STRFs of cortical neurons, we show the responses of the model cortical neuron as functions of time and of either frequency or space. White Gaussian noise was used as the stimulus in the spatial case, and pure tones in the spectral case. The model neuron's response to each frequency or azimuth is shown as its firing rate, calculated using a 5-ms moving window.
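The moving-window firing rate can be sketched as convolution of a binary spike train with a boxcar kernel. The centered window, the binary spike-train input and the function name are assumptions. Stacking the resulting rate traces across stimuli (frequencies or azimuths) would give the two-dimensional response maps described above.

```python
import numpy as np

def moving_window_rate(spike_train, dt_ms, window_ms=5.0):
    """Firing rate (spikes/s) of a binary spike train using a centered
    boxcar window of width window_ms (edges are implicitly zero-padded)."""
    n = max(1, int(round(window_ms / dt_ms)))
    kernel = np.ones(n) / (n * dt_ms * 1e-3)   # each spike adds 1/(window in s)
    return np.convolve(spike_train, kernel, mode="same")
```

For example, with `dt_ms=0.1` a train with one spike every 1 ms has a rate of 1000 spikes/s away from the edges. Near the edges the rate is underestimated because the window extends past the recording.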
Functional example–spatial listening
In this example, we demonstrate how the AIM network can be used to isolate a specific talker of interest within a speech mixture. Twenty pairs of speech tokens, each consisting of one male and one female token, were randomly chosen from the CRM corpus. The male token was placed at 0° and the female token at 90° azimuth. For simultaneous presentation, the speech tokens were summed before being processed by the subcortical model. The subcortical model used 5 spatial channels, tuned from −90° to 90° azimuth in 45° increments, and 64 frequency channels, and spatially segregated the speech tokens. Its output was relayed directly to the AIM network, which also had 5 spatial channels and 64 frequency channels. In this simulation, the network connectivity across spatial channels and the parameters were as shown in Fig 2. Each frequency channel operated independently of the others.
To demonstrate the effect of spatial separation on model performance, the male talker was placed at 0° azimuth while the female talker was placed at azimuths from 15° to 90° in 15° increments (Fig 6E). The subcortical model for this simulation used 7 spatial channels, corresponding to azimuths from 0° to 90° in 15° increments, and 64 frequency channels. Its output was then processed by the AIM network, which had 7 spatial channels corresponding to the same locations. The AIM network was set to 1) attend to the male target at 0°, 2) attend to the female target at each of its locations, or 3) operate in the monitor mode.

Network performance was measured by the “similarity” between the network output and the attended talker. In the monitor mode, the male talker was used as the reference. Similarity was quantified as the two-dimensional correlation coefficient between the network output in a given simulation and the network output for the reference talker. Specifically, we first calculated the firing rate in each frequency channel and then computed the two-dimensional correlation coefficient of the resulting firing rates.
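The similarity measure can be sketched with a NumPy equivalent of MATLAB's `corr2`. This is the Pearson correlation computed over all elements of two equally sized arrays, here taken to be firing-rate maps with frequency channels as rows and time bins as columns. That map layout is an assumption; the correlation itself does not depend on it as long as both maps share the same layout.

```python
import numpy as np

def corr2(a, b):
    """2-D correlation coefficient of two equally sized arrays
    (same definition as MATLAB's corr2)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("inputs must have the same shape")
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

Because the maps are mean-subtracted and normalized, the score is insensitive to overall gain differences between the mixture and reference outputs. It reflects only how well their spectrotemporal patterns match.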
Functional example–monaural listening
In this example, we demonstrate that the AIM network can also operate in the spectral domain to aid sound segregation during monaural listening. The same 20 pairs of speech tokens as above were summed and presented diotically to the subcortical model. Here, both the subcortical model and the AIM network had one spatial channel (0° azimuth) and 64 frequency channels. In this simulation, the network connectivity was as shown in Fig 3. The F0 of each speech token was estimated using the pitch() function of the MATLAB Audio Toolbox. The I→E connectivity parameters were chosen based on the two talkers' F0s, such that when attention was focused on one talker's F0, the other talker's F0 was inhibited.
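For readers without MATLAB, F0 estimation can be illustrated with a plain autocorrelation estimator. This is a swapped-in technique, not equivalent to MATLAB's `pitch()` (which offers several estimation methods and frame-based output). The 50–400 Hz search range and the function name are assumptions of the sketch.

```python
import numpy as np

def estimate_f0_autocorr(x, fs, f_min=50.0, f_max=400.0):
    """Estimate F0 (Hz) of a voiced frame as fs / lag of the largest
    autocorrelation value within the lag range [fs/f_max, fs/f_min]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0 .. len(x)-1
    lag_min = int(np.floor(fs / f_max))
    lag_max = min(int(np.ceil(fs / f_min)), len(x) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag
```

For a harmonic complex with a 120-Hz fundamental sampled at 16 kHz, the estimate falls within about 1 Hz of 120 Hz. The estimated F0s of the two talkers could then be used to decide which frequency channels the attentional I→E connectivity should target.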
Supporting information
Acknowledgments
The authors thank Larry Abbott and John Middlebrooks for comments on the manuscript.
Data Availability
All code used for this work is available online at www.github.com/kfchou/AIM_network.
Funding Statement
This work was supported by an NSF Award 1835270 (to KS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. McDermott JH. The cocktail party problem. Curr Biol. 2009;19: R1024–R1027. doi: 10.1016/j.cub.2009.09.005
- 2. Vogels TP, Abbott LF. Gating multiple signals through detailed balance of excitation and inhibition in spiking networks. Nat Neurosci. 2009;12: 483–491. doi: 10.1038/nn.2276
- 3. Yang GR, Murray JD, Wang XJ. A dendritic disinhibitory circuit mechanism for pathway-specific gating. Nat Commun. 2016;7. doi: 10.1038/ncomms12815
- 4. Wang XJ, Yang GR. A disinhibitory circuit motif and flexible information routing in the brain. Curr Opin Neurobiol. 2018;49: 75–83. doi: 10.1016/j.conb.2018.01.002
- 5. Zhang S, Xu M, Kamigaki T, Hoang Do JP, Chang W-C, Jenvay S, et al. Long-range and local circuits for top-down modulation of visual cortex processing. Science. 2014;345: 660–665. doi: 10.1126/science.1254126
- 6. Letzkus JJ, Wolff SBE, Meyer EMM, Tovote P, Courtin J, Herry C, et al. A disinhibitory microcircuit for associative fear learning in the auditory cortex. Nature. 2011;480: 331–335. doi: 10.1038/nature10674
- 7. Pi HJ, Hangya B, Kvitsiani D, Sanders JI, Huang ZJ, Kepecs A. Cortical interneurons that specialize in disinhibitory control. Nature. 2013;503: 521–524. doi: 10.1038/nature12676
- 8. Kuchibhotla KV, Gill JV, Lindsay GW, Papadoyannis ES, Field RE, Sten TAH, et al. Parallel processing by cortical inhibition enables context-dependent behavior. Nat Neurosci. 2017;20: 62–71. doi: 10.1038/nn.4436
- 9. Desimone R, Duncan J. Neural mechanisms of selective visual attention. Annual Review of Neuroscience. 1995. doi: 10.1146/annurev.ne.18.030195.001205
- 10. Reynolds JH, Heeger DJ. The Normalization Model of Attention. Neuron. 2009. pp. 168–185. doi: 10.1016/j.neuron.2009.01.002
- 11. Hubel DH, Henson CO, Rupert A, Galambos R. “Attention” Units in the Auditory Cortex. Science. 1959;129: 1279–1280. doi: 10.1126/science.129.3358.1279
- 12. Fritz JB, Elhilali M, David SV, Shamma SA. Auditory attention—focusing the searchlight on sound. Curr Opin Neurobiol. 2007;17: 437–455. doi: 10.1016/j.conb.2007.07.011
- 13. Otazu GH, Tai LH, Yang Y, Zador AM. Engaging in an auditory task suppresses responses in auditory cortex. Nat Neurosci. 2009. doi: 10.1038/nn.2306
- 14. Shinn-Cunningham BG. Object-based auditory and visual attention. Trends Cogn Sci. 2008. doi: 10.1016/j.tics.2008.02.003
- 15. Buffalo EA, Fries P, Landman R, Liang H, Desimone R. A backward progression of attentional effects in the ventral stream. Proc Natl Acad Sci U S A. 2010. doi: 10.1073/pnas.0907658106
- 16. Lee C-CC, Middlebrooks JC. Auditory cortex spatial sensitivity sharpens during task performance. Nat Neurosci. 2011;14: 108–114. doi: 10.1038/nn.2713
- 17. Fritz J, Shamma S, Elhilali M, Klein D. Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat Neurosci. 2003;6: 1216–1223. doi: 10.1038/nn1141
- 18. Maddox RK, Billimoria CP, Perrone BP, Shinn-Cunningham BG, Sen K. Competing sound sources reveal spatial effects in cortical processing. PLoS Biol. 2012;10. doi: 10.1371/journal.pbio.1001319
- 19. Middlebrooks JC, Bremen P. Spatial Stream Segregation by Auditory Cortical Neurons. J Neurosci. 2013;33: 10986–11001. doi: 10.1523/JNEUROSCI.1065-13.2013
- 20. Bee MA, Micheyl C. The cocktail party problem: What is it? How can it be solved? And why should animal behaviorists study it? J Comp Psychol. 2008;122: 235–251. doi: 10.1037/0735-7036.122.3.235
- 21. Bronkhorst AW. The cocktail-party problem revisited: early processing and selection of multi-talker speech. Attention, Perception, and Psychophysics. 2015. pp. 1465–1487. doi: 10.3758/s13414-015-0882-9
- 22. Kaya EM, Elhilali M. Modelling auditory attention. Philosophical Transactions of the Royal Society B: Biological Sciences. Royal Society; 2017. doi: 10.1098/rstb.2016.0101
- 23. Kayser C, Petkov CI, Lippert M, Logothetis NK. Mechanisms for allocating auditory attention: An auditory saliency map. Curr Biol. 2005;15: 1943–1947. doi: 10.1016/j.cub.2005.09.040
- 24. Kaya EM, Elhilali M. Investigating bottom-up auditory attention. Front Hum Neurosci. 2014;8: 327. doi: 10.3389/fnhum.2014.00327
- 25. Mesgarani N, Fritz J, Shamma S. A computational model of rapid task-related plasticity of auditory cortical receptive fields. J Comput Neurosci. 2010;28: 19–27. doi: 10.1007/s10827-009-0181-3
- 26. Elhilali M, Ma L, Micheyl C, Oxenham AJ, Shamma SA. Temporal Coherence in the Perceptual Organization and Cortical Representation of Auditory Scenes. Neuron. 2009;61: 317–329. doi: 10.1016/j.neuron.2008.12.005
- 27. Carlin MA, Elhilali M. Modeling attention-driven plasticity in auditory cortical receptive fields. Front Comput Neurosci. 2015;9: 106. doi: 10.3389/fncom.2015.00106
- 28. David SV, Fritz JB, Shamma SA. Task reward structure shapes rapid receptive field plasticity in auditory cortex. Proc Natl Acad Sci U S A. 2012;109: 2144–2149. doi: 10.1073/pnas.1117717109
- 29. Dong J, Colburn HS, Sen K. Cortical Transformation of Spatial Processing for Solving the Cocktail Party Problem: A Computational Model. eNeuro. 2016;3: 1–11. doi: 10.1523/ENEURO.0086-15.2015
- 30. Chou KF, Dong J, Colburn HS, Sen K. A Physiologically Inspired Model for Solving the Cocktail Party Problem. J Assoc Res Otolaryngol. 2019;20: 579–593. doi: 10.1007/s10162-019-00732-4
- 31. Pfeffer CK, Xue M, He M, Huang ZJ, Scanziani M. Inhibition of inhibition in visual cortex: The logic of connections between molecularly distinct interneurons. Nat Neurosci. 2013;16: 1068–1076. doi: 10.1038/nn.3446
- 32. Gil Z, Connors BW, Amitai Y. Differential regulation of neocortical synapses by neuromodulators and activity. Neuron. 1997. doi: 10.1016/s0896-6273(00)80380-3
- 33. Hasselmo ME. Neuromodulation and cortical function: modeling the physiological basis of behavior. Behav Brain Res. 1995. doi: 10.1016/0166-4328(94)00113-t
- 34. Hsieh CY, Cruikshank SJ, Metherate R. Differential modulation of auditory thalamocortical and intracortical synaptic transmission by cholinergic agonist. Brain Res. 2000. doi: 10.1016/s0006-8993(00)02766-9
- 35. Atiani S, Elhilali M, David SV, Fritz JB, Shamma SA. Task Difficulty and Performance Induce Diverse Adaptive Patterns in Gain and Shape of Primary Auditory Cortical Receptive Fields. Neuron. 2009;61: 467–480. doi: 10.1016/j.neuron.2008.12.027
- 36. Litovsky RY. Spatial Release from Masking. Acoust Today. 2012;8: 18. doi: 10.1121/1.4729575
- 37. Zhang S, Xu M, Kamigaki T, Do JPH, Chang WC, Jenvay S, et al. Long-range and local circuits for top-down modulation of visual cortex processing. Science. 2014;345: 660–665. doi: 10.1126/science.1254126
- 38. Helfrich RF, Fiebelkorn IC, Szczepanski SM, Lin JJ, Parvizi J, Knight RT, et al. Neural Mechanisms of Sustained Attention Are Rhythmic. Neuron. 2018. doi: 10.1016/j.neuron.2018.07.032
- 39. Guo W, Clause AR, Barth-Maron A, Polley DB. A Corticothalamic Circuit for Dynamic Switching between Feature Detection and Discrimination. Neuron. 2017. doi: 10.1016/j.neuron.2017.05.019
- 40. Huang N, Elhilali M. Push-pull competition between bottom-up and top-down auditory attention to natural soundscapes. Elife. 2020. doi: 10.7554/eLife.52984
- 41. Fiebelkorn IC, Kastner S. A Rhythmic Theory of Attention. Trends in Cognitive Sciences. 2019. doi: 10.1016/j.tics.2018.11.009
- 42. Lee SH, Dan Y. Neuromodulation of Brain States. Neuron. 2012;76: 209–222. doi: 10.1016/j.neuron.2012.09.012
- 43. Lesenfants D, Francart T. The interplay of top-down focal attention and the cortical tracking of speech. Sci Rep. 2020;10: 1–10. doi: 10.1038/s41598-019-56847-4
- 44. Wikman P, Sahari E, Salmela V, Leminen A, Leminen M, Laine M, et al. Breaking down the cocktail party: Attentional modulation of cerebral audiovisual speech processing. Neuroimage. 2021;224: 117365. doi: 10.1016/j.neuroimage.2020.117365
- 45. Gritton HJ, Nocon JC, James NM, Lowet E, Abdulkerim M, Sen K, et al. Oscillatory activity in alpha/beta frequencies coordinates auditory and prefrontal cortices during extinction learning. bioRxiv. 2020; 2020.10.30.362962. doi: 10.1101/2020.10.30.362962
- 46. Corbetta M, Patel G, Shulman GL. The Reorienting System of the Human Brain: From Environment to Theory of Mind. Neuron. 2008. pp. 306–324. doi: 10.1016/j.neuron.2008.04.017
- 47. Petersen SE, Posner MI. The Attention System of the Human Brain: 20 Years After. Annu Rev Neurosci. 2012;35: 73–89. doi: 10.1146/annurev-neuro-062111-150525
- 48. Tremblay R, Lee S, Rudy B. GABAergic Interneurons in the Neocortex: From Cellular Properties to Circuits. Neuron. 2016;91: 260–292. doi: 10.1016/j.neuron.2016.06.033
- 49. Karnani MM, Jackson J, Ayzenshtat I, Sichani XH, Manoocheri K, Kim S, et al. Opening holes in the blanket of inhibition: Localized lateral disinhibition by vip interneurons. J Neurosci. 2016;36: 3471–3480. doi: 10.1523/JNEUROSCI.3646-15.2016
- 50. Pfeffer CK. Inhibitory Neurons: Vip Cells Hit the Brake on Inhibition. Curr Biol. 2014;24: R18–R20. doi: 10.1016/j.cub.2013.11.001
- 51. Muñoz W, Tremblay R, Levenstein D, Rudy B. Layer-specific modulation of neocortical dendritic inhibition during active wakefulness. Science. 2017. doi: 10.1126/science.aag2599
- 52. Silberberg G, Markram H. Disynaptic Inhibition between Neocortical Pyramidal Cells Mediated by Martinotti Cells. Neuron. 2007. doi: 10.1016/j.neuron.2007.02.012
- 53. Naka A, Veit J, Shababo B, Chance RK, Risso D, Stafford D, et al. Complementary networks of cortical somatostatin interneurons enforce layer specific control. Elife. 2019. doi: 10.7554/eLife.43696
- 54. van der Heijden K, Rauschecker JP, Formisano E, Valente G, de Gelder B. Active Sound Localization Sharpens Spatial Tuning in Human Primary Auditory Cortex. J Neurosci. 2018;38: 8574–8587. doi: 10.1523/JNEUROSCI.0587-18.2018
- 55. Fritz JB, Elhilali M, David SV, Shamma SA. Does attention play a role in dynamic receptive field adaptation to changing acoustic salience in A1? Hear Res. 2007;229: 186–203. doi: 10.1016/j.heares.2007.01.009
- 56. Panniello M, King AJ, Dahmen JC, Walker KMM. Local and global spatial organization of interaural level difference and frequency preferences in auditory cortex. Cereb Cortex. 2018;28: 350–369. doi: 10.1093/cercor/bhx295
- 57. Knudsen E, Konishi M. A neural map of auditory space in the owl. Science. 1978;200: 795–797. doi: 10.1126/science.644324
- 58. Yin TCT, Chan JCK. Interaural time sensitivity in medial superior olive of cat. J Neurophysiol. 1990;64: 465–488. doi: 10.1152/jn.1990.64.2.465
- 59. Köppl C, Carr CE. Maps of interaural time difference in the chicken’s brainstem nucleus laminaris. Biol Cybern. 2008;98: 541–559. doi: 10.1007/s00422-008-0220-6
- 60. Zhou Y, Wang X. Spatially extended forward suppression in primate auditory cortex. Eur J Neurosci. 2014;39: 919–933. doi: 10.1111/ejn.12460
- 61.Yao JD, Bremen P, Middlebrooks JC. Emergence of Spatial Stream Segregation in the Ascending Auditory Pathway. J Neurosci. 2015;35: 16199–16212. doi: 10.1523/JNEUROSCI.3116-15.2015 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Higgins NC, Storace DA, Escabi MA, Read HL. Specialization of Binaural Responses in Ventral Auditory Cortices. J Neurosci. 2010;30: 14522–14532. doi: 10.1523/JNEUROSCI.2561-10.2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Kyweriga M, Stewart W, Cahill C, Wehr M. Synaptic mechanisms underlying interaural level difference selectivity in rat auditory cortex. J Neurophysiol. 2014;112: 2561–2571. doi: 10.1152/jn.00389.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Middlebrooks J, Knudsen E. A neural code for auditory space in the cat’s superior colliculus. J Neurosci. 1984;4: 2621–2634. doi: 10.1523/JNEUROSCI.04-10-02621.1984 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.King AJ, Hutchings ME. Spatial response properties of acoustically responsive neurons in the superior colliculus of the ferret: A map of auditory space. J Neurophysiol. 1987;57: 596–624. doi: 10.1152/jn.1987.57.2.596 [DOI] [PubMed] [Google Scholar]
- 66.Chou XL, Fang Q, Yan L, Zhong W, Peng B, Li H, et al. Contextual and cross-modality modulation of auditory cortical processing through pulvinar mediated suppression. Elife. 2020;9: 1–21. doi: 10.7554/eLife.54157 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Narayan R, Ergün A, Sen K. Delayed inhibition in cortical receptive fields and the discrimination of complex stimuli. J Neurophysiol. 2005;94: 2970–2975. doi: 10.1152/jn.00144.2005 [DOI] [PubMed] [Google Scholar]
- 68.Feng L, Wang X. Harmonic template neurons in primate auditory cortex underlying complex sound processing. Proc Natl Acad Sci U S A. 2017;114: E840–E848. doi: 10.1073/pnas.1607519114 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Mesgarani N, Chang EF. Selective cortical representation of attended speaker in multi-talker speech perception. Nature. 2012;485: 233–236. doi: 10.1038/nature11020 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70. Ding N, Simon JZ. Emergence of neural encoding of auditory objects while listening to competing speakers. Proc Natl Acad Sci U S A. 2012;109: 11854–11859. doi: 10.1073/pnas.1205381109
- 71. Shamma SA, Elhilali M, Micheyl C. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 2011;34: 114–123. doi: 10.1016/j.tins.2010.11.002
- 72. Krishnan L, Elhilali M, Shamma S. Segregating Complex Sound Sources through Temporal Coherence. Lewicki M, editor. PLoS Comput Biol. 2014;10: e1003985. doi: 10.1371/journal.pcbi.1003985
- 73. Chambers JD, Elgueda D, Fritz JB, Shamma SA, Burkitt AN, Grayden DB. Computational neural modeling of auditory cortical receptive fields. Front Comput Neurosci. 2019;13: 1–13. doi: 10.3389/fncom.2019.00001
- 74. Soto G, Kopell N, Sen K. Network architecture, receptive fields, and neuromodulation: Computational and functional implications of cholinergic modulation in primary auditory cortex. J Neurophysiol. 2006;96: 2972–2983. doi: 10.1152/jn.00459.2006
- 75. Ardid S, Wang XJ, Compte A. An integrated microcircuit model of attentional processing in the neocortex. J Neurosci. 2007. doi: 10.1523/JNEUROSCI.1145-07.2007
- 76. Bolia RS, Nelson WT, Ericson MA, Simpson BD. A speech corpus for multitalker communications research. J Acoust Soc Am. 2000;107: 1065–1066. doi: 10.1121/1.428288
- 77. Algazi VR, Duda RO, Thompson DM, Avendano C. The CIPIC HRTF database. Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat No01TH8575). IEEE; 2001. pp. 99–102. doi: 10.1109/ASPAA.2001.969552
- 78. Burkhard MD, Sachs RM. Anthropometric manikin for acoustic research. J Acoust Soc Am. 1975;58: 214–222. doi: 10.1121/1.380648
- 79. Slaney M. Auditory toolbox: A Matlab Toolbox for Auditory Modeling Work. Interval Res Corp Tech Rep. 1998;10: 1998. Available: https://engineering.purdue.edu/~malcolm/interval/1998-010/AuditoryToolboxTechReport.pdf
- 80. Fischer BJ, Anderson CH, Peña JL. Multiplicative auditory spatial receptive fields created by a hierarchy of population codes. PLoS One. 2009;4: e8015. doi: 10.1371/journal.pone.0008015
- 81. Sherfey JS, Soplata AE, Ardid S, Roberts EA, Stanley DA, Pittman-Polletta BR, et al. DynaSim: A MATLAB Toolbox for Neural Modeling and Simulation. Front Neuroinform. 2018;12: 10. doi: 10.3389/fninf.2018.00010
- 82. Dayan P, Abbott LF. Theoretical neuroscience: computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press; 2001.