Journal of Neurophysiology
J Neurophysiol. 2015 Feb 18;113(7):2934–2952. doi: 10.1152/jn.01054.2014

Diverse cortical codes for scene segmentation in primate auditory cortex

Brian J Malone 1, Brian H Scott 2, Malcolm N Semple 3
PMCID: PMC4416616  PMID: 25695655

Abstract

The temporal coherence of amplitude fluctuations is a critical cue for segmentation of complex auditory scenes. The auditory system must accurately demarcate the onsets and offsets of acoustic signals. We explored how, and how well, the timing of onsets and offsets of gated tones is encoded by auditory cortical neurons in awake rhesus macaques. Temporal features of this representation were isolated by presenting otherwise identical pure tones of differing durations. Cortical response patterns were diverse, including selective encoding of onset and offset transients, tonic firing, and sustained suppression. Spike train classification methods revealed that many neurons robustly encoded tone duration despite substantial diversity in the encoding process. Excellent discrimination performance was achieved by neurons whose responses were primarily phasic at tone offset and by those that responded robustly while the tone persisted. Although diverse cortical response patterns converged on effective duration discrimination, this diversity significantly constrained the utility of decoding models referenced to a spiking pattern averaged across all responses or averaged within the same response category. Using maximum likelihood-based decoding models, we demonstrated that the spike train recorded in a single trial could support direct estimation of stimulus onset and offset. Comparisons between different decoding models established the substantial contribution of bursts of activity at sound onset and offset to demarcating the temporal boundaries of gated tones. Our results indicate that relatively few neurons suffice to provide temporally precise estimates of such auditory “edges,” particularly for models that assume and exploit the heterogeneity of neural responses in awake cortex.

Keywords: scene analysis, encoding, decoding, cortex, primate


The perceptual output of the auditory system consists of discrete auditory objects, even though the spectral energy associated with multiple, complex sound sources is thoroughly mixed at the eardrum. Understanding how signals associated with common sound sources are processed and appropriately grouped in complex auditory scenes is a primary goal of auditory neuroscience. Some of the most important auditory grouping cues are joint temporal cues, including common onsets, offsets, and modulation profiles (Bregman 1990; Bregman et al. 1994; Yost 1991). Recent evidence suggests that temporal coherence dominates the segmentation of complex auditory scenes (Elhilali et al. 2009; Fishman and Steinschneider 2010; Shamma et al. 2013; Teki et al. 2011, 2013).

How does the cortical representation of the temporal boundaries of acoustic signals (i.e., their beginning and end) support auditory scene analysis? We approached this challenging problem (Bee and Micheyl 2008) by asking how physiological responses to simple tones presented in silence stand out against intrinsic neural “noise.” By discriminating responses to gated tones of different durations presented at each neuron's best frequency and best sound level, we can estimate an upper bound on the quality of the cortical representation of joint temporal cues for scene segmentation. Pure tones varying only in duration provide an opportunity to isolate uniquely temporal contributions to the cortical representation of sound envelopes. By decoding cortical responses to otherwise identical tones of varying durations, we can quantify how effectively cortical spiking patterns signal the presence of acoustic signals and demarcate their temporal boundaries.

The contributions of distinct neural response features, such as bursts of spiking at signal onset or offset or sustained firing throughout the tone, to the information carried by cortical responses have rarely been rigorously characterized in nonspecialized mammalian systems (Qin et al. 2009). The envelope of spiking activity produced by neurons with sustained responses essentially reproduces the envelope of a gated tone, such that increases and decreases in firing rate signal increases and decreases in tone amplitude. By contrast, the envelope of spiking activity in a neuron with highly phasic responses resembles the temporal derivative of the stimulus envelope, such that increases in firing rate coincide with changes in stimulus amplitude. How are the responses of cortical neurons distributed along this continuum, and how effectively do spiking patterns based on these divergent encoding strategies signal the temporal extent of tonal stimuli? The answers bear on the efficiency of the cortical representation: sustained responses could reliably signal tone duration, but at the cost of many more spikes than temporally sparse phasic responses limited to onset and offset bursts.
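The spike-cost side of this trade-off can be illustrated with a toy calculation. The sketch below assumes idealized rate profiles with hypothetical parameters (driven rate, burst rate, and burst width are illustrative values, not measurements from the study): a sustained neuron fires at a constant driven rate while the tone lasts, whereas a phasic neuron fires one burst of fixed size at onset and another at offset.

```python
# Toy rate profiles; every rate and burst width below is a hypothetical value.
DURATIONS_MS = [50, 100, 200, 300, 400]  # standard durations used in the study


def expected_spikes_sustained(duration_ms, driven_rate_hz=60.0):
    """Sustained profile: constant driven rate for as long as the tone lasts."""
    return driven_rate_hz * duration_ms / 1000.0


def expected_spikes_phasic(duration_ms, burst_rate_hz=200.0, burst_ms=20.0):
    """Phasic profile: one onset burst plus one offset burst of fixed size.

    The tone duration does not enter the count; it only has to be long
    enough for the two bursts not to overlap (assumed here).
    """
    if duration_ms < burst_ms:
        raise ValueError("tone shorter than one burst")
    return 2 * burst_rate_hz * burst_ms / 1000.0


for d in DURATIONS_MS:
    print(f"{d:3d} ms  sustained: {expected_spikes_sustained(d):5.1f}  "
          f"phasic: {expected_spikes_phasic(d):5.1f}")
```

Under these assumed numbers, the sustained profile's spike count grows linearly with duration (3 spikes at 50 ms, 24 at 400 ms), while the phasic profile spends a constant 8 spikes per tone and carries the duration only in the timing of its offset burst.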

The diversity of cortical spiking patterns must be reconciled with the need to combine information across neurons, particularly when such diversity is expressed in the temporal distribution of action potentials and not merely in differences in average firing rates. We have shown that response profiles (i.e., the temporal distributions of spikes within the interval encompassing the stimulus presentation) provide more information about tone level (Malone et al. 2010) and tone frequency (Malone et al. 2014; Moshitch et al. 2006) than does average firing rate. This raises two important questions for scene segmentation in the context of spike train decoding. First, how much must be known about a given neuron's response profile to develop an effective decoding model for it? Second, how much do differences in cortical response profiles matter for population decoding models based on pooled responses?

Our results demonstrate that cortical responses to simple gated tones exhibit complex, diverse, and highly nonlinear temporal dynamics. Surprisingly, different encoding styles result in relatively similar performance when classifying stimulus duration. In fact, cortical responses are sufficiently reliable and diverse that it is often possible to reverse the classification process and identify which neuron produced a particular spike train for a given stimulus. Given the robustness of the cortical response to rapid envelope changes, relatively few neurons suffice to provide precise estimates of temporal acoustic “edges” that likely inform the segmentation of complex auditory scenes.

METHODS

Subjects, surgical preparation, and physiological recording.

Two adult male monkeys (Macaca mulatta, designated X and Z) participated in these experiments. The methods of animal training, stimulus delivery, and physiological recording have been described previously (Malone et al. 2002, 2007, 2010, 2014; Scott et al. 2007, 2009, 2011). All procedures were in accordance with the Society for Neuroscience guiding principles on the care and use of animals and were approved by the Institutional Animal Care and Use Committee of New York University. Both animals were trained on a sound lateralization task (Malone et al. 2002; Scott et al. 2007), but all data in this report were obtained while the animals sat passively with their heads fixed in a custom chair (Crist, Hagerstown, MD) within a double-walled anechoic chamber (Industrial Acoustics, Bronx, NY) and were monitored by video.

After behavioral training, a recording chamber (CalTech) was implanted above the auditory cortex in the left hemisphere of each animal by aseptic surgical techniques. Both chambers were subsequently moved to the right hemisphere after mapping of the left hemisphere was complete. Physiological criteria, referenced to the histological and MRI data available in animal X, were used to delineate boundaries of the auditory areas and define the locations of neurons encountered in animal Z (Scott et al. 2011).

Single unit extracellular recordings were obtained by advancing tungsten microelectrodes (10–12 MΩ; FHC, Bowdoin, ME) with a stepping microdrive. Use of electrodes with such high impedances ensured highly reliable single unit isolation. Electrical signals were amplified, filtered (typically from 0.25 to 10 kHz), and registered by an event timer (MALab, Kaiser Instruments). Search stimuli including tones, band-pass noise, sinusoidally amplitude modulated tones, and sinusoidally frequency modulated tones were used to identify single units, which were discriminated with multiple adjustable voltage/time windows. Action potentials and stimulus synchronization events were logged with a resolution of 1 μs by custom hardware (MALab, Kaiser Instruments).

Stimulus generation and protocols.

Stimuli were generated digitally (MALab, Kaiser Instruments) and presented in the closed field via electrostatic speakers (STAX Lambda), coupled to ear inserts (Custom Sound Systems) positioned within the ear canal. Phase and level at each ear were calibrated across frequency at the start of each session using a ½-inch probe microphone (Brüel and Kjær 4133). All stimuli were gated on and off by a cosine-squared ramp (10 ms). Responsive neurons were initially characterized with a battery of pure tone stimuli (typically 100 ms duration) to determine the frequency tuning function at the best sound level [dB sound pressure level (SPL)], and the rate-level function at the neuron's best frequency (BF). The initial selection of stimulus frequency and level was performed manually, and online feedback was used to ensure adequate sampling of the relevant parameter axis. BF and best level were the frequency and level, respectively, that elicited the highest firing rates within an analysis window from 0 to 100 ms after tone onset. Tones of varying duration were then presented at best frequency and level. In cases where the best level was unavailable or indeterminate due to saturating responses, 60 dB SPL was used. Stimuli were delivered binaurally unless testing indicated a strong preference for a single ear.
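The cosine-squared gating can be sketched as follows. This is a minimal illustration, not the MALab implementation: the function name, the 100-kHz sampling rate, and the convention that the nominal duration includes both 10-ms ramps are assumptions.

```python
import numpy as np


def gated_tone(freq_hz, duration_s, fs=100_000, ramp_s=0.010, amplitude=1.0):
    """Pure tone gated on and off by cosine-squared ramps.

    Assumptions (not from the source): the nominal duration includes both
    ramps, and fs is an arbitrary illustrative sampling rate.
    Returns the waveform and its amplitude envelope.
    """
    n = int(round(duration_s * fs))
    n_ramp = int(round(ramp_s * fs))
    if 2 * n_ramp > n:
        raise ValueError("tone too short for two full ramps")
    t = np.arange(n) / fs
    env = np.ones(n)
    # Rising ramp: sin^2 climbs from 0 toward 1 (a cosine-squared ramp run in reverse).
    rise = np.sin(0.5 * np.pi * np.arange(n_ramp) / n_ramp) ** 2
    env[:n_ramp] = rise
    env[-n_ramp:] = rise[::-1]  # the falling ramp mirrors the rising one
    return amplitude * env * np.sin(2 * np.pi * freq_hz * t), env


tone, env = gated_tone(4000.0, 0.050)  # a 50-ms, 4-kHz tone
```

For the shortest standard tone (50 ms), the two ramps occupy 20 ms, leaving a 30-ms steady-state plateau.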

We generally attempted to present the stimuli at the BF and best level of each neuron, but in some cases the online estimates of BF and best level differed from those computed offline. In a few cases, the most salient feature of the response to pure tones was suppression, and the frequency and level were chosen to elicit responses of this type (see results). We were able to determine the best level for 232 neurons, and among these neurons the median ratio of the firing rates elicited at the best SPL and at the tested SPL was 1.03. Across all neurons in the study, stimulus amplitude varied from −5 to 90 dB SPL, with a mode and median of 60 dB SPL and a mean of 49.3 dB SPL. Tone frequency ranged from 0.1 to 32 kHz, with a median of 3.5 kHz (mean = 7.48 kHz).

The data in this report represent all instances in which neural responses were obtained for 10 repeated trials at each of the tone durations 50, 100, 200, 300, and 400 ms, resulting in a database of 279 cells. The durations were chosen to ensure that at least some stimuli would outlast the bursts of activity at tone onset, which might otherwise exceed the duration of very short tones (e.g., 25 ms). Long tone durations (>300 ms) also provide sufficient time for neurons to adapt their response rates during the course of their responses, which may affect how, and how effectively, tone offsets are represented. In many cases, durations outside the standard set (e.g., 25 ms, 150 ms) were also presented. These additional durations are retained for illustrative purposes (e.g., Fig. 1) but excluded from population analyses to facilitate comparisons across neurons.

Fig. 1.


Examples of 4 different response types obtained in 4 different cortical neurons (1 per column) in response to tones ranging in duration from 50 to 400 ms. A: peristimulus time histograms (PSTHs) showing how responses vary as a function of tone duration. Tone onset is at 0 ms, and tone offset is indicated by the gray vertical line in each panel. B: the 3 confusion matrices indicate how often individual spike trains (0 to 500 ms, referenced to stimulus onset) were correctly associated with the stimulus that elicited them. Actual durations (unlabeled) are arranged by column and estimated durations by row, so that entries along the diagonal represent correct assignments of tone duration. Grayscale indicates the number of trials in each cell of the matrix, from black = 0 to white = 10, the number of trials presented for each duration. All columns sum to 10, since each trial for a given tone duration is assigned to exactly one estimated duration. The 3 confusion matrices show the results obtained with the full spike train (top left), phase-only (top right), and rate-only (bottom) classifiers (see methods). The percentage of correctly assigned trials is shown below each confusion matrix. The duration tuning function (DTF, black) in the lower panel shows the firing rate averaged over the initial 500 ms of each trial for each tone duration. Vertical lines indicate ±2 SE. The gray line indicates the average firing rate measured in the last 500 ms of each trial, used as an estimate of the spontaneous rate. The remaining panels (C–H) are displayed similarly.

Pure tones were presented within trials lasting 1,000 ms. For notational convenience, tone onset occurs at 0 ms throughout the article. Tones of different durations were presented in blocks, such that all 10 trials at a given duration were presented consecutively before the next set of 10 trials at a different duration began. Blocked presentation of this sort avoided the randomization of intertone intervals that occurs with random and pseudorandom stimulus presentation. However, blocked presentation implies that the intertone interval covaried with tone duration, from 950 ms for 50 ms tones to as little as 600 ms for 400 ms tones. Because each trial was 1 s long, the duty cycle of the stimuli within each block also covaried with duration (5%, 10%, 20%, 30%, and 40% for 50, 100, 200, 300, and 400 ms tones, respectively). The effect of duty cycle on cortical responses is analyzed in results.

Duration tuning functions (DTFs) were computed from the average firing rates during the initial 500 ms of each trial for all tone durations.
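A DTF of this kind can be sketched as the trial-averaged spike rate in a fixed window for each duration. The function name and the dictionary-of-spike-time-arrays layout are illustrative assumptions; the standard error uses the usual sample standard deviation divided by the square root of the number of trials.

```python
import numpy as np


def duration_tuning(spikes_by_duration, window_ms=(0.0, 500.0)):
    """Mean firing rate (spikes/s) and SE across trials in a fixed window.

    spikes_by_duration: dict mapping tone duration (ms) -> list of trials,
    each trial an array of spike times in ms relative to tone onset
    (a hypothetical data layout chosen for illustration).
    Returns {duration: (mean_rate, standard_error)}.
    """
    t0, t1 = window_ms
    width_s = (t1 - t0) / 1000.0
    dtf = {}
    for dur, trials in sorted(spikes_by_duration.items()):
        rates = []
        for tr in trials:
            tr = np.asarray(tr, dtype=float)
            # Count spikes falling inside [t0, t1) and convert to spikes/s.
            rates.append(np.sum((tr >= t0) & (tr < t1)) / width_s)
        rates = np.array(rates)
        se = rates.std(ddof=1) / np.sqrt(rates.size) if rates.size > 1 else np.nan
        dtf[dur] = (rates.mean(), se)
    return dtf
```

Calling the same function with `window_ms=(500.0, 1000.0)` gives the late-trial estimate of spontaneous rate plotted as the gray line in Fig. 1.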

Spike train classification.

Data analysis was performed using MATLAB (MathWorks, Natick, MA). We quantified how effectively cortical responses to stimuli of different durations could be discriminated with peristimulus time histogram (PSTH)-based pattern classifiers (Foffani and Moxon 2004), as in prior reports (Malone et al. 2007, 2010, 2013, 2014). Because all tones are “aligned” to a common onset time (i.e., 0 ms), classifying tone duration is equivalent to classifying the time of tone offset. We use the terms “duration discrimination” and “offset discrimination” interchangeably in the context of spike train classification. Estimation of tone offset and tone onset is achieved by the likelihood models described in a subsequent section.

Responses to gated tones were binned from 0 to 500 ms and then averaged across trials to form a “template,” a vector with one element per bin representing the response to each tone duration. Individual “test” spike trains were binned (0–500 ms) at the same temporal resolution and compared with the templates by computing the Euclidean distance between the test and template vectors. The stimulus whose template minimized this distance was taken as the estimate of the stimulus that produced the response. Whenever the test and template were drawn from the same stimulus (e.g., the 100 ms tone), the test trial was excluded from the average that produced the template (complete cross validation). We chose 500 ms epochs because this was the minimum value that encompassed the longest tested duration (400 ms) while still capturing offset responses within 100 ms of tone offset.
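The nearest-template procedure with complete cross validation can be sketched as below. This is an illustrative Python reimplementation, not the authors' MATLAB code; the function name and the `(n_stimuli, n_trials, n_bins)` array layout are assumptions. It returns a confusion matrix with estimated stimuli in rows and actual stimuli in columns, matching the convention used here.

```python
import numpy as np


def classify_leave_one_out(counts):
    """Nearest-template (Euclidean) classification with complete cross validation.

    counts: array (n_stimuli, n_trials, n_bins) of binned spike counts.
    Returns a confusion matrix: rows = estimated stimulus, columns = actual stimulus.
    """
    counts = np.asarray(counts, dtype=float)
    n_stim, n_trials, _ = counts.shape
    templates = counts.mean(axis=1)  # (n_stim, n_bins) trial-averaged PSTHs
    cm = np.zeros((n_stim, n_stim), dtype=int)
    for s in range(n_stim):
        for k in range(n_trials):
            test = counts[s, k]
            tmpl = templates.copy()
            # Exclude the test trial from the template of its own stimulus.
            tmpl[s] = (counts[s].sum(axis=0) - test) / (n_trials - 1)
            dist = np.linalg.norm(tmpl - test, axis=1)
            cm[np.argmin(dist), s] += 1  # ties go to the lowest stimulus index
    return cm
```

Only the template of the test trial's own stimulus needs recomputing; templates for the other stimuli never contain the test trial.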

Bin widths of 5, 10, 25, 50, and 500 ms were tested. When a single 500 ms-wide bin is used, classification relies entirely on average spike rate information; we refer to this as the “rate-only” classifier. This average includes the entire period during which the stimulus is on, as well as a post-offset period whose length depends on the duration (from 450 ms for 50 ms tones to 100 ms for 400 ms tones).

Alternatively, we eliminate the average firing rate information and retain the relative distribution of spikes by normalizing the test and template PSTHs by their respective vector norms. We refer to this as the “phase-only” classifier. The “full spike train” classifier operates on the PSTHs without normalization. Strictly speaking, the full spike train classifier utilizes average firing rate information at different temporal resolutions (e.g., 10 ms). The phase-only classifier is similar but has access only to relative spike rates across different time bins for a given tone duration, not to the differences in absolute spike rates across different tone durations. The rate-only classifier has access only to absolute spike rates across different tone durations. Duration “tuning” refers to changes in the average response rate to tones of varying duration and is indexed by the performance of the rate classifier. Duration “coding,” by contrast, refers to all aspects of cortical spiking patterns that could potentially be used to decode information about the tone duration.
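The three representations differ only in how each spike train (or template) is turned into a vector before the distance is computed. A minimal sketch, with hypothetical helper names: full spike train vectors are PSTHs at 5–50 ms resolution, the rate-only vector is a single 500-ms bin, and the phase-only vector is a PSTH divided by its Euclidean norm. For the phase-only classifier, the normalization is applied to the trial-averaged template as well as to the test train.

```python
import numpy as np

WINDOW_MS = 500.0  # analysis epoch used for classification


def bin_spikes(spike_times_ms, bin_ms, window_ms=WINDOW_MS):
    """Spike counts in consecutive bins spanning [0, window_ms].

    np.histogram closes the last bin on the right, so a spike at exactly
    window_ms is counted in the final bin.
    """
    n_bins = int(round(window_ms / bin_ms))
    edges = np.linspace(0.0, window_ms, n_bins + 1)
    counts, _ = np.histogram(np.asarray(spike_times_ms, dtype=float), bins=edges)
    return counts.astype(float)


def rate_only(spike_times_ms):
    """Rate-only representation: a single 500-ms bin, i.e., the spike count."""
    return bin_spikes(spike_times_ms, WINDOW_MS)


def phase_only(psth):
    """Phase-only representation: a PSTH (single trial or template) divided by its norm.

    An all-zero PSTH has no defined direction and is returned unchanged.
    """
    psth = np.asarray(psth, dtype=float)
    norm = np.linalg.norm(psth)
    return psth / norm if norm > 0 else psth
```

After normalization, two PSTHs with the same shape but different overall rates map to the same vector, which is exactly the absolute-rate information the phase-only classifier is denied.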

Except where noted, classifier discrimination performance was reported at the bin-width that maximized performance, which we refer to as the “optimal” bin-width. For analyses involving multiple cells, we used the modal optimal bin-width for the population.

Classifier performance was quantified as the percentage of trials correctly associated with the stimulus presented on that trial. It was computed from a confusion matrix (CM) whose columns indicate the actual duration and whose rows indicate the duration estimated by the classifier: because correct estimates fall along the diagonal, percent correct is the sum of the diagonal entries divided by the total number of estimates, expressed as a percentage. Percent correct is monotonically related to mutual information via the CM transform described in Schnupp et al. (2006).

Significance was assessed by simulating CMs by Monte Carlo methods: each column of the CM (i.e., each actual duration) was populated by incrementing randomly chosen entries (representing the estimates) until its sum equaled the number of trials to be classified (10 trials for each duration, in this case). P values represent the number of instances in which simulated CMs of equivalent size yielded a higher percentage correct than the actual CM, divided by the number of simulated CMs (n = 100,000).
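Percent correct and its Monte Carlo significance can be sketched as follows (function names are illustrative). Rather than incrementing entries one at a time, this sketch draws the diagonal count of each simulated column directly from a binomial distribution with n_trials draws and success probability 1/n_stim; under uniform random assignment this has the same distribution, but it is an implementation shortcut, not necessarily what the authors did.

```python
import numpy as np


def percent_correct(cm):
    """Sum of the diagonal divided by the total number of estimates, as a percentage."""
    cm = np.asarray(cm)
    return 100.0 * np.trace(cm) / cm.sum()


def monte_carlo_p(observed_pc, n_stim=5, n_trials=10, n_sim=100_000, seed=0):
    """Fraction of chance confusion matrices with higher percent correct than observed.

    Each simulated trial is assigned to one of n_stim estimates uniformly at
    random, so every actual-stimulus column sums to n_trials. Only the
    diagonal matters for percent correct, and its count in each column is
    Binomial(n_trials, 1 / n_stim).
    """
    rng = np.random.default_rng(seed)
    hits = rng.binomial(n_trials, 1.0 / n_stim, size=(n_sim, n_stim)).sum(axis=1)
    sim_pc = 100.0 * hits / (n_stim * n_trials)
    return float(np.mean(sim_pc > observed_pc))
```

With 5 standard durations and 10 trials each, chance performance is 20% correct, so only observed values well above 20% yield small P values.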

Across-cell spike train classification.

To show with quantitative rigor that cortical spiking patterns produced by different neurons are “meaningfully” diverse, we must demonstrate that a “one-size-fits-all” decoding strategy fails to capture a significant fraction of the information about tone duration. When performing spike train classification for individual neurons, we use only spike trains from the neuron under consideration (see above). In the context of complete cross validation, this means that the decoding model used for each neuron “knows” how the cell responded on every trial other than the one being classified. We quantify the impact of diversity in cortical responses by replacing these within-cell templates with across-cell templates based on the responses of a different cell or a group of different cells. This allows us to determine the extent to which the classifiers can successfully “interpret” responses like those in Fig. 1C by comparing them against responses like those in Fig. 1E, for example.

We simulated a range of decoding models by generating response templates (i.e., trial-averaged PSTHs) for each cell and then averaging the response templates for either all cells or for all cells within a particular response category (see Response type categorization below). The neuron whose responses were being classified was always excluded from the construction of the response templates.
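Building the across-cell templates amounts to averaging other cells' trial-averaged PSTHs while always leaving out the cell being decoded. A minimal sketch; the function name and the `(n_cells, n_durations, n_bins)` layout are assumptions.

```python
import numpy as np


def across_cell_templates(templates, target, categories=None):
    """Average the trial-averaged PSTHs of every cell except `target`.

    templates:  array (n_cells, n_durations, n_bins) of within-cell templates.
    categories: optional sequence of response-category labels (n_cells,);
                if given, only cells sharing the target's category are averaged.
    Returns an (n_durations, n_bins) array of across-cell templates.
    """
    templates = np.asarray(templates, dtype=float)
    mask = np.ones(len(templates), dtype=bool)
    mask[target] = False  # the decoded neuron never contributes to its own templates
    if categories is not None:
        categories = np.asarray(categories)
        mask &= categories == categories[target]
    if not mask.any():
        raise ValueError("no other cells available to build templates")
    return templates[mask].mean(axis=0)
```

The resulting templates replace the within-cell templates in the nearest-template step, while the test spike trains still come from the target cell.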

Spike train identification.

Evaluating how effective the decoding model from neuron A is in decoding the responses of neuron B provides useful perspective on how cortical response diversity constrains decoding strategies that map neural responses to stimulus features. We can also evaluate cortical response diversity more directly by modifying the classification procedure so that given a particular spike train, we attempt to identify which neuron fired it, instead of which stimulus elicited it. For clarity, we refer to this process as spike train identification, rather than classification, though the underlying computation is essentially the same. For identification to succeed, trial-by-trial variability in spiking patterns must be smaller than the differences among the spiking patterns produced by different neurons.

We identified spike trains two ways. In the first, we concatenated response epochs (0–500 ms) across the five tested tone durations to produce a single (2,500 ms) pseudotrial for each of the 10 trials (i.e., the first pseudotrial is composed of the responses to the first presentation of each standard duration, etc.). We then classified each of the 10 pseudotrials against “pseudotemplates” based on concatenation of the trial-averaged PSTHs. The pseudotemplate for the cell whose spike trains are being identified is based on the other nine pseudotrials (complete cross validation). In the second method, the process is analogous, but the data were limited to a single standard duration, and we repeated the identification process independently for each standard duration.
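The pseudotrial version of spike train identification can be sketched as below: trial k of every standard duration is laid end to end, and each pseudotrial is assigned to the cell with the nearest pseudotemplate. The function name and the `(n_cells, n_durations, n_trials, n_bins)` layout are assumptions.

```python
import numpy as np


def identify_cells(counts):
    """Fraction of pseudotrials assigned to the cell that actually produced them.

    counts: array (n_cells, n_durations, n_trials, n_bins) of binned spike
    counts for 0-500 ms epochs. Trial k of every duration is concatenated
    into pseudotrial k; each pseudotrial is matched to the nearest
    pseudotemplate (Euclidean distance).
    """
    counts = np.asarray(counts, dtype=float)
    n_cells, n_dur, n_trials, n_bins = counts.shape
    # Reorder to (cell, trial, duration, bin), then lay durations end to end.
    pseudo = counts.transpose(0, 2, 1, 3).reshape(n_cells, n_trials, n_dur * n_bins)
    pseudotemplates = pseudo.mean(axis=1)
    correct = 0
    for c in range(n_cells):
        for k in range(n_trials):
            test = pseudo[c, k]
            tmpl = pseudotemplates.copy()
            # Complete cross validation: own-cell template uses the other pseudotrials.
            tmpl[c] = (pseudo[c].sum(axis=0) - test) / (n_trials - 1)
            guess = np.argmin(np.linalg.norm(tmpl - test, axis=1))
            correct += int(guess == c)
    return correct / (n_cells * n_trials)
```

The single-duration variant corresponds to passing one duration at a time, for example `identify_cells(counts[:, [d]])` for duration index `d`.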

Likelihood estimation of tone onset and offset.

Spike train classification methods permit the assignment of individual spike trains to a given stimulus but do not permit the direct estimation of tone duration from neural data.

Ideally, we would like a model that estimates both when the tone began and when the tone ended based solely on observing single spike trains. By comparing models that do and do not know that cortical spiking patterns emphasize envelope transients, we can also evaluate how much such physiological transients contribute to the cortical representation of acoustic transients. To do so, we developed a novel method that allowed us to compute the likelihood of a given spike train for an arbitrary estimate of the duration.

Because the gated tones used in this study are static except for the changes in sound level at onset and offset, each trial can be subdivided into a small number of intervals whose definitions do not depend on tone duration. We treat all intervals in which the stimulus is nominally identical (e.g., 4 kHz at 30 dB SPL) as identical across different tone durations. In the simplest case, we define two intervals: 1) stimulus On, extending from tone onset to offset, and 2) stimulus Off, for all times after tone offset. We remove the spike train used to estimate the tone offset, bin all remaining trials at 5 ms, and calculate the distribution of binned spike counts for each interval (On and Off) separately. We convert these into (discrete) probability distributions by normalizing by the total number of bins contributing to each distribution. The test spike train is also binned at 5 ms to produce a set of binned spike counts, and each bin can then be assigned a likelihood from the distributions of binned spike counts. Treating bins as independent, the likelihood of a particular spike train is the product of the likelihoods of the binned spike counts comprising the test train.

Critically, the likelihood of the spike count in each bin depends on whether the bin is assigned to the On or Off distribution. By varying how the bins of the test train are assigned to the two distributions, we can estimate the likelihood of that spike train for an arbitrary estimate of tone duration. For example, a cell with low spontaneous activity and high stimulus-driven activity will more commonly produce two spikes within a single (5 ms) bin during the On interval than during the Off interval. Thus estimates of duration that assign a bin containing two spikes to the On interval are more likely than those that assign that bin to the Off interval. By systematically varying the estimated duration, we create a likelihood function for each spike train. In practice, we tested all durations from 0 to 450 ms in 5 ms steps. If a certain spike count occurred in only a single trial (e.g., 3 spikes in 5 ms), its likelihood would have been 0 under the distribution based on all other trials; to avoid zero likelihoods, we replaced it with 1/N, where N is the number of bins contributing to that interval's distribution.
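The Tonic model can be sketched as follows. This is an illustrative reimplementation under the assumptions stated in the comments (helper names and array layouts are hypothetical). It sums log probabilities rather than multiplying probabilities, which has the same maximizer and avoids numerical underflow.

```python
import numpy as np

BIN_MS = 5  # bin width used for the likelihood models


def pmf_from_counts(counts):
    """Empirical probability of each binned spike count, and the number of bins used."""
    counts = np.asarray(counts, dtype=int).ravel()
    values, freq = np.unique(counts, return_counts=True)
    return dict(zip(values.tolist(), (freq / counts.size).tolist())), counts.size


def log_prob(pmf, n_bins, count):
    """Log probability of one binned count; unseen counts get 1/n_bins instead of 0."""
    return np.log(pmf.get(count, 1.0 / n_bins))


def tonic_offset_loglik(test, train, train_durs, candidates_ms=range(0, 455, BIN_MS)):
    """Log-likelihood of one held-out spike train for each candidate tone offset.

    test:       (n_bins,) 5-ms binned counts of the held-out trial (0-500 ms).
    train:      (n_trials, n_bins) binned counts of all remaining trials.
    train_durs: (n_trials,) tone duration (ms) of each remaining trial.
    """
    train = np.asarray(train, dtype=int)
    test = np.asarray(test, dtype=int)
    starts = np.arange(train.shape[1]) * BIN_MS  # bin start times (ms)
    # Pool On and Off bins across all remaining trials, regardless of duration.
    on_mask = starts[None, :] < np.asarray(train_durs)[:, None]
    on_pmf, n_on = pmf_from_counts(train[on_mask])
    off_pmf, n_off = pmf_from_counts(train[~on_mask])
    cands = np.array(list(candidates_ms))
    ll = np.empty(cands.size)
    for i, d in enumerate(cands):
        is_on = starts < d  # On bins precede the candidate offset, Off bins follow it
        ll[i] = sum(log_prob(on_pmf, n_on, c) if on else log_prob(off_pmf, n_off, c)
                    for c, on in zip(test.tolist(), is_on))
    return cands, ll
```

The candidate at which `ll` peaks is the maximum-likelihood estimate of tone offset for that single spike train.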

Because onset and offset transients are both salient features of cortical responses to rapidly gated tones, we also computed likelihood functions for estimators based on four stimulus-defined intervals: 1) Onset, within 50 ms of tone onset; 2) On, as above but excluding the initial 50 ms; 3) Offset, within 50 ms of tone offset; and 4) Off, as above but excluding the first 50 ms after offset. The procedure is otherwise identical. We refer to the estimator based on the On and Off intervals as the Tonic model and the estimator based on the Onset, On, Offset, and Off intervals as the Phasic+Tonic model. Bins were always assigned in the appropriate order: On then Off for the Tonic model, or Onset, On, Offset, and then Off for the Phasic+Tonic model.
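Relative to the Tonic model, the Phasic+Tonic model changes only how each bin is labeled for a candidate duration; a separate spike count distribution is then estimated for each of the four labels. The labeling step can be sketched as below (names are hypothetical). How the authors handled candidate durations shorter than 50 ms is not stated; here the Onset interval is simply truncated at the candidate offset, which is an assumption.

```python
import numpy as np

BIN_MS = 5         # bin width (ms)
TRANSIENT_MS = 50  # length of the Onset and Offset intervals (ms)
ONSET, ON, OFFSET, OFF = range(4)


def phasic_tonic_labels(duration_ms, n_bins=100):
    """Label each 5-ms bin (0-500 ms) as Onset, On, Offset, or Off.

    Labels follow the fixed order Onset -> On -> Offset -> Off. For
    durations under 50 ms (assumption), Onset is cut off at the offset.
    """
    t = np.arange(n_bins) * BIN_MS  # bin start times (ms)
    labels = np.full(n_bins, OFF)
    labels[t < duration_ms] = ON
    labels[t < min(TRANSIENT_MS, duration_ms)] = ONSET
    labels[(t >= duration_ms) & (t < duration_ms + TRANSIENT_MS)] = OFFSET
    return labels
```

For a 100-ms tone, for example, bins starting at 0–45 ms are Onset, 50–95 ms On, 100–145 ms Offset, and the remaining 70 bins Off.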

We estimated the tone offset as the maximum of the log-likelihood function averaged across trials. To quantify the quality of the offset estimates, we exponentiated the trial-averaged log-likelihood function and normalized the result by its sum to convert it into a probability distribution. (For convenience, we still refer to the result of this process as a “likelihood function” in the text.) For each offset value (0 to 450 ms), we calculated the product of the height of the distribution and the distance of that value from the actual offset. We refer to the sum of these products as the estimation error. The estimation error reflects how concentrated the estimates are around the actual tone offset for that set of trials and is lower when the estimator is more accurate.
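In symbols, with normalized likelihood $p(d)$ over candidate offsets $d$ and actual offset $d^{*}$, the estimation error is $\sum_d p(d)\,\lvert d - d^{*}\rvert$, the expected absolute deviation of the estimates. A minimal sketch (function names are illustrative); subtracting the maximum before exponentiating prevents underflow and cancels out after normalization.

```python
import numpy as np


def estimate_offset(mean_loglik, candidates_ms):
    """Offset estimate: candidate with the highest trial-averaged log-likelihood."""
    return candidates_ms[int(np.argmax(mean_loglik))]


def estimation_error(mean_loglik, candidates_ms, true_offset_ms):
    """Expected absolute distance of the normalized likelihood from the true offset (ms)."""
    ll = np.asarray(mean_loglik, dtype=float)
    p = np.exp(ll - ll.max())  # shift by the max to avoid underflow; cancels below
    p /= p.sum()               # normalize into a probability distribution
    dist = np.abs(np.asarray(candidates_ms, dtype=float) - true_offset_ms)
    return float(np.sum(p * dist))
```

A likelihood concentrated entirely on the true offset gives an error of 0 ms; a flat likelihood over 0–450 ms gives 225 ms when the true offset is 0 ms.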

We also constructed analogous estimators for tone onset, which allowed us to compare model performance for estimating tone onsets against that for tone offsets. Within each trial, tone onset occurred at 0 ms. Because trials were presented consecutively with no intervening delay, it would have been possible to concatenate the last half (500 ms) of the previous trial (during which no tone is presented) with the first half of the current trial. For computational convenience, however, we instead circularized each trial, moving its final 500 ms to precede its initial 500 ms; we indicate this by labeling the time axis from −500 to 500 ms rather than 0 to 1,000 ms. For all but the first stimulus in each 10-trial block (which follows the last stimulus in the previous block), the stimulus history is equivalent across all trials.
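Circularizing a binned 1,000-ms trial is a rotation by half its length. A minimal sketch, assuming 5-ms bins (the function name is illustrative):

```python
import numpy as np


def circularize(trial, bin_ms=5, trial_ms=1000):
    """Move the final 500 ms of a binned 1,000-ms trial in front of its first 500 ms.

    Returns the reordered bins and their start times on a -500 to 495 ms
    axis, so that tone onset falls at 0 ms with 500 ms of preceding silence.
    """
    trial = np.asarray(trial)
    if trial.size != trial_ms // bin_ms:
        raise ValueError("trial length does not match trial_ms / bin_ms")
    half = (trial_ms // 2) // bin_ms
    # np.roll shifts every bin forward by `half`, wrapping the last half to the front.
    return np.roll(trial, half), np.arange(-half, trial.size - half) * bin_ms
```

Candidate onsets can then be scanned over this axis in the same way that candidate offsets are scanned over 0–450 ms.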

We implemented the procedure described above for tone offsets, computing the likelihood of candidate onsets from −495 to 500 ms in 5 ms steps for each spike train. It is possible to estimate stimulus onset and offset simultaneously for circularized trials, resulting in a two-dimensional likelihood surface rather than two separate likelihood functions. This process is highly inefficient, since it requires an exhaustive search of a much larger stimulus space. Because we found that the results of this analysis were largely separable in the subset of cells we examined (data not shown), we present the results for the separate estimation of onset and offset.

To evaluate the relative temporal precision of tone onset and tone offset encoding, we calculated the estimation error, as defined above. For tone onsets, each probability distribution generated by the models extends from −495 to 500 ms, rather than 0 to 450 ms for offset estimates, so the theoretical upper bound of the statistic is larger for onsets than for offsets. This makes statistical tests based on the estimation error more conservative when attempting to demonstrate greater temporal precision for onset encoding (see results).

Response type categorization.

An important feature of the cortical representation of gated tones is the diversity in the responses of different neurons to such tones. We categorized cortical responses into a limited set of categories reflecting the most salient differences among them, referenced to the same four intervals defined above for the Phasic+Tonic model. For each of these four intervals, we computed the distribution of binned spike counts (at a resolution of 5 ms) across all five durations.

We summarized the differences among these distributions by comparing the average binned spike count for each of the Onset, On, and Offset intervals with that of the Off interval (i.e., the spontaneous rate). For each of the three response epochs (Onset, On, and Offset), we randomly drew as many bins from the Off distribution as there were bins in that epoch, repeated this 1,000 times, and computed the average binned spike count for each draw. We expressed the average binned spike count for each response epoch as a z-score relative to this resampled distribution of average counts for the “spontaneous” interval. These three values for each cell represent coordinates in a three-dimensional response space. We normalized each trio of values by its maximum absolute value. Geometrically, this procedure forces each point onto a face of a cube (Fig. 2A). Thus the location of each cell in this representation depends only on the relative magnitude of the responses in each response epoch, independent of absolute firing rate. We then divided the set of coordinates into four groups via k-means clustering (“kmeans,” Matlab). The choice of four groups provided the best match to our impressions of the distinct response types and was similar to results obtained by applying multidimensional scaling methods to response profiles concatenated across the tested durations.
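The mapping of one cell onto the response cube can be sketched as follows. The function name and dictionary layout are hypothetical, and the text does not say whether Off bins were drawn with or without replacement; this sketch assumes replacement.

```python
import numpy as np


def response_coordinates(epoch_counts, off_counts, n_resample=1000, seed=0):
    """Cube-face coordinates (Onset, On, Offset) for one cell.

    epoch_counts: dict with keys "Onset", "On", "Offset", each a 1-D array
                  of 5-ms binned spike counts pooled across durations.
    off_counts:   1-D array of binned counts from the Off (spontaneous) interval.
    """
    rng = np.random.default_rng(seed)
    off = np.asarray(off_counts, dtype=float)
    z = []
    for name in ("Onset", "On", "Offset"):
        x = np.asarray(epoch_counts[name], dtype=float)
        # Null distribution: mean of x.size Off bins, resampled n_resample times
        # (drawing with replacement is an assumption).
        null = rng.choice(off, size=(n_resample, x.size), replace=True).mean(axis=1)
        sd = null.std()
        if sd == 0:
            raise ValueError("Off interval has no variability; z-score undefined")
        z.append((x.mean() - null.mean()) / sd)
    z = np.array(z)
    # Divide by the largest absolute z-score, placing the point on a cube face.
    return z / np.abs(z).max()
```

Stacking these coordinates across cells gives the point cloud of Fig. 2A. It could then be partitioned with any k-means routine, for example scikit-learn's `KMeans(n_clusters=4)`; the study itself used MATLAB's `kmeans`.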

Fig. 2.


Categorization of cortical response profiles to pure tones of varying duration. A: the response cube illustrates the relative magnitude of responses for 3 intervals: Onset (within 50 ms of tone onset), Offset (within 50 ms of tone offset), and On (throughout the tone duration, excluding the Onset interval). Response magnitudes were expressed relative to the spontaneous rate and then normalized to map the response of each cell to a face of the response cube. The color of each filled circle indicates the response category, and its size indicates the percent correct when discriminating tone duration with the full spike train classifier (see methods). B: composite PSTHs for the full population (binned at 5 ms resolution) are shown for all standard durations (50, 100, 200, 300, and 400 ms); vertical black lines indicate the time of tone offset. Composite PSTHs were constructed similarly for all cells categorized as Mixed (C), Phasic (D), Sustained (E), and Suppressed (F).

RESULTS

Summary of the data sample.

Duration tuning functions were obtained from 279 single units in the auditory cortex of the left and right hemispheres of two animals [83 from monkey Z (67 left, 16 right) and 196 from monkey X (93 left, 103 right)]. Most neurons (88%) were in the core auditory cortex: 171 in the primary auditory field AI (61%) and 75 in the rostral field R (27%). The remaining 33 neurons were recorded in the belt fields immediately adjacent to the core (11 medial, 14 lateral, and 8 caudomedial). Delineation of field boundaries for this data set has been described previously (Scott et al. 2011). As in other studies derived from this data set (e.g., Malone et al. 2007, 2010, 2014), our sample is biased toward neurons with strong responses to pure-tone stimuli; neurons later determined to lie in the belt exhibited tone responses typical of those in the core.

Cortical neurons encode tone duration in multiple ways.

Fundamentally, we are interested in the form and efficacy of the cortical representation of the temporal boundaries of acoustic signals. The most striking feature of cortical responses to tones of varying duration was the heterogeneity of the response patterns. Figure 1 illustrates this diversity for four cortical neurons. The first response type was relatively uncommon: tones of all durations elicited robust responses limited to tone onset, as indicated by the PSTHs in Fig. 1A. Consequently, the DTF in Fig. 1B is fairly flat, and the performance of the classifiers, indicated by the confusion matrices in the inset panels (Fig. 1B), is not far above chance (∼16.7% for a set of six stimuli).

The remaining three response types illustrated in Fig. 1 encode tone duration effectively, but in distinct ways. A neuron can signal the presence of the tone by a sustained change in firing rate while the tone is presented, or it can signal the envelope transition at tone offset with a change in firing rate. The responses of the neuron depicted in Fig. 1C capture an important and relatively common cortical response feature: strong bursts of activity at both the onset and the offset of the tone. These response features were sufficiently salient that each trial was associated with the correct stimulus duration when the temporal dynamics of the responses were considered (Fig. 1D). Despite the poorly tuned DTF, even the rate-only classifier discriminated tone duration at better than chance levels (44 vs. 16.7%).

The PSTHs in Fig. 1E are consistent with what is arguably the simplest encoding strategy for tone duration: this neuron fired at relatively high rates for as long as the tone endured and ceased firing when the tone ended. Spike train classifiers with access to the temporal dynamics of the response correctly estimated the duration on each trial (Fig. 1F). The rate-only classifier also performed effectively, because longer tones elicited more spikes, yielding a roughly linear relationship between spike count and tone duration. Note that neurons that integrate signal energy in this way are not considered genuinely selective for stimulus duration.

We also observed a relatively small number of neurons that encoded tone duration effectively through suppression of their responses throughout the tone. In some cases (e.g., Fig. 1G), these responses also featured a burst of activity at tone offset in the absence of robust spontaneous activity. In other cells, however, the presence of the tone was signaled largely by the absence of spiking throughout its duration, followed by a resumption of spiking at a comparatively robust spontaneous rate. As with its complement, the pure onset response in Fig. 1, A and B, the DTF of this cell (Fig. 1H) exhibits very little tuning, and the rate-only classifier provides little useful information about tone duration. However, the other classifiers correctly discriminated duration on the basis of the timing of cortical spiking for nearly all trials. Collectively, these data demonstrate that many (Fig. 1, C–H), but not all (Fig. 1, A and B), of the spiking patterns we observed converge on a robust representation of tone duration.

Categorization of cortical response types.

To characterize the form of the cortical representation of the temporal boundaries of gated tones, we tried to systematize the diverse cortical responses we observed (Fig. 1) into a manageable set of response categories. Categorization by response types also allowed us to quantify the efficacy and efficiency of different encoding strategies. We emphasize that these are response types, not cell types, since the shapes of PSTHs vary with stimulus parameters such as frequency (Malone et al. 2014) and amplitude (Malone et al. 2010). Thus, the distribution of response types we report here is specific to the tone parameters comprising the current dataset: tones presented at each neuron's best frequency and level.

We quantified the strength of the onset, sustained, and offset components relative to the spontaneous rate for each neuron in our sample (see methods). The resulting three-dimensional representation was subjected to k-means clustering to generate four response categories. The results of this procedure are plotted and color-coded by category in Fig. 2A. Cells shown in red were driven by tone presentation in a sustained fashion, whereas cells shown in blue were suppressed in a sustained fashion (though for a few the dominant response feature was a large offset response, which mapped them to the left-front face of the cube). The categories shown in green and purple share large onset responses but are differentiated largely on the basis of the strength of the sustained response during the On interval (i.e., their height on the right-front face of the cube). For convenience, we refer to these categories as Suppressed (blue), Phasic (green), Sustained (red), and Mixed (purple), though as Fig. 2A indicates, there is considerable variability within each category.

The similarities and differences among the response categories are more clearly evident in the sets of composite PSTHs shown in Fig. 2, B–F. The response profiles (i.e., PSTHs) summed across all categorized cells (n = 279) are shown in black (Fig. 2B). The global PSTH composites show a strong onset response, decaying to a steady-state firing rate throughout the tone duration, and a weak but perceptible offset response at the longest durations. Mixed responses (purple, n = 78) are most similar to the global composites but lack a pronounced offset response peak (Fig. 2C). We describe these responses as “Mixed” because responses in this category combine the onset responses characteristic of the Phasic category with the steady-state firing characteristic of the Sustained category.

Differences from the global cortical response profiles are more clearly evident for the remaining categories. The most striking differences occur for the relatively uncommon Suppressed responses (Fig. 2F, n = 24), which feature both sustained suppression and offset responses. Phasic responses (Fig. 2D, n = 82) also feature offset responses, but these are, on average, much smaller than the initial onset responses. Finally, the Sustained responses (Fig. 2E, n = 95), comprising the largest individual category, exhibit robust firing throughout the tone durations, but the firing rates near tone onset rise more slowly, resulting in plateaus rather than peaks for the longer durations.

Relationships between response type and stimulus parameters.

Some differences among the response types could reflect differences in the stimuli presented to neurons in different response categories (see methods). Neurons in the Suppressed class were tested with stimuli at higher SPLs relative to their best SPL: the median for this group was 20 dB above best level, compared with 0 dB above best level for the remaining groups (P = 0.0012, Kruskal-Wallis). Sustained responses were obtained closer to response threshold (defined as in Scott et al. 2011), with a median of +10 dB, compared with +30 dB for the Phasic class and +20 dB for the Mixed class (P = 0.0027, Kruskal-Wallis; Suppressed responses were excluded from this analysis because only 7 of these neurons had defined thresholds). The absolute difference between the BF and the tested tone frequency was also greater for Suppressed responses than for Phasic, Sustained, and Mixed responses (P < 10−5 for all comparisons); the median differences were 0.46 octaves for the Suppressed group and 0 octaves for each of the remaining groups. Because cortical PSTH shapes obtained with tones vary with sound pressure level (Malone et al. 2010) and frequency (Malone et al. 2014), we emphasize that the incidence of response types reflects the stimuli used in our experiments, particularly for Suppressed responses, which were obtained farther from BF and best SPL than the other response types.

Duration discrimination performance was similar across response categories.

As the examples in Fig. 1 suggest, effective duration discrimination was compatible with multiple response profiles. We investigated the efficacy of different encoding strategies at the population level by computing duration discrimination performance using spike train classifiers for all neurons in the sample (see methods). The results of this analysis are shown in Fig. 3, A and B, which compare the performance of the different classifiers. Results for each cell are color-coded by response category using the same conventions as in Fig. 2.

Fig. 3.


All response categories include neurons with both poor and excellent duration discrimination performance. A: circles representing the percent correct for each cell are color coded using the conventions in Fig. 2. Duration discrimination is compared for the full spike train (x-axis) and phase-only (y-axis) classifiers. The diagonal line indicates parity in performance across the 2 classifiers. The smaller gray box indicates performance expected by chance for a set of 5 durations (i.e., 20%), and the larger box indicates performance corresponding to the listed statistical criterion. B: duration discrimination is compared for the full spike train (x-axis) and rate-only (y-axis) classifiers.

The distributions of percent correct for the full spike train and phase-only classifiers are similar, so all points cluster near the diagonal (Fig. 3A). Performance did not differ significantly between the full spike train and phase-only classifiers for any response category (Wilcoxon signed rank, P > 0.08 for all comparisons). By contrast, both the full spike train and phase-only classifiers significantly outperformed the rate-only classifier for all response categories (Fig. 3B; Wilcoxon signed rank, P < 0.0001 for all comparisons).

Duration discrimination performance was thoroughly intermixed across response categories when information about the temporal dynamics of cortical responses was retained (Fig. 3A). Performance did not differ significantly between any of the 6 possible pairs of response categories (Wilcoxon rank sum, P > 0.05) for either the full spike train or the phase-only classifier. Thus no particular encoding strategy was better overall than any other. Moreover, neurons from each response category could be found at both the upper and lower reaches of the performance distribution, suggesting that performance reflected intrinsic limits on the quality of the encoding (e.g., high trial-to-trial variability) rather than the form of the encoding.

Careful examination of Fig. 3B suggests that, for the rate-only classifier, duration discrimination performance was generally better for Sustained responses (red circles), as might be expected from the example cell in Fig. 1, E and F, where the relationship between duration and total spike count was strong. Rate-based classification was significantly better for Sustained responses than for Suppressed (Wilcoxon rank sum, P = 0.0149), Phasic (P < 0.0002), and Mixed (P = 0.0387) responses, although the Suppressed and Mixed differences were only marginally significant. Again, the larger spike counts that Sustained responses produce for longer stimuli do not constitute genuine duration tuning. Thus we found little evidence for genuine duration tuning in our sample over the range of tone durations we tested.

We were surprised to find that multiple encoding strategies were similarly effective. The obvious exception is pure onset responses of the sort depicted in Fig. 1, A and B. With this in mind, it is instructive to revisit the response cube in Fig. 2A, in which the size of each circular icon increases in linear proportion to duration discrimination performance for the full spike train classifier. Among Phasic responses (green), for example, duration discrimination is best when offset responses are robust (e.g., Fig. 1, C and D), which maps cells to the front edge of the response cube. Unsurprisingly, robust offset responses are associated with robust duration discrimination, as indicated by the large circles on the left-front face of the response cube, which include both Phasic and Suppressed responses.

Duration coding efficiency varied significantly with response type.

Although the different response types were equally effective, we hypothesized that the efficiency of those representations would vary significantly. For example, two brief response transients can unambiguously demarcate the signal boundaries with few spikes, whereas Sustained responses must maintain elevated firing throughout the signal. As expected, the efficiency of the cortical representation of tone duration varied across response types. We defined efficiency as the ratio of performance for the full spike train classifier (in percent correct) to the average firing rate. Suppressed responses were significantly more efficient than Phasic (Wilcoxon rank sum, P < 10−5), Sustained (P < 10−6), and Mixed (P < 10−8) responses, and Phasic responses were significantly more efficient than Mixed responses (P = 0.0012).

This pattern of results mirrored the pattern for significant differences in average firing rates: Median firing rates differed by response category, being lowest for the Suppressed (7.3 spikes/s) and Phasic (15.0 spikes/s) classes, and highest for the Sustained (22.6 spikes/s) and Mixed (22.0 spikes/s) classes. Since performance was similar between response categories, this translated into greater duration encoding efficiency for the classes with lower firing rates. Nevertheless, higher firing rates overall were significantly correlated with better duration discrimination within each response category (Suppressed: r = 0.52, P = 0.0098; Phasic: r = 0.48, P < 10−6; Sustained: r = 0.45, P < 10−6; Mixed: r = 0.60, P < 10−9).
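As a worked example of the efficiency measure, the sketch below divides percent correct by average firing rate for two hypothetical cells that perform equally well (60% correct, an invented value) but fire at the median rates reported above for the Suppressed and Sustained classes. The function name is hypothetical.

```python
def coding_efficiency(percent_correct, mean_rate_hz):
    """Efficiency as defined in the text: percent correct for the full spike
    train classifier divided by the average firing rate (spikes/s)."""
    if mean_rate_hz <= 0:
        raise ValueError("average firing rate must be positive")
    return percent_correct / mean_rate_hz

# Hypothetical cells with equal performance, firing at the median rates
# reported for the Suppressed (7.3 spikes/s) and Sustained (22.6 spikes/s) classes.
low_rate = coding_efficiency(60.0, 7.3)    # ~8.2 percent correct per spike/s
high_rate = coding_efficiency(60.0, 22.6)  # ~2.7 percent correct per spike/s
```

With performance held constant, efficiency scales inversely with firing rate, so the threefold difference in median rates alone yields roughly a threefold difference in efficiency.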

Response diversity constrains decoding models for the temporal boundaries of gated tones.

Even if effective duration discrimination is compatible with multiple encoding strategies in individual neurons, the diversity of those strategies may constrain the effectiveness of population codes that combine the outputs of multiple neurons.

Our spike train classification analysis above was based on the similarity (i.e., Euclidean distance) of single spike trains to templates built from all other responses of the same neuron. That is, each neuron's responses are used to build a customized decoding model for each of its spike trains, which we refer to as “within-cell” decoding. This represents an upper bound on decoding performance for that cell. If the responses of a given set of neurons were highly stereotyped, a generic decoding model would suffice.
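A minimal sketch of within-cell, leave-one-out template matching, assuming each trial has already been binned into a spike count vector. The helper names are hypothetical, the phase-only classifier described in methods is not reproduced, and the rate-only variant is approximated by collapsing each trial to its total spike count. The toy neuron is invented: its onset and offset bursts contain the same number of spikes at every duration, so only spike timing separates the durations.

```python
import math

def loo_classify(trials):
    """Within-cell, leave-one-out template classification.

    trials: list of (duration_label, binned_response) pairs from ONE neuron.
    Each trial is held out in turn; a template (mean binned response) is built
    for each duration from the remaining trials, and the held-out trial is
    assigned to the duration whose template is nearest in Euclidean distance.
    Returns percent correct.
    """
    correct = 0
    for i, (true_label, x) in enumerate(trials):
        sums, counts = {}, {}
        for j, (label, y) in enumerate(trials):
            if j == i:
                continue  # exclude the held-out trial from its own templates
            prev = sums.get(label, [0.0] * len(y))
            sums[label] = [s + v for s, v in zip(prev, y)]
            counts[label] = counts.get(label, 0) + 1
        templates = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}
        # Ties go to the first template encountered (dict insertion order).
        guess = min(templates, key=lambda lab: math.dist(x, templates[lab]))
        correct += guess == true_label
    return 100.0 * correct / len(trials)

def rate_only(trials):
    """Collapse each binned response to its total spike count."""
    return [(label, [sum(x)]) for label, x in trials]

# Hypothetical neuron: same onset burst, offset burst at different times.
trials = [("short", [1, 0, 0, 0, 1, 0])] * 3 + [("long", [1, 0, 0, 0, 0, 1])] * 3
full = loo_classify(trials)             # 100.0: timing separates the durations
rate = loo_classify(rate_only(trials))  # 50.0: equal counts, so chance for 2 labels
```

The same function serves both classifiers; only the representation of each trial changes, which is why the rate-only result falls to chance for this toy cell while the timing-based result does not.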

To estimate the diversity of cortical spiking patterns more rigorously, we introduced a novel variant of the spike train classification procedure, which we refer to as spike train identification (see methods). This procedure attempts to match individual spike trains to the neuron that produced them, rather than to the stimulus that elicited them. Identification performance was quite good: 54% of spike trains were associated with the correct neuron for responses concatenated across duration (chance performance is roughly 0.36% for 279 neurons). Importantly, the phase-only “identifier” achieved 42.7% correct, compared with 12.6% for the rate-only identifier, indicating that the temporal pattern of spikes distinguished cortical neurons better than differences in their average firing rates did. Thus, regardless of response type, cortical spike trains could be identified with their “cell of origin” more than half the time, demonstrating that the temporal dynamics of cortical firing patterns are idiosyncratic: they are both highly conserved across repeated trials and meaningfully diverse across the cortical population.
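Spike train identification can be sketched as the same nearest-template logic with neuron identity, rather than duration, as the label. This is an illustration under assumptions (the exact procedure is in methods): each "trial" is taken to be one repetition's binned responses concatenated across the tested durations, templates are per-neuron means recomputed without the held-out trial, and the function name and toy neurons are hypothetical.

```python
import math

def identify(responses):
    """Match each held-out trial to the neuron that produced it.

    responses: dict neuron_id -> list of trial vectors (binned responses
    concatenated across durations). Templates are per-neuron mean responses,
    recomputed without the held-out trial; the nearest template (Euclidean
    distance) is the guess. Returns percent of trials correctly identified.
    """
    correct = total = 0
    for nid, trials in responses.items():
        for i, x in enumerate(trials):
            best, best_d = None, math.inf
            for other, other_trials in responses.items():
                pool = [t for k, t in enumerate(other_trials) if other != nid or k != i]
                if not pool:
                    continue
                template = [sum(col) / len(pool) for col in zip(*pool)]
                d = math.dist(x, template)
                if d < best_d:  # strict comparison: ties keep the earlier neuron
                    best, best_d = other, d
            correct += best == nid
            total += 1
    return 100.0 * correct / total

chance = 100.0 / 279  # percent correct expected by chance for 279 neurons (~0.36%)

# Two hypothetical neurons with equal spike counts but different timing.
cells = {"n1": [[3, 0, 0, 1]] * 3, "n2": [[0, 3, 1, 0]] * 3}
timing_id = identify(cells)                                          # 100.0
rate_id = identify({n: [[sum(t)] for t in v] for n, v in cells.items()})  # 50.0
```

As with duration classification, collapsing each trial to a spike count discards exactly the timing information that separates these two toy neurons.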

Identification errors typically occurred within the same response category. In fact, more than twice as many identification errors occurred within response categories as across them (879 vs. 407, ratio 2.16). We estimated the expected ratio of within- to across-category errors by generating simulated matrices in which the identification errors in each row were randomly reassigned; no simulated ratio exceeded the actual ratio (mean simulated ratio = 0.40, P < 0.0001). The relevance of the response categories is also evident in Fig. 4A, which depicts the correlations between all pairs of PSTHs (concatenated across duration and binned at 25 ms, the resolution found to be optimal for spike train identification). The patchwork structure indicates that PSTH correlations were strongest within response categories but that strong correlations between Phasic and Mixed responses were also common, as might be expected from their adjacency on the response cube (green and purple points in Fig. 2A). The structure of the correlation matrix supports the appropriateness of our categorization procedure.
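The within/across error comparison can be sketched as a Monte Carlo test. This reflects our reading of the null model, which is an assumption: each row's identification errors are reassigned uniformly at random among the other neurons, and the simulated within/across ratio is compared with the observed one. The function names, the toy 4-neuron confusion matrix, and its category labels are hypothetical.

```python
import random

def within_across_ratio(confusion, category):
    """Ratio of off-diagonal (error) counts within vs. across categories.
    Assumes at least one across-category error."""
    within = across = 0
    for i, row in enumerate(confusion):
        for j, n in enumerate(row):
            if i != j:
                if category[i] == category[j]:
                    within += n
                else:
                    across += n
    return within / across

def error_reassignment_test(confusion, category, n_sim=2000, seed=0):
    """Null model: each row's errors reassigned uniformly at random among the
    other neurons. Returns (observed ratio, mean simulated ratio, p value)."""
    rng = random.Random(seed)
    n = len(confusion)
    observed = within_across_ratio(confusion, category)
    errors = [sum(row) - row[i] for i, row in enumerate(confusion)]
    others = [[j for j in range(n) if j != i] for i in range(n)]
    sims = []
    for _ in range(n_sim):
        within = across = 0
        for i, e in enumerate(errors):
            for j in rng.choices(others[i], k=e):
                if category[i] == category[j]:
                    within += 1
                else:
                    across += 1
        sims.append(within / max(across, 1))  # avoid division by zero
    p = sum(s >= observed for s in sims) / n_sim
    return observed, sum(sims) / n_sim, p

confusion = [[5, 3, 1, 0], [3, 5, 0, 1], [1, 0, 5, 3], [0, 1, 3, 5]]
category = ["a", "a", "b", "b"]
observed, null_mean, p = error_reassignment_test(confusion, category)
# observed = 12 / 4 = 3.0; under the null, only 1 of 3 alternatives per row
# shares the category, so the simulated ratios are far lower and p is small.
```

Because the null ratio depends on how many neurons share each category, it stays well below 1 when categories are a minority of the population, which is consistent in spirit with the low simulated mean reported above.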

Fig. 4.


Response categories capture important features of the substantial diversity of cortical responses. A: the matrix illustrates the distribution of correlation coefficients for the response PSTHs concatenated across the 5 standard durations at a temporal resolution of 25 ms. The color bar indicates the value of the correlations. B: this matrix indicates the median duration discrimination performance for the cells in each response category (rows) when the spike trains to be classified are compared against different templates (columns). The color bar indicates the mapping of discrimination performance (in percent correct) to color; for convenience, the values indicated by the colors also appear within each cell of the matrix.

How might the idiosyncrasy of cortical response profiles affect a population code for signal duration? First, we turn to a related question: How important is the decoding model for interpreting cortical spike trains appropriately? We addressed this question by classifying the spike trains from each cell against templates derived from all other cells in the population, as well as templates derived from all other cells within the given cell's response category. In effect, the group templates are the composite PSTHs depicted in Fig. 2, except that the cell whose responses are being classified is removed from the composite.
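The leave-one-cell-out template comparison can be sketched as follows. The sketch assumes, as our own choice, that a composite template is the average (not the sum) of the per-cell PSTHs, so that its scale matches a single trial; the function names and the two toy response types are hypothetical.

```python
import math

def cell_psth(cell):
    """cell: dict duration -> list of binned trials. Mean response per duration."""
    return {d: [sum(col) / len(trials) for col in zip(*trials)] for d, trials in cell.items()}

def composite_templates(cells):
    """Average of the per-cell PSTHs (averaging rather than summing is an assumption)."""
    psths = [cell_psth(c) for c in cells]
    return {d: [sum(col) / len(psths) for col in zip(*(p[d] for p in psths))]
            for d in psths[0]}

def classify_against(test_cell, templates):
    """Percent of the test cell's trials assigned to the correct duration."""
    correct = total = 0
    for d, trials in test_cell.items():
        for x in trials:
            guess = min(templates, key=lambda t: math.dist(x, templates[t]))
            correct += guess == d
            total += 1
    return 100.0 * correct / total

def leave_cell_out(cells, i, pool=None):
    """Classify cell i against the composite of `pool` (default: every cell)
    with cell i itself removed, as for the global or within-category templates."""
    pool = range(len(cells)) if pool is None else pool
    return classify_against(cells[i], composite_templates([cells[j] for j in pool if j != i]))

# Hypothetical cells: two Sustained-like and two Suppressed-like (offset burst).
sustained = {"short": [[2, 0, 0, 0]] * 2, "long": [[2, 2, 2, 0]] * 2}
suppressed = {"short": [[0, 3, 1, 1]] * 2, "long": [[0, 0, 0, 3]] * 2}
cells = [sustained, sustained, suppressed, suppressed]
own_category = leave_cell_out(cells, 2, pool=[2, 3])  # 100.0
other_category = leave_cell_out(cells, 2, pool=[0, 1])  # 0.0
global_composite = leave_cell_out(cells, 2)             # 50.0
```

In this toy case the Suppressed-like cell is decoded perfectly by a template from its own category, systematically misread by Sustained templates, and only partly rescued by the global composite, the same qualitative ordering that the within-cell, within-category, and global columns of Fig. 4B are designed to expose.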

We summarize the results of this analysis as a table superimposed on a heat map of duration discrimination performance in Fig. 4B. Each element of the matrix represents the median performance (in percent correct) for neurons in a given response group (rows) when their spike trains are compared against the templates defined by each column. The leftmost column shows the values for the standard, within-cell classification process, and the second column shows the values when the spike trains for each cell are compared against global composites based on all other cells. The remaining four columns indicate median performance for comparisons against response profiles averaged within each of the four response categories. Comparison of the first two columns clearly indicates that the diversity of cortical responses poses a genuine challenge for a one-size-fits-all decoding model (Wilcoxon rank sum; Suppressed P < 10−5, Phasic P < 10−11, Sustained P < 10−4, Mixed P < 10−9). The decrease in performance for the global template relative to within-cell classification was significantly correlated with within-cell performance (r = 0.62, P < 10−31). That is, the better a cell performed with a customized decoding model, the more its discrimination performance was compromised by the one-size-fits-all decoding model. The effect of replacing within-cell decoding models with the global composite model varied between groups. The largest performance reduction occurred for the least populous class, Suppressed (60 vs. 22%); the smallest occurred for the most populous class, Sustained (54 vs. 40%). Cells in the Sustained class outperformed all other response classes (Wilcoxon rank sum, P < 0.0004 for all comparisons) when decoded with the model based on averaged cortical response profiles.

If the one-size-fits-all model fails, would a set of “one size per response category” decoding models be adequate? Comparison of the first column against the diagonal of the third through sixth columns indicates that the challenge posed by the diversity of cortical spiking patterns is only partly ameliorated by using within-category decoding models. Using decoding templates based on cells restricted to the same response category generally improved duration discrimination, particularly for cells in the Suppressed (56 vs. 22%, Wilcoxon rank sum, P < 10−4) and Phasic (34 vs. 24%, P < 0.0003) response classes. Overall, however, the improvements were relatively modest, reflecting the diversity of responses within each response category (Fig. 2A).

The remaining elements of the matrix (Fig. 4B) indicate that cells in particular response classes are differentially compatible with templates drawn from other classes. For example, Mixed templates are poor for Suppressed spike trains (20%), but Suppressed templates can be appropriate for Phasic spike trains (37%), since the latter often share the offset response feature. Both Suppressed and Phasic templates reduce duration discrimination to or near chance for Sustained and Mixed spike trains. In fact, only Sustained responses, the most common type, can be classified as effectively with the global response template as with the within-class template (the performance distributions, with medians of 40 and 34%, did not differ significantly: P > 0.2). Collectively, these results demonstrate that although the spike trains of many cortical neurons are highly informative about the temporal boundaries of gated tones, decoding models that “understand” each cell's spiking behavior may be necessary to extract that information effectively.

Cortical response diversity can be overcome by sufficient response pooling.

How effectively is tone duration encoded by populations of cortical neurons, and how does decoding performance scale with population size and composition? In this section, we examine duration discrimination for spike trains combined across multiple cells. The templates used for comparison are derived from the same population of cells that supplies the “test” spike trains; this is the multicell analog of the standard, fully cross-validated classification procedure. We combined data across cells in two ways: 1) the “convergence” model simulates the convergence of multiple cells onto a single output neuron by adding the PSTHs of the cells; 2) the “labeled line” model preserves the identity of each cell by serially concatenating the PSTHs. The labeled line model does not simulate a biological process but serves as a benchmark for the convergence model (Schneider and Woolley 2010). The combined data were then used as input to the classifiers, and the percentage of trials correctly associated with the duration that elicited them was computed for 1,000 randomly selected groups of 2, 4, 8, 16, 32, and 64 cells.
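The difference between the two pooling models can be sketched in a few lines. Assumptions, all ours: trial k of every cell is paired to form one pooled trial (the cells were not recorded simultaneously), classification is leave-one-out nearest-template matching with Euclidean distance, and the two hypothetical cells have complementary two-bin responses. The function names are hypothetical.

```python
import math

def combine(per_cell, model):
    """Combine one trial's binned responses from several cells."""
    if model == "convergence":   # sum bin by bin, as onto one output neuron
        return [sum(col) for col in zip(*per_cell)]
    if model == "labeled_line":  # concatenate, keeping cell identity
        return [v for vec in per_cell for v in vec]
    raise ValueError(f"unknown model: {model}")

def pooled_trials(cells, model):
    """cells: list of dicts duration -> list of binned trials.
    Pairs trial k of every cell (an assumption for non-simultaneous data)."""
    out = []
    for d in cells[0]:
        for k in range(min(len(c[d]) for c in cells)):
            out.append((d, combine([c[d][k] for c in cells], model)))
    return out

def loo_percent_correct(trials):
    """Leave-one-out nearest-template (Euclidean) classification."""
    correct = 0
    for i, (label, x) in enumerate(trials):
        groups = {}
        for j, (lab, y) in enumerate(trials):
            if j != i:
                groups.setdefault(lab, []).append(y)
        templates = {lab: [sum(c) / len(ys) for c in zip(*ys)] for lab, ys in groups.items()}
        guess = min(templates, key=lambda lab: math.dist(x, templates[lab]))
        correct += guess == label
    return 100.0 * correct / len(trials)

# Two hypothetical cells whose duration-dependent changes cancel when summed.
cell_a = {"short": [[1, 0]] * 3, "long": [[1, 1]] * 3}
cell_b = {"short": [[0, 1]] * 3, "long": [[0, 0]] * 3}
conv = loo_percent_correct(pooled_trials([cell_a, cell_b], "convergence"))    # 50.0
lines = loo_percent_correct(pooled_trials([cell_a, cell_b], "labeled_line"))  # 100.0
```

Summation can cancel opposite-signed changes from cells with complementary response profiles, whereas concatenation keeps them separate, which is why the labeled line model serves as an upper benchmark for the convergence model.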

Results of this analysis are shown in Fig. 5 for the two models, three classifiers, and four response classes, using the color coding scheme introduced in Fig. 2. We connected the curves based on the simulations to the within-cell classifier results (i.e., for n = 1) shown in Fig. 3 to provide a reference for the increases in performance achieved by pooling additional cells. When data from all cells (black lines) were combined, the labeled line models outperformed the convergence models for each of the classifiers and for all six simulated cell counts (Wilcoxon signed rank, P < 10−25 for every comparison). Note that the signed rank test can result in significant differences even when the median values are equal (e.g., 100%) because the test determines whether the median value of differences in the paired values is different from zero. The advantage for the labeled line model is particularly apparent for the rate-only classifier, which shows only modest improvement (∼10%) for increasing cell counts when the responses are summed across cells compared with the improvement when the average rate information is concatenated (>50%). For the full spike train and phase-only classifiers, the labeled line models typically achieve a criterion level of performance with roughly half as many cells (e.g., 90% with 8 rather than 16 cells). Clearly, combining the responses of cells with diverse response profiles results in the relative loss of information about stimulus duration. Nevertheless, both models achieve essentially perfect duration discrimination with as few as 64 cells, suggesting that such loss can be obviated with even a modest responding population.

Fig. 5.


The sets of curves in this figure indicate the average duration discrimination performance for each classifier type as a function of the number of cells included in the classification process. When the number of cells is 2 or greater, the curves reflect the mean performance averaged over 1,000 random draws of n cells from the database. Vertical lines indicate ±2 SE. The color coding scheme is the same as that introduced in Fig. 2. Heavier line weights indicate results for the full spike train classifier. Results from the phase-only classifier are often difficult to discern because they overlap almost entirely with results from the full spike train classifier. Results for Suppressed responses terminate at n = 16 since there were only 24 cases in total. A: all curves reflect performance of the Convergence model. B: all curves reflect performance of the Labeled Line model. C: these curves indicate the difference in performance between the 2 population coding models (Convergence − Labeled Line). Filled circles on the colored curves indicate significant differences (P < 0.0001) from the black curve representing pooling of data without regard to response group.

We also calculated the performance of the rate classifier using the entire population of neurons in our sample (n = 279), which yielded duration discrimination performance of only 50%. Because Sustained responses showed the best performance with the rate classifier (thin red curve in Fig. 5A), we computed performance for all neurons in this category (n = 95). Surprisingly, performance worsened from ∼60% with 64 neurons to 52% with all of them, indicating that additional pooling is not always advantageous. We then sorted all Sustained neurons by their performance with the rate classifier and pooled them in order from best to worst. Optimal performance (92–94%) was achieved with the best 4–12 Sustained neurons; further pooling produced steady declines to 74% with 79 neurons, and adding the worst 16 neurons caused performance to drop precipitously to 52%. These findings suggest that, if the pooling process is indiscriminate, perfect rate decoding of tone duration cannot necessarily be achieved even with an arbitrarily large population of neurons. They also demonstrate that decoding performance can be substantially improved by careful selection of the neurons admitted into the response pool.

Figure 5 also indicates that discrimination performance for groups of cells categorized in different response classes varied significantly. The colored curves indicate the results of combining data from cells within particular response classes. Suppressed responses (blue) outperformed all other response types for cell counts from 2 to 16 cells (Wilcoxon rank sum, P < 10−6 for all comparisons) for both population coding models. According to the convergence model, Sustained responses (red) uniformly outperformed Mixed responses (purple, P < 10−12 for all comparisons) and outperformed Phasic responses for all combinations up to 32 cells (P < 10−4 for all comparisons). Both the Suppressed and Sustained responses also significantly outperformed the full population curve (black) for all simulated group sizes (P < 10−7 for all comparisons). This likely reflects the fact that both of these response classes are based on sustained responses to tones, unlike the Phasic and predominantly phasic Mixed responses, which more closely match the population curve.

With respect to the interaction between cortical response diversity and population coding, however, the relevant question is how much combining responses within response classes improves duration discrimination performance. To estimate this quantity, we expressed each curve in Fig. 5A relative to its labeled line counterpart in Fig. 5B and plotted the differences in Fig. 5C, to facilitate comparison of in-group results (colored curves) with combinations of neurons drawn from the full population (black curve). These data are based on the full spike train classifier. The V-shaped curve for all cells (black) indicates that, for groups of eight or fewer cells, duration discrimination improves more rapidly with pooling for the labeled line model; thereafter the discrimination deficit declines, since both population coding models asymptote to nearly perfect (>99%) discrimination with 64 cells. The colored curves for the Suppressed (blue), Sustained (red), and Phasic (green) responses show consistently smaller deficits. Thus, pooling over cells with similar responses reduces the information loss relative to the labeled line benchmark. As we observed for neural responses considered individually, within-category decoding strategies significantly, but only partly, overcome the neuron-to-neuron diversity in cortical spiking patterns.

Direct estimation of tone onsets and offsets with single spike trains.

Spike train classification allows us to assign the spike train from each trial to one member of the set of tested durations (see above). This is not the same as directly estimating when a tone begins and ends. For tones aligned to a common onset time (i.e., 0 ms), classification of tone duration is equivalent to classification of tone offset time. Ideally, we would like to estimate both the onset and the offset of gated tones. We would also like to quantify the contribution of physiological transients to demarcating tone boundaries and to compare the physiological salience of onsets with that of offsets. To do so, we developed a novel likelihood estimator that allowed us to assign a likelihood to a given spike train for any specified stimulus onset or offset (see methods). The estimator “knows” the distribution of spike counts expected in narrow (5 ms) bins for intervals defined with respect to the stimulus, derived from all other trials. For the Tonic model, the estimator knows only the distributions of binned spike counts when the stimulus is On and when it is Off. For the Phasic+Tonic model, the estimator also knows the distributions of binned spike counts within 50 ms of stimulus Onset and Offset. In effect, this model exploits the knowledge that rapid envelope transitions at stimulus onset and offset are robustly signaled by physiological transients in many cortical responses. These likelihood models are not intended to simulate a biologically plausible decoding process but, rather, to convert sets of spiking responses obtained in single trials into compact functions that capture how informative those responses are about the temporal boundaries of gated tones.
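A minimal sketch of the Tonic and Phasic+Tonic likelihood estimators for tone offset, under our own assumptions rather than the methods of the paper: the 50 ms Onset and Offset windows start at each edge, empirical count distributions are add-one smoothed (counts above 20 are capped) so that unseen counts never give log(0), and bins are treated as independent so that log probabilities add. Offset is estimated among a discrete set of candidate times with onset fixed at 0 ms, fitting on separate training trials stands in for the all-other-trials cross-validation described above, and the toy onset/offset neuron and all function names are hypothetical.

```python
import math
from collections import Counter

BIN_MS, TRANSIENT_MS, MAX_COUNT = 5, 50, 20

def bin_label(t_ms, onset_ms, offset_ms, phasic):
    """Interval containing the bin that starts at t_ms, for a hypothesized tone."""
    if phasic and onset_ms <= t_ms < onset_ms + TRANSIENT_MS:
        return "onset"
    if phasic and offset_ms <= t_ms < offset_ms + TRANSIENT_MS:
        return "offset"
    if onset_ms <= t_ms < offset_ms:
        return "on"
    return "off"

def fit_tables(trials, phasic, alpha=1.0):
    """trials: (onset_ms, offset_ms, binned_counts) from training trials.
    Returns add-alpha smoothed log-probabilities of each count per interval."""
    hist = {}
    for onset, offset, bins in trials:
        for b, c in enumerate(bins):
            lab = bin_label(b * BIN_MS, onset, offset, phasic)
            hist.setdefault(lab, Counter())[min(c, MAX_COUNT)] += 1
    tables = {}
    for lab, h in hist.items():
        total = sum(h.values()) + alpha * (MAX_COUNT + 1)
        tables[lab] = [math.log((h[k] + alpha) / total) for k in range(MAX_COUNT + 1)]
    return tables

def log_likelihood(bins, onset_ms, offset_ms, tables, phasic):
    """Sum of per-bin log-probabilities, assuming independent bins."""
    uniform = [-math.log(MAX_COUNT + 1)] * (MAX_COUNT + 1)  # unseen interval
    return sum(tables.get(bin_label(b * BIN_MS, onset_ms, offset_ms, phasic), uniform)
               [min(c, MAX_COUNT)] for b, c in enumerate(bins))

def estimate_offset(bins, candidates_ms, tables, phasic, onset_ms=0):
    """Maximum-likelihood tone offset among the candidate times."""
    return max(candidates_ms, key=lambda off: log_likelihood(bins, onset_ms, off, tables, phasic))

def toy_trial(offset_ms, n_bins=120):
    """Hypothetical onset/offset neuron: 3 spikes in the two bins after each edge."""
    bins = [0] * n_bins
    for start in (0, offset_ms // BIN_MS):
        bins[start] = bins[start + 1] = 3
    return bins

candidates = [100, 200, 300]
train = [(0, d, toy_trial(d)) for d in candidates for _ in range(2)]
estimates = {}
for phasic in (False, True):
    tables = fit_tables(train, phasic)
    estimates[phasic] = [estimate_offset(toy_trial(d), candidates, tables, phasic)
                         for d in candidates]
```

For this toy neuron the Phasic+Tonic model (`estimates[True]`) recovers every offset, whereas the Tonic model (`estimates[False]`) does not, because onset and offset bursts land in the On and Off distributions alike; this mirrors the contrast described for the onset/offset neuron in Fig. 6, B and F.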

The inclusion of phasic responses in the Phasic+Tonic model significantly improves the estimation of tone boundaries when phasic responses are salient features of cortical responses. The eight panels at the top of Fig. 6 illustrate the performance of the offset estimators for the four cells depicted in Fig. 1 for the Tonic (Fig. 6, A–D) and Phasic+Tonic (Fig. 6, E–H) models. Each likelihood function is color-coded to reflect the actual stimulus duration. Spike trains from the neuron that responded to tones of varying durations with similar bursts of activity limited to stimulus onset (Fig. 1, A and B) cannot be used to estimate tone offset (Fig. 6, A and E), as indicated by the high degree of overlap among the different likelihood functions. Spike trains from the neuron with well-defined onset and offset responses (Fig. 1, C and D) cannot be used to estimate tone offset with the Tonic model (Fig. 6B): because the onset and offset bursts are similar, the distributions of binned spike counts are similar in the stimulus On and stimulus Off intervals. However, the Phasic+Tonic model differentiates onset and offset responses, resulting in successful estimation of tone offset (Fig. 6F). The neuron depicted in Fig. 1, E and F, was chosen as an exemplar of a sustained response, and, as expected, the Tonic model provides very good estimates of tone offset (Fig. 6C). For the Phasic+Tonic model (Fig. 6G), the estimates are less concentrated but better centered on the actual offset values. Note that we did not compensate for response latency in either model. Finally, spike trains from the neuron that exhibited suppression and offset transients across durations (Fig. 1, G and H) provided useful estimates of tone offset for the Tonic model (Fig. 6D), since all curves show shallow peaks near the actual offset values. The slight slope of the functions reflects the suppression of the responses during the tones, while the sharp transition for offsets exceeding the peak reflects the fact that bursts of activity (i.e., the actual offset responses) are highly unlikely when the tones are On. The Phasic+Tonic model provides much better estimates (Fig. 6H).

Fig. 6.

Estimation of stimulus offset for the Tonic and Phasic+Tonic maximum likelihood models. A–D: the results of the Tonic model when estimating tone duration based on the 4 sets of PSTHs illustrated in Fig. 1. Each curve represents the trial-averaged, sum-normalized likelihood of tone durations from 0 to 450 ms, given the recorded cortical responses (see methods). Gray vertical lines indicate the actual tested durations, and colored vertical lines indicate the maxima of each curve. The colors correspond to the actual tone durations of 50 (red), 100 (yellow), 200 (green), 300 (blue), and 400 ms (purple). E–H: the corresponding curves for the Phasic+Tonic model. I and J: population histograms of the estimated tone durations from each neuron, for each of the tested tone durations. The estimated tone duration was defined as the peak of the likelihood function (identified by vertical lines in A–H). Colored vertical lines indicate the actual tested durations.

Results for the population are illustrated by the histograms in Fig. 6, I and J, which indicate the maxima of the likelihood functions for all neurons for each of the standard durations. The Phasic+Tonic model produced more accurate estimates, as evidenced by the larger proportions of cells with estimated offsets near the actual offset times. We compared the performance of the models by computing the estimation error, defined as the sum, over all candidate offset times, of the distance between each candidate and the actual offset multiplied by the height of the likelihood function at that candidate (see methods). Estimation error was significantly lower for the Phasic+Tonic model than for the Tonic model (median ratio 0.7, Wilcoxon signed rank, P < 10⁻²²²).
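The estimation error and the peak-based estimate can be written compactly. In this sketch (all function names hypothetical), the likelihood function is renormalized to sum to 1, each candidate's distance from the actual boundary is weighted by its likelihood, and the estimate plotted in Fig. 6, I and J, is the location of the maximum. The median-ratio helper assumes paired error values from the two models.

```python
import numpy as np


def peak_estimate(candidates, likelihood):
    """Estimated boundary: the candidate time at the likelihood maximum."""
    return np.asarray(candidates)[np.argmax(likelihood)]


def estimation_error(candidates, likelihood, true_ms):
    """Sum over candidates of |candidate - true| times the normalized likelihood."""
    p = np.asarray(likelihood, dtype=float)
    p = p / p.sum()
    return float(np.sum(np.abs(np.asarray(candidates, dtype=float) - true_ms) * p))


def median_error_ratio(errors_model_a, errors_model_b):
    """Median of paired error ratios (e.g., Phasic+Tonic over Tonic)."""
    return float(np.median(np.asarray(errors_model_a) / np.asarray(errors_model_b)))
```

A flat likelihood function is penalized even when its maximum happens to sit at the correct time, which is why this error measure rewards sharply peaked functions. For the paired comparison between models, a signed-rank test such as `scipy.stats.wilcoxon` could be applied to the per-case errors; that step is omitted here.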

We performed a similar analysis for the estimation of stimulus onsets (see methods). Examples from the neurons depicted in Fig. 1 are shown in Fig. 7, A–H. Where the likelihood functions are sharply peaked, we have included insets with expanded time axes. Spike trains from the neuron with pure onset responses (Fig. 1, A and B) provided poor estimates of the onset for the Tonic model (Fig. 7A); the likelihood functions show long plateaus because all candidate onset values that place the burst of neural activity at the actual stimulus onset within the On interval are equally likely (the exact positions of the lines indicating the maxima reflect small fluctuations in likelihood that are unlikely to be significant). When the Onset and Offset intervals are included, however, this neuron provides effective estimates of the stimulus onset (Fig. 7E), as do the remaining example neurons (Fig. 7, F–H). As was the case for offsets, estimation errors for onsets were significantly smaller for the Phasic+Tonic model (Wilcoxon signed rank, P < 10⁻¹³⁹). Collectively, these maximum likelihood models demonstrate that the bursts of neural activity corresponding to rapidly gated envelope changes are sufficient to detect these “temporal edges” of gated tones with excellent precision.
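For onset estimation, the candidate onset is varied rather than the offset. One way to do this on a circular time axis is sketched below; it assumes (the methods are not reproduced here) that the 1,000 ms trial is treated as a circle, so that the stimulus-aligned interval labels are rotated by the candidate shift and shifts beyond half a trial are reported as negative onsets. Both function names are hypothetical.

```python
import numpy as np

BIN_MS = 5        # bin width used by the likelihood models
TRIAL_MS = 1000   # trial duration; assumed to be treated as a circle of this length


def shifted_labels(aligned_labels, shift_ms):
    """Rotate stimulus-aligned interval labels so the candidate onset falls at shift_ms.

    Bins pushed past the end of the trial wrap around to its beginning."""
    return np.roll(aligned_labels, shift_ms // BIN_MS)


def signed_onset(shift_ms):
    """Report a circular shift in [0, TRIAL_MS) as an onset in [-TRIAL_MS/2, TRIAL_MS/2)."""
    return (shift_ms + TRIAL_MS // 2) % TRIAL_MS - TRIAL_MS // 2
```

Under this assumption, a shift of 995 ms is reported as an onset of −5 ms, which is how estimated onsets can fall on either side of 0 ms.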

Fig. 7.

Estimation of stimulus onset for the Tonic and Phasic+Tonic maximum likelihood models. A–D: the results of the Tonic model when estimating tone onset based on the 4 sets of PSTHs illustrated in Fig. 1. Each curve represents the trial-averaged, sum-normalized likelihood of candidate tone onset times (the actual onset is 0 ms), given the recorded cortical responses (see methods). The inset in C shows the same data on an expanded time axis. The color conventions used in Fig. 6 were retained, even though the correct onset estimate is 0 ms for all tone durations. Estimated onset times are both positive and negative due to the circularization of the time axis (see methods). E–H: the corresponding curves for the Phasic+Tonic model, including inset panels with an expanded time axis. I and J: population histograms of the estimated onset time from each neuron for each of the 5 most commonly tested tone durations. Estimated onset time was defined as the peak of the likelihood function (identified by vertical lines in A–H).

Is the cortical representation of tone onsets more salient than that of tone offsets? Comparison of Figs. 6 and 7 suggests that it is. We examined this issue by comparing the distributions of estimation error (see methods) for tone onsets and tone offsets for both the Tonic and Phasic+Tonic models. Each cell supplies five separate values for the estimation error (i.e., one for each tested duration). Median estimation errors were significantly lower for onsets than for offsets for both the Tonic (Wilcoxon rank sum, P < 10⁻³¹², medians: 27.2 vs. 60.0) and Phasic+Tonic models (P < 10⁻²¹², medians: 8.8 vs. 42.7). Thus, with the Phasic+Tonic model, the median estimation error for onsets was roughly one-fifth that for offsets.

To place these results in context, we computed the span centered on the correct estimate needed to encompass >50% of the area under the likelihood functions (these spans were averaged across the 5 tone durations). The span is small when the likelihood functions are sharply peaked at the correct estimate and large when the peak is mislocated or when the likelihood function is flat because the cell is poorly responsive. For onsets, the bulk (>50%) of the likelihood function area fell within 25 ms of the correct estimate in one-third of the neurons in our sample. For offsets, this was true of only 1.5% of cells. Thus, cortical responses appear to signal the onset of tones from silence far more effectively than they signal their cessation.
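The span measure can be sketched as a search over distances from the correct estimate. This is an illustration under assumptions: the likelihood is sampled at discrete candidate times, the reported value is the half-width (so "within 25 ms" means a half-width of at most 25 ms), and the per-cell value is assumed to be the mean over the 5 tone durations before thresholding. Function names are hypothetical.

```python
import numpy as np


def half_area_halfwidth(candidates, likelihood, true_ms):
    """Smallest distance w such that candidates within w of true_ms hold >50% of the area."""
    c = np.asarray(candidates, dtype=float)
    p = np.asarray(likelihood, dtype=float)
    p = p / p.sum()
    dist = np.abs(c - true_ms)
    for w in np.unique(dist):              # distinct distances, ascending
        if p[dist <= w].sum() > 0.5:
            return float(w)
    return float(dist.max())               # reached only if rounding keeps the total <= 0.5


def fraction_within(halfwidths_per_cell, limit_ms=25.0):
    """Fraction of cells whose duration-averaged half-width is at most limit_ms."""
    return float(np.mean(np.asarray(halfwidths_per_cell) <= limit_ms))
```

A mislocated peak or a flat function both push the half-width outward, matching the two failure modes described above.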

Contextual effects related to the stimulus duty cycle.

Since the trial duration was held constant at 1,000 ms, the changes in stimulus duration produced concomitant changes in the stimulus duty cycle, which varied from 5% for 50 ms tones to 40% for the 400 ms tones. Because cortical neurons are sensitive to the temporal context in which the tones are presented (e.g., Malone et al. 2002), these differences in stimulus duty cycle related to the sequential presentation of the tones could affect the responses we observed.

First, we investigated the effects of stimulus duty cycle by restricting the spike train classification analysis to the final 500 ms of each trial, an interval that begins 100 ms after the offset of the longest tones. Fig. 8A shows the performance of the classifiers when estimating tone duration from data limited to putatively “spontaneous” spikes. A sizeable minority of cortical neurons performed the discrimination at rates significantly above chance (29, 19, and 13% of all neurons for the full spike train, phase-only, and rate-only classifiers, respectively; P < 0.001). However, even for the best cell, whose responses are depicted in Fig. 8B, only half of the trials were assigned to the correct duration. In this case, tones suppressed firing throughout their duration, resulting in persistent offset responses that increased in proportion to the duration of the tones. The cell whose responses are shown in Fig. 8C showed the opposite behavior: increasing tone duration increased the duration of the neural responses and produced a persistent reduction in the spontaneous rate after the driven responses ceased.
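A windowed classification of this kind can be illustrated with a leave-one-out nearest-template classifier. This is a sketch under assumptions, not necessarily the classifier defined in methods: each duration's template is the mean binned response over all other trials of that duration, the distance is Euclidean, the "rate" mode keeps only the spike count in the window, and the "phase" mode keeps only the temporal profile normalized by the trial's spike count. Restricting `window` to the bins covering 500–1,000 ms corresponds to the "spontaneous" analysis. The function name is hypothetical, and each duration needs at least two trials.

```python
import numpy as np


def loo_template_accuracy(responses, labels, window, mode="full"):
    """Leave-one-out nearest-template classification of tone duration.

    responses: (n_trials, n_bins) binned spike counts; labels: duration of each trial;
    window: (first_bin, last_bin) analysis interval, e.g. bins covering 500-1,000 ms.
    mode: "full" (binned spike train), "rate" (spike count only), or
    "phase" (temporal profile normalized by the trial's spike count)."""
    X = np.asarray(responses, dtype=float)[:, window[0]:window[1]]
    if mode == "rate":
        X = X.sum(axis=1, keepdims=True)
    elif mode == "phase":
        totals = X.sum(axis=1, keepdims=True)
        X = np.divide(X, totals, out=np.zeros_like(X), where=totals > 0)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    correct = 0
    for i in range(len(X)):
        others = np.arange(len(X)) != i            # templates use all other trials
        templates = np.array([X[others & (labels == c)].mean(axis=0) for c in classes])
        guess = classes[np.argmin(((templates - X[i]) ** 2).sum(axis=1))]
        correct += int(guess == labels[i])
    return correct / len(X)
```

With five durations, chance is 20%. The contrast between modes matters: two responses that differ only in how many spikes they contain are separable by the full and rate-only variants but not by the phase-only variant.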

Fig. 8.

Contextual modulation of cortical responses permits the discrimination of tone duration from the “spontaneous” rates and onset responses. A: the scatterplot compares performance of the different spike train classifiers when the data are limited to spiking patterns recorded from 500–1,000 ms in each trial, at least 100 ms after cessation of the longest duration tones. To reduce icon overlap, results for the full spike train vs. rate-only classifier comparison (black circles) were shifted 1% to the right in A and D. Results for the full spike train vs. phase-only classifier comparison are indicated by gray circles. The smaller of the 2 gray boxes indicates chance performance, while the larger box indicates performance equivalent to the indicated P value (0.001). B and C show the responses of 2 different cells that exhibited the best discrimination of tone duration on the basis of changes in the spontaneous rates. Responses contributing to the discrimination are indicated by black histogram bars; excluded data are indicated by gray histogram bars. Thin vertical lines on the PSTHs indicate tone offset. DTFs are shown to the right of each panel; vertical lines on the DTFs indicate ±2 SE. Performance of the full spike train classifier is indicated by the inset confusion matrix on each panel, and the percent correct is included at the top right of the matrix. D: the scatterplot is analogous to that in A, but the analysis interval is limited to the first 100 ms of the response, and only durations of 100 ms or greater (n = 4) are discriminated, which shifts both chance and the statistical criterion to larger values, as indicated by the gray boxes. E and F obey the same conventions as B and C.

We also examined the effects of stimulus duty cycle on the magnitude of the responses within 100 ms of tone onset. In this case, we limited the analysis to tone durations of 100, 200, 300, and 400 ms (chance = 25%). Within this analysis interval the stimuli are identical, so they can be differentiated only on the basis of stimulus history: the time elapsed since the offset of the previous tone in the sequence is shorter for longer tones. Fig. 8D shows the performance of the classifiers when estimating tone duration prior to tone offset. Once again, a sizeable minority of cortical neurons exceeded chance performance (37, 21, and 15% of all neurons for the full spike train, phase-only, and rate-only classifiers, respectively; P < 0.001). Performance of the best cell was 67.5% (Fig. 8E), reflecting reduced responsiveness during the initial 100 ms of each trial, which we attributed to incomplete recovery from the response to the preceding stimulus (for trials 2–10). This interpretation is further supported by the clearly reduced spontaneous firing during the last half of each trial (500–1,000 ms), much like the example in Fig. 8F. Thus, stimulus context continues to shape the responses of some cortical neurons for hundreds of milliseconds after direct acoustic stimulation has ceased.
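The arithmetic behind the duty-cycle confound is simple enough to tabulate. This sketch assumes one tone per 1,000 ms trial with blocked presentation, so that the tone preceding each trial has the same duration; the helper name is hypothetical.

```python
TRIAL_MS = 1000  # constant trial duration stated in the text


def context_timing(duration_ms, trial_ms=TRIAL_MS):
    """Duty cycle and silent gap before each tone, assuming one tone per trial
    and blocked presentation (the preceding tone has the same duration)."""
    duty_cycle = duration_ms / trial_ms
    silence_ms = trial_ms - duration_ms   # previous offset to next onset
    return duty_cycle, silence_ms


for d in (50, 100, 200, 300, 400):
    duty, gap = context_timing(d)
    print(f"{d:3d} ms tone: duty cycle {duty:.0%}, {gap} ms since previous offset")
```

Under these assumptions, the silent gap shrinks from 950 ms for 50 ms tones to 600 ms for 400 ms tones, so the first 100 ms of otherwise identical stimuli differ only in how recently the previous tone ended.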

These results imply that the blocked presentation of the tones likely improved classifier performance somewhat when considering the initial 500 ms of each trial. Nevertheless, comparison of the distributions of classifier performance in Fig. 3 against those in Fig. 8 suggests that these effects are likely to be subtle when weighed against duration discrimination performance overall.

DISCUSSION

Cortical spiking activity can provide a robust demarcation of the envelopes of gated tones via mixtures of tonic and phasic responses distributed across different neurons. The relationship between the stimulus envelope and the “physiological envelope” can be complex and exhibits substantial diversity among cortical neurons. We demonstrated that cortical spiking patterns are meaningfully diverse in the context of population-based decoding methods, such that a decoding model based on a given cortical cell typically performs poorly when applied to different cells. Sound onsets are more robustly encoded than sound offsets in the cortex, which may represent a novel neural basis for their greater perceptual salience (see Phillips et al. 2002 for review). Our models incorporating both phasic and tonic response features discriminated onsets and offsets more effectively than those using only tonic response features; this indicates that the nonlinear enhancement of the cortical representation of envelope transients contributes substantially to defining the temporal edges (Phillips et al. 2002) of acoustic stimuli.

Duration tuning versus duration coding.

To place our results in the appropriate context, it is essential to distinguish between duration tuning, typically defined as duration-selective responses as measured by firing rate, and duration coding, defined here as the ability to discriminate signal duration on the basis of neural spiking patterns. Duration-tuned neurons (DTNs) respond more strongly to stimuli of particular durations and have typically been categorized as short pass, band pass, and long pass depending on the location of the preferred duration within their DTF (Faure et al. 2003; Sayegh et al. 2011). Long-pass neurons, in this context, are neurons that do not respond to stimuli below a minimum duration, and not simply neurons that fire more spikes for longer durations. Neurons that fire more spikes in response to longer stimuli (e.g., Fig. 1E) are not considered to be tuned for duration since their responses can be explained “by simple integration of stimulus energy” (Sayegh et al. 2011).

Importantly, it is possible for a system to evidence perfect duration coding in the complete absence of duration tuning. For example, a neuron that produced equivalently robust phasic bursts of activity time-locked to stimulus offset could encode stimulus duration without being “tuned” to it. Our results based on the spike train classifiers indicate that rate-based duration tuning is generally poor, except for sustained responses, which reflect the “simple integration of stimulus energy.” Thus, our results suggest that neurons in the auditory core of awake macaques robustly encode tone duration without being duration tuned over the range of durations we tested.
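The distinction can be made concrete with a toy example. The neuron below is entirely hypothetical (noise-free, with an identical 3-spike burst after tone onset and after tone offset), so its spike count is the same for every duration and it shows no rate-based duration tuning; nevertheless, the interval between the two bursts recovers the duration exactly, because the response latency delays both bursts equally.

```python
import numpy as np


def offset_burst_neuron(duration_ms, latency_ms=15.0, burst_ms=(0.0, 4.0, 8.0)):
    """Hypothetical neuron: the same 3-spike burst after tone onset and after tone offset."""
    onset = [latency_ms + t for t in burst_ms]
    offset = [duration_ms + latency_ms + t for t in burst_ms]
    return np.array(onset + offset)


def decode_duration(spikes):
    """Interval between the first onset-burst spike and the first offset-burst spike."""
    n = len(spikes) // 2
    return spikes[n] - spikes[0]


durations = [50, 100, 200, 300, 400]
trains = {d: offset_burst_neuron(d) for d in durations}
print({d: len(s) for d, s in trains.items()})             # identical spike counts
print({d: decode_duration(s) for d, s in trains.items()})  # recovered durations
```

Real cortical responses are noisy and adapt, so this only illustrates the logical point that timing alone can carry duration information in the absence of rate tuning.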

Most studies of duration coding have focused on rate-based duration selectivity and have relied on stimuli whose durations match self-generated signals in the animal models under study, such as frog vocalizations (Narins and Capranica 1980) or bat echolocation signals (Ehrlich et al. 1997; Faure et al. 2003; Pinheiro et al. 1991). For example, Aubie et al. (2014) decoded spike counts recorded from bat midbrain neurons responding to tones with durations ranging from 1 to 25 ms. The much longer tone durations we presented (50–400 ms) are the inverses of the range of modulation frequencies essential for speech signals (2.5–20 Hz) and known to be robustly represented by cortical neurons in this species (Malone et al. 2007). Moreover, the use of relatively long tones enabled us to assess the extent to which adaptation attenuated crucial response features such as tonic firing, signaling the continued presentation of the tone, or phasic responses signaling tone offset.

Neurons exhibiting duration-selective responses have been characterized in multiple vertebrate species (Aubie et al. 2012; Sayegh et al. 2011), chiefly frogs (Gooler and Feng 1992; Hall and Feng 1986; Narins and Capranica 1980; Potter 1965; Rose 2014) and bats (Aubie et al. 2014; Casseday et al. 1994; Ehrlich et al. 1997; Faure et al. 2003; Fuzessery 1994; Fuzessery and Hall 1999; Galazyuk and Feng 1997; Macías et al. 2011; Mora and Kössl 2004; Pinheiro et al. 1991; Pollak and Schuller 1981; Sayegh et al. 2011, 2014). However, only a handful of studies of duration coding have involved mammalian species that are not bioacoustically specialized for echolocation. These include multiple rodent species (chinchillas, Chen 1998; mice, Brand et al. 2000; guinea pigs, He 2002; rats, Pérez-González et al. 2006) and cats (He et al. 1997; Qin et al. 2009). Genuine duration tuning is rare in the dorsal zone of cat auditory cortex, where most instances consisted of long-pass neurons (He et al. 1997). As noted above, we did not encounter neurons that were truly selective for short signal durations, replicating results in awake cats (Qin et al. 2009). Such tuning preferences were also relatively uncommon in mice (∼17%, Brand et al. 2000) and rats (<10%, Pérez-González et al. 2006). When cortical neurons in our sample exhibited a preference for shorter durations, that preference seemed most parsimoniously explained by contextual sensitivity to the stimulus duty cycle (or equivalently, the time elapsed since the last tone was presented; Fig. 8, D–F). Overall, duration selectivity appears to be relatively rare in nonspecialized mammalian auditory systems, including those of monkeys.

Psychophysical duration discrimination.

The tone durations we presented are trivially easy for humans to distinguish: Weber fractions (Δt/t) range from 0.15 to 0.25 for tones from 50–400 ms (Abel 1972; Rammsayer 2014). Although Weber fractions in macaques for similar base durations are roughly twice those obtained for humans (Sinnott et al. 1987), the durations in this study remain well within the discriminable range for this species. This may explain why perfect duration discrimination was observed in individual cortical neurons, and why perfect discrimination was achieved with relatively few (e.g., <64) neurons in population pooling models with access to spike timing information. We tested a number of cortical neurons with durations intermediate to the five analyzed in this report (e.g., 25, 75, 125, 150, etc.) and found that smaller differences in duration are reliably discriminated by a subset of cells with particularly robust or precise responses. The modal optimal bin width for duration discrimination estimated from the spike train classification procedure (25 ms) was longer than that for amplitude (Malone et al. 2007) and frequency (Malone et al. 2014) modulation frequency discrimination (∼10 ms) in an overlapping neuronal sample. This indicates that cortical neurons benefit from signal averaging up to the temporal resolution limits enforced by the stimulus dynamics (Kayser et al. 2010).

Diverse neural codes for encoding temporal stimulus boundaries.

Duration coding efficacy depends on both phasic (e.g., offset bursts) and tonic (i.e., sustained) response features, and we found that both phasic and tonic encoding styles could be effective in demarcating the temporal boundaries of gated tones. Qin et al. (2009) reported that Sustained responses were more effective than Offset responses when discriminating tone durations from 20 to 320 ms against 10 ms reference tones via PSTH-based ROC analyses. Similarly, we observed that the Tonic response group had the highest percentage of neurons exhibiting significant duration discrimination performance for the rate classifier and the best pooled performance for the rate classifier (Fig. 5A). As Fig. 2 indicates, Mixed neurons with more sustained responses encoded duration more effectively, as did Phasic neurons with more robust offset responses. Our likelihood estimation results comparing the Phasic+Tonic and Tonic models clearly indicate that phasic responses are salient and informative features of the cortical representation of gated tones. Phasic neural responses may have specialized roles in parsing complex auditory scenes (Heil 2003; Hu and Wang 2007; Phillips et al. 2002), perhaps by providing a “reset signal” (Knüsel et al. 2004) or a temporal reference frame for decoding (Brasselet et al. 2012; Panzeri et al. 2014).

The origin and distribution of the different cortical response features (onset, sustained, and offset firing) are of particular interest given their role in defining the neural representation of temporal acoustic edges (Fishbach et al. 2001). Our response categories are broadly consistent with prior reports in awake macaque cortex, which described both phasic and tonic responses and both inhibitory and excitatory offset responses (Pfingst and O'Connor 1981). Recanzone (2000) reported a “continuum of responses,” the most prevalent being “sustained-onset” neurons, which correspond to Mixed neurons in our sample. The greater prevalence of sustained responses in our sample may reflect our use of best frequency and level in most neurons (Wang et al. 2005).

Historically, the prevalence of cortical offset responses may have been underestimated (Phillips et al. 2002) due to the use of barbiturate anesthesia (see Moshitch et al. 2006; Qin et al. 2007). Offset responses in central structures may be inherited from peripheral structures, such as the superior paraolivary nucleus, which has been suggested to be specialized for encoding temporal information given its sharp offset responses to tones, precise entrainment to envelope modulations, and responses to short gaps (Behrend et al. 2002; Felix et al. 2011, 2013; Kadner and Berrebi 2008; Kadner et al. 2006; Kopp-Scheinpflug et al. 2011; Kulesza et al. 2003). This idea is compatible with intracellular recordings that suggest that cortical offset responses reflect the activity of distinct synaptic inputs rather than postinhibitory rebounds (Scholl et al. 2010).

Offset responses provide a means to signal stimulus offsets at a substantially reduced metabolic cost relative to sustained responses. Because sustained responses must persist throughout the stimulus, the efficiency advantage of phasic responses grows with stimulus duration. Our comparison of duration discrimination efficiency among response types showed that a substantial fraction of cortical neurons employ this encoding strategy. However, higher firing rates were associated with better duration discrimination in all response categories, including Phasic and Suppressed responses. If we consider the spontaneous background activity in a given auditory structure as “noise,” these response types could be considered a more efficient alternative to tonic firing for enhancing the “signal to noise” ratio in the representation of acoustic transients.

Onset transients are such salient features of auditory responses (Heil et al. 2003) that they are often excluded to avoid contaminating the analysis of sustained response features, such as synchrony to modulation (e.g., Malone et al. 2013). As is evident in the average cortical response profiles in Fig. 2B, as well as the group averages for Phasic and Mixed responses (Fig. 2, C and D), onset responses are arguably the dominant cortical response feature for gated tones, even in awake animals (see also Qin et al. 2009). These results are also consistent with the observation that the rising phase of the amplitude envelope of modulated signals elicits the highest firing rates in many cortical neurons (Malone et al. 2007; Wang et al. 2014; Zhou and Wang 2010). Comparison of Figs. 6 and 7 helps place the relative salience of cortical offset and onset responses in perspective and strongly suggests that cortical response profiles are adequate to support perceptual grouping based on temporal cues such as common onsets and are consistent with well-known perceptual asymmetries favoring onsets over offsets (Phillips et al. 2002).

An important related question regarding cortical response types is whether they reflect underlying neuronal types or shift with stimulus parameters such as frequency and level. For example, Tian et al. (2013) recently used the patterns of onset and offset responses elicited by band-pass noises of differing center frequencies to create a taxonomy of cortical neurons analogous to simple and complex cells in the visual system. Lin and Liu (2010) identified a physiologically distinct class of short latency, highly precise, putatively inhibitory “thin spike” neurons that preferentially encoded stimulus onsets. He (2002) reported anatomically segregated “On” and “Off” pathways in the auditory thalamus of the guinea pig. In that sample, however, On-Off neurons sometimes changed their response type with stimulus changes, while On and Off neurons remained consistent to type. However, the demonstration that phase-only classifiers outperform rate-only classifiers not just for modulated stimuli (Malone et al. 2007, 2010, 2014), but also for “static” tones varying in level (Malone et al. 2010) and frequency (Malone et al. 2014) provides evidence of stimulus-driven reorganization of cortical spiking patterns, and not simply rescaling of otherwise consistent response profiles.

Population coding with temporally heterogeneous cortical spiking patterns.

Template-based decoding methods (Foffani and Moxon 2004), such as the classifiers used here, incorporate knowledge about each cell's characteristic response profile. As we demonstrated, these response profiles are often distinct across neurons and reproducible across trials, permitting us to identify particular spike trains with the neurons that produced them better than half the time. In the context of population coding, however, the heterogeneity of cortical response profiles poses a challenge: How can cortical response features be related to stimulus features if such mappings vary significantly from neuron to neuron? As we show, the effectiveness of template-based spike train decoding depends significantly on the appropriateness of the decoding model: decoding models using templates derived from alien response types are often poor (Fig. 4B). Nevertheless, our results also suggest that with a sufficient neuronal population, the suboptimal aspects of indiscriminate pooling can be overcome when spike timing information is preserved (Fig. 5). When relying solely on firing rates, however, indiscriminate pooling of many neurons may not always be sufficient to account for psychophysical performance (e.g., Micheyl et al. 2013), particularly if the discrimination task is explicitly based on timing, as was the case here.

The implications of response heterogeneity on neural population codes are the subject of intense experimental and theoretical interest in multiple sensory systems (Abbott and Dayan 1999; Averbeck et al. 2006; Buonomano and Maass 2009; Butts and Goldman 2006; Chelaru and Dragoi 2008; Ganguli and Simoncelli 2014; Mejias and Longtin 2012; Osborne et al. 2008; Padmanabhan and Urban 2010; Ringach 2010; Shamir and Sompolinsky 2006; Tripathy et al. 2013; Zheng and Escabi 2008). However, much of this work focuses on differences among rate-defined tuning functions, rather than on the temporal differences among spiking patterns we have chosen to highlight here. Auditory receptive fields are known to be temporally dynamic: the frequency/intensity response areas of single neurons have been shown to vary for onset, sustained, and offset activity (Bartho et al. 2009; Fishman and Steinschneider 2009; Loftus and Sutter 2001; Qin et al. 2003, 2007; Tian et al. 2013). As a consequence, correlations in response timing and not merely response tuning may influence cortical organization at the microcircuit level to optimize the representation of important stimulus features (Atencio and Schreiner 2013; Bathellier et al. 2012; Hansen et al. 2012; Ince et al. 2013; Padmanabhan and Urban 2014; Sadovsky and MacLean 2014).

Response type-specific decoding models outperformed a generic decoding model based on the average cortical response profile for gated tones. This suggests that diversity in the response profiles of cortical neurons is an important constraint on network connectivity. Such diversity may be relevant for sophisticated decoding strategies that presuppose different encoding styles for distinct neural ensembles. For example, tonic firing among a group of neurons to an ongoing stimulus may be necessary to disambiguate the offset of that stimulus, signaled by a phasic burst of activity, from the onset of a novel stimulus, which would also be signaled by a phasic burst of activity in many neurons. The asymmetry in the robustness of the representation of onsets and offsets could potentially contribute to this process. The fact that Suppressed responses were associated with stimuli that were further from best frequency and level may also be relevant here, since such suppression could help foreground the responses of those neurons that are best tuned for the current sound. Given the significant diversity of cortical responses elicited by simple gated tones, our results suggest that this diversity is leveraged by multiple, circuit-specific processing strategies to represent and parse complex auditory scenes.

GRANTS

B. J. Malone was supported by National Institute of Mental Health Grant MH-12993-02 and National Institutes of Health/Deafness and Communication Disorders Grant DC-011843. B. H. Scott was supported by National Institute on Deafness and Other Communication Disorders Grant DC-05287-01 and a James Arthur Fellowship from New York University. M. N. Semple was supported by the W. M. Keck Foundation.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

AUTHOR CONTRIBUTIONS

Author contributions: B.J.M. and M.N.S. conception and design of research; B.J.M., B.H.S., and M.N.S. performed experiments; B.J.M. analyzed data; B.J.M. and B.H.S. interpreted results of experiments; B.J.M. prepared figures; B.J.M. drafted manuscript; B.J.M. and B.H.S. edited and revised manuscript; B.J.M., B.H.S., and M.N.S. approved final version of manuscript.

ACKNOWLEDGMENTS

We thank Bruno Averbeck, Jonathan Pillow, and Joseph Gerard Makin for helpful discussions during the preparation of the manuscript.

REFERENCES

  1. Abbott L, Dayan P. The effect of correlated variability on the accuracy of a population code. Neural Comput 11: 91–101, 1999.
  2. Abel SM. Duration discrimination of noise and tone bursts. J Acoust Soc Am 51: 1219–1223, 1972.
  3. Atencio CA, Schreiner CE. Auditory cortical local subnetworks are characterized by sharply synchronous activity. J Neurosci 33: 18503–18514, 2013.
  4. Aubie B, Sayegh R, Faure PA. Duration tuning across vertebrates. J Neurosci 32: 6373–6390, 2012.
  5. Aubie B, Sayegh R, Fremouw T, Covey E, Faure PA. Decoding stimulus duration from neural responses in the auditory midbrain. J Neurophysiol 112: 2432–2445, 2014.
  6. Averbeck BB, Latham PE, Pouget A. Neural correlations, population coding and computation. Nat Rev Neurosci 7: 358–366, 2006.
  7. Bartho P, Curto C, Luczak A, Marguet SL, Harris KD. Population coding of tone stimuli in auditory cortex: dynamic rate vector analysis. Eur J Neurosci 30: 1767–1778, 2009.
  8. Bathellier B, Ushakova L, Rumpel S. Discrete neocortical dynamics predict behavioral categorization of sounds. Neuron 76: 435–449, 2012.
  9. Bee M, Micheyl C. The “cocktail-party problem”: What is it? How can it be solved? And why should animal behaviorists study it? J Comp Psychol 122: 235–251, 2008.
  10. Behrend O, Brand A, Kapfer C, Grothe B. Auditory response properties in the superior paraolivary nucleus of the gerbil. J Neurophysiol 87: 2915–2928, 2002.
  11. Brand A, Urban R, Grothe B. Duration tuning in the mouse auditory midbrain. J Neurophysiol 84: 1790–1799, 2000.
  12. Brasselet R, Panzeri S, Logothetis NK, Kayser C. Neurons with stereotyped and rapid responses provide a reference frame for relative temporal coding in primate auditory cortex. J Neurosci 32: 2998–3008, 2012.
  13. Bregman AS. Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press, 1990.
  14. Bregman AS, Ahad PA, Kim J. Resetting the pitch-analysis system. 2. Role of sudden onsets and offsets in the perception of individual components in a cluster of overlapping tones. J Acoust Soc Am 96: 2694–2703, 1994.
  15. Buonomano DV, Maass W. State-dependent computations: spatiotemporal processing in cortical networks. Nat Rev Neurosci 10: 113–125, 2009.
  16. Butts DA, Goldman MS. Tuning curves, neuronal variability, and sensory coding. PLoS Biol 4: e92, 2006.
  17. Casseday JH, Ehrlich D, Covey E. Neural tuning for sound duration: role of inhibitory mechanisms in the inferior colliculus. Science 264: 847–850, 1994.
  18. Chelaru MI, Dragoi V. Efficient coding in heterogeneous neuronal populations. Proc Natl Acad Sci USA 105: 16344–16349, 2008.
  19. Chen GD. Effects of stimulus duration on responses of neurons in the chinchilla inferior colliculus. Hear Res 122: 142–150, 1998.
  20. Ehrlich D, Casseday JH, Covey E. Neural tuning to sound duration in the inferior colliculus of the big brown bat, Eptesicus fuscus. J Neurophysiol 77: 2360–2372, 1997.
  21. Elhilali M, Ma L, Micheyl C, Oxenham AJ, Shamma SA. Temporal coherence in the perceptual organization and cortical representation of auditory scenes. Neuron 61: 317–329, 2009.
  22. Faure PA, Fremouw T, Casseday JH, Covey E. Temporal masking reveals properties of sound-evoked inhibition in duration-tuned neurons of the inferior colliculus. J Neurosci 23: 3052–3065, 2003.
  23. Felix RA 2nd, Fridberger A, Leijon S, Berrebi AS, Magnusson AK. Sound rhythms are encoded by postinhibitory rebound spiking in the superior paraolivary nucleus. J Neurosci 31: 12566–12578, 2011.
  24. Felix RA 2nd, Vonderschen K, Berrebi AS, Magnusson AK. Development of on-off spiking in superior paraolivary nucleus neurons of the mouse. J Neurophysiol 109: 2691–2704, 2013.
  25. Feng AS, Hall JC, Gooler DM. Neural basis of sound pattern recognition in anurans. Prog Neurobiol 34: 313–329, 1990.
  26. Fishbach A, Nelken I, Yeshurun Y. Auditory edge detection: a neural model for physiological and psychoacoustical responses to amplitude transients. J Neurophysiol 85: 2303–2323, 2001.
  27. Fishman YI, Steinschneider M. Temporally dynamic frequency tuning of population responses in monkey primary auditory cortex. Hear Res 254: 64–76, 2009.
  28. Fishman YI, Steinschneider M. Neural correlates of auditory scene analysis based on inharmonicity in monkey primary auditory cortex. J Neurosci 30: 12480–12494, 2010.
  29. Foffani G, Moxon KA. PSTH-based classification of sensory stimuli using ensembles of single neurons. J Neurosci Methods 135: 107–120, 2004.
  30. Fremouw T, Faure PA, Casseday JH, Covey E. Duration selectivity of neurons in the inferior colliculus of the big brown bat: tolerance to changes in sound level. J Neurophysiol 94: 1869–1878, 2005.
  31. Fuzessery ZM. Response selectivity for multiple dimensions of frequency sweeps in the pallid bat inferior colliculus. J Neurophysiol 72: 1061–1079, 1994.
  32. Fuzessery ZM, Hall JC. Sound duration selectivity in the pallid bat inferior colliculus. Hear Res 137: 137–154, 1999.
  33. Galazyuk AV, Feng AS. Encoding of sound duration by neurons in the auditory cortex of the little brown bat, Myotis lucifugus. J Comp Physiol A 180: 301–311, 1997.
  34. Ganguli D, Simoncelli EP. Efficient sensory encoding, and Bayesian inference with heterogeneous neural populations. Neural Comput 26: 2103–2134, 2014.
  35. Gooler DM, Feng AS. Temporal coding in the frog auditory midbrain: the influence of duration and rise-fall time on the processing of complex amplitude-modulated stimuli. J Neurophysiol 67: 1–22, 1992.
  36. Gutschalk A, Dykstra AR. Functional imaging of auditory scene analysis. Hear Res 307: 98–110, 2014.
  37. Gutschalk A, Micheyl C, Melcher JR, Rupp A, Scherg M, Oxenham AJ. Neuromagnetic correlates of streaming in human auditory cortex. J Neurosci 25: 5382–5388, 2005.
  38. Hall J, Feng AS. Neural analysis of temporally patterned sounds in the frog's thalamus: processing of pulse duration and pulse repetition rate. Neurosci Lett 63: 215–220, 1986.
  39. Hansen BJ, Chelaru MI, Dragoi V. Correlated variability in laminar cortical circuits. Neuron 76: 590–602, 2012.
  40. He J, Hashikawa T, Ojima H, Kinouchi Y. Temporal integration and duration tuning in the dorsal zone of cat auditory cortex. J Neurosci 17: 2615–2625, 1997.
  41. He J. OFF responses in the auditory thalamus of the guinea pig. J Neurophysiol 88: 2377–2386, 2002.
  42. Heil P. Coding of temporal onset envelope in the auditory system. Speech Commun 41: 123–134, 2003.
  43. Hu G, Wang D. Auditory segmentation based on onset and offset analysis. IEEE Trans Audio Speech Lang Process 15: 396–405, 2007.
  44. Ince RA, Panzeri S, Kayser C. Neural codes formed by small and temporally precise populations in auditory cortex. J Neurosci 33: 18277–18287, 2013.
  45. Kadner A, Berrebi AS. Encoding of temporal features of auditory stimuli in the medial nucleus of the trapezoid body and superior paraolivary nucleus of the rat. Neuroscience 151: 868–887, 2008.
  46. Kadner A, Kulesza RJ Jr, Berrebi AS. Neurons in the medial nucleus of the trapezoid body and superior paraolivary nucleus of the rat may play a role in sound duration coding. J Neurophysiol 95: 1499–1508, 2006.
  47. Kayser C, Logothetis NK, Panzeri S. Millisecond encoding precision of auditory cortex neurons. Proc Natl Acad Sci USA 107: 16976–16981, 2010.
  48. Knüsel P, Wyss R, König P, Verschure PF. Decoding a temporal population code. Neural Comput 16: 2079–2100, 2004.
  49. Kopp-Scheinpflug C, Tozer AJ, Robinson SW, Tempel BL, Hennig MH, Forsythe ID. The sound of silence: ionic mechanisms encoding sound termination. Neuron 71: 911–925, 2011.
  50. Kulesza RJ Jr, Spirou GA, Berrebi AS. Physiological response properties of neurons in the superior paraolivary nucleus of the rat. J Neurophysiol 89: 2299–2312, 2003.
  51. Lin FG, Liu RC. Subset of thin spike cortical neurons preserve the peripheral encoding of stimulus onsets. J Neurophysiol 104: 3588–3599, 2010.
  52. Loftus WC, Sutter ML. Spectrotemporal organization of excitatory and inhibitory receptive fields of cat posterior auditory field neurons. J Neurophysiol 86: 475–491, 2001.
  53. Ma WJ, Beck JM, Latham PE, Pouget A. Bayesian inference with probabilistic population codes. Nat Neurosci 9: 1432–1438, 2006.
  54. Macías S, Mora EC, Hechavarría JC, Kössl M. Duration tuning in the inferior colliculus of the mustached bat. J Neurophysiol 106: 3119–3128, 2011.
  55. Malone BJ, Beitel RE, Vollmer M, Heiser MA, Schreiner CE. Spectral context affects temporal processing in awake auditory cortex. J Neurosci 33: 9431–9450, 2013.
  56. Malone BJ, Scott BH, Semple MN. Context-dependent adaptive coding of interaural phase disparity in the auditory cortex of awake macaques. J Neurosci 22: 4625–4638, 2002.
  57. Malone BJ, Scott BH, Semple MN. Dynamic amplitude coding in the auditory cortex of awake rhesus macaques. J Neurophysiol 98: 1451–1474, 2007.
  58. Malone BJ, Scott BH, Semple MN. Temporal codes for amplitude contrast in auditory cortex. J Neurosci 30: 767–784, 2010.
  59. Malone BJ, Scott BH, Semple MN. Encoding frequency contrast in primate auditory cortex. J Neurophysiol 111: 2244–2263, 2014.
  60. Marr D. Vision: a Computational Investigation into the Human Representation and Processing of Visual Information. New York: Freeman, 1982.
  61. Mejias JF, Longtin A. Optimal heterogeneity for coding in spiking neural networks. Phys Rev Lett 108: 228102, 2012.
  62. Micheyl C, Carlyon RP, Gutschalk A, Melcher JR, Oxenham AJ, Rauschecker JP, Tian B, Courtenay Wilson E. The role of auditory cortex in the formation of auditory streams. Hear Res 229: 116–131, 2007.
  63. Micheyl C, Kreft H, Shamma S, Oxenham AJ. Temporal coherence versus harmonicity in auditory stream formation. J Acoust Soc Am 133: 188–194, 2013.
  64. Micheyl C, Schrater PR, Oxenham AJ. Auditory frequency and intensity discrimination explained using a cortical population rate code. PLoS Comput Biol 9: e1003336, 2013.
  65. Mora EC, Kössl M. Ambiguities in sound-duration selectivity by neurons in the inferior colliculus of the bat Molossus molossus from Cuba. J Neurophysiol 91: 2215–2226, 2004.
  66. Moshitch D, Las L, Ulanovsky N, Bar-Yosef O, Nelken I. Responses of neurons in primary auditory cortex (A1) to pure tones in the halothane-anesthetized cat. J Neurophysiol 95: 3756–3769, 2006.
  67. Narins PM, Capranica RR. Neural adaptations for processing the two-note call of the Puerto Rican treefrog, Eleutherodactylus coqui. Brain Behav Evol 17: 48–66, 1980.
  68. Osborne LC, Palmer SE, Lisberger SG, Bialek W. The neural basis for combinatorial coding in a cortical population response. J Neurosci 28: 13522–13531, 2008.
  69. Padmanabhan K, Urban NN. Intrinsic biophysical diversity decorrelates neuronal firing while increasing information content. Nat Neurosci 13: 1276–1282, 2010.
  70. Padmanabhan K, Urban NN. Disrupting information coding via block of 4-AP sensitive potassium channels. J Neurophysiol 112: 1054–1066, 2014.
  71. Panzeri S, Ince RA, Diamond ME, Kayser C. Reading spike timing without a clock: intrinsic decoding of spike trains. Philos Trans R Soc Lond B Biol Sci 369: 20120467, 2014.
  72. Pelleg-Toiba R, Wollberg Z. Tuning properties of auditory cortex cells in the awake squirrel monkey. Exp Brain Res 74: 353–364, 1989.
  73. Pérez-González D, Malmierca MS, Moore JM, Hernández O, Covey E. Duration selective neurons in the inferior colliculus of the rat: topographic distribution and relation of duration sensitivity to other response properties. J Neurophysiol 95: 823–836, 2006.
  74. Pfingst BE, O'Connor TA. Characteristics of neurons in auditory cortex of monkeys performing a simple auditory task. J Neurophysiol 45: 16–34, 1981.
  75. Phillips DP, Hall SE, Boehnke SE. Central auditory onset responses and temporal asymmetries in auditory perception. Hear Res 167: 192–205, 2002.
  76. Pinheiro AD, Wu M, Jen PHS. Encoding repetition rate and duration in the inferior colliculus of the big brown bat, Eptesicus fuscus. J Comp Physiol A 169: 69–85, 1991.
  77. Plack CJ, White LJ. Perceived continuity and pitch perception. J Acoust Soc Am 108: 1162–1169, 2000.
  78. Pollak GD, Schuller G. Tonotopic organization and encoding features of single units in inferior colliculus of horseshoe bats: functional implications for prey identification. J Neurophysiol 45: 208–226, 1981.
  79. Potter HD. Patterns of acoustically evoked discharges of neurons in the mesencephalon of the bullfrog. J Neurophysiol 28: 1155–1184, 1965.
  80. Qin L, Chimoto S, Sakai M, Wang J, Sato Y. Comparison between offset and onset responses of primary auditory cortex ON-OFF neurons in awake cats. J Neurophysiol 97: 3421–3431, 2007.
  81. Qin L, Kitama T, Chimoto S, Sakayori S, Sato Y. Time course of tonal frequency-response-area of primary auditory cortex neurons in alert cats. Neurosci Res 46: 145–152, 2003.
  82. Qin L, Liu Y, Wang J, Li S, Sato Y. Neural and behavioral discrimination of sound duration by cats. J Neurosci 29: 15650–15659, 2009.
  83. Rammsayer TH. The effects of type of interval, sensory modality, base duration, and psychophysical task on the discrimination of brief time intervals. Atten Percept Psychophys 76: 1185–1196, 2014.
  84. Recanzone GH. Response profiles of auditory cortical neurons to tones and noise in behaving macaque monkeys. Hear Res 150: 104–118, 2000.
  85. Recanzone GH, Engle JR, Juarez-Salinas DL. Spatial and temporal processing of single auditory cortical neurons and populations of neurons in the macaque monkey. Hear Res 271: 115–122, 2011.
  86. Ringach DL. Population coding under normalization. Vision Res 50: 2223–2232, 2010.
  87. Rose GJ. Time computations in anuran auditory systems. Front Physiol 5: 206, 2014.
  88. Sadovsky AJ, MacLean JN. Mouse visual neocortex supports multiple stereotyped patterns of microcircuit activity. J Neurosci 34: 7769–7777, 2014.
  89. Sayegh R, Aubie B, Faure PA. Duration tuning in the auditory midbrain of echolocating and non-echolocating vertebrates. J Comp Physiol A 197: 571–583, 2011.
  90. Sayegh R, Casseday JH, Covey E, Faure PA. Monaural and binaural inhibition underlying duration-tuned neurons in the inferior colliculus. J Neurosci 34: 481–492, 2014.
  91. Schneider DM, Woolley SM. Discrimination of communication vocalizations by single neurons and groups of neurons in the auditory midbrain. J Neurophysiol 103: 3248–3265, 2010.
  92. Schnupp JW, Hall TM, Kokelaar RF, Ahmed B. Plasticity of temporal pattern codes for vocalization stimuli in primary auditory cortex. J Neurosci 26: 4785–4795, 2006.
  93. Scholl B, Gao X, Wehr M. Nonoverlapping sets of synapses drive On responses and Off responses in auditory cortex. Neuron 65: 412–421, 2010.
  94. Scott BH, Malone BJ, Semple MN. Effect of behavioral context on representation of a spatial cue in core auditory cortex of awake macaques. J Neurosci 27: 6489–6499, 2007.
  95. Scott BH, Malone BJ, Semple MN. Representation of dynamic interaural phase difference in auditory cortex of awake rhesus macaques. J Neurophysiol 101: 1781–1799, 2009.
  96. Scott BH, Malone BJ, Semple MN. Transformation of temporal processing across auditory cortex of awake macaques. J Neurophysiol 105: 712–730, 2011.
  97. Seriès P, Latham PE, Pouget A. Tuning curve sharpening for orientation selectivity: Coding efficiency and the impact of correlations. Nat Neurosci 7: 1129–1135, 2004.
  98. Shamir M, Sompolinsky H. Implications of neuronal diversity on population coding. Neural Comput 18: 1951–1986, 2006.
  99. Shamma S, Elhilali M, Ma L, Micheyl C, Oxenham AJ, Pressnitzer D, Yin P, Xu Y. Temporal coherence and the streaming of complex sounds. Adv Exp Med Biol 787: 535–543, 2013.
  100. Sinnott JM, Owren MJ, Petersen MR. Auditory duration discrimination in Old World monkeys (Macaca, Cercopithecus) and humans. J Acoust Soc Am 82: 465–470, 1987.
  101. Teki S, Chait M, Kumar S, Shamma S, Griffiths TD. Segregation of complex acoustic scenes based on temporal coherence. Elife 2: e00699, 2013.
  102. Teki S, Chait M, Kumar S, von Kriegstein K, Griffiths TD. Brain bases for auditory stimulus-driven figure-ground segregation. J Neurosci 31: 164–171, 2011.
  103. Tian B, Kusmierek P, Rauschecker JP. Analogues of simple and complex cells in rhesus monkey auditory cortex. Proc Natl Acad Sci USA 110: 7892–7897, 2013.
  104. Tripathy SJ, Padmanabhan K, Gerkin RC, Urban NN. Intermediate intrinsic diversity enhances neural population coding. Proc Natl Acad Sci USA 110: 8248–8253, 2013.
  105. Wang J, Qin L, Chimoto S, Tazunoki S, Sato Y. Response characteristics of primary auditory cortex neurons underlying perceptual asymmetry of ramped and damped sounds. Neuroscience 256: 309–321, 2014.
  106. Wang X, Lu T, Snider RK, Liang L. Sustained firing in auditory cortex evoked by preferred stimuli. Nature 435: 341–346, 2005.
  107. Wilson EC, Melcher JR, Micheyl C, Gutschalk A, Oxenham AJ. Cortical FMRI activation to sequences of tones alternating in frequency: relationship to perceived rate and streaming. J Neurophysiol 97: 2230–2238, 2007.
  108. Yost WA. Auditory image perception and analysis: the basis for hearing. Hear Res 56: 8–18, 1991.
  109. Zheng Y, Escabí MA. Distinct roles for onset and sustained activity in the neuronal code for temporal periodicity and acoustic envelope shape. J Neurosci 28: 14230–14244, 2008.
  110. Zhou Y, Wang X. Cortical processing of dynamic sound envelope transitions. J Neurosci 30: 16741–16754, 2010.

Articles from Journal of Neurophysiology are provided here courtesy of American Physiological Society
