Journal of Neurophysiology. 2017 May 10;118(2):1034–1054. doi: 10.1152/jn.00152.2017

Background noise exerts diverse effects on the cortical encoding of foreground sounds

B J Malone 1, Marc A Heiser 2, Ralph E Beitel 1, Christoph E Schreiner 1,3,4
PMCID: PMC5547268  PMID: 28490644

Keywords: auditory, cortex, hearing, neural coding, noise

Abstract

In natural listening conditions, many sounds must be detected and identified in the context of competing sound sources, which function as background noise. Traditionally, noise is thought to degrade the cortical representation of sounds by suppressing responses and increasing response variability. However, recent studies of neural network models and brain slices have shown that background synaptic noise can improve the detection of signals. Because acoustic noise affects the synaptic background activity of cortical networks, it may improve the cortical responses to signals. We used spike train decoding techniques to determine the functional effects of a continuous white noise background on the responses of clusters of neurons in auditory cortex to foreground signals, specifically frequency-modulated sweeps (FMs) of different velocities, directions, and amplitudes. Whereas the addition of noise progressively suppressed the FM responses of some cortical sites in the core fields with decreasing signal-to-noise ratios (SNRs), the stimulus representation remained robust or was even significantly enhanced at specific SNRs in many others. Even though the background noise level was typically not explicitly encoded in cortical responses, significant information about noise context could be decoded from cortical responses on the basis of how the neural representation of the foreground sweeps was affected. These findings demonstrate significant diversity in signal-in-noise processing even within the core auditory fields that could support noise-robust hearing across a wide range of listening conditions.

NEW & NOTEWORTHY The ability to detect and discriminate sounds in background noise is critical for our ability to communicate. The neural basis of robust perceptual performance in noise is not well understood. We identified neuronal populations in core auditory cortex of squirrel monkeys that differ in how they process foreground signals in background noise and that may contribute to robust signal representation and discrimination in acoustic environments with prominent background noise.


The central auditory system must typically process behaviorally relevant sounds in the presence of competing sound sources, which act as background noise for foreground signals of interest. The presence of confounding noise sources can reduce the ability of humans and other animals to discriminate between sounds (e.g., Miller et al. 1951; Miller 1974). Temporal cortex lesions render patients particularly susceptible to the effects of noise on auditory perceptual performance (Olsen et al. 1975), suggesting that the cortex plays a central role in the processing of sounds in noise. The underlying neural mechanisms remain unclear, despite an increasing research focus on this fundamental problem (Bar-Yosef and Nelken 2007; Hulse et al. 1997; Mesgarani et al. 2014; Moore et al. 2013; Nagarajan et al. 2002; Narayan et al. 2007; Rabinowitz et al. 2013; Schneider and Woolley 2013; Teschner et al. 2016).

Background noise produces a marked reduction in synchronization of primary auditory cortex (AI) neurons to phrases of primate vocalizations (Nagarajan et al. 2002). Background sounds also degrade neuronal responses to birdsong in the avian homolog of AI in a manner consistent with decrements in behavioral discrimination performance (Narayan et al. 2007). Nevertheless, central representations of acoustic signals can be surprisingly robust to the presence of noise. Relative to neurons in more peripheral structures, central neurons have been shown to preferentially encode particular classes of complex sounds at the expense of background noise, resulting in neural representations that have been described as “noise invariant” (Moore et al. 2013; Rabinowitz et al. 2013).

Studies of AI responses to tones have emphasized how such neurons adapt to the presence of background noise (Ehret and Schreiner 2000; Phillips and Cynader 1985; Phillips et al. 1985; Phillips 1990; Sadagopan and Wang 2008), where it has been reported that response thresholds and latencies increased linearly with the noise masker level, and rate-level functions shifted toward higher sound levels. These data suggest that cortical neurons shift their operating characteristics to match the background context in which signals occur. The essential question for our purposes is how background noise-mediated shifts in responsivity affect the transmission and representation of spectrotemporally distinct foreground signals. We investigated this issue for a class of relatively simple dynamic sounds, frequency-modulated (FM) sweeps, that resemble the FM components of twitter calls, an important class of vocalization with behavioral relevance to squirrel monkeys. FM sweeps were presented against backgrounds of continuous white noise at varying signal-to-noise ratios (SNRs) while cortical responses were recorded from awake squirrel monkeys. We demonstrate that cortical neurons exhibit diverse response behaviors for different SNRs. One group showed a monotonic decrease of the foreground response with decreasing SNR, whereas for another group, the presence of noise at moderate SNRs often improved the representation of the FM sweeps as assessed by the performance of a stimulus classifier based on linear decoding of cortical responses. This suggests that some cortical neurons are tuned for signal processing in specific noisy conditions and could contribute to a robust population representation that maintains discrimination abilities over a wide range of SNRs. Because speech signals commonly occur in background noise, the data presented have significant implications for understanding the cortical mechanisms of speech processing in challenging acoustic environments.

METHODS

Surgical Preparation

All procedures related to the maintenance and use of animals in this study were approved by the Institutional Animal Care and Use Committee of the University of California, San Francisco (UCSF) and followed guidelines of the National Institutes of Health for the care and use of laboratory animals. The methodological details for these experiments have been described in a previous report (Malone et al. 2013) but are briefly recapitulated here. Two adult female squirrel monkeys (Saimiri sciureus) participated in these experiments. The animals were group housed in custom-built cages located within a temperature-controlled facility. They were provided a nutritionally complete, species-appropriate diet, including ad libitum access to water, supplemented by additional enrichment by the Laboratory Animal Resource Center (LARC) and preferred treats from the experimenters. The animals’ welfare was assessed daily by LARC staff and during and after each recording session by the experimenters. When necessary, interventions were discussed in consultation with UCSF veterinarians and administered by LARC and the experimenters.

Monkeys were trained to sit quietly in a restraint chair. Animals were then implanted with head posts to allow for head fixation during physiological recording. During all surgical procedures, anesthesia was induced with ketamine (25 mg/kg im) and midazolam (0.1 mg/kg), and the animals were maintained in a steady plane of anesthesia using isoflurane gas (0.5–5%). Implants were secured to the skull using bone screws and dental acrylic. After animals were trained to sit in the primate chair with their head fixed to a frame, they underwent a second surgery to implant a recording chamber over auditory cortex. The temporal muscle was resected, the cranium overlying auditory cortex was exposed, and a 10-mm-diameter ring was secured using bone screws and dental acrylic. Perioperative pain management included local application of bupivacaine, as well as buprenorphine (0.01–0.03 mg/kg) and meloxicam (0.3 mg/kg) as needed, and in consultation with veterinary staff in the LARC.

Sterile procedures were used to expose and record from auditory cortex. A 2- to 3-mm burr hole was drilled and a small incision made in the dura after application of a drop of 1% lidocaine. After several recording sessions in a burr hole, another burr hole was drilled and the recording process was repeated. Burr holes were sometimes enlarged or connected by removing bone following application of lidocaine as needed to expose additional areas of auditory cortex. After each recording session, the chamber was filled with antibiotic ointment and sealed with a metal cap.

Electrophysiology

All recordings were made in a soundproof chamber (Industrial Acoustics, Bronx, NY). During each recording session, the animal was seated comfortably in a custom-built primate chair with its head fixed to a frame while stimuli were presented. Animals were continuously monitored on a closed circuit camera throughout the recording session to ensure that they did not exhibit signs of distress or changes in alertness. Recording sessions typically lasted ~2–3 h.

Data were obtained using 16-channel linear electrodes (177-µm² contact size, 100- or 150-µm spacing) from NeuroNexus Technologies (Ann Arbor, MI). An electrode was advanced into cortex using a microdrive (David Kopf Instruments, Tujunga, CA) to the depth at which most channels were active (tip depth of ~1–2 mm from the depth of first spontaneous activity identified audiovisually). We attempted to orient all penetrations perpendicularly to the surface of the exposed cortex, but this was not possible at some recording locations. Because not all penetrations were orthogonal to the cortical surface, the depth of the recordings cannot be related in a straightforward way to the laminar organization. Changes in best frequency along the length of the electrodes suggest that different penetrations covered significantly different “horizontal” distances in the cortex. This frustrates analyses relating neural response properties (e.g., sensitivity to background noise level) to laminar location.

Electrical signals from the brain were amplified using a 16-channel pre-amplifier (RA16 Medusa; Tucker-Davis Technologies, Alachua, FL), bandpass filtered (600–7,000 Hz), and recorded using an RX-5 amplifier and Brainware software (Tucker-Davis Technologies) on a personal computer. Brainware was used for online estimation of neural responsiveness and tuning, and raw waveforms were sampled at 25 kHz.

In the squirrel monkey, the core auditory fields (primary auditory cortex, AI; field R, R) are located on the surfaces of the temporal gyrus and in the supratemporal plane of the lateral sulcus. The location of our recordings within auditory cortex was determined physiologically by the characteristics of core auditory neurons including vigorous pure tone responses, short response latencies, and a tonotopic gradient in the rostrocaudal dimension (Cheung et al. 2001; Malone et al. 2013, 2015b; Scott et al. 2011).

Sets of trials for a given SNR were presented in blocks of “runs.” As a result, the entire duration of the recording session could be quite long (>1 h). For this reason, we report multiunit responses, because tracking individual spike waveforms over such long intervals proved impractical (Malone et al. 2015b). We defined multiunit activity at a given recording site as the time points where the filtered voltage waveform exhibited positive voltage deflections greater than 3.5 SD of its amplitude distribution. By collecting the data in this way, we maximized our ability to compare responses across the six different background noise conditions typically used to vary the SNR at each site (−20, −15, −10, −5, 0, and +∞ dB; i.e., without overt noise).
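The threshold criterion for multiunit activity described above can be illustrated with a minimal sketch. It assumes that the threshold is the mean of the filtered trace plus 3.5 SD and that an event is registered at each upward crossing of that threshold; neither detail is specified in the text, and the function name and toy trace are hypothetical.

```python
import statistics

def detect_multiunit_events(voltage, sample_rate_hz, n_sd=3.5):
    """Return times (s) at which the trace crosses mean + n_sd * SD upward.

    Assumption: threshold = mean + n_sd * SD of the whole trace, and one
    event per upward crossing. The text only states "positive voltage
    deflections greater than 3.5 SD of its amplitude distribution".
    """
    mean = statistics.fmean(voltage)
    sd = statistics.pstdev(voltage)
    threshold = mean + n_sd * sd
    events = []
    for i in range(1, len(voltage)):
        # Upward crossing: previous sample at or below threshold, current above.
        if voltage[i - 1] <= threshold < voltage[i]:
            events.append(i / sample_rate_hz)
    return events

# Toy trace: flat baseline with two large positive deflections.
trace = [0.0] * 1000
trace[200] = 10.0
trace[600] = 12.0
events = detect_multiunit_events(trace, sample_rate_hz=25_000)
print(events)
```

Counting crossings rather than every suprathreshold sample avoids registering a single deflection several times; whether the original analysis did the same is not stated.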

Frequency-Modulated Sweeps

Logarithmic FM sweeps (Fig. 1) were created in MATLAB (The MathWorks, Natick, MA) and delivered via an RP2.1 device (Tucker-Davis Technologies). FM sweep frequency ranged from 50 to 21,000 Hz in upward and downward directions with rates of frequency change of 5, 10, 20, 35, 60, 85, and 110 octaves/s. Corresponding sweep lengths were the following: 1743, 871, 436, 249, 145, 103, and 79 ms. Because the rate 5 octaves/s was not included in all experiments, we confine the bulk of our analyses to the six sweep speeds ≥10 octaves/s in both the ascending and descending directions. This allows decoding results across different electrode penetrations to be compared directly.
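The listed sweep lengths follow directly from the frequency range and the sweep rates: a sweep from 50 to 21,000 Hz spans log2(21,000/50) ≈ 8.71 octaves, and the duration is that span divided by the rate. The short sketch below reproduces the arithmetic; the function names are illustrative and are not taken from the original stimulus-generation code.

```python
import math

F_LOW_HZ = 50.0
F_HIGH_HZ = 21_000.0
RATES_OCT_PER_S = [5, 10, 20, 35, 60, 85, 110]

def sweep_duration_ms(rate_oct_per_s, f_low=F_LOW_HZ, f_high=F_HIGH_HZ):
    """Duration of a logarithmic sweep: octave span divided by sweep rate."""
    octaves = math.log2(f_high / f_low)
    return 1000.0 * octaves / rate_oct_per_s

def instantaneous_frequency_hz(t_s, rate_oct_per_s, ascending=True,
                               f_low=F_LOW_HZ, f_high=F_HIGH_HZ):
    """Frequency at time t_s; it doubles (or halves) every 1/rate seconds."""
    if ascending:
        return f_low * 2.0 ** (rate_oct_per_s * t_s)
    return f_high * 2.0 ** (-rate_oct_per_s * t_s)

durations = [round(sweep_duration_ms(r)) for r in RATES_OCT_PER_S]
print(durations)
```

The rounded durations agree with the values reported in the text for all seven rates.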

Fig. 1.

Frequency-modulated (FM) sweep stimuli. A: spectrogram of a squirrel monkey twitter call containing FM harmonic sweeps. White curves added to the spectrogram show logarithmic FM sweeps of different velocities (52, 30, and 17 octaves/s), illustrating the overlap of the chosen FM parameter range with the range of frequency modulation rates present in squirrel monkey vocalizations. B: graph showing the upward and downward FM sweep set, respectively, as functions of time and frequency. At the beginning and end of each sweep is a constant-frequency period of 100 ms.

Each FM sweep contained a 100-ms constant tone at the start and end frequencies. These constant frequency portions had a 5-ms cosine ramp up and ramp down at the beginning and end of the stimulus. FM sweep stimuli were most often played at 60 dB sound-pressure level (SPL). Slight energy-matching compensations in sweep level were made for the different sweep velocities such that sweeps slower than 35 octaves/s were attenuated by an additional 1–3 dB and those faster than 35 octaves/s were attenuated by 1 dB less. Although the noise was continuous, data were collected in trials lasting 2,500 ms. Because the noise was presented continuously within each block, however, repetitions of identical sweeps occurred in different noise contexts.

Each stimulus was repeated 20 times in pseudorandom order with a minimum intertrial interval of 500 ms. In some experiments, FM sweep levels (dB SPL) were also varied for the ascending sweep at 35 octaves/s as follows: 20, 30, 40, 50, 60, 71, and 85 dB SPL (as measured at the position of the external ear). Sweep level was varied in this way for 181 of 321 recording sites, and changes in both sweep level (dB) and trajectory (ID) were pseudorandomly interleaved.

The FM sweeps were presented in silence and in the presence of continuous white noise at different SNRs. We refer to the presentation of the sweeps in silence as the SNR+∞ condition. Different SNR conditions were presented in blocks, and within each block the presentation order of the different sweeps was pseudorandom, as noted above. Order of the SNR conditions was varied across penetrations to reduce order effects related to changes in recording conditions over time. The tested SNRs were 0, −5, −10, −15, and −20 dB, which corresponded to noise levels of 60, 65, 70, 75, and 80 dB SPL for sweep levels of 60 dB. More rarely, SNRs of −25 and −30 dB were used, but these values were excluded in most analyses to facilitate direct comparisons across sites. Because the amplitudes of the sweeps were fixed in the context of sweep trajectory discrimination, SNR condition and background noise level are jointly determined, and the terms can be used interchangeably.

One FM sweep condition (the ascending 35 octaves/s sweep) was also tested at several stimulus levels (20, 30, 40, 50, 60, 71, and 85 dB). The absolute background noise levels studied for this condition were the same as for the full FM sweep set (60, 65, 70, 75, and 80 dB SPL). These values correspond to the tested SNRs for the sweep trajectory stimulus set for the 60-dB sweep amplitude. Because the SNR varies for different sweeps in the context of sweep amplitude discrimination, we refer exclusively to background noise level when describing the results.

Stimulus Delivery and Measurement of Response Areas

All sounds in this study were presented using a free-field speaker (Sony SS-MB150H) placed directly in front of the animal. Distance from the front of the speaker to the interaural line was 40 cm. The sound delivery system was calibrated using a sound-level meter and SigCal software (Tucker-Davis Technologies). Levels were measured using a Brüel & Kjær (Norcross, GA) model 2209 meter with an A-weighted decibel filter and a model 4192 microphone.

As in prior studies (Malone et al. 2013, 2015b), the frequency-response areas (FRAs) of AI neurons were determined by presenting tone bursts of varying frequency and intensity generated using SigGen software (Tucker-Davis Technologies) and played in random order. Tone frequencies spanned 1 to 4 octaves (semitone or tone spacing) centered on the estimated center frequency of the site under study. Tones had 5-ms on and off cosine-squared ramps.

Spike counts were calculated for the duration of the tone pips, and the spontaneous rates were calculated over a similar duration toward the end of each trial. The duration of the tone pips varied across experiments (50, 100, and 500 ms), and the interval used to calculate the spontaneous rates began after all responses to the tones had ceased. Spike rates less than 5 SD above the spontaneous rate were set to zero. We collapsed the response area matrix across stimulus level to generate the frequency tuning function (FTF) and identified the peak as the best frequency (BF). To verify that the FTF represented significant tuning, we compared the variance of the actual FTF to variances computed for simulated FTFs obtained by random columnwise (i.e., frequency) reassignment of the spike rates in the response area matrix. BF estimates were used only when the likelihood of the actual variance was <0.001 (i.e., fewer than one simulated FTF in a thousand resulted in a larger variance than was observed in the actual data). In a few cases where objective frequency tuning data were unavailable, the BF was estimated on the basis of online plotting of the response area (Brainware) during the experiment. We verified that these online BF estimates and those obtained by the procedure described above produced essentially identical results when the data were available and the responses to pure tones were robust.
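The variance-based test of frequency tuning described above can be sketched as a simple shuffle test. The sketch assumes that "columnwise reassignment" means shuffling the frequency labels independently within each sound level before collapsing across level; the text does not spell this out, so this is one plausible reading. The response area, frequencies, and function names are invented for illustration, and the number of simulations is reduced for speed.

```python
import random
import statistics

def frequency_tuning_function(response_area):
    """Collapse a level x frequency matrix across level (sum each column)."""
    return [sum(column) for column in zip(*response_area)]

def best_frequency(response_area, frequencies_hz):
    """Frequency at the peak of the frequency tuning function."""
    ftf = frequency_tuning_function(response_area)
    return frequencies_hz[ftf.index(max(ftf))]

def tuning_p_value(response_area, n_sim=1000, seed=0):
    """Fraction of shuffled FTFs whose variance exceeds the actual variance.

    Assumption: frequency labels are shuffled independently in each row.
    """
    rng = random.Random(seed)
    actual_var = statistics.pvariance(frequency_tuning_function(response_area))
    n_larger = 0
    for _ in range(n_sim):
        shuffled = [rng.sample(row, len(row)) for row in response_area]
        sim_var = statistics.pvariance(frequency_tuning_function(shuffled))
        if sim_var > actual_var:
            n_larger += 1
    return n_larger / n_sim

# Toy response area: 4 levels (rows) x 8 frequencies (columns), tuned to 2 kHz.
frequencies_hz = [1000, 1260, 1587, 2000, 2520, 3175, 4000, 5040]
area = [
    [0, 0, 1, 3, 1, 0, 0, 0],
    [0, 0, 2, 6, 2, 0, 0, 0],
    [0, 1, 3, 9, 3, 1, 0, 0],
    [1, 2, 4, 12, 4, 2, 1, 0],
]
bf = best_frequency(area, frequencies_hz)
p = tuning_p_value(area, n_sim=200)
print(bf, p)
```

Under this reading, a site whose peak stays at the same frequency across levels yields a collapsed tuning function with high variance, whereas shuffling scatters the peaks and flattens it.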

Spike Train Classification and Stimulus Decoding

We used a nearest-neighbor linear decoder (Foffani and Moxon 2004) to estimate which FM condition elicited the response on each trial. Details for this procedure have been provided in previous reports using the same methodology (Malone et al. 2007, 2010, 2013, 2014, 2015a, 2015b; Teschner et al. 2016). Briefly, responses to each stimulus were averaged across trials to form a response template for that stimulus and then binned to form a bin-dimensional vector representing the response across time. Individual trials were similarly binned and compared with the templates by computing the Euclidean distance between the trial and the template vectors, and the stimulus associated with the nearest template in the response space was identified. We used a complete cross validation procedure that ensured that no trial was compared with a template that included the response from that trial. This analysis generates a confusion matrix whose columns represent the actual stimulus and whose rows represent the stimulus selected by the decoding algorithm. Each cell in the matrix indicates the number of times (out of 20, for 20 trials) that a spike train elicited by the actual stimulus (column) was assigned to each estimated stimulus (row). Correctly identified trials result in matrix entries along the diagonal, so the proportion of correctly decoded trials is the sum of the diagonal entries divided by the total number of trials. We refer to the percentage of trials whose responses were correctly assigned to the stimulus that elicited them as the decoding accuracy.
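The decoding procedure can be summarized in a minimal sketch, which is not the authors' MATLAB implementation. It assumes leave-one-out cross validation (the test trial is removed only from the template of its own stimulus) and that ties go to the first template encountered; the function names and toy data are hypothetical.

```python
import math

def bin_spikes(spike_times_ms, bin_ms, duration_ms=2000):
    """Convert spike times into a vector of spike counts per time bin."""
    n_bins = int(duration_ms // bin_ms)
    counts = [0.0] * n_bins
    for t in spike_times_ms:
        if 0 <= t < duration_ms:
            counts[min(int(t // bin_ms), n_bins - 1)] += 1
    return counts

def mean_vector(vectors):
    """Element-wise mean of equally long vectors (the response template)."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]

def decode(responses):
    """responses[s][k] is the binned response to stimulus s on trial k.

    Returns confusion[estimated][actual]; needs at least 2 trials per stimulus.
    """
    n_stim = len(responses)
    confusion = [[0] * n_stim for _ in range(n_stim)]
    for actual, trials in enumerate(responses):
        for k, test in enumerate(trials):
            best, best_dist = 0, math.inf
            for s, stim_trials in enumerate(responses):
                # Exclude the test trial from the template of its own stimulus.
                pool = [v for j, v in enumerate(stim_trials)
                        if not (s == actual and j == k)]
                dist = math.dist(test, mean_vector(pool))
                if dist < best_dist:
                    best, best_dist = s, dist
            confusion[best][actual] += 1
    return confusion

def decoding_accuracy(confusion):
    """Percentage of trials on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total

# Toy data: 2 stimuli, 3 trials each, 2 time bins per response.
toy = [
    [[5.0, 0.0], [4.0, 0.0], [6.0, 1.0]],
    [[0.0, 5.0], [1.0, 4.0], [0.0, 6.0]],
]
confusion = decode(toy)
print(confusion, decoding_accuracy(confusion))
```

Because every column of the confusion matrix sums to the number of trials per stimulus, accuracy can be read directly from the diagonal.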

We report the percentage of correctly classified trials for the binning resolution (2, 5, 10, 20, 50, 100, and 2,000 ms) that maximized decoder performance, except when the distribution of optimal bin sizes is the focus of the analysis (e.g., see Fig. 11). It is important to compute decoding performance at a range of bin sizes because cortical neurons vary in their temporal precision and firing rates, and usage of binning resolutions that are too narrow or too wide can result in underestimates of the information provided about the stimuli by cortical spiking patterns.

Fig. 11.

Distributions of the optimal temporal resolution for FM sweep trajectory and amplitude decoding across SNR/noise level. A: each entry in the matrix indicates how often an SNR condition (rows) coincided with a particular value for the optimal bin width (columns). Open bars above the matrix indicate the marginal distribution of optimal binwidths. B: each entry in the matrix is a z score indicating the likelihood that the corresponding matrix entry in A would have occurred given the marginal distribution of optimal binwidths but a random association of optimal binwidths and SNR conditions. C and D: these matrices show results analogous to those in A and B for the decoding of sweep amplitude across different noise level conditions. Sweep ID data from −20 to ∞ dB (i.e., matrix rows) were obtained at 307, 273, 307, 295, 316, and 321 sites, respectively (A and C). Sweep dB data from 80 to −20 dB were obtained at 167, 181, 167, 181, 181, and 181 sites, respectively (B and D).

Because the duration of the analysis interval was 2,000 ms, binning the responses at 2,000 ms (i.e., in a single bin) results in a decoder that relies entirely on average firing rate information, which we term the rate classifier. The analysis interval was chosen to be sufficiently long to include the entire duration of all FM sweeps, as well as after-discharges elicited by the offset of the sweeps. It is also possible to remove differences in average firing rates across stimuli while retaining information about differences in how spikes are distributed in time by normalizing each test and each template by its respective vector norm. As a result, responses to different stimuli are mapped to an equivalent distance from the origin in the response space. We refer to this as the phase classifier.
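The difference between the rate and phase classifiers comes down to how each binned response is transformed before distances are computed. The sketch below illustrates the two transformations; the function names are hypothetical, and leaving an all-zero response at the origin is an assumption, since the text does not say how empty responses were handled.

```python
import math

def rate_vector(binned_response):
    """Single-bin representation: only the total spike count is retained."""
    return [sum(binned_response)]

def phase_vector(binned_response):
    """Divide by the vector norm so only the temporal pattern is retained.

    Assumption: an all-zero response is left unchanged (it has no norm).
    """
    norm = math.sqrt(sum(x * x for x in binned_response))
    if norm == 0:
        return list(binned_response)
    return [x / norm for x in binned_response]

# Two responses with the same temporal pattern but different firing rates.
weak = [1.0, 2.0, 0.0]
strong = [2.0, 4.0, 0.0]
print(rate_vector(weak), rate_vector(strong))    # rate classifier sees a difference
print(phase_vector(weak), phase_vector(strong))  # phase classifier does not
```

After normalization every nonzero response lies at unit distance from the origin, so differences between responses reflect only how spikes are distributed across bins.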

We assigned significance values for decoding accuracy by comparing the actual confusion matrices obtained for each SNR condition and site to a pregenerated set of simulated confusion matrices. Simulated confusion matrices of the appropriate size were populated by random draws from a uniform distribution spanning the number of elements in the stimulus set (e.g., 12 for sweep trajectory, 7 for sweep amplitude). The number of random draws for each stimulus was equal to the number of experimental trials, so each column of the simulated confusion matrix summed to the number of trials (i.e., n = 20). The decoding accuracy for each genuine confusion matrix was compared against the distribution of simulated decoding accuracies, and P values were assigned by counting the number of simulated accuracies that exceeded the actual accuracy and dividing by the total number of simulated values (n = 100,000). When results that involved confusion matrices of different sizes were compared, classifier performance was standardized as z scores relative to the distributions obtained by simulating confusion matrices of the appropriate size.
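The simulation of chance-level confusion matrices can be sketched as follows. The sketch uses far fewer simulated matrices than the 100,000 reported so that it runs quickly in pure Python, and the function names are hypothetical.

```python
import random
import statistics

def simulated_accuracies(n_stimuli, n_trials=20, n_sim=2000, seed=0):
    """Percent correct for confusion matrices filled by uniform random draws."""
    rng = random.Random(seed)
    total = n_stimuli * n_trials
    accuracies = []
    for _ in range(n_sim):
        correct = 0
        for actual in range(n_stimuli):
            for _trial in range(n_trials):
                # Each trial is assigned to a stimulus chosen uniformly at random.
                if rng.randrange(n_stimuli) == actual:
                    correct += 1
        accuracies.append(100.0 * correct / total)
    return accuracies

def p_value(actual_accuracy, simulated):
    """Fraction of simulated accuracies that exceed the actual accuracy."""
    return sum(a > actual_accuracy for a in simulated) / len(simulated)

def z_score(actual_accuracy, simulated):
    """Standardize accuracy relative to the simulated chance distribution."""
    return (actual_accuracy - statistics.fmean(simulated)) / statistics.pstdev(simulated)

null_12 = simulated_accuracies(n_stimuli=12)
print(statistics.fmean(null_12))   # close to chance, 100 / 12
print(p_value(15.4, null_12), z_score(15.4, null_12))
```

Standardizing against a null distribution of the matching size is what allows accuracies for the 12-sweep and 7-level stimulus sets to be compared on a common scale.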

When computing significance for results from the phase and full spike train classifiers at the optimal bin size, we corrected for a potential bias introduced by taking the maximum value across multiple bin sizes by taking the maximum value from an equivalent number of draws from the distribution of decoding performance for the simulated confusion matrices. Thus the significance criterion (P < 0.0001) is 15.4% for the rate classifier for the set of 12 sweeps (chance = 8.3%) and 16.3% for the full spike train and phase classifiers. For the seven tested sweep levels (chance = 14.3%), the analogous values are 25.7% and 27.2%, respectively.
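The correction for selecting the best bin size can be illustrated by building a second null distribution from maxima. In this sketch the number of bin sizes is a parameter, because the text does not state how many bin sizes entered the correction for each classifier; the value of 7 used in the example, the reduced simulation counts, and all function names are assumptions.

```python
import random

def chance_accuracies(n_stimuli, n_trials, n_sim, rng):
    """Percent correct under random assignment (each trial correct with p = 1/n)."""
    total = n_stimuli * n_trials
    return [
        100.0 * sum(rng.randrange(n_stimuli) == 0 for _ in range(total)) / total
        for _ in range(n_sim)
    ]

def max_corrected_null(null_accuracies, n_bin_sizes, n_sim, rng):
    """Null distribution of the best accuracy across n_bin_sizes random draws."""
    return [
        max(rng.choice(null_accuracies) for _ in range(n_bin_sizes))
        for _ in range(n_sim)
    ]

def p_value(actual_accuracy, null):
    """Fraction of null values that exceed the actual accuracy."""
    return sum(a > actual_accuracy for a in null) / len(null)

rng = random.Random(0)
null = chance_accuracies(n_stimuli=12, n_trials=20, n_sim=2000, rng=rng)
corrected = max_corrected_null(null, n_bin_sizes=7, n_sim=2000, rng=rng)
uncorrected_p = p_value(12.0, null)
corrected_p = p_value(12.0, corrected)
print(uncorrected_p, corrected_p)
```

Because the maximum of several chance draws is biased upward, the same observed accuracy receives a larger P value under the corrected null, which is why the criterion for the full spike train and phase classifiers is higher than for the rate classifier.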

To assess whether changes in the SNR resulted in discriminable response changes, we generated composite confusion matrices (CCMs) that combined the data across multiple SNRs. This allows for estimating errors based on misidentification of the SNR, as well as errors based on misidentification of the sweep trajectory, or sweep level.

Definition of SNR response classes.

We defined three different FM response classes for decreasing SNRs. To determine whether a given channel showed a “monotonic,” “nonmonotonic,” or “equivalent” change in FM response, we compared the decoding performance in each SNR condition with noise to the decoding performance for the SNR+∞ condition. We performed Monte Carlo tests for significance by segregating the 20 trials recorded during the experiments into 2 sets of 10 trials and computing decoding performance for the split (10 trial) data separately. We repeated this process 20 times and then computed d′ for the difference in decoding performance we observed across shuffles. We repeated this process 10 times for every SNR condition and channel in the data set. The resulting distribution of d′ values is thus reflective of the magnitude of sampling error to be expected given the trial-to-trial variability in the data overall.

We generated d′ estimates with identical procedures (i.e., using 10 trials and 20 shuffles) for each SNR condition referenced to the SNR+∞ condition. Because the likelihood of a d′ value with an absolute value greater than 2.1415 was less than 0.00006, decoding functions with a d′ value greater than 2.1415 were considered significantly nonmonotonic, decoding functions with d′ values less than −2.1415 for all SNR conditions ≤0 dB were considered monotonic, and decoding functions that contained at least one d′ value greater than −2.1415 but no values greater than 2.1415 were defined as equivalent. All analyses were conducted at the optimal temporal resolution for the given SNR condition.
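The classification rule above can be written compactly. The sketch assumes that a positive d′ means better decoding in noise than in the SNR+∞ condition, and it uses the common equal-variance form of d′ (difference of means divided by the root mean of the two variances); the text does not give the exact formula, so both are assumptions, as are the function names.

```python
import math
import statistics

D_PRIME_CRITERION = 2.1415

def d_prime(noise_accuracies, quiet_accuracies):
    """Assumed form: (mean_noise - mean_quiet) / sqrt(mean of the variances)."""
    pooled_sd = math.sqrt(
        0.5 * (statistics.variance(noise_accuracies)
               + statistics.variance(quiet_accuracies))
    )
    diff = statistics.fmean(noise_accuracies) - statistics.fmean(quiet_accuracies)
    if pooled_sd == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd

def classify_site(d_primes, criterion=D_PRIME_CRITERION):
    """d_primes: one value per noise condition (SNR <= 0 dB) vs. SNR+inf."""
    if any(d > criterion for d in d_primes):
        return "nonmonotonic"   # significantly enhanced at some SNR
    if all(d < -criterion for d in d_primes):
        return "monotonic"      # significantly degraded at every SNR
    return "equivalent"         # at least one SNR not significantly degraded

print(d_prime([10.0, 12.0, 14.0], [20.0, 22.0, 24.0]))
print(classify_site([-5.0, -4.0, -3.0, -2.5, -3.0]))
print(classify_site([-3.0, 0.5, 2.5, -1.0, -4.0]))
print(classify_site([-1.0, 0.0, 1.0, -3.0, -0.5]))
```

The order of the checks matters: a site with one significantly enhanced SNR is labeled nonmonotonic even if decoding is significantly degraded at other SNRs.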

Computing expected ratios of SNR condition “confusion errors.”

The essential question is whether the proportion of SNR condition confusion errors (trials assigned to the proper sweep trajectory but not to the proper SNR condition) is less than that expected by chance; if so, this indicates that the SNR condition shapes rather than gates the cortical representation of the sweeps. We compared the ratio of correctly assigned trials to such confusion errors. The expected ratios vary as a function of the size of the composite confusion matrices (e.g., for 2 conditions, the expected ratio for random assignment is 1:1; for 3 conditions, the expected ratio is 1:2, etc.). The CCM sizes varied because the number of tested SNR conditions varied across recording sites. We obtained distributions of the expected ratios by generating confusion matrices via random assignment. Note that although increases in decoding performance will increase the proportion of matrix entries on both the main and minor diagonals relative to nondiagonal entries, the ratios do not change. We then compared the actual ratio of main diagonal entries to minor diagonal entries with the distribution of ratios obtained via Monte Carlo simulations and assigned a P value according to the number of simulated ratios that exceeded the actual ratio divided by the shuffle count (n = 100,000).
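The ratio of main-diagonal to minor-diagonal entries can be computed as in the following sketch. It assumes that composite labels are ordered so that all sweeps of one SNR condition are contiguous (label = condition × number of sweeps + sweep), which places trials with the correct sweep but the wrong SNR condition on minor diagonals. The toy matrix, the reduced simulation count, and the function names are illustrative only.

```python
import math
import random
import statistics

def diagonal_counts(ccm, n_sweeps):
    """Return (main, minor): fully correct trials and sweep-only-correct trials."""
    n_labels = len(ccm)
    main = minor = 0
    for est in range(n_labels):
        for act in range(n_labels):
            if est == act:
                main += ccm[est][act]
            elif est % n_sweeps == act % n_sweeps:
                minor += ccm[est][act]   # right sweep, wrong SNR condition
    return main, minor

def simulated_ratios(n_conditions, n_sweeps, n_trials=20, n_sim=300, seed=0):
    """Main:minor ratios for composite matrices filled by random assignment."""
    rng = random.Random(seed)
    n_labels = n_conditions * n_sweeps
    ratios = []
    for _ in range(n_sim):
        main = minor = 0
        for act in range(n_labels):
            for _trial in range(n_trials):
                est = rng.randrange(n_labels)
                if est == act:
                    main += 1
                elif est % n_sweeps == act % n_sweeps:
                    minor += 1
        ratios.append(main / minor if minor else math.inf)
    return ratios

def ratio_p_value(actual_ratio, ratios):
    """Fraction of simulated ratios that exceed the actual ratio."""
    return sum(r > actual_ratio for r in ratios) / len(ratios)

# Toy composite matrix: 2 SNR conditions x 2 sweeps, 20 trials per column.
toy_ccm = [
    [15, 1, 4, 0],
    [0, 14, 1, 5],
    [5, 0, 15, 1],
    [0, 5, 0, 14],
]
main, minor = diagonal_counts(toy_ccm, n_sweeps=2)
p = ratio_p_value(main / minor, simulated_ratios(2, 2))
print(main, minor, p)
print(statistics.median(simulated_ratios(3, 12)))   # near 1:2 for 3 conditions
```

With k conditions, each sweep-correct trial has one fully correct cell and k − 1 minor-diagonal cells available, which is why random assignment gives an expected ratio of 1:(k − 1).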

Assigning significance for shifts in the optimal temporal resolution for decoding.

As mentioned above, analysis was performed at seven different temporal binwidths. We ordered the distributions of optimal binwidths for decoding stimulus identity for each of the tested SNRs into a single matrix (see Fig. 11). For the analysis of temporal resolution, we eliminated the final column corresponding to a rate code, leaving a 6 × 6 matrix. If the optimal temporal resolution for decoding shifted to larger bins for less favorable SNRs, we would expect that matrix entries would cluster in the lower left and upper right quadrants, rather than the upper left and lower right quadrants. We summed the matrix entries in each pair of quadrants and computed the difference between sums as the test statistic for a permutation test. We generated 100,000 similar matrices by scrambling the SNR condition labels assigned to each optimal bin width such that each simulated matrix preserved the marginal distributions of optimal bin width and SNR condition, but not the relationship between them, and computed the test statistic. We assigned a significance value by counting the number of times the actual value for the test statistic exceeded the values obtained for simulated matrices and dividing by the number of iterations (i.e., 100,000).
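The quadrant-based permutation test can be sketched as follows. The sketch assumes that matrix rows run from the least favorable SNR (top) to SNR+∞ (bottom) and that columns run from the narrowest to the widest bin, so that a shift toward wider bins at lower SNRs populates the upper right and lower left quadrants. The observations are invented, the number of permutations is reduced for speed, and the function names are hypothetical.

```python
import random

def quadrant_statistic(pairs, n_levels=6):
    """(lower left + upper right) minus (upper left + lower right).

    pairs: (snr_index, bin_index) per site and condition, each in 0..n_levels-1;
    snr_index 0 is the least favorable SNR, bin_index 0 the narrowest bin.
    """
    half = n_levels // 2
    stat = 0
    for snr_idx, bin_idx in pairs:
        low_snr = snr_idx < half     # upper rows of the matrix
        wide_bin = bin_idx >= half   # right-hand columns of the matrix
        stat += 1 if low_snr == wide_bin else -1
    return stat

def permutation_fraction(pairs, n_perm=2000, seed=0):
    """Fraction of label shuffles in which the actual statistic is larger.

    Shuffling SNR labels across observations preserves both marginal
    distributions but breaks the SNR/bin width association.
    """
    rng = random.Random(seed)
    actual = quadrant_statistic(pairs)
    snr_labels = [s for s, _ in pairs]
    bin_labels = [b for _, b in pairs]
    n_exceeded = 0
    for _ in range(n_perm):
        rng.shuffle(snr_labels)
        if actual > quadrant_statistic(list(zip(snr_labels, bin_labels))):
            n_exceeded += 1
    return n_exceeded / n_perm

# Toy data: wider optimal bins at lower SNRs for every observation.
toy_pairs = [(0, 5), (1, 4), (2, 3), (3, 2), (4, 1), (5, 0)] * 10
stat = quadrant_statistic(toy_pairs)
fraction = permutation_fraction(toy_pairs)
print(stat, fraction)
```

A fraction close to 1 indicates that the observed clustering is rarely matched when SNR labels and optimal bin widths are associated at random.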

Fig. 10.

Noise-evoked firing rates and SNR decoding performance. A: waterfall plot of the noise-evoked firing rate for all tested SNR conditions. Firing rate was measured in the interval 2–2.5 s after trial onset, i.e., during a portion of the continuous noise stimulus but excluding responses to the FM sweeps. B: SNR decoding performance (z scored) excluding (trial interval 2–2.5 s) vs. including (trial interval 0–2 s) responses to the FM sweeps (n = 321 sites).

All data analysis was performed using MATLAB (The MathWorks). When analyzing population distributions of continuous variables, we compared those distributions via nonparametric Wilcoxon rank-sum tests unless otherwise stated. Correlations were quantified in terms of the Pearson product-moment coefficient.

RESULTS

Summary of Data Sample

The data in this report are derived from clusters of neurons recorded on 321 distinct channels during 28 penetrations using linear 16-channel probes. Penetrations were made into core auditory cortex located in three hemispheres of two alert adult squirrel monkeys. The data described in this report were obtained as part of a series of neurophysiological recordings described in prior publications (Malone et al. 2013, 2015b). We included all data when we were able to obtain responses for 20 repeated trials to each stimulus in the set of FM sweeps (6 velocities, 2 sweep directions) across a 20-dB span of noise levels and SNRs in addition to FM sweep responses in the absence of noise (SNR+∞). This simplified the analytical procedures and ensured that the decoding algorithms could be universally applied without the need for adjustment due to missing trials or stimulus conditions.

Examples of Cortical Responses to FM Sweeps

The spectrogram in Fig. 1A represents a segment of a squirrel monkey twitter call and indicates the prominent FM components of such calls. Examples of logarithmic FM sweeps (white curves) have been added to provide context for the velocity of the FM components of the call. These curves demonstrate that the range of sweep speeds we presented is reasonably well matched to the modulation contours of twitter calls. Figure 1 illustrates the frequency trajectories of the ascending (Fig. 1B, top) and descending (Fig. 1B, bottom) logarithmic FM sweeps.

For ease of nomenclature, we refer to the decoding of different sweep speeds in the ascending and descending directions as decoding of sweep “trajectory.” This is to be distinguished from the decoding of sweep “amplitude,” where one of the sweeps (the ascending sweep at 35 octaves/s) was presented at different attenuation levels (see methods). In this case, the temporal features of the sweeps are identical, but the intensity varies. For notational convenience, we sometimes refer to sweep trajectory as “sweep ID” and to sweep amplitude as “sweep dB.”

Figure 2 shows sets of peristimulus time histograms (PSTHs) for two clusters of neurons whose spiking patterns provided the best (Fig. 2A) and median (Fig. 2B) decoding of sweep trajectory. Each set of PSTHs is shown at the temporal resolution that optimized decoder performance (10 and 20 ms, respectively). The 12 × 12 confusion matrices for each example (Fig. 2, A and B, insets) represent the decoding performance for the six up and six down sweeps (each row and column reflects first the 6 up sweeps from slow to fast and then the 6 down sweeps from slow to fast; for color scale, see Fig. 2 legend). A confusion matrix indicates how often individual spike trains were associated with each of the presented stimuli. For the best channel, 97.5% of the individual spike trains were correctly associated with the sweep that elicited them, as indicated by the high values along the diagonal of the matrix. Only the two fastest descending sweeps (at the bottom right corner of the confusion matrix) were sometimes confused for each other; this is sensible in the context of the similarity of their PSTHs (the bottom 2 PSTHs in the right column of Fig. 2A). For the median channel, by contrast, the diagonal only shows higher values (i.e., correct decoding) for the four slowest ascending sweeps and the slowest descending sweep.
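
As an illustration of how a percent-correct value is read off a confusion matrix, the following minimal sketch builds a hypothetical 12 × 12 matrix with 20 trials per stimulus. The matrix entries are invented for illustration (they are not the recorded data), and the function name `decoding_accuracy` is an assumption. Columns index the presented stimulus, matching the convention that each column sums to the number of repetitions.

```python
import numpy as np

def decoding_accuracy(confusion):
    """Fraction of all trials that fall on the diagonal (correct assignments)."""
    confusion = np.asarray(confusion, dtype=float)
    return np.trace(confusion) / confusion.sum()

# Hypothetical matrix: rows = decoded stimulus, columns = presented stimulus.
n_stim, n_trials = 12, 20
confusion = np.eye(n_stim) * n_trials

# Invented confusions: 3 trials of each of the two fastest down sweeps
# (last two columns) are assigned to the other sweep.
confusion[10, 10] -= 3
confusion[11, 10] += 3
confusion[11, 11] -= 3
confusion[10, 11] += 3

print(decoding_accuracy(confusion))  # 234 of 240 trials correct: 0.975
```

With only the last two stimuli confused, the off-diagonal mass sits in the bottom right corner of the matrix, which is the pattern described for the best channel.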

Fig. 2.


Examples of multiunit cortical responses to velocity and direction of FM sweeps. A: PSTHs of all FMs at the site with best sweep identity (ID) decoding performance. B: PSTHs at a site with median sweep ID decoding performance. Colored curves represent the sweep trajectories (see Fig. 1) aligned to the responses. The confusion matrices (see methods) inset at right of each set of PSTHs indicate how successfully the spike train decoder assigned trials to the stimuli that elicited them. The confusion matrices are ordered similarly to the PSTHs: upward sweeps in order of increasing velocity and then downward sweeps in order of increasing velocity appear from top to bottom for the rows and from left to right across the columns. Correctly assigned trials appear along the diagonal, and the number of trials in each cell is indicated by the color bar to the right of each matrix. Each column sums to 20, the number of stimulus repetitions.

If we compare the structure of the confusion matrix to the appearance of the PSTHs, it is evident that the FM sweeps that could not be successfully decoded correspond with the absence of clear peaks in the PSTHs. This highlights the most salient feature of this data set: the cortical representation of FM sweeps consisted of prominent PSTH peaks above a relatively low background firing rate.

Figure 3 shows sets of PSTHs obtained when the sweep trajectory was held constant (ascending sweeps at 35 octaves/s) but the sweep amplitude was varied over a large range (65 dB). As in Fig. 2, we show examples of the best decoding performance in the sample (Fig. 3A) and an example of median decoding performance (Fig. 3B). The best performance was obtained in the no-noise condition, whereas the selected example for median performance was obtained at a noise level of 60 dB SPL.

Fig. 3.


Examples of multiunit cortical responses to FM sweep level. A: PSTHs of FM sweeps at 7 different levels (dB SPL) at the site with best level decoding performance. B: PSTHs at a site with median level decoding performance. A single sweep trajectory (upward at 35 octaves/s) was used, and its trajectory is superimposed on the PSTHs. Confusion matrices indicating decoding performance are inset above the PSTHs in A and B. Data in A and B were obtained at background noise levels of −20 and 60 dB, respectively.

Although the sweep trajectory is constant, changes in stimulus amplitude produced obvious changes in the shapes of the PSTHs. Responses to the same sweep at different amplitudes are not simply scaled versions of one another; for example, the loudest sweep elicited a unique dual-peaked PSTH for the median channel.

Spike Timing Information is Critical for Decoding FM Sweep Trajectory and Amplitude

As in prior work (Malone et al. 2007, 2010, 2013, 2014, 2015a, 2015b), we used linear spike train decoders (see methods) to compare information about the stimulus contained in single-trial spike trains (the full spike train classifier), trial-averaged firing rate (the rate classifier), and rate-normalized temporal spiking patterns (the phase classifier). Figure 4 compares the performance of the full spike train, phase-only, and rate-only spike train classifiers for the decoding of sweep trajectory (Fig. 4A) and amplitude (Fig. 4B). The data are shown for all tested SNR conditions and all cell clusters (sweep ID: n = 1,835 from 321 sites; sweep dB: n = 1,058 from 181 sites). Figure 4, A–C, compares results obtained using the bin size that maximized decoding accuracy (panels at left) and a single fixed bin size of 10 ms (panels at right) to demonstrate that the effects of selecting the optimal bin size for analysis were negligible. In what follows, we focus our statistical analyses on results obtained at the optimal bin size, but the results at a fixed bin width (10 ms) were essentially the same.
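
The distinction between the three classifiers can be sketched in code. The sketch below is not the authors' implementation (the decoder is specified in the methods, which are outside this section). It assumes a leave-one-out nearest-template rule with Euclidean distance, and it assumes that "rate normalization" means dividing each binned spike train by its total spike count. The function names `transform` and `decode` are hypothetical.

```python
import numpy as np

def transform(responses, mode):
    """responses: array (n_stim, n_trials, n_bins) of binned spike counts."""
    responses = np.asarray(responses, dtype=float)
    if mode == "full":                      # timing and rate
        return responses
    total = responses.sum(axis=-1, keepdims=True)
    if mode == "rate":                      # spike count only, no timing
        return total
    if mode == "phase":                     # timing only (assumed normalization)
        return np.divide(responses, total,
                         out=np.zeros_like(responses), where=total > 0)
    raise ValueError(f"unknown mode: {mode}")

def decode(responses, mode="full"):
    """Return a confusion matrix indexed [decoded stimulus, presented stimulus]."""
    x = transform(responses, mode)
    n_stim, n_trials, _ = x.shape
    sums = x.sum(axis=1)                    # per-stimulus sum over trials
    confusion = np.zeros((n_stim, n_stim), dtype=int)
    for s in range(n_stim):
        for t in range(n_trials):
            templates = sums / n_trials
            # Leave the test trial out of its own stimulus template.
            templates[s] = (sums[s] - x[s, t]) / (n_trials - 1)
            dist = np.linalg.norm(templates - x[s, t], axis=1)
            confusion[np.argmin(dist), s] += 1
    return confusion
```

For two toy stimuli that evoke the same number of spikes at different times, the `"full"` and `"phase"` modes separate them perfectly, whereas the `"rate"` mode cannot, which is the qualitative pattern the comparison in Fig. 4 is designed to expose.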

Fig. 4.


FM decoding performance for the full spike train (FST), phase-only, and rate-only decoding algorithms. Results are shown for all tested SNRs, so a single site may contribute as many as 6 distinct values to each panel. A: performance of the phase-only (blue) and rate-only (black) decoders for FM sweep identity as a function of the FST decoder performance for all sites and SNR conditions (n = 1,835 for 321 sites). B: performance of the phase-only (red) and rate-only (black) decoders for FM level as a function of the FST decoder performance for all available sites and SNR conditions (n = 1,058 for 181 sites). C: comparison of FM level vs. FM identity decoding performance (n = 1,058 for 181 sites) for the FST decoder (purple), phase-only decoder (green), and rate-only decoder (black). The decoding performances were z scored to account for the differing number of conditions (7 for FM level decoding, 12 for FM identity decoding). In A–C, panels at left show the results obtained at the optimal bin size, and panels at right show the results obtained for 10-ms bins. In A and B, the dark gray box near the origin demarcates chance performance, and the light gray box demarcates the P < 0.0001 statistical criterion. The larger light gray boxes in panels at left are adjusted for multiple comparisons for the FST and phase-only decoders, whereas the smaller light gray boxes in panels at right apply to the rate-only decoders and the FST and phase-only decoders at a single bin size (e.g., 10 ms).

First, we plotted decoding accuracy obtained with the phase decoder (blue circles) and the rate decoder (black circles) against the performance of the full spike train decoder (Fig. 4A). Successful sweep ID decoding clearly relied on spike timing; in the absence of spike timing information, the rate classifier performed essentially at chance levels, with a median value of 8.75%. Chance performance (indicated by the smaller, darker gray box) for a 12-stimulus set is ~8.33%. For the phase decoder, by contrast, the majority of points lie above the unity line, indicating that the removal of average firing rate information (via the normalization process that defines the phase classifier) modestly but significantly improved decoding performance (Wilcoxon signed rank; P < 10⁻²⁹⁰). Median decoding performance for the full spike train and phase decoders was 28.0% and 31.25%, respectively. The improvements wrought by rate normalization were modest (mean and median of 2.1% and 1.7%, respectively) but very consistent across clusters. These results indicate that trial-to-trial fluctuations in firing rate perturbed rather than aided sweep ID discrimination. Thus differences in the timing of responses to different stimuli supported more effective decoding than differences in the overall strength of the responses.

The long duration of the analysis window relative to the short durations of the faster FM sweeps likely contributes to this result. It should be emphasized, however, that discriminating FM sweeps of variable speeds and durations solely on the basis of firing rates averaged over a long window is expected to be difficult. Rate-based discrimination typically benefits from information about the sweep trajectory and the cluster BF, which is effectively provided when the experimenter measures firing rates in a narrow window centered on the largest response during the sweep.

Results for the decoding of sweep amplitude (dB) are shown in Fig. 4B. Despite the difference in the decoding task, the pattern of results was highly similar. Median decoding performance for the full spike train and phase classifiers was 29.3% and 31.4%, respectively, and the phase classifier outperformed the full spike train classifier modestly but consistently (by median and mean values of 1.4% and 1.6%, respectively; Wilcoxon signed rank: P < 10⁻⁵⁴). Performance of the rate decoder was near chance levels (~14.3%), with a median value of 15.7%.

The pattern of results was consistent across the two decoding tasks. For sweeps of different velocities and directions, the intersection of the sweep and the cluster BF will occur at different times. Thus we should expect that the timing of the neural responses to different sweeps should suffice to discriminate them. For sweep amplitude, however, this is not the case: sweep timing is constant, although the change in sweep amplitude may change how the sweeps interact with the site’s response area. For example, the response of a neuron whose bandwidth changes at higher stimulus levels might extend for a longer duration for louder sweeps. Detailed differences in the temporal response patterns discriminated sweep amplitude more effectively than differences in response magnitude (e.g., Fig. 3). Even so, the median ratio of decoder performance for the rate and full spike train classifiers was 51.5% for sweep level but only 28.3% for sweep trajectory, indicating a significantly (P < 10⁻⁸¹) more prominent role for average spike rate information in sweep level decoding.

Comparison of Fig. 4, A and B, shows similar raw decoding accuracies for sweep trajectory and sweep amplitude (medians of 28.0% and 29.3%, respectively, for the full spike train decoder). Because the stimulus sets differ in size (12 for sweep ID; 7 for sweep dB), however, percent correct is not directly comparable across tasks, so we recomputed decoding performance as z scores relative to the distributions of expected decoding performance obtained with Monte Carlo techniques (see methods). Results of this analysis are shown in Fig. 4C for the full spike train (purple circles), phase (green circles), and rate decoders (black circles). Median decoding performance (full decoder) for sweep ID (11.0) was significantly higher than for sweep dB (5.1) whether all the data were used (Wilcoxon rank sum; P < 10⁻⁵⁴) or were restricted to clusters tested for both tasks (Wilcoxon signed rank; P < 10⁻⁸¹). There was also a highly significant correlation between decoding performance for ID and dB stimulus sets for the full spike train (r = 0.73; P < 10⁻¹⁷³) and phase classifiers (r = 0.72; P < 10⁻¹⁶⁹). This correlation was much weaker for the rate decoder (r = 0.13; P < 0.0001).
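
The z-scoring step can be sketched as follows. The exact Monte Carlo procedure is given in the methods and is not reproduced here. This sketch assumes the null distribution arises from independent random guessing among equally likely stimuli with 20 trials per stimulus, and the function names are hypothetical.

```python
import numpy as np

def chance_distribution(n_stim, n_trials=20, n_sim=10000, seed=0):
    """Simulated accuracies of a decoder that guesses at random (assumed null)."""
    rng = np.random.default_rng(seed)
    total = n_stim * n_trials
    correct = rng.binomial(total, 1.0 / n_stim, size=n_sim)
    return correct / total

def z_score(accuracy, n_stim, **kwargs):
    """Accuracy expressed in SD units of the simulated chance distribution."""
    null = chance_distribution(n_stim, **kwargs)
    return (accuracy - null.mean()) / null.std()

# Similar percent-correct values map to very different z scores when the
# stimulus sets differ in size.
print(z_score(0.280, n_stim=12))   # sweep ID, 12 stimuli
print(z_score(0.293, n_stim=7))    # sweep dB, 7 stimuli
```

Under this binomial assumption the analytic z scores for accuracies of 28.0% (12 stimuli) and 29.3% (7 stimuli) are approximately 11.0 and 5.1, which illustrates why nearly identical raw accuracies can correspond to substantially different performance relative to chance.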

In some cases, such as the fastest descending sweeps for the best channel (see Fig. 2A), the PSTH peaks were present, but their timing was highly similar, and thus not reliably discriminable. This pattern of results is expected, because the point in time when the sweeps of different velocities intersect a fixed frequency value varies less for the faster sweeps than for the slower sweeps (Fig. 1, B and C). Importantly, when considering how to discriminate sweeps that differ in direction (ascending or descending), the difference (in ms) at which two sweeps intersect a given frequency depends on that frequency. In fact, for each pair of sweeps at a given speed, there is a point where the curves representing the “instantaneous” frequency of the sweeps intersect, suggesting that the expected pattern of confusion errors for a given neuron will depend on its tuning for frequency when spike timing is used for decoding.

In conclusion, we have demonstrated that spike train classification methods can robustly reveal encoding of FM sweep trajectory and amplitude, especially when considering the temporal information in spike trains. Next, we consider the effects of background noise on cortical FM encoding in more detail.

Noise-Tolerant and Noise-Tuned Cortical Responses Were Surprisingly Common

Examination of Fig. 4 indicates that decoding performance varied widely across recording sites. We now focus on how decoding accuracy varies with the background noise level. The responses shown in Fig. 5, A and B, exhibit essentially monotonic improvements in decoding performance with increases in SNR. To illustrate the PSTHs more compactly, responses to each set of six sweep velocities are superimposed using the color scheme for sweep velocity introduced in Fig. 1. In Fig. 5, panels at the top of each pair of composite PSTHs represent responses to the up sweeps; panels at the bottom represent responses to the down sweeps. A line plot illustrating decoding accuracy as a function of SNR is included below each column of PSTHs. The cluster represented in Fig. 5A showed dramatic response and decoding improvement in the complete absence of noise and relatively poor performance even at the most favorable SNR tested with noise present (i.e., 0 dB). The cluster represented in Fig. 5B, however, showed substantial noise tolerance by encoding FM sweeps effectively at SNRs as low as −15 dB and exhibited saturating improvements at more favorable SNRs such that an SNR of 0 dB was nominally optimal.

Fig. 5.


Examples of FM sweep responses at different signal-to-noise ratios (SNRs). Columns in A–E show the PSTHs of FM sweep responses at 6 different SNRs (indicated at left) binned at a temporal resolution of 5 ms. For each SNR, 2 PSTHs are shown, illustrating superimposed responses to 6 FM speeds indicated by the same color code introduced in Fig. 1. For each pair, the top PSTH depicts responses to ascending (up) sweeps, and the bottom PSTH, to descending (down) sweeps. The numerical values sandwiched between the PSTHs indicate decoding accuracy obtained at the resolution (5 ms) used for plotting the PSTHs. A–E, bottom, show decoding accuracy as a function of SNR for the data illustrated in PSTHs. A: site with a “monotonic” increase in decoding performance for decreasing background noise levels. B: site with an “equivalent” level of decoding performance in the absence of noise and the presence of noise over a large range of SNRs. C–E: sites with “nonmonotonic” responses across SNR such that the no-noise condition (SNR+∞) and −20-dB SNR conditions have less robust and less informative responses than were observed at intermediate SNRs.

The most surprising facet of the data was the observation that quite a number of recorded clusters responded more robustly and reliably in the presence of noise rather than in its absence. Three examples of such responses are shown in Fig. 5, C–E. In all three examples, sweep ID decoding is maximal at −5 dB SNR, but appreciable information about stimulus identity is available at less favorable SNRs (−15 dB in Fig. 5C; −10 dB in Fig. 5, D and E). The composite PSTHs indicate that at the optimal SNRs, most FM sweeps were represented by a distinct response peak. However, in the absence of noise, responses to the FM sweeps were either diffuse (Fig. 5C) or absent altogether (Fig. 5, D and E). These examples demonstrate that the presence of noise at moderate SNRs can sometimes improve the cortical representation of dynamic signals such as FM sweeps.

Examination of Fig. 5 suggests that decoding performance was closely tied to the salience of peaks in the composite PSTHs. We evaluated this possibility by correlating the lifetime sparseness (Vinje and Gallant 2000) of the PSTHs with decoding accuracy. Sparseness was defined as a rescaling of the activity fraction (Rolls and Tovee 1995), $A$, such that $A = \left(\sum_i r_i/n\right)^2 / \sum_i \left(r_i^2/n\right)$, and (lifetime) sparseness was computed as $S = (1 - A)/(1 - 1/n)$, where $n$ is the length of the spike count vector (the number of time bins) and $r_i$ is the spike count in the $i$th bin. We computed the sparseness for each SNR condition and site on the basis of a composite vector formed by concatenating the PSTHs associated with each of the different sweep trajectories. Sparseness was significantly and strongly predictive of decoding performance (r = 0.58; P < 10⁻¹⁶⁷).
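
The sparseness measure defined above translates directly into a few lines of code. In this minimal sketch the function name and the handling of degenerate inputs (returning NaN for an empty, single-bin, or all-zero vector) are assumptions.

```python
import numpy as np

def lifetime_sparseness(r):
    """Lifetime sparseness of a spike count vector r (one entry per time bin).

    A = (sum(r)/n)^2 / (sum(r^2)/n);  sparseness = (1 - A) / (1 - 1/n).
    Returns 0 for a flat response and 1 when all spikes fall in one bin.
    """
    r = np.asarray(r, dtype=float)
    n = r.size
    if n < 2 or not np.any(r):
        return float("nan")          # undefined for these inputs (assumed)
    a = (r.sum() / n) ** 2 / (np.sum(r ** 2) / n)
    return (1.0 - a) / (1.0 - 1.0 / n)

# Composite vector: concatenate the PSTHs for all sweep trajectories at one
# SNR condition (toy values, not recorded data).
psths = [np.array([0, 0, 9, 0]), np.array([0, 8, 0, 0])]
print(lifetime_sparseness(np.concatenate(psths)))
```

A PSTH dominated by a few sharp peaks above a low background yields values near 1, so the measure captures the peak salience that appears to drive decoding accuracy.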

We characterized the distribution of SNR-dependent FM responses by constructing functions describing sweep decoding accuracy at the optimal temporal resolution determined for each SNR condition. Figure 6 shows waterfall plots describing how decoding performance for the full spike train classifier varies across the population as functions of the SNR. For convenient visualization, the waterfall plots are sorted in order of their maximal decoding performance from lowest (site number 0) to highest.

Fig. 6.


Grouping of sweep trajectory decoding across SNRs for all sites (n = 321). Three SNR response classes were defined. A: monotonic: sites whose decoding performance was maximal in the absence of noise and greater than that obtained in the presence of noise at any of the tested SNR conditions (n = 104). B: nonmonotonic: sites whose decoding performance is maximal in the presence of noise at one of the tested SNR conditions and significantly greater than that obtained in the absence of noise (n = 101). C: equivalent: Sites whose decoding performance in the absence of noise is not statistically different from performance in the presence of noise for at least one tested SNR condition (n = 116). Waterfall plots show performance as function of SNR for all sites in each response class. Untested SNR conditions of −25 dB were set to zero for graphical purposes. Other untested SNR conditions were linearly interpolated on the basis of existing values. Coloring varies from the lowest (blue) to highest (red) values on each panel. D: distribution of the different response groups for the 5 different SNR conditions. Blue slices represent performance at the indicated SNR less than that for the no-noise (SNR+∞) condition. Red slices represent performance better than that for SNR+∞. Green slices represent performance indistinguishable from SNR+∞. Gray slices represent sites with no significant decoding performance at any tested SNR. Because not all SNR conditions were tested at every site, the data from −20 to 0 dB were obtained at 307, 273, 307, 295, and 316 sites respectively.

We defined three separate response classes for this analysis: 1) clusters whose decoding performance is maximal in the absence of noise (Fig. 6A) and significantly (P < 0.0001) greater than that obtained in the presence of noise at any of the tested SNR conditions are classified as monotonic; 2) clusters whose decoding performance is maximal in the presence of noise at one of the tested SNR conditions and significantly greater than that obtained in the absence of noise are classified as nonmonotonic; and 3) clusters whose decoding performance in the absence of noise is not statistically different from their performance in the presence of noise for at least one tested SNR condition are classified as equivalent. This analysis allowed us to operationally define noise-averse, noise-preferring, and noise-tolerant responses, respectively. Note that the stringency of the statistical criterion determines the prevalence of the different response classes. We chose a very stringent criterion (P < 0.0001) because we wanted to be conservative when assigning the nonmonotonic (and monotonic) response class label.
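
The three-way classification can be expressed as a simple decision rule. The sketch below takes the outcome of the significance test as an input, because the test itself is defined in the methods. It reads the monotonic class as requiring a significant advantage over every tested noise condition, as in the Fig. 6 legend. The function name and argument layout are hypothetical.

```python
def classify_site(acc_no_noise, acc_in_noise, differs_significantly):
    """Assign an SNR response class to one recording site.

    acc_no_noise: decoding accuracy without noise.
    acc_in_noise: accuracies at each tested SNR with noise present.
    differs_significantly: one bool per tested SNR; True if accuracy at that
        SNR differs from the no-noise accuracy at the chosen criterion.
    """
    best = max(range(len(acc_in_noise)), key=lambda i: acc_in_noise[i])
    # Noise-preferring: best performance occurs in noise and beats no-noise.
    if acc_in_noise[best] > acc_no_noise and differs_significantly[best]:
        return "nonmonotonic"
    # Noise-averse: no-noise performance beats every tested noise condition.
    if all(sig and acc < acc_no_noise
           for acc, sig in zip(acc_in_noise, differs_significantly)):
        return "monotonic"
    # Noise-tolerant: indistinguishable from no-noise at one or more SNRs.
    return "equivalent"

print(classify_site(0.90, [0.20, 0.50], [True, True]))    # monotonic
print(classify_site(0.30, [0.20, 0.60], [True, True]))    # nonmonotonic
print(classify_site(0.50, [0.20, 0.48], [True, False]))   # equivalent
```

Because the class labels depend on the boolean inputs, tightening or relaxing the statistical criterion shifts sites between the equivalent class and the other two, which is why the prevalence of the classes depends on the chosen criterion.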

For the purposes of this analysis, responses for each SNR condition were decoded separately at the optimal temporal resolution for that condition. This was done to make our analysis more conservative with respect to reporting nonmonotonicity for SNR tuning functions because the optimal temporal resolution could depend on the SNR condition (see Fig. 10). Nonetheless, we repeated the analysis at a single bin size (10 ms), the most common optimal bin size, and obtained essentially identical distributions of response classes (P > 0.99; χ² test of association).

As is evident, the distribution of clusters across these three defined categories was relatively even. In the majority of cases (217/321), decoding performance was not significantly improved by the absence of noise relative to that observed for at least one SNR condition. The pie charts in Fig. 6D indicate the distribution of the different noise-effect categories for five different SNR conditions. Gray wedges indicate cases where decoding was equivalent to the SNR+∞ condition and neither was significantly better than expected by chance. Blue wedges indicate the percentages of cases where the decoding accuracy at the indicated SNR condition (shown at left) was significantly less than the accuracy for the no-noise condition. With decreasing SNR, this population grows from 41% at 0 dB SNR to 89% at −20 dB SNR.

Sites that exceeded the performance observed in the absence of noise are indicated by red wedges. At 0 dB, 22% of the recorded clusters decoded FM sweep trajectory more effectively than in the absence of noise. This percentage decreased with decreasing SNR such that no sites outperformed the no-noise condition at an SNR of −20 dB.

Green wedges represent the percentage of sites that performed as well at a given SNR condition as they did for the no-noise condition. This was the case for 33% of recorded clusters at 0 dB SNR and, remarkably, still true for 3% at −20 dB. Overall, noise-tolerant or noise-preferring sites (combined green and red wedges) decreased from 55% at 0 dB SNR to 3% at −20 dB SNR. That means that even at very unfavorable noise conditions (e.g., −20 dB SNR), a small proportion of sites still allowed a significant degree of stimulus decoding.

In summary, we identified three types of FM responses that differed in their behavior in the presence of background noise. One group (monotonic) showed a progressive decrease in FM encoding with decreasing SNR. Another group (equivalent) showed similar FM encoding both in the presence and absence of noise for at least one noise condition. A third group (nonmonotonic) showed the best performance for at least one tested SNR, exceeding the performance for the no-noise condition. Even at a perceptually challenging SNR condition such as −15 dB, nearly 20% of sites exhibited decoding performance as good or better than they had in the absence of noise, although accuracy for these sites was typically poor.

Tone Response Features Associated with Noise Tolerance and Noise Preference

We identified a number of basic response metrics that predicted the SNR response class of a given recording site. We estimated the spontaneous rates for each site by averaging the firing rates during the interval from 2,000 to 2,500 ms after trial onset for the SNR+∞ condition. There was a trend suggesting that the median spontaneous rates of equivalent sites (19.8 spikes/s) were lower (P < 0.018) than those of monotonic sites (23.7 spikes/s), but not those of nonmonotonic sites (21.3 spikes/s). We estimated the noise-driven rates by averaging the firing rates for each site across all SNR conditions containing noise during the interval from 2,000 to 2,500 ms after trial onset. Median noise-driven rates for monotonic sites (25.6 spikes/s) were significantly higher than those of equivalent (21.1; P < 0.004) and nonmonotonic sites (19.1; P < 0.002). When firing rates were estimated during the intervals including the FM sweeps and averaged across sweeps and SNR conditions, monotonic sites again exhibited significantly higher median rates (34.1 spikes/s) than either equivalent (25.5; P < 0.001) or nonmonotonic sites (24.2; P < 10⁻⁵). Thus a distinguishing feature of monotonic sites was higher firing rates in general.
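
The windowed rate estimates described above amount to counting spikes in a fixed interval and dividing by its duration. A minimal sketch follows, assuming spike times are stored per trial in milliseconds relative to trial onset. The function name and data layout are hypothetical.

```python
import numpy as np

def mean_rate(spike_times_ms, t_start=2000.0, t_stop=2500.0):
    """Trial-averaged firing rate (spikes/s) in the window [t_start, t_stop).

    spike_times_ms: list with one array of spike times (ms) per trial.
    """
    counts = []
    for trial in spike_times_ms:
        t = np.asarray(trial, dtype=float)
        counts.append(np.sum((t >= t_start) & (t < t_stop)))
    window_s = (t_stop - t_start) / 1000.0
    return float(np.mean(counts)) / window_s

# Toy example: two trials with 2 and 1 spikes inside the 500-ms window.
trials = [[100.0, 2100.0, 2200.0], [2400.0]]
print(mean_rate(trials))  # (2 + 1) / 2 trials / 0.5 s = 3.0 spikes/s
```

Applying the same function to no-noise trials gives the spontaneous rate, and applying it to trials from conditions containing noise gives the noise-driven rate.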

We next compared the incidence of nonmonotonic sound level tuning for pure tones, measured at each site’s best frequency (BF) without any noise, with that for FM sweeps at different SNRs. We computed the monotonicity index (MI; Bendor and Wang 2008) as the ratio of the firing rate for the loudest pure tone and for the pure tone that elicited the maximal response. When the loudest tone elicits the maximal response, the MI equals 1. Sites defined as monotonic with respect to the decoding of sweep identity across SNR conditions also exhibited the highest median tonal MI (0.75), which significantly exceeded that of the nonmonotonic sites (0.62; P < 0.005), but not that of the equivalent sites (0.69; P < 0.067), which were intermediate. To determine whether this finding was specific to tones at BF, we repeated this analysis but included the responses for the range of tone frequencies where the responses summed across the different tested levels exceeded the mean plus 5 SD for spontaneous responses. Measured this way, the median tone MI of nonmonotonic sites (0.83) remained significantly lower than that of equivalent sites (1; P < 0.003) or monotonic sites (1; P < 0.001).
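
The monotonicity index reduces to a single ratio. In this minimal sketch the input is assumed to be a list of firing rates ordered from the quietest to the loudest tone, and the function name is hypothetical.

```python
def monotonicity_index(rates_by_level):
    """Rate at the loudest level divided by the maximal rate across levels.

    rates_by_level: firing rates ordered from quietest to loudest tone.
    Equals 1 when the loudest tone elicits the maximal response.
    """
    peak = max(rates_by_level)
    if peak <= 0:
        return float("nan")          # undefined without a driven response
    return rates_by_level[-1] / peak

print(monotonicity_index([5.0, 12.0, 30.0]))   # 1.0 (monotonic rate growth)
print(monotonicity_index([5.0, 40.0, 20.0]))   # 0.5 (nonmonotonic rate growth)
```

Values well below 1 indicate that the response peaks at an intermediate sound level and declines for louder tones.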

We compared the frequency tuning bandwidths measured with pure tones against the SNR response classes for sound levels 30 dB above threshold and at the maximum level tested (70 dB). Median FRA bandwidths measured 30 dB above threshold were narrowest for sites with nonmonotonic SNR functions (492 Hz) and significantly less than the median for the equivalent sites (879 Hz; P < 0.006) and, to a lesser degree, for the monotonic sites (837 Hz; P < 0.027). The differences were more stark for the loudest tones used to define the FRA. The maximum amplitude bandwidths of the nonmonotonic sites decreased to a median value of 226 Hz, whereas those for the equivalent and monotonic sites widened to 1,178 and 1,780 Hz, respectively, resulting in significant differences (P < 0.002 and P < 10⁻⁴, respectively). We computed an SNR-based MI (MI_SNR) by taking the ratio of the decoding performance for the SNR+∞ condition to that at the optimal SNR. There were modest but significant correlations between the MI_SNR and the frequency tuning bandwidths estimated 30 dB above threshold (r = 0.24; P < 0.002) and at the maximum amplitude (r = 0.29; P < 10⁻⁵).
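
The SNR-based index parallels the tonal MI. The sketch below assumes that the "optimal SNR" is the best-performing condition among all tested conditions, including the no-noise condition, so that the index cannot exceed 1. That reading and the function name are assumptions.

```python
def mi_snr(acc_no_noise, acc_in_noise):
    """No-noise decoding accuracy divided by accuracy at the best condition."""
    best = max([acc_no_noise, *acc_in_noise])
    return acc_no_noise / best if best > 0 else float("nan")

print(mi_snr(0.60, [0.20, 0.40]))   # 1.0: best decoding without noise
print(mi_snr(0.30, [0.60, 0.50]))   # 0.5: best decoding in noise
```

Under this reading, sites that decode best without noise have an index of 1, and sites tuned to an intermediate SNR have lower values.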

Finally, we computed and compared the minimum response latencies for the different SNR response classes. We defined the minimum latency as the earliest bin (at 1-ms resolution) that exceeded the mean plus 5 SD of the spontaneous rate at each site. The median latency for the nonmonotonic sites was 18 ms, which was significantly longer than was observed for the equivalent (15 ms; P < 0.001) and monotonic sites (15 ms; P < 10⁻⁴).
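
The latency criterion can be sketched as a threshold crossing on a 1-ms PSTH. The function name, the input layout, and the use of the population standard deviation are assumptions made for illustration.

```python
import numpy as np

def minimum_latency(psth, spontaneous, n_sd=5.0):
    """Index (ms) of the first 1-ms bin exceeding mean + n_sd * SD of baseline.

    psth: response in 1-ms bins, with bin 0 at stimulus onset.
    spontaneous: baseline samples in the same units as psth.
    Returns None if no bin crosses the threshold.
    """
    spontaneous = np.asarray(spontaneous, dtype=float)
    threshold = spontaneous.mean() + n_sd * spontaneous.std()
    above = np.flatnonzero(np.asarray(psth, dtype=float) > threshold)
    return int(above[0]) if above.size else None

# Toy data: baseline mean 1 and SD 1 give a threshold of 6.
baseline = [0, 2, 0, 2]
response = [0, 1, 5, 7, 9]
print(minimum_latency(response, baseline))  # 3
```

A strict threshold such as 5 SD reduces false early crossings caused by spontaneous activity, at the cost of slightly later latency estimates for weakly driven sites.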

Collectively, the foregoing results suggest that those sites that integrate sound energy more broadly, quickly, and linearly were most likely to exhibit monotonic increases in decoding performance at progressively more favorable SNRs. Sites with nonmonotonic decoding performance (i.e., those “tuned” to a defined SNR range) tended to be more narrowly tuned for tones, showed a nonmonotonic growth of firing rates for BF tones, had lower driven firing rates for tones and noise, and had slightly longer latencies. These properties are suggestive of stronger inhibitory influences on the processing at nonmonotonic sites.

Sweep Amplitude Decoding

The previous analysis of SNR effects was based on FM sweeps that varied in trajectory but not in amplitude. Response timing, depending strongly on sweep velocity and direction, dominated decoder performance (see Fig. 5). To assess the influence of stimulus level, we considered the decoding performance of a FM sweep of fixed velocity and direction as a function of its overall level across a range of noise levels (see Fig. 3). Results for decoding sweep amplitude are shown in Fig. 7. Although the presentation of the results is analogous to that for sweep identity, there are important differences to consider. First, as we discussed with reference to Fig. 4C (scatterplots), we tested fewer FM sweep levels than trajectories, so results described in percent correct cannot be compared directly. Second, data across different sweep trajectories were collected at a narrow range of moderate SPLs. The nominal SNR values for the different SNR conditions are referenced to these amplitudes, although it would be equivalent to describe them in terms of noise levels varying from 60 (0 dB) to 80 dB (−20 dB) in 5-dB steps.

Fig. 7.


Grouping of FM level (sweep dB) decoding accuracy across different noise conditions. Conventions are identical to those in Fig. 6. Because not all SNR conditions were tested at every site, the data from −20 to 0 dB were obtained at 167, 181, 167, 181, and 181 sites, respectively.

SNR values differ for each of the different FM sweeps in the “sweep dB” stimulus set (see methods). For example, the loudest tested sweep (i.e., 85 dB) exceeded the loudest noise (80 dB). Thus the organization of the confusion matrices is expected to be somewhat different when sweep amplitude is being discriminated, because it is likely that the loudest FM sweeps will elicit activity that supports their discrimination by the classifiers, even at the least favorable SNRs. Conversely, the quietest FM sweeps will always be well below the level of the noise, so it is possible that they never elicit activity that supports their detection or discrimination. As a result, we might expect that decoding performance will be above chance at the least favorable noise condition (i.e., 80 dB) but poorer than sweep ID decoding at the most favorable noise condition (i.e., 60 dB). A comparative examination of Fig. 7 supports this notion: even for noise levels equal to or greater than 75 dB, more than one-third of the recorded clusters decode sweep level as effectively as they do in silence. As we noted with respect to Fig. 4C when comparing decoding performance for sweep ID and sweep dB, temporal cues reflecting changes in sweep level are expected to be more subtle and less effective in general. This is reflected by the fact that roughly one-sixth of all clusters fail to discriminate changes in sweep amplitude, even in silence (gray wedges). The distributions of SNR response classes (e.g., monotonic, nonmonotonic, and equivalent) differed significantly for sweep ID and sweep dB discrimination (P < 10⁻⁵; χ²). Overall, the changes in performance with changes in noise level are less dramatic for sweep amplitude than they are for sweep trajectory, in keeping with differences in the nature of the discriminations themselves.
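
The relation between nominal and per-sweep SNR is simple arithmetic and can be made explicit. In the sketch below, the 60-dB reference level is inferred from the stated equivalence between a 60-dB noise level and 0 dB SNR. Only the loudest sweep level (85 dB) is taken from the text, and the variable names are hypothetical.

```python
def snr_db(signal_db, noise_db):
    """SNR in dB is the difference between signal and noise levels in dB."""
    return signal_db - noise_db

noise_levels_db = [60, 65, 70, 75, 80]
reference_level_db = 60    # inferred: 60-dB noise corresponds to 0 dB SNR
loudest_sweep_db = 85      # loudest sweep in the "sweep dB" set

nominal = [snr_db(reference_level_db, n) for n in noise_levels_db]
loudest = [snr_db(loudest_sweep_db, n) for n in noise_levels_db]

print(nominal)   # [0, -5, -10, -15, -20]
print(loudest)   # [25, 20, 15, 10, 5]
```

The loudest sweep therefore stays above the noise even at the nominal −20 dB condition, which is why sweep amplitude decoding need not fall to chance at the highest noise level.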

Decoding Accuracy Across SNR by Response Class

Optimal decoding was achieved in the presence of noise for the bare majority of cases (164/321), even if not all of these instances exceeded performance in the SNR+∞ condition for the P < 0.0001 significance criterion. Nevertheless, monotonic sites tended to achieve the highest decoding performance in absolute terms. For each site, we identified the maximum decoding performance across all tested SNRs and then compared the medians of these values for the different SNR response classes. For monotonic sites (where the maximum occurred for the SNR+∞ condition by definition), the median was 67.5%, which was significantly higher than the median best decoding for either equivalent (55.4%; P < 10⁻⁴) or nonmonotonic sites (52.1%; P < 10⁻⁵). Thus it is true that many cortical sites exhibited superior decoding in the presence of noise, but in an absolute sense, such sites were likely to underperform those sites that exhibited optimal decoding in the absence of noise. Importantly, however, significant decoding emerged at less favorable SNRs for the nonmonotonic and equivalent sites relative to the monotonic sites. The median minimum SNR where significant decoding was observed was −10 dB for monotonic sites, which was significantly higher than that for equivalent (−15 dB; P < 0.005) and nonmonotonic sites (−15 dB; P < 10⁻⁴). The corresponding means were −11.2, −12.7, and −13.6 dB, respectively.

We visualized the comparisons across SNR response classes by computing population averages of decoding accuracy for all sites and for each of the different SNR response classes (Fig. 8). For sweep trajectory decoding (Fig. 8A), there is a monotonic increase in decoding accuracy for more favorable SNRs, both for all sites (black) and for the equivalent class (green).

Fig. 8.

A–D: average decoding accuracy as functions of the signal-to-noise ratio (SNR) for each of the defined SNR response classes (blue, red, and green) and for the full population (black). B and D show the results when accuracy for each site is normalized to that obtained without noise before being averaged. Results for decoding sweep trajectory (ID) are shown in A and B (n = 321 sites), and those for decoding sweep amplitude (dB) are shown in C and D (n = 181 sites). Shading indicates ±SE.

Average decoding accuracy for the monotonic and nonmonotonic classes diverged most sharply for the no-noise condition. By definition, monotonic sites achieve maximal decoding accuracy in the absence of noise, whereas nonmonotonic sites achieve it in the presence of noise. Consideration of the population means of the SNR decoding functions normalized to performance in the no-noise condition (Fig. 8B) indicates that the decoding advantage in noise for the nonmonotonic sites is typically greater than 40% for SNRs of −5 and 0 dB. Conversely, the decoding disadvantage in noise for the monotonic sites was ~40% for an SNR of −5 dB. For all sites, however, mean decoding accuracy for SNRs as low as −10 dB was at least 80% of that obtained in the no-noise condition. Thus defining SNR response classes clarifies that cortical sites are quite diverse with respect to encoding signals embedded in noise. This diversity could be obscured by global averages, which suggest robustness to noise given the relative flatness of the curve for SNRs greater than −10 dB.
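The normalization used for Fig. 8B can be sketched as follows: each site's accuracy curve is divided by that site's accuracy in the no-noise condition before curves are averaged across sites. This is a minimal sketch, not the authors' analysis code; the function names, the "inf" key for the SNR+∞ condition, and the toy accuracy values are all assumptions made for illustration.

```python
from statistics import mean

def normalize_to_no_noise(accuracy_by_snr, no_noise_key="inf"):
    """Divide one site's accuracies by its accuracy in the no-noise condition."""
    reference = accuracy_by_snr[no_noise_key]
    return {snr: acc / reference for snr, acc in accuracy_by_snr.items()}

def population_mean(curves):
    """Average a list of per-site curves, SNR by SNR."""
    snrs = curves[0].keys()
    return {snr: mean(curve[snr] for curve in curves) for snr in snrs}

# Toy percent-correct values per SNR (illustrative only, not data from the study).
monotonic_site = {-10: 30.0, -5: 36.0, 0: 48.0, "inf": 60.0}
nonmonotonic_site = {-10: 40.0, -5: 56.0, 0: 60.0, "inf": 40.0}

normalized = [normalize_to_no_noise(s) for s in (monotonic_site, nonmonotonic_site)]
average_curve = population_mean(normalized)
```

In this toy case the two sites sit at 0.6 and 1.4 of their no-noise accuracy at −5 dB, yet their average is 1.0, which illustrates how a flat population curve can conceal opposite effects in different response classes.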

The pattern of results was quite similar for the decoding of sweep level (Fig. 8, C and D), although the discrepancy between the means for the different SNR classes was more consistent across SNRs. Decoding of sweep amplitude is different from the decoding of sweep trajectory because even at the least favorable SNRs, the levels of the loudest sweeps exceed that of the background noise. As a result, mean accuracy for the different SNR classes does not converge to values near chance as it does for sweep trajectory decoding, because successful decoding of the loudest sweeps remains possible at the lowest SNRs.

SNR Condition Decoding

The analyses described thus far show how effectively linear decoders can distinguish cortical spiking patterns elicited by FM sweeps in isolation or when embedded in noise at varying SNRs, allowing us to characterize decoding performance as a function of the SNR (or, equivalently, background noise level). We were also interested in how the SNR condition affects the cortical representation of the FM sweeps. By decoding sweep ID or dB in parallel with SNR condition, we can evaluate the extent to which the demonstrated robustness of the representation is reflective of response invariance when challenged by background noise. If cortical responses are genuinely invariant across SNR condition, then even when the sweep trajectory is correctly identified, identification of the SNR condition will be at chance levels determined by the number of tested SNRs. This is distinct from noise-robust encoding, which implies that decoding accuracy remains high across a broad range of SNRs.

We generated composite confusion matrices (CCMs; Malone et al. 2013) such that responses to a given sweep must be discriminated not only from other sweeps in the set (e.g., 12 for sweep ID) but also from sweeps presented in different SNR conditions. An example CCM for a single cluster is shown in Fig. 9. The analysis is ordered in blocks by SNR condition, and each sub-block contains a complete set of the sweep stimuli that had been analyzed separately in prior analyses (Fig. 2). Matrix entries along the main diagonal indicate trials where both the sweep trajectory and SNR condition were correctly identified. In this example, decoding accuracy was better at 0 dB than in the SNR+∞ condition. For many trials when the SNR was 0 dB, however, the correct sweep trajectory was assigned to the incorrect SNR condition, resulting in matrix entries along parallel “minor” diagonals directly above the major diagonal. Note that this is not true for the minor diagonal below the major diagonal, indicating that responses to sweeps in the 0-dB SNR condition were very rarely misidentified as responses from the SNR+∞ condition. Conversely, there is little diagonal structure in columns corresponding to the SNR+∞ block on the right edge of the CCM, indicating that sweep responses in silence were typically distinct from sweep responses in noise for this site. Thus the responses in this example were noise preferring, because decoding accuracy was highest in the presence of noise, but were not noise invariant, because the presence or absence of noise resulted in discriminable changes in cortical response patterns. Finally, it is clear that when SNR condition errors occur, they are more likely to occur for similar SNR conditions (e.g., −5 and 0 dB vs. −20 and 0 dB), indicating that more similar SNRs tended to elicit more similar responses.
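The block structure of a CCM can be sketched in a few lines. This is a minimal illustration under stated assumptions: each trial is labeled with a hypothetical (snr_index, sweep_index) pair, rows index the true condition and columns the decoded condition, and the function names are invented for this example. The toy labels are not data from the study.

```python
def build_ccm(true_labels, decoded_labels, n_sweeps, n_snrs):
    """Count trials in a composite confusion matrix.

    Each label is a (snr_index, sweep_index) pair. Conditions are ordered
    in blocks by SNR, with the sweeps nested inside each block.
    """
    size = n_sweeps * n_snrs
    ccm = [[0] * size for _ in range(size)]
    for (t_snr, t_sweep), (d_snr, d_sweep) in zip(true_labels, decoded_labels):
        row = t_snr * n_sweeps + t_sweep
        col = d_snr * n_sweeps + d_sweep
        ccm[row][col] += 1
    return ccm

def diagonal_counts(ccm, n_sweeps):
    """Return (main, minor) counts for trials with the sweep decoded correctly."""
    main = minor = 0
    for r, row in enumerate(ccm):
        for c, count in enumerate(row):
            if r == c:
                main += count   # sweep and SNR condition both correct
            elif r % n_sweeps == c % n_sweeps:
                minor += count  # sweep correct, SNR condition wrong
    return main, minor

# Toy example: 2 sweeps x 2 SNR conditions, 5 trials.
true_labels = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
decoded_labels = [(0, 0), (1, 1), (1, 0), (1, 1), (0, 0)]
ccm = build_ccm(true_labels, decoded_labels, n_sweeps=2, n_snrs=2)
main, minor = diagonal_counts(ccm, n_sweeps=2)
```

Entries counted as `minor` correspond to the parallel minor diagonals described above: the sweep was identified correctly but assigned to the wrong SNR block.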

Fig. 9.

Composite confusion matrices (CCMs) across all SNR/noise conditions. Each CCM shows the discrimination performance within and across all SNR/noise conditions. A: example CCM for FM sweep trajectory. The CCM is ordered in blocks by SNR condition, and each sub-block contains a complete set of the sweep stimuli ordered as in Figs. 2 and 3. B: population CCM for FM sweep trajectory. C: population CCM for sweep amplitude discrimination across all tested noise levels. To facilitate averaging, only data for sites tested with the most commonly tested stimulus configuration are shown in B (n = 270 sites) and C (n = 167 sites).

These general patterns are more clearly evident using the population-averaged CCM shown in Fig. 9B. This matrix shows the average number of correctly decoded trials for each sweep trajectory and SNR condition, and was generated by averaging individual CCMs (Fig. 9A) across all cortical sites. Increases in average decoding performance with increasing SNR (Fig. 6) are evident by comparing entries along the main diagonal on the top left of the matrix in Fig. 9B with those on its bottom right corner. Each sub-block of the matrix is ordered similarly to those in Fig. 2, and careful consideration of the results within the sub-blocks corresponding to each SNR reveals that slower sweeps were more accurately decoded than faster sweeps and that ascending sweeps were more accurately decoded than descending sweeps.

More importantly, however, the concentration of entries on the main diagonal relative to the minor diagonals indicates that information about SNR condition could be reliably recovered from cortical spiking patterns. We compared the ratio of entries on the main and minor diagonals with the expected ratios determined via Monte Carlo simulations (see methods) and found that for nearly all clusters (299/321; 93.15%), the actual ratios were significantly higher than expected if SNR condition did not affect cortical sweep responses. Because the CCMs of nearly all clusters (319/321; 99.38%) significantly exceeded chance, however, there were clearly a few rare cases where sweep encoding was effectively invariant to SNR condition, but these were exceptions to the rule. Thus cortical responses were rarely noise invariant, per se, despite the fact that many were noise tolerant or noise preferring (Fig. 6).
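The logic of testing whether SNR condition is recoverable can be illustrated with a simple shuffle test. The study's own Monte Carlo procedure is described in its methods and is not reproduced here; the sketch below is one plausible label-shuffling null, with hypothetical function names and toy labels. It uses the count of main-diagonal trials rather than the main-to-minor ratio, which is equivalent in ordering when the number of sweep-correct trials is fixed and avoids division by zero.

```python
import random

def snr_match_count(true_snr, decoded_snr):
    """Number of trials whose decoded SNR condition equals the true one."""
    return sum(t == d for t, d in zip(true_snr, decoded_snr))

def shuffle_test(true_snr, decoded_snr, n_shuffles=2000, seed=0):
    """One-sided p value for the observed match count under label shuffling.

    Inputs cover only trials on which the sweep itself was decoded correctly,
    so matches fall on the main diagonal and mismatches on minor diagonals.
    """
    rng = random.Random(seed)
    observed = snr_match_count(true_snr, decoded_snr)
    shuffled = list(decoded_snr)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if snr_match_count(true_snr, shuffled) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_shuffles + 1)

# Toy labels: 4 SNR conditions, 8 sweep-correct trials each (illustrative only).
true_snr = [0] * 8 + [1] * 8 + [2] * 8 + [3] * 8
obs_informative, p_informative = shuffle_test(true_snr, list(true_snr))
obs_invariant, p_invariant = shuffle_test(true_snr, [0, 1, 2, 3] * 8)
```

A site whose responses carry SNR information yields far more main-diagonal trials than any shuffle, whereas a hypothetical noise-invariant site sits at the chance count and is not significant.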

Results for decoding sweep amplitude were broadly similar (Fig. 9C). Within each sub-block (Fig. 3), sweep amplitude is arranged in descending order. The high values at the top left of each sub-block reflect correct assignment of responses elicited by the loudest sweep (85 dB SPL). As the background noise level decreases, an increasing number of sweeps can be accurately decoded, indicated by the lengthening of the diagonal structure within each sub-block from the top left to the bottom right of the composite matrix.

We performed an additional analysis to examine the issue of noise invariance by comparing decoding accuracy for sweep trajectory using spike train templates based on responses in the 0-dB SNR and SNR+∞ conditions to decode spike trains derived from the remaining SNR contexts (i.e., −25 to −5 dB). If cortical responses were genuinely noise invariant, then using response templates obtained in a different SNR context should have minimal effects on decoding accuracy. When response templates from the 0-dB SNR condition were used, the median reductions in decoding accuracy for the −25-, −20-, −15-, −10-, and −5-dB conditions were relatively small (1.88, 1.25, 2.50, 2.08, and 1.25%, respectively). By comparison, the respective reductions in median decoding accuracy with the use of response templates derived from the SNR+∞ condition were 0.83, 1.67, 5.42, 11.25, and 12.92%. The accuracy reductions were significantly larger (P < 10⁻¹¹) for the −15-, −10-, and −5-dB SNR contexts. Reductions in decoding accuracy for the least favorable SNRs (−25, −20 dB) were similar due to a restriction of range, because decoding was so poor in general (Fig. 8). Thus responses obtained in noise were better decoded with response templates that were also derived in noise.
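The cross-context template comparison can be sketched with a nearest-template classifier. This is a minimal sketch assuming binned spike counts and a squared Euclidean distance rule, in the spirit of PSTH-template classification; the actual decoder is specified in the methods. All names and the toy spike counts are hypothetical, and the toy responses simply assume that response timing differs between noise and silence.

```python
def make_templates(trials_by_sweep):
    """Average binned spike counts across trials: one template per sweep."""
    templates = {}
    for sweep, trials in trials_by_sweep.items():
        n_bins = len(trials[0])
        templates[sweep] = [sum(t[b] for t in trials) / len(trials)
                            for b in range(n_bins)]
    return templates

def decode(trial, templates):
    """Return the sweep whose template is nearest in squared Euclidean distance."""
    def dist(template):
        return sum((x - y) ** 2 for x, y in zip(trial, template))
    return min(templates, key=lambda sweep: dist(templates[sweep]))

def accuracy(trials_by_sweep, templates):
    """Fraction of trials assigned to their true sweep."""
    outcomes = [decode(trial, templates) == sweep
                for sweep, trials in trials_by_sweep.items()
                for trial in trials]
    return sum(outcomes) / len(outcomes)

# Toy binned responses (illustrative only): timing differs between contexts.
noise_train = {"up": [[0, 3, 0, 0], [0, 2, 1, 0]], "down": [[0, 0, 3, 0], [0, 0, 2, 1]]}
silence_train = {"up": [[3, 0, 0, 0], [2, 1, 0, 0]], "down": [[0, 3, 0, 0], [0, 2, 1, 0]]}
noise_test = {"up": [[0, 3, 1, 0]], "down": [[0, 1, 3, 0]]}

acc_same_context = accuracy(noise_test, make_templates(noise_train))
acc_cross_context = accuracy(noise_test, make_templates(silence_train))
```

In the toy data, held-out noise trials are decoded perfectly with noise-derived templates but only at 50% with silence-derived templates, mirroring the direction of the effect reported above.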

Background Noise Level Modulates Signal Transmission But Is Not Explicitly Encoded

Having demonstrated that the SNR condition reliably reshapes cortical firing patterns, we asked whether such patterns reflect response changes due solely to the level of the continuous noise used as a background sound. That is, is the lack of response invariance across different noise levels a reflection of changes in the responses to the background noise itself? The waterfall plot in Fig. 10A shows that firing rates measured well after sweep offset (2,000–2,500 ms after trial onset) varied relatively little as a function of the noise level over the tested 20-dB range of noise levels. We decoded the noise level on the basis of responses during an interval when only the noise was present (2–2.5 s) and compared the results with those for responses during an interval that included the sweeps (0–2 s). If noise levels modulated cortical responses to the sweeps, it should be possible to recover information about them, even if there was little information directly conveyed about noise levels after the sweeps had ceased. Because the number of SNR conditions varied across sites, we converted the percentage of correctly classified trials into z scores for this analysis. When the analysis interval excludes the sweeps, all trials obtained at the same noise level were considered to be equal because several hundred milliseconds had elapsed since the termination of the slowest sweep.
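One way to put accuracies from sites with different numbers of SNR conditions on a common scale is to express each as a z score relative to chance. The text does not state how the z scores were computed, so the sketch below is an assumption: it uses the normal approximation to a binomial with chance probability 1/k, and the function name and numbers are hypothetical.

```python
import math

def accuracy_z_score(n_correct, n_trials, n_conditions):
    """z score of the number of correct trials against chance (1 / n_conditions)."""
    chance = 1.0 / n_conditions
    expected = n_trials * chance
    sd = math.sqrt(n_trials * chance * (1.0 - chance))
    return (n_correct - expected) / sd

# The same 40% correct is more surprising with 5 conditions than with 4.
z_five_conditions = accuracy_z_score(n_correct=40, n_trials=100, n_conditions=5)
z_four_conditions = accuracy_z_score(n_correct=40, n_trials=100, n_conditions=4)
```

With five conditions, 40 correct trials out of 100 lies 5 standard deviations above chance; with four conditions the same percentage yields a smaller z score, which is why raw percentages are not directly comparable across sites.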

We decoded identical sweeps presented in different SNR conditions (i.e., noise levels) and averaged performance for SNR condition decoding across the set of 12 distinct sweep trajectories. This is different from the analysis that generates the CCMs in Fig. 9, which involves discriminating among all sweep trajectories in all contexts collectively. For this analysis, decoding errors related to sweep trajectory are not possible, because each sweep trajectory is discriminated across SNR context separately.

Results of these analyses are directly compared in the scatterplot featured in Fig. 10B. Decoding SNR context (or, equivalently, noise level context) was generally only possible when the responses to the sweeps were included; cortical responses to the noise by itself, after the sweeps had ceased, were almost universally uninformative. These results demonstrate that linearly additive responses to the signals and noises separately cannot explain the pattern of results we observed. Responses to the noises at different levels were effectively constant, but nevertheless resulted in the differential modulation of sweep responses as a function of noise level.

Performance of the phase classifier (Fig. 10B, green circles) was comparable to that of the full spike train classifier (purple circles) when the sweeps were included but useless for decoding the SNR condition on the basis of responses to the noise alone. Although many cortical neurons exhibit phase-locked responses to repeated instances of “frozen” noise (Scott et al. 2011), for the continuous noise presented here, the detailed spectrotemporal structure of the noise varied on each trial, and as a result would not be expected to produce spike timing information that was consistent across different trials.

Finally, decoding accuracy for the rate classifier (Fig. 10B, black circles), although significantly worse than that of the phase and standard classifiers, more closely approximated that of the timing-based classifiers for the discrimination of SNR condition than it had for the discrimination of sweep trajectory (compare with Fig. 4). The median ratio of SNR condition decoding accuracy for the rate and full spike train classifiers was 0.67, compared with 0.28 for sweep trajectory decoding.

Optimal Temporal Resolution for Decoding Varied with Noise Condition

We evaluated how decoding performance varied with the temporal resolution at which cortical spiking patterns were binned by decoding single-trial spike trains binned at a range of temporal resolutions (see methods). Note that average decoding performance decreased as the SNR decreased (ANOVA; df = 6; F = 173.8; P < 10⁻¹⁷⁴), despite the prevalence of equivalent and nonmonotonic sites, and the distributions of optimal binwidths for the joint decoding of sweep trajectory and SNR condition (i.e., CCMs) did not vary across SNR response classes (P > 0.1 for all comparisons). Figure 11 shows the distributions of optimal decoding binwidths for each of the tested SNR or noise conditions as matrices. The marginal distributions, representing the distribution of optimal binwidths across SNR conditions, are indicated by the histogram bars (Fig. 11, top insets). Each row in the matrices in Fig. 11, A and B, shows the percentage of clusters where decoding was maximal for the bin width shown below the matrix for the SNR/noise condition indicated to the left of each matrix (each row sums to 100%). The optimal temporal resolution for decoding both sweep trajectory (Fig. 11A) and sweep level (Fig. 11B) was high (i.e., ≤20 ms) at the most favorable SNRs and low (≥50 ms) at the least favorable noise conditions. For example, when the SNR was equal to or greater than −5 dB, the optimal decoding bin width was rarely 50 ms or greater, particularly for sweep ID.
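Rebinning a spike train at different temporal resolutions, and then picking the resolution with the best accuracy, can be sketched as follows. This is an illustration only: the function names, the 2,000-ms window default, the spike times, and the accuracy values are assumptions, and the study's full set of bin widths is given in its methods.

```python
def bin_spike_train(spike_times_ms, bin_width_ms, duration_ms=2000):
    """Count spikes in consecutive bins of the given width (integer ms)."""
    n_bins = max(1, duration_ms // bin_width_ms)
    counts = [0] * n_bins
    for t in spike_times_ms:
        if 0 <= t < duration_ms:
            counts[min(int(t // bin_width_ms), n_bins - 1)] += 1
    return counts

def optimal_bin_width(accuracy_by_width):
    """Bin width with the highest accuracy; ties go to the narrowest width."""
    best = max(accuracy_by_width.values())
    return min(w for w, acc in accuracy_by_width.items() if acc == best)

spikes = [3, 12, 14, 1999]                   # toy spike times in ms
fine_counts = bin_spike_train(spikes, 10)    # 200 bins of 10 ms
rate_counts = bin_spike_train(spikes, 2000)  # one bin: a pure spike count

# Hypothetical percent-correct values for one site at several bin widths.
best_width = optimal_bin_width({2: 41.0, 10: 55.0, 50: 48.0, 2000: 20.0})
```

With a single bin spanning the whole analysis window, all timing information is discarded and only the spike count remains, so the widest bin behaves as a rate code.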

To quantify this trend, we compared the actual SNR by bin width matrices with simulated matrices obtained by random assignments of SNR condition labels to the optimal bin width distribution. We compared the actual prevalence of SNR and optimal bin width combinations with the distributions obtained for the simulated matrices to generate z scores (Fig. 11, C and D). The preponderance of high values in the bottom left and top right quadrants are evident for both sweep ID and sweep dB decoding, indicating that more temporal averaging is increasingly necessary to compensate for the increased noise at unfavorable SNRs.

To provide statistical support for these observations, we performed a permutation test designed to quantify the degree to which favorable SNRs were associated with narrow optimal binwidths and unfavorable SNRs were associated with wide ones. Because the −25-dB SNR condition was uncommonly tested, and because the 2,000-ms bin width represented the rate classifier, we restricted the analysis to a 6 × 6 matrix (−20 to +∞ SNR and 2 to 100 ms). For each simulated matrix (n = 100,000), we computed the difference between the sum of the bottom left and top right quadrants and the sum of the top left and bottom right quadrants, and compared this value with the actual difference observed in the experiment. Results for both sweep trajectory and sweep level decoding were highly significant (P < 0.00001), indicating that increased temporal averaging was necessary to offset the deleterious effects of decreasing the SNR on decoding performance.
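The quadrant-based permutation test can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes the matrix rows run from the least favorable SNR (top) to SNR+∞ (bottom) and the columns from the narrowest bin (left) to the widest (right), represents each observation as a hypothetical (snr_index, bin_index) pair, and uses far fewer permutations than the 100,000 reported.

```python
import random

def quadrant_statistic(pairs, n_levels=6):
    """(bottom-left + top-right) minus (top-left + bottom-right) counts.

    Each pair is (snr_index, bin_index); snr_index 0 is the least favorable
    SNR (top row) and bin_index 0 is the narrowest bin (left column).
    """
    half = n_levels // 2
    stat = 0
    for snr_index, bin_index in pairs:
        top = snr_index < half
        left = bin_index < half
        stat += 1 if top != left else -1
    return stat

def permutation_p_value(pairs, n_permutations=2000, seed=0):
    """Shuffle SNR labels across observations and compare quadrant statistics."""
    rng = random.Random(seed)
    pairs = list(pairs)
    observed = quadrant_statistic(pairs)
    snr_labels = [s for s, _ in pairs]
    bin_labels = [b for _, b in pairs]
    at_least = 0
    for _ in range(n_permutations):
        rng.shuffle(snr_labels)
        if quadrant_statistic(zip(snr_labels, bin_labels)) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_permutations + 1)

# Toy data: 10 sites per SNR level, wider optimal bins at less favorable SNRs.
toy_pairs = [(snr, 5 - snr) for snr in range(6) for _ in range(10)]
observed_stat, p_value = permutation_p_value(toy_pairs)
```

Shuffling the SNR labels preserves the marginal distribution of optimal bin widths while breaking any association with SNR, so a large observed statistic relative to the shuffles indicates that narrow bins go with favorable SNRs and wide bins with unfavorable ones.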

DISCUSSION

Our main finding was that a substantial minority of cortical sites exhibited significantly better decoding of FM sweep trajectory in the presence rather than absence of continuous white noise. If we consider these clusters of neurons as noise preferring and add to them those noise-tolerant neurons that achieved decoding that was statistically indistinct from the no-noise condition for at least one tested SNR, then a clear majority of cortical sites exhibit signal-in-noise transmission capabilities that rival or exceed their ability to transmit information about FM sweeps in silence. Although these experiments cannot demonstrate that such sites support noise-robust auditory perceptual competence, they provide direct evidence for substantial diversity in how noise affects signal representations in the core auditory fields and complement recent demonstrations of the complex implications of the SNR (Teschner et al. 2016).

Our findings expand on previous observations of a progressive reduction of noise influences on central auditory responses. Mesgarani et al. (2014) proposed a model employing dynamic synaptic depression combined with a feedback gain normalization to account for a noise-robust cortical representation of speech sounds. Rabinowitz et al. (2013) reported a reduction of noise influences in cortex compared with more peripheral stations and suggested that neuronal populations that show the strongest adaptation to stimulus statistics are also the most noise tolerant. Sensitivity to statistical differences between the spectrotemporal structure of vocalizations and background noises has also been proposed as an explanation for noise-tolerant responses in the higher auditory areas of birds (Moore et al. 2013).

The prevalence of noise tolerance and noise preference likely depends on statistical dissimilarities in the spectrotemporal structure of foreground signals and background noises. FM sweeps are spectrally dense and temporally structured relative to white noise. In the context of our stimulus protocol, which involved continuous white noise, FM sweeps were also relatively infrequent events. Thus differential adaptation to the background noise and foreground signals could facilitate selective signal transmission. Background noise levels spanning a range of 20 dB could not, by themselves, be discriminated, despite readily discriminable influences on contemporaneous sweeps (Fig. 10B). Although it is impossible to rule out covert attentional effects (Niwa et al. 2013, 2015), our results in a passive listening paradigm indicate that the segregation of foreground and background signals can occur in the absence of overt task engagement, likely driven by bottom-up mechanisms sensitive to the large spectrotemporal differences in the signals and noises we presented. In fact, such differences may be a necessary substrate for attentional selection by top-down mechanisms.

We want to emphasize that central representations of signals embedded in noise diverge significantly from expectations based on responses in the auditory periphery (Rabinowitz et al. 2013), where the features of both foreground and background sounds are reliably encoded. Cortical responses to foreground signals were not noise invariant (cf. Schneider and Woolley 2013) in the strictest sense (Fig. 9), but cortical responses to background noise were essentially invariant to a crucial parameter of the noise, namely, its level, over a 20-dB range, given sufficient time to adapt (Fig. 10A). Changes in the SNR reliably modulated the propagation of the FM sweeps in the ascending auditory pathway such that significant information about the SNR context could be decoded from cortical responses to the FM sweeps. If we are to judge from the PSTHs (e.g., Fig. 5), responses to signals in silence are not obscured by rising responses to louder background noises. Instead, PSTH peaks representing responses to sweeps at relatively favorable SNRs simply vanish for sufficiently negative SNRs. We caution that it is too simplistic to assign a value to the SNR that prevents signal transmission, because not only is there substantial diversity at the same hierarchical level of the system, but also the effects of a given SNR vary with the absolute levels of both the signals and noise (Teschner et al. 2016).

Our analytical framework mirrors that used for a study of the cortical representation of sinusoidal FM in awake monkeys (macaques; Malone et al. 2014) and differs from prior studies in anesthetized (Godey et al. 2005) and awake (Atencio et al. 2007) New World monkeys, which focused on rate-defined tuning for sweep speed and direction. Calculating rates over a comparatively long (2 s) fixed interval penalizes rate-based estimates of tuning for the FM sweep parameters relative to methods that estimate firing rates within intervals matched to the duration of each sweep. However, the latter method requires knowledge of the sweep parameters that the decoders are intended to utilize for discrimination, effectively solving much of the problem that the auditory system must solve. Because the timing of PSTH peaks indicates the intersection of the FM sweeps with the frequency-defined response area of each cluster, varying FM sweep speed and direction varies the effective onset and offset times of the different sweeps (Malone et al. 2015a). Analogously, interactions between the sweeps and the shapes of the neurons’ response areas will effectively vary the attack and decay times of the sweeps, which could also influence response timing and support timing-based decoding. For example, the phase classifier substantially outperformed the rate decoder (Fig. 4B) even when discriminating FM sweeps differing only in amplitude. This could reflect amplitude-related changes in response latency, or spectral bandwidth, for example, both of which manifest as differences in spike timing. Thus steady-state level differences and the concomitant effects on envelope profiles are sufficient to organize cortical spike timing (e.g., Fig. 3), as has been reported for changes in tone levels (Malone et al. 2010).

Across the population, average decoding performance and the prevalence of significant decoding declined at less favorable SNRs (Fig. 8), consistent with psychophysical data demonstrating decrements in discrimination between sounds as the level of noise maskers increases (Miller et al. 1951; Miller 1974; Narayan et al. 2007). Nevertheless, the effects on single sites were quite variable (Figs. 5–8); roughly one-third of cortical sites become more informative in the presence of noise than in silence, consistent with prior observations in anesthetized preparations. Phillips (1990) reported increased salience of cat AI neural responses to noise-masked tones relative to unmasked tones. Nagarajan et al. (2002) showed a similar effect on responses to vocalizations in anesthetized marmoset monkeys. To our knowledge, however, noise-enhanced cortical representations of complex signals have not been previously reported to the degree observed in our study. It is possible that noise-enhanced responses have been undersampled in prior studies that presented noise noncontinuously or focused on units identified with search stimuli presented against silent backgrounds.

Continuous background noise modulated the neural activity patterns representing the signals, even over a range where varying the level of the noise did not itself produce discriminable changes in spiking patterns. This is consistent with the idea that moment-to-moment noise fluctuations were insufficiently large to drive responses in the adapted network (Ringach and Malone 2007). Moreover, external noise may actually enhance stimulus representations in cortex by altering basal network activity. Increased levels of background synaptic activity have been shown to shift neurons into a high-conductance state that increases sensitivity to low-level inputs and increases the temporal resolution (Destexhe et al. 2003; Destexhe and Contreras 2006; Rudolph and Destexhe 2003). Although those experiments addressed the issue of internal states of synaptic activity, it is possible that stochastic driven activity produces similar effects and is responsible for the enhanced responsiveness of auditory cortical neurons in the present study. Theoretical and brain slice experiments in single neurons and networks in which noise was added to simulate different levels of synaptic background activity have demonstrated that the basal level of synaptic activity modulates the responsiveness of neurons (Chance et al. 2002; Destexhe and Contreras 2006; Hô and Destexhe 2000; Wolfart et al. 2005) and promotes increased responsiveness to stimuli during noisy states compared with quiescent states. Noise-increased responsiveness may be related to suprathreshold stochastic resonance, a phenomenon in which nonlinear systems carry increased signal information in the presence of noise (Moss et al. 2004; Wiesenfeld and Moss 1995). Because background acoustic noise likely alters background synaptic activity, it may contribute to improved cortical signal responses, in addition to local circuit effects.

The potential role of circuit influences on creating or enhancing noise-tolerant encoding has been supported by recordings from homologs of auditory cortex in zebra finch (Schneider and Woolley 2013). Those authors suggested that feedforward inhibition can contribute to stimulus encoding with enhanced background tolerance. Our observation that sites with pure tone responses that appear to reflect stronger inhibition, including narrower frequency tuning, lower driven and spontaneous firing rates, and nonmonotonic rate-level functions, are correlated with sites of higher noise tolerance (the nonmonotonic SNR sites) supports that hypothesis. These findings support the notion that cellular and network effects could be crucial for generating noise-robust and even noise-enhanced cortical representations of foreground signals in background noise.

The dependence of neuronal responses on SNR is a prime example of the sensitivity of central auditory neurons to acoustic context (Dean et al. 2005; Malone and Semple 2001; Malone et al. 2002; Mesgarani et al. 2009; Ulanovsky et al. 2003; Westö and May 2016). The temporal filters of IC neurons adapt to the presence of background noise in a manner that maintains stimulus information (Lesica and Grothe 2008), effects that appear to strengthen along the auditory neuraxis (Rabinowitz et al. 2013; Schneider and Woolley 2013). Our results demonstrate that the effect of background noise varies significantly even within the auditory core fields, and suggest that adaptive mechanisms sensitive to the differential statistical structure of ongoing acoustic signals triage particular signals (the foreground) to be robustly transmitted and represented, while rejecting or de-emphasizing others (the background) (Ringach and Malone 2007). The prevalence of noise-robust and noise-enhanced cortical responses in a passive listening paradigm suggests an important role for bottom-up mechanisms in auditory scene analysis. How top-down, attentional mechanisms contribute to this process and how changes in neuronal responsiveness resulting from changes in context subserve psychophysical performance (e.g., Teschner et al. 2016) remain important questions for future investigations.

GRANTS

Funding in support of this work was provided by National Institute of Deafness and Communication Disorders Grants DC011843 (to B. J. Malone) and DC002260 (to C. E. Schreiner), Silvio O. Conte Grant MH077970 (to C. E. Schreiner), Hearing Research Inc. (San Francisco, CA; to B. J. Malone and C. E. Schreiner), and the Coleman Memorial Fund.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the authors.

AUTHOR CONTRIBUTIONS

M.A.H., R.E.B., and C.E.S. conceived and designed research; M.A.H. and R.E.B. performed experiments; B.J.M. analyzed data; B.J.M. and C.E.S. interpreted results of experiments; B.J.M. prepared figures; B.J.M. drafted manuscript; B.J.M. and C.E.S. edited and revised manuscript; B.J.M., M.A.H., R.E.B., and C.E.S. approved final version of manuscript.

REFERENCES

  1. Atencio CA, Blake DT, Strata F, Cheung SW, Merzenich MM, Schreiner CE. Frequency-modulation encoding in the primary auditory cortex of the awake owl monkey. J Neurophysiol 98: 2182–2195, 2007. doi: 10.1152/jn.00394.2007. [DOI] [PubMed] [Google Scholar]
  2. Bar-Yosef O, Nelken I. The effects of background noise on the neural responses to natural sounds in cat primary auditory cortex. Front Comput Neurosci 1: 3, 2007. doi: 10.3389/neuro.10.003.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bendor D, Wang X. Neural response properties of primary, rostral, and rostrotemporal core fields in the auditory cortex of marmoset monkeys. J Neurophysiol 100: 888–906, 2008. doi: 10.1152/jn.00884.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Chance FS, Abbott LF, Reyes AD. Gain modulation from background synaptic input. Neuron 35: 773–782, 2002. doi: 10.1016/S0896-6273(02)00820-6. [DOI] [PubMed] [Google Scholar]
  5. Cheung SW, Bedenbaugh PH, Nagarajan SS, Schreiner CE. Functional organization of squirrel monkey primary auditory cortex: responses to pure tones. J Neurophysiol 85: 1732–1749, 2001. [DOI] [PubMed] [Google Scholar]
  6. Dean I, Harper NS, McAlpine D. Neural population coding of sound level adapts to stimulus statistics. Nat Neurosci 8: 1684–1689, 2005. doi: 10.1038/nn1541. [DOI] [PubMed] [Google Scholar]
  7. Destexhe A, Contreras D. Neuronal computations with stochastic network states. Science 314: 85–90, 2006. doi: 10.1126/science.1127241. [DOI] [PubMed] [Google Scholar]
  8. Destexhe A, Rudolph M, Paré D. The high-conductance state of neocortical neurons in vivo. Nat Rev Neurosci 4: 739–751, 2003. doi: 10.1038/nrn1198. [DOI] [PubMed] [Google Scholar]
  9. Ehret G, Schreiner CE. Regional variations of noise-induced changes in operating range in cat AI. Hear Res 141: 107–116, 2000. doi: 10.1016/S0378-5955(99)00213-0. [DOI] [PubMed] [Google Scholar]
  10. Foffani G, Moxon KA. PSTH-based classification of sensory stimuli using ensembles of single neurons. J Neurosci Methods 135: 107–120, 2004. doi: 10.1016/j.jneumeth.2003.12.011. [DOI] [PubMed] [Google Scholar]
  11. Godey B, Atencio CA, Bonham BH, Schreiner CE, Cheung SW. Functional organization of squirrel monkey primary auditory cortex: responses to frequency-modulation sweeps. J Neurophysiol 94: 1299–1311, 2005. doi: 10.1152/jn.00950.2004. [DOI] [PubMed] [Google Scholar]
  12. Hô N, Destexhe A. Synaptic background activity enhances the responsiveness of neocortical pyramidal neurons. J Neurophysiol 84: 1488–1496, 2000. [DOI] [PubMed] [Google Scholar]
  13. Hulse SH, MacDougall-Shackleton SA, Wisniewski AB. Auditory scene analysis by songbirds: stream segregation of birdsong by European starlings (Sturnus vulgaris). J Comp Psychol 111: 3–13, 1997. doi: 10.1037/0735-7036.111.1.3. [DOI] [PubMed] [Google Scholar]
  14. Lesica NA, Grothe B. Efficient temporal processing of naturalistic sounds. PLoS One 3: e1655, 2008. doi: 10.1371/journal.pone.0001655. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Malone BJ, Beitel RE, Vollmer M, Heiser MA, Schreiner CE. Spectral context affects temporal processing in awake auditory cortex. J Neurosci 33: 9431–9450, 2013. doi: 10.1523/JNEUROSCI.3073-12.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Malone BJ, Beitel RE, Vollmer M, Heiser MA, Schreiner CE. Modulation-frequency-specific adaptation in awake auditory cortex. J Neurosci 35: 5904–5916, 2015b. doi: 10.1523/JNEUROSCI.4833-14.2015.
  17. Malone BJ, Scott BH, Semple MN. Context-dependent adaptive coding of interaural phase disparity in the auditory cortex of awake macaques. J Neurosci 22: 4625–4638, 2002.
  18. Malone BJ, Scott BH, Semple MN. Dynamic amplitude coding in the auditory cortex of awake rhesus macaques. J Neurophysiol 98: 1451–1474, 2007. doi: 10.1152/jn.01203.2006.
  19. Malone BJ, Scott BH, Semple MN. Temporal codes for amplitude contrast in auditory cortex. J Neurosci 30: 767–784, 2010. doi: 10.1523/JNEUROSCI.4170-09.2010.
  20. Malone BJ, Scott BH, Semple MN. Encoding frequency contrast in primate auditory cortex. J Neurophysiol 111: 2244–2263, 2014. doi: 10.1152/jn.00878.2013.
  21. Malone BJ, Scott BH, Semple MN. Diverse cortical codes for scene segmentation in primate auditory cortex. J Neurophysiol 113: 2934–2952, 2015a. doi: 10.1152/jn.01054.2014.
  22. Malone BJ, Semple MN. Effects of auditory stimulus context on the representation of frequency in the gerbil inferior colliculus. J Neurophysiol 86: 1113–1130, 2001.
  23. Mesgarani N, David SV, Fritz JB, Shamma SA. Mechanisms of noise robust representation of speech in primary auditory cortex. Proc Natl Acad Sci USA 111: 6792–6797, 2014. doi: 10.1073/pnas.1318017111.
  24. Miller GA, Heise GA, Lichten W. The intelligibility of speech as a function of the context of the test materials. J Exp Psychol 41: 329–335, 1951. doi: 10.1037/h0062491.
  25. Miller JD. Effects of noise on people. J Acoust Soc Am 56: 729–764, 1974. doi: 10.1121/1.1903322.
  26. Moore RC, Lee T, Theunissen FE. Noise-invariant neurons in the avian auditory cortex: hearing the song in noise. PLOS Comput Biol 9: e1002942, 2013. doi: 10.1371/journal.pcbi.1002942.
  27. Moss F, Ward LM, Sannita WG. Stochastic resonance and sensory information processing: a tutorial and review of application. Clin Neurophysiol 115: 267–281, 2004. doi: 10.1016/j.clinph.2003.09.014.
  28. Nagarajan SS, Cheung SW, Bedenbaugh P, Beitel RE, Schreiner CE, Merzenich MM. Representation of spectral and temporal envelope of twitter vocalizations in common marmoset primary auditory cortex. J Neurophysiol 87: 1723–1737, 2002. doi: 10.1152/jn.00632.2001.
  29. Narayan R, Best V, Ozmeral E, McClaine E, Dent M, Shinn-Cunningham B, Sen K. Cortical interference effects in the cocktail party problem. Nat Neurosci 10: 1601–1607, 2007. doi: 10.1038/nn2009.
  30. Niwa M, Johnson JS, O’Connor KN, Sutter ML. Differences between primary auditory cortex and auditory belt related to encoding and choice for AM sounds. J Neurosci 33: 8378–8395, 2013. doi: 10.1523/JNEUROSCI.2672-12.2013.
  31. Niwa M, O’Connor KN, Engall E, Johnson JS, Sutter ML. Hierarchical effects of task engagement on amplitude modulation encoding in auditory cortex. J Neurophysiol 113: 307–327, 2015. doi: 10.1152/jn.00458.2013.
  32. Olsen WO, Noffsinger D, Kurdziel S. Speech discrimination in quiet and in white noise by patients with peripheral and central lesions. Acta Otolaryngol 80: 375–382, 1975. doi: 10.3109/00016487509121339.
  33. Phillips DP. Neural representation of sound amplitude in the auditory cortex: effects of noise masking. Behav Brain Res 37: 197–214, 1990. doi: 10.1016/0166-4328(90)90132-X.
  34. Phillips DP, Cynader MS. Some neural mechanisms in the cat’s auditory cortex underlying sensitivity to combined tone and wide-spectrum noise stimuli. Hear Res 18: 87–102, 1985. doi: 10.1016/0378-5955(85)90112-1.
  35. Phillips DP, Orman SS, Musicant AD, Wilson GF. Neurons in the cat’s primary auditory cortex distinguished by their responses to tones and wide-spectrum noise. Hear Res 18: 73–86, 1985. doi: 10.1016/0378-5955(85)90111-X.
  36. Rabinowitz NC, Willmore BD, King AJ, Schnupp JW. Constructing noise-invariant representations of sound in the auditory pathway. PLoS Biol 11: e1001710, 2013. doi: 10.1371/journal.pbio.1001710.
  37. Ringach DL, Malone BJ. The operating point of the cortex: neurons as large deviation detectors. J Neurosci 27: 7673–7683, 2007. doi: 10.1523/JNEUROSCI.1048-07.2007.
  38. Rolls ET, Tovee MJ. Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex. J Neurophysiol 73: 713–726, 1995.
  39. Rudolph M, Destexhe A. A fast-conducting, stochastic integrative mode for neocortical neurons in vivo. J Neurosci 23: 2466–2476, 2003.
  40. Sadagopan S, Wang X. Level invariant representation of sounds by populations of neurons in primary auditory cortex. J Neurosci 28: 3415–3426, 2008. doi: 10.1523/JNEUROSCI.2743-07.2008.
  41. Schneider DM, Woolley SM. Sparse and background-invariant coding of vocalizations in auditory scenes. Neuron 79: 141–152, 2013. doi: 10.1016/j.neuron.2013.04.038.
  42. Scott BH, Malone BJ, Semple MN. Transformation of temporal processing across auditory cortex of awake macaques. J Neurophysiol 105: 712–730, 2011. doi: 10.1152/jn.01120.2009.
  43. Teschner MJ, Seybold BA, Malone BJ, Hüning J, Schreiner CE. Effects of signal-to-noise ratio on auditory cortical frequency processing. J Neurosci 36: 2743–2756, 2016. doi: 10.1523/JNEUROSCI.2079-15.2016.
  44. Vinje WE, Gallant JL. Sparse coding and decorrelation in primary visual cortex during natural vision. Science 287: 1273–1276, 2000.
  45. Westö J, May PJ. Capturing contextual effects in spectro-temporal receptive fields. Hear Res 339: 195–210, 2016. doi: 10.1016/j.heares.2016.07.012.
  46. Wiesenfeld K, Moss F. Stochastic resonance and the benefits of noise: from ice ages to crayfish and SQUIDs. Nature 373: 33–36, 1995. doi: 10.1038/373033a0.
  47. Wolfart J, Debay D, Le Masson G, Destexhe A, Bal T. Synaptic background activity controls spike transfer from thalamus to cortex. Nat Neurosci 8: 1760–1767, 2005. doi: 10.1038/nn1591.

Articles from Journal of Neurophysiology are provided here courtesy of American Physiological Society
