Abstract
Both mice and primates are used to model the human auditory system. The primate order possesses unique cortical specializations that govern auditory processing. Given the power of molecular and genetic tools available in the mouse model, it is essential to understand the similarities and differences in auditory cortical processing between mice and primates. To address this issue, we directly compared temporal encoding properties of neurons in the auditory cortex of awake mice and awake squirrel monkeys (SQMs). Stimuli were drawn from a sinusoidal amplitude modulation (SAM) paradigm, which has been used previously both to characterize temporal precision and to model the envelopes of natural sounds. Neural responses were analyzed with linear template-based decoders. In both species, spike timing information supported better modulation frequency discrimination than rate information, and multiunit responses generally supported more accurate discrimination than single-unit responses from the same site. However, cortical responses in SQMs supported better discrimination overall, reflecting superior temporal precision and greater rate modulation relative to the spontaneous baseline and suggesting that spiking activity in mouse cortex was less strictly regimented by incoming acoustic information. The quantitative differences we observed between SQM and mouse cortex support the idea that SQMs offer advantages for modeling precise responses to fast envelope dynamics relevant to human auditory processing. Nevertheless, our results indicate that cortical temporal processing is qualitatively similar in mice and SQMs and thus recommend the mouse model for mechanistic questions, such as development and circuit function, where its substantial methodological advantages can be exploited.
NEW & NOTEWORTHY To understand the advantages of different model organisms, it is necessary to directly compare sensory responses across species. Contrasting temporal processing in auditory cortex of awake squirrel monkeys and mice, with parametrically matched amplitude-modulated tone stimuli, reveals a similar role of timing information in stimulus encoding. However, disparities in response precision and strength suggest that anatomical and biophysical differences between squirrel monkeys and mice produce quantitative but not qualitative differences in processing strategy.
Keywords: amplitude modulation, auditory cortex, species comparison, temporal processing
INTRODUCTION
Humans have unique capabilities to understand and appreciate language and music that necessitate complex auditory processing at high temporal resolutions. Direct comparisons between species are necessary to provide a rigorous basis for understanding the specific advantages and disadvantages of different models of the human sensory systems that give rise to these abilities. Nonhuman primates are well placed as models of sophisticated auditory cortical processing because of their comparatively recent divergence from the human lineage (Meredith et al. 2011; Steiper and Young 2006) and similarities in anatomy (Hackett 2015), auditory physiology (Baumann et al. 2015), behavior (Song et al. 2016), and the sophistication of their vocal repertoire (Winter et al. 1966). However, mice offer many experimental advantages over primates, including access to larger numbers of animals, which facilitates experiments requiring multiple experimental groups, and the suite of powerful genetic tools for targeted parsing and manipulation of cortical circuits (Fenno et al. 2011; Huang and Zeng 2013). Primate neocortex differs from rodent neocortex in numerous ways, including the presence of additional neuron types, specializations in conserved neuron types, altered patterns of local and long-range connections, the presence of additional cytoarchitectonic areas (Rakic 2009), and increased neural density (Herculano-Houzel 2012). Identifying the facets of primate auditory cortical processing that are well modeled in mice (Izpisua Belmonte et al. 2015) will facilitate the interpretation of studies that apply the rich suite of tools for circuit dissection available in mice to questions of auditory cortical processing.
Amplitude modulation is a feature of natural sounds, including speech (Elliott and Theunissen 2009; Rosen 1992; Shannon et al. 1995), nonhuman primate vocalizations (Cohen et al. 2007; Winter et al. 1966), and mouse vocalizations (Holy and Guo 2005; Liu et al. 2003). Because they are simple in the modulation domain, sinusoidal amplitude-modulated (SAM) stimuli have been used to model the more complex amplitude modulation characteristic of natural sounds, including the vocalizations of many species, and to characterize the fidelity of temporal encoding at various stages of the auditory system (Joris et al. 2004; Malone and Schreiner 2010). Cortical responses to SAM stimuli have been recorded in multiple nonhuman primate species, such as rhesus macaques (Malone et al. 2007, 2010; Yin et al. 2011), squirrel monkeys (SQMs) (Bieser and Müller-Preuss 1996; Malone et al. 2013, 2015a), and marmosets (Gao et al. 2016; Liang et al. 2002). Cortical responses to SAM have also been studied in some rodent species, including anesthetized rats (Gaese and Ostwald 1995; Kilgard and Merzenich 1999), anesthetized gerbils (Schulze and Langner 1997), and, more recently, awake gerbils (Rosen et al. 2010; Sarro et al. 2011; Ter-Mikaelian et al. 2007). Results across these species suggest that the modulation frequencies of SAM stimuli are encoded by changes in both firing rate and the temporal organization of spiking patterns in auditory cortex (Gourévitch and Eggermont 2010; Joris et al. 2004). However, spike timing-based encoding of modulation frequency appears to extend to higher frequencies in primates than in rodents (see, e.g., Ter-Mikaelian et al. 2007 vs. Malone et al. 2013). Yet surprisingly little is known about the temporal processing capabilities of mouse auditory cortex, which limits the interpretation of studies that involve the encoding of natural or other complex sounds. 
We presented matched SAM stimuli to awake mice and awake SQMs and compared the cortical responses to these sounds. We estimated the information available in each species’ responses, using an algorithm to decode the modulation frequency of SAM stimuli from neural responses. This analysis revealed quantitative differences in representation quality and qualitative similarities in the way that information is encoded. Direct comparisons of cortical responses between two awake organisms with matched stimulus and analysis paradigms are rare; our study offers an uncommon insight into the similarities and differences between two powerful and ubiquitous auditory models.
METHODS
Surgical Preparation
All animal care and use was approved by the Institutional Animal Care and Use Committee of the University of California, San Francisco (UCSF) and followed National Institutes of Health guidelines.
Squirrel monkeys.
Two adult female SQMs were anesthetized with ketamine (25 mg/kg im) and midazolam (0.1 mg/kg) and maintained in a steady plane of anesthesia with isoflurane gas (0.5–5%) for placement of a custom head holder and recording chamber. Implants were secured to the skull with bone screws and dental acrylic. After implantation of the head holders and training to sit in the primate chair with their head fixed to a frame, the animals underwent a second surgery to implant a recording chamber over auditory cortex. The temporal muscle was resected, the cranium overlying auditory cortex was exposed, and a 10-mm-diameter ring was secured with bone screws and dental acrylic. Perioperative pain management included local application of bupivacaine, as well as buprenorphine (0.01–0.03 mg/kg) and meloxicam (0.3 mg/kg) as needed and in consultation with veterinary staff in the UCSF Laboratory Animal Resource Center. Sterile procedures were used to expose and record from auditory cortex. A 2- to 3-mm burr hole was drilled either with a dental drill mounted on a micromanipulator under magnification with a surgical microscope or with a hand drill. A small incision was made in the dura with microsurgical instruments after application of a drop of 1% lidocaine. After several recording sessions in a burr hole, another burr hole was drilled and the recording process was repeated. Burr holes were also sometimes enlarged or were connected by removing bone with fine surgical instruments after application of lidocaine as needed to expose additional areas of auditory cortex. After each recording session, the chamber was filled with antibiotic ointment and sealed with a metal cap.
Mice.
Mice were generated by crossing the Cre-dependent channelrhodopsin reporter line Ai32 (JAX strain 012569) to either Sst-Cre or Pvalb-Cre(2a) reporter lines (JAX strains 013044 and 012358) as in Phillips and Hasenstaub (2016) and Seybold et al. (2015). Adult male or female mice (7–12 wk old) were anesthetized with isoflurane, and a custom metal headbar was implanted. Two to five days later, the mice were briefly reanesthetized and a craniotomy of ~1-mm diameter was drilled over the right auditory cortex, centered ~2.5 mm posterior to bregma and on the parietal-squamosal suture; after 1–3 h of recovery, mice were head-fixed on a floating spherical treadmill in a darkened sound-attenuated chamber, and electrophysiological recordings were then performed. Intermittent optogenetic stimulation (<20% duty cycle) usually preceded the collection of the mouse data described in this report, but the stimulus block containing the SAM tones analyzed in this report was always separated from the last optogenetic stimulation by several minutes of recovery time.
Electrophysiology
We recorded responses with translaminar multichannel probes (NeuroNexus) inserted into the auditory cortex of awake, passively listening, head-fixed animals. In mice, recordings were performed with 16-channel probes and 50-µm intersite spacing, roughly spanning the thickness of mouse auditory cortex. Responses were amplified and digitized continuously with a 16-channel recording system (RA16 Medusa preamplifier and RX-5 amplifier; Tucker-Davis Technologies) and recorded with a custom RPvdsEx (Tucker-Davis Technologies) routine. In the SQMs, recordings were performed with 16-channel probes and 100- or 150-µm site spacing. Responses were amplified and digitized continuously with a 16-channel recording system (RA16 Medusa preamplifier and RX-5 amplifier; Tucker-Davis Technologies) and recorded with Brainware software (Tucker-Davis Technologies).
Stimulus Presentation System
Squirrel monkeys.
Sounds were presented through a field speaker (Sony SS-MB150H) placed directly in front of the animal. The distance from the front of the speaker to the interaural line was 40 cm.
Mice.
Sounds were presented through a free-field speaker (ES1; Tucker-Davis Technologies) placed 30.5 cm from the animal’s left ear (contralateral to the recording site). Sound presentation was controlled by custom MATLAB software.
Stimulus Presentation: Unmodulated Tones
To identify the frequency preferences and monotonicity characteristics of each recording site, and to determine the appropriate SAM carrier frequency for each site, we presented a battery of pure tones at varying frequencies and sound pressure levels.
Squirrel monkeys.
For SQMs, the frequency range varied across experiments, ranging from 1 to 4 octaves (0.083- or 0.167-octave spacing) centered on the multiunit (MU) center frequency estimated from online feedback during each experiment. Tone intensity varied from 0 to 70 dB in 10-dB steps. Tone duration was typically 50 ms (5 repetitions at each tone/level combination), but longer tones (500 ms; 2 repetitions) were used in some penetrations. In others, we estimated the frequency response area (FRA) from responses to the maskers in a masker-probe stimulus paradigm in which the maskers were otherwise identical to the 50-ms tones presented in isolation. Spike or event counts were calculated for the duration of the tone pips (offset by 10 ms to account for response latency). Because firing rates preceding tone onset were not recorded, spontaneous rates were estimated from intervals distal to tone onset (140–190 ms or 440–490 ms after tone onset for recording sweep durations of 200 and 500 ms, respectively). For 500-ms tones, spontaneous rates were calculated from the interval 700 to 1,000 ms relative to tone onset. We visualized the peristimulus time histograms (PSTHs) to confirm that these intervals effectively excluded afterdischarges related to the tones. The interstimulus intervals (ISIs) were variable because of stimulus scheduling delays in software but were typically ~750 ms for the 50-ms tones and ~1,560 ms for the 500-ms tones.
Mice.
In mice, we presented a battery of 50-ms pure tones at 4–64 kHz with 0.2-octave spacing, with intensities from 30 to 60 dB in 5-dB steps. ISIs ranged from 700 to 1,200 ms. Spike counts were calculated for the duration of the tone pips, and spontaneous rates were calculated over an equivalent duration immediately preceding sound onset.
Stimulus Presentation: SAM Tones
Stimuli consisted of SAM tonal carriers frequency matched to the online estimates of the best spectral frequency (BF) of cortical sites as described below. The sinusoidal carrier tone at frequency $f_c$ was modulated at a lower frequency ($f_m$) such that $s(t) = [1 + M \cos(2\pi f_m t + \Phi)] \cdot A \sin(2\pi f_c t)$, where $s(t)$ is the signal, $t$ refers to time, $M$ is the depth of modulation ($M = 1$ for all stimuli, corresponding to 100% modulation depth), and $A$ refers to amplitude of the signal. We presented 15 trials of SAM signals (100%-depth SAM), at moderate sound pressure levels (SPLs) of ~55–65 dB, modulated at 4, 8, 16, 32, 64, and 128 Hz. Intertrial intervals (that is, the period between the onset of one sound and the onset of the next) were variable and ranged from 3.2 to 3.8 s for the SQMs and from 2.8 to 3.3 s for the mice. In the SQM, because of the 1-s preceding masker and the 1-s SAM probe tone, the total sound duration on each trial was 2 s. In the mouse, each SAM tone lasted 2 s, so the total sound duration on each trial was also 2 s.
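As an illustration of the stimulus equation above, the following minimal Python sketch synthesizes one SAM tone. The function name `sam_tone`, the sampling rate `fs`, and the example carrier and modulation frequencies are assumptions chosen for illustration; they are not taken from the stimulus software used in the experiments.

```python
import math

def sam_tone(fc, fm, duration, fs, depth=1.0, amplitude=1.0, phase=0.0):
    """Sample s(t) = [1 + M*cos(2*pi*fm*t + phi)] * A * sin(2*pi*fc*t).

    fc, fm   : carrier and modulation frequencies (Hz)
    duration : tone duration (s)
    fs       : sampling rate (Hz); a hypothetical value, not from the paper
    depth    : modulation depth M (1.0 corresponds to 100% depth)
    """
    n_samples = int(round(duration * fs))
    samples = []
    for i in range(n_samples):
        t = i / fs
        envelope = 1.0 + depth * math.cos(2.0 * math.pi * fm * t + phase)
        samples.append(envelope * amplitude * math.sin(2.0 * math.pi * fc * t))
    return samples

# Example: 1-s tone, 10-kHz carrier, 8-Hz modulation, 100-kHz sampling rate.
tone = sam_tone(fc=10000.0, fm=8.0, duration=1.0, fs=100000.0)
```

With $M = 1$ the envelope ranges from 0 to 2, so the waveform peaks at twice the carrier amplitude and falls to zero once per modulation cycle.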
Frequency Response Area Analyses
We analyzed responses to unmodulated pure tones in order to assess sound responsiveness and frequency and level preferences for each recording site. For analyses based on estimates derived from the FRA, we included only data where a permutation test (Malone et al. 2013) indicated that the FRA was significantly organized by frequency preference. The frequency tuning function (FTF) was computed as the columnwise sum of the FRA, reflecting responses to each tested frequency across all SPLs. We compared the across-frequency variance of the actual FTF to variances computed for simulated FTFs generated by random columnwise reassignment of the spike rates in the FRA. The simulated FTF was then calculated as the columnwise sum of the FRA, as for the data. Tuning was deemed significant for FRAs with a Bonferroni-Holm-corrected P < 0.05. The BF was defined as the location of the FTF peak. The minimum threshold at BF was computed as the minimum sound level (dB SPL) that exceeded the mean spontaneous rate by 3 SDs of the spontaneous rate distribution. We computed a tonal monotonicity index (MI) as the ratio of the response to the loudest BF tone to the largest response to any BF tone at any tested level. A value of 1 indicates that the loudest BF tone elicited the strongest response. Responses corresponding to a MI < 0.85 were considered to be nonmonotonic. We defined the bandwidth as the spectral extent (in octaves) of the FRA where responses exceeded the mean spontaneous rate by 3 SDs of the spontaneous rate distribution.
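The FTF, BF, minimum threshold, and MI definitions above can be sketched in Python as follows. This is a minimal illustration and not the analysis code used in the study: the FRA layout (rows ordered from quietest to loudest level, columns ordered by frequency), the function names, and the toy numbers are all assumptions, and the permutation test for tuning significance is omitted.

```python
def frequency_tuning_function(fra):
    """Columnwise sum of the FRA: response to each frequency across all levels."""
    n_freqs = len(fra[0])
    return [sum(row[j] for row in fra) for j in range(n_freqs)]

def best_frequency_index(fra):
    """Index of the FTF peak (the BF column)."""
    ftf = frequency_tuning_function(fra)
    return max(range(len(ftf)), key=lambda j: ftf[j])

def minimum_threshold(fra, levels, bf_index, spont_mean, spont_sd, n_sd=3.0):
    """Lowest level whose BF response exceeds spontaneous mean + n_sd SDs."""
    criterion = spont_mean + n_sd * spont_sd
    for level, row in zip(levels, fra):  # levels assumed in ascending order
        if row[bf_index] > criterion:
            return level
    return None  # no level met the criterion

def monotonicity_index(fra, bf_index):
    """Response to the loudest BF tone divided by the largest BF response."""
    bf_responses = [row[bf_index] for row in fra]
    return bf_responses[-1] / max(bf_responses)

# Toy FRA (hypothetical spike rates): 4 levels x 3 frequencies.
levels = [30, 40, 50, 60]
fra = [[0, 1, 0],
       [1, 4, 1],
       [2, 10, 3],
       [2, 6, 4]]

ftf = frequency_tuning_function(fra)
bf_index = best_frequency_index(fra)
threshold = minimum_threshold(fra, levels, bf_index, spont_mean=0.5, spont_sd=0.5)
mi = monotonicity_index(fra, bf_index)
nonmonotonic = mi < 0.85
```

In this toy example the response at BF peaks at the second-loudest level, so the MI falls below 0.85 and the site would be classed as nonmonotonic.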
We calculated the onset response latency for each mouse recording site in order to report the proportions of mouse recording sites with primary- or secondary-like auditory responses. For each recording site, we estimated latency across all unmodulated tones by comparing the binned (1 ms) average firing rate during the stimuli to the baseline firing rate calculated across 50 ms before tone onset. We first smoothed the responses by summing the counts in each bin with the counts from the two preceding and two subsequent bins; we then estimated the latency of the smoothed response by identifying the first 1-ms bin with a firing rate that exceeded 5 SDs of the mean baseline firing rate.
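The latency estimate described above can be sketched as follows. This Python example rests on several assumptions that the text does not specify: the criterion is read as the baseline mean plus 5 SDs, the baseline statistics are taken from the smoothed baseline bins, the baseline and response intervals are smoothed separately, and smoothing windows are truncated at the edges. All names and spike counts are hypothetical.

```python
import statistics

def smooth5(counts):
    """Sum each 1-ms bin with its two preceding and two subsequent bins."""
    n = len(counts)
    return [sum(counts[max(0, i - 2):min(n, i + 3)]) for i in range(n)]

def onset_latency_ms(pre_counts, post_counts, n_sd=5.0):
    """First post-onset 1-ms bin whose smoothed count exceeds the criterion.

    pre_counts  : spike counts in 1-ms bins during the 50 ms before tone onset
    post_counts : spike counts in 1-ms bins after tone onset
    Assumed criterion: baseline mean + n_sd * baseline SD.
    """
    baseline = smooth5(pre_counts)
    criterion = statistics.mean(baseline) + n_sd * statistics.pstdev(baseline)
    for i, value in enumerate(smooth5(post_counts)):
        if value > criterion:
            return i  # latency in ms relative to tone onset
    return None  # no bin crossed the criterion

# Toy data: sparse baseline firing and a burst starting 12 ms after onset.
pre = [1 if i % 10 == 0 else 0 for i in range(50)]
post = [0] * 12 + [6, 8, 10, 8, 6] + [0] * 33
latency = onset_latency_ms(pre, post)
```

Because each bin pools counts from its neighbors, the smoothed response can cross the criterion up to 2 ms before the first bin that actually contains the burst.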
The SAM data were collected as part of a series of experiments in both species, so in many cases more than an hour elapsed between collection of tonal and SAM responses. Because of this time difference in recording, we relied on MU estimates of frequency and level preferences rather than attempting to identify the same unit from both recordings over this span. We validated that MU estimates of BF provide a good approximation of single-unit (SU) BF, using data collected from independent cohorts of both mice and SQMs tested with the methods for surgical preparation, neural recordings, and stimulus presentation described here.
Inclusion Criteria
In the mice, MU responsiveness to tone pips or SAM tones was a precondition for inclusion in the database. MU responses to tone pips were required to exhibit 1) significant frequency tuning as described above, 2) a significant increase in the firing rate relative to the spontaneous rates distributed across the FRA (Wilcoxon rank sum; P < 0.05), or 3) significant vector strength (VS) (see below) to at least one tested modulation frequency or a significant decoding accuracy for at least one tested bin size. More than 90% of recorded channels were included on the basis of significant frequency tuning for tones. In the SQM, primary auditory cortex is located on the surfaces of the temporal gyrus and in the supratemporal plane of the lateral sulcus and characterized by vigorous pure tone responses, short response latencies, and a tonotopic gradient in the rostrocaudal dimension (Cheung et al. 2001). No specific inclusion criteria were applied to the SQM data; all channels from all penetrations into auditory cortex were included in the database when appropriate SAM stimuli had been presented.
Spike Sorting
SU responses were isolated from MU events off-line with custom software in MATLAB.
Squirrel monkeys.
The SQM SU data in this report were previously published in Malone et al. (2013); SUs were isolated with a custom sorter (written in MATLAB) that projected the waveform snippets into principal component analysis (PCA) space, where they were manually clustered using the first two principal component axes. We evaluated the quality of the isolation using the incidence of low-ISI events.
Mice.
Data were clustered with custom MATLAB software (KFMM Autosorter, written by Matthew Fellows). Waveforms were projected into PCA space, where the numbers of clusters were manually determined using the first two principal component axes; waveforms were assigned to clusters using k-means. We evaluated the quality of the isolation using the incidence of low-ISI events. All mouse SUs were isolated from sound-responsive MUs (as described above); no inclusion criterion was applied to the SQM SUs. For analyses in which SU and MU activity from the same channel were compared, we included only MUs for which there was a corresponding SU.
Waveform Classification
To classify action potentials from putative regular- or fast-spiking cells, we analyzed the size and duration of the mean waveform for each SU. We computed the spike duration as the interval between the peak and trough of the mean spike waveform and the peak-to-trough ratio as the absolute value of the ratio between the size of the peak of the action potential and the size of the afterhyperpolarization. For both species, we used k-means clustering to identify two clusters of cells in the spike duration vs. peak-to-trough ratio plane. We designated the cluster with shorter duration spikes as putative fast-spiking cells and the other cluster as putative regular-spiking cells.
Onset Exclusion
SAM stimuli presented to SQMs were preceded by a 1-s duration masker (a SAM tone modulated at 4 Hz). Previous work (Malone et al. 2015a) on this data set demonstrated that this masker had minimal effects on the subsequent responses, which were analyzed for this study. Specifically, the 4-Hz modulated masker did not significantly change firing rates or VS values (a measure of how well a neuron entrains to the modulation) for modulated tones following the masker relative to an unmodulated masker (Malone et al. 2015a). The dominant effect of the preceding 4-Hz masker is to dampen the responses at the onset of the probe tone that are typically evident at the onset of an unmasked tone. To limit effects from prior presentation of the masking stimulus, we analyzed only the interval 250–1,000 ms after SAM onset in both mouse and SQM. This choice of the analysis window did not significantly impact the comparisons across species (see results).
Spike Train Decoding
To quantify how effectively cortical responses to different modulation frequencies could be discriminated, we used a set of PSTH-based pattern classifiers (Foffani and Moxon 2004) described in detail in prior reports (Malone et al. 2007, 2010, 2013, 2014, 2015a, 2015b). Spike trains obtained in each single trial (the “test” spike train) are compared against templates reflecting the average response to each distinct stimulus in the set (i.e., SAM at 4, 8, 16, 32, 64, or 128 Hz), excluding the test train itself. The test spike train and each response template are represented as vectors of binned spike counts, and the similarity between the test spike train and each template is defined as the distance (the Euclidean norm) between those vectors (i.e., the square root of the sum of the squared differences between each element of the vector). The response template that minimizes the distance to the test spike train is identified, and the stimulus associated with that template is hypothesized to be the stimulus that also elicited the test spike train. If that stimulus actually elicited the test spike train, then the trial is considered to have been decoded accurately. Decoding accuracy is computed as the fraction of spike trains that were correctly assigned to the stimuli that elicited them. Note that the test spike train is never included in the average used to define the response template for the stimulus that elicited it (e.g., if the test spike train occurred in response to 4-Hz SAM, the response template for 4 Hz represents the average of every other response to 4 Hz). Otherwise, all recorded spike trains are provided to the decoder for the purposes of classification (complete cross-validation).
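The classification procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the decoder implementation used in the cited reports: the function names, the tie-breaking rule (the first template at the minimum distance wins), and the synthetic spike trains are hypothetical.

```python
import math

def bin_spikes(spike_times, start, stop, bin_size):
    """Binned spike counts for one trial within [start, stop)."""
    n_bins = int(round((stop - start) / bin_size))
    counts = [0.0] * n_bins
    for t in spike_times:
        if start <= t < stop:
            counts[min(int((t - start) / bin_size), n_bins - 1)] += 1.0
    return counts

def mean_vector(vectors):
    """Elementwise average of equal-length vectors (the response template)."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def decode_accuracy(responses):
    """responses maps each stimulus to a list of binned single-trial vectors."""
    correct, total = 0, 0
    for true_stim, trials in responses.items():
        for k, test in enumerate(trials):
            best_stim, best_dist = None, float("inf")
            for stim, stim_trials in responses.items():
                # Exclude the test trial from the template of its own stimulus.
                pool = stim_trials[:k] + stim_trials[k + 1:] if stim == true_stim else stim_trials
                dist = euclidean(test, mean_vector(pool))
                if dist < best_dist:
                    best_stim, best_dist = stim, dist
            correct += int(best_stim == true_stim)
            total += 1
    return correct / total

def synthetic_trial(fm, trial):
    """Hypothetical spike train: one spike per modulation cycle, small trial offset."""
    offset = 0.005 + 0.001 * trial
    return [k / fm + offset for k in range(int(fm) + 1)]

# Three synthetic trials each at 4 and 8 Hz, 250-1,000 ms window, 25-ms bins.
responses = {fm: [bin_spikes(synthetic_trial(fm, k), 0.25, 1.0, 0.025) for k in range(3)]
             for fm in (4, 8)}
accuracy = decode_accuracy(responses)
```

In this sketch each stimulus needs at least two trials so that a template remains after the test trial is held out.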
Rate-Only, Timing-Only, and Combined Rate and Timing Decoders
Because cortical neurons vary in their temporal precision and firing rates, we computed performance of spike train decoders across a range of bin sizes (2.5, 5, 7.5, 10, 15, 25, 50, 75, 150, and 750 ms). Using binning resolutions that are too narrow or too wide underestimates the information provided about the stimuli by cortical spiking patterns. Unless otherwise noted, we report decoding accuracy at the bin size that maximized decoding accuracy (i.e., “optimal” bin size). In the limit where the width of a single bin is equal to the analysis interval (750 ms for these experiments), spike train classification relies entirely on the average firing rate information. We refer to this as the rate-only decoder. Alternatively, it is possible to remove differences in average firing rates across stimuli while retaining information about differences in how spikes are distributed in time by normalizing each test and each template by its respective vector norm. This normalization process maps all responses to an equivalent distance from the origin in the response space (a Euclidean n-space where n is the number of bins used to generate the response vectors). We refer to this as the timing-only decoder. The combined rate and timing decoder operates on the spike trains without normalization.
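The distinction between the three decoders can be illustrated with a short Python sketch of the distance computations alone. The function names and the toy response vectors are hypothetical, and leaving an all-zero vector unnormalized is an assumption about how trials without spikes might be handled.

```python
import math

def unit_norm(vector):
    """Scale a binned response to unit length, removing overall rate differences."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # assumption: trials without spikes are left as zeros
    return [x / norm for x in vector]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def combined_distance(test, template):
    """Rate and timing: compare the binned responses without normalization."""
    return euclidean(test, template)

def timing_only_distance(test, template):
    """Timing only: normalize test and template by their own vector norms."""
    return euclidean(unit_norm(test), unit_norm(template))

def rate_only_distance(test, template):
    """Rate only: collapse the analysis interval into a single bin."""
    return abs(sum(test) - sum(template))

# Toy binned responses: same temporal pattern at two rates, and a shifted pattern.
weak = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
strong = [3.0, 0.0, 3.0, 0.0, 3.0, 0.0]
shifted = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

same_pattern_timing = timing_only_distance(weak, strong)   # near 0
same_pattern_rate = rate_only_distance(weak, strong)       # spike counts differ
shifted_timing = timing_only_distance(weak, shifted)       # patterns differ
shifted_rate = rate_only_distance(weak, shifted)           # spike counts match
```

The two toy comparisons show the dissociation: responses that differ only in spike count are separated by the rate-only decoder but not the timing-only decoder, and the reverse holds for responses that differ only in temporal pattern.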
We assigned significance to decoding accuracy by comparing the performance of the combined rate and timing decoder at the optimal bin width (which could be the 750-ms bin that defines the rate decoder) to the distribution of accuracy values obtained via Monte Carlo simulation. We simulated accuracy values by randomly assigning 15 repetitions of each of the six tested modulation frequencies to one of those six modulation frequencies and counting the number of correctly assigned trials for each simulation. Because selecting the optimal bin size requires taking the maximum of 10 values, we iteratively chose the maximum result from 10 random draws of simulated decoding accuracy values to construct the expected distribution at chance. P values are assigned by taking the ratio of simulated accuracies that exceed the actual decoding accuracy and the total number of simulated values (n = 100,000). By this metric, an accuracy of 0.284 corresponds to decoding significantly better than chance for α < 0.01. For a single bin size (e.g., the rate decoder), the corresponding value is 0.25. Because of the binning inherent to the decoders, two spikes that occur at very similar times relative to stimulus onset are treated as occurring at different times when they fall within different bins in the test and template vectors. We stabilized the estimates with respect to these bin-edge effects by shifting the analysis interval by up to 9 ms in 1-ms steps (i.e., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9) and taking the average of the 10 resulting estimates of decoding accuracy.
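The Monte Carlo procedure for the chance distribution can be sketched as follows. This Python example assumes that each trial is assigned a modulation frequency independently and uniformly at random, which is one reading of the random assignment described above; it also uses far fewer simulations than the 100,000 reported, so it is not expected to reproduce the reported criterion values. All names are hypothetical.

```python
import random

def simulate_chance_accuracy(rng, n_stimuli=6, n_trials=15):
    """Fraction of trials labeled correctly when labels are guessed at random."""
    true_labels = [s for s in range(n_stimuli) for _ in range(n_trials)]
    correct = sum(1 for label in true_labels if rng.randrange(n_stimuli) == label)
    return correct / len(true_labels)

def chance_distribution(n_sim, n_bin_sizes=10, seed=0):
    """Null distribution of the best accuracy across n_bin_sizes random draws.

    Taking the maximum mirrors the selection of the optimal bin size.
    """
    rng = random.Random(seed)
    return [max(simulate_chance_accuracy(rng) for _ in range(n_bin_sizes))
            for _ in range(n_sim)]

def p_value(observed_accuracy, null_accuracies):
    """Proportion of simulated accuracies that exceed the observed accuracy."""
    exceed = sum(1 for a in null_accuracies if a > observed_accuracy)
    return exceed / len(null_accuracies)

null = chance_distribution(n_sim=1000)
p_high = p_value(0.40, null)   # an accuracy well above chance
p_low = p_value(0.10, null)    # an accuracy below chance (1/6)
```

Setting `n_bin_sizes=1` gives the null distribution for a single bin size, such as the rate-only decoder.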
Quantification of Response Synchrony
VS (Goldberg and Brown 1969) was defined as $\mathrm{VS} = \frac{1}{n} \sqrt{\left[\sum_{i=1}^{n} \cos(2\pi f_m t_i)\right]^2 + \left[\sum_{i=1}^{n} \sin(2\pi f_m t_i)\right]^2}$, where $t_i$ is the time of occurrence of the $i$th spike, $n$ is the total number of spikes, and $f_m$ is the modulation frequency. Significant synchrony was defined as a Rayleigh statistic ($2 \cdot \mathrm{VS}^2 \cdot n$) exceeding 13.816 (P < 0.001; Mardia and Jupp 1999). The synchronization limit was computed by linearly interpolating the Rayleigh statistic between the highest significant VS value and the next highest tested modulation frequency. The synchronization limit was defined as the modulation frequency where the interpolated line intercepted the Rayleigh criterion value (i.e., 13.816).
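The synchrony measures can be sketched in Python using the standard Goldberg and Brown definition of VS. Two choices here are assumptions not specified in the text: the interpolation is performed on a linear frequency axis, and a unit that is still significantly synchronized at the highest tested frequency is assigned that frequency as its limit. Function names and example values are hypothetical.

```python
import math

RAYLEIGH_CRITERION = 13.816  # P < 0.001

def vector_strength(spike_times, fm):
    """Length of the mean resultant vector of spike phases at frequency fm."""
    n = len(spike_times)
    if n == 0:
        return 0.0
    c = sum(math.cos(2.0 * math.pi * fm * t) for t in spike_times)
    s = sum(math.sin(2.0 * math.pi * fm * t) for t in spike_times)
    return math.sqrt(c * c + s * s) / n

def rayleigh_statistic(vs, n):
    return 2.0 * vs * vs * n

def synchronization_limit(fms, rayleigh_values, criterion=RAYLEIGH_CRITERION):
    """Frequency at which the interpolated Rayleigh statistic meets the criterion.

    fms must be in ascending order; interpolation is linear in Hz (assumption).
    """
    significant = [i for i, r in enumerate(rayleigh_values) if r > criterion]
    if not significant:
        return None  # no significant synchrony at any tested frequency
    i = significant[-1]
    if i == len(fms) - 1:
        return fms[i]  # assumption: cap the limit at the highest tested frequency
    f0, f1 = fms[i], fms[i + 1]
    r0, r1 = rayleigh_values[i], rayleigh_values[i + 1]
    return f0 + (r0 - criterion) * (f1 - f0) / (r0 - r1)

# Toy spike trains: one spike per 8-Hz cycle, and spikes spread over four phases.
locked = [k / 8.0 for k in range(20)]
spread = [k / 32.0 for k in range(20)]
vs_locked = vector_strength(locked, 8.0)
vs_spread = vector_strength(spread, 8.0)
limit = synchronization_limit([4.0, 8.0, 16.0, 32.0], [100.0, 50.0, 20.0, 5.0])
```

Perfectly phase-locked spikes give a VS near 1, whereas spikes distributed evenly across the modulation cycle give a VS near 0.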
Quantification of Response Reliability
Decoding accuracy depends on the reliability of neural responses across repeated trials and the diversity of spiking patterns across different stimuli. We calculated trial similarity (TS) (Malone et al. 2007) as an index of the intertrial reliability of responses to each tested modulation frequency. For each unit, we randomly divided the 15 trials into two groups (of 7 and 8 trials), binned the responses at 5-ms resolution for each group of trials, and calculated the Pearson’s correlation coefficient between the two PSTHs. We repeated this process 100 times; the TS value is the mean of the correlation coefficients generated by each repetition. We determined whether each unit had significant TS by using Monte Carlo methods to estimate the probability that a TS value from a null distribution would be larger than the TS value observed for each cell. We created the null distribution by calculating TS values for 10,000 simulations of uniformly random firing at a range of firing rates from 2.2 Hz to 100 Hz in 2.2-Hz steps. Each neuron was compared to the null distribution generated with the simulated firing rate closest to the observed firing rate. P values obtained from the Monte Carlo method were corrected for multiple comparisons with the Bonferroni-Holm method; the criterion for significant TS was a corrected P value < 0.05.
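The TS calculation can be sketched in Python as follows. This is a minimal illustration, not the original analysis code: the function names, the random seed, and the convention of returning a correlation of 0 when either PSTH has zero variance are assumptions, and the Monte Carlo significance test is omitted.

```python
import math
import random

def psth(trials, start, stop, bin_size):
    """Trial-averaged binned spike counts within [start, stop)."""
    n_bins = int(round((stop - start) / bin_size))
    counts = [0.0] * n_bins
    for spikes in trials:
        for t in spikes:
            if start <= t < stop:
                counts[min(int((t - start) / bin_size), n_bins - 1)] += 1.0
    return [c / len(trials) for c in counts]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def trial_similarity(trials, start, stop, bin_size=0.005, n_splits=100, seed=0):
    """Mean PSTH correlation across random splits of the trials into two groups."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_splits):
        order = list(range(len(trials)))
        rng.shuffle(order)
        half = len(order) // 2  # 15 trials split into groups of 7 and 8
        group_a = [trials[i] for i in order[:half]]
        group_b = [trials[i] for i in order[half:]]
        values.append(pearson(psth(group_a, start, stop, bin_size),
                              psth(group_b, start, stop, bin_size)))
    return sum(values) / len(values)

# Toy data: 15 identical, perfectly repeatable trials, and 15 silent trials.
reliable = [[0.2525 + 0.125 * k for k in range(6)] for _ in range(15)]
silent = [[] for _ in range(15)]
ts_reliable = trial_similarity(reliable, 0.25, 1.0)
ts_silent = trial_similarity(silent, 0.25, 1.0)
```

Identical trials yield a TS near 1; responses that vary from trial to trial produce lower values.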
Calculation of SAM Firing Rate Responses
We estimated spontaneous firing rates by quantifying spikes occurring in the 100 ms before the onset of sound on each trial (i.e., before the 4-Hz masker in the SQM and before the tested SAM tone in the mouse). We calculated evoked firing rates by averaging the spike counts obtained during each stimulus presentation across all trials. Maximum evoked rates are the trial-averaged firing rates at the modulation frequency that elicited the highest rate; mean evoked rates are the mean of the trial-averaged firing rates across all modulation frequencies. To quantify the extent to which evoked firing rates change relative to spontaneous rates, we calculated a rate modulation index (RMI): .
Prior Publication of Data Sets
The SQM data set has been reported in Malone et al. (2015a) and Malone et al. (2013). The mouse data set presented here has not been previously published.
All analysis was performed in MATLAB. Unless otherwise noted, we used nonparametric (Wilcoxon) rank sum tests to compare unpaired continuous distributions, nonparametric (Wilcoxon) signed-rank tests for paired distributions, and Pearson correlation coefficients to measure linear correlations. Population figures used all units reported in results unless otherwise noted in figure legends.
RESULTS
Summary of Data Sample
Data in this report were collected from auditory cortex in awake mice and awake SQMs. In mice, 57 SUs were recorded from 16 penetrations into the auditory cortex (right hemisphere) in 10 animals. Seven of the ten mice were female; three were male. Both SQMs used in these experiments were adult females. In the SQMs, 479 SUs were recorded from 20 total penetrations, 7 in the right hemisphere of one monkey (n = 167 SUs), 3 in the right hemisphere of the other monkey (n = 58 SUs), and 10 in the left hemisphere of that monkey (n = 254 SUs). In mice, 5 SUs were recorded from the top five channels of the linear probes (~0–200 μm from pia), 27 SUs were recorded from the middle six channels (~200–500 μm from pia), and 25 SUs were recorded from the bottom five channels (~500–750 μm from pia). SQM recording depth could not always be estimated effectively because the angle of the probe relative to the cortical surface varied across penetrations, owing to physical constraints on access within the chronically implanted recording chambers. We classified each SU as a putative regular- or fast-spiking cell based on mean waveform (see methods); 16 of 479 SQM SUs were putative fast-spiking cells, and 13 of 57 mouse SUs were putative fast-spiking cells.
We used MU recordings to estimate frequency response areas to unmodulated tones in both species (see methods). MU response properties recorded on the same channels provide reasonable approximations of the response properties of SUs: MU BF is similarly predictive of SU BF in both species (mouse: r = 0.76, $P < 10^{-62}$, n = 323; monkey: r = 0.79, $P < 10^{-99}$, n = 447). The median SQM MU tone threshold was 20 dB, and the median mouse threshold was 30 dB. The rate transfer functions were not uniformly monotonic: 60.6% of SQM sites and 35.1% of mouse sites had nonmonotonic rate transfer functions. The median SQM MU BF was 3.6 kHz, and the median mouse BF was 21.1 kHz. In the SQM, the median mismatch between the carrier frequency and the BF, calculated as $|\log_2(\mathrm{BF}/\text{carrier frequency})|$, was 0.3 octaves; in the mouse, the median mismatch was 0.2 octaves ($P < 10^{-4}$). In the mouse, nearly all (96.5%) MUs responded to pure tones within 15 ms, indicating that the vast majority of mouse responses were recorded from primary areas (AI or AAF; Linden et al. 2003). In the SQM, online estimates of response latency and evidence of tonotopy were used to identify recording sites in AI (see methods).
Linear Decoders Estimate Spike Rate and Spike Timing Information Available in Neural Responses
We used linear decoders (see methods) to quantify differences in the cortical representation of SAM signals in SQMs and mice. Figure 1 illustrates the decoding process with three decoder types: a rate-only decoder (Fig. 1C), a timing-only decoder (Fig. 1, D and E), and a combined rate and timing decoder (Fig. 1, F and G). In each case, the spike train from each test trial is compared to trial-averaged, binned response templates (Fig. 1, C–G). The decoder identifies the template most similar (i.e., nearest by Euclidean distance) to the binned spike train (Fig. 1, C–G). The trial is included in the count of accurately decoded trials when the template matches the actual modulation frequency that generated the response.
Fig. 1.
Simple linear decoders were used to determine how effectively the modulation frequency of sinusoidal amplitude modulation (SAM) can be discriminated on the basis of cortical spiking patterns represented at different temporal resolutions. A: gray curves illustrate the amplitude envelopes for SAM at 4, 8, 16, 32, 64, and 128 Hz. B: event raster of a single unit’s responses to SAM tones recorded from the auditory cortex of a mouse. A single trial to be decoded is highlighted in magenta (see methods). The best frequency of this site was 24.25 kHz; the carrier frequency of the SAM tones was 10 kHz. C: rate-only decoder: the spike count from a single trial (magenta) is compared with the trial-averaged counts across all trials of each stimulus (cyan). The trial to be decoded is excluded from the templates. The template that best matches the trial is indicated with an asterisk. D and E: timing-only decoder: the spikes from a single trial (magenta bars and dots at top) are binned and compared with rate-normalized trial-averaged binned responses at 2 time resolutions: 10 ms (D) and 50 ms (E). The trial to be decoded is excluded from the templates. The template that best matches the trial is indicated with an asterisk. Single-trial PSTHs are scaled for graphical convenience. F and G: combined decoder: the spikes from a single trial (magenta bars and dots at top) are binned and compared with unnormalized trial-averaged binned responses for 10-ms bins (F) and 50-ms bins (G). The trial to be decoded is excluded from the templates. The template that best matches the trial is indicated with an asterisk. Single-trial PSTHs are scaled for graphical convenience. H: decoding accuracy is defined as the number of trials matched correctly divided by the total number of trials; accuracies are plotted by time resolution below each decoder. Optimal bin size is defined as the bin size at which decoding accuracy is highest (magenta circles).
Decoding accuracy is defined as the fraction of correctly assigned trials and is computed at different temporal resolutions to identify the bin size that generates the highest decoding accuracy (i.e., the “optimal” bin size; Fig. 1H). The decoding accuracy provides an estimate of stimulus information that the neural response could supply to the animal; whether and how this information is used to generate percepts or guide behavior will depend on how the information is extracted, combined, or transformed at subsequent stages of processing. By comparing decoding accuracy in mouse vs. SQM units, we can broadly assess how much information auditory cortical neurons may provide about these stimuli in the two species. By comparing decoding accuracy based exclusively on spike rate information, spike timing information, and both types of information, we can assess whether mouse and SQM responses include similar information. Finally, by comparing the optimal temporal resolutions for decoding, we can evaluate the temporal precision of responses in the two species.
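The decoding procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names (`rebin`, `decode_accuracy`) are hypothetical, the timing-only normalization is assumed to scale each binned response to unit sum, and ties between equally distant templates are broken in favor of the first template. Only the leave-one-out templates, the Euclidean nearest-template rule, and the accuracy definition come from the text.

```python
import numpy as np


def rebin(counts, factor):
    """Sum adjacent bins to obtain a coarser temporal resolution."""
    n_stim, n_trials, n_bins = counts.shape
    usable = (n_bins // factor) * factor
    return counts[:, :, :usable].reshape(n_stim, n_trials, -1, factor).sum(axis=3)


def _unit_sum(v):
    """Assumed normalization: scale to unit sum, leaving all-zero vectors at zero."""
    total = v.sum(axis=-1, keepdims=True)
    return np.divide(v, total, out=np.zeros_like(v, dtype=float), where=total > 0)


def decode_accuracy(counts, mode="combined"):
    """Fraction of trials assigned to the correct stimulus.

    counts: spike counts with shape (n_stimuli, n_trials, n_bins), n_trials >= 2.
    mode: "rate" (spike count only), "timing" (normalized binned response),
          or "combined" (unnormalized binned response).
    """
    counts = np.asarray(counts, dtype=float)
    if mode == "rate":
        counts = counts.sum(axis=2, keepdims=True)  # one count per trial
    n_stim, n_trials, _ = counts.shape
    sums = counts.sum(axis=1)  # per-stimulus totals across trials
    correct = 0
    for s in range(n_stim):
        for t in range(n_trials):
            trial = counts[s, t]
            templates = sums / n_trials
            # Exclude the trial being decoded from its own template.
            templates[s] = (sums[s] - trial) / (n_trials - 1)
            if mode == "timing":
                trial, templates = _unit_sum(trial), _unit_sum(templates)
            distances = np.linalg.norm(templates - trial, axis=1)
            correct += int(np.argmin(distances) == s)
    return correct / (n_stim * n_trials)


# Toy data: two stimuli with equal spike counts but different timing.
toy = np.array([[[2, 0, 2, 0]] * 3, [[0, 2, 0, 2]] * 3])
print(decode_accuracy(toy, "rate"))      # 0.5 (counts are identical, so chance)
print(decode_accuracy(toy, "timing"))    # 1.0
print(decode_accuracy(toy, "combined"))  # 1.0
print(decode_accuracy(rebin(toy, 2), "timing"))  # 0.5 (coarse bins erase the timing)
```

In this toy case, the optimal bin size would be the finer of the two resolutions, because doubling the bin width removes the only feature that distinguishes the two stimuli.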
Patterns in Relative Decoding Accuracy Across Decoder Types Were Similar Across Species
When only average firing rates were available to the decoding algorithm, decoding accuracy was significantly higher in SQMs than in mice, and a greater proportion of SQM units significantly exceeded chance performance, for both the analysis window 250–1,000 ms after sound onset (Fig. 2A, a–c) and the analysis window 0–750 ms after sound onset (not shown). Overall, however, the rate decoder performed relatively poorly for both species: mean decoding accuracy was below 50% for nearly all units (Fig. 2Aa) for both analysis windows. By contrast, timing-only decoders supported better decoding; indeed, in a sizable minority of SQM neurons, the majority of trials were accurately decoded (accuracy > 50%) for bins ≤ 25 ms (Fig. 2Ad). Such high accuracies were rarely observed for mouse SUs. Plotting the data as population means (Fig. 2Ae) reveals that SQM units performed significantly better than mouse units for bin sizes from 2.5 to 25 ms for both analysis windows (0–750 ms analysis not shown), resulting in a greater proportion of SUs that significantly exceeded chance performance (Fig. 2Af) for both analysis windows.
Fig. 2.
Modulation frequency was more accurately decoded from squirrel monkey (SQM) than mouse cortical responses at fine temporal resolutions, although timing information contributes more to decoding than rate; multiunit (MU) responses tend to decode more accurately than single-unit (SU) responses in both organisms. A and D, a: decoding accuracy is plotted by bin size for SQM and mouse SUs (A) and MUs (D). Points that do not exceed the threshold for significance (see methods) are shown in gray. b: Mean (solid line) decoding accuracies ± 2 SE (shaded area) are plotted by bin size for SQM and mouse SUs and MUs. c: % of SUs and MUs with accuracies exceeding the threshold for significance are plotted by bin size for SQM and mouse. B and E, a and b: timing-only decoder accuracy is plotted against rate-only accuracy for SQM and mouse SUs (B) and MUs (E) for bin sizes of 5 ms (a) and 25 ms (b). Gray lines indicate significance thresholds. c: Box and whisker plots show differences between timing-only and rate-only accuracy for SUs and MUs for both SQM and mouse. C and F, a: histograms show the distributions of optimal bin sizes for SQM and mouse SUs (SQM: n = 271; mouse: n = 19) and MUs (SQM: n = 191; mouse: n = 155) that produced accuracies exceeding the threshold for significance (for at least 1 bin size). b: Accuracy at optimal bin size is plotted by bin size for SQM and mouse SUs and MUs. Only significant values are shown. c: Histograms show the distributions of accuracies for significantly decoding SQM and mouse units. Desaturated bars at bottom show % of all units that are insignificant (n.s.) at all bin sizes. G, a: rate-only decoding accuracy for each MU is plotted against rate-only decoding accuracy for each SU isolated on that recording channel for SQMs and mice. b: Timing-only decoding accuracy for each MU is plotted analogously (as in a) against SU timing-only decoding accuracy. c: Combined decoding accuracy for each MU is plotted against combined decoding accuracy for each SU. d: Boxplots illustrate differences (Δ) between MU and SU decoding accuracies for the decoders for SQMs and mice. *P < 0.05, **P < 0.005, ***P < 0.0005.
This pattern was similar for decoders based on combined rate and timing information for both the late analysis window (Fig. 2A, g–i) and the early analysis window (not shown). Because the patterns of results were consistent across both analysis windows, we used the 250–1,000 ms window in subsequent analyses of SU data. Combined decoders outperformed rate-only decoders for both SQMs and mice, although—unlike timing decoders—performance was worse for the narrowest bins compared with the broadest bins. (The reduction in performance for narrow bins occurs because fluctuations in spike counts in narrow bins effectively add noise to the response representation; this noise is reduced by the normalization that defines the timing-only decoder.) Combined decoder performance was not significantly different between putative regular- and fast-spiking cells for either the SQM or the mouse (P > 0.24 in both cases).
Comparison of Fig. 2, Ab and Ae, suggests that timing-only decoders outperform rate-only decoders in both species. Figure 2B illustrates the direct comparison for bin sizes of 5 and 25 ms, chosen to represent both relatively fine and coarse resolutions. At both resolutions, and in both species, timing-only decoders are more accurate than rate-only decoders, indicating that spike timing captures more information about sound envelopes than average firing rate in both mice and SQMs. Collectively, this pattern of results demonstrates that the relative effectiveness of spike timing and spike rate is similar in both mice and SQMs, despite the significant advantage in decoding accuracy observed for SQM neurons for all decoder types.
Temporal Precision of SAM Encoding Is Higher in SQM than in Mouse
Cortical responses in mice and SQMs differ in how the optimal temporal resolution (bin size) for decoding is distributed (Fig. 2, C and F). The distribution of optimal bin sizes in mice is shifted to the right of the SQM distribution; bin sizes smaller than 10 ms were never optimal for mouse SUs, whereas 7.5 ms was the most common optimal bin size for SQM SUs (Fig. 2Ca). The spike train decoders used for these analyses only benefit from precise binning of spike times when the spiking patterns themselves are at least as precise as the binning resolution. Otherwise, the response to a given stimulus feature (e.g., the rising phase of the modulation envelope for SAM tones) will not consistently occur within the same bin across trials, which registers as a dissimilar response to the decoder. Thus the difference in optimal bin sizes between the mouse and the SQM suggests that SQM auditory cortical neurons encode envelope modulations more precisely than those in the mouse.
We repeated the analyses illustrated in Fig. 2, A–C, for MU data (Fig. 2, D–F). Mouse MUs achieved higher decoding accuracies than were typically observed for mouse SUs (compare Fig. 2, A and D). However, the improvements in decoding accuracy for the SQM MUs relative to SUs were also substantial. As a result, decoding accuracy for SQM MUs significantly exceeded that of mouse MUs for all decoders and for analysis windows from 250 to 1,000 ms and from 0 to 750 ms after sound onset (not shown). Because the choice of analysis window did not affect the findings, we used the 250–1,000 ms analysis window for all subsequent analyses of MU data. MUs generally decoded more accurately than SUs obtained on the same recording channel in both species (Fig. 2G).
When we parsed the contributions of rate-only and timing-only information for MUs (Fig. 2E), we observed patterns qualitatively similar to those observed for single neurons: timing-only decoding yielded higher accuracies than rate-only decoding. For both organisms, the distributions of optimal bin sizes are shifted toward smaller bin sizes for MUs compared with SUs. This finding indicates that the benefit from averaging over neurons offsets the cost of averaging over narrower intervals of time for both species, despite species differences in the optimal bin size.
The observation that very high decoding accuracy requires temporally precise binning of spike trains in SQMs (Fig. 2, C and F) suggests that entrainment to high modulation rates may be more precise in the SQM and that, across modulation rates, SQM responses might demonstrate less intertrial variation. We measured entrainment by calculating the VS (see methods), a metric based on the concentration of spike occurrences near a single phase of the modulation cycle. We measured intertrial reliability by calculating the TS (see methods), a metric based on the correlation between trials.
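For readers unfamiliar with vector strength, the following Python sketch shows the standard calculation (Goldberg and Brown 1969) together with the Rayleigh statistic, 2nVS², whose criterion of 13.816 (P < 0.001) is given in the legend of Fig. 3. It is an illustration only: the function name `vector_strength` and the convention of returning zero when there are no spikes are assumptions, and the exact procedure used in the study is described in methods.

```python
import numpy as np


def vector_strength(spike_times_s, mod_freq_hz):
    """Return (VS, Rayleigh statistic) for spike times in seconds.

    VS is the length of the mean resultant vector of spike phases relative to
    the modulation cycle: 1 means every spike falls at the same phase, and 0
    means the phases cancel out. The Rayleigh statistic is 2 * n * VS**2.
    """
    spike_times_s = np.asarray(spike_times_s, dtype=float)
    n = spike_times_s.size
    if n == 0:
        return 0.0, 0.0  # assumed convention for empty spike trains
    phases = 2.0 * np.pi * mod_freq_hz * spike_times_s
    vs = np.hypot(np.cos(phases).sum(), np.sin(phases).sum()) / n
    return float(vs), float(2.0 * n * vs**2)


RAYLEIGH_CRITERION = 13.816  # corresponds to P < 0.001 (Fig. 3 legend)

# Ten spikes locked to the same phase of an 8-Hz modulation.
locked = np.arange(10) / 8.0
vs, rayleigh = vector_strength(locked, 8.0)
print(round(vs, 3), rayleigh > RAYLEIGH_CRITERION)  # VS close to 1, significant

# Four spikes spread evenly over one 8-Hz cycle: the phases cancel.
spread = np.array([0.0, 1.0, 2.0, 3.0]) / 32.0
vs, rayleigh = vector_strength(spread, 8.0)
print(round(vs, 3), rayleigh > RAYLEIGH_CRITERION)  # VS close to 0, not significant
```

Because the Rayleigh statistic scales with the number of spikes, a unit with few spikes needs a higher VS to reach the same significance criterion.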
Figure 3 illustrates how VS and TS vary with modulation frequency. Representative SQM SUs and MUs (Fig. 3, A and C) exhibit statistically significant VSs (see methods) from 4 to 128 Hz, compared with 4 to 32 Hz for the example mouse SUs and MUs (Fig. 3, B and D). Analogously, TS values are significant for a more restricted range of modulation frequencies in the mouse examples. The population data demonstrate that larger fractions of SQM SUs and MUs exhibited significant VS and TS for modulation frequencies at and above 8 Hz (Fig. 3, E–H). The differences between VSs and TSs in SQM and mouse neurons were particularly striking for higher modulation frequencies.
Fig. 3.
SAM responses from squirrel monkeys (SQMs) are more precisely entrained and more reliable at higher modulation frequencies than responses from mice. A–D, a: event raster from a representative SQM single unit (SU; A), SQM multiunit (MU) from the same channel (C), mouse SU (B), and mouse MU from the same channel (D). Trials in color are included in the binned counts shown in color in c; trials in black are included in the binned counts shown in black. In A and B, insets show the mean ± SE waveform of the unit. b: Phase histograms are generated by folding response PSTHs on the modulation period of the sound. c: PSTHs generated with 8 randomly selected trials (color) are correlated with PSTHs generated using the other 7 trials (black). d, Top: vector strength (VS) is plotted as a function of modulation frequency. Filled circles indicate significant VS (Rayleigh statistic > 13.816, corresponding to P < 0.001). Bottom: trial similarity (TS), the mean correlation between PSTHs, is plotted as a function of modulation frequency. Filled circles indicate significant TS (P < 0.05, Monte Carlo simulated null distribution). E and F, a: % of SUs (E) and MUs (F) recorded from SQMs and mice that have significant VS at each modulation frequency. b: Significant VS values for each SU (E) and MU (F) for SQMs and mice at each modulation frequency. G and H, a: % of SUs (G) and MUs (H) recorded from SQMs and mice that have significant TS at each modulation frequency. b: Significant TS values for each SU (G) and MU (H) for SQMs and mice at each modulation frequency.
Overall, significant entrainment to at least one modulation frequency above 16 Hz was robust in SQM cortex but rare in mouse cortex (SUs: 49% vs. 7%; MUs: 82% vs. 29%, P < 10^−10 for both comparisons; Fisher’s exact test), suggesting that, across the population, precise entrainment could contribute to higher decoding accuracies observed in SQM units. On a unit-by-unit basis, there were concomitant differences in the modulation transfer functions based on VS (vsMTFs). The median synchrony cutoff frequency for vsMTFs differed significantly across species in both SUs (SQM: 124.0 Hz; mouse: 5.9 Hz; P < 10^−27) and MUs (monkey: 124.9 Hz; mouse: 26.2 Hz; P < 10^−37). In addition, SQM vsMTFs were significantly less likely to be low pass for both SUs (40.5% vs. 87.7%; P < 10^−11; Fisher’s exact test) and MUs (34.55% vs. 71.5%; P < 10^−16; Fisher’s exact test). Thus entrainment to SAM sounds is restricted to the set of lower modulation frequencies in the mouse, whereas the SQM demonstrates greater and more selective entrainment.
Significant TS for modulation frequencies above 16 Hz was also more prevalent in SQM cortex than mouse cortex (SUs: 14% vs. 2%, P < 0.01; MUs: 68% vs. 7%, P < 10^−18; Fisher’s exact test). Mouse cortical responses to high modulation frequencies are more temporally diffuse, as indicated by the reduction in significant entrainment; furthermore, the reduced TS at higher modulation frequencies suggests that the ability to produce a consistent response to those stimuli is reduced in the mouse. Thus reduced precision and reduced reliability both contribute to the lower decoding accuracies observed in the mouse.
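The trial similarity measure can be illustrated with the Python sketch below, which follows the description in the legend of Fig. 3: trials are split at random into two groups (8 and 7 trials when 15 are available), a PSTH is computed for each group, and the correlations are averaged over repeated splits. The function name `trial_similarity`, the number of splits, the random seed, and the treatment of flat PSTHs as zero correlation are all assumptions made for this example; the Monte Carlo significance test mentioned in the legend is not reproduced here.

```python
import numpy as np


def trial_similarity(binned, n_splits=100, seed=0):
    """Mean Pearson correlation between PSTHs from random halves of the trials.

    binned: spike counts with shape (n_trials, n_bins) for one stimulus.
    """
    binned = np.asarray(binned, dtype=float)
    rng = np.random.default_rng(seed)
    n_trials = binned.shape[0]
    half = (n_trials + 1) // 2  # e.g., 8 of 15 trials
    correlations = []
    for _ in range(n_splits):
        order = rng.permutation(n_trials)
        psth_a = binned[order[:half]].mean(axis=0)
        psth_b = binned[order[half:]].mean(axis=0)
        if psth_a.std() == 0 or psth_b.std() == 0:
            # Correlation is undefined for a flat PSTH; count it as 0 (assumption).
            correlations.append(0.0)
        else:
            correlations.append(np.corrcoef(psth_a, psth_b)[0, 1])
    return float(np.mean(correlations))


# Perfectly repeatable response: every trial has the same alternating pattern.
reliable = np.tile(np.tile([1.0, 0.0], 10), (15, 1))
print(round(trial_similarity(reliable), 3))  # 1.0

# No response at all: both PSTHs are flat, so TS is reported as 0.0.
print(trial_similarity(np.zeros((15, 20))))  # 0.0
```

Unlike VS, this measure does not assume that the response is periodic, so it also captures reliable responses that are not phase locked to the modulation cycle.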
Presentation of SAM Tones Elicited Greater Rate Modulation in Monkey Cortex
Part of the decoding accuracy advantage observed for SQM neurons may be explained by differences in the overall magnitude of stimulus-driven activity (Fig. 4). In a mouse SU and MU showing comparatively large rate modulations, evoked and spontaneous rates were nonetheless similar to each other (Fig. 4, A and B). To quantify this phenomenon across the population, we calculated the RMI, defined as the difference between the mean stimulus-evoked firing rate and the mean spontaneous firing rate, divided by their sum (see methods). Relative to the mouse, both the SQM SU and MU populations contained greater proportions of units with RMI > 0.3, corresponding to an approximately twofold increase in firing rate (SUs: 31% vs. 2%, P < 0.0001; MUs: 35% vs. 1%, P < 0.0001, χ²-test of proportions; Fig. 4, E–H). As illustrated in the scatterplots in Fig. 4, E and F, SUs and MUs with high decoding accuracies largely overlapped with SUs and MUs with high RMIs, and such neurons were more common in SQM than mouse cortex. Furthermore, the differences between the maximum and minimum evoked firing rate (rate span) were predictive of decoding accuracy for mouse and SQM MUs (mouse: R = 0.48, P < 10^−15; SQM: R = 0.65, P < 10^−30, Pearson’s correlation) and SUs (mouse: R = 0.57, P < 10^−5; SQM: R = 0.71, P < 10^−75, Pearson’s correlation; Fig. 4, G and H). These observations indicate that these differences in rate modulation complement differences in timing precision to support overall higher decoding accuracy in SQM neurons.
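The relationship between the RMI and the fold change in firing rate can be checked with a short Python sketch. The sign convention (evoked minus spontaneous, divided by their sum) follows the legend of Fig. 4, in which an RMI of 1 indicates evoked activity without spontaneous activity. The function name `rate_modulation_index` and the value returned when both rates are zero are assumptions made for this example.

```python
def rate_modulation_index(evoked_hz: float, spontaneous_hz: float) -> float:
    """RMI = (evoked - spontaneous) / (evoked + spontaneous), bounded by -1 and 1."""
    total = evoked_hz + spontaneous_hz
    if total == 0:
        return 0.0  # assumed convention for a silent unit
    return (evoked_hz - spontaneous_hz) / total


# A doubling of firing rate gives RMI = 1/3, close to the 0.3 criterion in the text.
print(round(rate_modulation_index(20.0, 10.0), 3))  # 0.333

# A halving of firing rate gives the mirror-image value.
print(round(rate_modulation_index(5.0, 10.0), 3))  # -0.333

# Limiting cases described in the Fig. 4 legend.
print(rate_modulation_index(5.0, 0.0))  # 1.0, evoked activity only
print(rate_modulation_index(0.0, 5.0))  # -1.0, spontaneous activity fully suppressed
```

Because the index is a ratio, it expresses modulation relative to baseline rather than the absolute change in spikes per second, which allows units with very different firing rates to be compared on the same scale.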
Fig. 4.
Evoked firing rate is more modulated above baseline in squirrel monkey (SQM) than mouse single units (SUs) and multiunits (MUs), and cells with large rate modulations are coextensive with high decoding accuracy; the range of evoked firing rates is predictive of decoding accuracy in both organisms. A and B, a: event raster from a representative mouse SU (A) and mouse MU from the same channel (B). b: Mean ± SE evoked firing rate (FR) as a function of modulation frequency (dark solid/shaded lines), spontaneous firing rate (dashed lines) calculated from the 100 ms before sound onset, and mean evoked firing rate across frequency (bright solid lines). RMI, rate modulation index. C and D, a: marginal histograms show the distributions of spontaneous firing rates for SQM and mouse SUs (C) and MUs (D). b: Average evoked firing rate vs. spontaneous firing rate for SQM and mouse SUs and MUs. c: Vertical marginal histograms show the distributions of average evoked firing rates. E and F, a: marginal histograms show the distributions of RMI for SQM and mouse SUs and MUs. An RMI of 0 indicates no difference between evoked and spontaneous activity; an RMI of 1 indicates evoked activity but no spontaneous activity; an RMI of −1 indicates a cell that completely suppresses spontaneous activity in response to sound. b: Combined decoding accuracy at optimal bin size is plotted against RMI for SQM and mouse SUs and MUs. Vertical dashed lines at 0.3 and −0.3 indicate RMI values corresponding to an approximately 2-fold increase or decrease in mean evoked firing rate relative to baseline. c: Vertical marginal histograms show the distributions of decoding accuracies for SQM and mouse MUs and SUs. G and H, a: marginal histograms show the distributions of rate spans (the difference between the max and min evoked firing rates) for SQM and mouse SUs and MUs. b: Rate span is plotted against combined decoding accuracy at optimal bin size for SQM and mouse SUs and MUs. c: Vertical marginal histograms show the distributions of decoding accuracies for SQM and mouse MUs and SUs.
DISCUSSION
We found that the cortical representation of SAM differed between mice and SQMs in two main ways. First, the temporal precision and overall reliability of SQM cortical responses exceeded those of the mouse. Second, differences between evoked and spontaneous activity were significantly larger in the SQM. The combination of these factors contributes to the superior decoding accuracy observed in SQMs. Nevertheless, decoding accuracy depended more on timing information than on rate information in both mice and SQMs, suggesting that they encode information about stimulus envelopes in fundamentally similar ways.
Greater temporal precision in SQM cortex appears to explain why including spike timing information improved the decoding of modulation frequency information more in SQMs than in mice and to account for the higher synchronization cutoffs of SQM neurons (Fig. 3). Known differences between primate and rodent physiology, including relatively faster passive membrane properties and active channel kinetics (Testa-Silva et al. 2014), faster-firing excitatory cells (Vigneswaran et al. 2011), and synaptic specializations supporting rapid sustained synaptic activity (Molnár et al. 2016) in primates, may explain the differences we report here (Hasenstaub et al. 2016; Otte et al. 2010).
It has been hypothesized that auditory cortex participates in transforming a temporal code for amplitude modulation present in subcortical structures to a rate code in higher processing centers (Lu and Wang 2004; Wang 2007). Consistent with this hypothesis, SAM responses in mouse auditory cortex show diminished temporal acuity compared with inferior colliculus, where a majority of neurons entrain to modulation frequencies of at least 100 Hz (Walton et al. 2002). However, at the level of the auditory cortex, timing information still supports more accurate decoding than rate information in both species, suggesting that auditory cortex is an intermediate point in the temporal code-to-rate code transition (Malone et al. 2007, 2013; Yin et al. 2011).
It is unlikely that differences in the frequency and level preferences of mouse and SQM auditory cortical neurons explain the disparity in decoder accuracy we observed. SQM recording sites had lower best levels and more commonly exhibited nonmonotonic rate-level functions. However, best level was not significantly predictive of decoding accuracy in either species (see Table 1). The lack of a significant relationship comports with the fact that fully modulated SAMs pass through every intensity level between silence and the maximum amplitude. As a result, the SAM stimuli will modulate through the best levels of all neurons whose best levels are less than the maximum amplitude. In addition, mouse sites were tuned to much higher frequencies than SQM sites. However, because BF was not strongly predictive of decoding accuracy in either species (Table 1), the difference in frequency tuning across species is unlikely to explain the difference in decoding accuracy across species. Moreover, the mismatch between SAM carrier frequency and BF was not significantly predictive of decoding accuracy in either species (Table 1). The fact that spectral bandwidths were typically narrower for SQM sites implies that a given BF-carrier frequency mismatch would have a greater adverse effect on SAM encoding in SQM cortex. Thus it is more likely that we have underestimated rather than overestimated differences in SAM decoding accuracy between the mouse and the SQM.
Table 1.
Relationships between decoding accuracy and metrics used to estimate frequency and level preferences
Variable | Species | Rate | Timing | Combined |
---|---|---|---|---|
BF | Monkey | −0.28 (10^−3) | −0.33 (10^−5) | −0.34 (10^−5) |
BF | Mouse | −0.12 (0.09) | 0.04 (0.57) | −0.07 (0.34) |
\|BF – Carrier\| | Monkey | −0.22 (0.01) | −0.30 (10^−4) | −0.29 (10^−3) |
\|BF – Carrier\| | Mouse | −0.26 (10^−3) | −0.23 (10^−3) | −0.31 (10^−5) |
Bandwidth at 60 dB SPL | Monkey | −0.01 (0.95) | −0.08 (0.33) | −0.09 (0.27) |
Bandwidth at 60 dB SPL | Mouse | −0.14 (0.05) | 0.16 (0.03) | 0.07 (0.30) |
Threshold | Monkey | 0.27 (10^−3) | 0.42 (10^−7) | 0.43 (10^−7) |
Threshold | Mouse | 0.14 (0.05) | 0.06 (0.39) | 0.09 (0.17) |
Monotonicity index | Monkey | 0.04 (0.60) | −0.10 (0.22) | −0.08 (0.30) |
Monotonicity index | Mouse | −0.08 (0.24) | −0.06 (0.37) | 0.00 (0.99) |
Best level | Monkey | −0.02 (0.76) | 0.12 (0.14) | 0.10 (0.19) |
Best level | Mouse | 0.15 (0.04) | 0.19 (0.02) | 0.16 (0.03) |
Relationships between decoding accuracy and various measurements of the tonal responses used to estimate frequency and level preferences. Each entry gives the R value and P value (in parentheses) of the Pearson correlation between the metric indicated at left and the decoder indicated at top for mouse and SQM data. All results were calculated with MU responses. |BF – Carrier| denotes the mismatch between the best and carrier frequencies, calculated as |log2(BF/carrier frequency)|.
The extent to which mice make use of the neural responses we observed depends on how those responses are incorporated into population codes. We do not know how well mice are able to detect and discriminate among modulation frequencies because of the absence of any reports on behavioral tasks involving SAM stimuli. It is possible that different population coding schemes could compensate for the differences in single-neuron encoding quality. For example, especially efficient integration of information across large numbers of neurons could offset the inferior decoding accuracy of individual neurons. However, some population coding schemes could also exacerbate the disparity between the two organisms; the greater prevalence of primate cortical neurons with very high decoding accuracies may have an outsized importance in population representations of sound envelopes if the majority of sensory information can be encoded by small populations of precisely entrained neurons (Ince et al. 2013). However, the observation that, in both species, MUs produce higher decoding accuracies than SUs at the same recording site suggests that population coding based on small numbers of precise neurons may be no more effective than population coding based on profligate local averaging.
It is not necessarily the case that the lower temporal acuity observed in mouse neurons translates to less perceptual acuity, even without compensatory population coding schemes. While vocal niche and auditory processing capabilities must influence each other in both mice and SQMs, it is unlikely that the demands of vocal processing wholly determine the maximum temporal acuity of a primate auditory cortical neuron. The mouse’s 10-Hz syllable production rate (Liu et al. 2003) is within the range of frequencies to which we have demonstrated that mouse auditory cortical neurons can entrain, suggesting that the speed of temporal processing may be sufficient for the demands of the mouse’s vocal niche. Although peak temporal modulations up to 35 Hz have been reported in macaque (Cohen et al. 2007) and marmoset (DiMattina and Wang 2006) vocal repertoires, more prominent temporal modulations are at lower frequencies in SQMs (12.4 Hz; Bieser 1998), marmosets (7.7 Hz; Nagarajan et al. 2002), and macaques (<20 Hz; Cohen et al. 2007). Thus, in common nonhuman primate models, the vocal repertoire does not necessitate processing of periodic amplitude modulations much faster than 30 Hz. Furthermore, macaques engaged in an amplitude modulation depth discrimination task do not perform as well as their best auditory cortical neurons do (Johnson et al. 2012). This finding is consistent with the idea that the ability of some primate cells to precisely entrain to modulation frequencies outside the modulation spectrum of their vocal repertoires might not be an adaptation specific to the processing of communication sounds. Rather, this ability might be the by-product of cortical biophysical specializations unrelated to auditory processing that mice do not share.
Humans do appear to make perceptual use of fast amplitude modulations to discriminate speech and musical timbre (Handel 1995) and the sound quality of “roughness” (Zwicker and Fastl 1990); our results suggest that the SQM can be used to model the physiology that supports such abilities. Given that we observed reliable entrainment of mouse cortical neurons to the modulation frequencies most relevant to rhythm and syllabic production rates, we contend that mice can be used to study the processing of more widely conserved components of communication sounds. Although transgenic marmosets are in development (MacDougall et al. 2016), the mouse model is likely to remain pervasive because of the wide variety of genetic manipulations that are already available and its suitability for high-throughput testing. The experimental access the mouse affords makes it essential for developing hypotheses that can be tested in a more restricted set of marmoset experiments. Finally, we emphasize that the purpose of the comparisons we report is not to argue for the superiority of a particular experimental model. Rather, we propose that comparative studies of this nature demonstrate the importance of using multiple experimental models in parallel. Such studies clarify how neuroanatomical, cellular, and genetic differences manifest as differences in neurophysiological response properties across species.
GRANTS
This work was supported by the Kavli Institute for Fundamental Neuroscience to N. E. G. Hoglen, National Institutes of Health (NIH) Grant R25 NS-070680 to P. Larimer, NIH Grant R01 DC-011843 and Hearing Research Inc. to B. J. Malone, and NIH Grant R01 DC-014101, the Klingenstein Foundation, Hearing Research Inc., and the Coleman Memorial Fund to A. R. Hasenstaub.
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the authors.
AUTHOR CONTRIBUTIONS
B.J.M. and A.R.H. conceived and designed research; N.E.G.H. and E.A.P. performed experiments; N.E.G.H., P.L., B.J.M., and A.R.H. analyzed data; N.E.G.H., P.L., B.J.M., and A.R.H. interpreted results of experiments; N.E.G.H., P.L., B.J.M., and A.R.H. prepared figures; N.E.G.H., P.L., B.J.M., and A.R.H. drafted manuscript; N.E.G.H., P.L., E.A.P., B.J.M., and A.R.H. edited and revised manuscript; N.E.G.H., P.L., E.A.P., B.J.M., and A.R.H. approved final version of manuscript.
REFERENCES
- Baumann S, Joly O, Rees A, Petkov CI, Sun L, Thiele A, Griffiths TD. The topography of frequency and time representation in primate auditory cortices. eLife 4: e03256, 2015. doi: 10.7554/eLife.03256. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bieser A. Processing of twitter-call fundamental frequencies in insula and auditory cortex of squirrel monkeys. Exp Brain Res 122: 139–148, 1998. doi: 10.1007/s002210050501. [DOI] [PubMed] [Google Scholar]
- Bieser A, Müller-Preuss P. Auditory responsive cortex in the squirrel monkey: neural responses to amplitude-modulated sounds. Exp Brain Res 108: 273–284, 1996. doi: 10.1007/BF00228100. [DOI] [PubMed] [Google Scholar]
- Cheung SW, Bedenbaugh PH, Nagarajan SS, Schreiner CE. Functional organization of squirrel monkey primary auditory cortex: responses to pure tones. J Neurophysiol 85: 1732–1749, 2001. doi: 10.1152/jn.2001.85.4.1732. [DOI] [PubMed] [Google Scholar]
- Cohen YE, Theunissen F, Russ BE, Gill P. Acoustic features of rhesus vocalizations and their representation in the ventrolateral prefrontal cortex. J Neurophysiol 97: 1470–1484, 2007. doi: 10.1152/jn.00769.2006. [DOI] [PubMed] [Google Scholar]
- DiMattina C, Wang X. Virtual vocalization stimuli for investigating neural representations of species-specific vocalizations. J Neurophysiol 95: 1244–1262, 2006. doi: 10.1152/jn.00818.2005.
- Elliott TM, Theunissen FE. The modulation transfer function for speech intelligibility. PLoS Comput Biol 5: e1000302, 2009. doi: 10.1371/journal.pcbi.1000302.
- Fenno L, Yizhar O, Deisseroth K. The development and application of optogenetics. Annu Rev Neurosci 34: 389–412, 2011. doi: 10.1146/annurev-neuro-061010-113817.
- Foffani G, Moxon KA. PSTH-based classification of sensory stimuli using ensembles of single neurons. J Neurosci Methods 135: 107–120, 2004. doi: 10.1016/j.jneumeth.2003.12.011.
- Gaese BH, Ostwald J. Temporal coding of amplitude and frequency modulation in the rat auditory cortex. Eur J Neurosci 7: 438–450, 1995. doi: 10.1111/j.1460-9568.1995.tb00340.x.
- Gao L, Kostlan K, Wang Y, Wang X. Distinct subthreshold mechanisms underlying rate-coding principles in primate auditory cortex. Neuron 91: 905–919, 2016. doi: 10.1016/j.neuron.2016.07.004.
- Goldberg JM, Brown PB. Response of binaural neurons of dog superior olivary complex to dichotic tonal stimuli: some physiological mechanisms of sound localization. J Neurophysiol 32: 613–636, 1969. doi: 10.1152/jn.1969.32.4.613.
- Gourévitch B, Eggermont JJ. Maximum decoding abilities of temporal patterns and synchronized firings: application to auditory neurons responding to click trains and amplitude modulated white noise. J Comput Neurosci 29: 253–277, 2010. doi: 10.1007/s10827-009-0149-3.
- Hackett TA. Anatomic organization of the auditory cortex. Handb Clin Neurol 129: 27–53, 2015. doi: 10.1016/B978-0-444-62630-1.00002-0.
- Handel S. Timbre perception and auditory object identification. In: Hearing, edited by Moore BC. San Diego, CA: Academic, 1995, chapt. 12, p. 425–461.
- Hasenstaub A, Otte S, Callaway E. Cell type-specific control of spike timing by gamma-band oscillatory inhibition. Cereb Cortex 26: 797–806, 2016. doi: 10.1093/cercor/bhv044.
- Herculano-Houzel S. Neuronal scaling rules for primate brains: the primate advantage. Prog Brain Res 195: 325–340, 2012. doi: 10.1016/B978-0-444-53860-4.00015-5.
- Holy TE, Guo Z. Ultrasonic songs of male mice. PLoS Biol 3: e386, 2005. doi: 10.1371/journal.pbio.0030386.
- Huang ZJ, Zeng H. Genetic approaches to neural circuits in the mouse. Annu Rev Neurosci 36: 183–215, 2013. doi: 10.1146/annurev-neuro-062012-170307.
- Ince RA, Panzeri S, Kayser C. Neural codes formed by small and temporally precise populations in auditory cortex. J Neurosci 33: 18277–18287, 2013. doi: 10.1523/JNEUROSCI.2631-13.2013.
- Izpisua Belmonte JC, Callaway EM, Caddick SJ, Churchland P, Feng G, Homanics GE, Lee KF, Leopold DA, Miller CT, Mitchell JF, Mitalipov S, Moutri AR, Movshon JA, Okano H, Reynolds JH, Ringach D, Sejnowski TJ, Silva AC, Strick PL, Wu J, Zhang F. Brains, genes, and primates. Neuron 86: 617–631, 2015. (Erratum in Neuron 87: 671, 2015). doi: 10.1016/j.neuron.2015.03.021.
- Johnson JS, Yin P, O’Connor KN, Sutter ML. Ability of primary auditory cortical neurons to detect amplitude modulation with rate and temporal codes: neurometric analysis. J Neurophysiol 107: 3325–3341, 2012. doi: 10.1152/jn.00812.2011.
- Joris PX, Schreiner CE, Rees A. Neural processing of amplitude-modulated sounds. Physiol Rev 84: 541–577, 2004. doi: 10.1152/physrev.00029.2003.
- Kilgard MP, Merzenich MM. Distributed representation of spectral and temporal information in rat primary auditory cortex. Hear Res 134: 16–28, 1999. doi: 10.1016/S0378-5955(99)00061-1.
- Liang L, Lu T, Wang X. Neural representations of sinusoidal amplitude and frequency modulations in the primary auditory cortex of awake primates. J Neurophysiol 87: 2237–2261, 2002. doi: 10.1152/jn.2002.87.5.2237.
- Linden JF, Liu RC, Sahani M, Schreiner CE, Merzenich MM. Spectrotemporal structure of receptive fields in areas AI and AAF of mouse auditory cortex. J Neurophysiol 90: 2660–2675, 2003. doi: 10.1152/jn.00751.2002.
- Liu RC, Miller KD, Merzenich MM, Schreiner CE. Acoustic variability and distinguishability among mouse ultrasound vocalizations. J Acoust Soc Am 114: 3412–3422, 2003. doi: 10.1121/1.1623787.
- Lu T, Wang X. Information content of auditory cortical responses to time-varying acoustic stimuli. J Neurophysiol 91: 301–313, 2004. doi: 10.1152/jn.00022.2003.
- MacDougall M, Nummela SU, Coop S, Disney A, Mitchell JF, Miller CT. Optogenetic manipulation of neural circuits in awake marmosets. J Neurophysiol 116: 1286–1294, 2016. doi: 10.1152/jn.00197.2016.
- Malone B, Schreiner CE. Time-Varying Sounds: Amplitude Envelope Modulations. Oxford, UK: Oxford Univ. Press, 2010.
- Malone BJ, Beitel RE, Vollmer M, Heiser MA, Schreiner CE. Spectral context affects temporal processing in awake auditory cortex. J Neurosci 33: 9431–9450, 2013. doi: 10.1523/JNEUROSCI.3073-12.2013.
- Malone BJ, Beitel RE, Vollmer M, Heiser MA, Schreiner CE. Modulation-frequency-specific adaptation in awake auditory cortex. J Neurosci 35: 5904–5916, 2015a. doi: 10.1523/JNEUROSCI.4833-14.2015.
- Malone BJ, Scott BH, Semple MN. Dynamic amplitude coding in the auditory cortex of awake rhesus macaques. J Neurophysiol 98: 1451–1474, 2007. doi: 10.1152/jn.01203.2006.
- Malone BJ, Scott BH, Semple MN. Temporal codes for amplitude contrast in auditory cortex. J Neurosci 30: 767–784, 2010. doi: 10.1523/JNEUROSCI.4170-09.2010.
- Malone BJ, Scott BH, Semple MN. Encoding frequency contrast in primate auditory cortex. J Neurophysiol 111: 2244–2263, 2014. doi: 10.1152/jn.00878.2013.
- Malone BJ, Scott BH, Semple MN. Diverse cortical codes for scene segmentation in primate auditory cortex. J Neurophysiol 113: 2934–2952, 2015b. doi: 10.1152/jn.01054.2014.
- Mardia KV, Jupp PE. Tests of uniformity and tests of goodness-of-fit. In: Directional Statistics. New York: Wiley, 1999, p. 93–118. doi: 10.1002/9780470316979.ch6.
- Meredith RW, Janečka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC, Goodbla A, Eizirik E, Simão TL, Stadler T, Rabosky DL, Honeycutt RL, Flynn JJ, Ingram CM, Steiner C, Williams TL, Robinson TJ, Burk-Herrick A, Westerman M, Ayoub NA, Springer MS, Murphy WJ. Impacts of the Cretaceous Terrestrial Revolution and KPg extinction on mammal diversification. Science 334: 521–524, 2011. doi: 10.1126/science.1211028.
- Molnár G, Rózsa M, Baka J, Holderith N, Barzó P, Nusser Z, Tamás G. Human pyramidal to interneuron synapses are mediated by multi-vesicular release and multiple docked vesicles. eLife 5: e18167, 2016. doi: 10.7554/eLife.18167.
- Nagarajan SS, Cheung SW, Bedenbaugh P, Beitel RE, Schreiner CE, Merzenich MM. Representation of spectral and temporal envelope of twitter vocalizations in common marmoset primary auditory cortex. J Neurophysiol 87: 1723–1737, 2002. doi: 10.1152/jn.00632.2001.
- Otte S, Hasenstaub A, Callaway EM. Cell type-specific control of neuronal responsiveness by gamma-band oscillatory inhibition. J Neurosci 30: 2150–2159, 2010. doi: 10.1523/JNEUROSCI.4818-09.2010.
- Phillips EA, Hasenstaub AR. Asymmetric effects of activating and inactivating cortical interneurons. eLife 5: e18383, 2016. doi: 10.7554/eLife.18383.
- Rakic P. Evolution of the neocortex: a perspective from developmental biology. Nat Rev Neurosci 10: 724–735, 2009. doi: 10.1038/nrn2719.
- Rosen MJ, Semple MN, Sanes DH. Exploiting development to evaluate auditory encoding of amplitude modulation. J Neurosci 30: 15509–15520, 2010. doi: 10.1523/JNEUROSCI.3340-10.2010.
- Rosen S. Temporal information in speech: acoustic, auditory and linguistic aspects. Philos Trans R Soc Lond B Biol Sci 336: 367–373, 1992. doi: 10.1098/rstb.1992.0070.
- Sarro EC, Rosen MJ, Sanes DH. Taking advantage of behavioral changes during development and training to assess sensory coding mechanisms. Ann NY Acad Sci 1225: 142–154, 2011. doi: 10.1111/j.1749-6632.2011.06023.x.
- Schulze H, Langner G. Periodicity coding in the primary auditory cortex of the Mongolian gerbil (Meriones unguiculatus): two different coding strategies for pitch and rhythm? J Comp Physiol A Neuroethol Sens Neural Behav Physiol 181: 651–663, 1997. doi: 10.1007/s003590050147.
- Seybold BA, Phillips EA, Schreiner CE, Hasenstaub AR. Inhibitory actions unified by network integration. Neuron 87: 1181–1192, 2015. doi: 10.1016/j.neuron.2015.09.013.
- Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M. Speech recognition with primarily temporal cues. Science 270: 303–304, 1995. doi: 10.1126/science.270.5234.303.
- Song X, Osmanski MS, Guo Y, Wang X. Complex pitch perception mechanisms are shared by humans and a New World monkey. Proc Natl Acad Sci USA 113: 781–786, 2016. doi: 10.1073/pnas.1516120113.
- Steiper ME, Young NM. Primate molecular divergence dates. Mol Phylogenet Evol 41: 384–394, 2006. doi: 10.1016/j.ympev.2006.05.021.
- Ter-Mikaelian M, Sanes DH, Semple MN. Transformation of temporal properties between auditory midbrain and cortex in the awake Mongolian gerbil. J Neurosci 27: 6091–6102, 2007. doi: 10.1523/JNEUROSCI.4848-06.2007.
- Testa-Silva G, Verhoog MB, Linaro D, de Kock CP, Baayen JC, Meredith RM, De Zeeuw CI, Giugliano M, Mansvelder HD. High bandwidth synaptic communication and frequency tracking in human neocortex. PLoS Biol 12: e1002007, 2014. doi: 10.1371/journal.pbio.1002007.
- Vigneswaran G, Kraskov A, Lemon RN. Large identified pyramidal cells in macaque motor and premotor cortex exhibit “thin spikes”: implications for cell type classification. J Neurosci 31: 14235–14242, 2011. doi: 10.1523/JNEUROSCI.3142-11.2011.
- Walton JP, Simon H, Frisina RD. Age-related alterations in the neural coding of envelope periodicities. J Neurophysiol 88: 565–578, 2002. doi: 10.1152/jn.2002.88.2.565.
- Wang X. Neural coding strategies in auditory cortex. Hear Res 229: 81–93, 2007. doi: 10.1016/j.heares.2007.01.019.
- Winter P, Ploog D, Latta J. Vocal repertoire of the squirrel monkey (Saimiri sciureus), its analysis and significance. Exp Brain Res 1: 359–384, 1966. doi: 10.1007/BF00237707.
- Yin P, Johnson JS, O’Connor KN, Sutter ML. Coding of amplitude modulation in primary auditory cortex. J Neurophysiol 105: 582–600, 2011. doi: 10.1152/jn.00621.2010.
- Zwicker E, Fastl H. Roughness. In: Psychoacoustics: Facts and Models, edited by Fastl H. Berlin: Springer, 1990, chapt. 11, p. 231–236.