PLoS Biol. 2021 Jun 16;19(6):e3001299. doi: 10.1371/journal.pbio.3001299

Neuronal selectivity to complex vocalization features emerges in the superficial layers of primary auditory cortex

Pilar Montes-Lourido 1,¤,#, Manaswini Kar 1,2,#, Stephen V David 3, Srivatsun Sadagopan 1,2,4,5,*
Editor: Manuel S Malmierca
PMCID: PMC8238193  PMID: 34133413

Abstract

Early in auditory processing, neural responses faithfully reflect acoustic input. At higher stages of auditory processing, however, neurons become selective for particular call types, eventually leading to specialized regions of cortex that preferentially process calls at the highest auditory processing stages. We previously proposed that an intermediate step in how nonselective responses are transformed into call-selective responses is the detection of informative call features. But how neural selectivity for informative call features emerges from nonselective inputs, whether feature selectivity gradually emerges over the processing hierarchy, and how stimulus information is represented in nonselective and feature-selective populations remain open questions. In this study, using unanesthetized guinea pigs (GPs), a highly vocal and social rodent, as an animal model, we characterized the neural representation of calls in 3 auditory processing stages—the thalamus (ventral medial geniculate body (vMGB)), and thalamorecipient (L4) and superficial layers (L2/3) of primary auditory cortex (A1). We found that neurons in vMGB and A1 L4 did not exhibit call-selective responses and responded throughout the call durations. However, A1 L2/3 neurons showed high call selectivity, with about a third of neurons responding to only 1 or 2 call types. These A1 L2/3 neurons responded only to restricted portions of calls, suggesting that they were highly selective for call features. Receptive fields of these A1 L2/3 neurons showed complex spectrotemporal structures that could underlie their high call feature selectivity. Information theoretic analysis revealed that in A1 L4, stimulus information was distributed over the population and was spread out over the call durations. In contrast, in A1 L2/3, individual neurons showed brief bursts of high stimulus-specific information and conveyed high levels of information per spike. These data demonstrate that a transformation in the neural representation of calls occurs between A1 L4 and A1 L2/3, leading to the emergence of a feature-based representation of calls in A1 L2/3. Our data thus suggest that observed cortical specializations for call processing emerge in A1 and set the stage for further mechanistic studies.


A study of the neuronal representations elicited in guinea pigs by conspecific calls at different auditory processing stages reveals insights into where call-selective neuronal responses emerge; the transformation from nonselective to call-selective responses occurs in the superficial layers of the primary auditory cortex.

Introduction

How behaviorally critical sounds, such as conspecific vocalizations (calls), are represented in the activity of neural populations at various stages of the auditory processing hierarchy is a central question in auditory neuroscience. Early representations of sounds, such as in the auditory nerve, have been proposed to be optimized for the efficient and faithful representation of sounds in general [1,2]. Consequently, at lower auditory processing stations, vocalizations are not represented any differently than other sounds ([3,4]; but see [5]). At the other extreme, behaviorally relevant stimuli such as vocalizations are overrepresented at the highest cortical processing stages [6–9]. In macaques and marmosets, neurons in the highest stages of the auditory processing hierarchy show strong selectivity for call category and even caller identity [10–12]. How the neural representation of calls is transformed from a nonspecific format in early processing stages to a call-selective format at higher processing stages remains unclear. Because auditory receptive fields increase in complexity as one ascends the auditory processing hierarchy [13,14], the conventional hypothesis is that call selectivity is gradually refined across auditory processing stages. However, there is little systematic evidence supporting a gradual refinement in call selectivity. While many studies have investigated call representations in subcortical and cortical stages [6,7,15–27], these have not systematically explored the mechanisms of how call representations could be transformed from one stage to the next or how this impacts information representation at different processing stages. A clear understanding of where critical transformations occur is an essential first step in designing experiments to probe neural mechanisms underlying these transformations and to target these experiments to the appropriate processing stage in the auditory hierarchy. In this study, we recorded neural responses to an extensive set of call stimuli across multiple auditory processing stages to test whether the emergence of call selectivity is gradual and to characterize the nature and informativeness of call representations at these processing stages.

The first question to address is what it means for a neuron to be call selective. In many mammalian species, calls are not produced stereotypically from trial to trial; rather, calls are instantiations of an underlying noisy production process. Thus, there is considerable variability in the production of calls belonging to a given call category both across trials and across individuals [28,29]. Furthermore, different call categories may have highly overlapping spectral content. To be call category selective, a neuron has to be selective for more than purely spectral cues and has to generalize across production variability. In previous theoretical work, we showed that in order to construct high level call category-selective neural responses, it is first necessary to have an intermediate representation where neurons detect informative call features [29]. Informative call features are spectrotemporal fragments of calls that are most likely to be found across exemplars of a given category (despite production variability) and typically span about an octave in frequency and about a hundred milliseconds in time. Thus, if one of the objectives of cortical processing is call categorization, our model would predict the existence of diverse neurons, each tuned for model-predicted informative features. Consistent with this prediction, limited experimental data suggested that call feature-selective neurons could be found in primary auditory cortex (A1) of marmosets and guinea pigs (GPs) [29]. But the question remains whether feature selectivity is gradually constructed over the ascending auditory pathway, or if it emerges de novo at some processing stage.

At lower processing stations of the auditory pathway in GPs and nonhuman primates, there is little evidence for the existence of call feature-selective neurons [15,16,22]. Rather, neurons appear to respond to call types in a manner largely explained by frequency tuning [15,16,22]. In GPs, single neurons in the inferior colliculus (IC) are not selective for particular call types or call features [16]. In primates and GPs, even at the level of A1, many previous studies have not reported strong selectivity for particular call types or features, or preference for natural over reversed calls ([17,20,21,30]; but see below). It is only at the level of secondary cortex that clear call-selective responses have been reported, both in primates (in anterolateral (AL) belt; [8,9]), and in GPs (Area S and the ventral–rostral belt (VRB) [6]). However, gaps in understanding remain because of some technical limitations of these studies, including the use of anesthesia, limited stimulus sets, multiunit recordings, or not comparing across processing stages, specifically across cortical laminae. Thus, these studies do not give rise to a clear picture of where and how a call feature-specific representation first emerges.

A few studies have provided hints that A1 could be a locus of important transformations to the neural representation of calls. In A1 of awake squirrel monkeys, one study reported that about a third of neurons responded to call stimuli that showed similarities in their frequency–time characteristics [23]. In marmoset A1, about a third of A1 neurons at shallower recording depths showed highly nonlinear receptive fields that could in turn underlie call feature selectivity [31]. It has been proposed that because A1 neurons cannot phase lock to fast envelope fluctuations, sparse spiking in A1 could provide temporal markers that reflect subcortical spectrotemporal integration [32]. But these studies did not specify whether recordings were from the input or output layers of A1. In humans, a recent study using ultrahigh field fMRI with laminar resolution reported that whereas blood–oxygen level–dependent (BOLD) activity in granular and infragranular layers could be explained using simple frequency content-based models, activity in supragranular layers could be explained better using more complex models incorporating spectral and temporal modulations [33]. This supragranular activity resembled activity in secondary auditory cortical areas, suggesting that a transformation between thalamorecipient (A1 L4) and superficial (A1 L2/3) layers of A1 might give rise to more specialized processing. Thus, a careful investigation of the thalamus and across identified cortical laminae of A1 is necessary to understand how the cortex might transform sound representations, particularly with respect to behaviorally critical sounds such as calls.

In this study, we begin to address how early nonspecific and spectral content-based representations are transformed into higher feature-based representations. We recorded neural activity from unanesthetized GPs passively listening to an extensive range of conspecific calls [6,34,35] and acquired single-unit responses from the thalamus (ventral medial geniculate body (vMGB)), thalamorecipient (A1 L4), and superficial (A1 L2/3) layers of A1. We found that neurons in vMGB and A1 L4 responded to most call categories and throughout the call durations. In contrast, a third of A1 L2/3 neurons responded sparsely and selectively to 1 or 2 call categories, and only in specific time bins within a call. These A1 L2/3 neurons showed highly complex receptive fields that could underlie this call feature selectivity. Information-theoretic analyses revealed that while average mutual information (MI) was high in A1 L4, MI was about evenly distributed over the population of neurons and across multiple stimuli and sustained over the stimulus duration. In contrast, individual A1 L2/3 neurons were highly informative about few stimuli and conveyed high levels of information per spike in only a handful of time bins. These results argue against a gradual emergence of call feature selectivity and suggest that a significant transformation in the neural representation of calls occurs between A1 L4 and A1 L2/3, leading to the emergence of a feature-based representation of calls in A1 L2/3.

Results

We recorded the activity of single neurons located in the vMGB, A1 L4, and A1 L2/3 of unanesthetized, head-fixed, passively listening GPs (Fig 1A, top). We first implanted a head post and recording chambers onto the skull of the animals using aseptic surgical technique. We then performed small craniotomies (approximately 1.0 mm diameter) to access the underlying tissue (Fig 1A, bottom). Single-unit activity was recorded using high-impedance tungsten electrodes and first sorted online using a template match algorithm, and later refined offline. Over a few weeks, we sequentially recorded from a number of such craniotomies and constructed tonotopic maps (Fig 1C). The location of A1 was confirmed using the direction of the tonotopic gradient and tonotopic reversals. Note that in GPs, the A1 gradient is similar to primates and runs from low frequencies rostrally to high frequencies caudally [6,36,37]. On each track, we also acquired local field potential (LFP) responses to tones at evenly spaced depths, from which we calculated the current source density (CSD) profile of the track (Fig 1B). The thalamorecipient layers (referred to here as A1 L4) were identified based on the presence of a short-latency current sink and LFP polarity reversal [38]. We distinguished between regular-spiking (RS) and fast-spiking (FS) neurons in our recordings using spike width and peak-to-trough amplitude ratio (Fig 1D). About 20% of our recordings were from FS neurons, but call responses were tested in only half of these neurons. Only RS neurons are reported in this study. Spontaneous rates of A1 L2/3 neurons (Fig 1E; median: 1.51 spk/s) were not significantly different from those of A1 L4 neurons (median: 2.31 spk/s) but were significantly lower than those of vMGB neurons (median: 3.67 spk/s; Kruskal–Wallis test p = 0.008; post hoc Dunn–Sidak tests vMGB versus A1 L4: p = 0.1112, A1 L2/3 versus vMGB: p = 0.005, A1 L2/3 versus A1 L4: p = 0.085). We sampled over a broad range of neural best frequencies that overlapped with the call frequency range (Fig 1F). Pure tone tuning bandwidths of tone-responsive neurons at all processing stages showed a dependence on best frequency (Fig 1G; ANCOVA with best frequency as covariate, p = 0.0071), and after controlling for this frequency dependence, the bandwidths of vMGB neurons were significantly higher than those of A1 L2/3 neurons (ANCOVA constrained to same slopes; intercept effect p = 0.0017; post hoc Tukey honestly significant difference (HSD) vMGB versus A1 L4: p = 0.053, A1 L2/3 versus vMGB: p = 0.0012). A1 L4 and A1 L2/3 bandwidths were not significantly different (p = 0.554). Following basic characterization, we presented a range of GP calls (8 categories, 2 or more exemplars of each category; Fig 2). Note that our vocalization set did not have acoustic power in the 4 to 6 kHz range (Fig 2A), which may explain the relative paucity of call-responsive neurons we encountered in that range, particularly in cortical recordings. All call categories were about evenly represented in neural responses across the processing stages (Fig 1H). The only statistically significant deviations we observed were a small overrepresentation of “Other” calls and a small underrepresentation of “Purr” calls in A1 L2/3 (p = 0.014 for both, two-sided permutation test with false discovery rate (FDR) correction for 24 comparisons). All further analyses are based only on call-responsive neurons from the vMGB (n = 33), A1 L4 (n = 67), and A1 L2/3 (n = 45).
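
The laminar assignment above rests on a standard CSD analysis: the CSD is, up to a conductivity constant, the negative second spatial derivative of the LFP across evenly spaced recording depths, with current sinks marking synaptic input such as the short-latency thalamic drive to L4. A minimal sketch in Python (the electrode spacing value and sign convention here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def current_source_density(lfp, spacing_mm=0.1):
    """Estimate CSD as the negative second spatial derivative of the LFP.

    lfp : (n_depths, n_time) array of trial-averaged, tone-evoked LFPs
          recorded at evenly spaced depths along the electrode track.
    spacing_mm : inter-site spacing (hypothetical value).

    Returns an (n_depths - 2, n_time) array; with this sign convention,
    positive values are current sinks.
    """
    # Second difference across depth; tissue conductivity is omitted,
    # which only changes the overall scale.
    return -(lfp[2:] - 2.0 * lfp[1:-1] + lfp[:-2]) / spacing_mm**2
```

The thalamorecipient layer can then be read off as the depth showing the earliest prominent sink after tone onset, together with the LFP polarity reversal noted above.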

Fig 1. Single-unit recordings from unanesthetized, head-fixed GPs.

(A) Recording setup (top) and details of cranial implant (bottom). (B) Average LFP traces (black lines) and CSD (colormap; warm colors correspond to sinks) of an example electrode track in A1. Yellow box outlines estimated A1 L4 location. (C) Example Voronoi map showing tonotopy of auditory cortex in one GP. Colormap corresponds to best frequency. (D) Histogram of spike widths of sorted single units. Dashed orange line is the threshold used to separate FS (blue) from RS (red) units. (E) Distribution of spontaneous rates in vMGB (blue), A1 L4 (yellow), and A1 L2/3 (red). ***p < 0.005, Kruskal–Wallis test (Dunn–Sidak post hoc test). (F) Best frequencies (discs) and bandwidths (lines) of tone-responsive neurons recorded from vMGB, A1 L4, and A1 L2/3 (colors as earlier). Insets show distribution of units across subjects, colors correspond to individual subjects. (G) Tone tuning bandwidth plotted as a function of best frequency across all 3 auditory stages tested. Dots correspond to individual neurons, and lines correspond to linear fits constrained to have the same slope. (H) Fraction of call-responsive neurons in vMGB, A1 L4, and A1 L2/3 that respond to each call category (*p < 0.05, two-sided permutation test with FDR correction). Data underlying this figure can be found in Supporting information file S1 Data. CSD, current source density; FDR, false discovery rate; FS, fast-spiking; GP, guinea pig; LFP, local field potential; RS, regular-spiking.

Fig 2. Spectra and spectrograms of GP calls.

(A) Normalized power spectra of the GP calls used in this study. Colors correspond to different call categories. (B) Spectrograms of the GP calls used in this study (8 categories, 2 calls per category). Data underlying this figure can be found in Supporting information file S2 Data. GP, guinea pig.

Call selectivity emerges in superficial cortical layers

Call selectivity could emerge through a gradual sharpening of tuning along successive stages of the ascending auditory pathway or could sharply emerge at some processing stage. To distinguish between these models, we quantified the call selectivity of neural populations in vMGB, A1 L4, and A1 L2/3. Fig 3 shows representative examples of neural responses to calls in vMGB (Fig 3A), A1 L4 (Fig 3B), and A1 L2/3 (Fig 3C). Neurons in vMGB and A1 L4 typically responded to many call categories, with responses sustained throughout the call, or occurring at multiple times over the duration of a call. In contrast, neurons in A1 L2/3 responded to very few calls and only for short durations within each call.

Fig 3. Detection of response windows.

Spike rasters of 3 call-responsive neurons from (A) vMGB, (B) A1 L4, and (C) A1 L2/3 are plotted. Gray shading indicates stimulus duration, and black dots correspond to spike times. Orange boxes correspond to response windows detected using our algorithm. Data underlying this figure can be found in Supporting information file S3 Data.

Conventionally, response rates and response significance are calculated over a fixed response window, typically encompassing the entire stimulus duration. For a first-pass analysis, we defined selectivity as the number of call categories that, compared to spontaneous rate, evoked a significant response over the entire call duration (1 = highly selective, 8 = no selectivity). The median selectivity of the A1 L2/3 population was 3 call categories, whereas the medians for the A1 L4 and vMGB populations were 6 call categories (p = 3.5 × 10−6; Kruskal–Wallis test). While this approach accurately estimated response properties when response rates were high and sustained, it sometimes failed to capture feature-selective responses that were restricted to only some time bins of the stimulus, such as those we observed in A1 L2/3. To overcome this limitation, we used an automated procedure to estimate significant response windows for each stimulus (orange boxes in Fig 3; see Materials and methods). If at least one response window was detected for any exemplar belonging to a call category, we conservatively counted the neuron as being responsive to that category.
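
The exact criteria of the window-detection procedure are given in Materials and methods; the sketch below only illustrates the general approach, with assumed parameters: each time bin's trial-wise rates are compared against the spontaneous rate with a one-sided Wilcoxon signed-rank test, and runs of significant bins are merged into windows. The test choice, alpha, and minimum run length are all assumptions for illustration:

```python
import numpy as np
from scipy import stats

def detect_response_windows(trial_rates, spont_rates, alpha=0.05, min_bins=2):
    """Find contiguous time bins whose rate exceeds the spontaneous rate.

    trial_rates : (n_trials, n_bins) firing rate per trial and time bin (spk/s)
    spont_rates : (n_trials,) spontaneous rate of each trial (spk/s)
    Returns a list of (start_bin, stop_bin) windows, stop exclusive.
    """
    _, n_bins = trial_rates.shape
    significant = np.zeros(n_bins, dtype=bool)
    for b in range(n_bins):
        diffs = trial_rates[:, b] - spont_rates
        if np.any(diffs):  # Wilcoxon is undefined when all differences are 0
            _, p = stats.wilcoxon(diffs, alternative='greater')
            significant[b] = p < alpha
    windows, start = [], None
    for b in range(n_bins + 1):  # sentinel pass closes a trailing window
        sig = significant[b] if b < n_bins else False
        if sig and start is None:
            start = b
        elif not sig and start is not None:
            if b - start >= min_bins:  # discard very short runs
                windows.append((start, b))
            start = None
    return windows
```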

Over the population of recorded neurons, while vMGB and A1 L4 neurons showed significant responses to most of the categories tested (Fig 4A, left and center; median of 7 categories for both vMGB and A1 L4), nearly a third of A1 L2/3 neurons responded to only 1 or 2 call categories (Fig 4A, right; median = 5). Distributions of call selectivity were not significantly different between the vMGB and A1 L4 populations (medians = 7). In contrast, A1 L2/3 neurons responded to significantly fewer categories of calls (p = 2.8 × 10−5, Kruskal–Wallis test; post hoc Dunn–Sidak corrected p-values are as follows: vMGB versus A1 L4: p = 0.90; A1 L2/3 versus vMGB: p = 2.5 × 10−4; A1 L2/3 versus A1 L4: p = 1.9 × 10−4). The temporal characteristics of the response and response duration are shown in Fig 4B, where we plot the joint distribution of the number of response windows found per call and the fractional length of call stimuli spanned by response windows in vMGB, A1 L4, and A1 L2/3. While most vMGB and A1 L4 neurons typically exhibited 2 or more response windows per call that spanned a larger fraction of call length, many A1 L2/3 neurons usually exhibited only one response window per call with response windows spanning a smaller fraction of call length. The temporal response characteristics of vMGB and A1 L4 were therefore not significantly different (p = 0.48, 2D Kolmogorov–Smirnov (K–S) [39] test with Bonferroni correction), whereas A1 L2/3 responses were significantly different (A1 L2/3 versus vMGB: p = 0.0008, A1 L2/3 versus A1 L4: p = 0.0023; 2D K–S test with Bonferroni correction). Thus, at the culmination of subcortical processing, vMGB responses are not call selective and in fact mirror earlier studies showing a lack of call selectivity in GP IC [16]. Even at the first cortical processing stage (A1 L4), no transformation to the representation of calls seems to have occurred. However, our data demonstrate that a significant transformation to call representation occurs in many superficial cortical neurons (A1 L2/3). These data strongly support the de novo emergence of call feature-selective responses in the superficial layers of primary auditory cortex.

Fig 4. Neural selectivity for call features emerges in A1 L2/3.

(A) Distributions of call selectivity in vMGB (blue), A1 L4 (yellow), and A1 L2/3 (red). Black dashed lines are medians. Comparison of cumulative distributions is shown on the right. (B) Joint distributions of the number of response windows and the fractional length of the call stimuli spanned by all windows exhibited by neurons at the different processing stages. vMGB and A1 L4 neurons tended to exhibit either multiple short windows or a single long window that spanned a large portion of the stimuli. In contrast, A1 L2/3 neurons exhibited 1 or 2 short response windows. (C) Distributions of trial-wise response rates in an example vMGB (blue; same neuron as in Fig 3A, left), A1 L4 (yellow; same neuron as in Fig 3B, left), and A1 L2/3 (red; same neuron as in Fig 3C, left) neuron. Kurtosis values calculated over the entire call length are shown. Gray dashed line corresponds to spontaneous rate. (D) Distributions of sparseness (kurtosis) across auditory processing stages. A1 L2/3 responses were significantly sparser than A1 L4 and vMGB responses. (E) Same as (D) but with activity fraction used as a metric of response sparseness. For all panels except B, Kruskal–Wallis tests with post hoc Dunn–Sidak tests were used for statistical comparisons. For B, a two-dimensional K–S test with Bonferroni correction was used. Asterisks correspond to: *p < 0.05, **p < 0.01, ***p < 0.005, ****p < 0.001 (exact p-values in main text). Data underlying this figure can be found in Supporting information file S4 Data. K–S, Kolmogorov–Smirnov.

To evaluate whether neurons specifically responded to only parts of some calls or if neural responses were more evenly distributed across calls using metrics independent of stimulus identity and response window detection parameters, we characterized response sparsity. We defined sparseness as (1) the reduced kurtosis of the trial-wise firing rate distribution and (2) the activity fraction ([40,41]; see Eq 1) of the trial-wise responses. For neurons that responded to most trials about evenly, such as the A1 L4 neuron in Fig 3B (left), the firing rate distribution was approximately normal, resulting in low kurtosis values (Fig 4C, center). In contrast, for neurons that responded strongly only on some trials, and were unresponsive for most trials, such as the A1 L2/3 neuron in Fig 3C (left), the firing rate distribution showed high kurtosis (Fig 4C, right). Over the population, for both sparsity metrics (kurtosis, Fig 4D; and activity fraction, Fig 4E), we found that vMGB and A1 L4 responses were not sparse and not significantly different from each other. Consistent with earlier analyses, compared to both vMGB and A1 L4, A1 L2/3 responses were highly sparse and sparsity distributions were significantly different (Kurtosis: p = 3.2 × 10−5, Kruskal–Wallis test; Dunn–Sidak post hoc test p-values are as follows: vMGB versus A1 L4: p = 0.99, A1 L2/3 versus vMGB: p = 5.5 × 10−4, A1 L2/3 versus A1 L4: p = 1.2 × 10−4. Activity fraction: p = 5.2 × 10−4, Kruskal–Wallis test; Dunn–Sidak post hoc test p-values are as follows: vMGB versus A1 L4: p = 0.79, A1 L2/3 versus vMGB: p = 0.001, A1 L2/3 versus A1 L4: p = 0.004).
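
Both metrics reduce to one-line computations on the vector of trial-wise rates. A minimal sketch, assuming the standard Rolls and Tovee form of the activity fraction (Eq 1 itself appears in Materials and methods) and excess kurtosis:

```python
import numpy as np
from scipy.stats import kurtosis

def response_sparseness(rates):
    """Sparseness metrics for a vector of trial-wise firing rates.

    rates : (n_trials,) firing rate on each trial (spk/s).
    Returns (kurtosis, activity_fraction): high kurtosis and low activity
    fraction both indicate responses concentrated in a few trials.
    """
    r = np.asarray(rates, dtype=float)
    k = kurtosis(r)  # excess kurtosis; ~0 for a normal rate distribution
    # Activity fraction: (mean r)^2 / mean(r^2); equals 1 for perfectly
    # even responding and approaches 1/n for a response on a single trial.
    a = r.mean() ** 2 / np.mean(r ** 2)
    return k, a
```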

These observed differences in A1 L2/3 selectivity and sparsity could not simply be attributed to differences in frequency tuning. As mentioned above, pure tone tuning bandwidths of tone-responsive neurons in A1 L2/3 were not significantly different from A1 L4 neurons (Fig 1G). High call selectivity in A1 L2/3 could also arise if only a few call types are overrepresented in this processing stage. This was not the case in our data—as described earlier, neural preference for call type was about evenly distributed across all tested call types across the processing stages. These controls thus suggest that the emergence of call or feature selectivity in A1 L2/3 is the consequence of cortical computations that result in a meaningful transformation of information representation between processing stages.

Because responses were evoked for more call categories and for larger fractional lengths of the calls in vMGB and A1 L4, and given the overlapping spectral content of call categories that is largely maintained over the call durations (Fig 2), we hypothesized that vMGB and A1 L4 neurons were likely driven by the spectral content of calls, responding when call spectral energy overlapped with the neurons’ tone receptive fields. In contrast, despite this overlap of spectral energy across call types, many A1 L2/3 neurons responded to few call types and only in narrow windows, suggesting that they were likely driven by specific spectrotemporal features that occur during calls, consistent with our earlier theoretical model [29]. We tested these hypotheses by estimating the spectrotemporal receptive fields (STRFs) that best explained neural responses across the processing stages.

Complex spectrotemporal features drive call-selective responses

To determine the call features driving neural responses, we used the Neural Encoding Model System (NEMS [42,43]; https://github.com/LBHB/NEMS) to fit linear–nonlinear (LN) models to neural responses to calls. The input to these models was the concatenated cochleagram of all call stimuli (6 oct. frequency range with 5 steps/oct., 20 ms time bins, approximately 35 seconds total; Fig 5B), constructed using a fast approximation algorithm based on a weighted log-spaced spectrogram and 3 rate-level transformations corresponding to 3 categories of auditory nerve fibers ([44]; https://github.com/monzilur/cochlear_models). A recent study demonstrated that such an input representation adequately captures the auditory input to cortex for the purposes of receptive field estimation [44]. The objective of the encoding model was to estimate a set of linear weights (the STRF of the neuron), which, when convolved with the input cochleagram and then transformed through a point nonlinearity, would yield a predicted peristimulus time histogram (PSTH; Fig 5A; see Materials and methods for details). The correlation coefficient between predicted PSTHs of validation segments of neural responses (labeled r in figures; see Materials and methods) and actual response PSTHs was used as the performance metric. For display and measuring STRF sparsity, we used significance-masked average STRFs (see Materials and methods).
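
Schematically, the LN model is a causal spectrotemporal filter applied to the cochleagram, followed by a static point nonlinearity. The sketch below shows the forward pass only; NEMS performs the actual parameter fitting, and the sigmoid form of the nonlinearity used here is an illustrative assumption:

```python
import numpy as np

def ln_predict(cochleagram, strf, nl_params):
    """Forward pass of a linear-nonlinear model: cochleagram -> PSTH.

    cochleagram : (n_freq, n_time) stimulus representation, e.g., 5 steps/oct
                  over 6 octaves in 20 ms bins, as in the paper.
    strf        : (n_freq, n_lags) linear filter weights.
    nl_params   : (a, b, c, d) parameters of a sigmoid output nonlinearity
                  (assumed form, for illustration).
    """
    n_freq, n_time = cochleagram.shape
    n_lags = strf.shape[1]
    # Pad so the filter only sees past and present stimulus bins (causal).
    padded = np.concatenate([np.zeros((n_freq, n_lags - 1)), cochleagram],
                            axis=1)
    linear = np.empty(n_time)
    for t in range(n_time):
        # Most recent bin aligned with lag 0 of the STRF.
        linear[t] = np.sum(strf * padded[:, t:t + n_lags][:, ::-1])
    a, b, c, d = nl_params
    return a + b / (1.0 + np.exp(-(linear - c) / d))  # point nonlinearity
```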

Fig 5. STRF estimates of example A1 L4 neurons.

(A) Schematic of the LN model architecture used to estimate STRFs. (B) Stimulus cochleagram of 16 call stimuli (8 categories) used as the input to the model. (C, E, G) Mean STRF estimates of 3 A1 L4 neurons with a range of selectivity values. (D, F, H) Comparison of predicted PSTHs (magenta) and observed responses (cyan) of these 3 neurons. Horizontal blue lines denote the extent of the frequency tuning of the STRFs. (I) Additional examples of A1 L4 STRF estimates (sel. = call selectivity, r = correlation between predicted and actual responses derived from the validation data set). Data underlying this figure can be found in Supporting information file S5 Data. LN, linear–nonlinear; PSTH, peristimulus time histogram; STRF, spectrotemporal receptive field.

Examples of STRF estimates and comparisons of predicted responses to observed responses are shown for neurons with a range of call selectivities from different subjects in A1 L4 and A1 L2/3 in Figs 5 and 6. For many A1 L4 neurons (Fig 5), STRF estimates that best captured the response showed a clear tuning for specific frequencies, and significant weights were restricted to a narrow range of frequencies and few time bins. While a few call-selective A1 L4 responses could not be directly explained by call energy overlapping with an excitatory receptive field subunit (for example, Fig 5C and 5D), responses of most A1 L4 neurons to calls occurred when call energy was present within the excitatory subunits of the receptive fields (horizontal blue lines in Fig 5E–5H). In contrast, STRFs of A1 L2/3 neurons estimated using the same procedure were often more complex (Fig 6). We observed STRFs with preferences for repetitive features (Fig 6A), harmonically related features (Fig 6G), and frequency-modulated features (Fig 6K). Compared to A1 L4 estimates, significant A1 L2/3 weights spanned a greater range of frequencies and time bins. When we overlaid different stimulus segments on the A1 L2/3 STRFs, we observed that responses did not occur when only stimulus spectral energy matched STRF excitatory subunits (red boxes labeled “3,” “4,” and “5” in Fig 6). Rather, responses were elicited when complex stimulus features matched multiple STRF subunits (green boxes labeled “1” and “2” in Fig 6).

Fig 6. STRF estimates of example A1 L2/3 neurons.

(A, G) STRF estimates of 2 A1 L2/3 neurons showing complex feature selectivity. (B, H) Stimulus cochleagram (background) and comparison of predicted PSTHs (magenta) and observed responses (cyan) of these 2 neurons. (D) Expanded cochleagram segment from orange box in B. In B, D, and H, green boxes labeled “1” and “2” correspond to 200 ms long stimulus segments that elicited neural responses. Red boxes labeled “3,” “4,” and “5” correspond to 200 ms long stimulus segments that did not elicit responses. Numbers correspond to examples shown in panels C, E, F, I, and J. (C, E, F) Overlay of stimulus energy in 200 ms long segments corresponding to numbers in B and D (transparency denotes stimulus energy, peak energy is bounded by black contour) on the STRF (colormap) of this unit. (I, J) Similar to C, E, and F but for the other A1 L2/3 example. (K) Additional examples of complex STRFs of A1 L2/3 neurons. Data underlying this figure can be found in Supporting information file S6 Data. PSTH, peristimulus time histogram; STRF, spectrotemporal receptive field.

For example, the unit in Fig 6A–6F showed selective responses to teeth chatter calls, a nonvoiced call that contains repetitive pulses of low-frequency energy around 1 kHz accompanied by high-frequency energy around 8 kHz (see spectrogram in Fig 2B). The STRF estimate of this neuron showed excitatory receptive field subunits at approximately 1 kHz and approximately 8 kHz, with an additional excitatory subunit at 8 kHz occurring approximately 100 ms later. Some parts of teeth chatter calls thus closely overlapped the excitatory subunits of the STRF, driving strong responses (Fig 6C). But other parts of teeth chatter calls did not drive responses (Fig 6F), possibly because of the faster repetition rate of individual syllables or activity-dependent adaptation of spiking activity. A second call exemplar that had repetitive energy at 8 kHz (a chirp call) also drove responses in this neuron to a lesser extent (Fig 6E), but other vocalizations with 8 kHz energy that did not have a repetitive structure did not drive responses (e.g., wheek calls; Fig 6F). A second example unit that required the presence of harmonic structure is shown in Fig 6G–6J. This unit appeared to require at least 2 of the excitatory STRF subunits to be activated to produce a response. The selectivity for multiple frequency components in this unit was reminiscent of “harmonic template neurons” that have been reported in marmoset auditory cortex [45]. This unit responded even when different frequency combinations were excited by different calls (Fig 6I), underscoring the intuition that these units could not be described as a simple spectral filter. Fig 6K shows further examples of STRF estimates of units that showed selective responses to call features.

Over the population of neurons, we did not find significant differences in the performance of the LN models in fitting training data segments from vMGB, A1 L4, or A1 L2/3 neurons (Fig 7A, left; p = 0.684, Kruskal–Wallis test), suggesting that the model converged to a solution similarly across the 3 processing stages. However, while the LN models generalized to the validation data segments with similar performance in vMGB and A1 L4 neurons, generalization was significantly worse for A1 L2/3 neurons (Fig 7A, right; Kruskal–Wallis test, p = 0.0003; Dunn–Sidak post hoc test p-values are as follows: vMGB versus A1 L4: p = 0.999, A1 L2/3 versus vMGB: p = 0.003, A1 L2/3 versus A1 L4: p = 0.0006). Critically, model generalization performance was correlated with call selectivity across all processing stages (Fig 7B; ANCOVA with selectivity as covariate; p = 2.83 × 10−7). We note that several neurons with a call selectivity of 1 showed very low and nonsignificant r values. These observations suggest that more complex and nonlinear models may be required to capture these highly selective responses.

Fig 7. Performance and complexity of STRF estimates across processing stages.

(A) Performance of LN models on test and validation data from MGB (blue), A1 L4 (yellow), and A1 L2/3 (red). Discs denote medians, thick lines denote interquartile range, and thin lines correspond to the extent of the distribution. Outliers are shown as dots. (B) Model validation performance plotted as a function of call selectivity across processing stages. Dots are individual neurons, and lines correspond to linear fit. (C) Distributions of STRF sparsity across processing stages. Colors and symbols as earlier. (D) Distributions of STRF kurtosis across processing stages. Colors and symbols as earlier. For all panels except B, Kruskal–Wallis tests with post hoc Dunn–Sidak tests were used for statistical comparisons. For B, an ANCOVA with selectivity as a covariate was used. Asterisks correspond to: *p < 0.05, **p < 0.01, ***p < 0.005, ****p < 0.001 (exact p-values in main text). Data underlying this figure can be found in Supporting information file S7 Data. LN, linear–nonlinear; STRF, spectrotemporal receptive field.

We used 2 metrics to compare the complexity of STRF structure across processing stages. First, we used STRF sparsity, defined as the maximum absolute value of the significance-masked STRF divided by the standard deviation of the significance-masked STRF [46,47]. For “simple” STRFs, the maximum value would be high, whereas standard deviation would be low, resulting in high STRF sparsity values. For complex STRFs where many weight values are large, the maximum value and standard deviation would be comparable, resulting in lower STRF sparsity values. We found a significant effect of processing stage on STRF sparsity (Fig 7C; Kruskal–Wallis test, p = 0.008), with post hoc tests revealing a significant difference between A1 L2/3 and A1 L4 neurons (Dunn–Sidak post hoc test, p = 0.006). As a second metric, we quantified the kurtosis of STRF weight values (after significance masking). STRFs with simple structure would show weight distributions with high kurtosis, with most of the weights concentrated in 1 or 2 subunits, and the rest of the weights equaling zero. Complex STRFs would be expected to have a more normal distribution of weight values. We found a significant effect of processing stage on kurtosis (Fig 7D; Kruskal–Wallis test, p = 3.3 × 10−5), with Dunn–Sidak post hoc tests revealing significant differences between A1 L2/3 and vMGB (p = 0.0007) as well as between A1 L2/3 and A1 L4 (p = 0.0001). These statistical results were qualitatively unchanged even when neurons with nonsignificant r values were excluded. These observations supported our hypothesis that whereas vMGB and A1 L4 neurons responded to call stimuli in a manner that was largely consistent with their spectral tuning properties, A1 L2/3 neurons were driven by more complex spectrotemporal features present in calls that could not be well fit by linear models.
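Both STRF metrics follow directly from the definitions above; a minimal sketch:

```python
import numpy as np
from scipy.stats import kurtosis

def strf_complexity(strf_masked):
    """STRF sparsity and kurtosis of a significance-masked STRF.

    strf_masked : 2D array of STRF weights with non-significant weights
                  set to zero. Both metrics are high for "simple" STRFs
                  whose energy is concentrated in one or two subunits.
    """
    w = strf_masked.ravel()
    sparsity = np.max(np.abs(w)) / np.std(w)  # max |weight| over SD of weights
    return sparsity, kurtosis(w)
```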

Emergence of call feature selectivity in A1 L2/3 confers high stimulus-specific information on to individual A1 L2/3 neural responses

While our data show that A1 L2/3 neurons become call selective by restricting their responses to specific call features, the consequence of this emergence of call selectivity on decoding call identity from A1 L2/3 neural activity is unclear. An obvious expectation would be that increasing the feature selectivity of single neurons would result in unique activity patterns in response to some calls, thereby leading to higher information carried by these neurons about call identity. Conventionally, MI [48] has been used to estimate the amount of information about stimulus identity carried by neural responses [49–52]. Intuitively, for our call stimulus set consisting of 16 calls, a neuron that exhibits 16 unique response patterns, each corresponding to a call, would provide the maximal MI about the stimulus set (in this case, 4 bits of information). We computed the MI between the responses and stimuli, limiting our analyses to the first 1,457 ms of response and postresponse period (see Materials and methods). When we computed the average MI in 100 ms time bins (50 ms slide; see Materials and methods) of the population of A1 L4 neurons as has been done in most earlier studies [49–52], we found low information levels throughout the response duration (Fig 8A, yellow) that were not significantly different (two-sided t test with FDR correction at each time point) from population MI present in the vMGB population (Fig 8A, blue). However, consistent with a recent result showing decreasing information content in the ascending auditory pathway of anesthetized GPs [53], we found significantly lower MI levels in the A1 L2/3 population (Fig 8A, red). We confirmed that this result held over a wide range of window sizes used for analysis (S1 Fig and data in Supporting information file S9 Data).
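
To make the quantity concrete, here is a naive plug-in MI estimator for a single analysis window, treating the response as the discretized spike count and assuming equiprobable stimuli. The paper's exact estimator, including any bias correction, is described in Materials and methods; this version is only a sketch:

```python
import numpy as np

def mutual_information(counts):
    """Plug-in MI (bits) between stimulus identity and spike count.

    counts : (n_stimuli, n_trials) spike counts in one analysis window.
    With 16 stimuli, the ceiling is log2(16) = 4 bits.
    """
    n_stim, _ = counts.shape
    values = np.unique(counts)
    p_s = 1.0 / n_stim  # equiprobable stimuli
    # Conditional response distributions p(r|s), estimated across trials.
    p_r_given_s = np.array([[np.mean(counts[s] == v) for v in values]
                            for s in range(n_stim)])
    p_r = p_r_given_s.mean(axis=0)  # marginal response distribution
    mi = 0.0
    for s in range(n_stim):
        for j in range(len(values)):
            if p_r_given_s[s, j] > 0:
                mi += p_s * p_r_given_s[s, j] * np.log2(p_r_given_s[s, j]
                                                        / p_r[j])
    return mi
```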

Fig 8. Reformatting of stimulus information in A1 L2/3.

(A) Population average of MI as a function of time in vMGB (blue), A1 L4 (yellow), and A1 L2/3 (red) neurons. Lines correspond to means and shading to 1 SEM. Colored dots represent results of statistical testing (p < 0.05; two-sided t test with FDR correction for multiple comparisons). (B–D) MI for 2 example neurons each from vMGB (B), A1 L4 (C), and A1 L2/3 (D). The example neurons are the same as the left 2 examples from Fig 3A–3C. Red crosses correspond to high MI time bins. (E–G) ISSI for the vMGB neurons in (B), the A1 L4 neurons in (C), and the A1 L2/3 neurons in (D). Darker colors correspond to higher ISSI values. (H) Distributions of SIMI for vMGB, A1 L4, and A1 L2/3 neurons. Horizontal line corresponds to median, and colored area corresponds to interquartile range. (I) Distributions of ISSI−PSTH correlation coefficients for vMGB, A1 L4, and A1 L2/3 neurons. (J) Distributions of ISSI−PSTH slopes for vMGB, A1 L4, and A1 L2/3 neurons. Asterisks correspond to: *p < 0.05, **p < 0.01, ***p < 0.005, ****p < 0.001 (Kruskal–Wallis test with post hoc Dunn–Sidak tests, exact p-values in main text). Data underlying this figure can be found in Supporting information file S8 Data. FDR, false discovery rate; MI, mutual information; PSTH, peristimulus time histogram.

To understand how lower population MI levels might arise and to test whether this negatively impacted stimulus decodability in A1 L2/3, we decomposed how information was distributed across 2 factors, (1) individual neurons and (2) individual stimuli, in the vMGB, A1 L4, and A1 L2/3 neural populations. First, we examined how MI was distributed over the individual neurons that make up the population average in Fig 8A. Fig 8C shows MI as a function of time for 2 example A1 L4 neurons (the same neurons as in Fig 3B, left and center). Although the magnitudes of MI are different, the MI over time is sustained in both cases, which means that when averaged, the mean MI will also be sustained over time (as in Fig 8A, yellow). In contrast, Fig 8D shows MI for 2 example A1 L2/3 neurons (the same neurons as in Fig 3C, left and center). Here, the MI is close to zero for many time bins and shows peaks in time bins that are nonoverlapping between neurons, which means when averaged, the mean MI will be at a low value (as in Fig 8A, red). Second, we decomposed the MI into stimulus-specific information (ISSI; [54–56]), which measures how much information about each stimulus is provided by the response. Note that the conventionally computed MI is the weighted average of ISSI across all stimuli. Fig 8E–8G show the decomposition of the MI of the example neurons in Fig 8B–8D, respectively, into the ISSI for each call stimulus. In A1 L4 (Fig 8F), ISSI was evenly distributed across all stimuli and time bins, resulting in the average (the MI; Fig 8C) being at a sustained level over time. In A1 L2/3, however (Fig 8G), ISSI was very high (approaching 3 bits) for specific stimuli only at specific time bins. Thus, average ISSI across stimuli, as is done to compute MI (Fig 8D), approached zero for most time bins and severely underestimated the informativeness of the response.
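
Following the cited definition of stimulus-specific information, ISSI for stimulus s averages, over the responses evoked by s, the specific surprise i_sp(r) = H[S] − H[S|r]; its stimulus-weighted average recovers the MI, as stated above. A sketch under the same assumptions as the MI example:

```python
import numpy as np

def stimulus_specific_information(counts):
    """ISSI of each stimulus from spike counts in one analysis window.

    counts : (n_stimuli, n_trials) spike counts.
    Returns an (n_stimuli,) array whose average across (equiprobable)
    stimuli equals the mutual information.
    """
    n_stim, _ = counts.shape
    values = np.unique(counts)
    p_r_given_s = np.array([[np.mean(counts[s] == v) for v in values]
                            for s in range(n_stim)])
    p_r = p_r_given_s.mean(axis=0)
    # p(s|r) by Bayes' rule with p(s) = 1/n_stim.
    p_s_given_r = p_r_given_s / (n_stim * p_r[None, :])
    with np.errstate(divide='ignore', invalid='ignore'):
        h_s_given_r = -np.nansum(p_s_given_r * np.log2(p_s_given_r), axis=0)
    i_sp = np.log2(n_stim) - h_s_given_r  # specific surprise of each response
    return p_r_given_s @ i_sp  # average surprise given each stimulus
```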

To quantify whether a high-MI time bin (see Materials and methods; red crosses in Fig 8B–8D) arises from an approximately normal distribution of ISSI across all stimuli for that time bin (as in Fig 8F), or from a highly skewed ISSI distribution across stimuli for that time bin (as in Fig 8G), we computed an MI sparsity index. For this analysis, we first identified high-MI time bins (bins that have more than mean + 1 standard deviation of population MI) and quantified the distribution of ISSI values only for these bins (SIMI; see Materials and methods). SIMI increased significantly between all 3 processing stages tested (Fig 8H; p = 1.1 × 10−6, Kruskal–Wallis test; Dunn–Sidak post hoc test p-values are as follows: vMGB versus A1 L4: p = 0.007, A1 L2/3 versus vMGB: p = 5.0 × 10−7, A1 L2/3 versus A1 L4: p = 0.012), with A1 L2/3 neurons being informative about only a few calls in their most informative time bins.

MI analysis takes into account spike patterns but does not distinguish whether information is carried by the presence or the absence of spikes. In other words, if a neuron responds to 15 of the 16 call stimuli and is inhibited by 1 call, the information provided by this neuron about the stimulus set is equivalent to that provided by a neuron that responds to only one call. To determine whether ISSI is provided by the presence or absence of spikes, we computed the cross-correlation between the PSTH and ISSI across all time bins for neurons in vMGB, A1 L4, and A1 L2/3. Compared to vMGB and A1 L4, A1 L2/3 neurons showed higher ISSI−PSTH correlations, suggesting that A1 L2/3 responses were informative because of the presence of spikes (Fig 8I; p = 9.3 × 10−6, Kruskal–Wallis test; Dunn–Sidak post hoc test p-values are as follows: vMGB versus A1 L4: p = 0.240, A1 L2/3 versus vMGB: p = 1.0 × 10−5, A1 L2/3 versus A1 L4: p = 0.001). Compared to A1 L4, the ISSI−PSTH relationship in A1 L2/3 also showed a significantly higher slope, indicating that each spike from an A1 L2/3 neuron carried greater stimulus-specific information (Fig 8J; p = 2.7 × 10−6, Kruskal–Wallis test; Dunn–Sidak post hoc test p-values are as follows: vMGB versus A1 L4: p = 0.021, A1 L2/3 versus vMGB: p = 1.3 × 10−6, A1 L2/3 versus A1 L4: p = 0.007). We confirmed that these results were consistent over a wide range of window sizes used for analysis (S1 Fig).
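
The correlation and slope summarized in Fig 8I and 8J can be computed from paired PSTH and ISSI traces; a minimal sketch (how the traces are concatenated across stimuli and time bins is not specified in this section, so the input layout here is an assumption):

```python
import numpy as np

def issi_psth_relationship(psth, issi):
    """Correlation and slope between a neuron's PSTH and its ISSI trace.

    psth : (n_bins,) firing rate over concatenated time bins (spk/s)
    issi : (n_bins,) stimulus-specific information in the same bins (bits)
    A positive correlation indicates information carried by the presence
    of spikes; the slope estimates stimulus-specific information per unit
    of firing rate (loosely, bits per spike).
    """
    r = np.corrcoef(psth, issi)[0, 1]
    slope, _intercept = np.polyfit(psth, issi, 1)
    return r, slope
```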

Table 1 is a summary of all statistical comparisons of basic tuning properties, selectivity metrics, STRF metrics, and information theoretic metrics of vMGB, A1 L4, and A1 L2/3 neurons. Where possible, we estimated effect size using the nonparametric Cliff delta (d) ([57]; range: [−1, 1]; implemented using code from https://github.com/GRousselet/matlab_stats). As a guideline for interpreting these values, the effect size may be considered “small” if 0.11 < |d| ≤ 0.28, “medium” if 0.28 < |d| ≤ 0.43, and “large” if |d| > 0.43 [58]. Asterisks denote statistical significance. If call selectivity gradually developed over the 3 processing stages, one would expect to see differences in selectivity parameters pairwise between all 3 processing stages. In contrast, if selectivity arose de novo in superficial cortical layers, vMGB and A1 L4 parameter distributions would not be significantly different (second column), but A1 L2/3 and A1 L4 (as well as A1 L2/3 versus vMGB; third and fourth columns) would show significant differences. Our results support the latter possibility and the idea that while subcortical activity and inputs to A1 represent vocalizations densely and based on spectral content, a call feature-based representation emerges in A1 L2/3 that dramatically transforms how information about conspecific calls is represented in A1 outputs.

Table 1. Statistical summary of comparisons between vMGB, A1 L4, and A1 L2/3.

Parameter | vMGB vs. A1 L4 | A1 L2/3 vs. vMGB | A1 L2/3 vs. A1 L4

Basic properties
Bandwidth (ANCOVA, post hoc Tukey HSD) | 0.36 (*) | −0.54 (***) | −0.12 (n.s.)
Spontaneous rate | 0.25 (n.s.) | −0.5 (***) | −0.24 (n.s.)

Selectivity parameters
Selectivity (overall firing rate) | −0.14 (n.s.) | −0.32 (****) | −0.48 (****)
Selectivity (response windows) | 0.07 (n.s.) | −0.52 (****) | −0.44 (****)
No. of windows and response length (2D K–S, Bonferroni correction) | (n.s.) | (****) | (***)
Kurtosis | −0.04 (n.s.) | 0.49 (****) | 0.46 (****)
Activity fraction | 0.10 (n.s.) | −0.47 (***) | −0.36 (***)

STRF parameters
r | 0.02 (n.s.) | −0.46 (***) | −0.45 (****)
STRF sparsity | −0.13 (n.s.) | −0.28 (n.s.) | −0.38 (**)
STRF kurtosis | −0.01 (n.s.) | −0.54 (****) | −0.51 (****)

MI analyses (100 ms time bins)
Population MI (2-sided t test, FDR correction) | n.s. | Few time points | Many time points
SIMI | −0.44 (**) | 0.65 (****) | 0.40 (*)
ISSI–PSTH correlation | −0.21 (n.s.) | 0.66 (****) | 0.43 (***)
ISSI–PSTH slope | −0.36 (*) | 0.67 (****) | 0.39 (**)

Numbers denote effect size (Cliff delta, range [−1, 1]). Asterisks in parentheses denote statistical significance (****p < 0.001, ***p < 0.005, **p < 0.01, *p < 0.05, n.s., not significant). All tests are Kruskal–Wallis tests with post hoc Dunn–Sidak tests unless noted otherwise.

FDR, false discovery rate; HSD, honestly significant difference; K–S, Kolmogorov–Smirnov; MI, mutual information; PSTH, peristimulus time histogram; STRF, spectrotemporal receptive field.

Discussion

Although many previous studies have explored the neural representation of conspecific calls in subcortical and cortical areas across species [6–9,15–27,59], exactly where and how call-selective responses emerge in the auditory processing hierarchy has remained unclear. In mice, some studies have suggested that selectivity for ultrasonic vocalizations (USVs) in a manner not consistent with spectral content might arise at subcortical stations [5] and lead to an overrepresentation of USV-selective responses in the IC [60]. However, other studies have suggested that this overrepresentation is explained by a tonotopic expansion of the representation of those frequencies and that USV responses are in fact consistent with spectral tuning of neurons [61]. In bats, the majority of neurons in subcortical processing stations responded to calls consistent with neurons’ frequency tuning [3,62]. In GPs, single neurons in the IC are not selective for particular call types or call features [16]. In the MGB, although single neurons follow call envelopes less precisely [15] and neural responses to calls are less predictable from neurons’ tone tuning [63], responses do not differentiate between natural and reversed versions of calls [64], suggesting that MGB responses are not call or call feature selective. At the level of A1, some studies have reported that single neurons show selectivity for natural calls over reversed calls [18] or that neurons seem to respond to calls that share similar spectrotemporal features [23], but by and large, neural responses to calls seem to be explained by the frequency tuning of neurons [7,21]. At the level of secondary cortex, neurons have been shown to be highly selective for call type in primates [8,9] and GPs (Area S and VRB [6]). However, because of some technical limitations of these studies, including the use of anesthesia, limited stimulus sets, multiunit recordings, or not comparing across processing stages, specifically across cortical laminae, it is difficult to evaluate where transformations to call representation begin to occur. Answering the “where” question is a critical first step that will enable experiments probing the neural mechanisms underlying these transformations to be targeted to the appropriate processing stage. In this study, we overcame these limitations by simultaneously (1) conducting experiments in unanesthetized animals, (2) using an extensive set of conspecific calls as stimuli, (3) comparing across thalamic and cortical processing stages, and (4) separating A1 neurons recorded from thalamorecipient and superficial layers. We found that whereas call representations in vMGB and A1 L4 were similar, a critical transformation occurs between A1 L4 and A1 L2/3. While vMGB and A1 L4 neurons seemed to respond primarily to the spectral content of calls, resulting in a dense representation of calls, many A1 L2/3 responses were contingent on the presence of specific spectrotemporal features, resulting in a highly sparse representation of calls.

This observed transformation is consistent with previously reported increases in the nonlinearity of neural receptive fields in marmosets [31], increases in sparsity of responses in rats [65], and some reports of increased receptive field complexity in superficial A1 layers (in cats; [66–68]). This transformation is also consistent with ultrahigh field human fMRI studies showing that supragranular BOLD responses are less readily explained using simple frequency tuning models [33]. Thus, the transformation of sound representation between A1 L4 and A1 L2/3 appears to be a conserved phenomenon across species, from GPs to humans. In nonhuman primates, secondary auditory cortical areas have been shown to exhibit call-selective responses [8,9], and the highest sensory cortical regions of the auditory processing pathway preferentially represent conspecific calls [10–12]. Our results suggest that the emergence of call feature selectivity at supragranular A1 layers is a critical first step in building call-selective cortical specializations.

How could highly feature-selective neurons be generated? In an earlier study in marmosets, many A1 neurons recorded at shallow cortical depths were combination selective, i.e., these neurons showed responses only when specific frequencies were present with precise temporal relationships [31]. Such nonlinear mechanisms could generate call feature selectivity, but how precise temporal delays necessary for this computation are generated in the A1 circuit remains an open question. A second possibility is that although A1 L4 neurons are not selective for call type in that they respond to all categories, responses to some calls in a subset of time bins may be marginally stronger (Fig 9A and 9B). Pooling a number of A1 L4 neurons that exhibit similar marginally stronger responses to the same time bin, but whose responses are uncorrelated otherwise, could accentuate the differences between this preferred time bin and other bins. The higher SIMI observed in A1 L4 neurons compared to vMGB neurons supports the notion that there may be local periods of high information in the A1 L4 population responses. Applying a strong nonlinearity to these pooled inputs could in principle create A1 L2/3 responses that are highly selective for particular spectrotemporal call features (Fig 9C and 9D). Supporting the notion that A1 L2/3 neurons might be applying high thresholds is the fact that A1 L2/3 neurons are known to exhibit very low spontaneous rates across species [31,63], including in our own data (Fig 1E).

Fig 9. Working model of generating call selectivity in the auditory cortical hierarchy.

(A) Alternating nonlinear (high threshold) and more linear (pooling) stages that could result in call-selective responses in secondary cortex. (B) Schematic of nonselective A1 L4 neural responses that could show overlap in a few time bins. (C, D) A high threshold could be applied to different pools of A1 L4 neurons to result in A1 L2/3 responses that are selective to specific call features. (E) A more linear operation could pool over A1 L2/3 neurons that are selective for features belonging to the same call category to result in call category-selective secondary cortical responses.
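
A toy simulation makes the pooling-plus-thresholding intuition concrete: averaging many weakly biased, otherwise uncorrelated inputs accentuates their shared preference, and a high threshold then isolates it. Every number below (pool size, baseline drive, the size of the “marginally stronger” response, the threshold) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n_l4, n_bins, preferred_bin = 50, 100, 40

# Nonselective A1 L4-like responses: comparable drive in every time bin,
# plus a marginally stronger response shared across the pool in one bin
# (as schematized in Fig 9B).
l4 = rng.poisson(lam=5.0, size=(n_l4, n_bins)).astype(float)
l4[:, preferred_bin] += 2.0  # small shared preference

# Linear stage: pooling averages away uncorrelated variability, so the
# shared bin stands out from the background (Fig 9A, pooling).
pooled = l4.mean(axis=0)

# Nonlinear stage: a high threshold keeps essentially only the pooled
# peak, yielding a sparse, feature-selective A1 L2/3-like response
# (Fig 9C and 9D).
threshold = np.median(pooled) + 1.2
l23 = np.maximum(pooled - threshold, 0.0)
print(np.flatnonzero(l23))  # typically only the preferred bin survives
```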

Extending this model to the next linear (pooling) stage, the responses of multiple feature-selective A1 L2/3 neurons that respond to features belonging to the same call category could be integrated by neurons in secondary cortical areas to result in sustained call category-selective responses (Fig 9E). In anesthetized GPs, neurons that show dense firing with high contrast between call categories, which is highly useful in discriminating between call categories, have been reported in secondary areas VRB and S [6]. It is yet to be determined whether additional mechanisms could be used to increase call category selectivity by further restricting responses to cases in which call features are detected in a particular temporal sequence, which, for example, could be achieved by some forms of dendritic computation [69–71], or using local axonal conduction delays at very short time scales [72]. Our proposed model is based on model architectures with alternating linear and nonlinear stages that have been used to explain responses in inferotemporal cortex [73]. These models are based on exclusively excitatory and feedforward operations. Other models, for example, incorporating recurrent excitatory inputs that have been shown to sharpen cortical tuning [74], or those involving cortical inhibition that could also fine-tune cortical selectivity [75–78], represent alternative architectures that are more complex but biologically realistic. Specific cortical inhibitory cell types, for example, somatostatin-expressing interneurons, might play a role in generating sharp frequency tuning [79]. Thus, extensive theoretical and experimental work is necessary to test these models and dissect the neural mechanisms underlying the generation of feature selectivity.

What could be the advantages of a highly sparse representation? Extensive work in the visual cortex has proposed that sparse coding could allow for increased storage capacity for associative memory, is more energy efficient, and could make readout by downstream areas easier [80]. The possibility of easier readout is especially interesting in the auditory system, where highly variable continuous inputs need to be parsed and sequenced into categorical units (for example, words in human speech or call categories in animal communication). The “dense” codes we found in vMGB and A1 L4 are redundant to some degree because neurons respond to highly overlapping stimulus sets. Thus, the activity of a single neuron in A1 L4 signaled the presence of multiple call features, with the actual feature identity being encoded over the population. This is reflected in our information theoretic analysis showing that in A1 L4, MI is distributed both over time bins and over neurons. A1 L2/3 effectively decorrelated A1 L4 activity, so that single neurons now carried high levels of information about the stimulus. One consequence of this decorrelation is an increase in the dimensionality of sound representation, which could serve to “untangle” [81] highly variable representations of different sound categories. As mentioned earlier, in a further processing step, a linear pooling operation could be used to pool responses of A1 L2/3 neurons that respond to different features of the same call type, resulting in truly call category-selective responses such as those observed in secondary cortical areas [8,9]. Further analysis is necessary to quantify the dimensionality of sound representation in different cortical layers and the separability of different call categories. In the auditory system, a second consideration for a neural code is robustness to environmental noise—realistic listening conditions add reverberations, noise, and competing sounds to the target sound impinging on our ears. It remains to be seen whether the feature-selective responses we have observed in A1 L2/3 neurons will remain invariant to these perturbations and will provide a more robust representation of sounds than the dense representations in A1 L4.

In conclusion, by recording from successive auditory processing stages in awake animals using a rich and behaviorally relevant stimulus set, we have demonstrated that, rather than emerging gradually over the auditory processing hierarchy, selectivity for sound features appears de novo in the superficial layers of auditory cortex, resulting in a highly sparse representation of sounds by A1 L2/3 neurons. Our data thus identify the superficial layers of A1 as a locus of critical transformations to sound representations. These data set the stage for further studies investigating the biophysical and circuit mechanisms by which call feature selectivity arises from nonselective inputs, and how these feature-selective responses could be read out by downstream call category-selective neurons. Our data suggest that the root of observed cortical specializations for call processing [10–12] could in fact reside in primary auditory cortex.

Materials and methods

Ethics

All experimental procedures conformed to the NIH Guide for the Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee (IACUC) of the University of Pittsburgh (protocol number 18062947).

Animals

We acquired data from 4 male and 2 female adult, wild-type, pigmented GPs (Cavia porcellus; Elm Hill Labs, Chelmsford, Massachusetts), weighing approximately 600–1,000 g over the course of the experiments.

Surgical procedures

All experiments were conducted in unanesthetized, head-fixed, passively listening animals. To achieve head fixation, a custom head post was first surgically anchored onto the skull using dental acrylic (Metabond, Parkell, Brentwood, NY) following aseptic techniques under isoflurane anesthesia. Chambers for electrophysiological recordings were positioned over the location of auditory cortex using anatomical landmarks [6,36,37]. Postsurgical care, including administration of systemic and topical analgesics, was provided for 3 to 5 days. Following a 2-week recovery period, animals were gradually adapted to the experimental setup by slowly increasing the duration of head fixation.

Acoustic stimuli

All stimuli were generated in Matlab (Mathworks, Natick, MA) at a sampling rate of 100 kHz, converted to analog (National Instruments, Austin, TX), attenuated (TDT, Alachua, FL), power amplified (TDT, Alachua, FL), and delivered through a speaker (TangBand, Taipei, Taiwan) located approximately 90 cm from the animal on the contralateral side. We used a wide variety of stimuli, including pure tones, noise bursts, frequency- and amplitude-modulated sounds, two-tone pips, and conspecific vocalizations, as search stimuli to initially detect and isolate single units. Once we isolated a unit, we delivered pure tones (50 or 100 ms) covering 7 octaves in frequency (200 Hz to 25.6 kHz, 10 steps/oct.) at different sound levels (20 dB SPL spacing) to characterize its frequency response area. We defined the best frequency of the unit as the frequency eliciting the highest firing rate and the best level as the sound level eliciting the highest firing rate. The bandwidth of the unit was estimated using a rectangle fit to the frequency tuning curve at the best level [82]. After characterizing basic tuning properties, we presented conspecific vocalization stimuli. All vocalizations were recorded in our animal colony using Sound Analysis Pro [83] by placing one or more animals in a sound-attenuated booth and recording vocalizations using a directional microphone (Behringer, Willich, Germany). Two observers manually segmented and classified vocalizations into categories based on previously published criteria [6,34,35]. We verified high interobserver reliability using Cohen's kappa statistic (κ = 0.8). In electrophysiological experiments, we typically presented 2 exemplars each of 8 vocalization categories (16 vocalization stimuli; 0.4- to 3.5-second length depending on call type; typically, 10 repetitions of each stimulus). For some units, we presented additional exemplars belonging to some categories (24 stimuli) but only presented 5 repetitions. All vocalizations were normalized for RMS power and presented at 70 dB SPL in random order, with a random intertrial interval between 2 and 3 seconds. For some units, we also presented vocalizations to which we added reverberations or noise (not presented in the current manuscript).
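For illustration, a minimal sketch of the pure tone frequency grid and the RMS normalization described above (Python/NumPy, although stimuli were generated in Matlab); the target RMS value is an arbitrary placeholder, since the actual presentation level depends on speaker calibration:

```python
import numpy as np

fs = 100_000  # sampling rate, Hz

# 7-octave tone grid, 10 steps/octave: 200 Hz to 25.6 kHz inclusive
freqs = 200.0 * 2.0 ** np.linspace(0.0, 7.0, 7 * 10 + 1)

def pure_tone(freq, dur=0.05):
    """A pure tone of the given frequency (Hz) and duration (s)."""
    t = np.arange(int(dur * fs)) / fs
    return np.sin(2.0 * np.pi * freq * t)

def rms_normalize(x, target_rms=0.05):
    """Scale a stimulus to a common RMS power; target_rms is a
    placeholder that would be set by speaker calibration."""
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))

tones = [rms_normalize(pure_tone(f)) for f in freqs]
```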

Electrophysiology

All recordings were conducted in a sound-attenuated booth (IAC, Naperville, IL) whose walls were lined with anechoic foam (Pinta Acoustics, Minneapolis, MN). Animals were head fixed in a custom acrylic enclosure, affixed to a vibration-isolation tabletop, that provided loose restraint of the body. We recorded the activity of single units in the vMGB and in identified cortical laminae of primary auditory cortex (A1). We sequentially performed small craniotomies (approximately 1 mm dia.) within the recording chamber using a dental drill (Osada, Los Angeles, CA) attached to a stereotactic manipulator (David Kopf Instruments, Tujunga, CA) to reach regions of interest. For vMGB recordings, we targeted previously published stereotactic coordinates [84,85] by performing a caudally angled craniotomy in the caudal part of the chamber. The location of the electrode in the vMGB was confirmed using electrophysiological properties (strong tone responses, low response latency, and the expected tonotopic organization [86,87]). For cortical recordings, we performed craniotomies over the expected anatomical location of A1 [6,36,37], angled to be roughly perpendicular to the cortical surface. We used strong tone responses and tonotopic reversals to confirm that the recording location was within A1. In each recording session, we used a hydraulic microdrive (FHC, Bowdoin, ME) to advance a tungsten microelectrode (FHC or A-M Systems, Sequim, WA; 2 to 5 MΩ impedance) through the dura into the underlying target tissue. Electrophysiological signals were amplified and digitized using a low-noise amplifier (Ripple Neuro, Salt Lake City, UT), and data were visualized online (Trellis software suite). We played a wide variety of search stimuli while slowly advancing the electrode. When a putative spike was detected, we used a template-matching algorithm for online spike sorting to isolate single units. Sorting was further refined offline at the conclusion of the experimental session (MKSort, provided by Ripple Neuro). Using this technique, we typically acquired spike data from 1 to 3 single units simultaneously. Spike waveforms were classified into putative RS and FS categories using the peak-to-trough ratio and spike width as parameters. We only considered well-isolated single units, defined as having a peak amplitude at least 5.5 standard deviations above the noise baseline, for further analysis. For A1 recordings, we sequentially recorded neural activity from superficial to deep cortical layers. At the end of each electrode track, we advanced the electrode to a depth of approximately 2 mm and acquired LFP responses every 100 μm while retracting the electrode. To do so, we presented 100 repetitions of a pure tone at 70 dB SPL, with the pure tone frequency chosen to match the best frequency of the recorded column. From these LFP data, we calculated the CSD, defined as the second spatial derivative of the LFP, based on which we assigned recorded units to thalamorecipient or superficial layers [38]. After the electrode was completely retracted, the craniotomy was filled with antibiotic ointment, and the recording chamber was sealed using a silicone polymer (KwikSil or similar). Recording sessions were limited to 4 hours, and we typically recorded 4 to 8 electrode tracks from each craniotomy. Craniotomies were sealed with dental cement after data acquisition was completed.
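As a point of reference, a minimal sketch of the CSD computation described above (Python/NumPy). The text specifies only the second spatial derivative of the LFP; the sign convention in this sketch (sinks negative) is an assumption:

```python
import numpy as np

def csd_from_lfp(lfp, spacing_um=100.0):
    """Approximate current source density as the second spatial
    derivative of the trial-averaged LFP across recording depths.

    lfp : array of shape (n_depths, n_timepoints); trial-averaged
          tone-evoked LFP, one row per 100-um depth step.
    """
    h = spacing_um * 1e-6  # electrode step in meters
    # Second spatial difference: (V[d-1] - 2*V[d] + V[d+1]) / h^2.
    # Negating makes current sinks negative (a common convention).
    csd = -(lfp[:-2] - 2.0 * lfp[1:-1] + lfp[2:]) / h ** 2
    return csd  # shape (n_depths - 2, n_timepoints)
```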

Data analysis and statistics

Analysis was based on data from 45 L2/3 RS neurons, 67 L4 RS neurons, and 33 vMGB neurons that responded to at least one vocalization in our stimulus set. We also isolated 10 call-responsive FS neurons from A1 recordings, which were not analyzed in this study.

Response window analysis

We obtained response rate estimates limited to small time bins using an algorithm similar to Issa and Wang [88] (also see [89–91]). Briefly, we started with seed windows selected using relaxed criteria and gradually added additional windows until the final window met stringent criteria. To do so, we first determined whether the responses to any call in any 100 ms window (50 ms slide), located from 50 ms poststimulus onset until 100 ms poststimulus offset, met 2 criteria: (1) the average rate exceeded 6 SEM of the spontaneous rate and (2) the trial-wise response distribution within the window was significantly different from the spontaneous response distribution with p_soft ≤ 0.1 (single-tailed t test with FDR correction; this test is used for determining all p-values for response window analysis). The initial window could then grow in either direction by adding neighboring windows if (1) the response in the window to be added met a threshold of p_soft ≤ 0.1, (2) the average rate in the enlarged window exceeded 10 SEM of the spontaneous rate, and (3) the trial-wise response distribution within the enlarged window met a threshold of p_add ≤ 0.01. We successively added response segments until these thresholds could not be met. To prevent a single bursty trial from spuriously inflating the response rate, we replaced trial-wise rates with z-scores > 1.96 with the mean response rate of the enlarged window. The resultant window was considered the final response window if (1) the average rate exceeded 14 SEM of the spontaneous rate, (2) the trial-wise response distribution within the final window met a threshold of p_final ≤ 0.0001, and (3) responses were present on at least 60% of the trials. Any windows less than 100 ms apart were coalesced if the resulting window still met the 3 final stringent criteria. If no response windows were detected for any call, we relaxed the following parameters in order: the minimum trial threshold was decreased to 50%, the z-score for burst detection was increased to 2.5, and the window length was increased to 200 ms (slide = 100 ms). For example, the minimum responsive trial threshold was decreased to 50% and the burst detection z-score increased to 2.5 for the neuron in Fig 3C (right). Parameters for automated response window analysis were initially chosen to broadly match response regions to the visual judgments of 3 independent observers in a small sample of neurons from the 3 processing stages. Results were verified to be largely consistent over a range of parameter values. While this automated analysis reliably detected excitatory responses, inhibitory responses could not be captured because of the very low spontaneous rates of cortical neurons. Thus, when responses were mainly inhibitory rather than excitatory (2 neurons in A1 L2/3 and 9 neurons in A1 L4), the number of calls with significant responses was determined manually by 3 independent observers.
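To make the seed-window stage of this algorithm concrete, here is a minimal sketch (Python/SciPy) under stated assumptions: binned_rates and spont_rates are hypothetical precomputed arrays, and the FDR correction applied across windows is omitted:

```python
import numpy as np
from scipy import stats

def find_seed_windows(binned_rates, spont_rates, n_sem=6.0, p_soft=0.1):
    """Relaxed screen for candidate seed windows.

    binned_rates : (n_trials, n_windows) trial-wise firing rates in
                   100-ms windows (50-ms slide); hypothetical input
    spont_rates  : (n_trials,) trial-wise spontaneous firing rates

    Returns indices of windows passing both relaxed criteria. Window
    growth and the final stringent criteria proceed analogously.
    """
    spont_mean = spont_rates.mean()
    spont_sem = stats.sem(spont_rates)
    seeds = []
    for w in range(binned_rates.shape[1]):
        rates = binned_rates[:, w]
        # Criterion 1: mean rate exceeds 6 SEM of the spontaneous rate
        if rates.mean() <= spont_mean + n_sem * spont_sem:
            continue
        # Criterion 2: trial-wise rates differ from the spontaneous
        # distribution (single-tailed t test; requires scipy >= 1.6)
        p = stats.ttest_ind(rates, spont_rates,
                            alternative="greater").pvalue
        if p <= p_soft:
            seeds.append(w)
    return seeds
```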

Quantification of selective responses

We quantified the selectivity of neural responses based on the following metrics: (1) Call selectivity, defined as the number of call categories with significant responses; if at least one response window was detected for any exemplar belonging to a category, we counted the neuron as responsive to that category. (2) The number of response windows per call. (3) The length of the response, computed as the sum of all window lengths within a call and expressed as a fraction of the total length of that call. Together, metrics (2) and (3) indicated whether a neuron was feature selective: for highly feature-selective neurons, we observed a small number of short windows, whereas for neurons with low selectivity, we observed many short windows or a single long window. We compared selectivity across processing stages using Kruskal–Wallis tests followed by pairwise post hoc tests. To quantify differences in feature selectivity across processing stages, we constructed two-dimensional distributions of the number of windows versus window length and evaluated significance using 2D K–S tests [39] with Bonferroni correction.
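A minimal sketch of these three metrics (Python), assuming a hypothetical mapping from call ID to the response windows detected by the algorithm above:

```python
def selectivity_metrics(windows_per_call, call_lengths):
    """windows_per_call : dict, call ID -> list of (t_on, t_off)
    response windows in seconds (empty list if no significant response).
    call_lengths : dict, call ID -> call duration in seconds.
    """
    # Metric 1: number of calls with at least one response window
    # (per-category counting would group calls by category first)
    n_responsive = sum(1 for w in windows_per_call.values() if w)
    # Metric 2: number of response windows per call
    n_windows = {c: len(w) for c, w in windows_per_call.items()}
    # Metric 3: summed window length as a fraction of call length
    resp_fraction = {
        c: sum(t1 - t0 for t0, t1 in w) / call_lengths[c]
        for c, w in windows_per_call.items() if w
    }
    return n_responsive, n_windows, resp_fraction
```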

Sparsity

We estimated sparsity using 2 metrics. (1) As the reduced kurtosis of trial-wise firing rate responses [82,92], computed over a single window from 50 ms after stimulus onset to 100 ms after stimulus offset. A reduced kurtosis of zero indicates a normal distribution of firing rates across stimuli or response bins, suggesting a response that is not feature selective. High kurtosis values arise when many response rates are zero and few response rates are high, suggesting highly feature-selective responses. (2) As the activity fraction [40,41], defined as:

$$A = \frac{\left( \sum_{i=1}^{N} r_i / N \right)^{2}}{\sum_{i=1}^{N} \left( r_i^{2} / N \right)} \qquad (1)$$

where $r_i$ is the neuron's response to the $i$th stimulus (or response bin) and $N$ is the number of stimuli (or bins).

An activity fraction close to zero signifies highly sparse responses, and an activity fraction close to one signifies dense responses. Sparsity across processing stages was compared using Kruskal–Wallis tests followed by pairwise post hoc Dunn–Sidak tests.
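A minimal sketch of both sparsity metrics (Python/SciPy); the input is assumed to be a vector of firing rates, one per stimulus or response bin, with at least one nonzero rate:

```python
import numpy as np
from scipy.stats import kurtosis

def response_sparsity(rates):
    """rates: 1-D array of firing rates (one per stimulus or bin)."""
    # Reduced (excess) kurtosis: 0 for a normal rate distribution,
    # large positive values for sparse, feature-selective responses.
    k = kurtosis(rates, fisher=True)
    # Activity fraction (Eq 1): near 0 for sparse, near 1 for dense
    # codes; assumes at least one nonzero rate.
    n = len(rates)
    a = (np.sum(rates) / n) ** 2 / np.sum(rates ** 2 / n)
    return k, a
```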

Receptive field models

We used NEMS, the Neural Encoding Model System ([42,43]; https://github.com/LBHB/NEMS), as a platform to build LN models and estimate STRFs of call-responsive neurons. The input to the model consisted of the cochleagram of all call stimuli concatenated in time. To compute the cochleagram, we used a fast approximation algorithm that used weighted log-spaced frequency bins and 3 rate-level transformations corresponding to 3 categories of auditory nerve fibers ([44]; https://github.com/monzilur/cochlear_models). Previous work has shown that this transformation can adequately capture the inputs to auditory cortex [44]. The resolution of the cochleagram was set at 5 steps/oct. in frequency (total 6 oct. spanning 250 Hz to 16 kHz) and 20 ms in time. Linear weights and the parameters of a point nonlinearity (a double exponential function) were estimated by gradient descent to minimize the squared error between the predicted PSTH and the actual PSTH (computed in 20 ms bins, averaged over 10 repetitions). The matrix of linear weights was taken to represent the receptive field, or STRF, of the neuron. We performed nested cross-validation: for every neuron's call responses, we used 90% of the data to fit the models and the remaining 10% to validate them. This procedure was repeated 10 times using nonoverlapping segments of validation data, yielding 10 STRF estimates. The correlation coefficient (r) between responses predicted from the validation data set and actual responses was used as a metric of goodness of fit. A bootstrap procedure was used to test the significance of r values. For quantifying STRF complexity and for display, we used the mean STRF (over the 10 cross-validation runs) multiplied by a significance mask. To estimate the mask, we scrambled the actual linear weight matrices 1,000 times to estimate the distribution of weights at each time and frequency bin and used a two-tailed permutation test to evaluate whether the observed mean STRF weights differed significantly (using FDR correction for 310 comparisons) from the bootstrap distributions. To quantify the complexity of STRFs, we used STRF sparsity [46,47], defined as the maximum absolute value of the significance-masked STRF divided by the standard deviation of the significance-masked STRF, and, as a second metric, the kurtosis of the significance-masked STRF weights. Sparsity and kurtosis across processing stages were compared using Kruskal–Wallis tests followed by Dunn–Sidak post hoc tests.
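A minimal sketch of the permutation-based significance mask (Python/NumPy), under stated assumptions: strfs holds the 10 cross-validated weight matrices, and the FDR correction over the 310 time-frequency bins is omitted:

```python
import numpy as np

def strf_significance_mask(strfs, n_shuffle=1000, alpha=0.05, seed=0):
    """strfs : (n_fits, n_freq, n_time) linear weights from the 10
    cross-validation runs. Returns the mean STRF with bins that do
    not differ from a shuffled null set to zero (no FDR correction
    in this sketch)."""
    mean_strf = strfs.mean(axis=0)
    rng = np.random.default_rng(seed)
    null = np.empty((n_shuffle,) + mean_strf.shape)
    for i in range(n_shuffle):
        # Scramble the weights within each fit, then average over fits
        shuffled = np.array([rng.permutation(s.ravel()).reshape(s.shape)
                             for s in strfs])
        null[i] = shuffled.mean(axis=0)
    # Two-tailed permutation p-value at each time-frequency bin
    p = (np.abs(null) >= np.abs(mean_strf)).mean(axis=0)
    return np.where(p <= alpha, mean_strf, 0.0)
```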

Information theoretic analyses

We used stimulus-specific information (I_SSI) [54–56] to estimate the amount of information that each recorded neuron provided about each stimulus. We also computed the weighted average of I_SSI across stimuli to determine overall information content, which is conventionally referred to as the MI between the stimulus and response. Only neurons that had completed 10 trials for all the stimuli were considered for this analysis. Intuitively, if a neuron shows a consistent response pattern to a given stimulus, then it has high I_SSI about that stimulus. To quantify I_SSI, we extracted responses beginning 50 ms before stimulus onset and lasting until 50 ms after the length of the longest stimulus [49] in windows of varying lengths (14, 50, 100, 200, 300, and 400 ms, with slide equal to half the window size). Because vocalizations had different lengths, based on a previous study [77], we restricted these analyses to the first 1,457 ms of the responses (the 457 ms shortest call length followed by a 1,000 ms poststimulus period during which a subsequent stimulus was guaranteed not to be present). Responses to longer calls were thus truncated. For each window size, the I_SSI in each time bin was calculated as:

$$I_{SSI}(stim) = \sum_{resp} p(resp \mid stim) \cdot I_{SP}(resp) \qquad (2)$$

where $I_{SP}(resp)$ is the information conveyed by a specific response pattern, calculated as:

$$I_{SP}(resp) = \mathrm{TotalEntropy} - \mathrm{ConditionalEntropy}(resp) \qquad (3)$$
$$\mathrm{TotalEntropy} = -\sum_{stim} p(stim) \log_2 p(stim) \qquad (4)$$
$$\mathrm{ConditionalEntropy}(resp) = -\sum_{stim} p(stim \mid resp) \log_2 p(stim \mid resp) \qquad (5)$$

To correct for estimation bias arising from finite trial numbers, which likely undersample response probability distributions, we subtracted an all-way shuffled estimate of I_SSI (the average of 100 randomizations [56]) from the I_SSI value estimated above. All reported values refer to the bias-corrected I_SSI estimate.
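A minimal sketch of the I_SSI computation (Eqs 2 to 5) for one time window (Python/NumPy), under stated assumptions: responses are discretized into patterns (here, binned spike counts), stimuli are equiprobable, and the shuffle-based bias correction is omitted:

```python
import numpy as np

def stimulus_specific_information(resp_by_stim):
    """Compute I_SSI per stimulus (Eqs 2-5) for one time window.

    resp_by_stim : (n_stim, n_trials) array of discretized response
        patterns (e.g., spike counts in the window); a hypothetical
        input format.
    """
    n_stim, n_trials = resp_by_stim.shape
    p_stim = 1.0 / n_stim
    total_entropy = np.log2(n_stim)  # Eq 4 with equiprobable stimuli
    responses = np.unique(resp_by_stim)
    # p(resp | stim): fraction of trials evoking each response pattern
    p_resp_given_stim = np.array(
        [[(resp_by_stim[s] == r).mean() for r in responses]
         for s in range(n_stim)])                  # (n_stim, n_resp)
    p_resp = p_resp_given_stim.mean(axis=0)        # p(resp), > 0 here
    p_stim_given_resp = p_resp_given_stim * p_stim / p_resp  # Bayes
    # Eq 5: conditional entropy of the stimulus given each response
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = p_stim_given_resp * np.log2(p_stim_given_resp)
    cond_entropy = -np.nansum(plogp, axis=0)       # 0 log 0 -> 0
    i_sp = total_entropy - cond_entropy            # Eq 3
    return p_resp_given_stim @ i_sp                # Eq 2, per stimulus
```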

Having obtained these I_SSI estimates, we computed how I_SSI values were distributed across time bins and across stimuli and how I_SSI correlated with the spiking responses of each neuron. To quantify how I_SSI values were distributed across time bins and across stimuli, for each window size, we calculated an MI sparsity index (SI_MI), defined as the mean kurtosis of I_SSI values in high-MI time bins, with high-MI bins defined as bins with MI values exceeding 1 standard deviation of the MI values across all time bins. To determine whether high I_SSI resulted from the presence or absence of spiking, we calculated the correlation between the I_SSI and the PSTH. Finally, to determine how much information was conveyed by each spike, we determined the slope of the I_SSI versus PSTH distribution. Distributions of information-theoretic measures between A1 L4 and A1 L2/3 were compared using Kruskal–Wallis tests with post hoc pairwise tests. We chose the 100 ms window size (50 ms slide) for all comparisons shown in the main manuscript. Similar results were obtained across most tested window sizes (S1 Fig and S9 Data).
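A minimal sketch of the SI_MI and information-per-spike computations (Python/SciPy). Two points are interpretive assumptions: the high-MI threshold is read as bins more than 1 standard deviation above the mean MI, and kurtosis is taken across stimuli within each high-MI bin before averaging:

```python
import numpy as np
from scipy.stats import kurtosis

def mi_sparsity_index(issi):
    """issi: (n_stim, n_bins) I_SSI values over time for one neuron.
    With equiprobable stimuli, MI per bin is the mean I_SSI."""
    mi = issi.mean(axis=0)
    # High-MI bins: > 1 SD above the mean MI (an interpretation)
    high = mi > mi.mean() + mi.std()
    # SI_MI: mean kurtosis of I_SSI values within high-MI bins
    return kurtosis(issi[:, high], fisher=True, axis=0).mean()

def info_per_spike(issi_t, psth_t):
    """Slope of the I_SSI-versus-PSTH relationship (bits per spike),
    estimated by least squares over time bins."""
    return np.polyfit(psth_t, issi_t, 1)[0]
```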

Supporting information

S1 Fig. Mutual information analyses performed across a range of analysis window sizes.

(PDF)

S1 Data. Data underlying Fig 1.

(XLSX)

S2 Data. Data underlying Fig 2.

(XLSX)

S3 Data. Data underlying Fig 3.

(XLSX)

S4 Data. Data underlying Fig 4.

(XLSX)

S5 Data. Data underlying Fig 5.

(XLSX)

S6 Data. Data underlying Fig 6.

(XLSX)

S7 Data. Data underlying Fig 7.

(XLSX)

S8 Data. Data underlying Fig 8.

(XLSX)

S9 Data. Data underlying S1 Fig.

(XLSX)

Acknowledgments

We thank Dr. Yi Zhou (ASU) for insightful comments on the manuscript. We thank Isha Kumbam and Samuel Li for recording and classifying guinea pig vocalizations; Dr. Marianny Pernia, Shi Tong Liu, and Dr. Flora Antunes for assistance with electrophysiological experiments; and Dr. Marianny Pernia for assistance with analysis. We thank Stacy Cashman and Mark Petts for surgical support; Dr. Amanda Fisher for veterinary support; and Jillian Harr, Sarah Gray, Julia Skrinjar, Brent Barbe, and Elizabeth Chasky for animal care.

Abbreviations

AL, anterolateral; BOLD, blood–oxygen level–dependent; CSD, current source density; FDR, false discovery rate; FS, fast-spiking; GP, guinea pig; HSD, honestly significant difference; IC, inferior colliculus; K–S, Kolmogorov–Smirnov; LFP, local field potential; LN, linear–nonlinear; MI, mutual information; NEMS, Neural Encoding Model System; PSTH, peristimulus time histogram; RS, regular-spiking; STRF, spectrotemporal receptive field; USV, ultrasonic vocalization; vMGB, ventral medial geniculate body; VRB, ventral–rostral belt

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

This work was supported by the National Institutes of Health, NIH R01DC017141, www.nih.gov, (SS); by the 2018 NARSAD Young Investigator grant, 27675, Brain and Behavior Research Foundation, https://www.bbrfoundation.org/, (SS) and by the Pennsylvania Lions Hearing Research Foundation, https://plhrf.org/ (SS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Lewicki MS. Efficient coding of natural sounds. Nat Neurosci. 2002;5:356–63. doi: 10.1038/nn831
2. Smith E, Lewicki MS. Efficient coding of time-relative structure using spikes. Neural Comput. 2005;17:19–45. doi: 10.1162/0899766052530839
3. Bauer EE, Klug A, Pollak GD. Spectral determination of responses to species-specific calls in the dorsal nucleus of the lateral lemniscus. J Neurophysiol. 2002;88:1955–67. doi: 10.1152/jn.2002.88.4.1955
4. Pollak GD. The dominant role of inhibition in creating response selectivities for communication calls in the brainstem auditory system. Hear Res. 2013;305:86–101. doi: 10.1016/j.heares.2013.03.001
5. Roberts PD, Portfors CV. Responses to social vocalizations in the dorsal cochlear nucleus of mice. Front Syst Neurosci. 2015;9:1–13. doi: 10.3389/fnsys.2015.00001
6. Grimsley JMS, Shanbhag SJ, Palmer AR, Wallace MN. Processing of communication calls in guinea pig auditory cortex. PLoS ONE. 2012;7:e51646. doi: 10.1371/journal.pone.0051646
7. Wollberg Z, Newman JD. Auditory cortex of squirrel monkey: response patterns of single cells to species-specific vocalizations. Science. 1972;175:212–4. doi: 10.1126/science.175.4018.212
8. Rauschecker JP, Tian B, Hauser M. Processing of complex sounds in the macaque nonprimary auditory cortex. Science. 1995;268:111–4. doi: 10.1126/science.7701330
9. Tian B, Reser D, Durham A, Kustov A, Rauschecker JP. Functional specialization in rhesus monkey auditory cortex. Science. 2001;292:290–3. doi: 10.1126/science.1058911
10. Petkov CI, Kayser C, Steudel T, Whittingstall K, Augath M, Logothetis NK. A voice region in the monkey brain. Nat Neurosci. 2008;11:367–74. doi: 10.1038/nn2043
11. Perrodin C, Kayser C, Logothetis NK, Petkov CI. Voice cells in the primate temporal lobe. Curr Biol. 2011;21:1408–15. doi: 10.1016/j.cub.2011.07.028
12. Sadagopan S, Temiz-Karayol NZ, Voss HU. High-field functional magnetic resonance imaging of vocalization processing in marmosets. Sci Rep. 2015;5:10950. doi: 10.1038/srep10950
13. Atencio CA, Sharpee TO, Schreiner CE. Receptive field dimensionality increases from the auditory midbrain to cortex. J Neurophysiol. 2012;107:2594–603. doi: 10.1152/jn.01025.2011
14. Chechik G, Anderson MJ, Bar-Yosef O, Young ED, Tishby N, Nelken I. Reduction of information redundancy in the ascending auditory pathway. Neuron. 2006;51:359–68. doi: 10.1016/j.neuron.2006.06.030
15. Šuta D, Popelář J, Kvašňák E, Syka J. Representation of species-specific vocalizations in the medial geniculate body of the guinea pig. Exp Brain Res. 2007;183:377–88. doi: 10.1007/s00221-007-1056-3
16. Šuta D, Kvašňák E, Popelář J, Syka J. Representation of species-specific vocalizations in the inferior colliculus of the guinea pig. J Neurophysiol. 2003;90:3794–808. doi: 10.1152/jn.01175.2002
17. Šuta D, Popelář J, Burianová J, Syka J. Cortical representation of species-specific vocalizations in guinea pig. PLoS ONE. 2013;8:e65432. doi: 10.1371/journal.pone.0065432
18. Wang X, Merzenich MM, Beitel R, Schreiner CE. Representation of a species-specific vocalization in the primary auditory cortex of the common marmoset: temporal and spectral characteristics. J Neurophysiol. 1995;74:2685–706. doi: 10.1152/jn.1995.74.6.2685
19. Wang X, Kadia SC. Differential representation of species-specific primate vocalizations in the auditory cortices of marmoset and cat. J Neurophysiol. 2001;86:2616–20. doi: 10.1152/jn.2001.86.5.2616
20. Glass I, Wollberg Z. Auditory cortex responses to sequences of normal and reversed squirrel monkey vocalizations. Brain Behav Evol. 1983;22:13–21. doi: 10.1159/000121503
21. Newman JD, Wollberg Z. Multiple coding of species-specific vocalizations in the auditory cortex of squirrel monkeys. Brain Res. 1973;54:287–304. doi: 10.1016/0006-8993(73)90050-4
22. Symmes D, Alexander GE, Newman JD. Neural processing of vocalizations and artificial stimuli in the medial geniculate body of squirrel monkey. Hear Res. 1980;3:133–46. doi: 10.1016/0378-5955(80)90041-6
23. Winter P, Funkenstein HH. The effect of species-specific vocalization on the discharge of auditory cortical cells in the awake squirrel monkey (Saimiri sciureus). Exp Brain Res. 1973;18:489–504. doi: 10.1007/BF00234133
24. Aitkin L, Tran L, Syka J. The responses of neurons in subdivisions of the inferior colliculus of cats to tonal, noise and vocal stimuli. Exp Brain Res. 1994;98:53–64. doi: 10.1007/BF00229109
25. Buchwald J, Dickerson L, Harrison J, Hinman C. Medial geniculate body unit responses to cat cries. In: Auditory pathway. Springer; 1988. pp. 319–322.
26. Komiya H, Eggermont JJ. Neuronal responses in cat primary auditory cortex to natural and altered species-specific calls. Hear Res. 2000;150:27–42. doi: 10.1016/s0378-5955(00)00170-2
27. Gourévitch B, Eggermont JJ. Spatial representation of neural responses to natural and altered conspecific vocalizations in cat auditory cortex. J Neurophysiol. 2007;97:144–58. doi: 10.1152/jn.00807.2006
28. Agamaite JA, Chang C-J, Osmanski MS, Wang X. A quantitative acoustic analysis of the vocal repertoire of the common marmoset (Callithrix jacchus). J Acoust Soc Am. 2015;138:2906–28. doi: 10.1121/1.4934268
29. Liu ST, Montes-Lourido P, Wang X, Sadagopan S. Optimal features for auditory categorization. Nat Commun. 2019;10:1–14. doi: 10.1038/s41467-018-07882-8
30. Šuta D, Popelář J, Syka J. Coding of communication calls in the subcortical and cortical structures of the auditory system. Physiol Res. 2008;57(Suppl 3):S149–59.
31. Sadagopan S, Wang X. Nonlinear spectrotemporal interactions underlying selectivity for complex sounds in auditory cortex. J Neurosci. 2009;29:11192–202. doi: 10.1523/JNEUROSCI.1286-09.2009
32. Gaucher Q, Huetz C, Gourévitch B, Laudanski J, Occelli F, Edeline JM. How do auditory cortex neurons represent communication sounds? Hear Res. 2013;305:102–12. doi: 10.1016/j.heares.2013.03.011
33. Moerel M, De Martino F, Uğurbil K, Yacoub E, Formisano E. Processing complexity increases in superficial layers of human primary auditory cortex. Sci Rep. 2019;9:1–9. doi: 10.1038/s41598-018-37186-2
34. Berryman JC. Guinea-pig vocalizations: their structure, causation and function. Z Tierpsychol. 1976;41(1):80–106. doi: 10.1111/j.1439-0310.1976.tb00471.x
35. Eisenberg JF. In: Weir BJ, Rowlands IW, editors. Biology of hystricomorph rodents. 1974:211–44.
36. Redies H, Sieben U, Creutzfeldt OD. Functional subdivisions in the auditory cortex of the guinea pig. J Comp Neurol. 1989;282:473–88. doi: 10.1002/cne.902820402
37. Wallace MN, Rutkowski RG, Palmer AR. Identification and localisation of auditory areas in guinea pig cortex. Exp Brain Res. 2000;132:445–56. doi: 10.1007/s002210000362
38. Kajikawa Y, Schroeder CE. How local is the local field potential? Neuron. 2011;72:847–58. doi: 10.1016/j.neuron.2011.09.029
39. Lau B. 2-d Kolmogorov–Smirnov test, n-d energy test, Hotelling T^2 test; 2020 [cited 2020 Sept 11]. Database: GitHub [internet]. Available from: https://github.com/brian-lau/multdist.
40. Rolls ET, Tovee MJ. Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex. J Neurophysiol. 1995;73(2):713–26. doi: 10.1152/jn.1995.73.2.713
41. Vinje WE, Gallant JL. Sparse coding and decorrelation in primary visual cortex during natural vision. Science. 2000;287(5456):1273–6. doi: 10.1126/science.287.5456.1273
42. Thorson IL, Liénard J, David SV. The essential complexity of auditory receptive fields. PLoS Comput Biol. 2015;11(12):e1004628. doi: 10.1371/journal.pcbi.1004628
43. Pennington JR, David SV. Complementary effects of adaptation and gain control on sound encoding in primary auditory cortex. eNeuro. 2020;7(6). doi: 10.1523/ENEURO.0205-20.2020
44. Rahman M, Willmore BD, King AJ, Harper NS. Simple transformations capture auditory input to cortex. Proc Natl Acad Sci U S A. 2020;117(45):28442–51. doi: 10.1073/pnas.1922033117
45. Feng L, Wang X. Harmonic template neurons in primate auditory cortex underlying complex sound processing. Proc Natl Acad Sci U S A. 2017;114(5):E840–8. doi: 10.1073/pnas.1607519114
46. Atiani S, David SV, Elgueda D, Locastro M, Radtke-Schuller S, Shamma SA, et al. Emergent selectivity for task-relevant stimuli in higher-order auditory cortex. Neuron. 2014;82(2):486–99. doi: 10.1016/j.neuron.2014.02.029
47. Elgueda D, Duque D, Radtke-Schuller S, Yin P, David SV, Shamma SA, et al. State-dependent encoding of sound and behavioral meaning in a tertiary region of the ferret auditory cortex. Nat Neurosci. 2019;22(3):447–59. doi: 10.1038/s41593-018-0317-8
48. Cover TM. Elements of information theory. John Wiley & Sons; 1999.
49. Liu RC, Schreiner CE. Auditory cortical detection and discrimination correlates with communicative significance. PLoS Biol. 2007;5:e173. doi: 10.1371/journal.pbio.0050173
50. Strong SP, Koberle R, Van Steveninck RRDR, Bialek W. Entropy and information in neural spike trains. Phys Rev Lett. 1998;80:197.
51. Vinje WE, Gallant JL. Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J Neurosci. 2002;22:2904–15. doi: 20026216
52. Reinagel P, Reid RC. Temporal coding of visual information in the thalamus. J Neurosci. 2000;20:5392–400. doi: 10.1523/JNEUROSCI.20-14-05392.2000
53. Souffi S, Lorenzi C, Varnet L, Huetz C, Edeline JM. Noise-sensitive but more precise subcortical representations coexist with robust cortical encoding of natural vocalizations. J Neurosci. 2020;40:5228–46. doi: 10.1523/JNEUROSCI.2731-19.2020
54. Butts DA. How much information is associated with a particular stimulus? Netw Comput Neural Syst. 2003;14:177–87.
55. Butts DA, Goldman MS. Tuning curves, neuronal variability, and sensory coding. PLoS Biol. 2006;4:e92. doi: 10.1371/journal.pbio.0040092
56. Montgomery N, Wehr M. Auditory cortical neurons convey maximal stimulus-specific information at their best frequency. J Neurosci. 2010;30:13362–6. doi: 10.1523/JNEUROSCI.2899-10.2010
57. Cliff N. Dominance statistics: ordinal analyses to answer ordinal questions. Psychol Bull. 1993;114:494–509.
58. Vargha A, Delaney HD. A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J Educ Behav Stat. 2000;25:101–32.
59. Wallace MN, Rutkowski RG, Palmer AR. Responses to the purr call in three areas of the guinea pig auditory cortex. Neuroreport. 2005;16:2001–5. doi: 10.1097/00001756-200512190-00006
60. Portfors CV, Roberts PD, Jonson K. Over-representation of species-specific vocalizations in the awake mouse inferior colliculus. Neuroscience. 2009;162:486–500. doi: 10.1016/j.neuroscience.2009.04.056
61. Garcia-Lazaro JA, Shepard KN, Miranda JA, Liu RC, Lesica NA. An overrepresentation of high frequencies in the mouse inferior colliculus supports the processing of ultrasonic vocalizations. PLoS ONE. 2015;10(8):e0133251. doi: 10.1371/journal.pone.0133251
62. Klug A, Bauer EE, Hanson JT, Hurley L, Meitzen J, Pollak GD. Response selectivity for species-specific calls in the inferior colliculus of Mexican free-tailed bats is generated by inhibition. J Neurophysiol. 2002;88:1941–54. doi: 10.1152/jn.2002.88.4.1941
63. Tanaka H, Taniguchi I. Responses of medial geniculate neurons to species-specific vocalized sounds in the guinea pig. Jpn J Physiol. 1991;41:817–29. doi: 10.2170/jjphysiol.41.817
64. Philibert B, Laudanski J, Edeline JM. Auditory thalamus responses to guinea-pig vocalizations: a comparison between rat and guinea-pig. Hear Res. 2005;209:97–103. doi: 10.1016/j.heares.2005.07.004
65. Hromádka T, DeWeese MR, Zador AM. Sparse representation of sounds in the unanesthetized auditory cortex. PLoS Biol. 2008;6:e16. doi: 10.1371/journal.pbio.0060016
66. Atencio CA, Schreiner CE. Laminar diversity of dynamic sound processing in cat primary auditory cortex. J Neurophysiol. 2010;103:192–205. doi: 10.1152/jn.00624.2009
67. Atencio CA, Sharpee TO, Schreiner CE. Hierarchical computation in the canonical auditory cortical circuit. Proc Natl Acad Sci U S A. 2009;106:21894–9. doi: 10.1073/pnas.0908383106
68. Sharpee TO, Atencio CA, Schreiner CE. Hierarchical representations in the auditory cortex. Curr Opin Neurobiol. 2011;21:761–7. doi: 10.1016/j.conb.2011.05.027
69. Branco T, Clark BA, Häusser M. Dendritic discrimination of temporal input sequences in cortical neurons. Science. 2010;329(5999):1671–5. doi: 10.1126/science.1189664
70. Kerlin A, Mohar B, Flickinger D, MacLennan BJ, Dean MB, Davis C, et al. Functional clustering of dendritic activity during decision-making. Elife. 2019;8:e46966. doi: 10.7554/eLife.46966
71. Hemberger M, Shein-Idelson M, Pammer L, Laurent G. Reliable sequential activation of neural assemblies by single pyramidal cells in a three-layered cortex. Neuron. 2019;104(2):353–69. doi: 10.1016/j.neuron.2019.07.017
72. Egger R, Tupikov Y, Elmaleh M, Katlowitz KA, Benezra SE, Picardo MA, et al. Local axonal conduction shapes the spatiotemporal properties of neural sequences. Cell. 2020;183:537–48. doi: 10.1016/j.cell.2020.09.019
73. Riesenhuber M, Poggio T. Hierarchical models of object recognition in cortex. Nat Neurosci. 1999;2:1019–25. doi: 10.1038/14819
74. Liu BH, Wu GK, Arbuckle R, Tao HW, Zhang LI. Defining cortical frequency tuning with recurrent excitatory circuitry. Nat Neurosci. 2007;10:1594–600. doi: 10.1038/nn2012
75. Wu GK, Arbuckle R, Liu B, Tao HW, Zhang LI. Lateral sharpening of cortical frequency tuning by approximately balanced inhibition. Neuron. 2008;58:132–43. doi: 10.1016/j.neuron.2008.01.035
76. Sadagopan S, Wang X. Contribution of inhibition to stimulus selectivity in primary auditory cortex of awake primates. J Neurosci. 2010;30:7314–25. doi: 10.1523/JNEUROSCI.5072-09.2010
77. Gaucher Q, Huetz C, Gourévitch B, Edeline JM. Cortical inhibition reduces information redundancy at presentation of communication sounds in the primary auditory cortex. J Neurosci. 2013;33:10713–28. doi: 10.1523/JNEUROSCI.0079-13.2013
78. Gaucher Q, Yger P, Edeline JM. Increasing excitation versus decreasing inhibition in auditory cortex: consequences on the discrimination performance between communication sounds. J Physiol. 2020;598:3765–85. doi: 10.1113/JP279902
79. Kato HK, Asinof SK, Isaacson JS. Network-level control of frequency tuning in auditory cortex. Neuron. 2017;95:412–23. doi: 10.1016/j.neuron.2017.06.019
80. Olshausen BA, Field DJ. Sparse coding of sensory inputs. Curr Opin Neurobiol. 2004;14:481–7. doi: 10.1016/j.conb.2004.07.007
81. DiCarlo JJ, Cox DD. Untangling invariant object recognition. Trends Cogn Sci. 2007;11:333–41. doi: 10.1016/j.tics.2007.06.010
82. Sadagopan S, Wang X. Level invariant representation of sounds by populations of neurons in primary auditory cortex. J Neurosci. 2008;28:3415–26. doi: 10.1523/JNEUROSCI.2743-07.2008
83. Tchernichovski O, Nottebohm F, Ho CE, Pesaran B, Mitra PP. A procedure for an automated measurement of song similarity. Anim Behav. 2000;59:1167–76. doi: 10.1006/anbe.1999.1416
84. Luparello TJ. Stereotaxic atlas of the forebrain of the guinea pig. Basel: Karger; 1967.
85. Redies H, Brandner S, Creutzfeldt OD. Anatomy of the auditory thalamocortical system of the guinea pig. J Comp Neurol. 1989;282:489–511. doi: 10.1002/cne.902820403
86. Anderson LA, Wallace MN, Palmer AR. Identification of subdivisions in the medial geniculate body of the guinea pig. Hear Res. 2007;228:156–67. doi: 10.1016/j.heares.2007.02.005
87. Wallace MN, Anderson LA, Palmer AR. Phase-locked responses to pure tones in the auditory thalamus. J Neurophysiol. 2007;98:1941–52. doi: 10.1152/jn.00697.2007
88. Issa EB, Wang X. Sensory responses during sleep in primate primary and secondary auditory cortex. J Neurosci. 2008;28:14467–80. doi: 10.1523/JNEUROSCI.3086-08.2008
89. Hanes DP, Thompson KG, Schall JD. Relationship of presaccadic activity in frontal eye field and supplementary eye field to saccade initiation in macaque: Poisson spike train analysis. Exp Brain Res. 1995;103:85–96. doi: 10.1007/BF00241967
90. Legendy CR, Salcman M. Bursts and recurrences of bursts in the spike trains of spontaneously active striate cortex neurons. J Neurophysiol. 1985;53:926–39. doi: 10.1152/jn.1985.53.4.926
91. Sheinberg DL, Logothetis NK. Noticing familiar objects in real world scenes: the role of temporal cortical neurons in natural vision. J Neurosci. 2001;21:1340–50. doi: 10.1523/JNEUROSCI.21-04-01340.2001
92. Lehky SR, Sejnowski TJ, Desimone R. Selectivity and sparseness in the responses of striate complex cells. Vision Res. 2005;45:57–73. doi: 10.1016/j.visres.2004.07.021

Decision Letter 0

Lucas Smith

25 Sep 2020

Dear Dr Sadagopan,

Thank you for submitting your manuscript entitled "Abrupt emergence of vocalization selectivity in primary auditory cortex" for consideration as a Discovery Report by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff as well as by an academic editor with relevant expertise and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Also, after discussing your submission with the other members of the editorial team, we think your study would be a better fit as a Short Report than as a Discovery Report, given that aspects of your study are supportive of previous theoretical work and that hierarchical changes in neuronal call selectivity have been previously described. We therefore request that when revising your manuscript you submit it as a Short Report.

Like Discovery Reports, Short Reports can include up to 4 figures, and you should not need to do any reformatting at this stage. Short Reports may be based on a small number of experiments that might not completely flesh out the biological phenomenon under study. We aim for our Short Reports to be provocative and of general interest, in such a way as to spur future research, and/or to present a concise set of clever experiments that reconcile previously conflicting observations, resolve a specific conundrum, or simply apply elegant techniques to elucidate a brief answer to an interesting scientific question. More information on the Short Report format can be found here: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000248

I would also like to mention that we would still be happy to consider follow up studies, if the advance is enough.

Please re-submit your manuscript within two working days, i.e. by Sep 29 2020 11:59PM.

Login to Editorial Manager here: https://www.editorialmanager.com/pbiology

During resubmission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF when you re-submit.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review.

Given the disruptions resulting from the ongoing COVID-19 pandemic, please expect delays in the editorial process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Lucas Smith, Ph.D.,

Associate Editor

PLOS Biology

Decision Letter 1

Lucas Smith

12 Nov 2020

Dear Dr Sadagopan,

Thank you very much for submitting your manuscript "Abrupt emergence of vocalization selectivity in primary auditory cortex" for consideration as a Short Reports at PLOS Biology. Your manuscript has been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by independent reviewers.

The reviews of your manuscript are appended below. You will see that the reviewers find the work potentially interesting. However, based on their specific comments and following discussion with the academic editor, I regret that we cannot accept the current version of the manuscript for publication. The reviewers suggest that the study does not currently provide sufficient proof that there is, indeed, an “abrupt transition” between L4 and L2/3: Reviewer 2 suggests that there are large differences between MGB and L4 responses, and that discrepancies between the results in Figs 1 and 3 reflect this, and Reviewer 3 argues that the differences between vMGB, L4, and L2/3 are not as straightforward as suggested. In addition, the reviewers feel that the study would be strengthened with data added to support the speculation that L2/3 responses are contingent on the presence of specific spectrotemporal features.

We remain interested in your study and we would be willing to consider resubmission of a comprehensively revised version of the manuscript that thoroughly addresses all the reviewers' comments and strengthens the conclusions of the study with new data and analysis. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript would be sent for further evaluation by the reviewers.

We appreciate that these requests represent a great deal of extra work, and we are willing to relax our standard revision time to allow you six months to revise your manuscript. We expect to receive your revised manuscript within 6 months, however please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension. At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may end consideration of the manuscript at PLOS Biology.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point by point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type.

*Resubmission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this resubmission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosbiology/s/submission-guidelines#loc-materials-and-methods

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Lucas Smith, Ph.D.,

Associate Editor,

lsmith@plos.org,

PLOS Biology

*****************************************************

REVIEWS:

Reviewer's Responses to Questions

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Michelle Moerel

Reviewer #2: No

Reviewer #3: No

Reviewer #1: The authors record responses to vocalizations in three stages of the auditory processing hierarchy of the guinea pig: ventral MGB, L4 of primary auditory cortex (A1), and L2/3 of A1. Regular spiking neurons in ventral MGB and L4 of A1 responded to many calls for an extended duration of time, while L2/3 A1 neurons were much more selective (both in call type and in response timing). The authors interpret these findings as evidence for an abrupt change in the sound representation between L4 and L2/3 of A1. Specifically, they suggest that neurons in L2/3 of A1 selectively respond to informative call features (i.e., spectrotemporal segments of calls present in category exemplars), which could be a first step to call category-selective neurons.

The research is highly relevant, as it is so far unclear how the category selectivity present in higher order auditory cortex (across species) emerges from the faithful representation of sound acoustics in lower processing stages. The findings of this study clearly show where in the auditory pathway an important representational change happens, and could serve as basis for follow up studies into the mechanism underlying this change.

Comments:

1. Abstract: Is "… optimized for complete representations of sounds" the best phrasing? Neurons are also optimized for the complete representation of a sound if they are category selective. Perhaps "neurons faithfully reflect acoustic input"?

2. Abstract: the use of the term feature-selectivity is confusing (4th sentence), as it could be taken to mean simple tonotopic tuning (i.e., frequency is an acoustic feature after all).

3. Caption of Figure 2: should this say "call-responsive" instead of "call-selective" neurons? The first neuron in A responds to all 8 call categories, so this neuron is not call-selective.

4. The analyses that underlie Figures 3B-C combine two effects: that of call selectivity and of the response duration per call. That is, they do not allow one to discriminate whether regions differ from each other because neurons respond to fewer calls or because they respond more briefly (or a combination of both). It would be interesting to restrict these analyses to sounds for which there is at least one responsive window.

5. Fig. 3D: how can the y-axis (resp. window length [fraction of call length]) go higher than 1? Are there neurons that respond longer than the sound duration?

6. Results, page 11: "These data suggest that vMGB and A1 L4 neurons are likely driven by the frequency content of calls, when call spectral energy overlaps with the neurons' tone receptive fields. Consequently, vMGB and A1 L4 neurons show poor call selectivity because call spectral energy remains more or less consistent over the call duration and is similar across GP call types. In contrast, despite this overlap of spectral energy across call types, A1 L2/3 neurons respond only in narrow windows because they are likely driven by specific spectrotemporal features that occur during calls, consistent with an earlier theoretical model [29]." This is a relevant and interesting conclusion. However, the reason for poor selectivity in MGB/A1 and reason for the narrow response windows in A1 L2/3 is speculation; this is not supported by the data. Is it possible to better support this statement by the data? For example, by estimating RFs (showing the emergence of combination selectivity in L2/3), and/or by providing more information on the acoustics of the calls and relating that to the neural best frequency + call responsivity?

7. Results, page 13: "… could not be attributed to low-level differences in tuning properties". Perhaps rephrase, as it is not clear what is meant by "low-level differences". After all, differences in spectrotemporal tuning may also be termed low-level. Instead, the authors seem to intend to refer here exclusively to frequency tuning.

8. Figure 4: Why is data from the ventral MGB missing from the MI analyses? How does it compare to A1?

9. Discussion: "While vMGB and A1 L4 neurons seemed to respond primarily to the spectral content of calls …" and "A1 L2/3 responses were contingent on the presence of specific spectrotemporal features". This is an interpretation, not a result. Either the analysis needs to be extended to support this statement in the Discussion (see comment 6; for example by showing a non-linearity in the RFs), or the authors should better clarify what is a result and what is interpretation.

Reviewer #2: In this study, the authors conducted experiments, in contrast to most previous studies, in unanesthetized animals, used an extensive set of conspecific calls as stimuli, compared the responses to these stimuli across thalamic and cortical processing stages, and separated A1 neurons recorded from thalamorecipient and superficial layers. They found that whereas call representations in vMGB and A1 L4 were similar, a critical transformation occurs between A1 L4 and A1 L2/3. While vMGB and A1 L4 neurons seemed to respond primarily to the spectral content of calls, resulting in a dense representation of calls, A1 L2/3 responses were contingent on the presence of specific spectrotemporal features, resulting in a highly sparse representation of calls.

This is a well-performed study, providing experimental confirmation of some ideas presented in the authors' previous study (Liu et al., 2019). It shows for the first time that in the guinea pig (and probably in other animals as well), A1 contains an essential neuronal interface between A1 L4 and A1 L2/3 that is responsible for the detection of specific features of the animal's vocalizations.

I have the following comments and questions for the authors:

i/ Grimsley et al. (2012) demonstrated in anesthetized guinea pigs that "the primary area (AI) and three adjacent auditory belt areas contain many units that give isomorphic responses to vocalizations". According to them, "area VRB (ventrolateral belt) has a denser representation of cells that are better at discriminating among calls by using either a rate code or a temporal code than any other area". Although the authors mention the paper by Grimsley et al. (2012) in their manuscript, I believe it deserves more attention in the Discussion. Speculation about how information on the specific features of the animal's vocalizations, as detected in A1 L2/3 by the authors, is transmitted and used in the secondary areas of the guinea pig's auditory cortex would be appreciated.

ii/ Traditionally, different types of vocalizations are illustrated by their spectrograms and waveforms. This information is missing from the manuscript, although the results are based on 8 different vocalization categories. The spectrograms and waveforms of the vocalizations used by the authors could be shown in the Supplementary material.

iii/ Fig. 3D shows joint distributions of the number and length of response windows. vMGB and A1 L4 neurons exhibit either multiple short windows or a single long window. In contrast, A1 L2/3 neurons exhibit one or two short response windows. The significant difference between vMGB and A1 L2/3 and between A1 L4 and A1 L2/3 is not surprising, but why is there a large difference between vMGB and A1 L4? This finding deserves at least some explanation in the Discussion.

iv/ There is some inconsistency in the presented illustrations and descriptions of call selectivity/responsiveness. Fig. 1G suggests that about the same proportions of neurons respond to individual calls in all structures, i.e., the overall ratio of responsiveness between MGB and A1 L2/3 is about 1:1. In contrast, Fig. 3A suggests that responses to calls are less frequent in A1 L2/3 than in MGB, with a ratio of about 4.5:6.5. It is also not easy to understand how the values for Fig. 1G were calculated (e.g., for 28 MGB neurons the blue percentage bars should change in steps of 100/28 ≈ 3.6%, but the differences among the blue columns are much smaller).

In addition, Fig. 1G is also used as proof that the illustrated emergence of call selectivity in A1 L2/3 is not due to only some of the call types (lines 295-300): "This was not the case in our data - neural preference for call type was evenly distributed across all tested call types across the processing stages (Fisher's exact test [37], p = 0.99; Fig. 1G)."

This statement should be supported by some quantitative analysis (e.g., a bootstrap or permutation test).
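To make the suggested check concrete, a minimal permutation-style bootstrap might look like the following sketch. The neuron counts, the number of call types, and the test statistic are invented for illustration and are not the authors' data or method.

    import numpy as np

    rng = np.random.default_rng(1)
    # Hypothetical neuron counts per stage and preferred call type (8 types).
    stages = np.repeat(['vMGB', 'L4', 'L2/3'], [28, 50, 60])
    pref_call = rng.integers(0, 8, size=stages.size)

    def table_stat(stages, pref_call):
        """Chi-square-like statistic on the stage x call-type count table."""
        tab = np.zeros((3, 8))
        for i, s in enumerate(['vMGB', 'L4', 'L2/3']):
            for c in range(8):
                tab[i, c] = np.sum((stages == s) & (pref_call == c))
        expected = tab.sum(1, keepdims=True) * tab.sum(0) / tab.sum()
        return np.sum((tab - expected) ** 2 / np.maximum(expected, 1e-9))

    obs = table_stat(stages, pref_call)
    null = [table_stat(rng.permutation(stages), pref_call) for _ in range(2000)]
    print(f"permutation p = {np.mean(np.array(null) >= obs):.3f}")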

v/ It would also be interesting to include responses to tone and/or noise stimuli in the analysis, to determine whether the change in selectivity is specific to complex stimuli (such as calls).

vi/ Line 610. How many neurons in the individual structures responded to vocalizations with inhibition?

Minor points:

Line 93. This sentence requires a citation.

Line 153. Please explain what "LFP responses" means.

Line 513. What about the placement of the chamber for electrophysiological recording in the case of MGB?

Line 519. ….at a sampling rate of 100 kHz …

Fig. 1D. Please explain what "CSD" means.

Reviewer #3: Montes-Lourido et al investigate the emergence of stimulus category selectivity across the thalamocortical processing hierarchy for species-specific vocalizations in guinea pigs, and how this impacts the information encoded by neurons. Through experiments conducted in awake, head-fixed animals, the authors claim that neurons in the lemniscal region of the auditory thalamus (vMGB) and the thalamocortical input layer of primary auditory cortex (A1 L4) are not selective to specific calls, but that the superficial layers L2/3 of A1 "abruptly" show much more selectivity to call features. That there would be some layer-dependent differences in processing stimuli is well expected and even demonstrated in published works. Much less is understood, though, about these differences and their functional importance in the case of processing species-specific vocalizations. Thus, the topic is of interest to auditory neuroscientists, and of potential relevance for sensory cortical researchers interested in ethological stimuli more generally. The manuscript is laid out and written clearly, with generally rigorous analyses and transparent presentation of data. However, conceptual, methodological and interpretational concerns reduce my enthusiasm and leave me unconvinced at this point.

Major points:

1) The authors provide good transparency on much of their neural data, but there is critically missing information about the spectral content of the vocalizations used in their study and its overlap with the best frequencies of the neurons recorded. This is problematic because of potential differences in their L2/3 vs. L4 and vMGB neural populations. Many guinea pig vocalizations fall heavily in the 2-10 kHz range, and especially around 3-6 kHz. This is a frequency range where there is a lack of best frequencies in the recorded population of L2/3 neurons, unlike for L4 and vMGB (Fig 1F). Even though similar percentages of neurons across areas respond to the different calls (Fig 1G), if the same types of neurons are not being sampled, then either those BFs don't exist in L2/3 (potentially interesting but unlikely given results in Grimsley et al, 2012), or else the conclusion that there are L4 vs. L2/3 differences rests on an artifact of undersampling, and would not hold up.

2) One place where this problem could lead to a misinterpretation is shown in Fig 3A. The main difference between A1 L2/3 and L4 appears to be the more uniform distribution across category number of the former vs. the more peaked distribution towards more categories of the latter. But if the BFs of L2/3 neurons are missing a spectrally dense region of the vocalizations, then it could make sense that there are not as many neurons whose responses track those acoustics as well as in the L4 or vMGB populations. Analyzing the response categories as a function of how neurons are tuned might be one way to get at this, but filling in the distribution of BFs around 3-6 kHz would be best.

3) Related to above, the authors make the claim that L2/3 differences cannot be attributed to low-level differences in tuning properties. However, they only use tuning bandwidth as a comparison, and Fig 1F seems to show that L2/3 tuning in the 6-12 kHz BF range, where many of the calls probably have spectral content, looks to be particularly narrow compared to L4. A more thorough comparison of tuning properties as a function of best frequency would be helpful. One could also look at predicting the response to calls based on the tuning curve itself. Note though that even if it were the case that tuning per se is not explanatory of the responses for some neurons, that would be consistent with findings from Eli Nelken's lab many years ago looking at how some cat auditory cortex neurons respond unexpectedly to the spectral context around bird vocalizations.
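As one concrete form of the tuning-curve prediction suggested above, the sketch below projects each call's average spectrum onto a pure-tone tuning curve and correlates the prediction with measured rates. All arrays are synthetic placeholders, and the Gaussian-on-log-frequency tuning shape is an assumption.

    import numpy as np

    rng = np.random.default_rng(2)
    freqs = np.logspace(np.log10(0.2), np.log10(32), 64)  # frequency axis, kHz
    # Assumed Gaussian tuning on a log-frequency axis, BF near 4 kHz.
    tuning = np.exp(-0.5 * ((np.log2(freqs) - np.log2(4.0)) / 0.5) ** 2)

    call_spectra = rng.random((8, 64))   # average spectra of 8 call types
    measured_rate = rng.random(8)        # observed mean evoked rates

    predicted = call_spectra @ tuning    # linear prediction from tuning alone
    r = np.corrcoef(predicted, measured_rate)[0, 1]
    print(f"tuning-curve prediction vs. measured rates: r = {r:.2f}")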

4) Conceptually, the authors push the idea that there is an "abrupt" transition between the way L4 and L2/3 of A1 encode these vocalizations. This is a potentially novel point. However, while the authors make a reasonable case that there are differences, whether those differences should constitute an "abrupt" change in coding or not is subjective. They do not provide any sense of what a "gradual" change would look like, and whether there is a quantitative way to differentiate "abrupt" from "gradual." In fact, many of the distributions comparing L4 to L2/3 look different, and while some figures show vMGB is not different from L4 (Fig 3A and B), others do not show vMGB data (Fig 4) at all, ignore differences between vMGB and L4 (Fig 3D), or do not indicate whether apparent differences are significant or not (Fig 3G).

5) The authors claim, "A1 L2/3 responses were contingent on the presence of specific spectrotemporal features, resulting in a highly sparse representation of calls." However, besides showing a sparse but higher probability response across repeated trials at specific points in the stimulus, the authors do not really provide much to support this claim. It would make the paper stronger to show this, but at this point it is speculative. Presumably, many of these calls have rhythmically repeated elements. What are the spectrotemporal elements that the L2/3 neurons are responding to and why don't they respond when those same elements occur elsewhere in the sound?

6) The authors have described a creative way to automatically identify putative regions of the spiking responses. However, they should start first by comparing overall evoked firing rates in the different areas; that by itself may show the main difference between regions (or if not, then their case for focusing on kurtosis and sparseness is stronger). Furthermore, there is some arbitrariness to how they arrive at their bounding boxes and how regions get merged together. The authors should test whether their conclusions change as they change their criterion. For example, the neuron in Fig 2B (right) has many bounding boxes in response to a Chut, while the one in Fig 2B (left) looks like it has a similar pattern of responses to a Chirp. Yet in the former, the response is broken into many bounding boxes, while they are all merged together for the latter. This seems arbitrary based on an experimenter's empirical algorithm rather than what the brain may care about. For low firing neurons such as the one in the middle of Fig 2C, there are responses to Other or Rumble calls that are not merged together under the current criterion, but might have been if the criteria were more permissive to capture the rest of what appears to be an excitatory response.
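The robustness check requested here could take roughly the following form: merge detected response windows under a range of gap criteria and ask whether the window-count conclusions survive. The merge rule and window times below are hypothetical stand-ins for the authors' algorithm.

    def merge_windows(windows, max_gap):
        """Merge (start, end) windows, sorted by start, whose gap <= max_gap s."""
        merged = [list(windows[0])]
        for start, end in windows[1:]:
            if start - merged[-1][1] <= max_gap:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        return merged

    # Invented window times (s); vary the criterion and recount.
    windows = [(0.10, 0.15), (0.18, 0.22), (0.60, 0.70), (0.72, 0.75)]
    for gap in (0.01, 0.05, 0.10):
        print(f"max_gap = {gap}: {len(merge_windows(windows, gap))} windows")
    # If the L2/3 vs. L4 window statistics hold across criteria, the
    # arbitrariness concern is mitigated.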

7) The authors report spontaneous activity differences in the Discussion, noting a p-value of only 0.03. This is a result, and should be put into the results section along with a plot of the distributions of the spontaneous activity across areas, just as for Fig 3 and 4, especially since they make an argument in the Discussion comparing spontaneous activity to other species. The concern is that the spontaneous activity is actually quite variable in all these areas, and that is not being reflected in the examples of Fig 2.

8) Methodologically, the authors calculate stimulus-specific information by filling in responses after the end of stimuli of varying lengths with simulated spontaneous firing. Why? This seems completely unnecessary if the goal is to understand the true nature of how neurons are conveying information about stimuli. If the actual firing after the end of a stimulus is spontaneous, then that should be present in the trial-by-trial firing already; if not, then the results from simulating firing do not reflect the reality. It is also unclear why time bins before the start of the stimulus are used.
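For clarity, a minimal sketch of the fill-in procedure as the reviewer describes it is given below, appending Poisson counts at the spontaneous rate after the stimulus ends so that all stimuli share one analysis duration. Function and parameter names are assumptions, not the authors' code.

    import numpy as np

    def pad_with_spontaneous(binned, target_bins, spont_rate, bin_s, rng):
        """Pad a (n_trials, n_bins) spike-count array out to target_bins
        with Poisson counts drawn at the spontaneous rate."""
        n_trials, n_bins = binned.shape
        if n_bins >= target_bins:
            return binned[:, :target_bins]
        fill = rng.poisson(spont_rate * bin_s,
                           size=(n_trials, target_bins - n_bins))
        return np.hstack([binned, fill])

    rng = np.random.default_rng(3)
    resp = rng.poisson(1.0, size=(10, 80))  # 10 trials, 80 bins of response
    padded = pad_with_spontaneous(resp, 120, spont_rate=5.0, bin_s=0.02, rng=rng)
    print(padded.shape)  # (10, 120)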

9) While the information analyses take into account suppressed portions of the response, the data being analyzed consists only of neurons whose firing has an excitatory component, based on having an evoked rate 6 standard errors from the mean spontaneous activity in one of its windows. Prior research in awake mice has shown the importance of neurons that are fully suppressed by vocalizations. Presumably such neurons exist in the guinea pig as well, and some estimate of how often they are encountered in L4 or L2/3 would be useful. The information theoretic analysis could also include such neurons to give a more complete view.

10) The Discussion is generally written in a scholarly manner with good coverage of existing literature in the guinea pig and nonhuman primate. However, the authors ignore relevant studies in other rodent species, including rats and mice. Moreover, several points are overstated.

- The authors claim an advance in using an unanesthetized preparation to study vocalization processing, but this has been done for many years in rats and mice.

- The authors claim an advance in using an extensive set of conspecific calls, but again there are many papers that do so in rats, mice and gerbils.

- The authors reference work from the Portfors lab to argue that mice are an outlier in showing an over-representation of social vocalizations in subcortical stations in a way that is not consistent with pure tone tuning. That argument is overstated, especially given other work from the Lesica lab using dense recording methods showing that responses to calls are largely consistent with best frequencies of neurons, as with other species. Furthermore, other species besides mice also show "associative and behavioral functions" in the responses of auditory cortical neurons. Hence, aside from the fact that guinea pig vocalizations are lower frequency than mouse ultrasonic calls, there is no reason to claim that mice are an "outlier."

11) The authors make the bold claim that their results indicate that "critical transformations to sound representations occur at the A1 L4 → A1 L2/3 synapse." There is simply no evidence to support that. All the studies reported here are based on extracellular recordings broken down by depth, with layer inferred from a current source density analysis. There are no whole-cell recordings, cross-correlation analyses of simultaneously recorded units, etc. that would be expected to claim anything about the specific quoted synapse. In fact, based on studies in other rodents (e.g. Barbour and Callaway, Journal of Neuroscience, 2008), L2/3 neurons receive input not only from L4, but also from L2/3 and L5. What gives rise to the different coding of L2/3 vs. L4 neurons thus need not happen within just one synapse, especially given that the sparse response windows often occur late in the stimulus. The authors' intuition may well be correct, but here they provide no evidence in support of that case.

Other points

- Fig 1F - the experiments are based on recordings from 5 animals. To give more confidence that the results are not dominated by just a few animals, the best frequency plots could be modified to show different symbols for different animals so that the reader can weigh how much any one animal contributes to the results.

- Because the L2/3 distribution of call-selectivity is more uniform across the number of call categories rather than being peaked at 1 or 2, there are just as many neurons in L2/3 that are NOT call-selective. The language throughout should be toned down so that this is clearer, rather than giving the impression that one only finds call-selective responses in L2/3.

- The authors state, "Surprisingly and contrary to our expectation, we found even lower MI levels over the population of A1 L2/3 neurons (Fig. 4A, red)." I am not clear why this should be considered surprising. It fits with the lower firing rate and sparseness of the activity in L2/3.

- Fig 3E should show an example vMGB neuron as well

- Fig 4 should show analyses for vMGB as well

- "Data shows" is incorrect - "data" is plural

Finally, concerning the use of statistics, the K-S test is fine for making claims that distributions are different, but it can be overly sensitive and lead to false positives. The use of the FDR correction is a not-particularly-conservative way to account for that, but still does not provide a way to claim anything about whether there is an "abrupt" difference or not between distributions.
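For readers unfamiliar with the mechanics at issue, the sketch below runs pairwise two-sample K-S tests followed by a Benjamini-Hochberg FDR step. The data are random placeholders; only the procedure is illustrated.

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(4)
    areas = {'vMGB': rng.normal(0, 1, 100),
             'L4': rng.normal(0, 1, 100),
             'L2/3': rng.normal(0.5, 1, 100)}

    pairs = [('vMGB', 'L4'), ('vMGB', 'L2/3'), ('L4', 'L2/3')]
    pvals = np.array([ks_2samp(areas[a], areas[b]).pvalue for a, b in pairs])

    # Benjamini-Hochberg: find the largest k with p_(k) <= (k/m) * q and
    # reject the k smallest p-values.
    order = np.argsort(pvals)
    m, q = len(pvals), 0.05
    below = np.nonzero(pvals[order] <= (np.arange(1, m + 1) / m) * q)[0]
    k = below.max() + 1 if below.size else 0
    for rank, idx in enumerate(order):
        a, b = pairs[idx]
        print(f"{a} vs {b}: p = {pvals[idx]:.4f}, FDR-significant: {rank < k}")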

Decision Letter 2

Lucas Smith

11 May 2021

Dear Dr Sadagopan,

Thank you very much for submitting a revised version of your manuscript "A complex feature-based representation of vocalizations emerges in the superficial layers of primary auditory cortex" for consideration as a Research Article at PLOS Biology. This revised version of your manuscript has been evaluated by the PLOS Biology editors, the Academic Editor and the original reviewers.

The reviews are appended below. As you will see, both reviewers 1 and 2 think the revision has addressed their previous comments and they think the manuscript is substantially improved. However reviewer 3 still has a number of lingering concerns which will need to be addressed before we can consider the study for publication.

In light of the reviews, we are pleased to offer you the opportunity to address the remaining points from the reviewers in a revised version that we anticipate should not take you very long. We will then assess your revised manuscript and your response to the reviewers' comments and we may consult the reviewers again.

Along with addressing the remaining reviewer comments, we also ask that you address the following editorial requests:

1) Ethics request: In your methods section, please include the identification number of your protocol, approved by the University of Pittsburgh IACUC.

2) Data request: Please provide, as a supplementary file, the underlying data for each figure in your study. Please also reference this dataset in the figure legends. For example, to each figure legend you might add the following statement: “data underlying this figure can be found in supplementary file S1_data.” You will also need to ensure that this data file contains a legend, and is referenced in your data availability statement. I have included more information regarding our data sharing policy and this request below my signature.

3) Thank you for changing your title in response to reviewer comments in the last round of review. We have been discussing the title of your manuscript, and wonder if it might be edited slightly to be more accessible to a broader readership? If you agree, you might change it to something like 'Neuronal selectivity to specific vocalizations emerges in the superficial layers of primary auditory cortex'

4) Please take a moment to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within 1 month.

Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension. At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may end consideration of the manuscript at PLOS Biology.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point by point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type.

*Resubmission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this resubmission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Lucas Smith

Associate Editor

PLOS Biology

lsmith@plos.org

*****************************************************

------------------------------------------------------------------------

DATA POLICY REQUEST:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it:

Figure 1 B,D-H; Figure 2; Figure 3A-C; Figure 4A-E; Figure 5C-I; Figure 6A,C,E-G,I,K; Figure 7A-D; Figure 8A-K; Figure S1A-D;

NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

**IMPORTANT: Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

**IMPORTANT: Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

------------------------------------------------------------------------

*****************************************************

REVIEWS:

Reviewer #1:

I would like to thank the authors for conducting extensive additional analyses, which confirm the conclusions they drew in the original version of this manuscript. I have no further comments.

Typo in lines 174, 176, 413: ANOCOVA --> ANCOVA

Reviewer #2: All my comments on the previous version were taken into account, the manuscript was substantially improved and I do not have other comments.

Reviewer #3: The manuscript from Montes-Lourido et al is improved and makes a more convincing case for a transformation in the representation of species-specific vocalizations between cortical layers in the guinea pig. The authors are creative in their analyses of the responses and have generally provided transparent and rigorous statistical analyses to illustrate their points. Their new Table nicely summarizes differences between the areas, though statistically speaking, it would make more sense to compute effect sizes and list those rather than the degree of significance in a statistical test.
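One distribution-free effect size suited to the kinds of comparisons in that table is Cliff's delta; a minimal sketch with placeholder data follows.

    import numpy as np

    def cliffs_delta(x, y):
        """P(x > y) - P(x < y) over all pairs; ranges from -1 to 1."""
        x, y = np.asarray(x)[:, None], np.asarray(y)[None, :]
        return np.mean(x > y) - np.mean(x < y)

    rng = np.random.default_rng(7)
    a, b = rng.normal(0, 1, 80), rng.normal(0.4, 1, 80)
    print(f"Cliff's delta = {cliffs_delta(a, b):.2f}")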

However, one issue that remains a serious concern is the mutual information (MI) analysis, which has been expanded to include data from vMGB in addition to A1 L4 and L2/3. The authors' explanation of having to "fill-in" the spontaneous activity to compute the MI is appreciated, but their full explanation did not make it into the Methods, and that should be updated. It would also be helpful to the reader to know exactly which stimuli received these filled-in spikes (Purr?) since there are examples in Fig. 8F-H that seem to show abrupt drops in SSI about 1 second after the end of the stimulus. Given recent studies of auditory cortical Off responses after the end of stimuli, the act of substituting spiking could potentially skew results. Checking whether neurons exhibit firing rate differences over the last 1 sec vs. the simulated 1 sec of spontaneous activity, and how that affects the calculated MI, would help assuage this concern, but at the least some clearer statement about what was done and its justification in the Methods seems warranted.
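The suggested check could be as simple as the following sketch: compare, per neuron, spike counts in the last second of real recording against the simulated second of spontaneous activity. The choice of a Wilcoxon signed-rank test and all values here are illustrative assumptions.

    import numpy as np
    from scipy.stats import wilcoxon

    rng = np.random.default_rng(8)
    real_post = rng.poisson(5, size=50)  # spikes in the last real 1 s, 50 trials
    simulated = rng.poisson(5, size=50)  # spikes in the simulated 1 s
    stat, p = wilcoxon(real_post, simulated)
    print(f"Wilcoxon signed-rank p = {p:.3f}")
    # A large difference here would suggest the fill-in distorts the MI.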

More seriously, although the filling-in explanation now helps me understand what the authors did in computing the MI, I am left somewhat perplexed by how they then use these results to support their conclusions. The authors make a point about longer temporal integration times in L2/3 based on the area-under-the-curve of their bin-by-bin mutual information curves. This seems problematic. The Methods indicate that they use a sliding window with overlap between points, meaning that each time bin is not independent from each other. Moreover, the MI can obviously be negative because of their shuffle correction subtraction (e.g. Fig 8C), but it is not clear what constitutes the AUC in that case. In general, there is no obvious physical or conceptual meaning to the AUC of their MI curve aside from it being used to show a difference between neural regions. If the authors intended the AUC to essentially represent the average MI over the duration of the sounds, then at the least they should eliminate the non-independent points that overlap in that average. Even still though, computing an average MI for a neuron in this fashion is questionable because of the nonstationarity of the stimulus over the time bins and correlations in firing between time bins. A methodologically more sound way to estimate MI would be to divide time windows into smaller bins and create spike words. It takes many trials to estimate the probability distribution across words though, and it is not clear whether enough trials were run to do this well. Estimating MI based on spike counts (not words) in large windows as done by the authors may be ok (though does not get at "activity patterns" as they claim), but it is still not clear what averaging or summing the values from adjacent windows (as was done to compute the AUC) means.
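For concreteness, a shuffle-corrected, count-based MI estimate in a single window looks roughly like the sketch below. The plug-in estimator and shuffle recipe are generic illustrations rather than the authors' exact implementation; note that the corrected value can indeed come out negative, as the reviewer observes.

    import numpy as np

    def mi_counts(stim_ids, counts):
        """Plug-in MI (bits) between stimulus identity and spike count."""
        joint = np.zeros((stim_ids.max() + 1, counts.max() + 1))
        for s, c in zip(stim_ids, counts):
            joint[s, c] += 1
        joint /= joint.sum()
        ps, pc = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
        nz = joint > 0
        return np.sum(joint[nz] * np.log2(joint[nz] / (ps @ pc)[nz]))

    rng = np.random.default_rng(5)
    stim = np.repeat(np.arange(8), 20)    # 8 stimuli x 20 trials
    counts = rng.poisson(1 + stim * 0.3)  # counts weakly depend on stimulus

    raw = mi_counts(stim, counts)
    shuf = np.mean([mi_counts(stim, rng.permutation(counts)) for _ in range(200)])
    print(f"raw = {raw:.3f} bits, shuffle-corrected = {raw - shuf:.3f} bits")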

As far as supporting a point about integration time, Fig. 8B does show that L2/3 looks different as a function of the window length, but there are no error bars, and no explanation of what normalization was used. Thus, it is hard to know what exactly to make of the shift to the right of that curve and why the three curves meet at 400 ms. The authors seem to be arguing that because the AUC for L2/3 neurons does not saturate above 200 ms, the integration time for L2/3 is longer than in the other two brain areas. This argument is based on the rate of change of the AUC with increasing window length, and exactly how that relates to the integration time is unclear. If anything, because the authors are essentially counting spikes in ever larger windows for their MI calculation, the exact timing of the spikes within the window becomes less important as the window size increases, so the L2/3 result seemingly says more about worsening temporal precision rather than how long a stimulus window is needed to drive responses (integration time). However, that interpretation seems to be at odds with the conclusions from Fig 4B, where the authors argue that the feature response window for L2/3 neurons is actually smaller than in L4 and vMGB. Hence, the concern is that the AUC approach to characterizing the MI, and how temporally extended "patterns" of firing convey information, is leading to difficult-to-interpret results, and the authors should reconsider whether to include it.

Finally, also in the context of computing information, the authors refer to high-MI points, and highlight these in Fig 8C-E with red crosses. However, the Methods do not explain how these were chosen; they appear to be the peaks, but not all peaks are selected. Since high-MI points were used in their subsequent correlation analysis with the PSTH, it is important to explain this in detail. Are their results changed if they use all bins rather than just those with high MI?
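One explicit selection rule of the kind the reviewer asks to see spelled out would be peak-picking above a null-based threshold, as in the sketch below; the threshold choice is an assumption for illustration.

    import numpy as np
    from scipy.signal import find_peaks

    rng = np.random.default_rng(6)
    mi_trace = rng.normal(0, 0.02, 500)  # stand-in bin-by-bin MI trace
    mi_trace[[120, 121, 122]] += 0.3     # inject an informative epoch
    mi_trace[[340, 341]] += 0.2

    # e.g., peaks more than 3 SD above the trace mean (threshold is assumed)
    threshold = mi_trace.mean() + 3 * mi_trace.std()
    peaks, _ = find_peaks(mi_trace, height=threshold)
    print("high-MI bins:", peaks)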

Decision Letter 3

Lucas Smith

24 May 2021

Dear Dr Sadagopan,

On behalf of my colleagues and the Academic Editor, Manuel Malmierca, I am pleased to say that we can in principle offer to publish your Research Article "Neuronal selectivity to complex vocalization features emerges in the superficial layers of primary auditory cortex" in PLOS Biology, provided you address any remaining formatting and reporting issues. These will be detailed in an email that will follow this letter and that you will usually receive within 2-3 business days, during which time no action is required from you. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have made the required changes.

Together, with the Academic Editor, we have discussed your revision and think that you have satisfactorily addressed the reviewer concerns, and the majority of our editorial requests. As one last, minor editorial request, we ask that you provide a legend for each of supplementary data files that you generated, containing the data underlying your figures. An example legend might be: "S1_Data. Data underlying figures ___." You can add these legends while addressing the formatting and reporting requests which you will receive from our productions team.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have not yet opted out of the early version process, we ask that you notify us immediately of any press plans so that we may do so on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Biology. 

Sincerely, 

Lucas Smith, Ph.D. 

Associate Editor 

PLOS Biology

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Mutual information analyses performed across a range of analysis window sizes.

    (PDF)

    S1 Data. Data underlying Fig 1.

    (XLSX)

    S2 Data. Data underlying Fig 2.

    (XLSX)

    S3 Data. Data underlying Fig 3.

    (XLSX)

    S4 Data. Data underlying Fig 4.

    (XLSX)

    S5 Data. Data underlying Fig 5.

    (XLSX)

    S6 Data. Data underlying Fig 6.

    (XLSX)

    S7 Data. Data underlying Fig 7.

    (XLSX)

    S8 Data. Data underlying Fig 8.

    (XLSX)

    S9 Data. Data underlying S1 Fig.

    (XLSX)

    Attachment

    Submitted filename: Response_to_reviewers_Rev1_v2.docx

    Attachment

    Submitted filename: Response_to_reviewers_Rev2.pdf

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.

