PLOS Computational Biology. 2011 Aug 18;7(8):e1002123. doi: 10.1371/journal.pcbi.1002123

Understanding Auditory Spectro-Temporal Receptive Fields and Their Changes with Input Statistics by Efficient Coding Principles

Lingyun Zhao 1, Li Zhaoping 2,*
Editor: Lyle J Graham 3
PMCID: PMC3158037  PMID: 21887121

Abstract

Spectro-temporal receptive fields (STRFs) have been widely used as linear approximations to the signal transform from sound spectrograms to neural responses along the auditory pathway. Their dependence on statistical attributes of the stimuli, such as sound intensity, is usually explained by nonlinear mechanisms and models. Here, we apply an efficient coding principle which has been successfully used to understand receptive fields in early stages of visual processing, in order to provide a computational understanding of the STRFs. According to this principle, STRFs result from an optimal tradeoff between maximizing the sensory information the brain receives, and minimizing the cost of the neural activities required to represent and transmit this information. Both terms depend on the statistical properties of the sensory inputs and the noise that corrupts them. The STRFs should therefore depend on the input power spectrum and the signal-to-noise ratio, which is assumed to increase with input intensity. We analytically derive the optimal STRFs when signal and noise are approximated as Gaussians. Under the constraint that they should be spectro-temporally local, the STRFs are predicted to adapt from being band-pass to low-pass filters as the input intensity reduces, or the input correlation becomes longer range in sound frequency or time. These predictions qualitatively match physiological observations. Our prediction as to how the STRFs should be determined by the input power spectrum could readily be tested, since this spectrum depends on the stimulus ensemble. The potentials and limitations of the efficient coding principle are discussed.

Author Summary

Spectro-temporal receptive fields (STRFs) have been widely used as linear approximations of the signal transform from sound spectrograms to neural responses along the auditory pathway. Their dependence on the ensemble of input stimuli has usually been examined mechanistically as a possibly complex nonlinear process. We propose that the STRFs and their dependence on the input ensemble can be understood by an efficient coding principle, according to which the responses of the encoding neurons report the maximum amount of information about the sensory input, subject to limits on the neural cost in representing and transmitting information. This proposal is inspired by the success of the same principle in accounting for receptive fields in the early stages of the visual pathway and their adaptation to input statistics. The principle can account for the STRFs that have been observed, and the way they change with sound intensity. Further, it predicts how the STRFs should change with input correlations, an issue that has not been extensively investigated. In sum, our study provides a computational understanding of the neural transformations of auditory inputs, and makes testable predictions for future experiments.

Introduction

In response to acoustic input signals, neurons in the auditory pathway are typically selective to sound frequency Inline graphic and have particular response latencies. At least ignoring cases with Inline graphic kHz, in which neuronal responses often phase lock to the sound waves, a spectro-temporal receptive field (STRF) is often used to describe the tuning properties of a neuron [1], [2], [3], [4]. This is a two-dimensional function Inline graphic that reports the sensitivity of the neuron at response latency Inline graphic to acoustic inputs of frequency Inline graphic for a given stimulus ensemble (i.e., given input statistics). More specifically, in a stimulus ensemble, the power Inline graphic of the acoustic input at frequency Inline graphic at time Inline graphic fluctuates around an average level denoted by Inline graphic. If we let Inline graphic denote the neuron's response at time Inline graphic (typically its spike rate), then Inline graphic best approximates the linear relationship between Inline graphic and Inline graphic in this stimulus ensemble as

$$O(t) = \sum_{f} \int d\tau \; \mathrm{STRF}(f,\tau)\, S(f, t-\tau) \tag{1}$$

Note that in this paper, we refer to Inline graphic as the input spectrogram, although some authors also include the average input power Inline graphic. Though Inline graphic is not a full description of acoustic input, since it ignores features such as the phase of the oscillation in the sound wave, it is the only relevant aspect of the auditory input as far as the STRF is concerned. Note that if we use Inline graphic to denote the deviation of the neural response from its spontaneous activity level, then both Inline graphic and Inline graphic have zero mean. We will use this simplification throughout the paper. In studies in which the temporal dimension is omitted, the STRF is called the spectral receptive field (SRF).

Figure 1 cartoons a typical STRF. This has excitatory and inhibitory regions, reflecting its preferred frequency and response latency. For example, if Inline graphic peaks at frequency Inline graphic and time Inline graphic, then this neuron prefers frequency Inline graphic and should respond to an input impulse Inline graphic of this frequency with latency Inline graphic. We will also refer to Inline graphic as the receptive field, the filter kernel, or the transfer function from input to neural responses, as these all convey the same or similar meanings. A neuron's STRF is typically estimated using reverse correlation methods [5], [4].
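The linear relationship described above can be sketched in discrete form. The following is a minimal illustration, assuming a spectrogram sampled on a grid of frequency channels and time bins; the toy STRF, its dimensions, and the function name `strf_response` are hypothetical and are not taken from the paper.

```python
import numpy as np

def strf_response(strf, spec):
    """Linear STRF prediction: out[t] = sum_f sum_tau strf[f, tau] * spec[f, t - tau]."""
    n_freq, n_lag = strf.shape
    n_time = spec.shape[1]
    out = np.zeros(n_time)
    for tau in range(n_lag):
        # Inputs before the start of the recording are treated as zero.
        out[tau:] += strf[:, tau] @ spec[:, :n_time - tau]
    return out

# Hypothetical toy STRF: 8 frequency channels, 5 latency bins, with an
# excitatory peak at channel 3 and latency 2 and inhibitory flanks in frequency.
strf = np.zeros((8, 5))
strf[3, 2] = 1.0
strf[2, 2] = strf[4, 2] = -0.4

# Zero-mean toy spectrogram containing a single impulse at channel 3, time bin 10.
spec = np.zeros((8, 40))
spec[3, 10] = 1.0

resp = strf_response(strf, spec)
peak_time = int(np.argmax(resp))  # time bin of the largest predicted response
```

In this toy case the response peaks at the impulse time plus the preferred latency, which is the sense in which the peak of the STRF gives the preferred frequency and response latency.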

Figure 1. A schematic example of a typical spectro-temporal receptive field, plotted with a reversed abscissa.

This STRF has one excitatory and three inhibitory regions, prefers frequency Inline graphic, and evokes response at a typical latency Inline graphic. Since the response at time Inline graphic is Inline graphic, an input stimulus Inline graphic exactly as depicted in this plot is most likely to elicit a large response Inline graphic at time Inline graphic, or indeed a spike.

However, there are extensive nonlinearities in the signal transformation along the auditory pathway. Indeed, the STRF formulation of neural responses, though linear in spectral power, is already a second-order nonlinear function of the auditory sound wave. There are two kinds of nonlinearities when inputs are represented as spectrograms. The simpler one is a static nonlinearity Inline graphic, which when applied to the linear approximation Inline graphic of equation (1) enables better predictions of the neural responses [6], [7]. This static nonlinearity however does not alter the spectro-temporal selectivity of the neuron seen in the linear STRF. This paper is interested in the more complex nonlinearity that the STRFs are dependent on the stimulus ensemble used to estimate them [1], [5], [8], [9]. For example, the STRFs are wider when the stimuli are narrow-band rather than wide-band [10], or when the stimuli are animal vocalizations rather than noise [11]. The STRF (or SRF) also becomes more band-pass when sound intensity increases. The dependence of the STRFs on the stimulus ensemble holds, for example, for type IV neurons in the cochlear nucleus of cats [12], [13], the inferior colliculus (IC) of the frog [8] and the gerbil [7], and field L region of the songbird (which is analogous to mammalian auditory cortex) [14]. (The dependence on sound intensity also holds for the linear relationship between the auditory nerve responses and input sound waves [5]). Nonlinearities in the auditory system become progressively stronger further from the periphery.

Despite the nonlinearities, the concept of the STRF is still widely used, not only because it provides a meaningful description of the spectro-temporal selectivity of the neurons in a given stimulus ensemble, but also because it can predict neural responses to novel stimuli reasonably well, as long as the stimuli are drawn from the same stimulus ensemble as that used to estimate the STRF in the first place. Reasonable predictions from the STRFs have been obtained for the responses of auditory nerves (see [15]) and auditory midbrain neurons [6], [7], [16] (also see [2]). They have also been obtained for responses of the auditory cortical neurons when the stimulus ensemble is composed of biologically more meaningful static or dynamic ripples (broadband sound with sinusoidally modulated spectral envelopes and their linear combinations [17], [18], [19]). If the linear neural filter is augmented to include the filtering performed by the head and ears, it is also possible to predict the preferred locations of sound sources of auditory cortical neurons based on the linear neural filter for input spectrograms [20]. Meanwhile, linear STRF models fail to capture many complex phenomena, particularly in the auditory cortex, and nonlinearities are not limited to being just static or monotonic. It has been suggested that some auditory cortical neurons process auditory objects in a highly non-linear manner, by selectively responding to a weak object component while ignoring loud components that occupy the same region in frequency space in auditory mixtures of these object components [21], and some prefer low over high spectral contrast sounds [22]. Strong nonlinearities in the auditory processes have long motivated nonlinear models of auditory responses (e.g., [5], [12], [23]).

This paper aims to understand from a computational, rather than a mechanistic, perspective why the auditory encoding transform should depend on the stimulus ensemble in the ways observed. More specifically, the paper focuses on cases in which STRFs can reasonably capture neural responses, and aims to identify and understand the computational goal of the STRFs for a given stimulus ensemble – finding a metric according to which the STRFs are optimal for the ensemble. This would provide a rationale for how the physiologically measured STRFs should depend on or adapt to the stimulus ensemble. This paper does not address what linear or nonlinear mechanisms could build the optimal STRFs, or whether or how nonlinear auditory processes enable the adaptation of the STRFs to the stimulus ensemble. Existing computational models of auditory neurons, including ones with the notion that cochlear hair cells perform independent component analysis to provide an efficient code for inputs using spikes in the auditory nerves [24], [25], cannot explain the observed dependence of the STRFs on the stimulus ensemble (see Discussion for more details).

Restricting attention to the temporal properties of STRF, Lesica and Grothe [26] observed that the temporal filter in STRF adapted to the level of ambient noise in the input environment. In particular, the temporal receptive field in the STRF changed from being bandpass to being low pass with the increase of ambient noise. They argued using a simple model that such adaptation in the STRF enables more efficient coding of the input information.

This study applies the principles of efficient coding to understand the auditory STRF and its variations with sound intensities and other input characteristics. It generalizes the work of Lesica and Grothe [26] to understand the temporal and spectral filtering characteristics of STRF adaptation to changes in noise, signal and correlations in input statistics. Explicitly, the principle of efficient coding states that the neural receptive fields should enable the neural responses to transmit as much sensory information as possible to the central nervous system, subject to the limitation in neural cost in representing and transmitting information. This principle has been proposed [27] and successfully applied to the visual system to understand the receptive fields in the early visual pathway [28], [29], [30], [31], [32], [33] (see review [34]). We will borrow heavily from the techniques and intuitions developed for vision to derive and explain the results in this paper.

To make initial progress, it is necessary to start with some simplifying assumptions. First, we assume that the statistical characteristics of the stimulus ensemble do not change more rapidly than the speed at which the sensory encoding adapts, so that the stimulus ensemble can be approximated as being stationary as far as optimal encoding is concerned. Knowing when this assumption does not hold tells us when the encoding is not optimal, e.g., when one sees poorly for a brief moment before the visual encoding adapts to a sudden change from a dark room to a bright garden. Second, for mathematical convenience, we assume that the linear STRF model as in equation (1) can approximate adapted auditory neural responses reasonably well. As we know from above, this assumption often does not hold, particularly for auditory cortical neurons. This paper leaves the extension of the optimal encoding to nonlinear cases for future studies. Third, to derive a closed-form, analytical, solution to the optimal STRF, we assume that the input statistics in the stimulus ensemble can be approximated as being Gaussian, with higher order correlations in the input contributing only negligibly to the inefficiency of the representation in the original sensory inputs. Although it is known that the natural auditory inputs are far from Gaussian [35], as for the case of vision, the discrepancy may have only a limited impact on the input inefficiency, as measured by the amount of information redundancy in the original sensory input [36], [37], [38].

To understand how sensory inputs should be recoded to increase coding efficiency, we start with visual encoding to draw insights and make analogies with auditory encoding. In vision, large amounts of raw data about the visual world are transduced by photoreceptors. However, the optic nerve, which transmits the input data to the visual cortex via the thalamus, can only accommodate a dramatically smaller data rate. It has thus been proposed that early visual processes use an efficient coding strategy to encode as much information as possible given the limited bandwidth [27], [34], in other words, to recode the data such that the redundancy in the data is reduced and consequently the data can be transmitted within the limited bandwidth. Compression (while preserving most information) is possible since images are very redundant [39], [40], [41], [42], e.g., with strong correlations between visual inputs at nearby points in time and space. Removing such correlations can cut down the data rate substantially [34].

One way to remove the correlations is to transform the raw input $S$ into a different representation $O$ in neural responses that would then have a much smaller data rate than $S$, while preserving essential input information. This transform is often approximated by the visual receptive field, analogous to the auditory STRFs. For instance, the (spatial) center-surround receptive fields of the retinal ganglion cells help remove spatial redundancy [30], [31], [43]. They do this by making the ganglion cells preferentially respond to spatial contrast in the input, and so eliminating responses to visual locations whose input is redundant with that of their neighbors. Consequently, the responses of retinal ganglion cells are much less correlated than those of the photoreceptors, making their representation much more efficient. One facet of this efficient encoding hypothesis is that the optimal receptive field transform should depend on the statistical properties, such as the correlation structure and intensity, of the input. This dependence has been used to explain how visual receptive field characteristics, such as the sizes of center-surround regions and the color tuning of retinal neurons, or the ocular dominance properties of striate cortical neurons, adapt to changes in input statistics [32], [34], [44], [45], [46], [47]. In the auditory system, information redundancy is also reduced along the auditory pathway [48]. Although this redundancy reduction was only investigated in the neural responses to sensory inputs rather than in the coding (STRF) transform leading to the neural responses, it suggested that coding efficiency is one of the goals of early auditory processes.

More formally, the efficient coding scheme is depicted in Figure 2A. The input contains sensory signal $S$ and noise $N$ (e.g., input sampling noise). The net input $S + N$ is encoded by a linear transfer function $K$ into output

$$O = K(S + N) + N_o \tag{2}$$

which also contains additional noise $N_o$ introduced in the encoding process. When the input has multiple channels, e.g., many different photoreceptors or hair cells, $S$ is a vector with many components, as indeed is $N$. Output $O$ is a vector representing the neural population responses from many neurons. For output neuron $i$, we have $O_i = \sum_j K_{ij}(S_j + N_j) + (N_o)_i$. Therefore $K$ is a matrix, and its $i$th row $(K_{i1}, K_{i2}, \ldots)$ models the receptive field for output neuron $i$ as the array of effective weights from input receptors $j$ to output neuron $i$. In the particular example when input neurons are photoreceptors and output neurons are retinal ganglion cells, $K_{ij}$ is the effective connection from photoreceptor $j$ to ganglion cell $i$ (implemented via the interneurons in the amacrine cell layers of the retina), and collectively, the $K_{ij}$ for all $j$ describe the linear receptive field of this ganglion cell. We consider the problem of finding an optimal $K$ that maximizes the information extracted by $O$ about $S$, i.e., the mutual information $I(O;S)$ [49] between $O$ and $S$ subject to a given cost of the neural encoding, which depends on the responses in a way we will describe shortly.

Figure 2. Formulation and components of efficient coding.

(A) A schematic plot of the efficient encoding transform. (B) Signal transformation in the auditory system. The cochlea turns the time-varying waveform Inline graphic into a time-frequency representation Inline graphic, as the population activities of the auditory nerves, which is the input to the efficient encoding system. Signal and noise pass through a series of brain nuclei such as cochlear nucleus, superior olive, inferior colliculus, etc. The current work proposes that the effective transform STRF of the spectrogram that is collectively realized by these nuclei is, in its linear form, the optimal filter Inline graphic implied by the efficient coding principle. The output Inline graphic is the activity of neurons in a higher nucleus. (C) Three steps of signal flow within the linear encoding step Inline graphic or STRF in (A) and (B). Note that these three steps are merely abstract algorithmic steps, rather than neural implementation processes for the effective transform Inline graphic or STRF.

Therefore, the optimal linear transform $K$ should minimize the objective function:

$$E(K) = \text{neural cost} - \lambda\, I(O;S) \tag{3}$$

where $I(O;S)$ is the mutual information between the output $O$ and the signal $S$, and $\lambda$ is a parameter whose value specifies a particular balance between the needs to minimize costs and to maximize extracted information. Neural costs can arise from various sources, such as the metabolic energy cost for generating neural activities or spikes [50] and the cost of thicker axons to transmit higher rates of neural firing. We follow a formulation that has been productive in vision [31], [34], and model the neural cost as

$$\text{neural cost} = \sum_i \langle O_i^2 \rangle$$

where $\langle \cdot \rangle$ indicates the average over the stimulus ensemble. This gives

$$E(K) = \sum_i \langle O_i^2 \rangle - \lambda\, I(O;S) \tag{4}$$

It has been shown [29], [33], [51], [34] that the transform $K$ that provides the most efficient coding according to the objective $E(K)$ has the following properties. At high signal-to-noise ratio (SNR), $K$ is such that the output $O$ extracts the difference between correlated channels, and thus avoids transmitting redundant information. Hence, for example, in photopic conditions, retinal ganglion cells have center-surround spatial receptive fields which extract the spatial contrast of the input. By contrast, at low SNR, $K$ is a smoothing filter that averages out input noise instead of reducing redundancy. This avoids spending neural cost on transmitting noise. Hence, for example, in scotopic conditions, when SNR can be considered as being low, the receptive fields of retinal ganglion cells expand the sizes of their center regions and weaken their suppressive surrounds [52]. We will apply this framework to the auditory encoding to understand STRFs and their adaptation to stimulus ensembles.

Methods

Auditory encoding system and its comparison to vision

To apply the efficient coding principle to auditory STRFs, we borrow insights from vision by making an analogy between (aspects of) the auditory and visual systems. For simplicity, we start by ignoring input noise. While sound signals are typically air vibrations over time, at the input sampling stage, they are sampled as Inline graphic from a continuous time-frequency representation Inline graphic, namely the response at time Inline graphic of a hair cell tuned to sound vibration frequency Inline graphic. This is analogous to visual input sampling, in which the response of a photoreceptor at location Inline graphic samples the light signal in the form of electromagnetic vibrations. Auditory hair cells are tonotopically arranged in the cochlea, so that neighboring hair cells are tuned to nearby sound frequencies. Therefore, at any instant Inline graphic , the response pattern Inline graphic as a function of hair cell's location Inline graphic over the cochlea is an auditory “image” of the pattern of powers across sound frequencies, analogous to a retinal image. (In our formulation, we focus on sampling the intensity or power in Inline graphic, and ignore the phase of the sound wave at frequency Inline graphic. This is because (1) auditory nerve responses do not encode the phase except for low frequency inputs via phase locking, and (2), as mentioned, our goal is to understand the STRFs which do not concern the phase information.) While a retinal image is two dimensional in space (and one additional dimension in time), the auditory “image” at any instant Inline graphic is one dimensional in sound frequency Inline graphic. One may use time Inline graphic as the second dimension such that Inline graphic for all Inline graphic and Inline graphic collectively can be seen as a single discrete sample of the two-dimensional auditory “image”. When input noise Inline graphic is included, input Inline graphic becomes Inline graphic.

As for vision, we explore whether the auditory STRFs can be partly understood by the goal of efficiently coding auditory information. The sensory input is sampled as Inline graphic, the responses of the cochlear hair cells. This input is encoded by the STRFs to give rise to outputs Inline graphic as the neural activities of a higher nucleus, such as the inferior colliculus (IC) or the auditory cortex (Figure 2B). The STRF is then analogous to a spatial receptive field, such as that of the retinal ganglion cells. Thus the STRF should be determined by the statistics of the auditory inputs, and in particular, the correlation Inline graphic between different inputs Inline graphic and Inline graphic, where Inline graphic labels a particular spectro-temporal combination of a frequency value Inline graphic and time Inline graphic. Note that for Inline graphic, the frequency Inline graphic or Inline graphic, but not both, in the two indices Inline graphic and Inline graphic may be equal. (Here, for simplicity we assume, or pre-process the signal, such that all inputs have zero mean, i.e., Inline graphic, just like the input signal fluctuation Inline graphic around the ensemble average in the definition of the STRF in equation (1).) As in vision, natural auditory inputs express substantial correlations between inputs of neighboring frequencies and at neighboring temporal instants. When the input SNR is sufficiently high, an optimal STRF should reduce these correlations to achieve efficient transmission. Such an STRF will have neighboring excitatory and inhibitory regions in the frequency-latency domain, making the neuron tuned to spectro-temporal contrast and insensitive to spectro-temporal redundancy.

Auditory STRF filter as an efficient coding transform

The general formulation and derivation of the efficient coding transform $K$ (or STRF) can be found in its application to vision [34]. Here we outline these results and illustrate their consequences for auditory coding. Let $S$ be the input with $n$ input channels:

$$S = (S_1, S_2, \ldots, S_n)^T \tag{5}$$

(superscript T denotes vector or matrix transpose). These Inline graphic input channels may correspond to Inline graphic auditory nerves if we omit the temporal dimension, Inline graphic time instances if we focus on a single frequency channel, or they may correspond to Inline graphic spectro-temporal labels Inline graphic for Inline graphic. Let the input correlation be described by correlation matrix Inline graphic with elements Inline graphic. The optimal transform Inline graphic that minimizes Inline graphic in equation (4) can be decomposed in three steps (Figure 2C): (1) a principal component transform to de-correlate the inputs, (2) gain control of each principal component, (3) an ortho-normal or unitary transform on the array of the gain-controlled components to arrive at various output channels. We now elaborate and elucidate these three steps.

The first step is a coordinate rotation, or ortho-normal transform, Inline graphic, by an ortho-normal matrix Inline graphic that de-correlates the input channels such that each of the channels in the transformed signal Inline graphic contains a principal component of the original signal. We denote these principal components as Inline graphic, with sub-index Inline graphic (instead of Inline graphic) as the indices of the de-correlated channels (later, we also use Inline graphic to denote the de-correlated channels in the temporal domain, or Inline graphic in spectro-temporal domain). Since the correlation between Inline graphic and Inline graphic is Inline graphic, decorrelation between principal components implies that Inline graphic is a diagonal matrix, with Inline graphic, where Inline graphic is the Inline graphic eigenvalue of matrix Inline graphic and also the average signal power of the Inline graphic principal component Inline graphic. As we will see later, when the input correlation Inline graphic depends mainly on the differences Inline graphic in frequency and time, it turns out that Inline graphic (with the index Inline graphic denoting the spectro-temporal modulation frequency Inline graphic) is the amplitude of a dynamic or moving ripple that some experiments use to estimate the STRFs of cortical and midbrain neurons [17], [18], [19], [16], [2].
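The decorrelation step can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical correlation matrix in which correlation decays exponentially with channel separation; the matrix `R`, its size, and the name `K_o` for the ortho-normal matrix are illustrative choices, not values from the paper.

```python
import numpy as np

n = 16
idx = np.arange(n)
# Hypothetical input correlation matrix: correlation decays with channel separation.
R = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 3.0)

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(R)
K_o = eigvecs.T  # each row is one principal axis of the input ensemble

# Correlation matrix of the rotated signal K_o S.
R_rot = K_o @ R @ K_o.T
off_diag = R_rot - np.diag(np.diag(R_rot))
```

After the rotation, the off-diagonal correlations vanish to numerical precision and the diagonal holds the eigenvalues, i.e., the average signal power of each principal component.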

The second step is gain control $S_k \rightarrow g_k S_k$ on each principal component $S_k$, giving output $O_k = g_k S_k$. Including noise $N_k$, which is the original input noise $N$ projected to the $k$th channel by the decorrelating transform $K_o$ of the first step, and the encoding noise $(N_o)_k$ (in the decorrelated $k$ space), the total output becomes $O_k = g_k (S_k + N_k) + (N_o)_k$. It can be shown (see [34]) that the gain $g_k$ that minimizes $E(K)$ in equation (4) is determined by the input signal-to-noise ratio $\langle S_k^2 \rangle / \langle N^2 \rangle$ to satisfy

$$g_k^2 \propto \max\left\{ \frac{1}{2\left(1 + \langle N^2 \rangle / \langle S_k^2 \rangle\right)} \left[ 1 + \sqrt{1 + \frac{2\lambda}{(\ln 2)\,\langle N_o^2 \rangle} \frac{\langle N^2 \rangle}{\langle S_k^2 \rangle}} \right] - 1,\; 0 \right\} \tag{6}$$

where $\langle N^2 \rangle$ is the variance of $N_k$, and also of the input noise $N_i$ (assumed to be independent, identically distributed and Gaussian in each channel), and $\langle N_o^2 \rangle$ is the variance of the encoding noise $(N_o)_k$ in each channel $k$ (and of the encoding noise $(N_o)_i$ in each $i$ since different encoding noise channels are also assumed to be independently and identically distributed).
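How the gain depends on the signal-to-noise ratio can be sketched numerically. The example below is only an illustration under stated assumptions: it takes the closed-form gain familiar from the vision literature [34], with the prefactor $\langle N_o^2 \rangle / \langle N^2 \rangle$ obtained by setting the derivative of the per-channel objective to zero, and it assumes a hypothetical signal power spectrum falling as $1/k^2$. The parameter values and names are not from the paper.

```python
import numpy as np

def optimal_gain_sq(signal_power, noise_power, enc_noise_power, lam):
    """Squared gain per decorrelated channel (assumed closed form, see lead-in)."""
    snr_inv = noise_power / signal_power
    root = np.sqrt(1.0 + 2.0 * lam / (np.log(2.0) * enc_noise_power) * snr_inv)
    bracket = 0.5 / (1.0 + snr_inv) * (1.0 + root) - 1.0
    # Channels whose SNR is too low are switched off (gain clipped at zero).
    return (enc_noise_power / noise_power) * np.maximum(bracket, 0.0)

k = np.arange(1, 65)          # index of decorrelated channel (modulation frequency)
spectrum_shape = 1.0 / k**2   # hypothetical power-law signal spectrum

# Same spectrum shape at a high and a low overall input intensity.
g_high = optimal_gain_sq(1000.0 * spectrum_shape, 1.0, 1.0, lam=20.0)
g_low = optimal_gain_sq(1.0 * spectrum_shape, 1.0, 1.0, lam=20.0)

band_pass_peak = int(np.argmax(g_high))  # peak away from the lowest channel
low_pass_peak = int(np.argmax(g_low))    # peak at the lowest channel
```

With these toy values the high-intensity gain profile peaks at an intermediate channel (band-pass), while the low-intensity profile peaks at the lowest channel and vanishes at high channels (low-pass).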

Note that the total noise at output neuron $i$ is $\sum_j K_{ij} N_j + (N_o)_i$. One effect of the encoding transform $K$ is that noise corrupting different output neurons can be correlated, even when the original input noise is independent. The additional encoding noise $N_o$ could also be correlated in different output neurons, since it could also reflect a common origin in intermediate stages of the encoding processes. Our assumption of independence between $(N_o)_i$ and $(N_o)_j$ for $i \neq j$ is thus a simplification for mathematical convenience.

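Since all the variables are assumed to be Gaussian, each output $O_k$ (with gain $g_k$, signal power $\langle S_k^2 \rangle$, input noise power $\langle N^2 \rangle$ and encoding noise power $\langle N_o^2 \rangle$) extracts the following amount of information

$$I(O_k; S_k) = \frac{1}{2} \log_2 \frac{g_k^2 \left( \langle S_k^2 \rangle + \langle N^2 \rangle \right) + \langle N_o^2 \rangle}{g_k^2 \langle N^2 \rangle + \langle N_o^2 \rangle}$$

about the input $S_k$ and has an output power $\langle O_k^2 \rangle = g_k^2 \left( \langle S_k^2 \rangle + \langle N^2 \rangle \right) + \langle N_o^2 \rangle$. Since different output channels $O_k$ from different $k$ are decorrelated from each other, the quantity $E(K)$ in equation (4) is

$$E(K) = \sum_k \left[ \langle O_k^2 \rangle - \lambda\, I(O_k; S_k) \right] = \sum_k \left[ g_k^2 \left( \langle S_k^2 \rangle + \langle N^2 \rangle \right) + \langle N_o^2 \rangle - \frac{\lambda}{2} \log_2 \frac{g_k^2 \left( \langle S_k^2 \rangle + \langle N^2 \rangle \right) + \langle N_o^2 \rangle}{g_k^2 \langle N^2 \rangle + \langle N_o^2 \rangle} \right] \tag{7}$$

One can then verify that $g_k$ in equation (6) indeed minimizes this $E(K)$ since $\partial E(K) / \partial g_k = 0$ at that value. Note that if $S_k$ is the amplitude of a moving ripple indexed by $k$, $g_k$ will be the sensitivity of the neuron to the moving ripple.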
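The claim that the closed-form gain minimizes the per-channel objective can be checked by brute force. The sketch below assumes the per-channel objective is output power minus $\lambda$ times the Gaussian mutual information in bits, and it assumes the closed-form gain with prefactor $\langle N_o^2 \rangle / \langle N^2 \rangle$; all numerical values and function names are hypothetical.

```python
import numpy as np

def objective(g_sq, s, n, m, lam):
    """Per-channel objective: output power minus lam times information (bits)."""
    out_power = g_sq * (s + n) + m
    info = 0.5 * np.log2(out_power / (g_sq * n + m))
    return out_power - lam * info

def closed_form_gain_sq(s, n, m, lam):
    """Assumed closed-form minimizer of the per-channel objective."""
    root = np.sqrt(1.0 + 2.0 * lam / (np.log(2.0) * m) * n / s)
    bracket = 0.5 / (1.0 + n / s) * (1.0 + root) - 1.0
    return (m / n) * max(bracket, 0.0)

n, m, lam = 1.0, 1.0, 20.0          # input noise, encoding noise, trade-off parameter
grid = np.linspace(0.0, 5.0, 200001)  # candidate squared gains

# Moderately high SNR channel: the optimum is a positive gain.
g_grid_high = grid[np.argmin(objective(grid, 4.0, n, m, lam))]
g_formula_high = closed_form_gain_sq(4.0, n, m, lam)

# Very low SNR channel: the optimum is to switch the channel off.
g_grid_low = grid[np.argmin(objective(grid, 0.01, n, m, lam))]
g_formula_low = closed_form_gain_sq(0.01, n, m, lam)
```

The grid search and the closed form agree in both regimes for these toy values, including the case in which the gain is clipped at zero.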

We can write these two steps as the product $g K_o$, where $K_o$ is the principal component transform, and $g$ performs the gain control. $g$ is a diagonal matrix with diagonal elements $g_{kk} = g_k$. The net output is then $O = g K_o (S + N) + N_o$. Consider imposing on this transform an orthonormal or unitary transform $U$ (with $U U^{\dagger} = 1$), the third step in building the efficient coding filter $K$, giving $K = U g K_o$. It follows [34] from the properties of unitary matrices that neither the first term nor the second term in $E(K)$ in equation (4) will be affected by $U$ (at least when signal and noise are Gaussian and when the components of $N_o$ are independent and identically distributed).
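The invariance under the third step can be checked numerically for Gaussian variables. The following is a minimal sketch, assuming independent, identically distributed input and encoding noise; the random signal correlation matrix, the noise variances, and the stand-in transform `K1` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
R_S = A @ A.T / n              # hypothetical signal correlation matrix
noise_var, enc_var = 0.5, 0.2  # hypothetical input and encoding noise variances

def cost_and_info(K):
    """Total output power and Gaussian mutual information (bits) for transform K."""
    R_out = K @ (R_S + noise_var * np.eye(n)) @ K.T + enc_var * np.eye(n)
    R_noise = noise_var * (K @ K.T) + enc_var * np.eye(n)
    cost = np.trace(R_out)
    # slogdet returns (sign, log|det|); both matrices are positive definite.
    info = 0.5 * (np.linalg.slogdet(R_out)[1] - np.linalg.slogdet(R_noise)[1]) / np.log(2.0)
    return cost, info

K1 = rng.standard_normal((n, n))                  # stands in for the first two steps
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # a random orthonormal matrix

cost1, info1 = cost_and_info(K1)
cost2, info2 = cost_and_info(U @ K1)
```

Both the cost and the extracted information are unchanged by `U`, so the choice of the third step is free as far as the objective is concerned and can be used to shape the receptive fields.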

Each row vector of the matrix Inline graphic determines the receptive field of a particular output channel or neuron. Without Inline graphic, Inline graphic would specify receptive fields that would be gain controlled eigenvectors or principal components of the input correlation matrix. For example, they would look like ripples covering the entire spectro-temporal range. An appropriate choice of non-trivial Inline graphic will alter the receptive field shape dramatically, giving rise to receptive field properties found in real neurons such as a finite span in input channel space. For example, if we consider only the input frequency channels Inline graphic for auditory inputs and omit the time dimension, we may prefer the STRF of an output neuron to be selective to only a finite band of input frequencies such that the neural responses Inline graphic resemble peripheral inputs Inline graphic while maintaining coding efficiency. It can be shown [34], [35] that this can be achieved by choosing Inline graphic, such that Inline graphic. We will use this choice, Inline graphic, in building our STRF in the frequency domain. However, the critical feature of the STRF is insensitive to the exact form of Inline graphic: it comes from the gain Inline graphic specified in the second step of the encoding model (as long as one does not impose additional computational goals that may restrict the final STRFs, see Discussion). We will show later that Inline graphic often corresponds to the modulation transfer functions (MTFs, also called ripple transfer functions, RTFs, in parts of the literature) of the STRFs.

We now apply this general framework to the case of auditory encoding. Sound spectrogram Inline graphic is derived from the sound waveform Inline graphic as follows. The first step is to perform a temporally-windowed Fourier transform of Inline graphic to obtain the sound spectrum Inline graphic as a function of time, where Inline graphic is a temporal window function (e.g., Inline graphic for Inline graphic, Inline graphic otherwise). Since the cochlea performs approximately a log scale frequency analysis, we first let Inline graphic to obtain Inline graphic (although the more accurate form would be Inline graphic [53]). Then the input power in Inline graphic is Inline graphic. One may employ a further logarithmic transform Inline graphic to characterize the cochlear response better (through capturing the compressive input/output transform realized by processes in the basilar membrane and hair cells) [54], [55]. However, this further logarithmic transform is not essential for our formulation, and, as pointed out previously [56], it does not significantly affect the qualitative characteristics of the empirical STRFs. If one omits this logarithmic transform, then Inline graphic. We then subtract the mean Inline graphic from Inline graphic, and, for simplicity, denote the resulting zero mean signal still by Inline graphic, as in the definition of STRF. We next consider discrete samples Inline graphic of the continuous Inline graphic. This leads to the input correlation matrix Inline graphic.
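The preprocessing just described can be sketched as follows. This is only an illustration under stated assumptions: the window is rectangular, the frequency axis is left linear (the logarithmic frequency mapping is omitted), the mean is taken over time separately for each frequency channel, and the function name, window length, hop size and sampling rate are all hypothetical.

```python
import numpy as np

def spectrogram(x, fs, win_len=256, hop=128, log_compress=False):
    """Zero-mean power spectrogram from a rectangular-windowed Fourier transform.

    Returns (freqs, S) with S of shape (n_freq, n_frames).
    """
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop: i * hop + win_len] for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)       # windowed Fourier transform
    power = np.abs(spectrum) ** 2                # input power per frequency bin
    if log_compress:
        power = np.log(power + 1e-12)            # optional compressive transform
    # Assumption: the mean is removed per frequency channel, averaged over time.
    power = power - power.mean(axis=0, keepdims=True)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return freqs, power.T

rng = np.random.default_rng(0)
fs = 16000
x = rng.standard_normal(fs)        # one second of white noise as a test waveform
freqs, S = spectrogram(x, fs)

# Discrete samples of S, flattened to vectors, give the input correlation matrix.
# Here only the spectral part is used: one correlation value per pair of channels.
R = S @ S.T / S.shape[1]
print(S.shape, R.shape)
```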

Finally, we follow the three encoding steps above to obtain the optimal encoding transform as Inline graphic. In the sub-section “The spectral filter SRF”, we discuss the simple case in which the temporal dimension Inline graphic is omitted. Then, the input vector (equation (5)) is Inline graphic, and the input correlation matrix is Inline graphic. The efficient encoding procedure specifies the optimal spectral receptive field (SRF) Inline graphic for neuron Inline graphic, with Inline graphic. When the temporal dimension is included Inline graphic, Inline graphic, and efficient coding specifies the optimal STRF as input weights or selectivity associated with the spectrogram Inline graphic.

It is apparent that the optimal SRF and STRF depend on input statistics via the input correlation Inline graphic and the input SNR (through the steps 1 and 2 in the encoding scheme). Therefore, when the stimulus ensemble changes, altering the input correlations and signal intensity, the form of the encoding receptive field should adapt in order to maintain encoding optimality. We propose that it is this that explains the input ensemble dependence of the STRFs.

A special class of input statistics has translation invariant correlations, i.e., with Inline graphic depending only on the differences Inline graphic (quantified in octaves) and Inline graphic. This is a reasonable approximation of the input correlations in natural auditory scenes under two conditions. The first is that a local frequency range is considered that is not much larger than the range of the frequencies to which a neuron is sensitive, i.e., in the perspective of a neuron, the dependence of Inline graphic on the frequency is mainly through Inline graphic. This is analogous to approximating spatial correlation of visual inputs as translation invariant to understand the retinal ganglion cell's spatial receptive fields although the spatial sampling density varies substantially with input eccentricity [31], [34]. The second is that the environment is statistically stationary, as then the correlations in time depend only on the temporal difference Inline graphic. It can then be shown that [34] the principal components are Inline graphic, each of which has a 2D modulation frequency Inline graphic, which can be indexed by Inline graphic. The first encoding step is then a 2D Fourier transform Inline graphic of the input Inline graphic to obtain Inline graphic. Meanwhile, the original input can be written as Inline graphic, i.e., as a weighted sum of the moving ripples [19]. The second encoding step determines the gains for the ripple amplitudes Inline graphic [34] as

graphic file with name pcbi.1002123.e271.jpg (8)

i.e., replacing Inline graphic and Inline graphic in equation (6) by the corresponding Inline graphic and Inline graphic. If Inline graphic is chosen as the inverse Fourier transform

graphic file with name pcbi.1002123.e277.jpg (9)

with an extra phase function Inline graphic, then the encoding transform is Inline graphic. This gives

graphic file with name pcbi.1002123.e280.jpg (10)

which depends only on the differences Inline graphic and Inline graphic. Applying this transform to input Inline graphic to give output Inline graphic, we see, by comparison with equation (1), that the STRF is Inline graphic. This is a temporal filter tuned to sound frequency with a tuning pattern governed by Inline graphic, and centered around frequency Inline graphic. Changing the center frequency from Inline graphic to Inline graphic is like shifting from one output neuron Inline graphic to another neuron Inline graphic. Altering the phase Inline graphic in equation (9) alters the STRF shape, in particular to ensure its temporal causality. In physiology, the modulation transfer function (MTF) is often defined as the Fourier transform of the auditory receptive field [19]. Therefore, it is clear from equation (10) that the gain profile Inline graphic, which is determined by efficient coding, corresponds to the magnitude of the MTF. However, the shape of an STRF is determined by the phase as well as the magnitude of the MTF, and efficient coding does not strongly constrain the phase. Therefore, while we will illustrate the general properties of some example STRFs predicted by the theory by choosing particular Inline graphic transforms (governed by the additional requirements of spectro-temporal locality and causality), in the Results, we will generally compare physiological data to the magnitudes of the MTFs that the theory predicts.
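The statement that translation invariant correlations have Fourier modes as principal components can be verified in a small example. The sketch below is one-dimensional for brevity and assumes periodic boundary conditions, so that the correlation matrix is circulant; the Gaussian correlation profile and its width are hypothetical choices.

```python
import numpy as np

n = 64
lags = np.minimum(np.arange(n), n - np.arange(n))   # circular distance |i - j|
corr = np.exp(-(lags / 4.0) ** 2)                   # hypothetical correlation vs. difference

# Translation invariant correlation matrix: entry (i, j) depends only on i - j.
R = np.array([[corr[(i - j) % n] for j in range(n)] for i in range(n)])

# Power in each Fourier mode: the Fourier transform of the correlation function.
power = np.fft.fft(corr).real

# A complex ripple with modulation index k is an eigenvector of R,
# and its eigenvalue is the power at that modulation frequency.
k = 5
ripple = np.exp(2j * np.pi * k * np.arange(n) / n)
residual = np.linalg.norm(R @ ripple - power[k] * ripple)
print(residual)
```

The same argument applies in two dimensions with a 2D Fourier transform, where the eigenvectors become moving ripples.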

In the Results, we will discuss the efficient coding framework for situations both with (e.g., to study temporal aspects of STRFs) and without (e.g., to study their spectral aspects) translation invariance in input statistics.

Results

To illustrate how the framework explains and predicts physiological experiments, we first discuss a few examples when the temporal or the spectral dimension is omitted, and then show a full spectro-temporal STRF.

The spectral filter SRF

We first omit time, treating the input Inline graphic as varying only in frequency. In this case, the encoding filter reduces from being an STRF to an SRF. We take Inline graphic as one of 250 discrete values Inline graphic, from low to high frequencies; hence input Inline graphic is a one dimensional vector Inline graphic. In simulations, input sample Inline graphic is generated by smoothing a random noise vector Inline graphic (Figure 3A), with all the components Inline graphic taken to be independent, zero mean, unit variance, Gaussian noise. Specifically

graphic file with name pcbi.1002123.e303.jpg (11)

where Inline graphic is a factor to scale the overall input power intensity, and Inline graphic is the smoothing matrix with elements

graphic file with name pcbi.1002123.e306.jpg (12)

explained in detail below. Here Inline graphic controls the scale of the signal Inline graphic, which decays with Inline graphic (like in an environment in which high frequency sounds do not propagate well), and Inline graphic is a normalized smoothing matrix with elements Inline graphic, in which

graphic file with name pcbi.1002123.e312.jpg (13)

Inline graphic is a normalization constant, and Inline graphic controls the range of frequency difference Inline graphic for significant correlation coefficient between the variation of Inline graphic and that of Inline graphic.

Figure 3. Simulation of the efficient spectral kernel SRF, when the temporal dimension is omitted.

(A) 250 samples of input spectra Inline graphic, each of which is smoothed Gaussian white noise in the frequency domain (equations (11–13), Inline graphic). (B) Correlation between different frequency channels Inline graphic. Left: Correlation Inline graphic; Right: a zoomed-in view, as Inline graphic vs Inline graphic. (C) Ten examples of eigenvectors Inline graphic of the correlation matrix Inline graphic in B; each is an independent component in Inline graphic. Smaller indices Inline graphic are associated with larger eigenvalues. (D) Gain profile (peaking at Inline graphic), and signal and noise power in decorrelated channels. (E) Four examples (Inline graphic, Inline graphic, Inline graphic, and Inline graphic) of spectral receptive fields Inline graphic; each prefers input frequencies around Inline graphic.

Consequently, each Inline graphic is also a zero mean Gaussian random variable, and the input correlations comprise a 250×250 matrix Inline graphic. One could also estimate Inline graphic from input samples Inline graphic (as when animals adapt their auditory system to environmental sound through experience), in which case element Inline graphic. Figure 3B illustrates Inline graphic for Inline graphic, obtained numerically from the 250 samples of Inline graphic in Figure 3A (one could of course use more than 250 samples to estimate Inline graphic). The correlation Inline graphic scales with strengths of the original signals Inline graphic and Inline graphic through the scales Inline graphic and Inline graphic, and so decays with frequency Inline graphic and Inline graphic. Thus the statistics of the stimulus ensemble are not translation invariant in the spectral frequency Inline graphic. Nevertheless, the correlation coefficient

graphic file with name pcbi.1002123.e352.jpg

does depend mainly on the (frequency) difference Inline graphic, since Inline graphic is almost independent of Inline graphic and Inline graphic depends mainly on Inline graphic except for the very small or very large Inline graphic and Inline graphic. This is evident in the fact that the rate of decay of Inline graphic with the difference Inline graphic in Figure 3B is almost constant. Since the stimulus ensemble is not translation invariant, we will use the general formulation to obtain the SRF. From Inline graphic, we obtain its 250 eigenvalues and the corresponding eigenvectors. Each of these is a vector with 250 components. We list them in the order of descending eigenvalues, denoting the Inline graphic eigenvector as Inline graphic, and placing it as the Inline graphic row vector of the Inline graphic transform matrix. Figure 3C depicts the eigenvectors for Inline graphic, where smaller Inline graphic is associated with a larger eigenvalue. Each principal component or eigenvector can be seen as a special input spectrum pattern Inline graphic, while a general input Inline graphic is a linear sum of the principal components with weights Inline graphic. The first encoding step is thus a transformation of the original input Inline graphic by Inline graphic to obtain the decorrelated signal Inline graphic, for Inline graphic. The average power in Inline graphic is the Inline graphic eigenvalue of matrix Inline graphic

graphic file with name pcbi.1002123.e379.jpg

The eigenvectors look roughly like oscillating waveforms (spectral oscillations) with different oscillation rates, and are comparable to the sinusoidal bases in the Fourier transform. They also resemble the “ripples” used in physiological experiments. This is because the input correlations are roughly translation invariant, at least within a small range of frequencies in which the signal power Inline graphic is roughly independent of Inline graphic (just like in vision when the statistics of inputs sampled at the retina can be seen as roughly translation invariant within a local region). Also note that smaller or larger Inline graphic is associated with eigenvectors with fewer or more oscillations. This makes Inline graphic relate monotonically to the spectral modulation frequency (corresponding to the “ripple frequency” Inline graphic in physiological experiments). Larger eigenvalues, i.e., larger signal powers Inline graphic, are associated with fewer spectral modulations or smaller indices Inline graphic, because inputs of more similar sound frequencies are more correlated with each other, i.e., Inline graphic decreases with increasing Inline graphic. The analogy between the eigenvectors and the Fourier bases can be understood as follows: if Inline graphic is strictly translation invariant, then the eigenvectors are sine waves with different spectral modulation frequencies Inline graphic. The eigenvalues are the Fourier transforms of Inline graphic, and hence they decrease with the modulation frequency Inline graphic because Inline graphic is non-negative and decreases with increasing Inline graphic.
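The construction of the stimulus ensemble and its principal components can be sketched as follows. Since equations (11)–(13) are not reproduced in this text, the specific forms below are assumptions made only for illustration: an exponential decay of the signal scale with channel index, a Gaussian smoothing profile normalized so that each row has unit norm, and hypothetical parameter values. The sketch also uses 5000 samples rather than 250 to estimate the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 250
idx = np.arange(n)

A = 10.0        # overall intensity scale (hypothetical value)
decay = 100.0   # hypothetical decay constant of the signal scale with channel index
sigma = 5.0     # hypothetical smoothing range, in channels

# Normalized smoothing matrix (assumed Gaussian), scaled by a decaying factor.
smooth = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2 * sigma ** 2))
smooth /= np.linalg.norm(smooth, axis=1, keepdims=True)   # unit-norm rows
M = np.exp(-idx / decay)[:, None] * smooth

# Exact correlation matrix, and an estimate from samples S = A * M * (white noise).
R_exact = A ** 2 * (M @ M.T)
samples = A * (M @ rng.standard_normal((n, 5000)))
R_est = samples @ samples.T / samples.shape[1]

# Principal components, listed in the order of descending eigenvalues.
eigvals, eigvecs = np.linalg.eigh(R_exact)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
K_o = eigvecs[:, order].T            # row k is the k-th eigenvector

# First encoding step: the power in each decorrelated channel is its eigenvalue.
channel_power = np.diag(K_o @ R_exact @ K_o.T)
print(eigvals[:5])
```

Plotting the first few rows of K_o against channel index is one way to inspect how the number of oscillations changes with the index.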

The second encoding step is to assign the gain Inline graphic to each of these channels Inline graphic according to equation (6), giving Inline graphic (see Figure 3D; Inline graphic, Inline graphic and Inline graphic). Note that while the signal power Inline graphic decreases with increasing Inline graphic, the gain magnitude Inline graphic first increases with Inline graphic and then decreases and drops to zero at higher Inline graphic.

The gain for small Inline graphic is low since the SNR Inline graphic is high enough to make amplifying Inline graphic less necessary. From equation (6) [34],

graphic file with name pcbi.1002123.e409.jpg (14)

This implies that Inline graphic for sufficiently large SNRs. When each principal component Inline graphic is a modulation frequency mode, this gain profile Inline graphic is often called whitening. At smaller signal powers, the gain increases so as to utilize the channel's dynamic range fully. However, when SNR is too small, for example, when noise power is higher than signal power Inline graphic, gain decreases with decreasing Inline graphic [34]. This is because such input components are dominated by noise, and amplifying noise increases neural cost. Thus, in general, when Inline graphic decreases with increasing Inline graphic, the gain profile has a band-pass shape, first increasing, and then decreasing with increasing Inline graphic (see the red curve in Figure 3D). The peak of the gain occurs at Inline graphic, where Inline graphic.
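The qualitative shape of the gain can be illustrated with a simple stand-in. The function below is not equation (6): it is a commonly used approximation that multiplies a whitening factor by a Wiener-like noise-suppression weight, and it is used here only because it shows the same three regimes (whitening at high SNR, a peak at intermediate SNR, and falling gain at low SNR). With this particular stand-in the peak falls where signal power equals noise power; the name `stand_in_gain` and all values are hypothetical.

```python
import numpy as np

def stand_in_gain(signal_power, noise_power):
    """Whitening factor S^(-1/2) times a Wiener-like weight S / (S + N).

    A hypothetical stand-in for the gain, NOT equation (6) of the paper.
    """
    return signal_power ** -0.5 * signal_power / (signal_power + noise_power)

noise_power = 1.0
signal_power = np.logspace(4, -4, 401)   # decreasing, as signal power does with index k
gain = stand_in_gain(signal_power, noise_power)

# High SNR: gain^2 * signal power approaches a constant (whitening).
output_signal_power = gain ** 2 * signal_power

# The gain first rises and then falls as signal power decreases (band-pass).
peak_signal_power = signal_power[np.argmax(gain)]
print(output_signal_power[0], peak_signal_power)
```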

Third, taking Inline graphic in order to localize the receptive fields as well as possible, the overall encoding transform is Inline graphic. Here, the gain matrix is diagonal, with elements Inline graphic. When Inline graphic (as when the eigenvectors are real and orthonormalized)

graphic file with name pcbi.1002123.e424.jpg

As the overall encoding transform gives outputs Inline graphic, where Inline graphic, the Inline graphic output neuron Inline graphic has its SRF as a vector of weights for inputs Inline graphic of various frequencies Inline graphic

graphic file with name pcbi.1002123.e431.jpg

It can thus be seen as a weighted sum of the eigenvectors Inline graphic of the input correlation matrix, with weights Inline graphic for output neuron Inline graphic. Figure 3E shows SRFs for four different output neurons (or channels Inline graphic). These SRFs have different preferred frequencies Inline graphic, so that the preferred frequencies of all the output neurons span the whole input frequency range. The shapes of the SRF depend on the input statistics via the dependence of Inline graphic and Inline graphic on the input correlation matrix Inline graphic. In particular, for sufficiently high input SNR, while a neuron is excited by its preferred frequency, it is suppressed by nearby frequencies. This form of contrast enhancement achieves a measure of decorrelation between neighboring output neurons that would otherwise reflect the strong correlations between neighboring frequencies. For SRFs tuned to higher frequencies, the center excitatory regions are larger and the surround suppression is weaker. This is because SNRs are weaker for higher frequency inputs (the dependency of SRF on SNR will be discussed in the next sub-section). If the input statistics are strictly translation invariant, the SRFs for different output channels will have the same shape, and will just be centered on different frequencies.
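The three steps can be assembled into SRFs as in the following sketch. It is an illustration under assumptions, not the authors' simulation: the ensemble uses assumed forms for equations (11)–(13) (exponentially decaying scale, Gaussian smoothing), the gain is a stand-in rather than equation (6), and all parameter values are hypothetical. The third step uses the transpose of the principal component matrix, as in the text.

```python
import numpy as np

n = 250
idx = np.arange(n)
A, decay, sigma, noise = 10.0, 100.0, 5.0, 0.01   # all hypothetical values

# Assumed ensemble: decaying scale times a normalized Gaussian smoothing matrix.
smooth = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2 * sigma ** 2))
smooth /= np.linalg.norm(smooth, axis=1, keepdims=True)
M = np.exp(-idx / decay)[:, None] * smooth
R = A ** 2 * (M @ M.T)

# Step 1: principal components in descending order of eigenvalue.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
S_k = np.clip(eigvals[order], 0.0, None)   # clip tiny negative rounding errors
K_o = eigvecs[:, order].T

# Step 2: gain per decorrelated channel (stand-in, NOT equation (6)).
g = np.sqrt(S_k) / (S_k + noise)

# Step 3: multiplex with U = K_o^T, so that K = K_o^T g K_o.
K = K_o.T @ np.diag(g) @ K_o

# Row i of K is the SRF of output neuron i: weights over input channels j.
srf_60 = K[60]
preferred = int(np.argmax(srf_60))
print(preferred, srf_60.min(), srf_60.max())
```

By construction, K is symmetric and has the principal components as its eigenvectors, with the gains as eigenvalues; this is what makes each row a gain-weighted sum of the eigenvectors.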

Adaptation of SRF to input signal-to-noise ratio

When sound intensity decreases, the basilar membrane in the cochlea undergoes a smaller vibration. This decreases the magnitudes of input signals Inline graphic, and so, if the level of the noise stays unchanged, the signal-to-noise ratio Inline graphic will decrease. This will change the optimal encoding gain Inline graphic via equation (6), and thus change the final SRFs. In our example, we simulate the change in input intensity by changing Inline graphic in equation (11).

Figure 4A shows three example input intensity profiles Inline graphic, and the corresponding gain profiles Inline graphic. While an overall change of input intensity merely scales the profile Inline graphic up and down, the gain profile Inline graphic does not trivially scale up and down. When input intensity decreases, the Inline graphic at which Inline graphic becomes smaller, thereby decreasing the Inline graphic at which Inline graphic peaks. Consequently, the gain profile turns from being band-pass to being low-pass (Figure 4A).
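The shift from band-pass to low-pass can be reproduced with a toy calculation. The sketch below assumes a hypothetical exponentially decreasing signal power profile and the same stand-in gain as a Wiener-weighted whitening filter, which is not equation (6); the three intensity values are arbitrary.

```python
import numpy as np

def stand_in_gain(signal_power, noise_power):
    """Hypothetical stand-in for the gain, NOT equation (6) of the paper."""
    return np.sqrt(signal_power) / (signal_power + noise_power)

k = np.arange(250)
spectrum_shape = np.exp(-k / 20.0)   # hypothetical decreasing signal power profile
noise_power = 1.0

# Scaling the input intensity scales the signal power profile up or down,
# but the index at which the gain peaks moves rather than staying fixed.
peak_index = {}
for label, intensity in [("high", 100.0), ("medium", 10.0), ("low", 0.5)]:
    gain = stand_in_gain(intensity * spectrum_shape, noise_power)
    peak_index[label] = int(np.argmax(gain))

print(peak_index)
```

In the lowest-intensity case the signal power is below the noise power in every channel, so the stand-in gain is largest at the lowest index, which is the low-pass regime.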

Figure 4. The effect of signal-to-noise ratio (SNR) on gain Inline graphic and the spectral receptive field (SRF).

Same stimulus ensemble as in Figure 3A except the overall SNR has been scaled by Inline graphic. (A) Gain control (red), signal (blue), and noise power (black) under high, medium and low SNR. (B) The corresponding SRFs of one output neuron (channel #120) in the three SNR cases.

The non-zero gain at higher Inline graphic implies sensitivity to weaker principal components with more spectral oscillations (or higher “ripple frequencies”). Thus, as input intensity decreases, the overall SRF filter changes in two ways (Figure 4B): (1) it fluctuates less (i.e., has fewer excitatory and inhibitory regions, and with decreased strength inhibitory regions); (2) the width of the excitatory and inhibitory regions increases, as the result of losing contributions from spectral modulations Inline graphic with higher modulation frequencies.

The insights from Figure 4B can help to understand the difference between the four SRFs in Figure 3E. Given the Inline graphic as in Figure 3, one may divide the whole sound frequency range into two ranges of equal bandwidth, one for the lower and the other for the higher Inline graphic's, and treat the two ranges as if they were two different stimulus ensembles. If one ignores the overall sound frequency difference between these two ensembles, then these two ensembles differ from each other only in their SNRs, with a higher SNR for the ensemble for the lower sound frequencies Inline graphic. From this perspective, one can understand why an SRF tuned to the lower frequencies in Figure 3E has a narrower excitatory region and a stronger surround suppression than an SRF tuned to higher frequencies, using the insights gained from Figure 4. (In comparing Figure 4B with Figure 3E, one should note that each SRF in Figure 4B is depicted by zooming to the frequency region around the preferred frequency Inline graphic of the SRF.) One may even view the four SRFs in Figure 3E as if they were each exposed to one of the four different stimulus ensembles that differ in SNRs (and in sound frequency Inline graphic, a difference we ignore). Within each of these stimulus ensembles, the input statistics may be seen as approximately translation invariant, since Inline graphic is almost independent of Inline graphic and the correlation Inline graphic is approximately only a function of the frequency difference Inline graphic within a small range of frequency Inline graphic.

Adaptation of SRF to input signal correlation

As well as adapting to the input SNR, the SRF can adapt to the signal correlations in the input. These can also vary across auditory environments. We generate two stimulus ensembles (Inline graphic and Inline graphic) based on equation (11), with short and long range (in frequency space) correlations between inputs Inline graphic and Inline graphic of different sound frequencies. We do this by setting the smoothing length Inline graphic in equation (13) to be Inline graphic and Inline graphic. Since short and long range correlations give respectively smaller and larger correlations or degrees of input redundancy, in this paper, we use the terms short/long-range and small/large correlations interchangeably. The two stimulus ensembles are made to have the same overall signal power Inline graphic, and consequently their Inline graphic vs. Inline graphic curves cross each other at a particular frequency Inline graphic (Figure 5A). In Inline graphic, signal power Inline graphic is more concentrated in lower Inline graphic's, and the “bandwidth” of gain, i.e., the range of Inline graphic's with substantial Inline graphic, is consequently narrower.

Figure 5. Adaptation of gain Inline graphic and spectral filter kernel SRF to input correlations under high/low SNR.

Same input ensemble as that in Figure 3A, except that the smoothing parameter, Inline graphic and Inline graphic, are set for short and long range correlations, respectively. Analogous figure format as in Figure 4, with added illustrations of the adaptation to input correlations. The thick and thin curves correspond to quantities for inputs with large and small correlations respectively, blue/red curves plot signal power Inline graphic and gain Inline graphic respectively.

If Inline graphic at Inline graphic, the Inline graphic at which signal power Inline graphic is larger in Inline graphic (Figure 5A, upper panel, Inline graphic, Inline graphic = 1, Inline graphic). Thus, the frequency Inline graphic at which gain Inline graphic peaks is also larger in Inline graphic. If the SNR is lower, so that Inline graphic at Inline graphic, then Inline graphic is instead smaller in Inline graphic than in Inline graphic. However, this is less apparent since gain profiles in both ensembles become “low-pass” in Inline graphic, implying that there is no obvious “peak position” (Figure 5A, lower panel, Inline graphic). Nevertheless, the cutoff frequency Inline graphic where Inline graphic is always smaller for Inline graphic (Figure 5A), and the optimal SRFs for it consequently enjoy a greater spectral extent (i.e., the SRFs are non-zero for a larger range of Inline graphic; Figure 5B). The intuition for this effect is that, to be effective as either a contrast enhancing filter at a high SNR or a smoothing filter at a low SNR, the SRF's spectral extent should match the range of the input correlations.
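How the correlation range reshapes the signal power profile can be seen in a small translation invariant example. The sketch assumes periodic boundaries, a Gaussian correlation profile, and two hypothetical correlation ranges (2 and 8 channels); both ensembles have unit variance in every channel and therefore the same total signal power. The noise level is also a hypothetical value.

```python
import numpy as np

n = 256
lags = np.minimum(np.arange(n), n - np.arange(n))   # circular channel distance

def power_spectrum(corr_range):
    """Signal power per modulation index for a Gaussian correlation profile."""
    corr = np.exp(-lags ** 2 / (2.0 * corr_range ** 2))   # unit variance per channel
    return np.fft.fft(corr).real

S_short = power_spectrum(2.0)   # short range correlations (hypothetical value)
S_long = power_spectrum(8.0)    # long range correlations (hypothetical value)

# Fraction of the total power carried by the lowest modulation indices |k| <= 5.
low_modes = np.r_[0:6, n - 5:n]
frac_short = S_short[low_modes].sum() / S_short.sum()
frac_long = S_long[low_modes].sum() / S_long.sum()

# Number of modes whose signal power exceeds a (hypothetical) noise power.
noise_power = 0.01
n_above_short = int((S_short > noise_power).sum())
n_above_long = int((S_long > noise_power).sum())
print(frac_short, frac_long, n_above_short, n_above_long)
```

With equal total power, the long range ensemble concentrates its power in fewer low modulation indices, so fewer modes stand above the noise and the band of substantial gain is narrower.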

The temporal filter TRF

We can similarly ignore the frequency dimension of the input to understand the temporal receptive field (TRF). This is determined from the way Inline graphic+noise, the input temporal sequence Inline graphic is transformed to the output temporal sequence Inline graphic. In a statistically stable auditory environment, the input correlation should be time shift invariant, i.e., Inline graphic should depend only on Inline graphic. Denote Inline graphic. Then, the de-correlating transform Inline graphic should just be a Fourier transform Inline graphic with the principal component Inline graphic being the Fourier amplitude of the relevant mode. Here we use index Inline graphic instead of k to denote the principal component to signify the association with the temporal Fourier amplitude. The average power Inline graphic is simply the Fourier transform of the input temporal correlation. If we set Inline graphic in equation (12) to generate inputs with shift invariant correlation, then Inline graphic where Inline graphic is the Fourier amplitude of Inline graphic. The gain control Inline graphic in the second encoding step is determined by equation (6) (substituting Inline graphic for Inline graphic). The final TRF will be the transform Inline graphic given an appropriate choice of Inline graphic.

However, the actual procedure to obtain the TRF is trickier in that the Inline graphic transform in the third encoding step to give the overall Inline graphic has to be chosen to satisfy the causality constraint. That is, the output Inline graphic at time Inline graphic should only depend on past input Inline graphic for Inline graphic, i.e., Inline graphic for Inline graphic. Moreover, it is better for the TRF to have a short temporal span and latency, an outcome that can be achieved by assuming that the optimal temporal filter Inline graphic has a minimum phase-shift [57]. Short latency can feasibly be implemented by neural synaptic and membrane mechanisms that typically have time constants no longer than a few hundred milliseconds [58]. Hence, these offer credible constraints on the TRF. Note that if we choose Inline graphic, i.e., Inline graphic, then Inline graphic would be an even function of Inline graphic and thus not a causal temporal filter given gains Inline graphic that are all real. The filter Inline graphic can be made causal and minimum phase by choosing another Inline graphic simply as Inline graphic with a particular phase function Inline graphic, so that Inline graphic. Instead of directly obtaining this phase function Inline graphic, we can also equivalently obtain this minimum phase shift causal filter by transforming the acausal Inline graphic using standard procedures in signal processing theory as follows (see [57] for the proof). Given a non-causal filter Inline graphic with finite non-zero values in discrete time Inline graphic, first let Inline graphic to make a causal filter Inline graphic whose nonzero values are at Inline graphic. Second define

graphic file with name pcbi.1002123.e555.jpg

Among the Inline graphic complex roots of the equation Inline graphic, let Inline graphic denote the roots with Inline graphic and Inline graphic the other roots with Inline graphic. Third, let

graphic file with name pcbi.1002123.e562.jpg

The coefficients Inline graphic, Inline graphic are the values of the desired causal minimum phase filter. One example of this process is demonstrated in Figure 6A (before the minimum phase adjustment) and Figure 6B (after the minimum phase adjustment) (Inline graphic).
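The root-reflection procedure can be sketched in a few lines. This is a generic signal-processing illustration rather than the authors' code: the five-tap even filter is a hypothetical example, the function name `minimum_phase` is ours, and the overall sign of the result is fixed to be positive by convention. Zeros outside the unit circle are replaced by their conjugate reciprocals, and the filter is rescaled so that the magnitude of its frequency response is unchanged.

```python
import numpy as np

def minimum_phase(h_causal):
    """Causal minimum phase filter with the same magnitude response as h_causal.

    Assumes h_causal[0] != 0 and h_causal[-1] != 0 (no zeros at 0 or infinity).
    """
    h_causal = np.asarray(h_causal, dtype=float)
    roots = np.roots(h_causal)                 # zeros of the filter polynomial
    outside = np.abs(roots) > 1.0
    # Reflecting a zero r to 1/conj(r) divides the magnitude response by |r|,
    # so the scale factor is multiplied by |r| to compensate.
    scale = np.abs(h_causal[0]) * np.prod(np.abs(roots[outside]))
    reflected = roots.astype(complex)
    reflected[outside] = 1.0 / np.conj(roots[outside])
    return np.real(scale * np.poly(reflected))

# Hypothetical acausal, even filter defined at t = -2, ..., 2.
h_acausal = np.array([-0.1, -0.2, 1.0, -0.2, -0.1])
# Step 1: shifting by two samples makes it causal (values now at t = 0, ..., 4);
# the array itself is unchanged, only its time origin is reinterpreted.
h_causal = h_acausal
# Steps 2 and 3: find the zeros and reflect those outside the unit circle.
h_min = minimum_phase(h_causal)

mag_before = np.abs(np.fft.fft(h_causal, 64))
mag_after = np.abs(np.fft.fft(h_min, 64))
print(h_min)
```

Among causal filters with this magnitude response, the minimum phase one has its energy concentrated at the earliest times, which is the short-latency property the text appeals to.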

Figure 6. Simulation of temporal receptive field TRF, when the spectral dimension is omitted.

The same stimulus ensemble is used as in Figure 3A, except with the factor Inline graphic in equation (12), to ensure translation invariance of the correlation. (A;B) Demonstration of transforming an acausal temporal filter (A) to its causal minimum-phase counterpart (B) at a relatively high input SNR. (C) TRF for a relatively low input SNR.

The temporal kernel also depends on the SNR and the input correlations. The change in Inline graphic when sound intensity becomes lower is similar to that in the spectral case: from band-pass to low-pass. A temporal kernel under lower SNR is demonstrated in Figure 6C. The changes in Inline graphic and TRF with input correlations are analogous to those in the spectral case as well (figure not shown).

The two dimensional STRF

Finally, we show examples of the two dimensional Inline graphic. Here, we extended the assumption of shift invariance in the input correlations to the spectral dimension for the convenience of calculation. This assumption is reasonable when individual STRFs cover sufficiently small ranges of frequencies that the correlation in the spectral space is almost translation invariant within that range, as we see in our SRF examples. Then, spectral and temporal dimensions can be de-correlated at the same time by performing a 2-D Fourier transform on inputs Inline graphic, with the moving ripples as decorrelated channels, each denoted by a 2D index Inline graphic marking the spectral and temporal modulation frequencies.

Let the signal power in the de-correlated channels Inline graphic for input Inline graphic be Inline graphic. Here, Inline graphic typically decays with modulation frequency Inline graphic and Inline graphic since most natural inputs have input correlation Inline graphic that decays with Inline graphic and Inline graphic. Inline graphic is a scale factor that controls the SNR. We use the following example in our simulations

graphic file with name pcbi.1002123.e582.jpg (15)

where Inline graphic, Inline graphic and Inline graphic are parameters that control input correlation, and Inline graphic is a normalization factor. Figure 7A shows an example with Inline graphic. According to equation (8), the gain Inline graphic can be obtained as shown in Figure 7B (Inline graphic, Inline graphic, and Inline graphic). In particular, in the frequency range Inline graphic in which noise is negligible relative to the signal, the gain

graphic file with name pcbi.1002123.e593.jpg (16)

specifies the whitening filter of equation (14). This gain profile changes from being a band-pass to a low-pass two dimensional filter as the SNR is lowered.

Figure 7. The 2D STRFs/MTFs implied by efficient coding and found physiologically.

(A) Input power Inline graphic (equation (15), Inline graphic, Inline graphic) in decorrelated channels. (B, C) MTF profile Inline graphic and the corresponding STRFs with two SNRs (scaled by Inline graphic's). (D) Inline graphic and STRF as in B and C (when Inline graphic) except with larger input correlations (Inline graphic, Inline graphic in equation (15)). (E, F) Modulation transfer functions (MTFs) and their properties at low and high input sound intensities averaged over 40 IC neurons from Lesica and Grothe [7]. Here, Inline graphic is the spectro-temporal modulation frequency where the MTF peaks. Modulation frequencies in E and F are normalized by the same value across cells and intensities. Error bars in E indicate standard errors. The magnitude patterns of the MTFs for all neurons are normalized to peak value Inline graphic. Their average across neurons at each input intensity is then normalized to the same peak value and displayed in F.

As we noted before, efficient coding predicts the gain Inline graphic, or the modulation transfer function (MTF), but does not precisely determine the STRF shape. The latter depends on the less constrained Inline graphic transform. Therefore, we qualitatively compare our Inline graphic for two different Inline graphic's with the MTFs obtained from physiological experiments under two different input sound levels. Figure 7E and Figure 7F are obtained from data on STRFs of 40 cells in the inferior colliculus of animals exposed to natural rain sound at low and high sound levels [7]. We first did a two-dimensional Fourier transform on the STRF of each cell to obtain its MTF. Then the spectral modulation frequency Inline graphic and the temporal modulation frequency Inline graphic where the MTF has its maximum value were identified and normalized by a fixed value across cells. The average Inline graphic and Inline graphic across all cells are shown in Figure 7E. These two “peak frequencies” both increased when sound intensity increased. The physiological MTF averaged across all cells (Figure 7F) also becomes higher pass, both spectrally and temporally, under higher sound intensities, as predicted by efficient coding (Figure 7B).
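The analysis step of obtaining an MTF from an STRF and locating its peak can be sketched as follows. The STRF here is synthetic, a Gaussian-windowed ripple with known modulation frequencies, and is not data from [7]; the grid size, the sampling steps in octaves and seconds, and the envelope width are all hypothetical. Taking absolute values of the peak frequencies discards the direction of ripple motion.

```python
import numpy as np

n_freq, n_time = 64, 64
d_octave, d_time = 0.1, 0.005      # hypothetical sampling steps: octaves and seconds
x = np.arange(n_freq)[:, None]     # spectral sample index
t = np.arange(n_time)[None, :]     # temporal sample index

# Synthetic STRF: a ripple with 4 spectral and 3 temporal cycles across the grid,
# confined by a Gaussian envelope.
ripple = np.cos(2 * np.pi * (4 * x / n_freq + 3 * t / n_time))
envelope = np.exp(-((x - 32) ** 2 + (t - 32) ** 2) / (2 * 8.0 ** 2))
strf = envelope * ripple

# MTF magnitude: two-dimensional Fourier transform of the STRF.
mtf = np.abs(np.fft.fft2(strf))

# Spectral and temporal modulation frequencies at which the MTF peaks.
i, j = np.unravel_index(np.argmax(mtf), mtf.shape)
peak_spectral = abs(np.fft.fftfreq(n_freq, d=d_octave)[i])   # cycles per octave
peak_temporal = abs(np.fft.fftfreq(n_time, d=d_time)[j])     # Hz
print(peak_spectral, peak_temporal)
```

For the comparison across sound levels described in the text, the same two peak frequencies would be computed per cell and then normalized by a fixed value before averaging.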

For completeness, we illustrate in Figure 7C the model STRFs from the gain profiles Inline graphic, using an inverse Fourier transform with a proper phase function Inline graphic as the candidate Inline graphic matrix. Specifically, the model STRF is

graphic file with name pcbi.1002123.e616.jpg

where the phase Inline graphic is chosen to make the STRF causal, and with minimum phase shifts in the temporal dimension. In practice, the STRF is obtained as follows, by extending our method for obtaining the causal 1-D TRF. For each Inline graphic, we first obtain the temporal acausal filter

graphic file with name pcbi.1002123.e619.jpg

and then transform this into a causal minimum-phase filter Inline graphic, as for the one-dimensional TRF. The final two-dimensional STRF is then

graphic file with name pcbi.1002123.e621.jpg
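The minimum-phase step can be illustrated with a standard signal-processing technique, the homomorphic (real-cepstrum) method, which builds a causal filter whose Fourier magnitude equals a prescribed gain. The sketch below assumes that technique and covers only the one-dimensional temporal step; it may differ in detail from the procedure actually used for the model STRFs, and the function name `minimum_phase_filter` and the toy gain are hypothetical.

```python
import numpy as np

def minimum_phase_filter(gain, floor=1e-8):
    """Causal minimum-phase impulse response with |FFT| equal to `gain`.

    gain  : magnitude sampled on the full FFT grid (np.fft.fftfreq order),
            even-symmetric, with even length
    floor : small positive floor to keep the logarithm finite (assumption)
    """
    n = len(gain)
    log_mag = np.log(np.maximum(gain, floor))
    cepstrum = np.fft.ifft(log_mag).real      # real cepstrum of the magnitude
    # Fold the cepstrum onto non-negative times: this keeps the magnitude
    # and selects the minimum-phase solution.
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    spectrum = np.exp(np.fft.fft(cepstrum * fold))
    return np.fft.ifft(spectrum).real

# Toy check: the sequence [0.5, 1.0] is not minimum phase, but it has the same
# Fourier magnitude as its minimum-phase counterpart [1.0, 0.5].
h = np.zeros(256)
h[:2] = [0.5, 1.0]
gain = np.abs(np.fft.fft(h))
h_min = minimum_phase_filter(gain)
print(h_min[:4])
```

Only the gain magnitude is supplied; the phase is fixed by requiring causality with minimal delay, which is the same role the phase function plays in the text above.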

In general the model STRF has its highest amplitude at the preferred frequency on the spectral axis and for short latencies (i.e., the early part of the temporal axis). At low Inline graphic, the STRF has a large excitatory region and a weak inhibitory surround (Figure 7C). At larger Inline graphic, the STRF involves more excitatory and inhibitory regions with an increased inhibitory strength. Overall this has a more band-pass gain profile. Meanwhile, the bandwidth for the gain Inline graphic increases with Inline graphic, thus shrinking the width of the main excitatory region. Therefore, adaptation to higher sound levels makes the frequency-time tuning curve sharper (equivalently, more narrowly tuned), and so, at the single-cell level, supports a more precise readout of the time and frequency of auditory input. Qualitatively, physiologically observed STRFs adapt to the input intensity in the same way [7] (also see [14]).

The model also predicts changes to MTFs and STRFs for different input correlations. Figure 7D shows the gain function Inline graphic and STRF for an example in which the input has longer-range correlations in both spectral and temporal dimensions (we set Inline graphic while holding Inline graphic as in the high SNR case in Figure 7B and 7C). The peak modulation frequency in Inline graphic is decreased, and the excitatory region is wider compared with counterparts in Figure 7B and 7C at high SNR. This is consistent with our 1-D results in the spectral dimension (Figure 5).

Discussion

Summary of findings and predictions

In summary, this study set out to understand the computational role of auditory spectro-temporal receptive fields (STRFs). In particular, we generalized previous work [26] by proposing that STRFs are efficient codes for inputs which retain maximal information for a given neural cost associated with the output. We analyzed this proposal in detail for the case that input signals and noise are approximated as Gaussian. Mathematically, the STRF transform can be shown [34] to be composed of three abstract steps: input de-correlation, gain control, and multiplexing. For typical input statistics that are shift-invariant in sound frequency and time, the transform can be compared with two sorts of experimental data. First, gain control corresponds to the magnitude of the modulation transfer function of the STRFs. Second, by choosing the form of multiplexing to arrange the STRFs to have minimal phase, one can predict their full form. That the STRFs or the MTFs adapt to input statistics is a direct prediction of this efficient coding framework, since both the information conveyed and the neural coding cost depend on these statistics. Our efficient coding proposal is thus experimentally testable.

We made two particular predictions about the adaptation of the STRFs, one associated with input intensity, the other with input correlation. For the case of intensity, we predicted that the MTF of the STRFs should become more low pass when input intensity is lowered. Intuitively, as long as inputs at nearby frequencies and times are correlated, a low pass filter smooths the input to reduce noise, whereas a band pass filter extracts differences between input frequencies and times to remove redundancy. Compared with a band pass STRF, a low pass STRF has one or all of the following characteristics: (1) it has fewer excitatory and inhibitory regions; (2) each excitatory/inhibitory region has a larger size; (3) the secondary or opponent region, e.g., the inhibitory region for a STRF with a primary excitatory region, is weaker. All three characteristics help to smooth noise, a necessary strategy for weak inputs. In contrast, a band-pass filter has the opposite characteristics, so as not to increase the neural cost due to the transmission of redundant input information. These predictions are analogous to those seen in adaptations of visual coding to input SNR [29], [33], [34], [51], [52]. They also generalize previous accounts of the adaptation of the temporal auditory filter [26] to input intensity.

For the case of adaptation to input correlation, our framework predicts that the sizes of the excitatory and inhibitory regions of the STRFs should adapt to the range of input correlations. That is, input ensembles with longer range correlations in frequency and/or time should lead to STRFs with larger excitatory and inhibitory regions in the corresponding feature dimensions. Longer range input correlations are typically equivalent to greater input modulation power in the lower modulation frequency range in the stimulus ensemble. Equally, larger excitatory/inhibitory regions in the STRF are typically equivalent to its MTF being tuned to lower modulation frequencies. Thus, our prediction can be stated equivalently as saying that a stimulus ensemble with greater input power in the lower modulation frequency range, spectrally and/or temporally, should lead to neural MTFs tuned to the lower modulation frequency ranges. We demonstrated this form of adaptation for SRFs in Figure 5, and for STRFs in Figure 7. In particular, with a sufficiently high SNR, the MTF profile Inline graphic should whiten the ensemble specific input modulation power Inline graphic.
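The stated equivalence between longer-range correlations and greater power at low modulation frequencies follows from the Wiener-Khinchin relation, and can be checked numerically. The sketch below assumes an exponentially decaying correlation function on a circular axis, chosen purely for illustration (it is not the input model of equation (15)); the function names and the cutoff value are hypothetical.

```python
import numpy as np

def modulation_power(corr_length, n=256):
    """Power spectrum of an input whose correlation decays as exp(-lag / corr_length).

    The lag is measured on a circular axis of n samples (an assumption that
    makes the discrete Fourier transform exact).
    """
    lag = np.arange(n)
    lag = np.minimum(lag, n - lag)            # circular distance
    corr = np.exp(-lag / corr_length)
    power = np.fft.fft(corr).real             # Wiener-Khinchin: power = FT of correlation
    freqs = np.fft.fftfreq(n)                 # cycles per sample
    return freqs, power

def low_frequency_fraction(corr_length, cutoff=0.05):
    """Fraction of total modulation power at |frequency| <= cutoff."""
    freqs, power = modulation_power(corr_length)
    return power[np.abs(freqs) <= cutoff].sum() / power.sum()

for length in (2.0, 8.0):
    print(length, low_frequency_fraction(length))
```

The ensemble with the longer correlation length places a larger share of its power below the cutoff, which is the condition under which the framework predicts MTFs tuned to lower modulation frequencies.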

Experimental evidence and tests of the predictions

Various experimental observations pertain to these predictions about adaptation to input intensity. Lesica and Grothe [7] presented natural rain sounds to gerbils and found that, for a majority of cells in inferior colliculus (IC), the STRFs have more excitatory/inhibitory regions for higher input sound levels, and only have excitatory regions, or at least very weak inhibitory regions, for lower sound levels. Nagel and Doupe [14] conducted a similar study in field L of songbirds, an area analogous to mammalian auditory cortex. In both spectral and temporal dimensions, they found that the excitatory/inhibitory regions of the STRFs become smaller and sharper under higher sound intensity, while the number of such regions does not increase. These results paralleled those of an earlier study in which they only examined the temporal dimension of the receptive fields [58]. Both studies are consistent with our proposal that the MTF changes from lower to higher pass when input intensity (and hence, SNR) increases. They thus offer complementary confirmation of our predictions.

As mentioned in the Introduction, Lesica and Grothe [26] also examined the adaptation of the temporal receptive field (TRF) to vocalizations and ambient noises. They found that the TRF changed from being bandpass to lowpass when noise was mixed into the ensemble of vocalizations, and accounted for this finding in terms of efficient temporal coding. Their result can be understood as a special case of adaptation to SNR in our framework, focusing on the temporal dimension of the STRF, and treating the addition of noise as a reduction in input SNR. According to the principle of efficient coding, the spectral receptive field should also have changed from bandpass to lowpass when this noise was added.

There are as yet few physiological experiments that pertain to our prediction about adaptation to input correlations. One study by Woolley et al [11] examined the STRFs of midbrain neurons in zebra finch in response to bird songs or modulation-limited noise. Compared to that of the noise, the input modulation power of the songs is more concentrated in lower modulation frequencies. The MTFs of the STRFs matched the corresponding modulation frequency spans, consistent with our theoretical prediction.

The studies by Woolley et al [11] and Lesica and Grothe [26] could be extended to different ensembles of natural stimuli, e.g., songs, speech, animal vocalization, and environmental background, each with its own particular input correlations [59]. Findings from such extended studies would provide a stern test of the efficient coding framework. Generally, the input modulation power Inline graphic in natural sounds decays with increasing modulation frequency Inline graphic, at a rate that is specific to the ensemble [59]. Ensembles with faster decays have longer range input correlations (or larger correlations), as modelled in our Figure 5A and Figure 7BCD. We predict that this decay rate in Inline graphic should dictate the shape of the neural MTFs Inline graphic, such that ensembles with faster decay should lead to neural MTFs focusing on lower modulation frequency ranges. In particular, for high input SNR, the MTF profile should be that of a whitening filter Inline graphic, with the upper frequency limit Inline graphic for this whitening (beyond which MTF quickly decays to zero) being around the frequency at which Inline graphic is comparable to the power level of the noise. The recent study by Rodriguez et al [59] showed that inferior colliculus (IC) neurons, when examined collectively as a population, do seem to whiten typical natural stimuli, in that the population MTF Inline graphic increases with frequency Inline graphic (up to a high frequency limit). This is to be expected for an efficient code, since natural input power Inline graphic decreases with frequency. However, the neural STRFs in this study were obtained (using the moving ripple stimuli) without specific adaptation to any particular natural stimulus ensemble. 
We predict that if the STRFs had been measured under adaptation to the natural sounds for high SNR, then the neural MTF profile, at a neural population level if not at individual neuron level, should be ensemble specific, i.e., whitening the input power Inline graphic of the adapting stimuli.
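The high-SNR whitening prediction can be caricatured as follows. This sketch assumes a Lorentzian-shaped input power spectrum and replaces the quick decay of the MTF beyond the upper frequency limit with a hard cutoff where signal power falls to the noise level. Both are simplifications introduced for illustration and are not the gain of equation (6); the names `whitening_gain` and `toy_power` are hypothetical.

```python
import numpy as np

def whitening_gain(signal_power, noise_power):
    """High-SNR caricature of the predicted MTF.

    Inside the passband (signal power above noise power) the gain is
    1/sqrt(signal power), so gain**2 * power is flat; outside it is zero.
    """
    passband = signal_power > noise_power
    gain = np.where(passband, 1.0 / np.sqrt(signal_power), 0.0)
    return gain, passband

def toy_power(k, k0, amplitude=100.0):
    """Assumed input modulation power: decays faster for smaller k0."""
    return amplitude / (1.0 + (k / k0) ** 2)

k = np.linspace(0.0, 10.0, 501)               # modulation frequency, arbitrary units
for k0 in (1.0, 0.5):                         # slower and faster spectral decay
    power = toy_power(k, k0)
    gain, passband = whitening_gain(power, noise_power=1.0)
    print(k0, k[passband].max(), k[np.argmax(gain)])
```

In this toy setting the ensemble with the faster-decaying spectrum reaches the noise level at a lower modulation frequency, so its whitening gain peaks at a lower frequency, in line with the ensemble-specific MTFs predicted in the text.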

The neural implementation of the efficient STRF and its adaptations

We seek an understanding of the overall effective STRF rather than its realization. Thus, it is important to note that the three separate steps of our mathematical analysis of the efficient STRFs are purely abstract. They do not correspond to an actual physiological implementation. In principle, when a receptive field is entirely linear, it can be implemented equally well in a single step or in multiple linear steps in a cascade. Meanwhile, the observation that STRFs adapt to changes in the statistics of auditory inputs, and indeed that visual receptive fields expand when the visual environment changes from bright outdoors to dark indoors [52], attests to the availability of mechanisms for implementing (and thus adapting) efficient sensory coding.

We speculate that the adaptation of a STRF in a midbrain auditory neuron is likely to involve gain control in many intervening and distributed neural processes upstream along the auditory pathway [60]. Even a simple adaptation of efficient coding, in the large monopolar cells (LMCs) in an insect compound eye to changes in the distribution of input contrasts in the visual environment, involves multiple stages of processes, some in the photoreceptors and others in the lamina from the receptors to the LMCs [61]. Synaptic and intrinsic mechanisms were also found in the adaptation of retinal bipolar and ganglion cells to temporal contrast [62], [63]. Considering the multiple synapses from the hair cells to IC or auditory cortex, and the many recurrent and feedback networks with both excitatory and inhibitory connections [64], [65] in this pathway (for example, medial olivocochlear (MOC) efferent effects [66]), we speculate that gain control processes are likely to include synaptic facilitation and depression and distributed channel based adaptations. They should collectively achieve the effective adaptation in the gain such as the Inline graphic in equation (6) and/or the underlying eigenmodes. Because there are multiple, redundant, and distributed synapses from the auditory periphery to the neuron whose STRF we model, a STRF could be implemented in multiple ways. Such implementational redundancy is likely to be needed to accommodate the many forms of adaptation that might be required, given a limited degree of flexibility in any individual mechanism.

The timescale of STRF adaptation to sound levels or input SNRs should be no longer than several seconds to tens of seconds, and may be much shorter, since, in the physiological experiments, the stimulus duration for one sound intensity level is 40 s in [7] and 5 s in [14], while adaptation to mixing noise into the vocalization inputs occurs within hundreds of milliseconds in [26]. Adaptation has been observed to occur over multiple time scales, ranging from tens of milliseconds to minutes in the fly visual system [67]. In the auditory system, midbrain neurons adapt to sound levels within hundreds of milliseconds [68], [69], while cortical adaptation happens over multiple timescales and is likely to arise from network activities [70], [71]. We still know too little about the actual mechanisms for STRF adaptation [26] or sensory adaptation in general, although it has been suggested that channel based mechanisms at the cellular level are plausible candidates [67]. Understanding the computational roles of the STRFs should motivate future investigations of these mechanisms.

Limitations of the framework

As an initial attempt to understand the computational role of the STRFs, our framework has various limitations. First, the STRF model as a whole is quantitatively inaccurate since it specifies a linear mapping between sensory inputs and neural responses (in each adapted state). The accuracy could be improved in future work through the addition of a static nonlinearity after the STRF [6], [7]. However, this would not be expected to lead to a qualitative change in STRFs or their adaptation. Extensions to dynamic nonlinearities would be much more complex. Second, for analytical convenience, we assumed that the input statistics are Gaussian, meaning that there are no input signal correlations higher than second order. The same approximation was made for the case of efficient visual coding, in the absence of good information about higher order input correlations [30], [32], [34]. Subsequent work using independent component analysis (ICA) on natural visual images avoided the Gaussian assumption, leading to models of visual encoding in primary visual cortex V1 [72], [73]. This approach has been adopted to understand the STRFs in the auditory cortex [74] and avian primary auditory area field L [75], although it cannot predict adaptation to SNR and its whitening prediction does not go beyond that obtained under the Gaussian assumption. It is still controversial whether higher order statistics are the cause for the dramatic difference between the V1 encoding and that in the retina and the lateral geniculate nucleus [34]. Furthermore, higher order correlations in natural visual inputs contribute much less redundancy (measured in signal entropy) than second order correlations [36], [37], [38]. This may explain why the Gaussian assumption was not overly deleterious to the predictions of the efficient coding principle in vision. 
Although higher order correlations in auditory inputs are also poorly understood, they do cause auditory adaptation, e.g., in stimulus-specific adaptation to complex temporal patterns of tones [76]. To what extent higher order input statistics can influence auditory encoding remains to be answered in future studies.

Our focus on coding efficiency ignores aspects of auditory processing devoted to additional tasks such as sound source localization or stream segmentation. The observed STRFs may reflect elements of both efficient coding and requirements associated with these tasks. In fact, some variations are possible within the context of an efficient code. For instance, we have so far restricted ourselves by making all neurons share the same MTF profile predicted by efficient coding (by restricting the Inline graphic transform to that in equation (9)). Relaxing this restriction would allow other STRFs. In particular, different neurons in the coding population could be tuned to different modulation frequency regions within the Inline graphic extent covered by the overall MTF envelope Inline graphic, and could have different shapes. Accordingly, different STRFs could have different spectral bandwidths (or resolution) and shapes, in addition to preferring different center frequencies Inline graphic. Indeed, in the auditory cortex, different neurons exhibit different spectral resolutions, and even prefer different motion directions of the spectral ripples [77], [78], [19]. (Analogously, primary visual cortical neurons are tuned to multiple spatial sizes and prefer different orientations, a coding scheme that can be shown to be consistent with efficient coding [36].) Such a collection of STRFs could satisfy the joint goals of coding efficiency and detecting ecologically meaningful auditory objects (such as vocalizations). Diversity in the shape and bandwidth of the STRFs is already present, although perhaps less so, sub-cortically, e.g., in inferior colliculus [78]. When different neurons have different STRF bandwidths, our prediction that the input modulation power will be whitened by the neural MTFs should be modified, such that the ‘neural MTFs’ should mean the collective MTF of the whole neural population within a particular auditory stage (such as IC, see [59]).

There could be alternative formulations (other than equation (4)) of the efficient coding principle, in particular, in the formulation of the neural cost. Our formulation Inline graphic causes a degeneracy of the efficient coding solution, i.e., the existence of many equally efficient coding transforms, when the signals are Gaussian. Other formulations of the neural cost could break this degeneracy. For example, formulation Inline graphic in terms of the summation of individual neural channel capacity (or entropy Inline graphic), or Inline graphic in terms of the total activity level, would generate neural codes that encourage very different MTFs for different neurons. In both audition and vision, the MTFs (in audition) and the contrast sensitivity functions (the vision analog of the MTFs) for different neurons tend to be similar in the sensory periphery (cochlear nucleus and retina), but they are increasingly disparate further towards the central brain. These changes could be caused by the different cost functions in the nervous system, or, as discussed in the previous paragraph, by the breaking of the degeneracy by additional computational tasks further downstream along the sensory pathway.

Redundancy reduction and information preservation are two essential ingredients of the efficient coding principle. While this principle has been quite successful in understanding retinal coding, it cannot explain the enormous increase in the redundancy of visual coding in the primary visual cortex (in which the number of neurons is about 100 times that in the retina) [34], nor the drastic loss of visual information outside the focus of attention in the higher visual areas, without introducing task-dependent factors. It remains to be investigated to what extent, and in what form, efficient coding applies further along the auditory pathway. One can expect that more processes will be devoted to solving specific auditory tasks, in addition to the task of sensory encoding, in the higher stages of auditory processing.

Concluding remarks

This study was partly inspired by the success of the efficient coding principle in understanding receptive fields in the early stages of visual processing, and the way these receptive fields adapt across sensory environments. Analogies between visual and auditory processes have been explored by previous researchers [79], and we expect that they can be carried further in higher level sensory processes including segmentation, selective attention [80], and even object recognition.

In conclusion, efficient coding provides a plausible computational interpretation of various recent experimental observations on STRFs, and notably the way they adapt to input environments. By making testable predictions, it motivates experimental directions which should hopefully lead to further insights and understanding.

Acknowledgments

We are very grateful to Nick Lesica for providing us with the STRF data of 40 inferior colliculus neurons [7], from which we obtained the physiological MTF plots in Figure 7. We would also like to thank Dr. Bo Hong and three anonymous reviewers for their very helpful comments, and Peter Dayan for editing the English of the manuscript.

Footnotes

The authors have declared that no competing interests exist.

The authors were funded by the Gatsby Charitable Foundation, Tsinghua University 985 fund, and National Science Foundation of China grant 60675029. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Aertsen AM, Johannesma PI. The spectro-temporal receptive field. A functional characteristic of auditory neurons. Biol Cybern. 1981;42:133–43. doi: 10.1007/BF00336731. [DOI] [PubMed] [Google Scholar]
  • 2.Escabi MA, Schreiner CE. Nonlinear spectrotemporal sound analysis by neurons in the auditory midbrain. J Neurosci. 2002;22:4114–4131. doi: 10.1523/JNEUROSCI.22-10-04114.2002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Klein DJ, Depireux DA, Simon JZ, Shamma SA. Robust spectrotemporal reverse correlation for the auditory system: optimizing stimulus design. J Comput Neurosci. 2000;9:85–111. doi: 10.1023/a:1008990412183. [DOI] [PubMed] [Google Scholar]
  • 4.Theunissen FE, Sen K, Doupe AJ. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J Neurosci. 2000;20:2315–31. doi: 10.1523/JNEUROSCI.20-06-02315.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Eggermont JJ, Johannesma PIM, Aertsen AMHJ. Reverse-correlation methods in auditory research. Q Rev Biophys. 1983;16:341–414. doi: 10.1017/s0033583500005126. [DOI] [PubMed] [Google Scholar]
  • 6.Eggermont JJ, Aertsen AMHJ, Johannesma PIM. Quantitative characterisation procedure for auditory neurons based on the spectro-temporal receptive field. Hearing Res. 1983a;10:167–190. doi: 10.1016/0378-5955(83)90052-7. [DOI] [PubMed] [Google Scholar]
  • 7.Lesica NA, Grothe B. Dynamic spectrotemporal feature selectivity in the auditory midbrain. J Neurosci. 2008;28:5412–5421. doi: 10.1523/JNEUROSCI.0073-08.2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Eggermont JJ, Aertsen AMHJ, Johannesma PIM. Prediction of the responses of auditory neurons in the midbrain of the grass frog based on the spectro-temporal receptive field. Hearing Res. 1983b;10:191–202. doi: 10.1016/0378-5955(83)90053-9. [DOI] [PubMed] [Google Scholar]
  • 9.Christianson GB, Sahani M, Linden JF. The consequences of response nonlinearities for interpretation of spectrotemporal receptive fields. J Neurosci. 2008;28:446–455. doi: 10.1523/JNEUROSCI.1775-07.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Gourevitch B, Norena A, Shaw G, Eggermont JJ. Spectrotemporal receptive fields in anesthetized cat primary auditory cortex are context dependent. Cereb Cortex. 2008;19:1448–1461. doi: 10.1093/cercor/bhn184. [DOI] [PubMed] [Google Scholar]
  • 11.Woolley SMN, Gill PR, Theunissen FE. Stimulus-dependent auditory tuning results in synchronous population coding of vocalizations in the songbird midbrain. J Neurosci. 2006;26:2499–2512. doi: 10.1523/JNEUROSCI.3731-05.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Yu JJ, Young ED. Linear and nonlinear pathways of spectral information transmission in the cochlear nucleus. P Natl Acad Sci U S A. 2000;97:11780–11785. doi: 10.1073/pnas.97.22.11780. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Young ED, Oertel D. Shepherd G, editor. The cochlear nucleus. Synaptic Organization of the Brain, Oxford Press, chapter 4. 5 edition. 2003. pp. 125–164.
  • 14.Nagel KI, Doupe AJ. Organizing principles of spectro-temporal encoding in the avian primary auditory area field L. Neuron. 2008;58:938–955. doi: 10.1016/j.neuron.2008.04.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Kim PJ, Young ED. Comparative analysis of spectro-temporal receptive fields, reverse correlation functions, and frequency tuning curves of auditory-nerve fibers. J Acoust Soc Am. 1994;95:410. doi: 10.1121/1.408335. [DOI] [PubMed] [Google Scholar]
  • 16.Versnel H, Zwiers MP, van Opstal AJ. Spectrotemporal response properties of inferior colliculus neurons in alert monkey. J Neurosci. 2009;29:9725–9739. doi: 10.1523/JNEUROSCI.5459-08.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Shamma SA, Versnel H. Ripple analysis in ferret primary auditory cortex. II. Prediction of unit responses to arbitrary spectral profiles. Audit Neurosci. 1995;1:255–270. [Google Scholar]
  • 18.Kowalski N, Depireux DA, Shamma SA. Analysis of dynamic spectra in ferret primary auditory cortex. II. Prediction of unit responses to arbitrary dynamic spectra. J Neurophysiol. 1996;76:3524. doi: 10.1152/jn.1996.76.5.3524. [DOI] [PubMed] [Google Scholar]
  • 19.Depireux DA, Simon JZ, Klein DJ, Shamma SA. Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. J Neurophysiol. 2001;85:1220–1234. doi: 10.1152/jn.2001.85.3.1220. [DOI] [PubMed] [Google Scholar]
  • 20.Schnupp JWH, Mrsic-Flogel TD, King AJ. Linear processing of spatial cues in primary auditory cortex. Nature. 2001;414:200–204. doi: 10.1038/35102568. [DOI] [PubMed] [Google Scholar]
  • 21.Nelken I, Bar-Yosef O. Neurons and objects: the case of auditory cortex. Front Neurosci. 2008;2:107–113. doi: 10.3389/neuro.01.009.2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Barbour DL, Wang X. Contrast tuning in auditory cortex. Science. 2003;299:1073–1075. doi: 10.1126/science.1080425. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Ahrens MB, Linden JF, Sahani M. Nonlinearities and contextual influences in auditory cortical responses modeled with multilinear spectrotemporal methods. J Neurosci. 2008;28:1929–1942. doi: 10.1523/JNEUROSCI.3377-07.2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Lewicki MS. Efficient coding of natural sounds. Nat Neurosci. 2002;5:356–363. doi: 10.1038/nn831. [DOI] [PubMed] [Google Scholar]
  • 25.Smith EC, Lewicki MS. Efficient auditory coding. Nature. 2006;439:978–982. doi: 10.1038/nature04485. [DOI] [PubMed] [Google Scholar]
  • 26.Lesica NA, Grothe B. Efficient temporal processing of naturalistic sounds. PLoS One. 2008;3:e1655. doi: 10.1371/journal.pone.0001655. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Barlow HB. Possible principles underlying the transformation of sensory messages. In: Rosenblith WA, editor. Sensory Communication. Cambridge, MA: MIT Press; 1961. pp. 217–234. [Google Scholar]
  • 28.Laughlin S. A simple coding procedure enhances a neuron's information capacity. Z Naturforsch C. 1981;36:910–912. [PubMed] [Google Scholar]
  • 29.Srinivasan MV, Laughlin SB, Dubs A. Predictive coding: a fresh view of inhibition in the retina. P Roy Soc Lond B Bio. 1982;216:427–459. doi: 10.1098/rspb.1982.0085. [DOI] [PubMed] [Google Scholar]
  • 30.Linsker R. Perceptual neural organization: some approaches based on network models and information theory. Annu Rev Neurosci. 1990;13:257–281. doi: 10.1146/annurev.ne.13.030190.001353. [DOI] [PubMed] [Google Scholar]
  • 31.Atick JJ, Redlich AN. Towards a theory of early visual processing. Neural Comput. 1990;2:308–320. [Google Scholar]
  • 32.Atick JJ. Could information theory provide an ecological theory of sensory processing? Network- Comp Neural. 1992;3:213–251. doi: 10.3109/0954898X.2011.638888. [DOI] [PubMed] [Google Scholar]
  • 33.van Hateren JH. A theory of maximizing sensory information. Biol Cybern. 1992;68:23–9. doi: 10.1007/BF00203134. [DOI] [PubMed] [Google Scholar]
  • 34.Zhaoping L. Theoretical understanding of the early visual processes by data compression and data selection. Network-Comp Neural. 2006;17:301–334. doi: 10.1080/09548980600931995. [DOI] [PubMed] [Google Scholar]
  • 35.Nelken I, Rotman Y, Yosef OB. Responses of auditory-cortex neurons to structural features of natural sounds. Nature. 1999;397:154–157. doi: 10.1038/16456. [DOI] [PubMed] [Google Scholar]
  • 36.Li Z, Atick JJ. Toward a theory of the striate cortex. Neural Comput. 1994;6:127–146. [Google Scholar]
  • 37.Petrov Y, Zhaoping L. Local correlations, information redundancy, and sufficient pixel depth in natural images. J Opt Soc Am A. 2003;20:56–66. doi: 10.1364/josaa.20.000056. [DOI] [PubMed] [Google Scholar]
  • 38.Hosseini R, Sinz F, Bethge M. Lower bounds on the redundancy of natural images. Vision Res. 2010;50:2213–2222. doi: 10.1016/j.visres.2010.07.025. [DOI] [PubMed] [Google Scholar]
  • 39.Field DJ. Relations between the statistics of natural images and the response properties of cortical cells. J Opt Soc Am A. 1987;4:2379–2394. doi: 10.1364/josaa.4.002379. [DOI] [PubMed] [Google Scholar]
  • 40.Kersten D. Predictability and redundancy of natural images. J Opt Soc Am A. 1987;4:2395–2400. doi: 10.1364/josaa.4.002395. [DOI] [PubMed] [Google Scholar]
  • 41.Ruderman DL, Bialek W. Statistics of natural images: Scaling in the woods. Phys Rev Lett. 1994;73:814–817. doi: 10.1103/PhysRevLett.73.814. [DOI] [PubMed] [Google Scholar]
  • 42.Reinagel P, Zador AM. Natural scene statistics at the centre of gaze. Network-Comp Neural. 1999;10:341–350. [PubMed] [Google Scholar]
  • 43.Daugman JG. Entropy reduction and decorrelation in visual coding by oriented neural receptive fields. IEEE T Bio-Med Eng. 1989;36:107–114. doi: 10.1109/10.16456. [DOI] [PubMed] [Google Scholar]
  • 44.Atick JJ, Li Z, Redlich AN. Understanding retinal color coding from first principles. Neural Comput. 1992;4:559–572. [Google Scholar]
  • 45.Atick JJ, Li Z, Redlich AN. What does post-adaptation color appearance reveal about cortical color representation? Vision Res. 1993;33:123–129. doi: 10.1016/0042-6989(93)90065-5. [DOI] [PubMed] [Google Scholar]
  • 46.Li Z, Atick JJ. Efficient stereo coding in the multiscale representation. Network-Comp Neural. 1994;5:157–174. [Google Scholar]
  • 47.Zhaoping L. Understanding ocular dominance development from binocular input statistics. In: Bower J, editor. Proceeding of Computational Neuroscience Conference. Monterey, California: Kluwer Academic Publishers; 1995. pp. 397–402. [Google Scholar]
  • 48.Chechik G, Anderson MJ, Bar-Yosef O, Young ED, Tishby N, et al. Reduction of information redundancy in the ascending auditory pathway. Neuron. 2006;51:359–68. doi: 10.1016/j.neuron.2006.06.030. [DOI] [PubMed] [Google Scholar]
  • 49.Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27 [Google Scholar]
  • 50.Levy WB, Baxter RA. Energy efficient neural codes. Neural Comput. 1996;8:531–543. doi: 10.1162/neco.1996.8.3.531. [DOI] [PubMed] [Google Scholar]
  • 51.Atick JJ, Redlich AN. What does the retina know about natural scenes? Neural Comput. 1992;4:196–210. [Google Scholar]
  • 52.Barlow HB, Fitzhugh R, Kuffler SW. Change of organization in the receptive fields of the cat's retina during dark adaptation. J Physiol-London. 1957;137:338–354. doi: 10.1113/jphysiol.1957.sp005817. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Glasberg BR, Moore BCJ. Derivation of auditory filter shapes from notched-noise data. Hearing Res. 1990;47:103–138. doi: 10.1016/0378-5955(90)90170-t. [DOI] [PubMed] [Google Scholar]
  • 54.Escabi MA, Miller LM, Read HL, Schreiner CE. Naturalistic auditory contrast improves spectrotemporal coding in the cat inferior colliculus. J Neurosci. 2003;23:11489–11504. doi: 10.1523/JNEUROSCI.23-37-11489.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Gill P, Zhang J, Woolley SMN, Fremouw T, Theunissen FE. Sound representation methods for spectro-temporal receptive field estimation. J Comput Neurosci. 2006;21:5–20. doi: 10.1007/s10827-006-7059-4. [DOI] [PubMed] [Google Scholar]
  • 56.Young ED, Calhoun BM. Nonlinear modeling of auditory-nerve rate responses to wideband stimuli. J Neurophysiol. 2005;94:4441–4454. doi: 10.1152/jn.00261.2005. [DOI] [PubMed] [Google Scholar]
  • 57.Oppenheim AV, Willsky AS, Nawab SH. Signals and systems. 2nd ed. Prentice Hall; 1997. [Google Scholar]
  • 58.Nagel KI, Doupe AJ. Temporal processing and adaptation in the songbird auditory forebrain. Neuron. 2006;51:845–859. doi: 10.1016/j.neuron.2006.08.030. [DOI] [PubMed] [Google Scholar]
  • 59.Rodriguez FA, Chen C, Read HL, Escabi MA. Neural modulation tuning characteristics scale to efficiently encode natural sound statistics. J Neurosci. 2010;30:15969–15980. doi: 10.1523/JNEUROSCI.0966-10.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Robinson B, McAlpine D. Gain control mechanisms in the auditory pathway. Curr Opin Neurobiol. 2009;19:402–407. doi: 10.1016/j.conb.2009.07.006. [DOI] [PubMed] [Google Scholar]
  • 61.Laughlin SB, Hardie RC. Common strategies for light adaptation in the peripheral visual systems of fly and dragonfly. J Comp Physiol A. 1978;128:319–340. [Google Scholar]
  • 62.Rieke F. Temporal contrast adaptation in salamander bipolar cells. J Neurosci. 2001;21:9445–9454. doi: 10.1523/JNEUROSCI.21-23-09445.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Kim KJ, Rieke F. Temporal contrast adaptation in the input and output signals of salamander retinal ganglion cells. J Neurosci. 2001;21:287–299. doi: 10.1523/JNEUROSCI.21-01-00287.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Le Beau FE, Rees A, Malmierca MS. Contribution of GABA- and glycine-mediated inhibition to the monaural temporal response properties of neurons in the inferior colliculus. J Neurophysiol. 1996;75:902–919. doi: 10.1152/jn.1996.75.2.902. [DOI] [PubMed] [Google Scholar]
  • 65.Caspary DM, Palombi PS, Hughes LF. GABAergic inputs shape responses to amplitude modulated stimuli in the inferior colliculus. Hearing Res. 2002;168:163–173. doi: 10.1016/s0378-5955(02)00363-5. [DOI] [PubMed] [Google Scholar]
  • 66.Guinan JJ Jr. Olivocochlear efferents: anatomy, physiology, function, and the measurement of efferent effects in humans. Ear Hearing. 2006;27:589–607. doi: 10.1097/01.aud.0000240507.83072.e7. [DOI] [PubMed] [Google Scholar]
  • 67.Wark B, Lundstrom BN, Fairhall A. Sensory adaptation. Curr Opin Neurobiol. 2007;17:423–429. doi: 10.1016/j.conb.2007.07.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Dean I, Robinson BL, Harper NS, McAlpine D. Rapid neural adaptation to sound level statistics. J Neurosci. 2008;28:6430–6438. doi: 10.1523/JNEUROSCI.0470-08.2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Dean I, Harper NS, McAlpine D. Neural population coding of sound level adapts to stimulus statistics. Nat Neurosci. 2005;8:1684–1689. doi: 10.1038/nn1541. [DOI] [PubMed] [Google Scholar]
  • 70.Ulanovsky N, Las L, Farkas D, Nelken I. Multiple time scales of adaptation in auditory cortex neurons. J Neurosci. 2004;24:10440–10453. doi: 10.1523/JNEUROSCI.1905-04.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Ulanovsky N, Las L, Nelken I. Processing of low-probability sounds by cortical neurons. Nat Neurosci. 2003;6:391–398. doi: 10.1038/nn1032. [DOI] [PubMed] [Google Scholar]
  • 72.Olshausen BA, Field DJ. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Res. 1997;37:3311–3325. doi: 10.1016/s0042-6989(97)00169-7. [DOI] [PubMed] [Google Scholar]
  • 73.Bell AJ, Sejnowski TJ. The “independent components” of natural scenes are edge filters. Vision Res. 1997;37:3327–3338. doi: 10.1016/s0042-6989(97)00121-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Klein DJ, König P, Körding KP. Sparse spectrotemporal coding of sounds. EURASIP J Appl Sig P. 2003;7:659–667. [Google Scholar]
  • 75.Greene G, Barrett DGT, Sen K, Houghton C. Sparse coding of birdsong and receptive field structure in songbirds. Network-Comp Neural. 2009;20:162–177. doi: 10.1080/09548980903108267. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Nelken I. Processing of complex stimuli and natural scenes in the auditory cortex. Curr Opin Neurobiol. 2004;14:474–480. doi: 10.1016/j.conb.2004.06.005. [DOI] [PubMed] [Google Scholar]
  • 77.Wang K, Shamma SA. Spectral shape analysis in the central auditory system. IEEE T Speech Audi P. 1995;3:382–395. [Google Scholar]
  • 78.Schreiner CE, Read HL, Sutter ML. Modular organization of frequency integration in primary auditory cortex. Annu Rev Neurosci. 2000;23:501–529. doi: 10.1146/annurev.neuro.23.1.501. [DOI] [PubMed] [Google Scholar]
  • 79.Shamma SA. On the role of space and time in auditory processing. Trends Cogn Sci. 2001;5:340–348. doi: 10.1016/s1364-6613(00)01704-6. [DOI] [PubMed] [Google Scholar]
  • 80.Fritz JB, Elhilali M, David SV, Shamma SA. Auditory attention-focusing the searchlight on sound. Curr Opin Neurobiol. 2007;17:437–455. doi: 10.1016/j.conb.2007.07.011. [DOI] [PubMed] [Google Scholar]

Articles from PLoS Computational Biology are provided here courtesy of PLOS
