PLOS Computational Biology. 2013 Jan 31;9(1):e1002889. doi: 10.1371/journal.pcbi.1002889

Temporal Adaptation Enhances Efficient Contrast Gain Control on Natural Images

Fabian Sinz 1,*, Matthias Bethge 1,2,3
Editor: Laurence T Maloney
PMCID: PMC3561086  PMID: 23382664

Abstract

Divisive normalization in primary visual cortex has been linked to adaptation to natural image statistics in accordance with Barlow's redundancy reduction hypothesis. Using recent advances in natural image modeling, we show that the previously studied static model of divisive normalization is rather inefficient in reducing local contrast correlations, but that a simple temporal contrast adaptation mechanism of the half-saturation constant can substantially increase its efficiency. Our findings reveal the experimentally observed temporal dynamics of divisive normalization to be critical for redundancy reduction.

Author Summary

The redundancy reduction hypothesis postulates that neural representations adapt to sensory input statistics such that their responses become as statistically independent as possible. Based on this hypothesis, many properties of early visual neurons—like orientation selectivity or divisive normalization—have been linked to natural image statistics. Divisive normalization, in particular, models a widely observed neural response property: The divisive inhibition of a single neuron by a pool of others. This mechanism has been shown to reduce the redundancy among neural responses to typical contrast dependencies in natural images. Here, we show that the standard model of divisive normalization achieves substantially less redundancy reduction than a theoretically optimal mechanism called radial factorization. On the other hand, we find that radial factorization is inconsistent with existing neurophysiological observations. As a solution we suggest a new physiologically plausible modification of the standard model which accounts for the dynamics of the visual input by adapting to local contrasts during fixations. In this way the dynamic version of the standard model achieves almost optimal redundancy reduction performance. Our results imply that the dynamics of natural viewing conditions are critical for testing the role of divisive normalization for redundancy reduction.

Introduction

It is a long-standing hypothesis that the computational goal of the early visual processing stages is to reduce redundancies which are abundantly present in natural sensory signals [1], [2]. Redundancy reduction is a general information theoretic principle that plays an important role for many possible goals of sensory systems like maximizing the amount of information between stimulus and neural response [3], obtaining a probabilistic model of sensory signals [4], or learning a representation of hidden causes [3], [5]. For a population of neurons, redundancy reduction predicts that neuronal responses should be made as statistically independent from each other as possible [2].

Many prominent neural response properties such as receptive field structure or contrast gain control have been linked to redundancy reduction on natural images [2]. While appropriately structured linear receptive fields can always remove all redundancies caused by second order correlations, they have little effect on the reduction of higher order statistical dependencies [6], [7]. However, one of the most prominent contrast gain control mechanisms—divisive normalization—has been demonstrated to reduce higher order correlations on natural images and sound [8]–[10]. Its central mechanism is a divisive rescaling of a single neuron's activity by that of a pool of other neurons [8, see also Figure 1a].

Recently, radial factorization and radial Gaussianization have been derived independently by [11] and [12], respectively, based on Barlow's redundancy reduction principle [1]. Both mechanisms share with divisive normalization the two main functional components, linear filtering and rescaling, and have been shown to be the unique and optimal redundancy reduction mechanism for this class of transformations under certain symmetry assumptions for the data. Radial factorization is optimal for a more general symmetry class than radial Gaussianization [11], [13] and contains radial Gaussianization as a special case. As a consequence, radial factorization can achieve slightly better redundancy reduction for natural images than radial Gaussianization, but the advantage is very small.

Here, we compare the redundancy reduction performance of divisive normalization to that of radial factorization in order to see to what extent divisive normalization can serve the goal of redundancy reduction. Our comparison shows that a non-adapting static divisive normalization is not powerful enough to capture the contrast dependencies of natural images. Furthermore, we show that (i) the shape of contrast response curves predicted by radial factorization is not consistent with that found in physiological recordings, and (ii) that for a static divisive normalization mechanism this inconsistency is a necessary consequence of strong redundancy reduction. Finally, we demonstrate that a dynamic adaptation of the half-saturation constant in divisive normalization may provide a physiologically plausible mechanism that can achieve close to optimal performance. Our proposed adaptation mechanism works via horizontal shifts of the contrast response curve along the log-contrast axis. Such shifts have been observed in experiments in response to a change of the ambient contrast level [14].

Results

Measures, Models, Mechanisms

We now briefly introduce divisive normalization, radial factorization, and the information theoretic measure of redundancy used in this study.

Redundancy reduction and multi-information

We consider a population of sensory neurons that transforms natural image patches Inline graphic into a set of neural activities Inline graphic or Inline graphic. We always use Inline graphic to denote responses to linear filters, and Inline graphic for the output of divisive normalization or radial factorization. The goal of redundancy reduction is to remove statistical dependencies between the single coefficients of Inline graphic or Inline graphic.

Redundancy is quantified by the information theoretic measure called multi-information

I[\mathbf{y}] = \sum_{i=1}^{n} H[y_i] - H[\mathbf{y}] = D_{KL}\left( p(\mathbf{y}) \,\Big\|\, \prod_{i=1}^{n} p(y_i) \right) \qquad (1)

which measures how much the representation differs from having independent components. More precisely, the multi-information is the Kullback-Leibler divergence between the joint distribution and the product of its marginals or, equivalently, the difference between the sum of the marginal entropies and the joint entropy. In the case of two dimensions it equals the better known mutual information. If the different entries of the random vector are independent, then its joint distribution equals the product of the single marginals or, equivalently, the joint entropy equals the sum of the marginal entropies. Thus, the multi-information is zero if and only if the different dimensions of the random vector are independent, and positive otherwise. In summary, the multi-information measures all kinds of statistical dependencies among the single coefficients of a random vector. In the Methods Section, we describe how we estimate the multi-information for the various signals considered here.
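For the special case of a Gaussian random vector, the multi-information in equation (1) has a closed form: the difference between the sum of the marginal log-variances and the log-determinant of the covariance matrix. The following NumPy sketch (our own illustration, not part of the paper's published code) computes this Gaussian reference value:

```python
import numpy as np

def gaussian_multi_information(cov):
    """Multi-information (in bits) of a zero-mean Gaussian with covariance `cov`.

    I = sum_i H(y_i) - H(y) = 0.5 * (sum_i log var_i - log det cov) / log 2
    """
    cov = np.asarray(cov, dtype=float)
    marginal_vars = np.diag(cov)
    sign, logdet = np.linalg.slogdet(cov)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (np.sum(np.log(marginal_vars)) - logdet) / np.log(2)

# Example: two unit-variance dimensions with correlation 0.8 share
# I = -0.5 * log2(1 - 0.8**2) ≈ 0.737 bits.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
print(gaussian_multi_information(cov))
```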

Divisive normalization

Of all divisive normalization models previously considered in the literature, ours is most closely related to the one used by Schwartz and Simoncelli [9]. It consists of two main components: a linear filtering step and a rescaling step based on the Euclidean norm of the filter responses:

\mathbf{z} = \frac{\gamma\, W\mathbf{x}}{\sigma + \|W\mathbf{x}\|_2} \qquad (2)

While the linear filters Inline graphic capture the receptive field properties, the rescaling step captures the nonlinear interactions between the single neurons. Most divisive normalization models use filters Inline graphic that resemble the receptive fields of complex cells [9], [15], [16]. Therefore, we use filters obtained from training an Independent Subspace Analysis (ISA) on a large collection of randomly sampled image patches [15], [16, see also Methods]. ISA can be seen as a redundancy reduction transform whose outputs are computed by the complex cell energy model [17], [18]. For this study, the algorithm has the advantage that it not only yields complex cell-like filter shapes, but also ensures that single filter responses Inline graphic are decorrelated and already optimized for statistical independence. This ensures that the redundancies removed by divisive normalization and radial factorization are the ones that cannot be removed by the choice of linear filters [7], [19].

Several divisive normalization models exist in the literature. They differ, for instance, in whether a unit Inline graphic is contained in its own normalization pool, or in the exact form of the rescaling function Inline graphic, also known as the Naka-Rushton function. From the viewpoint of redundancy reduction, the former distinction is irrelevant because the influence of a single unit on its normalization pool can always be removed by the elementwise invertible transformation Inline graphic, which does not change the redundancies between the responses [20] (the multi-information is invariant with respect to elementwise invertible transformations). Sometimes, a more general form of the Naka-Rushton function is found in the literature which uses different types of exponents:

graphic file with name pcbi.1002889.e019.jpg (3)

The divisive normalization model considered in this study (equation (2)) differs from this more general version by the type of norm used for rescaling the single responses: where equation (3) uses the Inline graphic-norm Inline graphic, we use the Euclidean norm. Because radial factorization is defined for the more general Inline graphic-norm (see Methods), all analyses in this paper could be carried out for this more general transform. However, we instead chose to use the Euclidean norm for simplicity and to make our model more comparable to the ones most commonly used in redundancy reduction studies of divisive normalization [9], [20]–[22].

Also note that the Naka-Rushton function is often defined as the Inline graphicth power of equation (3). However, the form of equation (3) is more common in redundancy reduction studies in order to maintain the sign of Inline graphic. We mention the consequences of this choice in the discussion.
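For concreteness, a minimal sketch of the rescaling in equation (2), assuming a Naka-Rushton gain of the form g(r) = γr/(σ + r) applied to the Euclidean norm of the filter responses; the symbols γ (saturation value) and σ (half-saturation constant), the function name, and the toy values are our own and should not be read as the exact parameterization of the original code:

```python
import numpy as np

def divisive_normalization(y, sigma, gamma=1.0):
    """Rescale filter responses y (shape: n_samples x n_filters) by their
    Euclidean norm with a Naka-Rushton-type gain g(r) = gamma * r / (sigma + r).

    The direction of each response vector is preserved; only its radius changes.
    """
    y = np.atleast_2d(np.asarray(y, dtype=float))
    r = np.linalg.norm(y, axis=1, keepdims=True)   # Euclidean norm per patch
    gain = gamma / (sigma + r)                     # g(r) / r
    return gain * y                                # z = g(r) * y / r

# Toy usage: two response vectors with very different contrast energy
y = np.array([[0.2, -0.1], [5.0, -3.0]])
z = divisive_normalization(y, sigma=1.0)
print(np.linalg.norm(z, axis=1))   # radii are compressed toward gamma = 1
```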

Radial factorization

Radial factorization is an optimal radial rescaling for redundancy reduction. We will now briefly introduce radial factorization starting from divisive normalization. For more mathematical details see the Methods Section.

On a population level, the rescaling step of divisive normalization is a nonlinear mapping that changes the Euclidean radius of the filter response population. This can be seen by decomposing divisive normalization into two multiplicative terms

\mathbf{z} = \underbrace{\frac{\gamma\, \|\mathbf{y}\|_2}{\sigma + \|\mathbf{y}\|_2}}_{g_\sigma(\|\mathbf{y}\|_2)} \cdot \frac{\mathbf{y}}{\|\mathbf{y}\|_2}, \qquad \mathbf{y} = W\mathbf{x} \qquad (4)

The second term normalizes the response vector Inline graphic to length one while the Naka-Rushton function in the first term determines the new radius. Since the rescaling Inline graphic depends only on the norm, the new radius does not depend on any specific direction of Inline graphic.

The redundancy between the coefficients of Inline graphic is determined by three factors: The statistics of natural image patches Inline graphic which—together with the choice of filters Inline graphic—determine the statistics of Inline graphic, and the radial transformation Inline graphic. If we allow the radial transformation to be a general invertible transform Inline graphic on the Euclidean norm, we can now ask how the different model components can be chosen in order to minimize the redundancy in Inline graphic.

A substantial part of the redundancies in natural images are second order correlations, which can be removed by linear filters during whitening [6]. Whitening does not completely determine the filters since the data can always be rotated afterwards and still stay decorrelated. Higher order decorrelation algorithms like independent component analysis use this rotational degree of freedom to decrease higher order dependencies in the filter responses Inline graphic [3]. However, there is no set of filters that could remove all statistical dependencies from natural images [6], [7], because whitened natural images exhibit an approximately spherical but non-Gaussian joint distribution [7], [21], [23], [24]. Since spherical symmetry is invariant under rotation and because the only spherically symmetric factorial distribution is the Gaussian distribution [13], [25], the marginals cannot be independent.

Hence, the remaining dependencies must be removed by nonlinear mechanisms like an appropriate radial transformation Inline graphic. Fortunately, the joint spherically symmetric distribution of the filter responses Inline graphic already dictates a unique and optimal way to choose Inline graphic: Since a rescaling with Inline graphic will necessarily result in a spherically symmetric distribution again, Inline graphic must be chosen such that Inline graphic is jointly Gaussian distributed. Therefore, we need to choose Inline graphic such that Inline graphic follows the radial distribution of a Gaussian or, in other words, a Inline graphic-distribution. This is a central point for our study: For a spherically symmetric distribution the univariate distribution on Inline graphic determines higher order dependencies in the multi-variate joint distribution of Inline graphic. This means that if we restrict ourselves to radial transformations, it is sufficient to look at radial distributions only. The fact that the Gaussian is the only spherically symmetric factorial distribution implies that the coefficients in Inline graphic can only be statistically independent if Inline graphic follows radial Inline graphic-distribution. Radial factorization finds a transformation Inline graphic which achieves exactly that by using histogram equalization on the distribution of Inline graphic [11], [12, see also Methods]. All these considerations also hold for Inline graphic-spherically symmetric distributions [11], [13].

Note that this does not imply that the neural responses Inline graphic must follow a Gaussian distribution if they are to be independent because the distribution of the single responses Inline graphic can always be altered by applying an elementwise invertible transformation Inline graphic without changing the redundancy. The above considerations only mean that given the two main model components of divisive normalization (and the assumption of spherical symmetry), the best we can do is to choose the Inline graphic to be whitening filters and Inline graphic according to radial factorization.
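A small sketch of the radial factorization idea for the spherically symmetric case: the empirical radial CDF is equalized against the CDF of a χ-distribution with as many degrees of freedom as there are dimensions, so that the transformed radii follow the radial distribution of an isotropic Gaussian. For simplicity we use an empirical CDF here instead of the parametric radial model (a mixture of χ-distributions) used later in the Methods; this is an illustrative simplification, not the paper's implementation:

```python
import numpy as np
from scipy.stats import chi

def radial_factorization(y, y_train=None):
    """Map responses y (n_samples x n_dims) so that their Euclidean radii
    follow a chi distribution with n_dims degrees of freedom.

    The radial CDF is estimated empirically from y_train (default: y itself)
    and equalized against the chi CDF: g = F_chi^{-1} o F_empirical.
    """
    y = np.asarray(y, dtype=float)
    if y_train is None:
        y_train = y
    n_dims = y.shape[1]
    r = np.linalg.norm(y, axis=1)
    r_train = np.sort(np.linalg.norm(y_train, axis=1))

    # Empirical CDF of the radius, kept strictly inside (0, 1) for the ppf.
    u = (np.searchsorted(r_train, r, side="right") + 0.5) / (len(r_train) + 1)
    r_new = chi.ppf(u, df=n_dims)

    return y * (r_new / np.maximum(r, 1e-12))[:, None]

# Usage: heavy-tailed, "natural-image-like" radii get Gaussianized.
rng = np.random.default_rng(0)
y = rng.standard_normal((10000, 8)) * rng.lognormal(size=(10000, 1))
z = radial_factorization(y)
print(np.mean(np.linalg.norm(z, axis=1) ** 2))  # ≈ 8 for a chi distribution with df = 8
```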

Radial factorization and divisive normalization are not equivalent

The goal of this study is to compare the redundancy reduction achieved by divisive normalization and radial factorization. Despite all similarities between the two models, there is a profound mathematical difference showing that the two mechanisms are not equivalent (as noted by [12]).

Both mechanisms have the form

\mathbf{z} = g\!\left(\|\mathbf{y}\|_2\right) \cdot \frac{\mathbf{y}}{\|\mathbf{y}\|_2}

However, the radial rescalings of radial factorization and that of divisive normalization, Inline graphic and Inline graphic, have a different range. Since the Inline graphic-distribution is non-zero on all of Inline graphic the range of Inline graphic must be Inline graphic as well. However, in case of divisive normalization, the Naka-Rushton function Inline graphic saturates at Inline graphic. This means that Inline graphic can never transform a radial distribution into a Inline graphic-distribution since values beyond Inline graphic cannot be reached.

While this implies that the two mechanisms are mathematically not equivalent, it could still be that they perform similarly on data if the probability mass of the Inline graphic-distribution in the range beyond Inline graphic is small. Therefore, we choose Inline graphic to be the Inline graphic quantile of the Inline graphic-distribution in all our experiments (see Methods).

Comparison of the redundancy reduction performance

We compared the amount of redundancy removed by divisive normalization and radial factorization by measuring the multi-information in the plain filter responses Inline graphic and the normalized responses Inline graphic for a large collection of natural image patches (Figure 1b). In both cases the parameters of the radial transformation were chosen to yield the best possible redundancy reduction performance (see Methods). While both divisive normalization and radial factorization remove variance correlations (Figure 1a), the residual amount of dependencies for divisive normalization is still approximately Inline graphic of the total redundancies removed by radial factorization (Figure 1a–b). This demonstrates that divisive normalization is not optimally tailored to the statistics of natural images.

Figure 1. Redundancy reduction and radial distributions for different normalization models.


A: Divisive normalization model used in this study: Natural image patches are linearly filtered. These responses are nonlinearly transformed by divisive normalization or radial factorization (see text). After linear filtering the width of the conditional distribution Inline graphic of two filter responses depends on the value of Inline graphic (conditional log-histograms as contour plots). This demonstrates the presence of variance correlations. These dependencies are decreased by divisive normalization and radial factorization. B: Redundancy measured by multi-information after divisive normalization, extended divisive normalization, and radial factorization: divisive normalization leaves a substantial amount of residual redundancy (error bars show standard deviation over different datasets). C: Distributions on the norm of the filter responses Inline graphic for which divisive normalization (red) and extended divisive normalization (blue) are the optimal redundancy reducing mechanisms. The radial transformation of radial factorization and its corresponding distribution (mixture of five Inline graphic-distributions) is shown in black. While radial factorization (inset, black curve) and extended divisive normalization (inset, blue curve) achieve good redundancy reduction, they lead to physiologically implausibly shaped contrast response curves which are mainly determined by their respective radial transformations Inline graphic shown in the inset. The radial transformation of divisive normalization is shown for comparison (inset, red curve).

To understand this in more detail, we derived the distribution that Inline graphic should have if divisive normalization were the optimal redundancy reducing mechanism and compared it to the empirical radial distribution of Inline graphic represented by a large collection of uniformly sampled patches from natural images. This optimal distribution for divisive normalization can be derived by transforming a Inline graphic-distributed random variable with Inline graphic (see Methods). Since Inline graphic has limited range Inline graphic we actually have to use a Inline graphic-distribution which is truncated at Inline graphic. The parametric form of the resulting distribution is given in the Methods Section. We refer to it as the Naka-Rushton distribution in the following. The parameters of the Naka-Rushton distribution are Inline graphic and Inline graphic. Since Inline graphic is already determined by fixing the range of Inline graphic to the Inline graphic quantile of the Inline graphic-distribution, the remaining free parameter is Inline graphic. In the Naka-Rushton function Inline graphic this parameter is called the half-saturation constant and controls the horizontal position of the contrast response curve in model neurons.

We fitted Inline graphic via maximum likelihood (see Methods) and found that even for the best fitting Inline graphic there is a pronounced mismatch between the Naka-Rushton distribution and the empirical distribution given by the histogram (Figure 1c). This explains the insufficient redundancy reduction because the Naka-Rushton distribution expects most of the responses Inline graphic to fall into a much narrower range than responses to natural images do in reality. The Naka-Rushton function Inline graphic would map the red radial density in Figure 1c perfectly into a truncated Inline graphic-distribution. However, it maps a substantial part of the true radial distribution of Inline graphic (gray histogram) close to Inline graphic, since this part is located to the right of the mode of the Naka-Rushton distribution, where almost no probability mass is expected. Additionally, the Naka-Rushton distribution exhibits a small gap of almost zero probability around zero. This gap, however, also contains a portion of the empirical distribution. This part gets mapped close to zero. To understand why this leaves significant redundancies, imagine the most extreme case in which all the probability mass of Inline graphic would either be mapped onto Inline graphic or onto Inline graphic. The corresponding distribution on Inline graphic would consist of a point mass at zero and a spherical shell at Inline graphic. Such a distribution would clearly exhibit strong dependencies.
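The change of variables behind the Naka-Rushton distribution can also be carried out numerically: if g(r) = γr/(σ + r) is required to follow a χ-distribution truncated at γ, the density of r is χ_trunc(g(r)) · g′(r). The closed-form expression is given in the Methods; the sketch below (with assumed parameter values) only illustrates how narrowly the resulting density concentrates:

```python
import numpy as np
from scipy.stats import chi

def naka_rushton_pdf(r, sigma, gamma, n_dims):
    """Density of the radius r for which g(r) = gamma*r/(sigma+r) is
    chi(n_dims)-distributed, truncated at gamma (change of variables).

    Illustrative only; see the Methods of the paper for the closed form.
    """
    r = np.asarray(r, dtype=float)
    u = gamma * r / (sigma + r)                    # g(r), always below gamma
    normalizer = chi.cdf(gamma, df=n_dims)         # chi mass below the cutoff
    chi_trunc = chi.pdf(u, df=n_dims) / normalizer
    dg_dr = gamma * sigma / (sigma + r) ** 2       # derivative of g
    return chi_trunc * dg_dr

# Where the mass sits: most of it is squeezed into a narrow range of r.
n_dims = 16
gamma = chi.ppf(0.99, df=n_dims)                   # assumed quantile, for illustration
r = np.linspace(1e-3, 50, 2000)
pdf = naka_rushton_pdf(r, sigma=2.0, gamma=gamma, n_dims=n_dims)
# mode and mass up to r = 50 (most of it; a heavy right tail remains)
print(r[np.argmax(pdf)], pdf.sum() * (r[1] - r[0]))
```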

Augmenting divisive normalization by more parameters

It is clear that the suboptimal redundancy reduction performance of divisive normalization is due to its restricted parametric form. Therefore, we explored two options for increasing its degrees of freedom and thereby its redundancy reduction performance: the first option endows static divisive normalization with additional parameters Inline graphic; the second option allows for a dynamic temporal adaptation of Inline graphic.

The simplest way to increase the degrees of freedom in divisive normalization is to introduce two additional parameters in the Naka-Rushton function

graphic file with name pcbi.1002889.e114.jpg

These parameters allow for more flexibility in the scale and shape of the corresponding Naka-Rushton distribution. We label all models that use this parametrization as extended in the following. Note that the extended Naka-Rushton function only saturates for Inline graphic. This means that it could in principle transform Inline graphic into Inline graphic such that Inline graphic is Inline graphic-distributed. For Inline graphic and Inline graphic, the original Naka-Rushton function is recovered. As before, we derived the corresponding extended Naka-Rushton distribution by transforming a (truncated) Inline graphic-distributed random variable with Inline graphic. We fitted the resulting distribution to a large collection of Inline graphic, used the maximum likelihood parameters for extended divisive normalization, and measured the redundancy via multi-information in the resulting normalized responses Inline graphic.

We found that an extended divisive normalization transform achieves substantially more redundancy reduction and that the extended Naka-Rushton distribution on Inline graphic fits the image data significantly better (Figure 1b–c). However, we also find that the best extended Naka-Rushton function for redundancy reduction yields biologically implausible contrast response curves, i.e., the curves that describe the firing rate of a neuron upon stimulation with gratings of different contrast at the neuron's preferred spatial frequency and orientation.

In the divisive normalization and the radial factorization model, the shape of the contrast response curve is determined by the shape of the radial rescaling function (Figure 1c, inset) [8]. In contrast to the normal Naka-Rushton function (Figure 1c, inset, red curve), the extended version (Figure 1c, inset, blue curve) exhibits a physiologically unreasonable shape: it starts at a non-zero value, increases without saturation, and does not resemble a sigmoid at all. The non-zero level for low contrasts is a direct consequence of the optimization for redundancy reduction: redundancy reduction implies that the target radial distribution is a (truncated) Inline graphic-distribution which has only very little probability mass close to zero. Therefore, the radial rescaling function must map the substantial portion of low contrast values in the empirical distribution upwards in order to match the Inline graphic-distribution. This results in the immediate non-zero onset. This is a pronounced mismatch with the typical contrast response curves measured in cortical neurons (see Figure 2 in [14]). In fact, the addition of more parameters merely leads to a contrast response curve which is more similar to that of radial factorization (Figure 1c, inset, black), which does not have a plausible shape either. Therefore, we dismiss the option of adding more parameters to the Naka-Rushton function and turn to the option in which Inline graphic is allowed to dynamically adapt to the ambient contrast level.

Figure 2. Simulated eye movements and adapted contrast distributions.


A: Simulated eye movements on an image from the van Hateren database [31]. Local microsaccades are simulated with Brownian motion with a standard deviation of Inline graphic px. In this example, Inline graphic patches are extracted around the fixation location and whitened. B: Values of Inline graphic for the extracted patches plotted along the Inline graphic-axis. A vertical offset was introduced manually for visibility. Colors match the ones in A. The different curves are the maximum likelihood Naka-Rushton distributions estimated from the data points of the same color.

Dynamic divisive normalization

Previous studies found that single neurons adapt to the ambient contrast level via horizontal shifts of their contrast response curve along the log-contrast axis [8], [14]. In the divisive normalization model, this shift is realized by changes in the half-saturation constant Inline graphic. This means, however, that there is not a single static divisive normalization mechanism, but a whole continuum whose elements differ by the value of Inline graphic (Figure 2). This is equivalent to a continuum of Naka-Rushton distributions which can be adapted to the ambient contrast level by changing the value of Inline graphic. Since this kind of adaptation increases the degrees of freedom, it could also lead to a better redundancy reduction performance.

In order to investigate adaptation to the local contrast in a meaningful way, we used a simple model of saccades and micro-saccades on natural images to sample fixation locations and their corresponding filter responses Inline graphic (see Methods). Previous studies on redundancy reduction with divisive normalization [9], [11], [12] ignored both the structure imposed by fixations between saccades in natural viewing conditions, and the adaptation of neural contrast response curves to the ambient contrast level via the adaptation of Inline graphic [14]. Figure 2 shows an example of simulated eye movements on a natural image from the van Hateren database. For each sample location, we computed the corresponding values of Inline graphic and fitted a Naka-Rushton distribution to them. The right-hand side shows the resulting Naka-Rushton distributions. One can see that the mode of the distribution shifts with the location of the data, which itself depends on the ambient contrast of the fixation location.

A dynamically adapting Inline graphic predicts that the distribution of Inline graphic across time should be well fit by a mixture of Naka-Rushton distributions. Let Inline graphic (we use Inline graphic to emphasize that the radial distribution is a univariate density and not a multivariate density on Inline graphic), then averaged over all time points Inline graphic, the distribution of Inline graphic is given by

\rho(r) = \frac{1}{T} \sum_{t=1}^{T} \rho_{\sigma_t}(r) \qquad (5)

where Inline graphic denotes a single Naka-Rushton distribution at a specific point in time.

We fitted such a mixture distribution to samples Inline graphic from simulated eye movements (see Methods). Figure 3a shows that the mixture of Naka-Rushton distributions fits the empirical data very well, thus confirming the possibility that a dynamic divisive normalization mechanism may be used to achieve optimal redundancy reduction.

Figure 3. Radial distribution and redundancy reduction achieved by the dynamically adapting model.


A: Histogram of Inline graphic for natural image patches sampled with simulated eye movements: The distribution predicted by the dynamically adapting model closely matches the empirical distribution. B: Same as in Fig. 1B but for simulated eye movement data. The dynamically adapting Inline graphic achieves an almost optimal redundancy reduction performance. C: Each colored line shows the distribution of a random variable from 3A transformed with a Naka-Rushton function. Different colors correspond to different values of Inline graphic. The dashed curve corresponds to a truncated Inline graphic-distribution. A mixture of the colored distributions cannot resemble the truncated Inline graphic-distribution since there will either be peaks on the left or the right of the dashed distribution that cannot be canceled by other mixture components.

The next step is to find an explicit dynamic adaptation mechanism that can achieve optimal redundancy reduction. To this end, we sought a way to adapt Inline graphic such that the redundancies between the output responses Inline graphic were small. Our temporally adapting mechanism chooses the current Inline graphic based on the recent stimulation history by using correlations between the contrast values at consecutive time steps. We estimated Inline graphic for the present set of filter responses Inline graphic from the immediately preceding responses Inline graphic by sampling Inline graphic from a Inline graphic-distribution whose parameters were determined by the mean and the variance of the posterior Inline graphic which was derived from the mixture distribution above (see Methods). We found that this temporal adaptation mechanism significantly decreased the amount of residual redundancies to about Inline graphic (Figure 3B). Note that the proposed mechanism is a simple heuristic that does not commit to a particular biophysical implementation of the adaptation, but it demonstrates that there is at least one mechanism that can perform well under the realistic conditions a neural system would face.

Looking at the joint dynamics of Inline graphic and its Inline graphic (Figure 4), we find them to be strongly and positively correlated. Therefore, a higher value of Inline graphic is accompanied by a higher value of Inline graphic. This is analogous to the adaptation of neural contrast response curves observed in vivo where a higher contrast (higher Inline graphic) shifts the contrast response curve to the right (higher Inline graphic), and vice versa [14].

Figure 4. Dynamics of the adaptive Inline graphic.


The scatter plot shows the values of Inline graphic plotted against the Inline graphic used to transform Inline graphic in the dynamic divisive normalization model. The two values are clearly correlated. This indicates that the shift of the contrast response curve, which is controlled by Inline graphic, tracks the ambient contrast level, which is proportional to Inline graphic. Single elements in the plot are colored according to the quantile the value of Inline graphic falls in. When the ambient contrast level changes abruptly (e.g. when a saccade is made), this value is large. If the ambient contrast level is relatively stable (e.g. during fixation), this value is small. In those situations (blue dots), Inline graphic and Inline graphic exhibit the strongest proportionality.

In order to demonstrate that the improved redundancy reduction is a true adaptation mechanism which relies on correlations between temporally subsequent samples, we need to preclude the possibility that Inline graphic can be sampled independently (i.e., independently of the context). For strong redundancy reduction, the normalized responses Inline graphic should follow a (possibly truncated) Inline graphic-distribution (see Methods). The history-independent choice of Inline graphic predicts that this truncated Inline graphic-distribution should be expressible as a mixture of the distributions shown in Figure 3C, which result from transforming random variables that follow the fitted mixture of Naka-Rushton distributions with Naka-Rushton functions for different values of Inline graphic (see Methods for the derivation). We transformed the input distribution with Naka-Rushton functions that differed in the value of Inline graphic (Figure 3C, colored lines). Different colors in Figure 3C refer to different values of Inline graphic. If Inline graphic were history-independent, a positively weighted average of the colored distributions should be able to yield a truncated Inline graphic-distribution (Figure 3C, dashed line). It is obvious that this is not possible: every component will either add a tail to the left of the Inline graphic-distribution or a peak to the right of it. Since distributions can only be added with non-negative weights in a mixture, there is no way that one distribution can make up for a tail or peak introduced by another. Therefore, Inline graphic cannot be chosen independently of the preceding stimulation, but critically relies on exploiting the temporal correlation structure in the input.

Discussion

In this study we have demonstrated that a static divisive normalization mechanism is not powerful enough to capture the contrast dependencies of natural images, leading to suboptimal redundancy reduction performance. Static divisive normalization could only exhibit close to optimal performance if the contrast distribution of the input data were similar to the Naka-Rushton distribution that we derived in this paper. For the best fitting Naka-Rushton distribution, however, the interval containing most of the probability mass is too narrow and too close to zero compared to the contrast distribution empirically found for natural image patches. A divisive normalization mechanism that uses the Inline graphic-norm as in equation (3) instead of the Euclidean norm would suffer from the same problem because the Naka-Rushton distribution for Inline graphic-norms other than Inline graphic would have similar properties. However, the good performance of extended divisive normalization demonstrates that it is not necessary to model the contrast distribution perfectly everywhere, but that it is sufficient to match the range where most natural contrasts appear (Figure 1C).

Not every mapping on natural contrasts that achieves strong redundancy reduction is also physiologically plausible: We showed that the extended static mechanism yields physiologically implausible contrast response curves. Extending the static mechanism of divisive normalization for better redundancy reduction simply makes it more similar to the optimal mechanism and, therefore, yields implausible tuning curves as well. We thus suggested considering the temporal properties of divisive normalization and devised a model that can resolve this conflict by temporally adapting the half-saturation constant Inline graphic using the temporal correlations between consecutive data points caused by fixations.

Another point concerning physiological plausibility is the relationship between divisive normalization models used to explain neurophysiological observations, and those used in redundancy reduction studies like ours. One very common neurophysiological model, introduced by Heeger [8], uses half-squared instead of linear single responses:

graphic file with name pcbi.1002889.e196.jpg (6)

In order to represent each possible image patch this model would need two neurons per filter: one for the positive part and one for the negative part Inline graphic. Of course, these two units would be strongly anti-correlated since only one can be nonzero at a given point in time. Therefore, taking a redundancy reduction view requires considering the positive and the negative part. For this reason it is reasonable to use Inline graphic as the most basic unit and define the normalization as in equation (2). Since Inline graphic and Inline graphic are just two different representations of the same information, the multi-information between Inline graphic is the same as the multi-information between different tuples Inline graphic. Apart from this change of viewpoint, the two models are equivalent, because the normalized half-squared response of equation (6) can be obtained by half-squaring the normalized response of equation (2). Therefore, a model equivalent to the one in equation (6) can be obtained by using the model of equation (2) and representing its responses Inline graphic by twice as many half-squared coefficients afterwards.
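To make the change of viewpoint explicit, the following toy sketch (our own illustration) converts signed normalized responses into twice as many half-squared coefficients and back; because the mapping is invertible, the two representations carry the same information and hence the same multi-information:

```python
import numpy as np

def half_square(z):
    """Represent each signed response by a positive and a negative half-squared unit."""
    pos = np.maximum(z, 0.0) ** 2
    neg = np.maximum(-z, 0.0) ** 2
    return np.concatenate([pos, neg], axis=-1)

def un_half_square(hz):
    """Invert half_square: recover the signed responses."""
    n = hz.shape[-1] // 2
    return np.sqrt(hz[..., :n]) - np.sqrt(hz[..., n:])

z = np.array([[0.3, -1.2, 0.0]])
assert np.allclose(un_half_square(half_square(z)), z)
```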

Previous work on the role of contrast gain control for efficient coding has either focused on the temporal domain [26], [27], or on its role in the spatial domain as a redundancy reduction mechanism for contrast correlations in natural images [9], [11], [12]. Our results emphasize the importance of combining both approaches by showing that the temporal properties of the contrast gain control mechanism can have a critical effect on the redundancies that originate from the spatial contrast correlations in natural images. Our analysis does not commit to a certain physiological implementation or biophysical constraints, but it demonstrates that the statistics of natural images require more degrees of freedom for redundancy reduction in a population response than a classical static divisive normalization model can offer. Our heuristic mechanism demonstrates that strong redundancy reduction is possible with an adaptation mechanism that faces realistic conditions, i.e. has only access to stimuli encountered in the past.

As we showed above, biologically plausible shapes of the contrast response curve and strong redundancy reduction cannot be easily brought together in a single model. Our dynamical model offers a possible solution to this problem. To what extent this model reflects the physiological reality, however, still needs to be tested experimentally.

The first aspect to test is whether the adaptation of the half-saturation constant reflects the temporal structure imprinted by saccades and fixations as predicted by our study. Previous work has measured adaptation timescales for Inline graphic [14], [28]. However, these measurements were carried out in anesthetized animals and cannot account for eye movements. Since our adaptation mechanism mainly uses the fact that contrasts at a particular fixation location are very similar, it predicts that adaptive changes of Inline graphic should be seen from one fixation location to another when measured under natural viewing conditions.

The mechanism we proposed is only one possible candidate for a dynamic contrast gain control mechanism that can achieve strong redundancy reduction. We conclude the paper by defining a measure that can be used to distinguish contrast gain control mechanisms that are likely to achieve strong redundancy reduction from those that are not. As discussed above, a necessary condition for strong redundancy reduction is that the location and the width of the distribution of Inline graphic implied by a model must match the distribution of unnormalized responses Inline graphic determined by the statistics of natural images. In order to measure the location and the width of the distributions in a way that does not depend on a particular scaling of the data, we plotted the median against the width of the Inline graphic to Inline graphic percentile interval (Figure 5). For the empirical distributions generated by the statistics of the image data we always found a ratio greater than Inline graphic. We also included a dataset from real human eye movements by Kienzle et al. [29] to ensure the generality of this finding, as real fixations could introduce a change in the statistics due to the fact that real observers tend to look at image regions with higher contrasts [30]. All models that yield strong redundancy reduction also exhibit a ratio greater than Inline graphic. Thus, the ratio of the median to the width of the contrast distribution is a simple signature that can be used to check whether an adaptation mechanism is potentially powerful enough for near-optimal redundancy reduction.
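A sketch of this signature; the percentile bounds (10th and 90th) are assumptions made here for illustration and should be replaced by the ones reported with Figure 5:

```python
import numpy as np

def contrast_signature(r, lower=10, upper=90):
    """Ratio of the median to the width of a central percentile interval of the
    radial (contrast) values r.  The percentile bounds are placeholder choices;
    use the ones reported with Figure 5 of the paper.
    """
    r = np.asarray(r, dtype=float)
    med = np.median(r)
    width = np.percentile(r, upper) - np.percentile(r, lower)
    return med / width

# Example: a broad, heavy-tailed contrast distribution vs. a narrow one.
rng = np.random.default_rng(1)
print(contrast_signature(rng.lognormal(mean=0.0, sigma=1.0, size=100000)))
print(contrast_signature(rng.normal(loc=10.0, scale=0.5, size=100000)))
```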

Figure 5. Median vs. width of Inline graphic to Inline graphic percentile interval of the models shown in Figure 3b.


The red line corresponds to a static Inline graphic for different values of Inline graphic, the blue triangles correspond to the temporally adapting Inline graphic, the orange markers correspond to uniformly sampled (diamond) and fixational image patches with Brownian motion micro-saccades (circle) from Kienzle et al. [29], the gray markers to simulated eye movement datasets from van Hateren image data [31], and the black marker to the optimal extended divisive normalization model. All models whose transforms yield strong redundancy reduction exhibit a ratio greater than Inline graphic (dashed lines).

Methods

The code and the data are available online at http://www.bethgelab.org/code/sinz2012.

Data

van Hateren data

For the static experiments, we used randomly sampled Inline graphic patches from the van Hateren database [31]. For all experiments we used the logarithm of the raw light intensities. We sampled Inline graphic pairs of training and test sets of Inline graphic patches which we centered on the pixel mean.

For the simulated eye movements, we also used Inline graphic pairs of training and test sets. For the sampling procedure, we repeated the following steps until Inline graphic samples were drawn: We first drew an image randomly from the van Hateren database. For each image, we simulated ten saccades to random locations in that image. For each saccade location, which was drawn uniformly over the entire image, we determined the number Inline graphic of patches to be sampled from around that location by Inline graphic, where Inline graphic was the assumed sampling frequency and Inline graphic was a sample from an exponential distribution with average fixation time Inline graphic (i.e. Inline graphic). The actual locations of the patches were determined by Brownian motion starting at the saccade location and then propagating with a diffusion constant of Inline graphic. This means that each patch location was drawn relative to the previous one based on an isotropic Gaussian centered at the current location with a standard deviation of Inline graphic.
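A sketch of this sampling procedure; the sampling frequency, mean fixation time, diffusion standard deviation, and patch size used below are placeholder values, not the settings of the paper:

```python
import numpy as np

def simulate_fixation_locations(image_shape, n_saccades, fs=100.0, mean_fix_time=0.25,
                                diffusion_std=1.0, patch_size=16, rng=None):
    """Sample patch locations from saccades followed by Brownian fixational drift.

    fs, mean_fix_time, diffusion_std and patch_size are placeholders, not the
    paper's settings.  Returns integer (row, col) upper-left corners of patches.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image_shape
    locations = []
    for _ in range(n_saccades):
        # Saccade to a uniformly random position that keeps the patch inside the image.
        pos = np.array([rng.uniform(0, h - patch_size), rng.uniform(0, w - patch_size)])
        # Number of patches during this fixation: sampling rate times exponential duration.
        n_patches = max(1, int(fs * rng.exponential(mean_fix_time)))
        for _ in range(n_patches):
            locations.append(np.clip(pos, 0, [h - patch_size, w - patch_size]).astype(int))
            pos = pos + rng.normal(scale=diffusion_std, size=2)   # Brownian micro-movement
    return np.array(locations)

locs = simulate_fixation_locations((1024, 1536), n_saccades=10)
print(locs.shape)
```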

Kienzle data

The van Hateren database is a standard dataset for static natural image statistics. To make sure that our results also hold for real fixations, we sampled data from the images used by Kienzle et al. [29]. We computed the Inline graphic and Inline graphic percentiles, as well as the width of the interval between them, for both datasets for Figure 5.

We constructed two datasets: One where the patches were uniformly drawn from the images, and one where we again used Brownian motion with a similar standard deviation around human fixation spots to simulate human fixational data. We applied the same preprocessing as for the van Hateren data: centering and whitening.

Models

Both the divisive normalization model and the optimal radial factorization consist of two steps: a linear filtering step and a radial rescaling step (Table 1). In the following, we describe the different steps in more detail.

Table 1. Model components of the divisive normalization and radial factorization model: Natural image patches are filtered by a set of linear oriented band-pass filters.

divisive normalization model radial factorization
filtering Inline graphic Inline graphic
normalization Inline graphic Inline graphic
(static case Inline graphic and Inline graphic)

The filter responses are normalized and their norm is rescaled in the normalization step.

Filters

The receptive fields of our model neurons, i.e. the linear filters of our models, are given by the rows of a matrix Inline graphic. In summary, the filters are obtained by (i) projecting the data onto the Inline graphic dimensional subspace that is insensitive to the DC component in the image patches, (ii) performing dimensionality reduction and whitening using principal component analysis, and (iii) training an independent subspace analysis algorithm (ISA) to obtain Inline graphic:

  1. The projection of the data onto the Inline graphic dimensional subspace that is insensitive to the DC component is achieved via the matrix Inline graphic. This matrix is a fixed matrix for which the coefficients in each row sum to zero and all rows are mutually orthogonal. The matrix we used has been obtained via a QR-decomposition as described in the Methods Section of [7].

  2. The dimensionality reduction and whitening are achieved by Inline graphic (a minimal sketch of steps (i) and (ii) is given after this list). The matrix Inline graphic contains the principal components of Inline graphic such that Inline graphic. As is common practice, we kept only the first Inline graphic principal components to avoid “noisy” high frequency filters. However, our analysis would also be valid and lead to the same conclusions if we kept the full set of filters.

  3. The last matrix Inline graphic is constrained to be an orthogonal matrix because the covariance of whitened data remains white under orthogonal transformations. This additional degree of freedom is used by Independent Subspace Analysis (see below) to optimize the filter shapes for redundancy reduction beyond removing second-order correlations. While the matrix Inline graphic has a large effect on the particular filter shapes, the same results would have been obtained with any type of whitening filter, i.e. for any orthogonal matrix Inline graphic, because they only differ by an orthogonal rotation. Since we use the Euclidean norm in the divisive normalization model, the rotation would not change the norm of the filter responses and therefore all radial distributions would be the same. The only aspect of our analysis for which the filter choice would make a (small) difference is the multi-information of the raw filter responses. When using ICA filters, the multi-information could be a bit lower. However, since even rather drastic changes of filter shapes (within the class of whitening filters) have only a small effect on redundancy reduction [6], the particular choice of filter shapes does not affect any of our conclusions. The same is true for any choice of parametric filters as long as the covariance matrix of the filter responses is proportional to the identity matrix. Since the second-order correlations provide the dominant contribution to the multi-information, any substantial deviation from the class of whitening filters is likely to yield suboptimal results.
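A minimal sketch of steps (i) and (ii), assuming centered patches stored as rows of a data matrix; the DC-removal matrix is built from a QR decomposition as described above, and the number of retained components is a placeholder:

```python
import numpy as np

def dc_removal_matrix(n_pixels):
    """(n_pixels-1) x n_pixels matrix with zero-sum, mutually orthogonal rows,
    obtained from a QR decomposition of a basis whose first direction is the DC component."""
    basis = np.eye(n_pixels)
    basis[:, 0] = 1.0 / np.sqrt(n_pixels)          # first direction = DC component
    q, _ = np.linalg.qr(basis)
    return q[:, 1:].T                              # rows orthogonal to the DC direction

def whitening_matrix(x_dc, n_components):
    """PCA whitening of DC-free patches x_dc (n_samples x (n_pixels-1));
    keeps the leading n_components principal components."""
    cov = np.cov(x_dc, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:n_components]
    return (eigvec[:, order] / np.sqrt(eigval[order])).T   # rows are whitening filters

# Usage with random stand-in "patches" (16x16 pixels, flattened):
rng = np.random.default_rng(0)
x = rng.standard_normal((5000, 256))
P = dc_removal_matrix(256)             # 255 x 256
V = whitening_matrix(x @ P.T, 64)      # 64 x 255
y = x @ P.T @ V.T                      # whitened, DC-free filter responses
print(np.allclose(np.cov(y, rowvar=False), np.eye(64), atol=0.1))
```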

The independent subspace analysis (with two-dimensional subspaces) used to obtain the matrix Inline graphic is based on the model by Hyvärinen [16]:

graphic file with name pcbi.1002889.e253.jpg (7)

where Inline graphic denotes the list of free parameters for each Inline graphic. More specifically, Inline graphic consists of the value Inline graphic for the Inline graphic-norm and the parameters of the radial distribution for each of the Inline graphic-spherically symmetric distributions. Each single Inline graphic was chosen to be a two-dimensional Inline graphic-spherically symmetric distribution [32]

graphic file with name pcbi.1002889.e262.jpg
graphic file with name pcbi.1002889.e263.jpg

with a radial Inline graphic-distribution Inline graphic with shape Inline graphic and scale Inline graphic. Therefore, the parameters Inline graphic were given by Inline graphic. In the denominator, Inline graphic denotes the surface area of the Inline graphic-norm unit sphere in two dimensions [32]. During training, we first fixed Inline graphic; after initial convergence, we retrained the model with free Inline graphic and Inline graphic.

The likelihood of the data under equation (7) was optimized by alternating between optimizing Inline graphic for fixed Inline graphic, and optimizing the Inline graphic for fixed Inline graphic. The gradient ascent on the log-likelihood of Inline graphic over the orthogonal group used the backprojection method by Manton [19], [33], [34]. Optimizing over Inline graphic yields filter pairs that resemble quadrature pairs like in the energy model of complex cells [17], [18].

Radial rescaling

Optimal contrast gain control: radial factorization

In the following we describe the general mechanism of radial factorization. The spherically symmetric case mostly used in this study is obtained by setting Inline graphic.

Radial factorization is the optimal redundancy reduction mechanism for Inline graphic-spherically symmetric distributed data [11], [32]. Samples from Inline graphic-spherically symmetric distributions with identical Inline graphic-norm Inline graphic are uniformly distributed on the Inline graphic-sphere with that radius. A radial distribution Inline graphic determines how likely it is that a data point is drawn from an Inline graphic-sphere with that specific radius. Since the distribution on the sphere is uniform for any Inline graphic-spherically symmetric distribution, the radial distribution Inline graphic determines the specific type of distribution. For example, Inline graphic and Inline graphic yield an isotropic Gaussian since the Gaussian distribution is spherically symmetric (Inline graphic) and has a radial Inline graphic-distribution (Inline graphic). One can show that, for a fixed value of Inline graphic, there is only one type of radial distribution such that the joint distribution is factorial [13]. For Inline graphic this radial distribution is the Inline graphic-distribution corresponding to a joint Gaussian distribution. For Inline graphic, the radial distribution is a generalization of the Inline graphic-distribution and the joint distribution is the so-called Inline graphic-generalized Normal [35].

Radial factorization is a mapping on the Inline graphic-norm Inline graphic of the data points that transforms a given source Inline graphic-spherically symmetric distribution into a Inline graphic-generalized Normal. To this end, it first models the distribution of Inline graphic with a flexible distribution Inline graphic and then nonlinearly rescales Inline graphic such that the radial distribution becomes a generalized Inline graphic-distribution. This is achieved via histogram equalization Inline graphic where the Inline graphic denote the respective cumulative distribution functions. On the level of joint responses Inline graphic, radial factorization first normalizes the radius to one and then rescales the data point with the new radius:

\mathbf{z} = \left(F_{\chi_p}^{-1} \circ F_{\hat{\varrho}}\right)\!\left(\|\mathbf{y}\|_p\right) \cdot \frac{\mathbf{y}}{\|\mathbf{y}\|_p}

In our case Inline graphic was chosen to be a mixture of five Inline graphic-distributions.

When determining the optimal redundancy reduction performance on the population response, we set Inline graphic in order to use the same norm as the divisive normalization model. Only when estimating the redundancy of the linear filter responses, we use Inline graphic [11].

Note that the divisive normalization model and the radial factorization model used in this study are invariant with respect to the choice of Inline graphic since the Euclidean norm (Inline graphic) is invariant under orthogonal transforms. However, the choice of Inline graphic would affect the redundancies in the plain filter responses Inline graphic in Figure 1B. But even if we had chosen a different Inline graphic, i.e. another set of whitening filters, the redundancy between the coefficients of Inline graphic would not vary much as previous studies have demonstrated [6], [7].

Divisive normalization model and Naka-Rushton distribution

We use the following divisive normalization transform

\mathbf{z} = \frac{\gamma\, \mathbf{y}}{\sigma + \|\mathbf{y}\|_2}, \qquad \mathbf{y} = W\mathbf{x}

which is the common model for neural contrast gain control [8] and redundancy reduction [9].

Divisive normalization acts on the Euclidean norm of the filter responses Inline graphic. Therefore, divisive normalization can only achieve independence if it outputs a Gaussian random variable. While in radial factorization the target and source distribution were fixed, and the goal was to find a mapping that transforms one into the other, we now fix the mapping to divisive normalization, the target distribution on the normalized response Inline graphic to be Gaussian (Inline graphic to be Inline graphic-distributed) and search for the corresponding source distribution that would lead to a factorial representation when divisive normalization is applied. Since divisive normalization saturates at Inline graphic, we will actually have to use a truncated Inline graphic-distribution on Inline graphic. Inline graphic becomes the truncation threshold. Note that radial truncation actually introduces some dependencies, but we keep them small by choosing the truncation threshold Inline graphic to be the Inline graphic percentile of the radial Inline graphic-distribution which is approximately Inline graphic. The Inline graphic was chosen to keep the target distribution close to a factorial Gaussian. However, it could still be that another cut-off (value of Inline graphic) leads to a better redundancy reduction even though the target distribution is less factorial for lower values of Inline graphic (quantiles lower than Inline graphic). We made sure that this is not the case by choosing different values of Inline graphic, computing the best Inline graphic via a maximum likelihood fit of a Naka-Rushton distribution (see below), and estimating the multi-information in the transformed outputs. We found that the choice of Inline graphic has virtually no effect on the residual multi-information (it varies by Inline graphic for Inline graphic and takes its optimum within this interval). Therefore, we kept the Inline graphic choice as it is most similar to the target distribution of radial factorization.

Note also that choosing a Gaussian target distribution does not contradict the finding that cortical firing rates are found to be exponentially distributed [36] since each single response Inline graphic can always be transformed again to be exponentially distributed without changing the redundancy of Inline graphic.

The distribution on Inline graphic such that

g_\sigma\!\left(\|\mathbf{y}\|_2\right) = \frac{\gamma\, \|\mathbf{y}\|_2}{\sigma + \|\mathbf{y}\|_2}

is truncated Inline graphic-distributed can be derived by a simple change of variables. In the resulting distribution

graphic file with name pcbi.1002889.e352.jpg

the truncation threshold Inline graphic, the half-saturation constant Inline graphic, and the scale of the Inline graphic-distribution become parameters of the model. The parameter Inline graphic of the Naka-Rushton distribution controls the variance of the corresponding Gaussian and was always chosen such that the Gaussian was white with variance one. Inline graphic was determined by the Inline graphic-percentile. The only remaining free parameter of the Naka-Rushton distribution is Inline graphic, which simultaneously affects both shape and scale. Inline graphic is the regularized incomplete gamma function, which accounts for the truncation at Inline graphic. We call this distribution the Naka-Rushton distribution and denote it by Inline graphic.

To derive the distribution on Inline graphic for which the extended divisive normalization transformation Inline graphic yields a Inline graphic-distribution, the steps are exactly the same as for the plain divisive normalization transform above. This yields

graphic file with name pcbi.1002889.e366.jpg

for Inline graphic. The parameters of the distribution are now Inline graphic and Inline graphic.

The parameters for all divisive normalization transforms were estimated via maximum likelihood of the Naka-Rushton distribution on the Euclidean norms Inline graphic of the filter responses to natural image patches. As before, we did not optimize for Inline graphic in the extended Naka-Rushton distribution but fixed it such that the corresponding Gaussian was white.
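A sketch of this maximum likelihood fit, using the change-of-variables form of the Naka-Rushton density (a χ-density truncated at γ, evaluated at g(r) = γr/(σ + r) and multiplied by g′(r)) in place of the closed-form expression; the quantile defining γ and the optimization bounds are placeholders:

```python
import numpy as np
from scipy.stats import chi
from scipy.optimize import minimize_scalar

def fit_half_saturation(radii, n_dims, quantile=0.99):
    """Maximum-likelihood estimate of the half-saturation constant sigma for the
    Naka-Rushton distribution, using the change-of-variables density
    rho(r) = chi_trunc(g(r)) * g'(r) with g(r) = gamma*r/(sigma+r).
    The saturation value gamma is fixed to the given chi quantile (assumption)."""
    r = np.asarray(radii, dtype=float)
    gamma = chi.ppf(quantile, df=n_dims)
    log_norm = np.log(chi.cdf(gamma, df=n_dims))   # truncation normalizer

    def neg_log_likelihood(log_sigma):
        sigma = np.exp(log_sigma)                  # keep sigma positive
        u = gamma * r / (sigma + r)
        log_pdf = (chi.logpdf(u, df=n_dims) - log_norm
                   + np.log(gamma * sigma) - 2 * np.log(sigma + r))
        return -np.sum(log_pdf)

    res = minimize_scalar(neg_log_likelihood, bounds=(-5, 5), method="bounded")
    return np.exp(res.x)

# Usage with stand-in radii (e.g. Euclidean norms of whitened filter responses):
rng = np.random.default_rng(0)
radii = rng.gamma(shape=4.0, scale=1.5, size=20000)
print(fit_half_saturation(radii, n_dims=16))
```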

Dynamically adapting Inline graphic

For the model with dynamically adapting Inline graphic, we first model the Euclidean norms Inline graphic of the filter responses to the patches from the simulated eye movement data with a mixture of Inline graphic Naka-Rushton distributions

$$p(\nu) \;=\; \sum_{k=1}^{K} \pi_k \,\rho_{\mathrm{NR}}(\nu \mid \sigma_k)$$

using EM [37]. Here, $\pi_k$ denotes the probability that $\sigma = \sigma_k$. The values of $\sigma_k$ were chosen in $K$ equidistant steps from Inline graphic to Inline graphic.
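
With the half-saturation constants $\sigma_k$ held fixed on such a grid, the EM updates reduce to re-estimating the mixing weights from the posterior responsibilities. The sketch below illustrates this; the density function and the grid are stand-ins rather than the fits used here.

    # Sketch: EM for the mixing weights pi_k of a mixture of Naka-Rushton
    # distributions with the half-saturation constants sigma_k held fixed on a
    # grid. `naka_rushton_logpdf` is a stand-in for the radial density above;
    # the grid endpoints and the number of components are illustrative.
    import numpy as np

    def fit_mixture_weights(radii, sigmas, naka_rushton_logpdf, n_iter=100):
        K = len(sigmas)
        log_pi = np.full(K, -np.log(K))    # uniform initialization of the weights
        # K x N matrix of component log-densities (fixed throughout EM)
        log_comp = np.stack([naka_rushton_logpdf(radii, s) for s in sigmas])
        for _ in range(n_iter):
            # E-step: posterior responsibilities p(sigma_k | nu_i)
            log_joint = log_pi[:, None] + log_comp
            log_resp = log_joint - np.logaddexp.reduce(log_joint, axis=0)
            # M-step: the mixing weights are the average responsibilities
            log_pi = np.log(np.exp(log_resp).mean(axis=1) + 1e-12)
        return np.exp(log_pi)

    # Usage with a hypothetical grid of half-saturation constants:
    # sigmas = np.linspace(0.5, 20.0, 10)
    # pi = fit_mixture_weights(radii, sigmas, naka_rushton_logpdf)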

How much redundancy reduction can be achieved with a dynamically adapting $\sigma$ depends on the dynamics according to which it is selected based on the recent history. While there are many conceivable strategies, we chose a parsimonious one based on the mean and the standard deviation of the posterior over $\sigma$. Our heuristic consists of two steps: first, the mean and the standard deviation of the posterior $p(\sigma_k \mid \nu)$ derived from the mixture distribution are approximated with piecewise linear functions $\mu(\nu)$ and $s(\nu)$; then the $\sigma_t$ used to transform $\nu_t$ is sampled from a $\Gamma$-distribution with mean $\mu(\nu_{t-1})$ and standard deviation $s(\nu_{t-1})$. This strategy reflects the assumption that the first two moments of the posterior are the important features for obtaining a good $\sigma$.

In more detail, we evaluated the posterior

$$p(\sigma_k \mid \nu) \;=\; \frac{\pi_k \,\rho_{\mathrm{NR}}(\nu \mid \sigma_k)}{\sum_{j=1}^{K} \pi_j \,\rho_{\mathrm{NR}}(\nu \mid \sigma_j)}$$

of the mixture distribution at Inline graphic equidistant locations between Inline graphic and Inline graphic, computed the posterior mean and standard deviation at those locations, rescaled the standard deviation by Inline graphic, and fitted the piecewise linear functions on the intervals Inline graphic to each set of values. In the first interval, the linear function was constrained to start at zero. From these two functions $\mu(\nu)$ and $s(\nu)$, we computed two functions for the scale $\theta(\nu)$ and the shape $\kappa(\nu)$ of a $\Gamma$-distribution

$$\kappa(\nu) \;=\; \frac{\mu(\nu)^{2}}{s(\nu)^{2}}, \qquad \theta(\nu) \;=\; \frac{s(\nu)^{2}}{\mu(\nu)}$$

via moment matching. We obtained the value $\sigma_t$ for transforming a radius $\nu_t$ with the Naka-Rushton function by sampling $\sigma_t$ from a $\Gamma$-distribution with shape and scale determined by $\kappa(\nu_{t-1})$ and $\theta(\nu_{t-1})$.
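
The moment matching step and the subsequent sampling of $\sigma_t$ can be written compactly as below; the piecewise linear functions are represented by arbitrary callables and all numerical values are placeholders.

    # Sketch: moment-matching a Gamma distribution to the posterior mean mu(nu)
    # and standard deviation s(nu), then sampling sigma_t for the next
    # normalization step. mu_fn and s_fn stand in for the fitted piecewise
    # linear functions; they are not the fits used in this study.
    import numpy as np

    def gamma_params(mu, sd):
        # Gamma with shape k and scale theta has mean k*theta and variance k*theta**2
        shape = (mu / sd) ** 2
        scale = sd ** 2 / mu
        return shape, scale

    def sample_sigma(nu_prev, mu_fn, s_fn, rng):
        # draw the half-saturation constant for the current step from a Gamma
        # distribution whose first two moments match the posterior at nu_prev
        shape, scale = gamma_params(mu_fn(nu_prev), s_fn(nu_prev))
        return rng.gamma(shape, scale)

    # Illustrative usage with made-up linear functions:
    rng = np.random.default_rng(0)
    mu_fn = lambda nu: 0.5 * nu + 1.0
    s_fn = lambda nu: 0.1 * nu + 0.5
    sigma_t = sample_sigma(nu_prev=8.0, mu_fn=mu_fn, s_fn=s_fn, rng=rng)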

Computation of percentiles for Figure 5

For the dynamically adapting $\sigma$ in Figure 5, we sampled from

$$p(\nu_t, \sigma_t \mid \nu_{t-1}) \;=\; \rho_{\mathrm{NR}}(\nu_t \mid \sigma_t)\; p_{\Gamma}\!\big(\sigma_t \mid \kappa(\nu_{t-1}), \theta(\nu_{t-1})\big)$$

and computed the percentiles based on the sampled dataset. For the sampling procedure, we drew $\sigma_t$ from the $\Gamma$-distribution $p_{\Gamma}$ with shape and scale computed from the preceding radius $\nu_{t-1}$ and then sampled $\nu_t$ from the Naka-Rushton distribution $\rho_{\mathrm{NR}}(\nu_t \mid \sigma_t)$ with that $\sigma_t$. We repeated this for all radii $\nu_{t-1}$ from a test set of simulated eye movement radii. This procedure was carried out for all pairs of training and test sets and the distributions fitted to them.

For the static case, we sampled data from single Naka-Rushton distributions for different values of $\sigma$ and computed the percentiles from the samples.

History-independent choice of $\sigma$

In the following, let $y_t$ and $z_t$ be the unnormalized and normalized responses at time $t$, respectively, and let $h_t$ denote the recent history of responses. The underlying generative structure of the model for temporally correlated data is the following: given a fixed history $h_t$, the half-saturation constant $\sigma_t$ and the response $y_t$ are sampled from $p(\sigma_t \mid h_t)$ and $p(y_t \mid \sigma_t, h_t)$. Then, $z_t$ is generated from $y_t$ and $\sigma_t$ through divisive normalization.

For strong redundancy reduction, $\|z_t\|_2$ should follow a truncated $\chi$-distribution, which means that for a given history $h_t$ and half-saturation constant $\sigma_t$, the unnormalized response energy $\nu_t = \|y_t\|_2$ must have a Naka-Rushton distribution

$$p(\nu_t \mid h_t, \sigma_t) \;=\; \rho_{\mathrm{NR}}(\nu_t \mid \sigma_t)$$

because normalizing this response with the Naka-Rushton nonlinearity and half-saturation constant $\sigma_t$ yields a truncated $\chi$-distribution. Averaged over all histories $h_t$ and half-saturation constants $\sigma_t$, the distribution of $\nu_t$ is a mixture of Naka-Rushton distributions

$$p(\nu_t) \;=\; \int \rho_{\mathrm{NR}}(\nu_t \mid \sigma)\, p(\sigma \mid h)\, p(h)\; \mathrm{d}\sigma\,\mathrm{d}h \qquad (8)$$

If $\sigma_t$ depends deterministically on the history $h_t$, we obtain equation (5).

If $\sigma$ could be chosen independently of the preceding history, the distribution of the normalized radius $\|z_t\|_2$ would be given by

$$p\big(\|z_t\|_2\big) \;=\; \int p(\sigma)\, q_\sigma\big(\|z_t\|_2\big)\; \mathrm{d}\sigma$$

where $q_\sigma$ is the marginal distribution of $\nu_t$ transformed with divisive normalization and a specific value of $\sigma$. Since redundancy reduction requires $\|z_t\|_2$ to be truncated $\chi$-distributed, $\sigma$ can be chosen independently only if the truncated $\chi$-distribution can be modelled as a mixture of the different $q_\sigma$. Since we assume stationarity, we can drop the index $t$ in the equation.

Multi-information estimation

We use the multi-information to quantify the statistical dependencies between the filter responses $y_1, \dots, y_n$ [38]. The multi-information is the $n$-dimensional generalization of the mutual information. It is defined as the Kullback-Leibler divergence between the joint distribution and the product of its marginals or, equivalently, as the difference between the sum of the marginal entropies and the joint entropy

$$I[Y] \;=\; D_{\mathrm{KL}}\Big(p(y)\,\Big\|\,\prod_{i=1}^{n} p(y_i)\Big) \;=\; \sum_{i=1}^{n} H[Y_i] \;-\; H[Y] \qquad (9)$$

The multi-information is zero if and only if the different dimensions of the random vector $Y$ are independent. Since the joint entropy $H[Y]$ is hard to estimate, we employ a semi-parametric estimate of the multi-information that is conservative in the sense that it is downward biased.

For the marginal entropies $H[Y_i]$, we use a jackknifed estimator of the discrete entropy on the binned values [39]. We chose the bin size with the heuristic proposed by Scott [40]. We obtain an estimate of the differential entropy by correcting with the logarithm of the bin width (see e.g. [7]).
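
The plug-in version of this marginal entropy estimate (without the jackknife bias correction) can be sketched as follows; Scott's rule sets the bin width, and adding the logarithm of the bin width turns the discrete entropy into a differential entropy estimate.

    # Sketch: histogram-based differential entropy of a single filter response.
    # Scott's rule determines the bin width; adding log2 of the bin width
    # converts the entropy of the binned values into an estimate of the
    # differential entropy. The jackknife bias correction used in the study
    # is omitted here.
    import numpy as np

    def marginal_entropy_bits(x):
        x = np.asarray(x, dtype=float)
        m = x.size
        width = 3.49 * x.std() * m ** (-1.0 / 3.0)        # Scott's rule
        edges = np.arange(x.min(), x.max() + width, width)
        counts, _ = np.histogram(x, bins=edges)
        p = counts[counts > 0] / m
        discrete_entropy = -(p * np.log2(p)).sum()        # entropy of binned values
        return discrete_entropy + np.log2(width)          # differential entropy

    # Example: for a standard Gaussian the true value is 0.5*log2(2*pi*e) = 2.05 bits
    rng = np.random.default_rng(0)
    print(marginal_entropy_bits(rng.normal(size=10_000)))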

In order to estimate the joint entropy, we use the average log-loss to get an upper bound

$$\hat H[Y] \;=\; \mathbb{E}_{p}\big[-\log_2 \hat p(Y)\big] \;=\; H[Y] \;+\; D_{\mathrm{KL}}\big(p \,\|\, \hat p\big) \;\ge\; H[Y]$$

Since the average log-loss overestimates the true entropy, replacing the joint entropy by $\hat H[Y]$ in equation (1) underestimates the multi-information. Therefore, we sometimes obtain estimates smaller than zero. Since the multi-information is always non-negative, we set the estimate to zero in that case. When computing error bars on the multi-information estimates, we use the negative values but a mean of zero in such cases, which effectively increases the standard deviation of the error.
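
In code, the conservative estimate combines the binned marginal entropies with the average log-loss of a fitted joint model; `joint_logpdf` below is a stand-in for the log-density of whatever parametric model has been fitted to the joint responses.

    # Sketch: conservative (downward-biased) multi-information estimate. The
    # joint entropy is replaced by the average log-loss of a fitted model,
    # which upper-bounds the true joint entropy. `joint_logpdf` is a stand-in
    # for the fitted model's log-density (assumed to return nats).
    import numpy as np

    def multi_information_bits(Y, joint_logpdf, marginal_entropy_bits):
        # Y: (m, n) array of responses; returns a lower bound on I[Y] in bits
        marginals = sum(marginal_entropy_bits(Y[:, i]) for i in range(Y.shape[1]))
        joint_upper_bound = -np.mean(joint_logpdf(Y)) / np.log(2)   # nats -> bits
        return max(marginals - joint_upper_bound, 0.0)              # clip at zero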

Since we want to commit ourselves as little as possible to a particular model, we estimate $H[Y]$ by assuming that $Y$ is $L_p$-spherically symmetric distributed but estimating everything else with non-parametric estimators. If $Y$ is $L_p$-spherically symmetric distributed, the radial component is independent of the directional component [32] and we can write

$$H[Y] \;=\; H\big[\|Y\|_p\big] \;+\; (n-1)\,\mathbb{E}\big[\log_2 \|Y\|_p\big] \;+\; \log_2 S_p(n), \qquad (10)$$
where $S_p(n)$ denotes the surface area of the unit $L_p$-sphere in $n$ dimensions.

The entropy $H[\|Y\|_p]$ of the radial component is again estimated via a histogram estimator. The term $\mathbb{E}[\log_2 \|Y\|_p]$ is approximated by the empirical mean.

Putting all the equations together yields our estimator of the multi-information under the assumption of an $L_p$-spherically symmetric distribution of $Y$

$$\hat I[Y] \;=\; \sum_{i=1}^{n} \hat H[Y_i] \;-\; \hat H\big[\|Y\|_p\big] \;-\; (n-1)\,\frac{1}{m}\sum_{k=1}^{m}\log_2 \big\|y^{(k)}\big\|_p \;-\; \log_2 S_p(n)$$

where $\hat H[Y_i]$ are the univariate entropies estimated via binning.

Since the optimal value of $p$ for filter responses to natural image patches is approximately Inline graphic, we use that value to estimate the multi-information of $Y$.

When estimating the multi-information of the responses $z$ of either divisive normalization or radial factorization, we use the fact that

$$H[Z] \;=\; H[Y] \;+\; \mathbb{E}\big[\log_2 \big|\det \mathcal{J}(y)\big|\big]$$

where $\mathcal{J}$ is the Jacobian of the normalization transformation. The expectation is estimated by averaging over data points. The determinants of the Jacobians of radial factorization, divisive normalization, and extended divisive normalization are given by

[Equations: closed-form expressions for the Jacobian determinants of radial factorization, divisive normalization, and extended divisive normalization]

All multi-information values were computed on test data.
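
The resulting correction can be sketched generically; `logabsdet_jacobian` below is a placeholder for the closed-form log-determinant of whichever normalization transform is being evaluated.

    # Sketch: multi-information of the normalized responses Z = f(Y) using the
    # change-of-variables identity H[Z] = H[Y] + E[log2 |det J_f(y)|].
    # `logabsdet_jacobian` stands in for the closed-form log-determinant of the
    # particular transform (radial factorization, divisive normalization, ...).
    import numpy as np

    def multi_information_of_outputs(Y, Z, joint_entropy_Y_bits,
                                     logabsdet_jacobian, marginal_entropy_bits):
        # Y, Z: (m, n) arrays of unnormalized and normalized responses;
        # joint_entropy_Y_bits: estimate of the joint entropy of Y in bits.
        joint_entropy_Z = joint_entropy_Y_bits + \
            np.mean(logabsdet_jacobian(Y)) / np.log(2)   # nats -> bits
        marginals_Z = sum(marginal_entropy_bits(Z[:, i]) for i in range(Z.shape[1]))
        return max(marginals_Z - joint_entropy_Z, 0.0)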

For the dynamically adapting model, the $\sigma_t$ for each data point $y_t$ is sampled from a $\Gamma$-distribution whose parameters are determined from the previous radius $\nu_{t-1}$ and the posterior over $\sigma$ obtained from the mixture of Naka-Rushton distributions. Since $\sigma_t$ changes from step to step, it becomes part of the representation and should be included when computing the multi-information (i.e. the redundancy) between the outputs $z_t$. Therefore, the redundancy of the dynamically adapting model is measured by $I[Z_t, \sigma_t]$. For its computation, we use that $I[Z_t, \sigma_t] = I[Z_t] + I[Z_t : \sigma_t]$, where $I[Z_t : \sigma_t]$ is the mutual information between $Z_t$ and $\sigma_t$. In the following, we write Inline graphic if Inline graphic. Under the assumption that both $Y_t$ and $Z_t$ are spherically symmetric distributed, we can decompose the respective random variables into a uniform (on the sphere) part and a radial part: $Y_t = \|Y_t\|_2\, U_t$ and $Z_t = \|Z_t\|_2\, V_t$. This yields

$$I[Z_t : \sigma_t] \;=\; I\big[\|Z_t\|_2 : \sigma_t\big]$$

which means that we can restrict ourselves to the mutual information between the two univariate signals $\|Z_t\|_2$ and $\sigma_t$, which we estimate from a two-dimensional histogram with Inline graphic bins.
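
A histogram-based estimate of this mutual information can be sketched as follows; the number of bins and the toy data are placeholders.

    # Sketch: mutual information between two univariate signals (here the radius
    # of the normalized response and the sampled half-saturation constant),
    # estimated from a two-dimensional histogram. The bin count is illustrative.
    import numpy as np

    def mutual_information_bits(a, b, bins=50):
        joint, _, _ = np.histogram2d(a, b, bins=bins)
        p_ab = joint / joint.sum()
        p_a = p_ab.sum(axis=1, keepdims=True)    # marginal of the first signal
        p_b = p_ab.sum(axis=0, keepdims=True)    # marginal of the second signal
        mask = p_ab > 0
        return float((p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])).sum())

    # Example with correlated toy data:
    rng = np.random.default_rng(0)
    s = rng.gamma(2.0, 1.0, size=20_000)
    r = s + 0.5 * rng.normal(size=s.size)
    print(mutual_information_bits(r, s))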

Acknowledgments

We thank P. Berens, L. Busse, S. Katzner and L. Theis for fruitful discussions and comments on the manuscript.

Funding Statement

This study was financially supported by the German Ministry of Education, Science, Research and Technology through the Bernstein award (BMBF; FKZ: 01GQ0601). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Barlow HB (1961) Possible Principles Underlying the Transformations of Sensory Messages. In: Rosenblith WA, editor. Sensory Communication. Cambridge, MA: MIT Press. pp. 217–234.
2. Simoncelli EP, Olshausen BA (2003) Natural Image Statistics and Neural Representation. Annual Review of Neuroscience 24: 1193–1216.
3. Bell AJ, Sejnowski TJ (1997) The "independent components" of natural scenes are edge filters. Vision Research 37: 3327–3338.
4. Barlow HB (1989) Unsupervised Learning. Neural Computation 1: 295–311.
5. Lewicki MS, Olshausen BA (1999) Probabilistic framework for the adaptation and comparison of image codes. Journal of the Optical Society of America A 16: 1587–1601.
6. Bethge M (2006) Factorial coding of natural images: how effective are linear models in removing higher-order dependencies? Journal of the Optical Society of America A 23: 1253–1268.
7. Eichhorn J, Sinz F, Bethge M (2009) Natural Image Coding in V1: How Much Use Is Orientation Selectivity? PLoS Comput Biol 5: e1000336.
8. Heeger DJ (1992) Normalization of cell responses in cat striate cortex. Vis Neurosci 9: 181–197.
9. Schwartz O, Simoncelli EP (2001) Natural signal statistics and sensory gain control. Nat Neurosci 4: 819–825.
10. Carandini M, Heeger DJ (2011) Normalization as a canonical neural computation. Nature Reviews Neuroscience 13: 51–62.
11. Sinz F, Bethge M (2009) The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction. In: Koller D, Schuurmans D, Bengio Y, Bottou L, editors. Advances in Neural Information Processing Systems 21: 22nd Annual Conference on Neural Information Processing Systems 2008. Red Hook, NY, USA: Curran Associates. pp. 1521–1528.
12. Lyu S, Simoncelli EP (2009) Nonlinear extraction of independent components of natural images using radial gaussianization. Neural Computation 21: 1485–1519.
13. Sinz F, Gerwinn S, Bethge M (2009) Characterization of the p-generalized normal distribution. Journal of Multivariate Analysis 100: 817–820.
14. Bonds AB (1991) Temporal dynamics of contrast gain in single cells of the cat striate cortex. Vis Neurosci 6: 239–255.
15. Hyvärinen A, Hoyer P (2000) Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces. Neural Computation 12: 1705–1720.
16. Hyvärinen A, Koester U (2007) Complex cell pooling and the statistics of natural images. Network: Computation in Neural Systems 18: 81–100.
17. Pollen D, Ronner S (1981) Phase relationships between adjacent simple cells in the visual cortex. Science 212: 1409–1411.
18. Adelson EH, Bergen JR (1985) Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A 2: 284–299.
19. Sinz F, Simoncelli EP, Bethge M (2009) Hierarchical Modeling of Local Image Features through Lp-Nested Symmetric Distributions. In: Bengio Y, Schuurmans D, Lafferty J, Williams C, Culotta A, editors. Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Red Hook, NY, USA: Curran Associates. pp. 1696–1704.
20. Lyu S (2011) Dependency Reduction with Divisive Normalization: Justification and Effectiveness. Neural Computation 23: 2942–2973.
21. Wainwright MJ, Simoncelli EP (2000) Scale mixtures of Gaussians and the statistics of natural images. Neural Information Processing Systems 12: 855–861.
22. Wainwright MJ, Schwartz O, Simoncelli EP (2002) Natural image statistics and divisive normalization: modeling nonlinearities and adaptation in cortical neurons. In: Statistical Theories of the Brain. MIT Press. pp. 203–222.
23. Field DJ (1987) Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A 4: 2379–2394.
24. Ruderman DL, Bialek W (1994) Statistics of natural images: Scaling in the woods. Physical Review Letters 73: 814.
25. Kac M (1939) On a Characterization of the Normal Distribution. American Journal of Mathematics 61: 726–728.
26. Brenner N, Bialek W, De Ruyter Van Steveninck R (2000) Adaptive rescaling maximizes information transmission. Neuron 26: 695–702.
27. Wark B, Lundstrom BN, Fairhall A (2007) Sensory adaptation. Current Opinion in Neurobiology 17: 423–429.
28. Hu M, Wang Y (2011) Rapid Dynamics of Contrast Responses in the Cat Primary Visual Cortex. PLoS ONE 6: e25410.
29. Kienzle W, Franz MO, Schölkopf B, Wichmann FA (2009) Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision 9: 7.1–15.
30. Reinagel P, Zador AM (1999) Natural scene statistics at the centre of gaze. Network 10: 341–350.
31. Van Hateren JH, Van Der Schaaf A (1998) Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings of the Royal Society B Biological Sciences 265: 359–366.
32. Gupta AK, Song D (1997) Lp-norm spherical distribution. Journal of Statistical Planning and Inference 60: 241–260.
33. Manton JH (2002) Optimization algorithms exploiting unitary constraints. IEEE Transactions on Signal Processing 50: 635–650.
34. Sinz F, Bethge M (2010) Lp-Nested Symmetric Distributions. Journal of Machine Learning Research 11: 3409–3451.
35. Goodman IR, Kotz S (1973) Multivariate θ-generalized normal distributions. Journal of Multivariate Analysis 3: 204–219.
36. Baddeley R, Abbott LF, Booth MC, Sengpiel F, Freeman T, et al. (1997) Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proceedings of the Royal Society B Biological Sciences 264: 1775–1783.
37. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B (Methodological) 39: 1–38.
38. Perez A (1977) ε-admissible simplification of the dependence structure of a set of random variables. Kybernetika 13: 439–444.
39. Paninski L (2003) Estimation of Entropy and Mutual Information. Neural Computation 15: 1191–1253.
40. Scott DW (1979) On optimal and data-based histograms. Biometrika 66: 605–610.
