Where is the light? Bayesian perceptual priors for lighting direction

JV Stone; IS Kerrigan; J Porrill

doi:10.1098/rspb.2008.1635

2009 Feb 25;276(1663):1797–1804. doi: 10.1098/rspb.2008.1635

PMCID: PMC2674484 PMID: 19324801

Abstract

Perception of shaded three-dimensional figures is inherently ambiguous, but this ambiguity can be resolved if the brain assumes that figures are lit from a specific direction. Under the Bayesian framework, the visual system assigns a weighting to each possible direction, and these weightings define a prior probability distribution for light-source direction. Here, we describe a non-parametric maximum-likelihood estimation method for finding the prior distribution for lighting direction. Our results suggest that each observer has a distinct prior distribution, with non-zero values in all directions, but with a peak which indicates observers are biased to expect light to come from above left. The implications of these results for estimating general perceptual priors are discussed.

Keywords: perception, Bayes, lighting direction

1. Introduction

Perception consists of interpreting two-dimensional retinal images of a three-dimensional world. The process of projecting a three-dimensional scene onto a two-dimensional retina necessarily discards information about the three-dimensional structure of that scene. This makes it impossible, in principle, to deduce all of the three-dimensional structure of a scene, and perception is therefore a classic example of an ill-posed problem (Poggio et al. 1985). However, even though such problems cannot be solved by deduction, acceptable solutions can be found using statistical inference. This involves using additional information, usually based on prior experience, to interpret two-dimensional retinal images, where this additional information takes the form of heuristics (rules of thumb) or constraints (rules which exclude certain ‘illegal’ solutions).

Within the Bayesian framework, this extra information is realized in the form of prior distributions. For example, the image marked with a cross in figure 1 can be interpreted as either convex or concave. The particular perception evoked by this image depends only on the direction in which the light source is assumed to originate (Rittenhouse 1786; Brewster 1847; Oppel 1856; Kleffner & Ramachandran 1992). If the light source is assumed to originate from below, then the image is interpreted as convex, but if the light source is assumed to originate from above, then the image is interpreted as concave. As this is the usual interpretation made by human observers, it implies that we implicitly assume light originates from above. However, such demonstrations provide only a qualitative impression of where we assume the light source to be.

Figure 1. Typical stimulus presented to an observer on a single trial. The observer's task is to indicate whether the quadrant marked with a cross (×) appears convex or concave. This response implicitly defines the perceived direction of the light source. For example, if the marked quadrant is perceived as convex, then this implies that the light originates from the lower right (i.e. approx. 300°), but if it is perceived as concave, then this implies that the light originates from the upper left (i.e. approx. 120°).

In reality, it is unlikely that human observers make the simplistic assumption that light comes only from above or below. More realistically, each observer assigns a probability to each possible light-source direction, which may be based on prior experience of the directions in which light sources originate.

These probability values collectively define a prior probability density function, which can be visualized using a polar plot, where the radial distance in a given direction indicates the relative probability that the light originates from that direction (as in figure 2c). In this paper, we show how it is possible to estimate the overall form of this prior, which, for reasons that will become obvious, we call the light-from-above prior. For the sake of clarity, note that we do not seek the prior for lighting direction, which could be obtained empirically, but the prior as used by a given observer.

Figure 2. Evaluating the cross-validation method for estimating the value of the smoothing parameter λ. Results are for the simulated observer's data shown in (c), for σ=1. See appendix B. (a) At each sampled value of λ, three quarters of the data were used to estimate the prior. Using this prior, the likelihood of the remaining quarter was evaluated using equation (2.23), where σ=1. This was repeated once for each of four disjoint quarters, and the mean of the four resultant likelihood functions is plotted here. The minimum with respect to the negative log likelihood corresponds to λ≈700. (b) To evaluate the success of cross-validation for each sampled value λ_j of λ, the Kullback–Leibler (KL) divergence between the known prior $p_{θ}^{*} (θ)$ of this simulated observer and the prior ${\hat{p}}_{θ} (θ)$ obtained with λ_j was calculated as $E_{KL} = 1000 \times Δ θ \sum_{i} {\hat{p}}_{θ} (θ_{i}) log [{\hat{p}}_{θ} (θ_{i}) / p_{θ}^{*} (θ_{i})],$ where i=(0, 1, …, 35), θ_i=10°×i and Δθ=10°. The minimum with respect to the KL divergence also corresponds to λ≈700, confirming that cross-validation chooses a value of λ which provides a good estimate of the true prior. (c) Estimating the light-from-above prior p_θ(θ) for a simulated observer. The lighting direction varies around the circle, and the probability that the stimulus was judged to be convex varies with distance from the origin. The graph shows (i) a sample from the posterior p(c₀|x) as the proportion of convex responses (dashed), (ii) the known prior (dotted), (iii) the estimated prior (solid), based on the proportion of convex responses, as it would be for a human observer, and (iv) the mean vector (solid line), which is the mean of the prior (see appendix C). The direction of this vector indicates the bias in the prior, and its length shows the amount of bias (see appendix A). The simulated observer was exposed to the same 36 lighting directions and the same number of trials per lighting direction (32) as the human observers in the experiment described in the text, with a discrimination parameter set at σ^*=1 dB and a concavity preference set at p(c₁)=0.5. The value of the smoothing parameter estimated from cross-validation is λ=700 (see (a) and (b)). Using λ=700 and σ=1, the concavity preference was estimated as $\hat{p} (c_{1}) = 0.507$ .

Our general strategy is closely related to that described in Paninski (2006). However, in the simulated experiment described by Paninski, the observer estimates a continuous parameter, and so each trial provides an equality constraint on the prior. Here, we concentrate on the more common case in which the observer makes a forced choice, so that each trial provides a weaker, inequality constraint on the prior.

2. Results

The shape information in our images is a function of two parameters, the direction θ of the light source and the three-dimensional shape c of the imaged surface, which specifies whether the stimulus is concave c=c₁ or convex c=c₀. On each trial, the observer is presented with an image x, and makes a binary response r=1 if the stimulus appears concave or r=0 if the stimulus appears convex (see appendix A).

We assume that the observer's perceived shape $\hat{c}$ of a shape c depends on two quantities: the posterior probability density function and the loss function. First, the probability (density) that the shape has value c and that the light source is in direction θ given an image x defines the joint posterior probability density function p(c,θ|x). Second, the ‘cost’ of perceiving a shape as $\hat{c}$ , when it is actually c, is defined by the loss function $D (\hat{c}, c)$ .

The observer's perception is assumed to correspond to the shape $\hat{c}$ , which minimizes the expected loss, where this expectation is taken over all possible values of θ and c

E = \int_{c} \int_{θ} p (c, θ | x) D (\hat{c}, c) d θ d c,

(2.1)

= \int_{c} D (\hat{c}, c) [\int_{θ} p (c, θ | x) d θ] d c .

(2.2)

Using Bayes' rule, the posterior is given by

p (c, θ | x) = p (x | c, θ) p (c, θ) / p (x),

(2.3)

where the observer's prior expectations about shapes and lighting directions define the joint prior distribution p(c, θ), and where the probability of the observed image for a given three-dimensional shape and lighting direction defines the likelihood function p(x|c, θ). The integral in square brackets in equation (2.2) can now be rewritten as

p (c | x) = \frac{1}{p (x)} \int_{θ} p (x | c, θ) p (c, θ) d θ,

(2.4)

so that the expected loss is

E = \int_{c} p (c | x) D (\hat{c}, c) d c .

(2.5)

In fact, each stimulus x is consistent with only two lighting directions, θ_x and ${\bar{θ}}_{x} = θ_{x} + 180 °$ . This implies that the likelihood p(x|c, θ) is a delta function, which is zero except at θ=θ_x and $θ = {\bar{θ}}_{x}$ ,

p (x | c_{0}, θ) = δ (θ - θ_{x}),

(2.6)

p (x | c_{1}, θ) = δ (θ - {\bar{θ}}_{x}) .

(2.7)

Substituting equation (2.6) in equation (2.4) for c=c₀ yields

p (c_{0} | x) = \frac{1}{p (x)} \int δ (θ - θ_{x}) p (c_{0}, θ) d θ,

(2.8)

= p (c_{0}, θ_{x}) / p (x) .

(2.9)

If the observer assumes that the stimulus shape and the lighting direction are independent, then the joint prior distribution p(c₀,θ_x) factorizes to yield

p (c_{0} | x) = p (c_{0}) p_{θ} (θ_{x}) / p (x),

(2.10)

where p_θ(θ_x) is the prior over lighting direction and p(c₀) is the prior for the shape c₀. A similar calculation for c=c₁ yields

p (c_{1} | x) = p (c_{1}) p_{θ} ({\bar{θ}}_{x}) / p (x) .

(2.11)

Regardless of the value of θ_x and ${\bar{θ}}_{x}$ , each observer perceives the stimulus as either convex c₀ or concave c₁, and responds accordingly. Thus, together, p(c₁) and p(c₀) are a pair of co-determined observer-specific scalar priors, such that p(c₁)+p(c₀)=1. We call the prior p(c₁) the concavity preference for a given observer, which can be estimated using the same method (described below) used for estimating the prior p_θ(θ).

We choose the zero/one loss function to model the forced choice task, i.e. $D (\hat{c}, c) = 0$ for a correct decision and $D (\hat{c}, c) = 1$ for an incorrect decision (Bishop 1996); the optimal decision rule under this loss function minimizes the number of misclassified stimuli. Substituting this loss function into equation (2.5), we find that the observer should respond r=0 (convex) if the log posterior ratio

L = log \frac{p (c_{0} | x)}{p (c_{1} | x)},

(2.12)

= log \frac{p (c_{0}) p_{θ} (θ_{x})}{p (c_{1}) p_{θ} ({\bar{θ}}_{x})},

(2.13)

\geq 0

(2.14)

and the response should be r=1 (concave) otherwise.
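The deterministic rule in equations (2.12)–(2.14) can be sketched in a few lines of Python. This is only an illustration: the function names and the numerical prior values are hypothetical, and the natural logarithm is used because the sign of the ratio does not depend on the base.

```python
import math

def log_posterior_ratio(p_c0, prior_theta, prior_theta_bar):
    """Log posterior ratio L of equation (2.13); the factor 1/p(x) cancels."""
    p_c1 = 1.0 - p_c0  # the two shape priors sum to one
    return math.log(p_c0 * prior_theta) - math.log(p_c1 * prior_theta_bar)

def deterministic_response(p_c0, prior_theta, prior_theta_bar):
    """Return r=0 (convex) if L >= 0, otherwise r=1 (concave)."""
    return 0 if log_posterior_ratio(p_c0, prior_theta, prior_theta_bar) >= 0 else 1

# Hypothetical values: the prior at theta_x is twice the prior at theta_x + 180 deg.
r = deterministic_response(p_c0=0.5, prior_theta=0.004, prior_theta_bar=0.002)
```

With these illustrative numbers the rule always returns the same response for the same stimulus, which is the limitation addressed by the stochastic model that follows.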

This deterministic rule would lead to the same decision for all presentations of a given stimulus. In order to model the stochastic character of human decision making, we follow a general suggestion of Paninski (2006), and assume that our rule is stochastic (see §3). Specifically, we assume that the process (e.g. the observer's criterion) that compares the log posterior probability log p(c₀|x) with log p(c₁|x) is subject to noise. In order to be clear about the implications of this, we define

L_{0} = 10 log p (c_{0}) p_{θ} (θ_{x}),

(2.15)

L_{1} = 10 log p (c_{1}) p_{θ} ({\bar{θ}}_{x})

(2.16)

and rewrite equation (2.14) as L=L₀−L₁. We assume that the distribution of L₀ values is Gaussian with mean ${\bar{L}}_{0}$ and standard deviation σ, and that the distribution of L₁ values is Gaussian with mean ${\bar{L}}_{1}$ and also with standard deviation σ. As L₀ and L₁ are both Gaussian with variance σ², and their noise is assumed to be independent, L is also Gaussian with mean $\bar{L} = {\bar{L}}_{0} - {\bar{L}}_{1}$ and variance $σ_{L}^{2} = 2 σ^{2}$ . For simplicity, we assume that σ is the same for all lighting directions.

Note that we have chosen to measure the relative log likelihood (sometimes called evidence) in decibels (dB) as suggested by Jaynes (2003). This allows easy comparison of levels of evidence. For example, evidence of 3 dB for a hypothesis means that it is about twice as likely as its alternative, and 10 dB means that it is 10 times as likely. Jaynes has suggested that an evidence threshold of approximately 1 dB is characteristic of many human judgements (Jaynes 2003).
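As a quick check of the decibel scale, the conversion between evidence in dB and an odds ratio can be written out as follows. The function names are hypothetical; the only assumption is that evidence in dB is ten times the base-10 logarithm of the ratio, as in equations (2.15) and (2.16).

```python
import math

def evidence_db(odds_ratio):
    """Evidence in decibels for a given ratio of probabilities."""
    return 10.0 * math.log10(odds_ratio)

def odds_from_db(db):
    """Ratio of probabilities corresponding to a given evidence in decibels."""
    return 10.0 ** (db / 10.0)

print(round(odds_from_db(3.0), 3))   # 1.995
print(round(odds_from_db(10.0), 3))  # 10.0
print(round(odds_from_db(1.0), 3))   # 1.259
```

On this scale, a threshold of 1 dB corresponds to one hypothesis being roughly 26 per cent more probable than its alternative.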

We assume that the probability P(c₀|x) of the observer perceiving a shape c₀ is described by the cumulative distribution function of a Gaussian with zero mean and variance $σ_{L}^{2}$ ,

P (c_{0} | x) = \frac{1}{σ_{L} \sqrt{2 π}} \int_{- \infty}^{L} e^{- η^{2} / (2 σ_{L}^{2})} d η,

(2.17)

= (1 + erf (L / (\sqrt{2} σ_{L}))) / 2,

(2.18)

= q,

(2.19)

where q is defined for brevity. For a given value of q, if the same stimulus is presented on n trials and if responses are independent across trials, then the probability that the observer responds r=1 (concave) on m of those n trials is

p (m | q) = C_{n, m} q^{m} {(1 - q)}^{n - m},

(2.20)

where $C_{n, m}$ is a binomial coefficient. For a given light direction, $C_{n, m}$ is constant, and so it does not affect the value ${\hat{q}}_{i}$ that maximizes p(m|q), and is omitted below.
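The choice probability and the binomial probability of equation (2.20) can be evaluated numerically as in the sketch below. It computes the Gaussian cumulative distribution of equation (2.17) through the standard identity Φ(z) = (1 + erf(z/√2))/2 with z = L/σ_L. The function names are hypothetical, and L and σ are assumed to be expressed in dB.

```python
import math

def choice_probability(L_db, sigma_db):
    """q: zero-mean Gaussian CDF with variance sigma_L^2 = 2 * sigma^2, evaluated at L."""
    sigma_L = math.sqrt(2.0) * sigma_db
    return 0.5 * (1.0 + math.erf(L_db / (math.sqrt(2.0) * sigma_L)))

def binomial_probability(m, n, q):
    """Probability of m responses out of n when each has per-trial probability q."""
    return math.comb(n, m) * q ** m * (1.0 - q) ** (n - m)

# With no evidence either way (L = 0) both responses are equally probable.
q_neutral = choice_probability(0.0, sigma_db=1.0)
```

Because the binomial coefficient does not depend on q, dropping it changes the value of the likelihood but not the location of its maximum.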

We discretize the lighting direction into N values: θ_i:i=1, …, N. For a given value of θ_i, we present the stimulus n_i times, and record the number m_i of ‘concave’ responses, so that

p (m_{i} | q_{i}) = q_{i}^{m_{i}} {(1 - q_{i})}^{n_{i} - m_{i}} .

(2.21)

Thus, the n_i binary responses of a single observer to repeated presentations of the same stimulus are maximally consistent with the value ${\hat{q}}_{i}$ of q_i, which is the probability that the observer perceives the shape as concave when the lighting direction is θ_i.

When considered over all N lighting directions, and assuming independent noise, the probability of the vector m=(m₁, …, m_N) for a given vector q=(q₁, …, q_N) is

p (m | q) = \prod_{i} q_{i}^{m_{i}} {(1 - q_{i})}^{n_{i} - m_{i}},

(2.22)

which is the likelihood function of q. The value $\hat{q}$ of q that maximizes p(m|q) is the maximum-likelihood estimate of the true value q^*. Taking logs and multiplying by −1, one transforms equation (2.22) into the negative log likelihood function of q,

E_{f} = - \sum_{i = 1}^{N} [m_{i} log q_{i} + (n_{i} - m_{i}) log (1 - q_{i})] .

(2.23)
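A minimal sketch of the negative log likelihood of equation (2.23) is given below. The function name and the clipping constant `eps` are assumptions added for numerical safety, and `m[i]` is taken to count the responses whose per-trial probability is `q[i]`.

```python
import math

def negative_log_likelihood(m, n, q, eps=1e-12):
    """E_f: minus the sum over directions of m_i log q_i + (n_i - m_i) log(1 - q_i)."""
    total = 0.0
    for m_i, n_i, q_i in zip(m, n, q):
        q_i = min(max(q_i, eps), 1.0 - eps)  # keep the logarithms finite
        total += m_i * math.log(q_i) + (n_i - m_i) * math.log(1.0 - q_i)
    return -total

# Hypothetical counts for two lighting directions, 32 trials each.
m, n = [8, 24], [32, 32]
q_hat = [m_i / n_i for m_i, n_i in zip(m, n)]  # per-direction maximum-likelihood values
best = negative_log_likelihood(m, n, q_hat)
```

Without any constraint linking the directions, the minimum is reached at the observed proportions m_i/n_i. In the method described here the q_i are not free but are generated by the prior and the concavity preference.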

As both the prior distribution and the concavity preference are implicit in $\hat{q}$ , this provides an estimate ${\hat{p}}_{θ} (θ)$ of the true prior distribution $p_{θ}^{*} (θ)$ , and an estimate $\hat{p} (c_{1})$ of the true concavity preference p^*(c₁).

As discussed later, the unknown value of the discrimination parameter σ_L means that, in practice, the prior is not completely determined by equation (2.14) (see §3); but for the sake of brevity, we will refer to this as ‘estimating the prior’.

(a) Smoothing the prior

Unless the dataset is very large, the prior distribution estimated by direct minimization of E_f will not be very smooth. Smoothness of the prior probability for lighting direction is an important physical constraint, which we can model by regularizing the solution

E = E_{f} + λ E_{s},

(2.24)

where E_s is a measure of the smoothness of p_θ(θ), and λ is proportional to the square of the expected angular scale over which the prior for lighting direction is expected to change. This regularization procedure can be thought of as specifying a ‘prior for priors’ (Paninski 2006).

Paninski suggests using the usual L₂ norm on the derivative of the prior to measure smoothness. A related measure that is more appropriate to this probabilistic situation (see §3) is the Fisher information, which measures the extent to which the prior p_θ(θ) is localized, and which is a weighted version of the usual L₂ norm,

E_{s} = E [{(\frac{d log p_{θ} (θ)}{d θ})}^{2}],

(2.25)

= \sum_{i} \frac{{(p_{θ} (θ_{i + 1}) - p_{θ} (θ_{i}))}^{2}}{p_{θ} (θ_{i}) Δ θ} .

(2.26)
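The discrete Fisher-information measure of equation (2.26) can be sketched as follows. The function name is hypothetical, and the wrap-around from the last direction back to the first is an assumption made here because lighting direction is circular; the source does not state how the end of the sum is handled.

```python
def fisher_smoothness(prior, delta_theta):
    """E_s: sum over directions of (p[i+1] - p[i])^2 / (p[i] * delta_theta)."""
    n_dirs = len(prior)
    total = 0.0
    for i in range(n_dirs):
        diff = prior[(i + 1) % n_dirs] - prior[i]  # assumed circular difference
        total += diff * diff / (prior[i] * delta_theta)
    return total

# A flat prior is perfectly smooth; a rippled prior is penalized.
flat = [1.0 / 36] * 36
rippled = [(1.5 if i % 2 else 0.5) / 36 for i in range(36)]
e_flat = fisher_smoothness(flat, 10.0)
e_rippled = fisher_smoothness(rippled, 10.0)
```

The division by p[i] is what makes a ripple of a given size cost more where the prior is small.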

In summary, for a given value of the smoothing parameter λ, the values of the N elements of the discretized prior p_θ(θ) and the concavity preference p(c₁) can be estimated simultaneously as those values which minimize E (equation (2.24)). The value of λ was estimated using cross-validation (Bishop 1996; see appendix B), and the MatLab minimization procedure ‘fminsearch’ was used to find an estimate of $p_{θ}^{*} (θ)$ and p^*(c₁).
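The way a candidate prior and concavity preference map onto a single value of the objective can be sketched as below. This is not the authors' MatLab code: it only evaluates the objective and leaves the minimization to an external optimizer. The function and argument names are hypothetical, `smooth_weight` stands for the coefficient multiplying E_s in equation (2.24), the directions are assumed to be equally spaced so that the opposite direction is half the list away, and `counts[i]` is assumed to be the number of responses whose probability is q_i (convex, under equation (2.17)).

```python
import math

def regularized_objective(prior, p_c1, counts, trials, sigma_db,
                          smooth_weight, delta_theta=10.0):
    """Evaluate E = E_f + smooth_weight * E_s for a discretized prior."""
    n_dirs = len(prior)
    half = n_dirs // 2                      # index offset of theta + 180 deg
    sigma_L = math.sqrt(2.0) * sigma_db     # noise on the difference L0 - L1
    p_c0 = 1.0 - p_c1
    e_f, e_s = 0.0, 0.0
    for i in range(n_dirs):
        opposite = prior[(i + half) % n_dirs]
        # Log posterior ratio in dB, equations (2.15) and (2.16).
        L = 10.0 * math.log10(p_c0 * prior[i]) - 10.0 * math.log10(p_c1 * opposite)
        q = 0.5 * (1.0 + math.erf(L / (math.sqrt(2.0) * sigma_L)))
        q = min(max(q, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
        e_f -= counts[i] * math.log(q) + (trials[i] - counts[i]) * math.log(1.0 - q)
        diff = prior[(i + 1) % n_dirs] - prior[i]
        e_s += diff * diff / (prior[i] * delta_theta)
    return e_f + smooth_weight * e_s

# Flat prior, no concavity preference, responses split evenly in every direction.
flat = [1.0 / 36] * 36
value = regularized_objective(flat, 0.5, [16] * 36, [32] * 36,
                              sigma_db=2.0, smooth_weight=400.0)
```

A derivative-free optimizer would need the prior to stay positive and normalized, for example by optimizing unconstrained log weights and normalizing them inside the objective; that choice is an assumption and is not described in the text.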

(b) Results for simulated observer

In order to test our methods, we first analysed data from a simulated observer with a known prior $p_{θ}^{*} (θ)$ . The prior was defined as a von Mises distribution (Fisher 1995) $p_{θ} (θ) \propto exp (κ cos (θ - μ))$ , with location parameter μ=−45° and concentration parameter κ=0.33. The value of the smoothing parameter λ has no explicit representation when generating data for the simulated observer, and cross-validation (appendix B) was used to find an estimate of $\hat{λ} = 700$ (figure 2). This was then used with the known value of σ=1 to estimate the simulated observer's prior ${\hat{p}}_{θ} (θ)$ for lighting direction and its concavity preference $\hat{p} (c_{1})$ . The concavity preference of this simulated observer had been defined as p(c₁)=0.5, and was subsequently estimated as $\hat{p} (c_{1}) = 0.507$ . The method also recovered an accurate estimate of the prior, as shown in figure 2c.
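Data for such a simulated observer could be generated along the following lines. The parameter values (μ=−45°, κ=0.33, σ=1 dB, p(c₁)=0.5, 36 directions, 32 trials per direction) are taken from the text; the function names, the random seed, the normalization of the prior over bins and the convention that each count records 'convex' responses are assumptions of this sketch.

```python
import math
import random

def von_mises_prior(mu_deg, kappa, n_dirs=36):
    """Discretized von Mises prior, normalized to sum to one over the bins."""
    step = 360.0 / n_dirs
    weights = [math.exp(kappa * math.cos(math.radians(i * step - mu_deg)))
               for i in range(n_dirs)]
    z = sum(weights)
    return [w / z for w in weights]

def simulate_counts(prior, p_c1, sigma_db, trials_per_dir, rng):
    """Number of simulated 'convex' responses for each lighting direction."""
    n_dirs, half = len(prior), len(prior) // 2
    sigma_L = math.sqrt(2.0) * sigma_db
    p_c0 = 1.0 - p_c1
    counts = []
    for i in range(n_dirs):
        ratio = (p_c0 * prior[i]) / (p_c1 * prior[(i + half) % n_dirs])
        L = 10.0 * math.log10(ratio)  # log posterior ratio in dB
        q = 0.5 * (1.0 + math.erf(L / (math.sqrt(2.0) * sigma_L)))
        counts.append(sum(rng.random() < q for _ in range(trials_per_dir)))
    return counts

rng = random.Random(0)  # fixed seed so the sketch is repeatable
prior = von_mises_prior(mu_deg=-45.0, kappa=0.33)
counts = simulate_counts(prior, p_c1=0.5, sigma_db=1.0, trials_per_dir=32, rng=rng)
```

Only ratios of prior values enter the log posterior ratio, so the normalization constant of the von Mises distribution has no effect on the simulated responses.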

(c) Results for human observers

Using cross-validation (appendix B), the estimated value of the smoothing parameter was $\hat{λ} = 400$ (figure 3). This was then used with σ=2 to estimate each observer's prior p_θ(θ) (figure 4). In each case, the estimated prior is biased towards the upper left, in agreement with previous findings on group average data (Mamassian & Landy 2001). Thus, the left biases observed in each posterior in figure 4 and in Mamassian & Goutcher (2001), as well as the left and right biases reported in Sun & Perona (1998) and Adams et al. (2004), are probably due to a bias in each observer's prior, rather than a bias in the likelihood function. The estimated prior concavity preferences for all observers were within the range $\hat{p} (c_{1}) = 0.49$–$0.51$ , compared with findings for the posterior in Adams et al. (2004) (0.44), which used similar stimuli. Details of the experimental procedure are given in appendix A.

Figure 3. Cross-validation. Result for estimating the value of the smoothing parameter λ for observer a in figure 4, with σ=2. The minimum with respect to the negative log likelihood corresponds to λ≈400. This curve is typical of that obtained for other observers, and a value of $\hat{λ} = 400$ was therefore used for all human observers.

Figure 4. Polar plots of estimated priors for eight observers. Each graph shows the frequency of convex responses (dashed) as a function of light-source direction. This is essentially a sample from the observer's posterior, and is used to estimate the prior (solid). For display purposes, the lengths of all mean vectors (solid line) have been scaled by the same factor across all graphs, and all graphs are drawn to the same scale (see appendix C). Note that all biases are to the left, with values of 20°, 7°, 9°, 18°, 34°, 14°, 28° and 16°, respectively (mean 18°). These results were obtained using all the data for each observer with λ=400 and σ=2.

3. Discussion

When an observer is asked to report the concavity/convexity of a shape for a range of different lighting directions, the resultant set of responses (usually depicted as a polar plot) represents a sample from their posterior probability density function for shape. It is this sample from the observer's posterior which has been used in all previous experiments to provide estimates of observers' posterior for lighting direction.

The main contribution of this paper is a method for using this sampled posterior, in combination with a likelihood function and a loss function, to estimate the prior probability density function for lighting direction and the prior for concavity preference in individual observers. In order to achieve this, we assume plausible forms for the likelihood and loss functions. For the loss function, we assume that each observer attempts to minimize the number of misclassified stimuli, an objective which corresponds to making responses consistent with the mode of the posterior probability density function. With regard to the likelihood function, each convexity/concavity response is consistent with one of two possible lighting directions, which effectively implies that the likelihood function is a delta function with non-zero values corresponding to these two lighting directions. This provides a posterior which is proportional to the prior for exactly two lighting directions and two shapes (convex/concave). An estimate of each observer's prior and concavity preference was then obtained by minimizing a regularized (smoothed) version of the negative log likelihood of the sampled posterior.

(a) Related work

Research on motion perception explained the change in perceived speed that occurs at different levels of contrast by assuming a specific (Gaussian) form for the speed prior (Weiss et al. 2002). Other researchers assume that the mean of the posterior coincides with the true stimulus value in a sensorimotor task (Körding & Wolpert 2004) or that (i) the log of the prior is a straight line, (ii) the likelihood is Gaussian, and (iii) the mean of the posterior is the true mean (Stocker & Simoncelli 2006a). We make none of these assumptions.

A parametric estimate of the lighting prior has previously been obtained (Mamassian & Landy 2001) under the assumption that it can be described by a two-parameter von Mises distribution (see below).

The method described here is inspired by Paninski (2006). However, our method is different from Paninski's in two key respects. First, the stochastic choice model assumes that the log posterior probabilities (and not posterior probabilities) are subject to additive Gaussian noise (equation (2.17)). This has a number of advantages. (i) The chosen value of σ corresponds naturally to a threshold value for the evidence (in the sense of log posterior ratio) needed to obtain a given choice rate in the presence of encoding noise. (ii) There are no problems of positivity in adding an unbounded noise contribution to probability values which should be positive. (iii) The neural encoding of log probabilities has been shown to have a direct neural interpretation as an approximation to Poisson noise in neural populations (Gold & Shadlen 2001).

Second, we have replaced the L₂ regularizer used in Paninski (2006) with Fisher information. This is more closely related to the probabilistic nature of the problem. Essentially, regularization using Fisher information (equation (2.26)) tries to satisfy the experimental constraints using the least localized prior density. By up-weighting the contribution for low probabilities (i.e. by 1/p_θ(θ)), the Fisher regularizer takes account of the fact that small ripples in low-probability regions are just as significant as larger ripples at higher probability values when the task requires a likelihood ratio judgement. There is inevitably a trade-off between the form of the estimated prior and the nature of the smoothing function. However, because the Fisher regularizer seeks that prior with the least localized density, it can be interpreted as the regularizer of least commitment.

(b) The experimental task

The design of the concavity–convexity task was chosen for a number of reasons. First, we have chosen a forced choice task. Experiments in which the observer provides an explicit estimate of lighting direction on each trial could provide more powerful constraints on the prior. However, asking observers to estimate lighting direction is an unnatural task, and is therefore likely to yield data that are both biased and noisy. Although we have seen that the forced choice experiment leaves some aspects of the prior unconstrained, it requires far fewer modelling assumptions than parameter estimation alternatives, and so the information that is obtained is more reliable.

Second, although the question of the modification of the prior by feedback is of great interest (Adams et al. 2004), no feedback was given here, and there is no correct response for the chosen stimuli. This is important because, in most applications, even a small number of trials with feedback reduce the dependence of the posterior on the prior to insignificance (Mele & Rawling 2004; indeed, this ‘washing out’ property is often invoked to protect Bayesian methods from the consequences of choosing incorrect priors). In our experiment, neither the posterior nor the prior can be updated as a consequence of feedback. It is generally assumed that exposure to a biased population of stimuli (e.g. exposure to mainly concave stimuli) induces a shift in the prior. However, this appears to be the case only if feedback is given to correct the interpretation of ambiguous stimuli. Observers adapt their visual interpretation of stimuli such as those in figure 1, provided they are given haptic feedback of those stimuli (Adams et al. 2004). Moreover, this adaptation was found to affect performance on a different (lightness judgement) task, which required an assumption regarding light direction, indicating a shift in the mean of the light-from-above prior. From a statistical perspective, this makes sense. Decisions based on a series of measurements with corrective feedback are initially based mainly on prior expectations. However, the corrective feedback can be used to update the prior, making future decisions more reliable, as in the classic Kalman filter (Kalman 1960). By contrast, exposure to a biased population of stimuli without feedback induces after-effects in the opposite direction to that predicted by a shift in the prior. These after-effects are consistent with a change in the likelihood function and not in the prior (Stocker & Simoncelli 2006b). In our experiment, observers were exposed to an unbiased population of stimuli and received no feedback. Taken together, these considerations suggest that the prior and likelihood were not affected by the stimuli, and were reasonably constant throughout the experiment.

Third, we have used stimuli which are essentially noise free. Many visual tasks have unavoidable sensory noise, and when this is not the case, experimenters have added artificial noise, specifically in order to allow a Bayesian analysis. By virtually eliminating this sensory noise in a very simple task, we have ensured that any stochastic variation in responses must be a result of noise in the internal encoding of variables used in the decision process, noise which we have modelled by the parameter σ.

(c) Estimating the discrimination parameter

Our estimate of the prior depends on the value of the discrimination parameter σ, and we have not addressed how to fix a value for σ. This parameter cannot be estimated directly from experimental data because, for any given value of σ, there is a prior which fits the observed data equally well, as shown in figure 5. This ambiguity is unavoidable for judgement tasks that depend only on likelihood ratios, which comprise the majority of choice tasks (Green & Swets 1966). For the task considered here, this dependence is made explicit in equation (2.17), where the posterior probability is seen to be a function of the ratio L/σ, so that smaller log likelihood differences L can always be reliably detected by using a smaller value for σ.
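The ambiguity can be checked numerically: shrinking the log prior by a factor k while also dividing σ by k leaves the choice probability unchanged. The sketch below uses hypothetical prior values and function names, and assumes a concavity preference of 0.5 so that only the ratio of the two prior values matters.

```python
import math

def choice_prob(prior_a, prior_b, sigma_db):
    """Choice probability for two opposite directions with p(c0) = p(c1) = 0.5."""
    L = 10.0 * math.log10(prior_a / prior_b)   # log prior ratio in dB
    sigma_L = math.sqrt(2.0) * sigma_db
    return 0.5 * (1.0 + math.erf(L / (math.sqrt(2.0) * sigma_L)))

k = 2.0
p_a, p_b = 0.04, 0.02  # hypothetical prior values at theta and theta + 180 deg

q_original = choice_prob(p_a, p_b, sigma_db=2.0)
# Raising the prior to the power 1/k divides its log (and hence L) by k.
q_rescaled = choice_prob(p_a ** (1.0 / k), p_b ** (1.0 / k), sigma_db=2.0 / k)
```

The two probabilities agree to within floating-point error, so response counts alone cannot distinguish between the two combinations of prior and σ.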

Figure 5. The discrimination parameter σ is undetermined. (a) Graph of an example log prior, log p_θ(θ) (solid horizontal sinusoid curve), as a function of lighting direction θ. Given two hypothetical neurons with preferred lighting directions θ and $\bar{θ}$ , their responses are determined by their log probability density functions, log p(θ) and $log p (\bar{θ})$ , indicated by the vertical dashed and solid curves, respectively. For a given stimulus, the larger of the two observed values from the probability density functions log p(θ) and $log p (\bar{θ})$ determines the lighting direction assumed by the observer, and this, in turn, determines the concave/convex observer response. These two observed values are noisy estimates of the probability density function means, so the choice probability q (see equation (2.19)) is determined by the relative overlap of the probability density functions (vertical dashed and solid curves) for these two quantities. (b) A log prior with amplitude variations k times smaller than in (a) leads to the same choice probabilities as in (a), provided the noise level σ is also reduced by a factor k. (For simplicity, this analysis assumes a concavity preference of 0.5.)

We have chosen a value σ=2 dB to analyse human data, which is a generous approximation to the 1 dB assumed as a nominal value for the discrimination threshold for human judgements (Jaynes 2003). We anticipate that an analysis similar to that proposed by Jazayeri & Movshon (2006) based on Poisson statistics of individual model neurons would constrain the value of σ.

We note that this choice of discrimination parameter gives estimated lighting priors (figure 4), which are similar in shape to the von Mises distributions assumed in Mamassian & Landy (2001), but which are less localized than implied by the values of their estimated concentration parameters. This is consistent with our aim to use the prior of least commitment.

(d) Priors for other parameters

The method described for estimating perceptual priors can, in principle, be applied to a variety of other parameters. These include priors for low-level parameters (e.g. speed, direction, line orientation, colour, spectral illuminance), but could also be extended to high-order parameters (e.g. faces, words, syllables).

(e) More complex priors

In this study, we have just two variable parameters, light direction (θ) and the convexity/concavity (c) of a fixed shape, and there is no reason to expect these parameters to be correlated in the physical world. Hence, we were able to assume independence and factorize the joint prior p(θ,c)=p_θ(θ)p(c). This assumption was essential in order to make the estimation problem tractable, but it may not be justified in general.

A prior is just the re-scaled marginal distribution of a multivariate prior distribution. In this study, we have kept all parameters constant except light direction (θ) and the convexity/concavity (c) of a fixed shape. This implies that the prior we have estimated is the marginal of a two-dimensional joint distribution p(θ, c). Moreover, this joint distribution is itself a marginal distribution of a high-dimensional prior distribution with axes that include parameters such as shape, illuminance spectrum, multiple light sources, colour and stereo disparity. Had we the time and the means to find the light-from-above marginal of this high-dimensional prior distribution, it is possible that the result would be quite different.

4. Conclusion

If Helmholtz was correct in stating that perception is a form of ‘unconscious inference’ (von Helmholtz 1867), then this implies the existence of a posterior (which determines a perception), a likelihood function (the conditional probability of the retinal image) and a prior (the observer's expectations about the statistical structure of the visual world). Studies in computational neuroscience suggest that the visual system is adapted to the statistical structure of its physical environment (Olshausen & Field 2004). Moreover, this adaptation occurs over a range of time scales, and shapes the evolution of the visual system over generations, and the transfer functions of visual neurons over a matter of seconds (Rieke et al. 1996). Here, we have described a method for characterizing the prior for lighting direction. We anticipate that this method will be used to characterize many other priors used for perceptual inference.

Acknowledgments

Thanks to Stephen Isard for his useful discussions.

Appendix A. Experimental methods

(a) Participants

There were eight observers, in the age range of 21–26 years (mean age=22.7). Observers all gave their informed consent and were paid £5 sterling.

(b) Apparatus and procedure

The experiment was run in a dimly lit room. Stimuli were generated using MATLAB (v. 7.3.0 R2006b) and Psychtoolbox (v. 3.0.8) (Pelli 1997). The observer viewed stimuli on a 17 inch TFT monitor, at a distance of 57 cm, using a chin rest. Each observer completed two sessions of 576 trials each, one in the morning and one in the afternoon (not on the same day), making a total of 1152 trials. Within each session, stimuli were presented in 16 blocks of 36 trials each. After each block, the observer was able to take a break. In each trial, a stimulus was presented with one of the discs marked with an ‘×’ in the outermost corner, as in figure 1. The observer's task was to indicate whether the marked disc appeared to be convex or concave by pressing one of two response keys. Each stimulus remained on the screen until the observer made a response, after which the screen went blank, and there was a pause of 0.5–1 s before the next stimulus appeared. Observers received no feedback.

The lighting direction adopted one of 36 directions ‘around the clock’, at intervals of 10°. For each lighting direction, the stimulus had two complementary configurations. In one configuration, the top left and bottom right discs were convex, whereas the top right and bottom left were concave, and in the complementary configuration it was the other way around. The reason for having two configurations per lighting direction was to ensure that each stimulus looked identical to its complementary configuration when lit from 180° further around the clock. Each disc position (e.g. top left) in each configuration was presented twice at each lighting direction, making a total of 1152 trials (i.e. 4 positions×2 configurations×2 repeats×36 light orientations×2 sessions).
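The trial count can be checked by enumerating the design. This is a sketch only: the labels, the choice of 0° as the first direction and the ordering of the factors are assumptions made for illustration, and no randomization of trial order is shown.

```python
from itertools import product

positions = ["top left", "top right", "bottom left", "bottom right"]
configurations = [0, 1]                   # the two complementary configurations
repeats = [0, 1]                          # each condition presented twice
directions_deg = list(range(0, 360, 10))  # 36 lighting directions
sessions = ["morning", "afternoon"]

# One tuple per trial: every combination of the five factors.
trials = list(product(sessions, directions_deg, configurations, positions, repeats))

n_trials = len(trials)
trials_per_session = n_trials // len(sessions)
blocks_per_session = trials_per_session // 36  # blocks of 36 trials each
```

Enumerating the factors in this way gives 1152 trials in total, 576 per session and 16 blocks per session, in agreement with the numbers stated in the procedure.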

Appendix B. Estimating lambda

Cross-validation consists of splitting each observer's data into two subsets, a training dataset s_train and a test dataset s_test. For each putative value of λ=λ_j, the training data s_train were used to estimate $\hat{q}$ (and therefore the prior) by minimizing E. Setting $q_{i} = {\hat{q}}_{i}$ in equation (2.23), E_f(s_test) was then evaluated using the test data s_test, which yields an estimate of the likelihood of the test data for λ_j. This procedure was repeated over a range of values for λ_j, and the value of λ_j that minimized E_f(s_test) was taken to be $\hat{λ}$. In order to obtain a robust estimate for $\hat{λ}$, this whole procedure was repeated using four runs, as follows. Initially, the data were split into four subsets. On each run, three subsets were combined to make the training set, and the remaining subset was used as the test set. Each of the four subsets took its turn as the test set on exactly one run. Each run yielded a curve for E_f(s_test) as a function of λ, and these four curves were averaged. The value of λ corresponding to the minimum of this average curve was taken to be $\hat{λ}$ for a single observer. The value of σ was set to σ=1 for the simulated observer and to σ=2 for human observers.

Appendix C. The mean vector

The mean vector is the mean of the estimated prior distribution. The direction of this vector indicates the direction of the bias (anisotropy) in the prior and its length shows the amount of bias. The x and y components of the mean vector are $x = \sum_{θ} p_{θ}(θ) \cos θ$ and $y = \sum_{θ} p_{θ}(θ) \sin θ$, respectively.
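As an illustration, the mean vector can be computed from a discretized prior as follows. This is a minimal sketch: the function name `mean_vector` and its argument names are hypothetical, the prior is renormalized to sum to 1 before use, and angles are interpreted as in the two sums above.

```python
import numpy as np

def mean_vector(p_theta, thetas_deg):
    """Return (x, y, length, direction in degrees) of the mean vector of a prior."""
    p = np.asarray(p_theta, dtype=float)
    p = p / p.sum()                          # ensure the prior sums to 1
    t = np.deg2rad(np.asarray(thetas_deg, dtype=float))
    x = np.sum(p * np.cos(t))
    y = np.sum(p * np.sin(t))
    length = np.hypot(x, y)                  # amount of bias, between 0 and 1
    direction = np.rad2deg(np.arctan2(y, x)) % 360.0
    return x, y, length, direction

thetas_deg = np.arange(0, 360, 10)

# A uniform prior has no bias, so its mean vector has (numerically) zero length.
uniform = np.ones(36)
_, _, uniform_length, _ = mean_vector(uniform, thetas_deg)

# A prior concentrated on a single direction has a mean vector of length 1.
peaked = np.zeros(36)
peaked[9] = 1.0                              # all mass at 90 degrees
px, py, peaked_length, peaked_direction = mean_vector(peaked, thetas_deg)
```

The length therefore ranges from 0 for an isotropic prior to 1 for a prior concentrated on one direction. For a prior with near-zero length, the direction is not meaningful.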

References

Adams W., Graf E., Ernst M. Experience can change the ‘light-from-above’ prior. Nat. Neurosci. 2004;7:1057–1058. doi:10.1038/nn1312
Bishop C. Oxford University Press; Oxford, UK: 1996. Neural networks for pattern recognition.
Brewster D. On the conversion of relief by inverted vision. Edinb. Phil. Trans. 1847;15:657.
Fisher N.I. Cambridge University Press; Cambridge, UK: 1995. Statistical analysis of circular data.
Gold J., Shadlen M. Neural computations that underlie decisions about sensory stimuli. Trends Cogn. Sci. 2001;5:10–16. doi:10.1016/S1364-6613(00)01567-9
Green D.M., Swets J.A. Wiley; New York, NY: 1966. Signal detection theory and psychophysics.
Jaynes E. Cambridge University Press; Cambridge, UK: 2003. Probability theory: the logic of science.
Jazayeri M., Movshon J. Optimal representation of sensory information by neural populations. Nat. Neurosci. 2006;9:690–696. doi:10.1038/nn1691
Kalman R.E. A new approach to linear filtering and prediction problems. Trans. ASME J. Basic Eng. 1960;82:35–45.
Kleffner D.A., Ramachandran V.S. On the perception of shape from shading. Percept. Psychophys. 1992;52:18–36. doi:10.3758/bf03206757
Körding K., Wolpert M. Bayesian integration in sensorimotor learning. Nature. 2004;427:244–247. doi:10.1038/nature02169
Mamassian P., Goutcher R. Prior knowledge on the illumination position. Cognition. 2001;81:B1–B9. doi:10.1016/S0010-0277(01)00116-0
Mamassian P., Landy M.S. Interaction of visual prior constraints. Vision Res. 2001;41:2653–2668. doi:10.1016/S0042-6989(01)00147-X
Mele A., Rawling P. Oxford University Press; Oxford, UK: 2004. The Oxford handbook of rationality.
Olshausen B., Field D. Sparse coding of sensory inputs. Curr. Opin. Neurobiol. 2004;14:481–487. doi:10.1016/j.conb.2004.07.007
Oppel J.J. Uber ein anaglyptoskop. Poggendorffs Annalen der Physik und Chemie. 1856;99:466–469. doi:10.1002/andp.18561751108
Paninski, L. 2006 Nonparametric inference of prior probabilities from Bayes-optimal behavior. In Advances in neural information processing systems, vol. 18 (eds Y. Weiss, B. Schölkopf & J. Platt), pp. 1067–1074. Cambridge, MA: MIT Press.
Pelli D. The videotoolbox software for visual psychophysics: transforming numbers into movies. Spat. Vis. 1997;10:437–442. doi:10.1163/156856897X00366
Poggio T., Torre V., Koch C. Computational vision and regularization theory. Nature. 1985;317:314–319. doi:10.1038/317314a0
Rieke F., Warland D., Van Steveninck R., Bialek W. MIT Press; Cambridge, MA: 1996. Spikes: exploring the neural code, a Bradford book.
Rittenhouse D. Explanation of an optical deception. Trans. Am. Philos. Soc. 1786;9:578–585.
Stocker A., Simoncelli E. Noise characteristics and prior expectations in human visual speed perception. Nat. Neurosci. 2006a;9:578–585. doi:10.1038/nn1669
Stocker, A. & Simoncelli, E. 2006b Sensory adaptation within a Bayesian framework for perception. In Advances in Neural Information Processing Systems Conference, vol. 18, pp. 1291–1298.
Sun J., Perona J. Where is the Sun? Nat. Neurosci. 1998;1:183–184. doi:10.1038/630
von Helmholtz H. Voss; Leipzig, Germany: 1867. Handbuch der physiologischen Optik.
Weiss Y., Simoncelli E., Adelson E. Motion illusions as optimal percepts. Nat. Neurosci. 2002;5:598–604. doi:10.1038/nn0602-858

