de Gardelle and Summerfield (1) claimed that, when judging the mean of a set of stimuli, it is optimal to downweight outliers, and that human subjects follow this robust averaging strategy. Here, we show that, in their task, the optimal observer would weight all observations equally. In ref. 1, subjects were presented with a set of eight colors, denoted by a vector x = (x₁, …, x₈), drawn independently from a Gaussian distribution on a red–blue color axis with variance σ² and mean of either μ (blue) or −μ (red). On a given trial, μ was set randomly to one of two values, and σ² was set randomly to one of three values. The subject indicated whether the mean was blue (C = 1) or red (C = −1). When the prior probabilities are equal, the optimal decision is based on the likelihoods of both options [i.e., p(x|C = 1) and p(x|C = −1)]. Because μ and σ² are unknown to the observer, the optimal observer computes these likelihoods by averaging (marginalizing) over all six possibilities (Eq. 1):
$$p(\mathbf{x} \mid C) = \frac{1}{6}\sum_{\mu}\sum_{\sigma^2} p(\mathbf{x} \mid C, \mu, \sigma^2) = \frac{1}{6}\sum_{\mu}\sum_{\sigma^2}\prod_{i=1}^{8} p(x_i \mid C, \mu, \sigma^2), \tag{1}$$
where in the last step, we have used the conditional independence of the observations given μ and σ2. The work by de Gardelle and Summerfield (1), however, computed the likelihoods by first factorizing and then marginalizing (Eq. 2):
$$p(\mathbf{x} \mid C) = \prod_{i=1}^{8} p(x_i \mid C) = \prod_{i=1}^{8} \frac{1}{6}\sum_{\mu}\sum_{\sigma^2} p(x_i \mid C, \mu, \sigma^2). \tag{2}$$
The first step in Eq. 2 is a mathematical mistake, because the observations are independent only when conditioned on μ and σ². They strongly covary otherwise, because the same μ and σ² are used for all observations on a given trial. de Gardelle and Summerfield recognized this mistake in their SI Methods but failed to realize that their main model prediction was a direct consequence of it. Indeed, when we simulate decisions based on the incorrect likelihood (Eq. 2) and then perform logistic regression as in ref. 1, we find downweighting of observations xᵢ with a larger magnitude (Fig. 1, dashed), consistent with the model predictions in ref. 1. By contrast, the decision based on the correct likelihood (Eq. 1) is to report blue whenever Σᵢ xᵢ is positive (i.e., a simple averaging rule). Thus, the optimal observer weights all observations equally (Fig. 1, solid). The moral is that applying marginalization and conditional independence in the wrong order can have severe consequences.
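The contrast between the two likelihoods can be illustrated with a minimal sketch. It assumes that Eq. 1 multiplies the per-item Gaussian densities first and then averages over the six (μ, σ²) pairs, and that Eq. 2 averages each item separately and then multiplies. The values in `MUS` and `SIGMAS` are hypothetical placeholders (the text states only how many values there were), and all function names are illustrative. Agreement with the sign of Σᵢ xᵢ stands in for the logistic regression of ref. 1, which is not reproduced here. For any fixed μ > 0 and σ², the ratio of the two Gaussian products is exp(2μ Σᵢ xᵢ/σ²), so every term of the correct likelihood favors blue exactly when the sum is positive.

```python
import math
import random

# Hypothetical values: the text gives only the number of levels
# (two for mu, three for sigma), not the levels themselves.
MUS = (0.1, 0.2)
SIGMAS = (0.1, 0.2, 0.3)


def gauss_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def likelihood_correct(xs, c):
    """Product over items first, then average over all (mu, sigma) pairs."""
    total = 0.0
    for mu in MUS:
        for sd in SIGMAS:
            total += math.prod(gauss_pdf(x, c * mu, sd) for x in xs)
    return total / (len(MUS) * len(SIGMAS))


def likelihood_factorized(xs, c):
    """Average each item over (mu, sigma) separately, then multiply (flawed order)."""
    n_pairs = len(MUS) * len(SIGMAS)
    return math.prod(
        sum(gauss_pdf(x, c * mu, sd) for mu in MUS for sd in SIGMAS) / n_pairs
        for x in xs
    )


def decide(likelihood, xs):
    """Return 1 (blue) if the likelihood favors C = 1, otherwise -1 (red)."""
    return 1 if likelihood(xs, 1) > likelihood(xs, -1) else -1


def simulate(n_trials=2000, n_items=8, seed=0):
    """Fraction of trials on which each rule agrees with the sign of the sum."""
    rng = random.Random(seed)
    agree_correct = 0
    agree_factorized = 0
    for _ in range(n_trials):
        c = rng.choice((1, -1))
        mu = rng.choice(MUS)
        sd = rng.choice(SIGMAS)
        xs = [rng.gauss(c * mu, sd) for _ in range(n_items)]
        sum_rule = 1 if sum(xs) > 0 else -1
        agree_correct += decide(likelihood_correct, xs) == sum_rule
        agree_factorized += decide(likelihood_factorized, xs) == sum_rule
    return agree_correct / n_trials, agree_factorized / n_trials


if __name__ == "__main__":
    print(simulate())
```

With these placeholder values, the constructed set `[0.9] + [-0.1] * 7` has a positive sum, so the correct likelihood reports blue, whereas the factorized likelihood reports red because the single large observation is discounted relative to the seven small ones.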
The experimental data presented in ref. 1, if reliable, should be taken as evidence against the hypothesis that humans accumulate evidence optimally in this task. That being said, robust averaging is a known concept in perception (2) and can be optimal when the observer considers multiple possible generative processes for the data (3). It remains to be seen whether this explanation applies to the current data. An alternative explanation could be that the stimulus spaces used in ref. 1 were not perceptually uniform.
Footnotes
The authors declare no conflict of interest.
References
- 1. de Gardelle V, Summerfield C. Robust averaging during perceptual judgment. Proc Natl Acad Sci USA. 2011;108:13341–13346. doi: 10.1073/pnas.1104517108.
- 2. Landy MS, Maloney LT, Johnston EB, Young M. Measurement and modeling of depth cue combination: in defense of weak fusion. Vision Res. 1995;35:389–412. doi: 10.1016/0042-6989(94)00176-m.
- 3. Knill DC. Mixture models and the probabilistic structure of depth cues. Vision Res. 2003;43:831–854. doi: 10.1016/s0042-6989(03)00003-8.