Probabilistic population codes for Bayesian decision making

Jeffrey M Beck 1,7, Wei Ji Ma 1,2,7, Roozbeh Kiani 3, Tim Hanks 3, Anne K Churchland 3, Jamie Roitman 4, Michael N Shadlen 3, Peter E Latham 5, Alexandre Pouget 1,6
When making a decision, one must first accumulate evidence, often over time, and then select the appropriate action. Here, we present a neural model of decision making that can perform both evidence accumulation and action selection optimally. More specifically, we show that, given a Poisson-like distribution of spike counts, biological neural networks can accumulate evidence without loss of information through linear integration of neural activity, and can select the most likely action through attractor dynamics. This holds for arbitrary correlations, any tuning curves, continuous and discrete variables, and sensory evidence whose reliability varies over time. Our model predicts that the neurons in the lateral intraparietal cortex involved in evidence accumulation encode, on every trial, a probability distribution which predicts the animal’s performance. We present experimental evidence consistent with this prediction, and discuss other predictions applicable to more general settings.

Decision making affects all aspects of human behavior, on time scales varying from seconds to hours to days. For instance, imagine you are driving your car towards a busy intersection and your brakes fail. Within a few hundred milliseconds, you have to decide where to steer your car. Although this is a task we handle relatively easily, in fact it involves three separate, and nontrivial, stages. First, sensory evidence must be accumulated over time. Here, the sensory evidence consists of the image of cars and people in the intersection. Second, the accumulation must be stopped at some point (waiting too long can have disastrous consequences in this situation). Third, an action must be selected. This task is difficult because the sensory evidence and the response are continuous variables, the reliability of the sensory evidence is a priori unknown, and it can vary greatly over time. For instance, as you get closer to the intersection, your ability to distinguish different objects improves. The reliability of the visual information can also vary from day to day: it is much easier to analyze the scene on a sunny day than on a foggy one.

There is currently no neural model that can deal with this type of decision optimally, where by optimal, we mean that the accumulation of evidence is done without loss of information and that the chosen option is the most likely one given the sensory evidence (we do not address the issue of when to make the decision; see Discussion). Yet, it is essential to understand optimal decision making in the face of multiple choices and unknown and time-varying reliability, since most decisions we make fall into this category. Most models are concerned only with binary decision making, and even with this limitation, cannot deal optimally with sensory evidence of unknown and continuously changing reliability. This problem is conceptual: these models have no clear probabilistic interpretation or, when they do, are limited to situations in which the evidence has a constant and known reliability over time and over trials. As a result, it is unclear how, or even if, they are related to the general case we consider in this paper.

Here we present the first neural model of decision making that performs sensory evidence accumulation and response selection optimally when there are multiple or a continuum of possible decisions and the reliability of the sensory input varies over time or across trials. This model is built around the observation that spike counts in the brain are close to what we call ‘Poisson-like’ (Ma et al., 2006; Shadlen and Newsome, 1998; Tolhurst et al., 1982). Given this observation, our main contributions are twofold. First, we show that for Poisson-like distributions, optimal evidence accumulation can be performed through simple integration of neural activities, while optimal response selection can be implemented through attractor dynamics. Second, we show (again for Poisson-like distributions of neural activity) that neurons encode the posterior probability distribution over the variables of interest at all times. This latter contribution has far-reaching implications, since it suggests that neurons implicated in simple perceptual decisions represent quantities that are directly relevant to inference, confidence, and belief.

When accompanied by a termination rule, our model, like a number of others (Ditterich et al., 2003; Gold and Shadlen, 2007; Laming, 1968; Link, 1992; Link and Heath, 1975; Mazurek et al., 2003; Ratcliff and Rouder, 1998; Reddi and Carpenter, 2000; Smith and Ratcliff, 2004; Stone, 1960; Usher and McClelland, 2001; Wald, 1947; Wald and Wolfowitz, 1947) accounts for the speed-accuracy trade-off reported in humans and monkeys for binary choices. However, it goes beyond previous neural models in three ways. First, it captures the speed-accuracy trade-off and the physiology of LIP cells in experiments involving four choices. Second, as previously indicated, it predicts that neural activity in LIP encodes a probability distribution over actions. This is a new prediction about the response of LIP neurons, which we have tested and verified using data from area LIP recorded while monkeys engaged in a decision among two or four alternatives. Third, it makes predictions for the responses of cells in LIP and SC when there are multiple choices, a continuum of choices, and when the reliability of the cue varies over time.

Task and model architecture

For concreteness, we consider a motion direction task that has been extensively used to study decision making in humans and animals. In this task, an observer sees a random-dot kinematogram in which a fraction of the dots move coherently in a particular direction while all the other dots move randomly (Fig. 1a). The task of the observer is to report the direction of motion with a saccadic eye movement to a choice target that is associated with that direction of motion. The reliability of the sensory evidence can be controlled by changing the percentage of dots moving coherently. In most experiments, this task is restricted to binary decision making (right vs left) and constant coherence over the course of a trial. We also consider a more general setting in which the mean direction of moving dots and the direction of the saccade can take any value (Fig. 1b) and the reliability of the motion information (the coherence) can vary not only across trials, but also during a trial.

Figure 1.


Task and network architecture. a. Binary decision making. The subject must decide whether the dots are moving to the right or to the left. Only a fraction of the dots are moving to the right or the left coherently (black arrows). The other dots move in random directions. The animal indicates its response by moving its eyes in the perceived direction (green arrow). b. Continuous decision making, for which the dots can move in any direction. The animal responds by making a saccade to the outside circle in the perceived direction. c. Network architecture. The network consists of three interconnected layers of neurons with Gaussian tuning curves. In MT, the tuning curves are for direction of motion, while in LIP and SCb, the tuning curves are for saccade direction. The layers differ by their connectivity and dynamics. The LIP neurons have a long time constant (1 s), allowing them to integrate their input, and lateral connections, allowing them to implement short-range excitation and long-range inhibition. The SCb layer forms an attractor network, for which smooth hills of activity are stable regardless of their position. The blue dots indicate representative patterns of activity 200 ms into a trial for the MT and LIP layers, and at the end of the trial for the SCb layer.

A minimal model of this task (and, in fact, any decision-making task that involves integrating evidence over time) requires three distinct populations of neurons: an input layer, an evidence accumulation layer, and a read-out layer where motor output is generated (Fig. 1c). Here we label these MT (middle temporal), LIP (lateral intraparietal), and SCb (superior colliculus, in particular those cells that exhibit a motor burst; hence the index ‘b’), based on what is known about the functions of these regions. These labels are used for convenience only: it is quite likely that the sensory integration involves many other cells besides the ones in LIP, and that the motor burst is not generated solely in the SC.

Bayesian formulation

We denote the population activity of M neurons in area MT at time tn by a vector rMT(tn) (see Fig. 1c for an example), where rMT = {r1MT, …, rMMT} and riMT(tn) is the spike count of neuron i in the time interval [(n−1)δt, nδt]. In our simulations we set δt to 50 ms, although our results are insensitive to that choice.

The stimulus is characterized by a direction of motion, s, and task-irrelevant variables such as contrast and motion coherence, which we refer to as nuisance parameters and collectively denote by c (where c={c(t1),c(t2),…,c(tN)}). When a stimulus (s,c) is presented, MT generates a series of patterns of activity over time, denoted rMT(t1:tN) ≡ {rMT(t1),…, rMT(tN)}. Because of neural variability, rMT(t1:tN) is not the same on every presentation of (s,c), but follows a probability distribution p(rMT(t1:tN)|s,c). If we assume that the activity is uncorrelated on timescales of 50 ms, this distribution can be written as a product over time,

$$p(\mathbf{r}^{MT}(t_1{:}t_N) \mid s, c) = \prod_{n=1}^{N} p(\mathbf{r}^{MT}(t_n) \mid s, c(t_n)).$$

Given a series of activity patterns rMT(t1:tN) and assuming that one knows c, the optimal strategy for inferring the direction of motion is to apply Bayes’ rule to compute a probability distribution over s, given rMT(t1:tN). If the prior on s is flat, this so-called posterior distribution is given by

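$$p(s \mid \mathbf{r}^{MT}(t_1{:}t_N), c) \propto \prod_{n=1}^{N} \frac{p(\mathbf{r}^{MT}(t_n) \mid s, c)}{p(\mathbf{r}^{MT}(t_n) \mid c)}. \tag{1}$$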

This distribution captures everything there is to know about s given all the data from MT since the beginning of the trial, and as such, it retains all the information in the MT activity. Therefore, if the brain uses a Bayesian approach to decision making, the goal of the accumulation layer (LIP) should be to generate a pattern of activity at time tn that encodes this distribution (Eq. 1). An even better solution would be to encode a posterior distribution p(s|rMT(t1:tN)) that does not depend on c, the nuisance parameters (or in the jargon of probabilistic inference, a posterior in which c has been marginalized out; p(s | rMT) = ∫ dc p(s | rMT, c) p(c | rMT)). This would allow downstream areas to perform optimal computations over LIP activity without having to estimate the nuisance parameters. In other words, we should seek a set of feedforward connections between MT and LIP, and lateral connections within LIP, such that

$$p(s \mid \mathbf{r}^{LIP}(t_N)) = p(s \mid \mathbf{r}^{MT}(t_1{:}t_N)). \tag{2}$$
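As a rough illustration of Eq. (1), the sketch below evaluates the posterior over s on a discrete grid by multiplying per-bin likelihoods. It assumes independent Poisson neurons, a known gain playing the role of the nuisance parameter c, and a flat prior. The tuning curve, the four preferred directions, and the spike counts are hypothetical values chosen for illustration; they are not taken from the paper.

```python
import math

PREFS = [0.0, 90.0, 180.0, 270.0]             # hypothetical preferred directions (deg)
GRID = [float(s) for s in range(0, 360, 10)]  # candidate directions s (deg)

def tuning(s_deg, pref_deg, gain):
    """Hypothetical tuning curve: mean spike count per bin for direction s."""
    d = math.radians(s_deg - pref_deg)
    return gain * math.exp(2.0 * (math.cos(d) - 1.0)) + 0.5

def log_poisson(k, lam):
    """Log probability of observing k spikes from a Poisson neuron with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def posterior(counts_over_time, gains):
    """Eq. (1) on a grid: accumulate per-bin log likelihoods, then normalize."""
    log_post = [0.0] * len(GRID)
    for counts, gain in zip(counts_over_time, gains):
        for j, s in enumerate(GRID):
            for k, pref in zip(counts, PREFS):
                log_post[j] += log_poisson(k, tuning(s, pref, gain))
    peak = max(log_post)                       # subtract the peak for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Three time bins of made-up spike counts, one count per neuron in PREFS.
counts_over_time = [[0, 1, 4, 1], [1, 0, 6, 2], [0, 2, 5, 0]]
gains = [5.0, 5.0, 5.0]                        # assumed known in this sketch
post = posterior(counts_over_time, gains)
s_map = GRID[post.index(max(post))]            # most probable direction on the grid
```

Note that this direct approach needs the gain in every bin, which is exactly the dependence on c that the marginalized posterior is meant to avoid.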

It is critical to note that the approach we have just outlined requires that neural responses in MT and LIP represent probability distributions. In MT, rMT(tn) represents p(s|rMT(tn)), which is obtained from the response distribution, p(rMT(tn)|s) (sometimes called the noise distribution) through Bayes’ rule: p(s|rMT(tn)) ∝ p(rMT(tn)|s) (we are assuming a flat prior over s for this encoding step; non-flat priors can be incorporated into our approach, but are not central to the current argument). The same idea also applies to LIP. We refer to populations that represent probability distributions in this way as probabilistic population codes (Ma et al., 2006). The existence of such codes is central to our approach: neurons represent probability distributions via Bayes’ rule and, as a result, neural computations, such as accumulation of evidence, can be optimized by tailoring neural operations to the encoded distributions.

Once the accumulation is stopped, an action must be selected. The optimal strategy under many reasonable cost functions is to choose the action corresponding to the most likely stimulus. This value, denoted ŝ, is given by

$$\hat{s} = \arg\max_{s}\, p(s \mid \mathbf{r}^{MT}(t_1{:}t_{stop})), \tag{3}$$

where tstop is the stopping time. Note that, for simplicity, we use the same variable s to refer to both the direction of motion and the direction of a saccade, since they are indistinguishable in this experiment.

In a minimal optimal network, the third layer should encode the estimate, ŝ, as a stereotyped motor command. This is a task for which attractor networks are ideally suited, because they can take a noisy hill of activity as input and produce a smooth hill of stereotyped shape and height as output (Zhang, 1996) (see top layer in Fig. 1c). Stereotyped hills like these are observed, for instance, in the motor layer of the superior colliculus (SC), where the position of the peak of a hill determines the direction and amplitude of the upcoming saccade (Lee et al., 1988). The fact that the hill is smooth in Fig. 1c might appear unrealistic, but see the Supplementary Information for why this is in fact not a significant concern.

The question we address in the rest of the paper is how to implement optimal accumulation (Eq. (2)) and optimal response selection (Eq. (3)) in neural hardware.

Optimal network

Not surprisingly, the network connectivity needed to achieve optimality depends strongly on how the information about the stimulus is represented in MT, which in turn depends on the structure of the neuronal variability. Here we assume that the variability in MT conditioned on the value of a stimulus belongs to the exponential family with linear sufficient statistics (Ma et al., 2006). This choice is a natural one, since it is consistent with experimental measurements in a wide range of cortical areas (Ma et al., 2006). Specifically, we assume that p(rMT(tn)|s, c(tn)) has the form

$$p(\mathbf{r}^{MT}(t_n) \mid s, c(t_n)) = \Phi(\mathbf{r}^{MT}(t_n), c(t_n)) \exp\!\left(\mathbf{h}(s) \cdot \mathbf{r}^{MT}(t_n)\right) \tag{4}$$

where Φ(rMT(tn), c(tn)) is an arbitrary function of rMT(tn) and c(tn), and “·” is the standard dot product: h(s)·rMT(tn) = Σi hi(s) riMT(tn). Note that the nuisance parameter, c(tn), does not appear in the kernel h. In the rest of the paper, we refer to distributions with the property that h depends only on s as ‘Poisson-like’.

Independent Poisson variability is a special case of the Poisson-like family, with hi(s) being the log of the tuning curve of neuron i. Importantly, correlated neuronal responses (as observed in the brain) are also in the Poisson-like family, although there are restrictions on the nuisance parameters (Ma et al., 2006). These restrictions arise because h(s) is not independent of the tuning curves and the covariance matrix, but is related via

$$\mathbf{h}'(s) = \Sigma^{-1}(s, c(t_n))\, \mathbf{f}'(s, c(t_n)) \tag{5}$$

where f(s,c(tn)) is the tuning curve (the mean of r as a function of s), a prime denotes a derivative with respect to s, and Σ(s,c(tn)) is the covariance matrix of r. Since the right-hand side of Eq. (5) depends on c(tn) and the left-hand side does not, satisfying this equation is not trivial. There is, however, a rather natural condition under which it is satisfied: c(tn) is contrast (Sclar and Freeman, 1982). This is because contrast has a multiplicative effect on both tuning curves (Anderson et al., 2000; Sclar and Freeman, 1982) and covariance (Gershon et al., 1998; Kohn and Smith, 2005; Tolhurst et al., 1982), so f′(s,c(tn)) is proportional to some monotonic function g(c(tn)), Σ−1(s,c(tn)) is proportional to 1/g(c(tn)), and thus the c(tn)-dependence disappears from the right-hand side. (This is also the case when c(tn) is stimulus intensity for tactile stimuli.) Whether or not Eq. (5) is satisfied for other nuisance parameters must be checked on a case-by-case basis.
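The cancellation described above can be written out explicitly. The symbols f_0 and Σ_0, denoting the tuning curve and covariance at unit gain, are introduced here only for illustration and do not appear in the original text; the sketch assumes that contrast acts as a purely multiplicative gain g(c) on both quantities.

```latex
\mathbf{f}(s,c) = g(c)\,\mathbf{f}_0(s), \qquad
\Sigma(s,c) = g(c)\,\Sigma_0(s)
\quad\Longrightarrow\quad
\Sigma^{-1}(s,c)\,\mathbf{f}'(s,c)
  = \frac{1}{g(c)}\,\Sigma_0^{-1}(s)\; g(c)\,\mathbf{f}_0'(s)
  = \Sigma_0^{-1}(s)\,\mathbf{f}_0'(s)
```

Under this assumption the right-hand side of Eq. (5) depends on s alone, as required for h′(s).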

If the activity in MT satisfies Eq. (4), then we can insert Eq. (4) into Eq. (1), and we see that the right hand side is independent of c. Thus, the probability of the stimulus given the entire history of MT activity is given by

$$p(s \mid \mathbf{r}^{MT}(t_1{:}t_N)) \propto \exp\!\left(\mathbf{h}(s) \cdot \sum_{n=1}^{N} \mathbf{r}^{MT}(t_n)\right) \tag{6}$$

where the constant of proportionality depends on MT activity but not on s. Consequently, when the prior is flat, we see that Eq. (2) is satisfied if the LIP activity is constructed by simply adding the MT activity,

$$\mathbf{r}^{LIP}(t_N) = \sum_{n=1}^{N} \mathbf{r}^{MT}(t_n). \tag{7}$$
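The following minimal sketch checks Eqs. (6) and (7) numerically in one special case. It assumes independent Poisson neurons whose von Mises-shaped tuning curves tile the circle evenly, with a multiplicative gain standing in for the nuisance parameter. The names `f0`, `PREFS`, `KAPPA` and the gain values are illustrative choices, not parameters of the network described in this paper. The posterior computed bin by bin with the true gains is compared with the posterior decoded from the summed counts alone, without any knowledge of the gains.

```python
import math
import random

M = 36                                        # hypothetical number of MT neurons
PREFS = [360.0 * i / M for i in range(M)]     # evenly spaced preferred directions (deg)
GRID = [float(s) for s in range(0, 360, 5)]   # candidate directions s (deg)
KAPPA = 2.0                                   # hypothetical tuning sharpness

def f0(s_deg, pref_deg):
    """Unit-gain tuning curve (von Mises shape, no baseline)."""
    return math.exp(KAPPA * (math.cos(math.radians(s_deg - pref_deg)) - 1.0))

def poisson(rng, lam):
    """Draw one Poisson sample with mean lam (Knuth's multiplication method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def normalize(log_w):
    """Turn unnormalized log weights into a probability distribution."""
    peak = max(log_w)
    w = [math.exp(x - peak) for x in log_w]
    total = sum(w)
    return [x / total for x in w]

def posterior_full(counts_over_time, gains):
    """Product of per-bin Poisson likelihoods, using the true gain of each bin."""
    log_post = []
    for s in GRID:
        acc = 0.0
        for counts, g in zip(counts_over_time, gains):
            for k, pref in zip(counts, PREFS):
                lam = g * f0(s, pref)
                acc += k * math.log(lam) - lam   # terms independent of s are dropped
        log_post.append(acc)
    return normalize(log_post)

def posterior_from_sum(r_lip):
    """Eq. (6): decode the summed counts with the kernel h_i(s) = log f0_i(s)."""
    return normalize([sum(k * math.log(f0(s, pref)) for k, pref in zip(r_lip, PREFS))
                      for s in GRID])

rng = random.Random(0)
gains = [2.0, 8.0, 4.0, 16.0]                 # reliability changes from bin to bin
counts_over_time = [[poisson(rng, g * f0(180.0, pref)) for pref in PREFS] for g in gains]
r_lip = [sum(column) for column in zip(*counts_over_time)]   # Eq. (7)
p_full = posterior_full(counts_over_time, gains)
p_sum = posterior_from_sum(r_lip)
max_diff = max(abs(a - b) for a, b in zip(p_full, p_sum))
```

In this setting the two posteriors agree up to floating-point error, because the summed tuning curves are constant in s and the gain therefore only contributes factors that do not depend on s.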

The problem with this simple summing operation, however, is that LIP activity would saturate very quickly. Fortunately, it is possible to show that global inhibition can be used to alleviate this problem while preserving optimality (see Supplementary Note).

Finally, to guarantee that the SCb activity peaks at the optimal location (at ŝ in Eq. (3) and in Fig. 1c), we must introduce recurrent connectivity so that the SCb layer can support a hill of activity without input. In addition, input to the SCb must be gated, so that it receives no input until decision time. Once a decision is made, the instantaneous activity in the LIP layer is used to initialize the SCb activity. After initialization, the LIP activity is removed, and the SCb layer evolves under its own dynamics. As shown in the Supplementary Note, if the neuronal variability is Poisson-like in LIP, the SCb layer peaks at ŝ when

$$\mathbf{v}^{\dagger SC}(s) \propto \mathbf{h}'(s) \tag{8}$$

where v†SC(s) is the left null eigenvector of the Jacobian evaluated on the attractor implemented by the SCb and h′(s) is the same function that appears in Eq. (5). Importantly, v†SC(s) can be tuned to satisfy Eq. (8) by adjusting network parameters such as the weights of lateral connections in SCb. Consequently, when the neuronal variability is Poisson-like in LIP, there exists a set of parameters for which the superior colliculus generates a maximum-likelihood estimate.

Note that if the variability in LIP is not Poisson-like, attractor dynamics is no longer guaranteed to be optimal. In fact, there is no known optimal network for most distributions. It is therefore quite remarkable that, of all distributions, the cortex appears to exhibit those for which attractor dynamics can be tuned to be optimal.

Implications of optimality

There are several important features of our network that are somewhat hidden by the above analysis. First, if the neurons are Poisson-like, Eq. (7) leads to optimal accumulation of evidence (i.e., Eq. (2) is satisfied) even when the reliability of the sensory information varies from trial to trial, or over the course of a single trial. This might sound counterintuitive at first. Consider, for example, an image whose contrast increases over time. Since the data become progressively more reliable, the decision should be based more strongly on the information acquired at later times. A way to implement this would be to boost the weights from MT onto LIP as the contrast increases. However, this reweighting would have to be done on very short timescales and would require a constantly updated and reliable estimate of contrast. With Poisson-like variability, there is no need to reweight the input over time, because both MT and LIP represent probability distributions at all times and in a manner which is invariant to the value of contrast. This is easy to see in the case of contrast: as contrast increases, the reliability of the sensory evidence increases, but so does the amplitude of the population activity in MT. Since the MT activity is added on top of LIP activity, its impact scales with its amplitude, and therefore, in proportion to its reliability.

A second feature of our network is that the reliability of the data (encoded in the nuisance parameters, c) plays no role in estimating the stimulus, in the sense that even if we knew c our estimate of the posterior over the stimulus would not improve. This is a strong result, and one that is highly unusual in Bayesian inference. Much more typical is that the nuisance parameters are either estimated, or integrated out of the posterior, both of which introduce additional uncertainty in the inference process. The ramification of this is that, assuming no loss of information in processing after the SCb layer, the posterior in LIP exactly reflects the behavioral performance. For instance, if the posterior in LIP on a given trial is Gaussian with a standard deviation of 10° at decision time, and the decision involves computing the maximum-likelihood estimate, the discrimination threshold of the animal should be around 10° as well, across multiple trials of the same type. As we will discuss later, this prediction can be tested with existing data. Importantly, this prediction does not apply to the SCb layer in our model: instead, variability in this region, estimated on a trial-by-trial basis, would encode the motor error for the saccadic eye movement.

Evidence accumulation: Simulation results

We have shown so far that if the responses of MT neurons have Poisson-like statistics, optimal evidence accumulation can be performed by adding spikes over time, and optimal action selection can be performed with a single attractor network. Importantly, the attractor network can extract the maximum-likelihood estimate of the stimulus, s, without any need to know either the nuisance parameters, c, or how much time has elapsed since the start of the trial (Shadlen et al., 2006b).

These results are important but they are based on assumptions that are not necessarily exactly true in vivo. For instance, real neurons do not simply add spikes over time. Moreover, the response of MT neurons to random-dot kinematograms may not be exactly Poisson-like (in fact, it is not exactly Poisson-like according to the current models of MT; see Supplementary Note). It is therefore essential that we test our theory in biologically realistic networks. In particular, we want to address two critical questions in the simulated network: 1. Does the LIP layer accumulate evidence optimally? 2. Can a single attractor network extract the maximum-likelihood estimate from LIP activity, for all coherences and at all times?

For these simulations, we use a network similar to the one depicted in Fig. 1c. For the LIP layer, we use linear-nonlinear-Poisson (LNP) neurons (Plesser and Gerstner, 2000) with a long time constant (1 s) (Renart et al., 2003). We use LNP neurons because they provide a good approximation to real neurons, while producing spikes with realistic count statistics close to the exponential family (Paninski, 2004; Plesser and Gerstner, 2000). The LIP layer receives spatially correlated spike trains from area MT (Britten et al., 1993; Zohary et al., 1994). The feedforward connections from MT to LIP, which are purely excitatory, connect neurons with similar direction preferences using a Gaussian weighting profile. The LIP layer also has lateral connections with short-range excitation and long-range inhibition; i.e., the weights are excitatory between neurons with similar preferred directions and inhibitory otherwise. The inhibition is used to prevent saturation. Finally, rather than constructing the SCb network, we make use of the fact that the line attractor implements a local linear estimator which can be tuned to be optimal (Deneve et al., 1999; Latham et al., 2003) (see Supplementary Note).

We focus first on the binary case, for which dots move at either 0° or 180°. Figure 2a shows the average activity in the LIP layer over time for a stimulus moving at 180°. The average posterior distribution encoded by these activity patterns is illustrated in Fig. 2b. As expected, the probability corresponding to 180° grows over time while the probability corresponding to 0° decreases. In Fig. 2c, we show firing rate versus time for all coherences and for the neurons optimally tuned to 180° and 0°. The neurons in the model behave quantitatively like actual LIP neurons, as can be seen in Fig. 2d (data from (Roitman and Shadlen, 2002)).

Figure 2.


Binary decision making (as illustrated in Fig. 1a). Panels a–c: model; panel d: data. a. Firing rate in LIP at four different times for a coherence of 51.2%. The direction of the moving dots is 180°. b. Probability distributions encoded by the firing rates shown in a, averaged over 1000 trials. As expected, the probability of the 180° direction goes up while the probability of the 0° direction goes down. c. Firing rate over time for two units tuned to 180° (solid lines) and 0° (dotted lines) for 6 different levels of coherence. These averages were obtained over trials for which the model’s choice was 180°. d. Same as in c but for actual neurons in LIP (N=45). Data from Roitman and Shadlen, 2002. The model and the data show similar trends.

To determine whether the LIP layer accumulates evidence optimally, we first consider experiments in which coherence is fixed within a trial. Here, we take “optimal” to mean that when LIP updates its estimate of the direction of motion of the moving dots, it takes into account both its own uncertainty about direction and the uncertainty in MT. From a quantitative point of view, this implies that the expected log odds of making a correct choice (log[pcorrect/(1−pcorrect)] where pcorrect is the probability of making a correct choice) grows linearly with time, because the evidence is provided at a constant rate (see Supplementary Note). Moreover, the slope should increase with coherence, and if the coherence changes during the trial, so should the slope.
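One way to see where the linear growth comes from is to apply Eq. (6) to two candidate directions, s_1 (the true direction) and s_2. The short derivation below assumes Poisson-like variability, a flat prior over the two options, and a nuisance parameter c that is constant within the trial. It gives the posterior log odds in favor of the true direction, which is closely related to, but not necessarily identical with, the log odds of a correct choice analyzed in the Supplementary Note.

```latex
\log\frac{p(s_1 \mid \mathbf{r}^{MT}(t_1{:}t_N))}{p(s_2 \mid \mathbf{r}^{MT}(t_1{:}t_N))}
  = \bigl[\mathbf{h}(s_1) - \mathbf{h}(s_2)\bigr] \cdot \sum_{n=1}^{N} \mathbf{r}^{MT}(t_n),
\qquad
\mathbb{E}\!\left[\,\cdot \mid s_1, c\right]
  = N\,\bigl[\mathbf{h}(s_1) - \mathbf{h}(s_2)\bigr] \cdot \mathbf{f}(s_1, c)
```

The expectation is proportional to the number of time bins N, and its slope depends on c only through the mean response f(s_1, c), so a change in coherence during the trial changes the slope from that point on.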

Fig. 3a shows that the log odds do indeed grow linearly with time, and that the larger the coherence, the faster they increase. Furthermore, if we double or quadruple the coherence at time t=100 ms, the slope of the log odds changes to the correct slope within 100 ms (Fig. 3a, dotted lines).

Figure 3.


Log odds and Fisher information as a function of time. The origin (t=0) on all plots corresponds to the start of the integration of evidence, which is about 220 ms after stimulus onset in the experimental data. Panels a–c: model; panel d: data. a. Log odds for a binary decision as a function of time for four different levels of coherence (solid lines). Blue and black dotted lines: the coherence increases to 51.2% at t=100 ms. After 100 ms, the slope matches the 51.2% coherence trials, as expected if the model is Bayes optimal. b. Fisher information as a function of time for continuous decision making (as in Fig. 1b). Fisher information rises linearly with time, with higher slopes for higher coherences, as expected for Bayes optimality. Dotted line: trial in which the coherence increases from 25.6% to 51.2%. In both a and b, the kink at t=50 ms is due to the discretization of time. c. Squares: Fisher information estimated by a single local linear estimator across all times and all coherences. Circles: Fisher information estimated by a local optimal estimator trained separately for each time and each coherence. Dotted lines: for each coherence, the upper line corresponds to the information estimated from the training set, while the lower line is the information obtained from the testing set. The solid line is the average of the upper and lower dotted lines. The fact that both estimators return similar values of Fisher information shows that decoding LIP can be done nearly optimally without any knowledge of time or coherence. Green line: trials in which the coherence starts at 25.6% and then switches to 51.2% at 100 ms. d. Same as in a but for actual LIP neurons (N=45; data from Roitman and Shadlen, 2002). The results are quantitatively similar to the model. The y-axis is arbitrary up to a multiplicative factor and a DC offset.

We repeated these simulations for the continuous case, where the stimulus can move in any direction. Figure 4a shows the time evolution of the firing rate in LIP and Fig. 4b shows the average posterior distributions encoded by this activity. As evidence accumulates in favor of 180°, the activity at 180° increases and the probability distribution becomes narrower. To determine whether this accumulation process is optimal, we can run the same test as in the binary case, except this time we use the average of the inverse of the variance of the posterior distributions (the Fisher information (Papoulis, 1991)) rather than the log odds. Fig. 3b shows that, indeed, Fisher information increases linearly with time and the slope is an increasing function of coherence. Furthermore, when the coherence increases during the trial so does the slope.
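The linear growth of Fisher information can be illustrated in a simplified setting. The sketch below assumes independent Poisson neurons with von Mises-shaped tuning curves and treats coherence as a multiplicative gain on the mean response. Both are simplifying assumptions made for this example, and the names `f0`, `df0`, and `fisher_information` are hypothetical. For independent Poisson noise, each time bin contributes gain × Σi f0i′(s)² / f0i(s) to the Fisher information of the summed counts.

```python
import math

M = 36                                        # hypothetical number of neurons
PREFS = [360.0 * i / M for i in range(M)]     # evenly spaced preferred directions (deg)
KAPPA = 2.0                                   # hypothetical tuning sharpness

def f0(s_deg, pref_deg):
    """Unit-gain tuning curve (von Mises shape, no baseline)."""
    return math.exp(KAPPA * (math.cos(math.radians(s_deg - pref_deg)) - 1.0))

def df0(s_deg, pref_deg):
    """Derivative of f0 with respect to s, per radian."""
    d = math.radians(s_deg - pref_deg)
    return -KAPPA * math.sin(d) * f0(s_deg, pref_deg)

def fisher_information(s_deg, gains):
    """Fisher information (rad^-2) carried by counts summed over the given bins."""
    per_unit_gain = sum(df0(s_deg, p) ** 2 / f0(s_deg, p) for p in PREFS)
    return sum(gains) * per_unit_gain

# Constant gain: information after 1..6 bins grows by the same amount each bin.
constant = [fisher_information(180.0, [2.0] * n) for n in range(1, 7)]

# Gain doubles after the second bin: the slope doubles from then on.
stepped = [fisher_information(180.0, [2.0] * min(n, 2) + [4.0] * max(n - 2, 0))
           for n in range(1, 7)]
```

With 50 ms bins, a gain change after the second bin corresponds to the coherence switch at t=100 ms considered in the simulations, although the mapping from coherence to gain used here is only a stand-in.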

Figure 4.


Continuous decision making (as illustrated in Fig. 1b). a. Firing rates of model neurons in LIP at four different times for a coherence of 51.2%. The direction of the moving dots is 180°. b. Probability distributions encoded by the firing rates shown in a averaged over 1000 trials. As expected, the peak of the distribution is close to 180° and the variance of the distribution decreases over time.

We now turn to the second question: can the maximum-likelihood estimate be computed from LIP activity, for all coherences and at all times, with a single attractor network? Because attractor networks are mathematically equivalent to local linear estimators (Deneve et al., 1999), this question can be rephrased as: is the performance of a single local linear estimator similar to the performances of a family of estimators, each specialized for one time and one coherence? Figure 3c shows that the Fisher information recovered by the specialized linear estimators is indeed very similar to the information recovered by a single one, hence demonstrating that a single attractor network can optimally decode LIP for all coherences and at all times.

Finally, we performed another test, now at decision time. With our framework, the network encodes a probability distribution at all times, and in particular at decision time. This distribution reflects the quality of the data that have been accumulated and, consequently, the performance of the animal. Hence, for both two and four choice experiments, the log odds estimated in the LIP layer should be higher at high coherence than at low coherence, since the performance of the animal is better in the former case. Fig. 5a and b show that our model behaves as predicted. Note the important distinction with single-race bounded accumulation models (Bogacz et al., 2006; Huk and Shadlen, 2005; Link, 1992; Link and Heath, 1975; Ratcliff and Rouder, 1998). In such models, the state of the system is characterized by the value of the accumulation process. When the bound is hit, this value is always the same (Gold and Shadlen, 2001; Link, 1992; Shadlen et al., 2006a). Thus, there is no principled way to recover the probability that the decision is correct. An ad hoc solution has been proposed for two race models (Vickers, 1979), but it was not derived from probabilistic principles, and does not readily generalize to more than two choices.

Figure 5.


Average log odds at decision time computed from the model and data for the two and four choice experiments. a. Average log odds at decision time for a two choice experiment estimated from two neurons in the LIP layer of the model tuned to 0° and 180° on trials for which the model selected 180°. The average log odds is defined as the log of the ratio of the probability that the direction is equal to 180° to the probability that it is equal to 0°, averaged over trials. b. Same as in a but for the four choice experiment (for consistency with the two choice experiment, we use log odds in the four choice experiment). c. Same as in a but for actual LIP neurons (N=45) in the two choice experiment (dotted line, data from Roitman and Shadlen, 2002; solid line, data from Churchland et al., 2008). d. Same as in b but for actual LIP neurons (N=51–70) in the four choice experiment (data from Churchland et al., 2008). In both c and d, the log odds increases with coherence. Since higher coherence also implies higher performance, log odds also increases with performance. This is indeed what is expected if the posterior encoded in LIP reflects the quality of the data and, at decision time, the performance of the animal. On these plots, the y-axis is arbitrary up to a multiplicative factor.

Speed-accuracy tradeoff

When monkeys are tested on our decision making task and are free to choose when to respond, their psychometric and chronometric functions follow the profiles shown in Figs. 6a and 6b. To obtain these curves with our model, we used a stopping rule similar to the one used in most models: a fixed bound on the maximum activity in the network (see Supplementary Note). As can be seen, our model readily captures the performance and reaction time reported in monkeys whether the task involves two or four choices (data from Churchland et al., 2008). Moreover, the rate at which activity builds up on average in the LIP layer of the model as a function of coherence and number of choices is similar to what has been reported in vivo (see Fig. S1).
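As a rough illustration of a stopping rule of this kind, the following Python sketch sums inputs over time and stops when the most active unit reaches a fixed bound. It is a minimal sketch under stated assumptions, not the network described in the Supplementary Note: the function name `run_to_bound`, the noiseless inputs, and the bound value are all hypothetical and chosen only for illustration.

```python
def run_to_bound(input_stream, bound):
    """Sum the inputs step by step and stop once the most active unit reaches the bound.

    Returns (chosen_unit, steps_taken), or (None, None) if the bound is never reached.
    """
    activity = None
    for step, inputs in enumerate(input_stream, start=1):
        if activity is None:
            activity = [0.0] * len(inputs)
        # Linear integration: add the new input to the running total.
        activity = [a + x for a, x in zip(activity, inputs)]
        peak = max(activity)
        if peak >= bound:
            return activity.index(peak), step
    return None, None


# Hypothetical noiseless inputs for a four choice trial; unit 0 receives the strongest drive.
weak = [[1.0, 0.5, 0.5, 0.5]] * 100
strong = [[2.0, 0.5, 0.5, 0.5]] * 100

print(run_to_bound(weak, bound=10.0))    # (0, 10)
print(run_to_bound(strong, bound=10.0))  # (0, 5)
```

In this toy setting, stronger drive reaches the bound in fewer steps, which is the qualitative relation between coherence and reaction time that a bound on peak activity produces.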

Figure 6.

Performance and reaction time for the model versus monkeys. a. Probability of correct responses as a function of coherence. Blue: two choice experiment. Red: four choice experiment. Solid lines: model. Closed circles: data from Churchland et al., 2008. b. Reaction time as a function of coherence. Legend as in a.

Experimental predictions

Our model makes two experimentally testable predictions. The first is that the population response in LIP encodes a probability distribution over the stimulus and, more importantly, that this distribution reflects both the reliability of the evidence and the performance of the animal. Therefore, we predict that if the population activity in LIP is decoded with the same method used in our simulations, the results will match those shown in Figs. 3a–b and 5a. Rigorously testing this prediction requires multi-unit recordings in LIP, which are not currently available. However, we can test it qualitatively with the spike trains obtained from single cell recordings (Roitman and Shadlen, 2002). If the spike trains in LIP reflect the quality of the sensory data, the expected log odds computed from these spike trains should grow linearly with time, and the rate of growth should be proportional to coherence. We have performed this analysis, and this is indeed what we found, as illustrated in Fig. 3d. Furthermore, if these odds reflect the performance of the animal, we should find that the log odds in LIP at decision time grows with coherence for both the two and four choice experiments (since performance improves with coherence). Again, this is what we observed (Fig. 5c and d).
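To make the linear-growth prediction concrete, here is a minimal Python sketch for two independent Poisson neurons tuned to 180° and 0°. It rests on assumptions that are not taken from the analysis above: coherence acts as a pure multiplicative gain on the tuning curves, the summed firing rate is the same for both directions, and the parameter values (`gain_per_coh`, `f_pref`, `f_null`) are hypothetical. Under those assumptions the log odds is a fixed linear readout of the accumulated spike counts, so its expected value scales with both elapsed time and coherence.

```python
import math


def log_odds(r_pref, r_null, f_pref=2.0, f_null=1.0):
    """Log odds of 180 deg versus 0 deg given two accumulated spike counts.

    r_pref: count of the neuron preferring 180 deg; r_null: count of the neuron preferring 0 deg.
    f_pref, f_null: hypothetical tuning-curve values at the preferred and opposite direction.
    For independent Poisson neurons with equal summed rates under both hypotheses,
    the log likelihood ratio reduces to (r_pref - r_null) * log(f_pref / f_null).
    """
    kernel = math.log(f_pref / f_null)  # does not depend on coherence or time
    return (r_pref - r_null) * kernel


def expected_log_odds(t, coherence, gain_per_coh=40.0, f_pref=2.0, f_null=1.0):
    """Expected log odds after t seconds when the true direction is 180 deg.

    Assumes the gain is proportional to coherence (a simplification for illustration).
    """
    mean_pref = gain_per_coh * coherence * f_pref * t
    mean_null = gain_per_coh * coherence * f_null * t
    return log_odds(mean_pref, mean_null, f_pref, f_null)


for coherence in (0.1, 0.2, 0.4):
    trace = [round(expected_log_odds(t, coherence), 3) for t in (0.1, 0.2, 0.3, 0.4)]
    print(coherence, trace)
```

Because the readout weights do not depend on coherence, the decoder needs no knowledge of coherence or time; the reliability of the evidence is carried by the spike counts themselves.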

Recent experiments suggest that a similar property may hold for build-up cells in the superior colliculus (Kim and Basso, 2008; Ratcliff et al., 2007). For instance, Kim and Basso (2008) recorded simultaneously from neurons responding to the selected target and neurons responding to the distractors in a four choice experiment. They reported that the difference in activity between these neurons increases with performance. Under the assumption of Poisson-like neural variability, this difference would lead to an increase in the posterior probability assigned to the target as a function of performance.

As additional multi-unit data become available, it will be interesting to test our predictions more quantitatively. In particular, it will be important to determine whether the population of firing rates representing evidence for competing directions affects confidence judgments. It will also be important to determine whether a decoder that has knowledge of time and coherence performs better than a decoder that does not have such knowledge, and whether or not this additional information is accessible to the animal. As shown in Fig. 3c, our model predicts that there should be little difference.

The second experimental prediction concerns the time-evolution of the population activity in LIP. As can be seen in Fig. 4a, the width of the population activity does not change over time (once the curves are normalized for height and the baseline is removed), in contrast to the decoded probability distribution, which gets narrower as time progresses (Fig. 4b). This prediction is slightly weaker, since a population code with an invariant width is a sufficient but not a necessary condition for our proposed model. Nonetheless, the finding that the width of the population activity is invariant over time would be consistent with our model, while ruling out codes in which neural activities are proportional to probability (Barber et al., 2003; Eliasmith and Anderson, 2003).
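The contrast between an invariant population width and a narrowing posterior can be illustrated with a short Python sketch. It is not the model used in our simulations: it assumes independent Poisson neurons with Gaussian tuning curves that densely tile the stimulus range, no baseline activity, a flat prior, and mean (noise-free) accumulated counts. The constants `PREFERRED`, `SIGMA`, and the gain value are hypothetical.

```python
import math

PREFERRED = [float(d) for d in range(-90, 91, 5)]  # hypothetical preferred directions (deg)
SIGMA = 20.0                                       # hypothetical tuning width (deg)


def tuning(s):
    """Gaussian tuning curve values of all neurons for stimulus s."""
    return [math.exp(-(s - p) ** 2 / (2 * SIGMA ** 2)) for p in PREFERRED]


def mean_counts(s, gain, t):
    """Mean accumulated spike counts after time t (linear integration, no baseline)."""
    return [gain * t * f for f in tuning(s)]


def normalized(profile):
    """Population profile rescaled to unit peak height."""
    peak = max(profile)
    return [x / peak for x in profile]


def posterior_sd(counts, grid):
    """Standard deviation of the decoded posterior over the stimulus grid.

    Uses log p(s | r) = sum_i r_i log f_i(s) + const, which holds for independent
    Poisson neurons when sum_i f_i(s) is constant in s and the prior is flat.
    """
    log_post = [
        sum(r * (-(s - p) ** 2 / (2 * SIGMA ** 2)) for r, p in zip(counts, PREFERRED))
        for s in grid
    ]
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    mean = sum(w * s for w, s in zip(weights, grid)) / total
    var = sum(w * (s - mean) ** 2 for w, s in zip(weights, grid)) / total
    return math.sqrt(var)


grid = [x * 0.1 for x in range(-600, 601)]
early = mean_counts(0.0, gain=10.0, t=0.1)
late = mean_counts(0.0, gain=10.0, t=0.4)

# The height-normalized population profiles coincide, so their width is unchanged...
print(max(abs(a - b) for a, b in zip(normalized(early), normalized(late))))
# ...while the decoded posterior narrows as more spikes are accumulated.
print(posterior_sd(early, grid), posterior_sd(late, grid))
```

In this sketch the posterior standard deviation shrinks roughly as the inverse square root of the total spike count, whereas the shape of the population activity is set only by the tuning curves.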


We have shown that when the variability in spike count is Poisson-like, integration of evidence and action selection can be performed optimally in networks of spiking neurons, even when the variables involved are continuous and the reliability of the data changes over time. This result might explain why spike counts appear to follow ‘Poisson-like’ distributions throughout most of the cortex: this particular format greatly simplifies optimal Bayesian inference for decision making.
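The reason linear integration loses no information can be sketched in a few lines. The notation here is an assumption made for illustration: r_t is the population spike count at time step t, s the stimulus, c_t a nuisance variable such as coherence, h(s) a stimulus-dependent kernel that does not depend on c_t, and φ_t a factor that does not depend on s. The sketch also assumes that the responses at different time steps are conditionally independent given s, which is a simplification.

```latex
% Poisson-like family (assumed notation): the stimulus enters only through h(s) . r_t
p(\mathbf{r}_t \mid s, c_t) = \phi_t(\mathbf{r}_t, c_t)\, \exp\!\big(\mathbf{h}(s) \cdot \mathbf{r}_t\big)

% Conditionally independent observations at times t = 1, ..., T
\prod_{t=1}^{T} p(\mathbf{r}_t \mid s, c_t)
  = \Big[\prod_{t=1}^{T} \phi_t(\mathbf{r}_t, c_t)\Big]\,
    \exp\!\Big(\mathbf{h}(s) \cdot \sum_{t=1}^{T} \mathbf{r}_t\Big)

% The bracketed factor does not depend on s, so the posterior depends on the data
% only through the summed activity
p(s \mid \mathbf{r}_1, \dots, \mathbf{r}_T) \propto
  p(s)\, \exp\!\Big(\mathbf{h}(s) \cdot \sum_{t=1}^{T} \mathbf{r}_t\Big)
```

Under these assumptions the summed activity is a sufficient statistic for s, and the result holds whatever the values of c_t, which is why the reliability of the evidence may vary over time without being known to the decoder.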

We have also shown that performance is near-optimal even when the distribution of spike counts is not exactly Poisson-like (see Fig. 3c) but instead follows the experimentally observed distribution in MT in response to random dot kinematograms. It would be interesting to explore how far one has to be from the Poisson-like family before there is significant departure from optimal Bayesian inference. If a stimulus could trigger such non-Poisson statistics in the brain, we could test whether subjects’ performance degrades as predicted by our model.

At first glance, it might appear that our model does not differ much from previous neural models of decision making (Machens et al., 2005; Mazurek et al., 2003; Ratcliff and Rouder, 1998; Reddi and Carpenter, 2000; Smith and Ratcliff, 2004; Usher and McClelland, 2001; Wang, 2002; Wong and Wang, 2006). Previous neural models have indeed shown that a neural integrator can capture the behavior of subjects in a binary decision task, as can a point attractor network. They have even provided a probabilistic interpretation of the neural integration in terms of accumulation of log odds. It is important to emphasize, however, that these models, and their probabilistic interpretations, apply under very restrictive conditions, and do not generalize to real-world problems. In particular, in the context of estimating motion direction, they cannot handle decision making over continuous choices, or time- or trial-varying coherence. For example, the notion that LIP neurons are effectively accumulating log odds when they integrate the difference in activity of MT neurons with opposite preferences is true only for binary decisions and fixed coherence (Gold and Shadlen, 2001). This notion does not generalize easily to multiple directions (Bogacz and Gurney, 2007; McMillen and Holmes, 2006) and does not generalize at all to time- and trial-varying coherence.

The general case requires that we deal with the difficult problem of hidden variables: how do we extract information about a variable (e.g. direction) from neural activity which is influenced by other, hidden variables (e.g. coherence) whose values are unknown and vary over time? This is one of the hardest problems faced by the brain, and no general solution has been provided in the context of decision making. Here, however, we have found a solution that can be implemented with biologically plausible mechanisms. Moreover, this solution leads to a strong prediction: the log odds (or the posterior distribution in the case of multiple or continuous choices) are available on a trial-by-trial basis in LIP at all times, and in particular at decision time, without any knowledge of coherence or time. As shown in Fig. 5, the responses of LIP neurons in vivo are consistent with this prediction.

Our probabilistic framework also helps to clarify the benefits and limitations of using attractor dynamics for decision making, an approach that has been used in several models (Machens et al., 2005; Wang, 2002; Wong and Wang, 2006). Attractor dynamics is a good way to perform optimal action selection (as we do in the SCb layer), but not evidence integration (which is why we don’t use it in the LIP layer). Moreover, attractor dynamics can provide an optimal solution for action selection, but, importantly, only for a limited family of distributions, one of which is Poisson-like. This is a critical point, as it emphasizes the strong link between the response distribution and optimal inference.

Our framework is sufficiently powerful that it can be extended in several directions, including incorporating prior information, dealing with time-varying stimuli, and taking into account non-trivial reward functions when selecting actions. This last extension is critical. We have shown how the evidence accumulation and the response selection can be optimized in neural circuits, but we have not shown how to optimize reward rates. Optimizing reward rate is a complex problem that depends crucially on the cost function and the stopping process (Kiani et al., 2008). This lies beyond the scope of the present paper, but it is an important issue, which we intend to explore in future studies. It remains to be seen if it can be incorporated in the PPC framework. We believe that a promising idea is to explore whether LIP encodes the expected reward as a function of saccade direction and amplitude. Recent experimental data suggest that LIP might indeed represent either expected reward for all actions or the probability that an action will maximize reward (Platt and Glimcher, 1999; Sugrue et al., 2004). Either way, our framework should be applicable, since these quantities are similar to probability distributions over stimulus values.

P.E.L. is supported by the Gatsby Charitable Foundation and National Institute of Mental Health Grant R01 MH62447, and A.P. by NSF grant #BCS0446730 and MURI grant N00014-07-1-0937. M.N.S. and A.P. are jointly supported by NIDA grant #BCS0346785 and a research grant from the James S. McDonnell Foundation. We thank Daphne Bavelier for her suggestions and comments.


