Abstract
Simple choices (e.g., eating an apple vs. an orange) are made by integrating noisy evidence that is sampled over time and influenced by visual attention; as a result, fluctuations in visual attention can affect choices. But what determines what is fixated and when? To address this question, we model the decision process for simple choice as an information sampling problem, and approximate the optimal sampling policy. We find that it is optimal to sample from options whose value estimates are both high and uncertain. Furthermore, the optimal policy provides a reasonable account of fixations and choices in binary and trinary simple choice, as well as the differences between the two cases. Overall, the results show that the fixation process during simple choice is influenced dynamically by the value estimates computed during the decision process, in a manner consistent with optimal information sampling.
Author summary
Any supermarket shopper is familiar with the problem of choosing between a small number of items. Even these “simple choices” can be challenging because we have to think about the options to determine which one we like most, and we can’t think about all of them at once. This raises a question: what should we think about—and for how long should we think—before making a decision? We formalize this question as an information sampling problem, and identify an optimal solution. Observing what people look at while making choices, we find that many of the key patterns in their eye fixations are consistent with optimal information sampling.
Introduction
Consider the problems faced by a diner at a buffet table or a shopper at a supermarket shelf. They are presented with a number of options and must evaluate them until they identify the most desirable one. A central question in psychology and neuroscience concerns the algorithms, or computational processes, behind these canonical simple choices.
Previous work has established two important features of the processes underlying simple value-based choices. First, choices and reaction times are well explained by information sampling models like the diffusion decision model (DDM) [1–3] and the leaky competing accumulator model [4, 5]. In these models, individuals are initially uncertain about the desirability of each option, but they receive noisy signals about the options’ values that they integrate over time to form more accurate estimates. A central insight of these models is that sampling information about unknown subjective values is a central feature of simple choice. Second, visual attention affects the decision-making process. In particular, items that are fixated longer are more likely to be chosen [6–13], unless they are aversive, in which case they are chosen less frequently [7, 14]. These findings have been explained by the Attentional Drift Diffusion Model (aDDM), in which the value samples of the fixated item are over-weighted relative to those of unfixated ones (or equivalently in the binary case, discounting the influence of the unattended item on the drift rate) [9, 10, 12, 13]. See [15, 16] for reviews.
These insights raise an important question: What determines what is fixated and when during the decision process? Previous work has focused on two broad classes of theories. One class suggests that decisions and fixations are driven by separate processes, so that fixations affect how information about values is sampled and integrated, but not the other way around. In this view, although fixations can be modulated by features like visual saliency or spatial location, they are assumed to be independent of the state of the decision process. This is the framework behind the aDDM [9, 10, 12] and related models [17–19].
Another class of theories explores the idea that the decision process affects fixations, especially after some information about the options’ values has been accumulated. Examples of this class include the Gaze Cascade Model [6], an extension of the aDDM in which options with more accumulated evidence in their favor are more likely to be fixated [20], and a Bayesian sampling model in which options with less certain estimates are more likely to be fixated [21]. However, these models have not considered how uncertainty and value might interact, nor have they considered the optimality of the posited fixation process (although see [22–24] for such analyses in simplified settings).
Research on eye movements in the perceptual domain suggests a third possibility: that fixations are deployed to sample information optimally in order to make the best choice. Previous work in vision has shown that fixations are guided to locations that provide useful information for performing a task, and often in ways that are consistent with optimal sampling [25]. For example, in visual search (e.g., finding an ‘M’ in a field of ‘Ns’) people fixate on areas most likely to contain the target [26, 27]; in perceptual discrimination problems, people adapt their relative fixation time to the targets’ noise levels [28, 29]; and in naturalistic task-free viewing, fixations are drawn to areas that have high “Bayesian surprise”, i.e., areas where meaningful information is most likely to be found [30]. The properties of fixations in these types of tasks are captured by optimal sampling models that maximize expected information gain [25, 31]. However, these models have not been applied in the context of value-based decision making, and thus the extent to which fixation patterns during simple choices are consistent with optimal information sampling is an open question.
In this paper, we draw these threads together by defining a model of optimal information sampling in canonical simple choice tasks and investigating the extent to which it accounts for fixation patterns and their relation to choices. In a value-based choice, optimal information sampling requires maximizing the difference between the value of the chosen item and the cost of acquiring the information needed to make the choice. Our model thus falls into a broad class of models that extend classical rational models of economic choice [32, 33] to additionally account for constraints imposed by limited cognitive resources [34–39]. However, as is common in this approach, we stop short of specifying a full algorithmic model of simple choice. Instead, we ask to what extent people’s fixations are consistent with optimal information sampling, without specifying how the brain actually implements an optimal sampling policy.
Exploring an optimal information sampling model of fixations in simple choice is useful for several reasons. First, since fixations can affect choices, understanding what drives the fixation process can provide critical insight into the sources of mistakes and biases in decision-making. In particular, the extent to which behaviors can be characterized as mistakes depends on the extent to which fixations sample information sub-optimally. Second, simple choice algorithms like the DDM have been shown to implement optimal Bayesian information processing when the decision-maker receives the same amount of information about all options at the same rate [40–46], and this is often viewed as an explanation for why the brain uses these algorithms in the first place. In contrast, the optimal algorithm when the decision-maker must sample information selectively is unknown. Third, given the body of evidence showing that fixations are deployed optimally in perceptual decision making, it is interesting to ask if the same holds for value-based decisions. Given that such problems are characterized by both a different objective function (maximizing a scalar value rather than accuracy) and a different source of information (e.g., sampling from memory [47–49] rather than from a noisy visual stimulus), it is far from clear that optimal information sampling models will still provide a good account of fixations in this setting.
Building on the previous literature, our model assumes that the decision maker estimates the value of each item in the choice set based on a sequence of noisy samples of the items’ true values. We additionally assume that these samples can only be obtained from the attended item, and that it is costly to take samples and to switch fixation locations. This sets up a sequential decision problem: at each moment the decision maker must decide whether to keep sampling, and if so, which item to sample from. Since the model does not have a tractable analytical solution, in order to solve it and take it to the data, we approximate the optimal solution using tools from metareasoning in artificial intelligence [50–53].
We compare the optimal fixation policy to human fixation patterns in two influential binary and trinary choice datasets [9, 10]. We find that the model captures many previously identified patterns in the fixation data, including the effects of previous fixation time [21] and item value [17, 20, 22]. In addition, the model makes several novel predictions about the differences in fixations between binary and trinary choices and about fixation durations, which are consistent with the data. Finally, we identify a critical role of the prior distribution in producing the classic effects of attention on choice [7, 9, 10, 14]. Overall, the results show that the fixation process during simple choice is influenced by the value estimates computed during the decision process, in a manner consistent with optimal information sampling.
Model
Sequential sampling model
We consider simple choice problems in which a decision maker (DM) is presented with a set of items (e.g., snacks) and must choose one. Each item i is associated with some true but unknown value, u(i), the utility that the DM would gain by choosing it. Following previous work [1, 9, 40, 42–46, 54], we assume that the DM informs her choice by collecting noisy samples of the items’ true values, each providing a small amount of information, but incurring a small cost. The DM integrates the samples into posterior beliefs about each item’s value, choosing the item with maximal posterior mean when she terminates the sampling process.
As illustrated in Fig 1, we model attention by assuming that the DM can only sample from one item at each time point, the item she is fixating on. This sets up a fundamental problem: How should she allocate fixations in order to make good decisions without incurring too much cost? Specifically, at each time point, the DM must decide whether to select an option or continue sampling, and in the latter case, she must also decide which item to sample from. Importantly, she cannot simply allocate her attention to the item with the highest true value because she does not know the true values. Rather, she must decide which item to attend to based on her current value estimates and their uncertainty.
The DM’s belief about the item values at time t is described by a set of Gaussians, one for each item, with means $\mu_t(i)$ and precisions $\lambda_t(i)$ (the precision is the inverse of the variance). These estimated value distributions are initialized to the DM’s prior belief about the distribution of values in the environment. That is, she assumes that $u(i) \sim \mathcal{N}(\mu_{\text{prior}}, \sigma_{\text{prior}}^2)$ and consequently sets $\mu_0(i) = \mu_{\text{prior}}$ and $\lambda_0(i) = \sigma_{\text{prior}}^{-2}$ for all i. We further discuss the important role of the prior below.
We model the control of attention as the selection of cognitive operations, ct, that specify either an item to sample, or the termination of sampling. If the DM wishes to sample from item c at time-step t, she selects ct = c and receives a signal
$$x_t \sim \mathcal{N}\!\left(u(c),\ \sigma_x^2\right) \tag{1}$$
where u(c) is the unknown true value of the item being sampled, and $\sigma_x$ is a free parameter specifying the amount of noise in each signal. The belief state is then updated in accordance with Bayesian inference:
$$\lambda_{t+1}(c) = \lambda_t(c) + \sigma_x^{-2}, \qquad \mu_{t+1}(c) = \frac{\lambda_t(c)\,\mu_t(c) + \sigma_x^{-2}\,x_t}{\lambda_{t+1}(c)}, \tag{2}$$

with $\mu_{t+1}(i) = \mu_t(i)$ and $\lambda_{t+1}(i) = \lambda_t(i)$ for all $i \neq c$.
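To make the update concrete, here is a minimal sketch of Eqs 1 and 2 in Julia (the language of our reference implementation); the names `Belief` and `sample_and_update!` are illustrative rather than taken from the released code.

```julia
using Distributions

# Posterior beliefs: one Gaussian per item, parameterized by mean and precision.
struct Belief
    mu::Vector{Float64}      # posterior means, one per item
    lambda::Vector{Float64}  # posterior precisions (inverse variances)
end

# Draw one noisy sample of item c's true value (Eq 1) and fold it into the
# posterior by precision-weighted averaging (Eq 2). Other items are untouched.
function sample_and_update!(b::Belief, c::Int, u::Vector{Float64}, sigma_x::Float64)
    x = rand(Normal(u[c], sigma_x))       # Eq 1: noisy signal of u(c)
    lambda_x = sigma_x^-2                 # precision contributed by one sample
    new_lambda = b.lambda[c] + lambda_x   # precisions add under conjugacy
    b.mu[c] = (b.lambda[c] * b.mu[c] + lambda_x * x) / new_lambda
    b.lambda[c] = new_lambda
    return b
end
```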
The cognitive cost of each step of sampling and updating is given by a free parameter, γsample. We additionally impose a switching cost, γswitch, that the DM incurs whenever she samples from an item other than the one sampled on the last timestep (i.e., makes a saccade to a different item). Thus, the cost of sampling is
$$\text{cost}(c_t) = \gamma_{\text{sample}} + \gamma_{\text{switch}} \cdot \mathbb{1}\!\left[c_t \neq c_{t-1}\right] \tag{3}$$
Note that the model includes the special case in which there are no switching costs (γswitch = 0).
In addition to choosing an item to sample, the DM can also decide to stop sampling and choose the item with the highest expected value. In this case, she selects ct = ⊥. It follows that if the choice is made at time step T (i.e., cT = ⊥) the chosen item is $\arg\max_i \mu_T(i)$. The DM’s total payoff on a single decision is given by:
$$u\!\left(\arg\max_i \mu_T(i)\right) - \sum_{t=1}^{T-1} \text{cost}(c_t) \tag{4}$$
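The following sketch shows how Eqs 1–4 fit together in a single simulated trial, reusing `Belief` and `sample_and_update!` from the sketch above; `policy` is a hypothetical function mapping a belief (and the last-fixated item) to the next computation, with 0 standing in for ⊥.

```julia
# One simulated trial: run `policy` until it terminates, then score it (Eq 4).
# Assumes the Belief struct and sample_and_update! from the sketch above.
function trial_payoff(policy, u, sigma_x, mu0, lambda0, gamma_sample, gamma_switch)
    n = length(u)
    b = Belief(fill(mu0, n), fill(lambda0, n))
    last = 0                                   # nothing fixated yet
    total_cost = 0.0
    c = policy(b, last)
    while c != 0                               # 0 stands in for ⊥ (terminate)
        total_cost += gamma_sample             # per-sample cost (Eq 3)
        if last != 0 && c != last
            total_cost += gamma_switch         # saccade to a different item
        end
        sample_and_update!(b, c, u, sigma_x)
        last = c
        c = policy(b, last)
    end
    return u[argmax(b.mu)] - total_cost        # Eq 4: chosen value minus costs
end
```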
Optimal policy
We assume that the decisions about where to fixate and when to stop sampling are made optimally, subject to the informational constraints described in the previous section. Formally, we assume that the ct are selected by an optimal policy. A policy selects the next cognitive operation to execute, ct, given the current belief state, (μt, λt); it is optimal if it selects ct in a way that maximizes the expectation of Eq 4. How can we identify such a policy? Problems of this kind have been explored in the artificial intelligence literature on rational metareasoning [50, 51]. Thus, we cast the model described above as a metalevel Markov decision process [52], and identify a near-optimal policy using a recently developed method that has been shown to achieve strong performance on a related problem [53]. In accordance with past work modeling people’s choices [55] and fixations [20, 21], we assume that people follow a softmax policy in selecting each cognitive operation by sampling from a Boltzmann distribution based on their estimated values. Thus, their choices of cognitive operations are guided by the optimal policy, but subject to some noise. See Methods for details.
What does optimal attention allocation look like? In order to provide an intuitive understanding, we focus on two key properties of belief states: (1) uncertainty about the true values and (2) differences in the value estimates. Fig 2A shows the probability of the optimal policy (for a model with parameters fit to human data) sampling an item as a function of these two dimensions (marginalizing over the other dimensions according to their probability of occurring in simulated trials). We see that the optimal policy tends to fixate on items that are uncertain and have estimated values similar to the other items. In the case of trinary—but not binary—choice, we additionally see a stark asymmetry in the effect of relative estimated value. While the policy is likely to sample from an item whose estimated value is substantially higher than its competitors’, it is unlikely to sample from an item whose estimated value falls well below them. In particular, the policy has a strong preference to sample from the items with the best or second-best value estimates.
To see why this is optimal, note that sampling is only valuable insofar as it affects choice, and that the chosen item is the one with maximal estimated value when sampling stops. Thus, the optimal policy generally fixates on the item for which gathering more evidence is most likely to change which item has maximal expected value. There are two ways for this to happen: either the value of the current best item is reduced below the second-best item, or the value of some alternative item is increased above the best item. The former can only happen by sampling the best item, and the latter is ceteris paribus most likely to occur by sampling the second-best item because it is closer to the top position than the third-best item is (Fig 2B bottom). However, if uncertainty is much greater for the third-best item, this can outweigh the larger difference in estimated value (Fig 2B top). See [22] for a more formal justification for value-directed attention in a simplified non-dynamic case.
The prior distribution
Recall that the initial belief about each item’s value is set to the DM’s prior belief about the distribution of values in the environment; that is, $\mu_0(i) = \mu_{\text{prior}}$ and $\lambda_0(i) = \sigma_{\text{prior}}^{-2}$. This corresponds to the DM assuming that each item’s value is drawn from a prior distribution of true values given by $\mathcal{N}(\mu_{\text{prior}}, \sigma_{\text{prior}}^2)$. This assumption is plausible if this is the actual distribution of items that the DM encounters, and she is a Bayesian learner with sufficient experience in the context under study. However, given that these models are typically used to study choices made in the context of an experiment (as we do here), the DM might not have learned the exact prior distribution at work. As a result, we must consider the possibility that she has a biased prior.
In order to investigate the role of the prior on the model predictions, we assume that it takes the form of a Gaussian distribution with a mean and standard deviation related to the actual empirical distribution as follows:
$$\mu_{\text{prior}} = \alpha \cdot \text{mean}(\text{ratings}), \qquad \sigma_{\text{prior}} = \text{std}(\text{ratings}) \tag{5}$$
Here, mean(ratings) denotes the mean of the value ratings of all items, which provide independent and unbiased measures of the items’ true values (computed across trials in both experiments), and α is a free parameter that specifies the amount of bias in the prior (α = 0 corresponds to a strong bias and α = 1 corresponds to no bias). As a result, the DM has correct beliefs about the prior variance, but is allowed to have a biased belief about the prior mean. This case could arise, for example, if the average true value of the items used in the experiment differs from the average item that the DM encounters in her daily life.
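A minimal sketch of Eq 5, assuming `ratings` holds the pooled liking ratings:

```julia
using Statistics

# Eq 5: prior mean shrunk toward zero by the bias parameter alpha;
# prior standard deviation matches the empirical ratings distribution.
function biased_prior(ratings::Vector{Float64}, alpha::Float64)
    mu_prior = alpha * mean(ratings)
    sigma_prior = std(ratings)
    return mu_prior, sigma_prior
end
```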
Model fitting
We apply the model to two influential simple choice datasets: a binary food choice task [9] and a trinary food choice task [10]. In each study, participants first provided liking ratings for 70 snack items on a -10 to 10 scale, which are used as an independent measure of the items’ true values. They then made 100 choices among items that they had rated positively, while the location of their fixations was monitored at a rate of 50 Hz. See S1 Appendix for more details on the experiments.
The model has five free parameters: the standard deviation of the sampling distribution σx, the cost per sample γsample, the cost of switching attention γswitch, the prior bias α, and the inverse temperature of the softmax policy used to select cognitive operations, β. This last parameter controls the amount of noise in the fixation decisions. In order to fit the model, we need to make an assumption about the time that it takes to acquire each sample, which we take to be 100 ms. Note, however, that this choice is not important: changing the assumed duration leads to a change in the fitted parameters, but not in the qualitative model predictions.
We use an approximate maximum likelihood method to fit these parameters to choice and fixation data, which is described in the Methods section. Importantly, since the same model can be applied to N-item choices, we fit a common set of parameters jointly to the pooled data in both datasets. Thus, any differences in model predictions between binary and trinary choices are a priori predictions resulting from the structure of the model, and not differences in the parameters used to explain the two types of choices. We estimate the parameters using only the even trials, and then simulate the model in odd trials in order to compare the model predictions with the observed patterns out-of-sample. Because the policy optimization and likelihood estimation methods that we use are stochastic, we display simulations using the 30 top performing parameter configurations to give a sense of the uncertainty in the predictions. The parameter estimates were (mean ± std) σx = 2.60 ± 0.216, α = 0.581 ± 0.118, γswitch = 0.00995 ± 0.001, γsample = 0.00373 ± 0.001, and β = 364 ± 81.2. As explained in the Methods, the units of these parameter estimates are standard deviations of value (i.e., $\sigma_{\text{prior}} = \text{std}(\text{ratings})$).
In order to explore the role of the prior, we also fit versions of the model in which the prior bias term was fixed to α = 0 or α = 1. The former corresponds to a strongly biased prior and the latter corresponds to a completely unbiased prior. For α = 0, the fitted parameters were σx = 3.16 ± 0.409, γswitch = 0.00875 ± 0.002, γsample = 0.00319 ± 0.001, and β = 326 ± 81.2. For α = 1, they were σx = 2.66 ± 0.272, γswitch = 0.0118 ± 0.002, γsample = 0.00506 ± 0.001, and β = 330.0 ± 97.9.
All the figures below are based on model fits estimated at the group level on the pooled data. However, for completeness we also fit the model separately for each individual, and report these fits in S2 Appendix. We also carry out a validation of our model fitting approach in S1 Appendix.
Results
We now investigate the extent to which the predictions of the model, fitted on the even trials, are able to account for observed choice, reaction time and fixation patterns in the out-of-sample odd trials.
Basic psychometrics
We begin by looking at basic psychometric patterns. Fig 3A compares the choice curves predicted by the model with the actual observed choices, separately for the case of binary and trinary choice. It shows that the model captures well the influence of the items’ true values (as measured by liking ratings) on choice.
Fig 3B plots the distribution of total fixation times. This measure is similar to reaction time except that it excludes time not spent fixating on one of the items. We use total fixation time instead of reaction time because the model does not account for the initial fixation latency nor the time spent saccading between items (although it does account for the opportunity cost of that time, through the γsample parameter). As shown in the figure, the model provides a reasonable qualitative account of the distributions, although it underpredicts the mode in the case of two items and the skew in both cases.
Fig 3C shows the relationship between total fixation time and trial difficulty, as measured by the relative liking rating of the best item. We find that the model provides a reasonable account of how total fixation time changes with difficulty. This prediction follows from the fact that fewer samples are necessary to detect a large difference than to either detect a small difference or determine that the difference is small enough to be unimportant. However, the model exhibits considerable variation in the predicted intercept and substantially overpredicts total fixation time in difficult trinary choices.
Finally, Fig 3D shows the relationship between total fixation time and the average rating of all the items in the choice set. This “overall value effect” has been emphasized in recent research [13, 16] because it is consistent with multiplicative attention weighting (as in the aDDM) but not an additive boosting model (e.g., [11]). Bayesian updating results in a form of multiplicative weighting (specifically, a hyperbolic function, cf. [14]), and thus our model also predicts this pattern. Surprisingly, we do not see strong evidence for the overall value effect in the datasets we consider, but we note that the effect has been found robustly in several other datasets [13, 56–59]. Note that, in the binary case, the predicted overall value effect is symmetric around the prior mean; that is, choices between two very bad items will also be made quickly. Indeed, with an unbiased prior, the model predicts an inverted-U relationship around the prior mean.
Several additional patterns in Fig 3 are worth highlighting. First, all the models make similar and reasonable predictions of the psychometric choice curve and fixation time distributions. Second, the models with some prior bias provide a better account of the fixation time curves in binary choice than the unbiased model, and qualitatively similar predictions to the aDDM. Finally, despite using a common set of parameters, all the models capture well the differences between binary and trinary choice.
Basic fixation properties
We next compare the predicted and observed fixation patterns. An observed “fixation” refers to a contiguous span of time during which a participant looks at the same item. A predicted model fixation refers to a continuous sequence of samples taken from one item.
Fig 4A shows the distribution of the number of fixations across trials. The model-predicted distribution is reasonably similar to the observed data. However, in the two-item case, the model is more likely to make only one fixation, suggesting that people have a tendency, which the model does not capture, to fixate both items at least once.
Fig 4B shows the relationship between the total number of fixations and decision difficulty. We find that the model captures the relationship between difficulty and the number of fixations reasonably well, with the same caveats as for Fig 3C.
The original binary and trinary choice papers [9, 10] observed a systematic change in fixation durations over the course of the trial, as shown in Fig 4C. Although the model tends to underpredict the duration of the first two fixations in the three-item case, it captures well three key patterns: (a) the final fixation is shorter, (b) later (but non-final) fixations are longer and (c) fixations are substantially longer in the two-item case. The final prediction is especially striking given that the model uses the same set of fitted parameters for both datasets. The model predicts shorter final fixations because they are cut off when a choice is made [9, 10]. The model predicts the other patterns because more evidence is needed to alter beliefs when their precision is already high; this occurs late in the trial, especially in the two-item case where samples are split between fewer items.
Fig 4 also shows that the main model provides a more accurate account than the aDDM of how the number of fixations changes with trial difficulty, and of how fixation duration evolves over the course of a trial. One difficulty in making this comparison is that the aDDM assumes that non-final fixation durations are sampled from the observed empirical distribution, conditional on a number of observable variables, and thus the accuracy of its predictions regarding fixation duration and fixation number depends on the details of this sampling. To maximize comparability with the existing literature, here we use the same methods as in the original implementations [9, 10].
Uncertainty-directed attention
As we have seen, one of the key drivers of fixations in the optimal policy is uncertainty about the items’ values. Specifically, because the precision of the posteriors increases linearly with the number of samples, the model predicts that, other things being equal, fixations should go to items that have received less cumulative fixation time. However, the difference in precision must be large enough to justify paying the switching cost. In this section we explore some of the fixation patterns associated with this mechanism.
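Concretely, the conjugate update in Eq 2 implies that after $n_t(i)$ samples of item $i$, the posterior precision is

$$\lambda_t(i) = \sigma_{\text{prior}}^{-2} + \frac{n_t(i)}{\sigma_x^{2}},$$

so uncertainty about an item shrinks deterministically with its cumulative fixation time, regardless of the particular values sampled.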
Fig 5A depicts the distribution of relative cumulative fixation time at the beginning of a new fixation, starting with the second fixation. That is, at the onset of each fixation, we ask how much time has already been spent fixating the newly fixated item, compared to the other items. In both cases, the actual and predicted distributions are centered below zero, so that items tend to be fixated when they have received less fixation time than the other items. Additionally, the model correctly predicts the lower mode and fatter left tail in the two-item case.
Note, however, that a purely mechanical effect can account for this basic pattern: the item that is currently fixated will on average have received the most fixation time, but it cannot be the target of a new fixation, which drives down the fixation advantage of newly fixated items. For this reason, it is useful to look further at the three-item case, which affords a stronger test of uncertainty-directed attention. In this case, the target of each new fixation (excluding the first) must be one of the two items that are not currently fixated. Thus, comparing the cumulative fixation times for these items avoids the previous confound. Fig 5B thus plots the distribution of fixation time for the fixated item minus that of the item which could have been fixated but was not. We see a similar pattern to Fig 5A (right) in both the data and model predictions. This suggests that uncertainty is not simply driving the decision to make a saccade, but is also influencing the location of that saccade.
Fig 5C explores this further by looking at the location of new fixations in the three-item case, as a function of the difference in cumulative fixation time between the two possible fixation targets. Although the more-previously-fixated item is always less likely to be fixated, the probability of such a fixation actually increases as its fixation advantage grows. This counterintuitive model prediction results from the competing effects of value and uncertainty on attention. Since items with high estimated value are fixated more, an item that has been fixated much less than the others is likely to have a lower estimated value, and is therefore less likely to receive more fixations. However, we see that the predicted effect is much stronger than the observed effect, and that the aDDM model provides a better account of this pattern than our main model. Note, though, that the accuracy of this fit follows from the fact that the aDDM samples fixation locations and durations from the empirical distribution, conditioned on the previous three fixation locations and the item ratings.
Value-directed attention
A second key driver of attention in the optimal policy is estimated value, which directs fixations to the two items with the highest posterior means. As illustrated in Fig 2A, this implies that fixation locations should be sensitive to relative estimated values in the trinary but not in the binary case.
Although we cannot directly measure the participants’ evolving value estimates, we can use the liking ratings as a proxy for them because higher-rated items will tend to result in higher value estimates. Using this idea, Fig 6A shows the proportion of fixation time devoted to the left item as a function of its relative rating. Focusing first on the three-item case, both the model and data show a strong tendency to spend more time fixating on higher rated items (which are therefore likely to have higher estimated values). In the two-item case, the model simulations show a smaller but still positive effect. This is counterintuitive, since the model predicts that in the two-item case fixation locations are insensitive to the sign of the relative value estimates (Fig 2A). However, the pattern likely arises due to the tendency to fixate last on the chosen item (see Fig 7A below).
Fig 6B provides an alternative test that avoids confounds associated with the final fixation. It shows the duration of the first fixation, which is rarely final, as a function of the rating of the first fixated item. In the three-item case, both the model and data show longer initial fixations to high-rated items, although the model systematically underpredicts the mean first fixation duration. This prediction follows from the fact that, under the optimal policy, fixations are terminated when the fixated item’s estimated value falls out of the top two (below zero for the first fixation); the higher the true value of the item, the less likely this is to happen. In the two-item case, however, the model predicts that first fixation duration should be largely insensitive to estimated value; highly valuable items actually receive slightly shorter fixations because these items are more likely to generate extremely positive samples that result in terminating the first fixation and immediately choosing the fixated item. Consistent with this prediction, humans show little evidence for longer first fixations to high-rated items in the binary case.
Previous work has suggested that attention may be directly influenced by the true value of the items [17, 18, 60]. In our model, however, attention is driven only by the internal value estimates generated during the decision making process. To distinguish between these two accounts, we need a way to dissociate estimated value from true value. One way to do this is by looking at the time course of attention. Early in the decision making process, estimated values will be only weakly related to true value. However, with time the value estimates become increasingly accurate and thus more closely correlate with true value. Thus, if the decision maker always attends to the items with high estimated value, she should be increasingly likely to attend to items with high true value as the trial progresses. Fig 6C shows the probability of fixating on the worst item as a function of the cumulative fixation time to any of the items. In both the two- and three-item cases, the probability begins near chance. In the three-item case, however, the probability quickly falls. This is consistent with a model in which attention is driven by estimated value rather than value itself.
The model makes even starker predictions in the three-item case. First, take all trials in which the decision-maker samples from different items during the first three fixations. Consider the choice of where to deploy the fourth fixation. The model predicts that this fixation should be to the first-fixated item if its posterior mean is larger than that of the second-fixated item, and vice versa. As a result, the probability that the fourth fixation is a refixation to the first-fixated item should increase with the difference in ratings between the first- and second-fixated items. As shown in Fig 6D, the observed pattern follows the model prediction.
Finally, the model makes a striking prediction regarding the location of the third fixation in the three-item case. Consider the choice of where to fixate after the first two fixations. The decision maker can choose to fixate on the item that she has not seen yet, or to refixate the first-fixated item. The model predicts a refixation to the first-seen item if both that item and the second-seen item already have high value estimates (leaving the unfixated item with the lowest value estimate). Consistent with this prediction, Fig 6E shows that the probability of the third fixation being a refixation to the first-seen item increases with that item’s rating. Note that the model with α fixed to zero (corresponding to a strong prior bias) dramatically overpredicts the intercept. This is because this model greatly underestimates the value of the not-yet-fixated item.
Fig 6 shows that our main model provides a better prediction of some fixation patterns, whereas the aDDM provides a better fit to others. However, it is important to keep in mind that whereas our model provides predictions for these fixation patterns based on first principles, the predictions of the aDDM for these patterns are largely mechanistic since that model samples fixation locations and durations from the observed empirical distribution. As a result, it is not surprising that Fig 6B shows a better match between the aDDM and the data since the predicted durations are literally sampled from the observed data conditional on the first item rating.
Choice biases
Previous work has found a systematic positive correlation between relative fixation time and choice for appetitive (i.e., positively valenced) items [6, 7, 9, 10, 14, 20]. In particular, models like the aDDM propose that an exogenous or random increase in fixations towards an appetitive item increases the probability that it will be chosen, which leads to attention-driven choice biases. Here we investigate whether the optimal model can account for these types of effects.
Importantly, in the type of optimal fixation model proposed here, there are two potential mechanisms through which such correlations can emerge. The first is driven by the prior. If the prior mean is negatively biased, then sampling from an item will on average increase its estimated value. This follows from the fact that sampling will generally move the estimated value towards the item’s true value, and a negatively biased prior implies that the initial value estimate is generally less than the true value. The second mechanism, which is only present in trinary choice, is the result of value-directed attention. Here, the causal direction is flipped, with value estimates driving fixations rather than fixations driving value estimates. In particular, items with higher estimated value are both more likely to be fixated, and more likely to be chosen. Thus, fixations and choice are correlated through a common cause structure. Importantly, the two mechanisms are not mutually exclusive; in fact, our model predicts that both will be in effect for choice between more than two items.
Fig 7A shows that there is a sizable choice bias towards the last-seen item in both datasets, as evidenced by the greater-than-chance probability of choosing an item whose value is equal to the mean of the other items. Our model provides a strong quantitative account of the pattern in trinary choice, but substantially underpredicts the effect in binary choice. Interestingly, it predicts a weaker effect than the aDDM in the binary case, but a stronger effect in the trinary case.
To understand this result, it is important to think about the prior beliefs implicit in the aDDM and related models [9, 10, 20]. Since these are not Bayesian models, they do not posit an explicit prior that is then modified by evidence. However, the aDDM can be viewed as an approximation to a Bayesian model with a prior centered on zero, as reflected by the initial point of the accumulator (zero) and the multiplicative discounting (the evidence for the non-attended item is discounted towards zero). The latter roughly corresponds to the Bayesian regularization effect, wherein the posterior mean falls closer to the prior mean when the likelihood is weak (low precision). Given this, our model predicts a weaker effect in the binary case because it has a weaker prior bias (α = 0.58) than the one implicit in the aDDM (α = 0). Our model predicts a stronger effect in the trinary case due to the value-directed attention mechanism. Critically, although the aDDM accounts for the effect of true value on fixations (by sampling from the empirical fixation distribution), only the optimal model accounts for the effects of estimated value. Thus, conditioning on true value (as we do in Fig 7A) breaks the value-based attention mechanism in the aDDM but not in the optimal model. Finally, note that the optimal model with α = 0 provides a good account of the bias in the binary case, but dramatically overpredicts it in the trinary case.
Fig 7B shows that the average probability of choosing the left item increases substantially with its overall relative fixation time. As before, in comparison with the aDDM, the optimal model better captures the full strength of the bias in the trinary case, but underpredicts the effect in the binary case. The optimal model with α fixed to zero performs best in both cases. Note that the fit of the aDDM is not as close as for similar figures in the original papers because we simulate all models with the observed ratings (rather than all possible combinations of item ratings) and we consider a larger range of final time advantage. We replicate the original aDDM figures in S1 Appendix.
Finally, Fig 7C shows that the probability of choosing the first fixated item increases with the duration of the first fixation. Importantly, this figure shows that the attention-choice correlation cannot be explained solely by the tendency to choose the last-fixated item. Again, all four models qualitatively capture the effect, with varying degrees of quantitative fit.
Discussion
We have built a model of optimal information sampling during simple choice in order to investigate the extent to which it can provide a quantitative account of fixation patterns, and their relationship with choices, during binary and trinary decisions. The model is based on previous work showing that simple choices are based on the sequential accumulation of noisy value samples [1, 44, 61–64] and that the process is modulated by visual attention [7, 9, 10, 17, 20, 21, 65]. However, instead of proposing a specific algorithmic model of the fixation and choice process, as is common in the literature, our focus has been on characterizing the optimal fixation policy and its implications. We build on previous work on optimal economic decision-making in which samples are acquired for all options at the same rate [40, 44–46], and extend it to the case of endogenous attention, where the decision maker can control the rate of information acquired about each option. We formalized the selection of fixations as a problem of dynamically allocating a costly cognitive resource in order to gain information about the values of the available options. Leveraging tools from metareasoning in artificial intelligence [50–53], we approximated the optimal solution to this problem, which takes the form of a policy that selects which item to fixate at each moment and when to terminate the decision-making process.
We found that, despite its simplicity, the optimal model accounts for many key fixation and choice patterns in two influential binary and trinary choice datasets [9, 10]. The model was also able to account for striking differences between the two- and three-item cases using a common set of parameters fitted out of sample. More importantly, the results provide evidence in favor of the hypothesis that the fixation process is influenced by the evolving value estimates, at least to some extent. Consider, for example, the increase in fixation duration over the course of the trial shown in Fig 4C, the tendency to equate fixation time across items (Fig 5B), and the relationship between the rating of the first fixated item and the probability of re-fixating it (Fig 6D and 6E). These effects are explained by our model, but are hard to explain with exogenous fixations, or with fixations that are correlated with the true value of the items, but not with the evolving value estimates (e.g., as in [17, 18, 66]).
Optimal information sampling models may appear inappropriate for value-based decision-making problems, in which perceptual uncertainty about the identity of the different choice items (often highly familiar junk foods) is likely resolved long before a choice is made. Two features of the model ameliorate this concern. First, the samples underlying value-based decisions are not taken from the external display (as in perceptual decisions), but are instead generated internally, perhaps by some combination of mental simulation and memory recall [47–49]. Second, the model makes the eye-mind assumption [15, 67]: what a person is looking at is a good indicator of what they are thinking about. Importantly, these assumptions implicitly underlie all sequential sampling models of value-based decision-making.
Our model is not the first to propose that the fixation and value-estimation processes might interact reciprocally. However, no previous models fully capture the key characteristics of optimal attention allocation, which appear to be at least approximated in human fixation behavior. For example, the Gaze Cascade Model [6] proposes that late in a trial subjects lock in fixations on the favored option until a choice is made; [20] propose an aDDM in which the probability of fixating an item is given by a softmax over the estimated values; and [21] propose a Bayesian model of binary choice in which fixations are driven by relative uncertainty. In contrast to these models, the optimal model predicts that fixations are driven by a combination of the estimated uncertainty and relative values throughout the trial, and that attention is devoted specifically to the items with the top two value estimates. Although the data presented here strongly support the first prediction, further data are necessary to distinguish between the top-two rule and the softmax rule of [20].
Our results shed further light on the mechanisms underlying the classic attention-choice correlation that has motivated previous models of attention-modulated simple choice. First, our results highlight an important role of prior beliefs in sequential sampling models of simple choice (c.f. [68]). All previous models have assumed a prior mean of zero, either explicitly [21, 68] or implicitly [9, 10, 20]. Such a prior is negatively biased when all or most items have positive value, as is often the case in experimental settings. This bias is critical in explaining the classic attention-choice correlation effects because it creates a net-positive effect of attention on choice: if one begins with an underestimate, attending to an item will on average increase its estimated value. However, we found that the best characterization of the full behavior was achieved with a moderately biased prior, both in terms of our approximate likelihood and in the full set of behavioral patterns in the plots.
Our results also suggest another (not mutually exclusive) mechanism by which the attention-choice correlation can emerge: value-directed attention. We found that the optimal model with no prior bias (α = 1) predicts an attention-choice correlation in the trinary choice case. This is because, controlling for true values, an increase in estimated value (e.g., due to sample noise) makes the model more likely to both fixate and choose an item. This could potentially help to resolve the debate over additive vs. multiplicative effects of attention on choice [11, 13]. While the prior-bias mechanism predicts a multiplicative effect, the value-directed attention mechanism predicts that fixation time and choice will be directly related (as predicted by the additive model). Although we did not see strong evidence for value-directed attention in the binary dataset, such a bias has been shown in explicit information gathering settings [69] and could be at work in other binary choice settings.
Our work most closely relates to two recent lines of work on optimal information sampling for simple choice. First, Hébert and Woodford [70, 71] consider sequential sampling models based on rational inattention. They derive optimal sampling strategies under highly general information-theoretic constraints, and establish several interesting properties of optimal sampling, such as the conditions under which the evidence accumulation will resemble a jump or a diffusion process. In their framework, the decision maker chooses, at each time point, an arbitrary information structure, the probability of producing each possible signal under different true states of the world. In contrast, we specify a very small set of information structures, each of which corresponds to sampling a noisy estimate of one item’s value (Eq 1). This naturally associates each information structure with fixating on one of the items, allowing us to compare model predictions to human fixation patterns. Whether human attention more closely resembles flexible construction of dynamic information structures, or selection from a small set of fixed information structures is an interesting question for future research.
In a second line of work, concurrent to our own, Jang, Sharma, and Drugowitsch [68] develop a model of optimal information sampling for binary choice with the same Bayesian structure as our model and compare their predictions to human behavior in the same binary choice dataset that we use [9]. There are three important differences between the studies. First, they consider the possibility that samples can also be drawn in parallel for the unattended item, but with higher variance. However, they find that a model in which almost no information is acquired for the unattended item fits the data best, consistent with the assumptions of our model. Second, they use dynamic programming to identify the optimal attention policy almost exactly. This allows them to more accurately characterize truly optimal attention allocation. However, dynamic programming is intractable for more than two items, due to the curse of dimensionality. Thus, they could not consider trinary choice, which is of special interest because only this case makes value-directed attention optimal, and forces the decision-maker to decide which of the unattended items to fixate next, rather than simply when to switch to the other item. Third, they assumed (following previous work) that the prior mean is zero. In contrast, by varying the prior, we show that although a biased prior is needed to account for the attention-choice correlation in binary choice, the data is best explained by a model with only a moderately biased prior mean, about halfway between zero and the empirical mean.
We can also draw insights from the empirical patterns that the model fails to capture. These mismatches suggest that the model, which was designed to be as simple as possible, is missing critical components that should be explored in future work. For example, the underprediction of fixation durations early in the trial could be addressed by more realistic constraints on the fixation process such as inhibition of return, and the overprediction of the proportion of single-fixation trials in the two-item case could be explained with uncertainty aversion. Although not illustrated here, the model’s accuracy could be further improved by including bottom-up influences on fixations (e.g., spatial or saliency biases [18, 72]).
While we have focused on attention in simple choice, other studies have explored the role of attention in more complicated multi-attribute choices [5, 73–82]. None of these studies have carried out a full characterization of the optimal sampling process or how it compares to observed fixation patterns, although see [83, 84] for some related results. Extending the methods in this paper to that important case is a priority for future work. Finally, in contrast to many sequential sampling models, our model is not intended as a biologically plausible process model of how the brain actually makes decisions. Exploring how the brain might approximate the optimal sampling policy presented here, and also how optimal sampling might change under accumulation mechanisms such as decay and inhibition is another priority for future work.
Methods
The model was implemented in the Julia programming language [85]. The code can be found at https://github.com/fredcallaway/optimal-fixations-simple-choice.
Attention allocation as a metalevel Markov decision process
To characterize optimal attention allocation in our model, we cast the model as a metalevel Markov decision process (MDP) [52]. Like a standard MDP, a metalevel MDP is defined by a set of states, a set of actions, a transition function giving the probability of moving to each state by executing a given action in a given state, and a reward function giving the immediate utility gained by executing a given action in a given state. In a metalevel MDP, the states, $b \in \mathcal{B}$, correspond to beliefs (mental states), and the actions, $c \in \mathcal{C}$, correspond to computations (cognitive operations). However, formally, it is identical to an MDP, and can be interpreted as such.
In our model, a belief state, $b_t$, corresponds to a set of posterior distributions over each item’s value. Because the distributions are Gaussian, the belief can be represented by two vectors, μ and λ, that specify the mean and precision of each distribution. That is,

$$u(i) \mid b_t \sim \mathcal{N}\!\left(\mu_t(i),\ \lambda_t(i)^{-1}\right) \quad \text{for each item } i.$$
To model the switching cost, the belief state must also encode the currently attended item, i.e., the item sampled last (taking a null value, ⊘, in the initial belief). Thus, a belief is a tuple bt = (μt, λt, lastt). The dimensionality of the belief space is 2N + 1 where N is the number of items.
A computation, $c \in \mathcal{C}$, corresponds to sampling an item’s value and updating the corresponding estimated value distribution. There are N such computations, one for each item. Additionally, all metalevel MDPs have a special computation, ⊥, that terminates the computation process (in our case, sampling) and selects an optimal external action given the current belief state (in our case, choosing the item with maximal posterior mean).
The metalevel transition function describes how computations update beliefs. In our model, this corresponds to the sampling and Bayesian belief updating procedure specified in Eq 2, which we reproduce here for the reader’s convenience. Note that we additionally make explicit the variable that tracks the previously sampled item. Given the current belief, bt = (μt, λt, lastt), and computation, c, the next belief state, bt+1 = (μt+1, λt+1, lastt+1), is sampled from the following generative process:
$$x \sim \mathcal{N}\!\left(u(c),\ \sigma_x^2\right)$$
$$\lambda_{t+1}(i) = \begin{cases} \lambda_t(i) + \sigma_x^{-2} & \text{if } i = c \\ \lambda_t(i) & \text{otherwise} \end{cases} \qquad \mu_{t+1}(i) = \begin{cases} \dfrac{\lambda_t(i)\,\mu_t(i) + \sigma_x^{-2}\,x}{\lambda_{t+1}(i)} & \text{if } i = c \\ \mu_t(i) & \text{otherwise} \end{cases}$$
$$\text{last}_{t+1} = c \tag{6}$$
Finally, the metalevel reward function incorporates both the cost of computation and the utility of the chosen action. The metalevel reward for sampling is defined
$$R(b_t, c_t) = -\text{cost}(c_t) = -\left(\gamma_{\text{sample}} + \gamma_{\text{switch}} \cdot \mathbb{1}\!\left[c_t \neq \text{last}_t\right]\right) \quad \text{for } c_t \neq \bot.$$

That is, the cost of sampling includes a fixed cost, γsample, as well as an additional switching cost, γswitch, that is paid when sampling from a different item than that sampled on the last time step. We assume that this cost is not paid for the first fixation; however, this assumption has no effect on the optimal policy for reasonable parameter values. The action utility is the true value of the chosen item, i.e., $u(i^*)$ where $i^* = \arg\max_i \mu_T(i)$. The metalevel reward for the termination computation, ⊥, is the expectation of this value. Because we assume accurate priors and Bayesian belief updating, this expectation can be taken with respect to the agent’s own beliefs [52], resulting in

$$R(b, \bot) = \mathbb{E}\!\left[u(i^*) \mid b\right] = \max_i \mu(i).$$
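In code, the metalevel reward might look like the following sketch (again with 0 standing in for ⊥ and illustrative names):

```julia
# Metalevel reward: negative sampling cost for c ≠ ⊥ (with a switch surcharge,
# waived on the first fixation), and the maximal posterior mean for ⊥.
function metalevel_reward(b, c::Int, last::Int, gamma_sample, gamma_switch)
    c == 0 && return maximum(b.mu)              # R(b, ⊥) = max_i μ(i)
    switch = (last != 0 && c != last) ? gamma_switch : 0.0
    return -(gamma_sample + switch)             # -cost(c)
end
```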
Optimal metalevel policy
The solution to a metalevel MDP takes the form of a Markov policy, π, that stochastically selects which computation to take next given the current belief state. Formally, ct ∼ π(bt). The optimal metalevel policy, π*, is the one that maximizes expected total metalevel reward,

$$\pi^* = \arg\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} R(b_t, c_t)\right].$$
Replacing R with its definition, we see that this requires striking a balance between the expected value of the chosen item and the computational cost of the samples that informed the choice,

$$\pi^* = \arg\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\max_i \mu_T(i) - \sum_{t=1}^{T-1} \text{cost}(c_t)\right].$$
That is, one wishes to acquire accurate beliefs that support selecting a high-value item, while at the same time minimizing the cost of the samples necessary to attain those beliefs. This suggests a strategy for selecting computations optimally. For each item, estimate how much one’s decision would improve if one sampled from it (and then continued sampling optimally). Subtract from this number the cost of taking the sample (and also the estimated cost of the future samples). Now identify the item for which this value is maximal. If it is positive, it is optimal to take another sample for this item; otherwise, it is optimal to stop sampling and make a decision.
This basic logic is formalized in rational metareasoning as the value of computation (VOC) [51]. Formally, VOC(b, c) is defined as the expected increase in total metalevel reward if one executes a single computation, c, and continues optimally rather than making a choice immediately (i.e., executing ⊥):

$$\text{VOC}(b, c) = \mathbb{E}\!\left[\sum_{t=1}^{T} R(b_t, c_t)\ \middle|\ b_1 = b,\ c_1 = c,\ c_t \sim \pi^* \text{ for } t > 1\right] - R(b, \bot).$$
In our model, this can be rewritten

$$\text{VOC}(b, c) = \mathbb{E}\!\left[\max_i \mu_T(i)\ \middle|\ b, c\right] - \max_i \mu(i) - \mathbb{E}\!\left[\sum_{t=1}^{T-1} \text{cost}(c_t)\ \middle|\ b, c\right].$$
That is, the VOC for sampling a given item in some belief state is the expected improvement in the value of the chosen item (rather than making a choice based on the current belief) minus the cost of sampling that item and the expected cost of all future samples.
We can then define the optimal policy as selecting computations with maximal VOC:

$$\pi^*(b) = \arg\max_{c}\ \text{VOC}(b, c).$$
For those familiar with reinforcement learning, this recursive joint definition of π* and VOC is exactly analogous to the joint definition of the optimal policy with the state-action value function, Q [86]. Indeed, VOC(b, c) = Q(b, c) − R(b, ⊥).
Finally, by definition, VOC(b, ⊥) = 0 for all b. Thus, the optimal policy terminates sampling when no computation has a positive VOC.
Approximating the optimal policy
For small discrete belief spaces, the optimal metalevel policy can be computed exactly using standard dynamic programming methods such as value iteration or backwards induction [87]. These methods can also be applied to low-dimensional, continuous belief spaces by first discretizing the space on a grid [45], and this approach has recently been used to characterize the optimal fixation policy in binary choice [68]. Unfortunately, these methods are infeasible in the trinary choice case, since the belief space has six continuous dimensions. Instead, we approximate the optimal policy by extending the method proposed in [53]. This method is based on an approximation of the VOC as a linear combination of features,
$$\widehat{\text{VOC}}(b, c) = w_1\, \text{VOI}_{\text{myopic}}(b, c) + w_2\, \text{VOI}_{\text{item}}(b, c) + w_3\, \text{VOI}_{\text{full}}(b) - \text{cost}(c) - w_4 \tag{7}$$
for all $c \neq \bot$, with $\widehat{\text{VOC}}(b, \bot) = 0$.
We briefly define the features here, and provide full derivations in S1 Appendix. The VOI terms quantify the value of information [88] that might be gained by different additional computations. Note that the VOI is different from the VOC because the latter includes the costs of computation as well as its benefits. In general, the VOI is defined as the expected improvement in the utility of the action selected based on additional information rather than the current belief state: $\text{VOI}(b) = \mathbb{E}\!\left[\max_i \mathbb{E}\left[u(i) \mid b'\right]\right] - \max_i \mu(i)$, where $b'$ is a hypothetical future belief in which the information has been gained, the distribution of which depends on the current belief.
VOImyopic(b, c) denotes the expected improvement in choice utility from drawing one additional sample from item c before making a choice, as opposed to making a choice immediately based on the current belief, b. VOIitem(b, c) denotes the expected improvement from learning the true value of item c, and then choosing the best item based on that information. Finally, VOIfull(b) denotes the improvement from learning the true value of every item and then making an optimal choice based on that complete information.
Together, these three features approximate the expected value of information that could be gained by the (unknown) sequence of future samples. Importantly, this true value of information always lies between the lower bound of VOImyopic and the upper bound of VOIfull (see Fig D in S1 Appendix), implying that the true VOI can be expressed as a convex combination of these two terms. Note, however, that the weights on this combination are not constant across beliefs, as assumed in our approximation. Thus, including the VOIitem term improves the accuracy of the approximation by providing an intermediate value between the two extremes. Finally, the last two terms in Eq 7 approximate the cost of computation: cost(c) is the cost of carrying out computation c, and w4 approximates the expected future costs incurred under the optimal policy. Although maximizing VOĈ identifies the policy with the best performance, it is unlikely that humans make attentional decisions using such perfect and noiseless maximization. Thus, we assume that computations are chosen using a Boltzmann (softmax) distribution [55] given by

π(c ∣ b) ∝ exp(β VOĈ(b, c)),
where the inverse temperature, β, is a free parameter that controls the degree of noise. Note that computation selection is fully random when β = 0 and becomes deterministic as β → ∞.
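As an illustration of Eq 7 and this selection rule, consider the sketch below (Python; voi_myopic, voi_item, and voi_full are stand-ins for the feature derivations in S1 Appendix, and we assume that ⊥ competes in the softmax with VOĈ(b, ⊥) = 0):

    import numpy as np

    def voc_hat(b, c, w, voi_myopic, voi_item, voi_full, cost):
        """Feature-based approximation of the VOC (Eq 7); w = (w1, w2, w3, w4)."""
        return (w[0] * voi_myopic(b, c) + w[1] * voi_item(b, c)
                + w[2] * voi_full(b) - cost(c) - w[3])

    def boltzmann_policy(b, computations, beta, rng, **features):
        """Sample a computation (or None for ⊥) with probability ∝ exp(beta * VOC-hat)."""
        v = np.array([voc_hat(b, c, **features) for c in computations] + [0.0])
        p = np.exp(beta * (v - v.max()))   # subtract max for numerical stability
        p /= p.sum()
        k = rng.choice(len(p), p=p)
        return None if k == len(computations) else computations[k]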
To identify the weights used in the approximation, we first assume that w_i ≥ 0 and w1 + w2 + w3 = 1, since the first three features form a convex combination and w4 captures a non-negative future cost. Previous work [53] used Bayesian optimization to identify the weights within this space that maximize total expected metalevel reward. However, we found that a large area of weight space often resulted in extremely similar performance, despite inducing behaviorally distinct policies. Practically, this makes identifying a unique optimal policy challenging; theoretically, we would not expect all participants to follow a single unique policy when there is a wide plateau of high-performing policies. To address this, we instead identify a set of near-optimal policies and assume that human behavior will conform to the aggregate behavior of this set.
To identify this set of near-optimal policies, we apply a method based on Upper Confidence Bound (UCB) bandit algorithms [89]. We begin by sampling 8000 weight vectors to roughly uniformly tile the space of possible weights. Concretely, we divide a three-dimensional hypercube into 8000 = 20³ equal-size boxes and sample a point uniformly from each box. The first two dimensions are bounded in (0, 1) and are used to produce w1:3 using the following trick: let x1 and x2 be the lower and higher, respectively, of the two sampled values. We then define w1:3 = [x1, x2 − x1, 1 − x2]. Because x1 and x2 are sampled uniformly from (0, 1), this produces w1:3 uniformly distributed on the 3-simplex. The third dimension produces the future-cost weight: we set w4 = x3 ⋅ maxcost, where maxcost is the lowest cost for which no computation has positive VOĈ in the initial belief state. We then simulate 100 decision trials for each of the resulting policies, providing a baseline level of performance. Using these simulations, we compute an upper confidence bound on each policy's performance from μ̂_i and σ̂_i, the empirical mean and standard deviation of the metalevel returns sampled for policy i. A standard UCB algorithm would then simulate from the policy maximizing this value. However, because we are interested in identifying a set of policies, we instead select the top 80 (i.e., the top 1% of) policies and simulate 10 additional trials for each, updating μ̂_i and σ̂_i for each one. We iterate this step 5000 times. Finally, we select the 80 policies with the highest expected performance as our characterization of optimal behavior in the metalevel MDP. To eliminate the possibility of fitting noise in the optimization procedure, we use one set of policies to compute the likelihood on the training data and re-optimize a new set of policies to generate plots and compute the likelihood of the test data. Note that we use the box sampling method described above rather than a deterministic low-discrepancy sampling strategy [90] so that the sets of policies considered in the fitting and evaluation stages are not exactly the same.
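A minimal sketch of the weight-sampling step, and of the simplex trick in particular (Python; the UCB refinement loop is omitted because the exact form of the confidence bound is not reproduced here, and max_cost is a stand-in for the maxcost quantity defined above):

    import itertools
    import numpy as np

    def sample_weight_vectors(n=20, max_cost=1.0, seed=0):
        """Divide the unit cube into n^3 boxes, draw one point per box, and map
        each point (x1, x2, x3) to a weight vector (w1, w2, w3, w4)."""
        rng = np.random.default_rng(seed)
        weights = []
        for box in itertools.product(range(n), repeat=3):
            x = (np.array(box) + rng.random(3)) / n   # uniform within this box
            x1, x2 = sorted(x[:2])                    # lower and higher of the first two
            w123 = [x1, x2 - x1, 1 - x2]              # uniform on the 3-simplex
            weights.append(w123 + [x[2] * max_cost])  # w4: future-cost weight
        return np.array(weights)                      # shape (n^3, 4)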
How good is the approximation method? Previous work found that this approach generates near-optimal policies on a related problem, with Bernoulli-distributed samples and no switching costs [53]. Note that in the case of Bernoulli samples, the belief space is discrete and thus the optimal policy can be computed exactly if an upper bound is placed on the number of computations that can be performed before making a decision. Although introducing switching costs makes the metareasoning problem more challenging to solve, in the Bernoulli case we have found that they only induce a modest reduction in the performance of the approximation method relative to the full optimal policy, achieving 92% of optimal reward in the worst case (see S1 Appendix for details). This suggests that this method is likely to provide a reasonable approximation to the optimal policy in the model with Gaussian samples used here, but a full verification of this fact is beyond the scope of the current study.
Implementation of the prior
In the main text, we specified the prior as a property of the initial belief state. However, for technical reasons (in particular, to reuse the same set of optimized policies for multiple values of α), it is preferable to perform policy optimization and simulation in a standardized space, in which the initial belief state has μ0 = 0 and λ0 = 1. We then capture the prior over the ratings of items in the experiment by transforming the ratings into this standardized space such that the transformed values are in units defined by the prior. Concretely, given an item rating r(i), we set the true value to
u(i) = (r(i) − μ_prior) / σ_prior,   (8)

where μ_prior and σ_prior denote the prior mean and standard deviation. Modulo the resultant change in units (all parameter values are divided by σ_prior), this produces exactly the same behavior as the naïve implementation, in which the initial belief itself varies.
There is one non-trivial consequence of using this approach when jointly fitting multiple datasets: the jointly fit parameters are estimated in the standardized space, rather than the space defined by the raw rating scale. As a result, if we transform the parameters back into the raw rating space, the parameters will be slightly different for the two datasets (even though they are identical in the transformed space). This was intentional, because we expect the parameters to be consistent in context-independent units (i.e., standard deviations of an internal utility scale). However, this decision turns out to have negligible impact in our case because the empirical rating distributions are very similar: (mean ± std) 3.492 ± 2.631 for the binary dataset and 4.295 ± 2.524 for the trinary dataset. Due to the difference in standard deviations, all parameters (except α, which is not affected) are 2.631/2.524 = 1.042 times larger in the raw rating space for the binary dataset than for the trinary dataset. The difference in empirical means affects μ_prior, which is 3.492/4.295 = 0.813 times as large in the binary as in the trinary dataset. However, given our interpretation of α as a degree of updating towards the empirical mean, this difference is as intended.
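A sketch of the transformation (Python). The exact construction of μ_prior and σ_prior from the empirical rating distribution is our reading of the text (μ_prior = α times the mean rating, σ_prior = the ratings' standard deviation), so treat those two lines as assumptions:

    import numpy as np

    def standardized_values(ratings, alpha):
        """Map raw liking ratings into the standardized space of Eq 8."""
        ratings = np.asarray(ratings, dtype=float)
        mu_prior = alpha * ratings.mean()   # assumed: prior mean updates toward the empirical mean
        sigma_prior = ratings.std()         # assumed: prior std matches the empirical std
        return (ratings - mu_prior) / sigma_prior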
Model simulation procedure
Given a metalevel MDP and policy, π, simulating a choice trial amounts to running a single episode of the policy on the metalevel MDP. To run an episode, we first initialize the belief state, b0 = (μ0 = 0, λ0 = 1, last0 = ⊘). Note that last0 = ⊘ indicates that no item is fixated at the onset of a trial.
The agent then selects an initial computation c0 ∼ π(b0) and the belief is updated according to the transition dynamics (Eq 6). Note that π(c ∣ b0) assigns equal sampling probability to all of the items, since the subject starts with symmetrical beliefs. This process repeats until some time step, T, when the agent selects the termination action, ⊥. The predicted choice is the item with maximal posterior mean value, argmax_i μ_T^(i). In the event of a tie, the choice is sampled uniformly from the set of items with maximal expected value in the final belief state; in practice, this never happens with well-fitting parameter values.
To translate the sequence of computations into a fixation sequence, we assume that each sample takes 100 ms and concatenate multiple contiguous samples from the same item into one fixation. The temporal duration of a sample is arbitrary; a lower value would yield finer temporal predictions but longer runtimes when simulating the model. In this way, it is very similar to the dt parameter used in simulating diffusion decision models. Importantly, the qualitative predictions of the model are insensitive to this parameter, because σx and γsample can be adjusted to yield the same amount of information and cost per ms.
We simulate the model for two different purposes: (1) identifying the optimal policy and (2) comparing model predictions to human behavior. In the former case, we randomly sample the true utilities on each “trial” i.i.d. from Gaussian(0, 1). This corresponds to the assumption that the fixation policy is optimized for an environment in which the DM’s prior is accurate. When simulating a specific trial for comparison to human behavior, the true value of each item is instead determined by the liking ratings for the items presented on that trial, as specified in Eq 8.
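To make the simulation loop concrete, here is a schematic episode (Python; policy, update_belief, and the belief representation are placeholders for the components defined above, and the 100 ms sample duration follows the text):

    import numpy as np

    def simulate_trial(b0, policy, update_belief, sample_ms=100):
        """Run one metalevel episode; return the choice and the fixation sequence.
        `policy(b)` returns an item index or None for ⊥; `update_belief(b, c)`
        returns the belief after one noisy value sample from item c."""
        b, samples = b0, []
        while (c := policy(b)) is not None:
            samples.append(c)
            b = update_belief(b, c)
        choice = int(np.argmax(b.mu))             # item with maximal posterior mean
        fixations = []                            # list of (item, duration in ms)
        for c in samples:                         # merge contiguous same-item samples
            if fixations and fixations[-1][0] == c:
                fixations[-1] = (c, fixations[-1][1] + sample_ms)
            else:
                fixations.append((c, sample_ms))
        return choice, fixations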
Model parameter estimation
The model has five free parameters: the standard deviation of the sampling distribution, σx, the cost per sample, γsample, the cost of switching attention, γswitch, the degree of prior updating, α, and the inverse temperature of the Boltzmann policy, β. We estimate a single set of parameters at the group level using approximate maximum likelihood estimation in the combined two- and three-item datasets, using only the even trials.
To briefly summarize the estimation procedure: given a candidate set of parameter values, we construct the corresponding metalevel MDP and identify a set of 80 near-optimal policies for that MDP. We then approximate the likelihood of the human fixation and choice data using simulations from the optimized policies. Finally, we perform this full procedure for 70,000 quasi-randomly sampled parameter configurations and report the top thirty configurations (those with the highest likelihood) to give a rough sense of the uncertainty in the model predictions. A parameter recovery exercise (reported in S1 Appendix) suggests that this method, though approximate, is sufficient to identify the parameters of the model with fairly high accuracy. Below, we explain in detail how we estimate and then maximize the approximate likelihood.
The primary challenge in fitting the model is estimating the likelihood function. In principle, we could seek to maximize the joint likelihood of the observed fixation sequences and choices. However, like most sequential sampling models, our model does not have an analytic likelihood function. Additionally, the high dimensionality of the fixation data makes standard methods for approximating the likelihood [91, 92] infeasible. Thus, taking inspiration from Approximate Bayesian Computation methods [93, 94], we approximate the likelihood by collapsing the high-dimensional fixation data into four summary statistics: the identity of the chosen item, the number of fixations, the total fixation time, and the proportion of fixation time on each item. As described below, we estimate the joint likelihood of these summary statistics as a smoothed histogram of the statistics in simulated trials, and then approximate the likelihood of a trial by the likelihood of its summary statistics. We emphasize, however, that we do not use this approximate likelihood to evaluate the performance of the model. Instead, we intend it to be a maximally principled (and minimally researcher-specified) approach to choosing model parameters, given that computing a true likelihood is computationally infeasible.
Given a set of near-optimal policies, we estimate the likelihood of the summary statistics for each trial using a smoothed histogram of the summary statistics in simulated trials. Critically, this likelihood is conditional on the ratings of the items in that trial. However, it depends only on the (unordered) set of these ratings; thus, we estimate the conditional likelihood once for each such set. Given a set of ratings, we simulate the model 625 times for each of the 80 policies, using the resulting 50,000 simulations to construct a histogram of the trial summary statistics. The continuous statistics (total and proportion fixation times) are binned into quintiles (i.e., five bins containing equal amounts of data) defined by the distribution in the experimental data. For the fixation proportions, the quintiles are defined on the rating rank of the item rather than its spatial location, because we expect the distributions to depend on relative rating in the three-item case. Values outside the experimental range are placed into the corresponding tail bin. Similarly, trials with five or more fixations are all grouped into one bin (including, e.g., six and seven fixations), and cases in which the model predicts zero fixations are grouped into the one-fixation bin. This latter case corresponds to choosing an item immediately without ever sampling; it occurs rarely in well-fitting instantiations of the model, but happens frequently when γsample is set too high. For each simulation, we compute the binned summary statistics, identify the corresponding cell in the histogram, and increase its count by one. Finally, we normalize this histogram, resulting in a likelihood over the summary statistics. To compute the likelihood of a trial, we compute its binned summary statistics and look up the corresponding value in the normalized histogram for that trial's rating set.
To account for trials that are not well explained by our model, we use add-n smoothing, where n was chosen independently for each θ to maximize the likelihood. This is equivalent to assuming a mixture between the empirical (simulated) distribution and a uniform distribution, with mixing weight ϵ. Thus, the full approximate likelihood is

p̂(trial ∣ θ) = (1 − ϵ) p̂_hist(trial ∣ θ) + ϵ / C,
where p̂_hist denotes the normalized-histogram likelihood described above and C = N ⋅ 5^(N+1) is the total number of cells in the histogram. Importantly, this error model is only used to approximate the likelihood; it is not used for generating the model predictions in the figures. Indeed, it could not be used in this way, because the error model is defined over the summary statistics and cannot generate full sequences of fixations. Thus, the ϵ parameter should be interpreted in roughly the same way as the bandwidth parameter of a kernel density estimate [91], rather than as an additional free parameter of the model.
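The sketch below (Python) illustrates the histogram construction and the ϵ-smoothed lookup; the summarize function, which maps a (simulated or observed) trial to its tuple of bin indices, is a placeholder:

    import numpy as np

    def build_histogram(simulations, summarize, shape):
        """Normalized histogram over binned summary statistics; `summarize`
        returns a tuple of bin indices (choice, n-fixation bin, time bin, ...)."""
        counts = np.zeros(shape)
        for sim in simulations:
            counts[summarize(sim)] += 1
        return counts / counts.sum()

    def trial_likelihood(trial, hist, summarize, eps):
        """Approximate likelihood with epsilon-smoothing (mixture with a uniform)."""
        return (1 - eps) * hist[summarize(trial)] + eps / hist.size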
We then use this approximate likelihood function to identify a maximum likelihood estimate, θ̂. Based on manual inspection, we identified the promising region of parameter space to be σx ∈ (1, 5), γsample ∈ (0.001, 0.01), γswitch ∈ (0.003, 0.03), and β ∈ (100, 500). We then ran an additional quasi-random search of 10,000 points within this space using Sobol low-discrepancy sequences [90]. This approach has been shown to be more effective than both grid search and random search, while still allowing for massive parallelization [95].
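For reference, this kind of quasi-random search can be set up with SciPy's quasi-Monte Carlo module (a sketch under the bounds quoted above; whether the cost parameters were searched on a log scale is not stated, so all four dimensions are scaled linearly here):

    from scipy.stats import qmc

    sampler = qmc.Sobol(d=4, scramble=True, seed=0)
    unit = sampler.random(n=10_000)           # quasi-random points in the unit hypercube
    lower = [1.0, 0.001, 0.003, 100.0]        # sigma_x, gamma_sample, gamma_switch, beta
    upper = [5.0, 0.010, 0.030, 500.0]
    configs = qmc.scale(unit, lower, upper)   # shape (10000, 4) parameter configurations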
Note that the optimal policy does not depend on α, because the DM believes her prior to be unbiased (by definition) and makes her fixation decisions accordingly. The alternative, optimizing the policy conditional on α, would imply that the DM is internally inconsistent, accounting for the bias in her fixations but not in the prior itself. Thus, we optimize α separately from the other parameters. Specifically, we consider 10,000 possible instantiations of all the other parameters, find optimal policies once for each instantiation, and evaluate the likelihood for seven values of α; these seven values included the special cases of 0 and 1, as well as five additional evenly spaced values with a random offset (roughly capturing the low-discrepancy property of the Sobol sequence).
We found that the stochasticity in the policy optimization and likelihood estimation, coupled with weak identifiability for some parameters, resulted in slightly different results when re-running the full procedure. Thus, to give a rough sense of the uncertainty in the estimate, we report the top thirty parameter configurations, yielding a mean and standard deviation for each parameter and for the total likelihood.
Supporting information
Acknowledgments
We thank Ian Krajbich for his help in simulating the aDDM and Bas van Opheusden for suggesting the method for efficiently computing VOIfull.
Data Availability
The data and analysis code can be found on Github: https://github.com/fredcallaway/optimal-fixations-simple-choice.
Funding Statement
This research was supported by a grant from Facebook Reality Labs awarded to TG (https://research.fb.com/category/augmented-reality-virtual-reality/) and a grant from the NOMIS Foundation (https://nomisfoundation.ch/) awarded to AR. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Ratcliff R, McKoon G. The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation. 2008;20(4):873–922. doi:10.1162/neco.2008.12-06-420
- 2. Ratcliff R, Smith PL, Brown SD, McKoon G. Diffusion Decision Model: Current Issues and History. Trends in Cognitive Sciences. 2016;20(4):260–281. doi:10.1016/j.tics.2016.01.007
- 3. Milosavljevic M, Malmaud J, Huth A, Koch C, Rangel A. The Drift Diffusion Model can account for the accuracy and reaction time of value-based choices under high and low time pressure. Judgment and Decision Making. 2010;5(6):437–449.
- 4. Usher M, McClelland JL. The Time Course of Perceptual Choice: The Leaky, Competing Accumulator Model. Psychological Review. 2001;108(3):550–592. doi:10.1037/0033-295X.108.3.550
- 5. Usher M, McClelland JL. Loss Aversion and Inhibition in Dynamical Models of Multialternative Choice. Psychological Review. 2004;111(3):757–769. doi:10.1037/0033-295X.111.3.757
- 6. Shimojo S, Simion C, Shimojo E, Scheier C. Gaze bias both reflects and influences preference. Nature Neuroscience. 2003;6(12):1317–1322. doi:10.1038/nn1150
- 7. Armel KC, Beaumel A, Rangel A. Biasing Simple Choices by Manipulating Relative Visual Attention. Judgment and Decision Making. 2008;3(5):396–403.
- 8. Glaholt MG, Reingold EM. Stimulus exposure and gaze bias: A further test of the gaze cascade model. Attention, Perception & Psychophysics. 2009;71(3):445–450. doi:10.3758/APP.71.3.445
- 9. Krajbich I, Armel C, Rangel A. Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience. 2010;13(10):1292–1298. doi:10.1038/nn.2635
- 10. Krajbich I, Rangel A. Multialternative drift-diffusion model predicts the relationship between visual fixations and choice in value-based decisions. Proceedings of the National Academy of Sciences. 2011;108(33):13852–13857. doi:10.1073/pnas.1101328108
- 11. Cavanagh JF, Wiecki TV, Kochar A, Frank MJ. Eye tracking and pupillometry are indicators of dissociable latent decision processes. Journal of Experimental Psychology: General. 2014;143(4):1476–1488. doi:10.1037/a0035813
- 12. Tavares G, Perona P, Rangel A. The Attentional Drift Diffusion Model of Simple Perceptual Decision-Making. Frontiers in Neuroscience. 2017;11.
- 13. Smith SM, Krajbich I. Gaze Amplifies Value in Decision Making. Psychological Science. 2019;30(1):116–128. doi:10.1177/0956797618810521
- 14. Armel KC, Rangel A. Neuroeconomic models of economic decision making: The impact of computation time and experience on decision values. American Economic Review. 2008;98(2):163–168. doi:10.1257/aer.98.2.163
- 15. Orquin JL, Mueller Loose S. Attention and Choice: A Review on Eye Movements in Decision Making. Acta Psychologica. 2013;144(1):190–206. doi:10.1016/j.actpsy.2013.06.003
- 16. Krajbich I. Accounting for Attention in Sequential Sampling Models of Decision Making. Current Opinion in Psychology. 2018;29:6–11.
- 17. Gluth S, Spektor MS, Rieskamp J. Value-based attentional capture affects multi-alternative decision making. eLife. 2018;7:e39659. doi:10.7554/eLife.39659
- 18. Towal RB, Mormann M, Koch C. Simultaneous modeling of visual saliency and value computation improves predictions of economic choice. Proceedings of the National Academy of Sciences. 2013;110(40):E3858–E3867. doi:10.1073/pnas.1304429110
- 19. Thomas AW, Molter F, Krajbich I, Heekeren HR, Mohr PNC. Gaze Bias Differences Capture Individual Choice Behaviour. Nature Human Behaviour. 2019;3(6):625–635. doi:10.1038/s41562-019-0584-8
- 20. Gluth S, Kern N, Kortmann M, Vitali CL. Value-Based Attention but Not Divisive Normalization Influences Decisions with Multiple Alternatives. Nature Human Behaviour. 2020;4(6):634–645. doi:10.1038/s41562-020-0822-0
- 21. Song M, Wang X, Zhang H, Li J. Proactive information sampling in value-based decision-making: Deciding when and where to saccade. Frontiers in Human Neuroscience. 2019;13:1–10.
- 22. Sepulveda P, Usher M, Davies N, Benson AA, Ortoleva P, De Martino B. Visual Attention Modulates the Integration of Goal-Relevant Evidence and Not Value. eLife. 2020;9:e60705. doi:10.7554/eLife.60705
- 23. Moreno-Bote R, Ramírez-Ruiz J, Drugowitsch J, Hayden BY. Heuristics and Optimal Solutions to the Breadth–Depth Dilemma. Proceedings of the National Academy of Sciences. 2020;117(33):19799–19808.
- 24. Ramírez-Ruiz J, Moreno-Bote R. Optimal Allocation of Finite Sampling Capacity in Accumulator Models of Multi-Alternative Decision Making. arXiv:2102.01597 [q-bio]. 2021.
- 25. Gottlieb J, Oudeyer PY. Towards a neuroscience of active sampling and curiosity. Nature Reviews Neuroscience. 2018;19(12):758–770. doi:10.1038/s41583-018-0078-0
- 26. Najemnik J, Geisler WS. Optimal eye movement strategies in visual search. Nature. 2005;434(7031):387–391. doi:10.1038/nature03390
- 27. Eckstein MP. Visual search: A retrospective. Journal of Vision. 2011;11(5):1–36.
- 28. Cassey TC, Evens DR, Bogacz R, Marshall JAR, Ludwig CJH. Adaptive Sampling of Information in Perceptual Decision-Making. PLOS ONE. 2013;8(11):e78993. doi:10.1371/journal.pone.0078993
- 29. Ludwig CJ, Evens DR. Information foraging for perceptual decisions. Journal of Experimental Psychology: Human Perception and Performance. 2017;43(2):245–264.
- 30. Itti L, Baldi P. Bayesian surprise attracts human attention. Vision Research. 2009;49(10):1295–1306. doi:10.1016/j.visres.2008.09.007
- 31. Gottlieb J, Oudeyer PY, Lopes M, Baranes A. Information-Seeking, Curiosity, and Attention: Computational and Neural Mechanisms. Trends in Cognitive Sciences. 2013;17(11):585–593. doi:10.1016/j.tics.2013.09.001
- 32. Savage LJ. The Foundations of Statistics. Oxford, England: John Wiley & Sons; 1954.
- 33. Von Neumann J, Morgenstern O. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press; 1944.
- 34. Lewis RL, Howes A, Singh S. Computational Rationality: Linking Mechanism and Behavior through Bounded Utility Maximization. Topics in Cognitive Science. 2014;6(2):279–311. doi:10.1111/tops.12086
- 35. Griffiths TL, Lieder F, Goodman ND. Rational Use of Cognitive Resources: Levels of Analysis between the Computational and the Algorithmic. Topics in Cognitive Science. 2015;7(2):217–229. doi:10.1111/tops.12142
- 36. Lieder F, Griffiths TL. Resource-Rational Analysis: Understanding Human Cognition as the Optimal Use of Limited Computational Resources. Behavioral and Brain Sciences. 2019.
- 37. Gershman SJ, Horvitz EJ, Tenenbaum JB. Computational Rationality: A Converging Paradigm for Intelligence in Brains, Minds, and Machines. Science. 2015;349(6245). doi:10.1126/science.aac6076
- 38. Sims CA. Stickiness. Carnegie-Rochester Conference Series on Public Policy. 1998;49:317–356. doi:10.1016/S0167-2231(99)00013-5
- 39. Caplin A, Dean M. Behavioral Implications of Rational Inattention with Shannon Entropy. NBER Working Paper. 2013:1–40.
- 40. Bogacz R, Brown E, Moehlis J, Holmes P, Cohen JD. The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review. 2006;113(4):700–765. doi:10.1037/0033-295X.113.4.700
- 41. Moreno-Bote R. Decision Confidence and Uncertainty in Diffusion Models with Partially Correlated Neuronal Integrators. Neural Computation. 2010;22(7):1786–1811. doi:10.1162/neco.2010.12-08-930
- 42. Drugowitsch J, Moreno-Bote R, Churchland AK, Shadlen MN, Pouget A. The Cost of Accumulating Evidence in Perceptual Decision Making. Journal of Neuroscience. 2012;32(11):3612–3628. doi:10.1523/JNEUROSCI.4010-11.2012
- 43. Bitzer S, Park H, Blankenburg F, Kiebel SJ. Perceptual decision making: drift-diffusion model is equivalent to a Bayesian model. Frontiers in Human Neuroscience. 2014;8:1–17.
- 44. Tajima S, Drugowitsch J, Pouget A. Optimal policy for value-based decision-making. Nature Communications. 2016;7:1–12.
- 45. Tajima S, Drugowitsch J, Patel N, Pouget A. Optimal policy for multi-alternative decisions. Nature Neuroscience. 2019;22(9):1503–1511. doi:10.1038/s41593-019-0453-9
- 46. Fudenberg D, Strack P, Strzalecki T. Speed, accuracy, and the optimal timing of choices. American Economic Review. 2018;108(12):3651–3684. doi:10.1257/aer.20150742
- 47. Biderman N, Bakkour A, Shohamy D. What Are Memories For? The Hippocampus Bridges Past Experience with Future Decisions. Trends in Cognitive Sciences. 2020;24(7):542–556. doi:10.1016/j.tics.2020.04.004
- 48. Bakkour A, Palombo DJ, Zylberberg A, Kang YHR, Reid A, Verfaellie M, et al. The Hippocampus Supports Deliberation during Value-Based Decisions. eLife. 2019;8:e46080. doi:10.7554/eLife.46080
- 49. Wang S, Feng SF, Bornstein A. Mixing memory and desire: How memory reactivation supports deliberative decision-making. 2020.
- 50. Matheson JE. The Economic Value of Analysis and Computation. IEEE Transactions on Systems Science and Cybernetics. 1968;4(3):325–332. doi:10.1109/TSSC.1968.300126
- 51. Russell S, Wefald E. Principles of metareasoning. Artificial Intelligence. 1991;49(1-3):361–395. doi:10.1016/0004-3702(91)90015-C
- 52. Hay N, Russell S, Tolpin D, Shimony SE. Selecting Computations: Theory and Applications. In: Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence. UAI'12. Arlington, VA: AUAI Press; 2012. p. 346–355.
- 53. Callaway F, Gul S, Krueger P, Griffiths TL, Lieder F. Learning to select computations. In: Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference; 2018.
- 54. Gold JI, Shadlen MN. Banburismus and the Brain: Decoding the Relationship between Sensory Stimuli, Decisions, and Reward. Neuron. 2002;36(2):299–308. doi:10.1016/S0896-6273(02)00971-6
- 55. McFadden D. Economic choices. American Economic Review. 2001;91(3):351–378. doi:10.1257/aer.91.3.351
- 56. Frömer R, Dean Wolf CK, Shenhav A. Goal Congruency Dominates Reward Value in Accounting for Behavioral and Neural Correlates of Value-Based Decision-Making. Nature Communications. 2019;10(1):1–11.
- 57. Hunt LT, Kolling N, Soltani A, Woolrich MW, Rushworth MFS, Behrens TEJ. Mechanisms Underlying Cortical Activity during Value-Guided Choice. Nature Neuroscience. 2012;15(3):470–476. doi:10.1038/nn.3017
- 58. Polanía R, Krajbich I, Grueschow M, Ruff CC. Neural Oscillations and Synchronization Differentially Support Evidence Accumulation in Perceptual and Value-Based Decision Making. Neuron. 2014;82(3):709–720. doi:10.1016/j.neuron.2014.03.014
- 59. Pirrone A, Azab H, Hayden BY, Stafford T, Marshall JAR. Evidence for the Speed–Value Trade-off: Human and Monkey Decision Making Is Magnitude Sensitive. Decision. 2018;5(2):129–142. doi:10.1037/dec0000075
- 60. Anderson BA. The attention habit: How reward learning shapes attentional selection. Annals of the New York Academy of Sciences. 2016;1369(1):24–39. doi:10.1111/nyas.12957
- 61. Ratcliff R. A theory of memory retrieval. Psychological Review. 1978;85(2):59–108. doi:10.1037/0033-295X.85.2.59
- 62. Teodorescu AR, Usher M. Disentangling Decision Models: From Independence to Competition. Psychological Review. 2013;120(1):1–38. doi:10.1037/a0030776
- 63. Busemeyer JR, Townsend JT. Decision Field Theory: A Dynamic-Cognitive Approach to Decision Making in an Uncertain Environment. Psychological Review. 1993;100(3):432–459. doi:10.1037/0033-295X.100.3.432
- 64. Holmes WR, Trueblood JS, Heathcote A. A new framework for modeling decisions about changing information: The Piecewise Linear Ballistic Accumulator model. Cognitive Psychology. 2016;85:1–29. doi:10.1016/j.cogpsych.2015.11.002
- 65. Smith SM, Krajbich I. Attention and Choice across Domains. Journal of Experimental Psychology: General. 2018;147(12):1810–1826. doi:10.1037/xge0000482
- 66. Stojić H, Orquin JL, Dayan P, Dolan RJ, Speekenbrink M. Uncertainty in learning, choice, and visual fixation. Proceedings of the National Academy of Sciences. 2020;117(6):3291–3300. doi:10.1073/pnas.1911348117
- 67. Just MA, Carpenter PA. Eye Fixations and Cognitive Processes. Cognitive Psychology. 1976;8(4):441–480. doi:10.1016/0010-0285(76)90015-3
- 68. Jang A, Sharma R, Drugowitsch J. Optimal policy for attention-modulated decisions explains human fixation behavior. bioRxiv. 2020.
- 69. Hunt LT, Rutledge RB, Malalasekera WMN, Kennerley SW, Dolan RJ. Approach-induced biases in human information sampling. PLOS Biology. 2016;14(11):e2000638. doi:10.1371/journal.pbio.2000638
- 70. Hébert B, Woodford M. Rational Inattention with Sequential Information Sampling. Working Paper. 2017. p. 1–141.
- 71. Hébert B, Woodford M. Rational Inattention When Decisions Take Time. Working Paper. 2019.
- 72. Itti L, Koch C. A Saliency-Based Search Mechanism for Overt and Covert Shifts of Visual Attention. Vision Research. 2000;40(10):1489–1506.
- 73. Roe RM, Busemeyer JR, Townsend JT. Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review. 2001;108(2):370–392. doi:10.1037/0033-295X.108.2.370
- 74. Noguchi T, Stewart N. Multialternative decision by sampling: A model of decision making constrained by process data. Psychological Review. 2018;125(4):512–544. doi:10.1037/rev0000102
- 75. Russo JE, Dosher BA. Strategies for Multiattribute Binary Choice. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1983;9(4):676–696. doi:10.1037/0278-7393.9.4.676
- 76. Trueblood JS, Brown SD, Heathcote A. The Multiattribute Linear Ballistic Accumulator Model of Context Effects in Multialternative Choice. Psychological Review. 2014;121(2):179–205. doi:10.1037/a0036137
- 77. Berkowitsch NAJ, Scheibehenne B, Rieskamp J. Rigorously testing multialternative decision field theory against random utility models. Journal of Experimental Psychology: General. 2014;143(3):1331–1348. doi:10.1037/a0035159
- 78. Fisher G. An attentional drift diffusion model over binary-attribute choice. Cognition. 2017;168:34–45. doi:10.1016/j.cognition.2017.06.007
- 79. Krajbich I, Lu D, Camerer C, Rangel A. The Attentional Drift-Diffusion Model Extends to Simple Purchasing Decisions. Frontiers in Psychology. 2012;3.
- 80. Westbrook A, van den Bosch R, Määttä JI, Hofmans L, Papadopetraki D, Cools R, et al. Dopamine promotes cognitive effort by biasing the benefits versus costs of cognitive work. Science. 2020;367(6484):1362–1366. doi:10.1126/science.aaz5891
- 81. Shi SW, Wedel M, Pieters F. Information acquisition during online decision making: A model-based exploration using eye-tracking data. Management Science. 2013;59(5):1009–1026. doi:10.1287/mnsc.1120.1625
- 82. Manohar SG, Husain M. Attention as Foraging for Information and Value. Frontiers in Human Neuroscience. 2013;7:1–16.
- 83. Gabaix X, Laibson D, Moloche G, Weinberg S. Costly Information Acquisition: Experimental Analysis of a Boundedly Rational Model. American Economic Review. 2006;96(4):1043–1068. doi:10.1257/000282806779468544
- 84. Yang L, Toubia O, De Jong MG. A bounded rationality model of information search and choice in preference measurement. Journal of Marketing Research. 2015;52(2):166–183. doi:10.1509/jmr.13.0288
- 85. Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: A fresh approach to numerical computing. SIAM Review. 2017;59(1):65–98. doi:10.1137/141000671
- 86. Sutton RS, Barto AG. Reinforcement learning: An introduction. MIT Press; 2018.
- 87. Callaway F, Lieder F, Das P, Gul S, Krueger PM, Griffiths TL. A resource-rational analysis of human planning. In: Proceedings of the 40th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2018.
- 88. Howard RA. Information value theory. IEEE Transactions on Systems Science and Cybernetics. 1966;2(1):22–26. doi:10.1109/TSSC.1966.300074
- 89. Auer P, Cesa-Bianchi N, Fischer P. Finite-time analysis of the multiarmed bandit problem. Machine Learning. 2002;47(2-3):235–256.
- 90. Sobol' IM. On the distribution of points in a cube and the approximate evaluation of integrals. Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki. 1967;7(4):784–802.
- 91. Turner BM, Sederberg PB. A Generalized, Likelihood-Free Method for Posterior Estimation. Psychonomic Bulletin & Review. 2014;21(2):227–250. doi:10.3758/s13423-013-0530-0
- 92. van Opheusden B, Acerbi L, Ma WJ. Unbiased and Efficient Log-Likelihood Estimation with Inverse Binomial Sampling. arXiv:2001.03985 [cs, q-bio, stat]. 2020.
- 93. Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C. Approximate Bayesian Computation. PLOS Computational Biology. 2013;9(1):e1002803. doi:10.1371/journal.pcbi.1002803
- 94. Csilléry K, Blum MGB, Gaggiotti OE, François O. Approximate Bayesian Computation (ABC) in Practice. Trends in Ecology & Evolution. 2010;25(7):410–418. doi:10.1016/j.tree.2010.04.001
- 95. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. Journal of Machine Learning Research. 2012;13:281–305.