Published in final edited form as: Behav Neurosci. 2021 Apr;135(2):192–201. doi: 10.1037/bne0000448

The case against economic values in the orbitofrontal cortex (or anywhere else in the brain)

Benjamin Y. Hayden, Yael Niv

Abstract

Much of traditional neuroeconomics proceeds from the hypothesis that value is reified in the brain, that is, that there are neurons or brain regions whose responses serve the discrete purpose of encoding value. This hypothesis is supported by the finding that the activity of many neurons covaries with subjective value as estimated in specific tasks and has led to the idea that the primary function of the orbitofrontal cortex is to compute and signal economic value. Here we consider an alternative: that economic value, in the cardinal, common-currency sense, is not represented in the brain and used for choice by default. This idea is motivated by consideration of the economic concept of value, which places important epistemic constraints on our ability to identify its neural basis. It is also motivated by the behavioral economics literature, especially work on heuristics, which proposes value-free process models for much if not all of choice. Finally, it is buoyed by recent neural and behavioral findings regarding how animals and humans learn to choose between options. In light of our hypothesis, we critically reevaluate putative neural evidence for the representation of value and explore an alternative: direct learning of action policies. We delineate how this alternative can provide a robust account of behavior that concords with existing empirical data.

Introduction

The past twenty years have seen a great deal of interest in understanding how our brains implement economic choices (Glimcher and Fehr, 2013; Rushworth et al., 2011; Rangel et al., 2008; Padoa-Schioppa, 2011; Loewenstein et al., 2008; Camerer et al., 2005). Much research in this field of neuroeconomics rests on the assumption that choices between options rely on an explicit valuation process (Kable and Glimcher, 2009; Montague and Berns, 2002; Padoa-Schioppa, 2011; O’Doherty, 2014; Levy and Glimcher, 2012). That is, that the brain first assigns value to each option and then compares those values to determine choice. The concept of valuation – which stems from economic theory – is so ingrained that it may seem inevitable. How else would one literally compare apples to oranges? Indeed, the idea that value exists on a single cardinal scale, also called a “common currency,” has been extended to encompass not only goods, but also effort costs and time delays; everything an agent needs to bundle into evaluation of a mode of action that is aimed at procuring a specific outcome.

In parallel, the computational framework of reinforcement learning, which has been a cornerstone of neuroeconomics, also makes a scalar value signal central to its implementation (Sutton & Barto, 2018; Niv, 2009). Specifically, in reinforcement learning, the value of an option—a state or an action—is the expected sum of future rewards contingent upon that choice. As a sum of future rewards that may be of different types (and include costs as negative rewards), reinforcement-learning value is naturally calculated in some unitless common currency.
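
To make this concrete (a textbook rendering, following Sutton & Barto, 2018, rather than anything specific to the studies discussed here), the value of a state $s$ under a policy $\pi$ is the expected discounted sum of future rewards:

$$V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_{0}=s \right], \qquad 0 \le \gamma \le 1,$$

where the rewards $r_{t}$ may stem from qualitatively different outcomes; the summation is meaningful only if they are all expressed on one scale, which is where the common-currency assumption enters.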

However, not all reinforcement learning algorithms rely on or even calculate values (Sutton & Barto, 2018), and reinforcement-learning values, as sums of future rewards, are not synonymous with economic values of specific goods. Likewise, many empirically supported process models of choice get by with no valuation (Vlaev et al., 2011; Miller et al., 2019; Gigerenzer and Gaissmaier, 2011). The fact that the brain can compute values to compare apples and oranges does not mean that it routinely does so, or that valuation is the primary process underlying choice. In this opinion paper, we argue that in many choice scenarios the brain may not be computing values at all, despite appearing to do so. We will demonstrate that the rationale for a value signal is weak, as is the behavioral and neural evidence supporting value computation. Finally, we propose an alternative – direct learning of action policies – and suggest there is scant direct evidence for value learning that cannot be explained by this and other alternative theories. Our hypothesis is important as it suggests a different interpretation of previous data and requires that studies attempting to resolve mechanisms of value computation in the brain first establish that in the specific situation studied, valuation is indeed occurring. In particular, much work has implicated the orbitofrontal cortex in representing the expected economic value of different goods or options (Bartra et al., 2013; Levy & Glimcher, 2012) – a role that must be critically reconsidered if we agree that the brain may not necessarily be representing such values in many of the experiments so far used to test this hypothesis.

Why argue against value?

Intuitively, our thesis is that while we may know that we prefer an orange to an apple (for one thing, oranges don’t brown when exposed to oxygen), we may make this judgment without consulting an internal scalar (cardinal) or even a universal ordinal value signal. Consequently, we may be hard-pressed to express the precise value that we put on an orange. This is not merely an issue of conscious access to our internal value of oranges (and we note here that, in neuroeconomics, value-based choices are generally thought to be part of explicit, aware, goal-directed or model-based decision making that relies on frontal cortex areas, in particular, the orbitofrontal cortex), but may be due to the fact that deciding on preferences can rely on many alternative mechanisms that don’t require or rely on calculation of such a value.

Valuation is hard (Payne et al., 1992). It is also often unnecessary: when choosing between, say, an orange and a car, it is immediately clear that one is better than the other without calculating the precise value of either. If you are extremely thirsty, you might choose the orange, whereas if you need to go somewhere a car is the only relevant option. Arguably, many real-life choices are between options that are sufficiently different in the needs that they fulfill as to be more similar to this extreme example than they are to the choice between apples and oranges (Juechems & Summerfield, 2019). Valuation may only be necessary when choosing between two very similarly valued items. However, if the items are sufficiently similar in how they satisfy our needs, the brain may decide to choose randomly, according to some value-free heuristic, or according to past choices – and move on (Chater, 2018). Alternatively, the brain can try to calculate the exact values of the options to the necessary precision that arbitrates between them. Indeed, choosing between similar options often takes longer than choosing between very different options, even when making rather trivial choices that do not warrant the time and effort invested (Pirrone et al., 2018; Teodorescu et al., 2016). An example is the inordinate amount of time some of us may spend deciding what brand of tuna fish to buy, despite the fact that our time (as per our salary) is worth much more than the money we will save by correctly evaluating which brand provides the best “value for money”. This scenario also illustrates how inept we are at comparing values of different kinds – here, time and money – which should have been straightforward if the modus operandi of the brain was to evaluate everything in terms of some common currency (see below).

So perhaps the brain can, when needed, calculate values. However, we argue that this is not the main means by which the brain makes decisions, and perhaps not the natural mode of decision making.

What is value?

Before moving on, we would like to delineate precisely how we are defining value for the sake of our argument. The term “value” has multiple uses (the problems this multiplicity raises are carefully laid out by O’Doherty, 2014). We consider “value” a hypothesized scalar parameter that reflects the worth of a specific item or outcome¹. Because it is scalar, it is necessarily abstract. It can refer to any good, and makes use of a common currency code that is comparable across goods of different types (e.g., food, water, recreation time). Value, by this definition, is cardinal, not ordinal, meaning it can be defined for an option per se, rather than solely relative to other options. That is, it reifies the idea of “utils” – quantifiable units of value, often used in a jocular manner in Economics classes.

The idea of a common currency is key as it implies that all relative calculations have already happened – the “value” is in some denomination that objectively defines it as compared to other such values. One might argue that this is too narrow a definition, but we believe this is what neuroeconomists have in mind when talking about representation of common currency values in the brain. For example, we know of no neuroeconomic model that imagines a neural implementation of an ordinal value scale. This having been said, even an ordinal scale should not change based on the comparison set, and should satisfy transitivity – which many of the neural signals attributed to value do not, as we detail below.

Our definition of value, for the purpose of this paper, is different from other types of value that are discussed in reinforcement learning. Specifically, here we are discussing value as the reward worth of a single item/event, not the expected sum of all future rewards (R in reinforcement learning models, not V). One might argue that in order to compute such an expected sum V, the subjective worth of all individual rewards must be translated to a common currency that can be added. In this sense, reinforcement learning models do presuppose a common currency reward value. However, they don’t necessarily commit to economic properties of such values, such as transitivity and consistency (see also Juechems & Summerfield, 2019), and we discuss below a class of reinforcement learning algorithms that can make do with evaluations that are only relative to the current options, and not applicable in other situations.

One important feature of our definition of value is that in our view, although value is inferred from choice, it is not strictly identical to choice, nor necessarily implied by choice. For instance, if we observe a consistent preference for A over B, A is assumed to have a higher value than B. But choice of A over B isn’t sufficient to infer the existence of a self-consistent value function: a decision-maker may adhere to a heuristic policy that results in stable preference for [A>B], [B>C], and [C>A] (Lichtenstein and Slovic, 2006). Moreover, choice can be consistently altered without manipulating the reward value of an item (Schonberg & Katz, 2020), which suggests that value, as inferred from choice, is not untarnished by processes that are not economic in nature. Thus, value can be inferred from behavior given certain assumptions, but preferences do not always lead to a value function.

The common currency hypothesis

The notion that economic decision-makers make use of internal value functions to compare options is often dated to Daniel Bernoulli’s proposal in 1738 of a logarithmic utility curve to explain preferences in the St. Petersburg Paradox (Martin, 2011). Here, a decision-maker is offered a chance to play a game in which they win $2^{n-1}$ dollars, with $n$ being the first time, in a series of coin tosses, that a coin falls on “heads”. The paradox is that although the expected monetary value of this game (that is, the sum of possible wins multiplied by their probabilities, $\sum_{n=1}^{\infty} 2^{n-1} \cdot 0.5^{n} = \sum_{n=1}^{\infty} 0.5$) is infinite, people are not willing to pay even $10 to play this game (Hayden and Platt, 2009). The explanation that Bernoulli proposed proved foundational within microeconomic theory. He argued that decision-makers don’t base calculations on the nominal, objective cash value of the potential gain, but rather on its subjective value. If the subjective value grows more slowly than the objective value (the idea of “diminishing marginal utility”), an optimizing decision-maker will appear risk-averse, for instance, when evaluating the St. Petersburg gamble.
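
For concreteness, here is a minimal numerical sketch (ours, not taken from the cited papers) of the arithmetic behind the paradox and Bernoulli’s resolution: with linear utility the partial sums grow without bound, whereas with logarithmic utility the expected utility converges, here to a certainty equivalent of only $2.

```python
import math

def st_petersburg(u, n_terms=200):
    """Expected utility of the St. Petersburg gamble under utility function u.

    The gamble pays 2**(n-1) dollars when the first heads occurs on toss n,
    which happens with probability 0.5**n.
    """
    return sum(0.5**n * u(2**(n - 1)) for n in range(1, n_terms + 1))

# Expected monetary value: each term is 0.5**n * 2**(n-1) = 0.5, so the
# partial sums grow without bound as more terms are included.
print(st_petersburg(lambda x: x, n_terms=50))   # 25.0 -- grows linearly with n_terms

# Bernoulli's log utility: the series converges to 1.0 ...
eu = st_petersburg(math.log2)
print(eu, 2**eu)                                # ... a certainty equivalent of $2
```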

Utility, or subjective value, has been central to many if not most microeconomic models. These include the axiomatic approaches of Pareto, Von Neumann and Morgenstern, and Samuelson. It is also central to behavioral theories such as prospect theory and decision field theory (Kahneman and Tversky, 1979; Busemeyer and Townsend, 1993). Ironically, however, explaining choices in the St. Petersburg paradox using such a subjective value function requires utility for money that diminishes so rapidly that it does not generalize to choices in other contexts. Instead, heuristic accounts – accounts that don’t rely on ideas about value maximization – provide better quantitative matches for St. Petersburg choices (Hayden and Platt, 2009).

Despite its central importance in economics, economists are typically agnostic about whether the concept of value is just a convenient description, or whether it is instantiated in the brain. As Friedman argued, decision-makers behave as if we compute and compare values, but we cannot conclude that we actually do so (Friedman, 1953). In that work, he famously compared economic agents to a trained billiards player who makes excellent shots as if having a sophisticated grasp of Euclidean geometry, although in fact such a theoretical understanding is not necessary for good billiards skill.

Economists typically stop at the point of saying that economic models are ‘as if’ models, with some arguing that the question of the underlying reality of economic variables is outside the domain of economics (Gul and Pesendorfer, 2008; Harrison, 2008). In contrast, neuroeconomics has generally taken as a default assumption that these ‘as if’ theories are reified in the brain (Kable and Glimcher, 2009; Montague and Berns, 2002; Padoa-Schioppa, 2011; O’Doherty, 2014; Levy and Glimcher, 2012; Rich and Wallis, 2011).

Modern tools such as neuroimaging and single unit physiology provide the opportunity to assess the implementation of decision making directly. Indeed, neuroscientists have had little trouble identifying correlates of value in several brain areas (Levy and Glimcher, 2012; Wallis, 2007; Plassmann et al., 2007; Knutson et al., 2001), leading to the suggestion that the orbitofrontal cortex (OFC) is a nexus for representing the economic value of goods in the brain (see, for example, Rushworth et al., 2011; Wallis, 2007), alongside other brain areas that are important for computing and representing value such as the ventromedial prefrontal cortex (Bartra et al., 2013) and ventral striatum (Haber and Knutson, 2010). However, it is not clear that the signals identified in these studies actually represent value as proposed – on a common currency scale, for comparing options and making choices. Moreover, so-called “value signals” only correlate with value, so may be signaling other quantities, such as attention, action plans, vigor, or preference, which also correlate with value (O’Doherty, 2014; Wallis and Rich, 2011; Maunsell, 2004). Hence it is important to carefully consider alternative interpretations for these findings, as we do further below, after discussing some practical constraints and philosophical conundrums.

Alternatives to valuation in decision making

While common currency value provides a convenient and general mechanism for making choices, there are many possible alternatives (Vlaev et al., 2011; Gigerenzer and Gaissmaier, 2011; Lichtenstein and Slovic, 2006; Kahneman et al., 1982), some of which are quite general and robust. Consider for example the “priority heuristic” (Brandstätter et al., 2006). This de minimis heuristic approach proposes that decision-makers first identify a dimension along which options vary and then compare options along that dimension. If that results in a choice, they stop; otherwise, they move on to the next dimension. This heuristic can explain many phenomena, including Allais preferences (Allais, 1953), the reflection effect (Fishburn & Kochenberger, 1979), the certainty effect (Kahneman and Tversky, 1979), the fourfold pattern in risky choice (Tversky and Fox, 1995) that motivates prospect theory, and several intransitivities – all without ever requiring computing of value (Brandstätter et al., 2006). And the priority heuristic is one of a large number of heuristics that do a remarkable job at describing behavior (Gigerenzer and Gaissmaier, 2011). These heuristics are generally motivated by psychological observations, and thus are consistent with known data from the psychology – if not the neuroscience – of choice. In particular, they reflect the assumption that calculating value is difficult, that humans typically use shortcuts whenever possible, and that heuristics are a good shortcut (Payne et al., 1992; Gigerenzer and Gaissmaier, 2011; Lieder and Griffiths, 2020). Moreover, these heuristics are not limited to humans, but apply to other species, including monkeys (Santos and Rosati, 2015; Marsh et al., 2002; Heilbronner and Hayden, 2016; Shafir et al., 2002).
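
To illustrate the general structure such heuristics share, here is a schematic sketch (ours) of a lexicographic choice rule: dimensions are examined in a fixed order and the first dimension that discriminates by more than an aspiration threshold decides. The dimension order and threshold below are illustrative stand-ins, not the calibrated parameters of Brandstätter et al. (2006); note that no common-currency value is ever computed.

```python
def lexicographic_choice(option_a, option_b, dimensions, threshold=0.1):
    """Choose between two options one dimension at a time.

    `dimensions` is an ordered list of (name, desirability) pairs, where
    `desirability` maps an option to a number on that dimension (higher is
    better). The first dimension whose difference exceeds `threshold` decides.
    """
    for name, desirability in dimensions:
        diff = desirability(option_a) - desirability(option_b)
        if abs(diff) > threshold:
            return ("A" if diff > 0 else "B"), name  # one reason decides
    return "A", "no dimension discriminated"         # default / guess

# Illustrative two-outcome gambles: minimum gain, probability of the minimum, maximum gain.
gamble_a = {"min_gain": 0.0, "p_min": 0.20, "max_gain": 100.0}
gamble_b = {"min_gain": 10.0, "p_min": 0.50, "max_gain": 40.0}

examination_order = [
    ("minimum gain", lambda g: g["min_gain"] / 100.0),  # scaled by the largest gain
    ("probability of minimum", lambda g: -g["p_min"]),  # lower chance of the minimum is better
    ("maximum gain", lambda g: g["max_gain"] / 100.0),
]
print(lexicographic_choice(gamble_a, gamble_b, examination_order))
# -> ('A', 'probability of minimum'): the first dimension ties, the second decides.
```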

Importantly for our argument, the success of heuristic approaches demonstrates that value calculations are not a priori essential for neuroeconomic theories of choice. In particular, heuristic theories can solve many problems for which value is proposed to be needed (Vlaev et al., 2011; Stevens, 2016; Piantadosi and Hayden, 2015; Tversky, 1969; Lichtenstein and Slovic, 2006; Kahneman et al., 1982). Heuristics readily allow for comparison of multi-dimensional goods and bundles, and for choice across dissimilar goods. While heuristics are not perfect, and lead to many choice anomalies, choice is, empirically, full of anomalies. Moreover, several non-heuristic process models also eschew value computation steps, including decision by sampling, query theory, and fuzzy trace theory; these all account for a wide range of choice behavior as well (Stewart et al., 2006; Stewart and Simpson, 2008; Reyna, 2008; Weber et al., 2008).

Practical issues in the neuroscience of value representation

Despite the above, it is often considered axiomatic that an internal, neural, value scale must exist in the brain. The job of neuroscientists is then to find this signal – to pinpoint brain activity that correlates with value. In this section, we challenge this viewpoint by considering some basic issues that come up in the neuroscience of value.

The first problem is that it is impossible to precisely measure value. We can measure preferences between options, and use the data to infer option values, but elicited preferences are noisy measurements that also reflect factors other than value. For example, depending on how preferences are elicited, they may also reflect the tendency to press the same button repeatedly rather than change actions, the tendency to switch between options due to a prior belief about depleting resources, or the amount of attention or looking time for each of the options (Shimojo et al., 2003; Armel et al., 2008; Sugrue et al., 2005; Schonberg & Katz, 2020). Finally, value may shift from trial to trial, for example as a function of recent outcomes, meaning that methods that average across trials produce misleading value estimates (Sugrue et al., 2005). One could, in theory, incorporate these factors into the inferred value of an option. For example, there may be inherent value in choosing the same thing twice in a row. However, it becomes unclear what the definition of the “option” is that is being evaluated: if the value of an apple after eating (or choosing, or even just viewing) an orange is different from the value of an apple without that proximal experience, can we determine the value of goods at all? And if we can change the value (read: preference) for an option just by directing more attention to it (Schonberg et al., 2014; Salomon et al., 2018), or by inducing choice of it (Izuma et al., 2010; Voigt et al., 2017; Sharot et al., 2012), is preference really measuring the subjective economic worth of a good? The risk is circularity – if any choice behavior can be explained by supposing value that depends on local history, then the concept of value adds no additional explanatory power beyond that of recent events and choices.

Different ways to measure value raise a second challenge of inconsistent measures. Indeed, if we had a single neural value function we called on, values elicited by different measures would match. Unfortunately, they do not (Lichtenstein and Slovic, 2006). To explain this, in their seminal work, Sarah Lichtenstein and Paul Slovic proposed that preferences do not arise from any internal value function, but instead are constructed at the time of elicitation (see also Tversky and Shafir, 1992; Payne et al., 1992; Ariely et al., 2003). That is, in the view of these and like-minded scholars, value doesn’t sit in the brain waiting to be used; rather, preference is a complex and active process that takes place at the time the decision is made. Critically, in this theory we compute choices in a largely ad-hoc manner based on the available options, without an intermediary common-currency valuation stage. As such, there is no guarantee of consistency or reliability; any consistency or reliability observed may be explained as a result of strong attractor states in the way the system determines the choice, and deviations from consistency are evidence for the specific nature of the algorithm and its idiosyncrasies.

This having been said, experimenters can roughly estimate value, even if not measure it precisely. For example, in a given experiment they can examine choices and determine with confidence that a monkey (behaves as if it) places more value on a gamble as compared to a safe option with a matched expected value. One could then use this fact to try to identify a neural correlate of value. However, for this neural endeavor to be valid, it is important to identify all confounding variables and regress them out. O’Doherty’s (2014) review delineates the practical difficulty of doing so, using the overall rubric of “visceral, autonomic, and skeletomotor” activity. In practice, confounding variables include both stimulus and outcome identity, information about the state or structure of the world, the surprisingness, informativeness, and informational value of stimuli, details of the action associated with selecting or consuming the reward, including its likelihood and vigor, and the attention and arousal engendered by the stimulus (e.g. Botvinik-Nezer et al., 2020; Roesch et al., 2006; Wilson et al., 2014; Blanchard et al., 2015; O’Doherty, 2007; Niv et al., 2007; Yoo et al., 2018; Roesch and Olson, 2004). Indeed, in studies that separately assess encoding of outcome identity versus outcome value, activity in brain areas that are often considered to be emblematic of economic value (in particular, the OFC) turns out to correlate with outcome identity instead (Klein-Flügge et al., 2013).

A final insurmountable problem is that it may be impossible, even in theory, to obtain a brain measure of value that is independent of behavior. Suppose, for example, that we identify a particular class of neurons whose firing rates are perfectly correlated with value down to our ability to measure it through preference. Supporting this idea, we observe that any procedure that modifies value (as inferred from behavior) changes the firing rate of these neurons in a manner consistent with our predictions. We may tentatively hypothesize that these neurons are (or are among) the value neurons of the brain. However, as shown by Schonberg and colleagues, preference can be changed irrespective of changing the economic worth of goods (Schonberg & Katz, 2020). Therefore, to differentiate value neurons from preference neurons, we would need to show that these neurons do not strictly follow expressed preference when the value function diverges from it. This is, of course, not possible if the value function never measurably diverges from preference. And if we assume that values can diverge from preference, it is not clear how to define values to start with. We call this “the neuroeconomic relativity problem” because, like Einstein’s relativity problem, it reflects the fact that there is no external reference frame to which one can calibrate value inferences.

Reconsidering the motivation for common currency

It may be worth asking, then, what does having a single common currency buy you? Why would the brain invest in such an organization? One advantage of a common currency scale is that it simplifies comparing options that differ along multiple dimensions. For example, when hunting for an apartment, the options may differ along dimensions of price, area, neighborhood, and amenities. The logic is that these dimensions must be first combined into a single scalar per apartment so that the scalars can be compared. However, this is not the only way to solve the problem, and importantly, may not be the way humans make their decisions. For example, the apartment shopper may choose a single dimension and pick the winner along that dimension, as discussed above, or may compare separately on each dimension and choose the apartment that wins on most counts (Tversky, 1972). Laboratory studies where humans can choose what attributes to view indeed suggest that people don’t uncover all the attributes of one option (to calculate its value) and then continue to the next option, but rather prefer to view information for all options attribute by attribute (Fellows, 2006; Hunt et al., 2014). Indeed, as mentioned, much empirical work indicates that human decision-makers broadly favor heuristic approaches that eschew a value stage in a large number of contexts (Gigerenzer and Gaissmaier, 2011; Lieder and Griffiths, 2020; Brandstätter et al., 2006; Kahneman et al., 1982).

Notably, systems that make use of heuristics may generate internal variables that are conceptually distinct from value but that correlate with value, thus leading to an interpretational confound. Consider, for example, a relatively well-understood implementation of choice in a (non-brain) distributed system: the selection of hive sites in bees (Apis mellifera, Seeley, 2010; Seeley and Buhrman, 1999; Seeley et al., 2006). Bee swarms select a hive site by sending out scouts to investigate potential sites. Each site differs along roughly 20 dimensions of varying but measurable importance (size, safety from predators, exposure to sunlight, wind, etc.). Each scout bee that encounters the hive site performs an extremely poor estimation of the value of the site – it typically will only sample three-to-four of the dimensions and even then, estimates them poorly. The bee then returns and if the estimated site quality is sufficiently good, indicates an assessment of its quality to the other bees in the swarm. The quality it signals will be correlated with the overall value of the hive site, but only weakly, and, critically, will only integrate a subset of relevant value components. The overall comparison between options is indirect – the options race to attract adherents; the majority of the decision is made by a positive feedback process. This example is particularly relevant because scholars of perceptual decision making may recognize this process as strongly related to ideas about how the brain decides between options by racing to bounds, and there is evidence that deliberative processes in the brain follow the same principles as choice processes in bee swarms (Franks et al., 2003; Mitchell, 2009; Passino et al. 2008; Pirrone et al., 2018; Pais et al., 2013; Eisenreich et al, 2017).
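
To convey the flavor of such a mechanism, here is a toy simulation (ours; the sites, dimensions, quality threshold, and noise level are arbitrary stand-ins, only loosely inspired by the models cited above): each scout samples a small subset of a site’s dimensions, recruitment is proportional to current support, and the first site to reach a quorum wins, without any agent ever computing a site’s full value.

```python
import random

random.seed(1)

SITES = {"A": [0.9, 0.7, 0.8, 0.6, 0.9],   # five quality dimensions per site
         "B": [0.5, 0.6, 0.4, 0.7, 0.5]}
QUORUM = 50

def scout_estimate(site):
    """A scout samples only 3 of the 5 dimensions, with noise: a poor estimate."""
    sampled = random.sample(SITES[site], 3)
    return sum(sampled) / 3 + random.gauss(0, 0.2)

support = {"A": 1, "B": 1}
while max(support.values()) < QUORUM:
    # Positive feedback: scouts are recruited to a site in proportion to current support.
    site = random.choices(list(support), weights=list(support.values()))[0]
    if scout_estimate(site) > 0.6:   # advertise only if the noisy estimate is good enough
        support[site] += 1

print(max(support, key=support.get))  # usually "A", though no agent computed full values
```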

Evaluating the neural evidence for the common currency hypothesis

Evidence for the common currency hypothesis comes from the observation that firing rates of single neurons or hemodynamic responses of voxels correlate with values of offers and outcomes (Padoa-Schioppa, 2011; Rangel et al., 2008; Levy and Glimcher, 2012; Kennerley et al., 2009; Kennerley et al., 2011). These responses depend on multiple elements of offers (e.g., the expected reward, as well as the associated response costs), and are modulated by factors that affect subjective value, such as context or level of satiety (see O’Neill and Schultz, 2010; Rudebeck et al., 2017; Conen and Padoa-Schioppa, 2019; Azab and Hayden, 2018 for only a few examples). Often, variations in these firing rates predict variations in choice (Conen and Padoa-Schioppa, 2015; Strait et al., 2014; Sugrue et al., 2004). Such patterns have been taken as evidence that the neural signal encodes value, in particular implicating the OFC (O’Reilly 2020; Bartra et al., 2013; Levy & Glimcher, 2012).

However, such patterns of neural responding do not definitively demonstrate a common-currency value signal. For example, in the original finding of value-encoding neurons in the OFC (Tremblay & Schultz, 1999), neurons responded more strongly to a high-valued option than to a low-valued one. However, the principle of common currency only held within a given choice pair – responses for option B were low when this option was paired with a better option A, but were high when this same option was paired with a worse option C. Other studies found value-correlated responses to be more stable across contexts (Padoa-Schioppa and Assad, 2008), but most findings support an encoding that depends on the alternatives (e.g. Padoa-Schioppa, 2009; Zimmermann et al., 2018; Kobayashi et al., 2010). One might interpret this finding as reflecting simple range adaptation of neurons that have a finite firing rate; however, a more parsimonious explanation is that the firing pattern is consistent with a relative preference code rather than an abstract value code. This interpretation places the putative neural representation of value one stage later than we would expect from a common-currency code – immediately after comparison (because a relative valuation is itself a comparison) rather than as an input to comparison, raising the question of where the input came from. An interpretation of this code as something other than economic value would obviate this worry. Importantly, with a code that depends on alternatives, there is no real sense in which we can read out the “true subjective value” of an option. We can only know if an option’s value is higher than another value – similar to the information provided by preferences and choice behavior.

Another corollary of such a relative code is that it is unclear to what extent we can read out a meaningful value signal when only one option is available. Indeed, presenting a subject with one option at a time should, theoretically, provide the best ability to read out option values from neural activity in areas associated with value, such as OFC, ventromedial prefrontal cortex, and dorsal and subgenual anterior cingulate cortices. Doing so reveals that while neural activity in these regions does correlate with the value of the first option, neural responses to the second (alternative) option correlate with the value difference, that is, they reflect the result of value comparison rather than valuation per se (Strait et al., 2014; Azab and Hayden, 2018; Hunt et al., 2018). Moreover, even responses to the first offer do not simply encode its value, but also contain information about the likelihood that option will be chosen (presumably relative to the expected value of the second offer, Azab and Hayden, 2017). These results suggest that even when evaluation is experimentally segregated from comparison, pure value encodings may not exist. In fact, the idea that prefrontal neurons are selective or responsive to a single experimenter-defined variable has been increasingly falling out of favor (Rigotti et al., 2013; Fusi et al., 2016; Raposo et al., 2014), with “mixed selectivity” appearing to be a core operating principle of prefrontal cortex, including in ostensible value regions (Kimmel et al., 2020; Blanchard et al., 2018; Hayden and Platt, 2010; Yoo and Hayden, 2018).

A third issue is that although the activity of some neurons in the OFC is correlated with value (but not necessarily linearly related to value), activity of other neurons in this same area is anticorrelated with value, with the majority of neurons showing no relationship at all with value. This raises the possibility that the neurons are encoding something other than value, for instance, a distributed representation of the identity of each of the three options, A, B and C above, or the identity of the stimulus representing the offer. With the small number of options evaluated in most experiments, such a distributed code of outcome or stimulus identity (but not value) can easily result in some neurons randomly firing most strongly for the higher-valued of the three options and least for the worse option, while in other neurons the relationship would be in the opposite direction. This would also explain why many neurons show a non-monotonic relationship between value and their firing.

Recently, using ensemble recordings in the OFC that allow analysis on the level of a single trial, Wallis and colleagues have attempted a more direct test of the hypothesis that OFC encodes value (Rich & Wallis, 2016). In their task, monkeys were offered two options (denoted by images corresponding to the options) and asked to choose between them. On some trials, only one option was offered. Classifiers were trained to classify options corresponding to each of 4 offer-value levels (0.05, 0.10, 0.18, and 0.30 ml juice, or, in separate blocks, 4 levels of second-order reinforcer), aggregating over the two images corresponding to each value level. Using single-unit activity and local field-potential recordings in the OFC, the authors could classify the offered value above chance, suggesting that OFC neurons encoded information about value. However, here too, offer values may have been encoded as different (outcome stimulus) identities, not different (ordinal) values in a common currency. One stringent test of the value-coding hypothesis would be to show that a combination of two offers of value level 0.05 ml results in an activation pattern similar to that of a single offer of value level 0.10 ml. Ensemble recordings that give ample data in a single trial allow testing this hypothesis with novel combinations that have not been trained to predict the same identity of reward through multiple presentations, therefore testing the assumption of the additivity of common currency directly. This critical test has not been conducted, to our knowledge.
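
To make the logic of such a test explicit, here is a sketch with simulated data (ours; this is not the Rich & Wallis analysis pipeline, and all patterns and parameters are invented): a decoder is trained on ensemble responses to single offers, and the critical probe asks whether a novel combination of two 0.05 ml offers is decoded as the 0.10 ml level, as additivity of a common-currency code would predict.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
levels = [0.05, 0.10, 0.18, 0.30]          # offer-value levels, as in Rich & Wallis (2016)
n_neurons, n_trials = 40, 200

# Simulated ensemble responses: each level evokes its own mean pattern plus noise.
patterns = {v: rng.normal(0, 1, n_neurons) for v in levels}
X = np.vstack([patterns[v] + rng.normal(0, 0.5, n_neurons)
               for v in levels for _ in range(n_trials)])
y = np.repeat(levels, n_trials)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# The critical, untrained probe: a combination of two 0.05 ml offers.
# Under strict additivity of a common-currency code this should decode as 0.10;
# under an identity code it need not (here, with arbitrary simulated patterns,
# it typically decodes as 0.05, i.e., identity-like).
combo = patterns[0.05] + patterns[0.05] + rng.normal(0, 0.5, n_neurons)
print(clf.predict(combo.reshape(1, -1)))
```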

As mentioned, many factors that are conceptually distinct from value can influence choice – and are therefore closely correlated with value (Maunsell, 2004). Reward value drives attention, promotes both short- and long-term learning, primes behavioral adjustments, updates internal models, activates circuitry that detects both positive and negative surprises, and elicits mental computations of cost-benefit tradeoffs as well as comparisons with what could have been chosen. All of these are different from scalar, common-currency value, but are known to drive neural activity in the regions usually associated with economic value. As such, they confound that interpretation. This problem is a long-standing and notorious one in neuroeconomics (O’Doherty, 2014; Maunsell, 2004; Roesch and Olson, 2003).

To overcome some of these potential confounds, a strong tradition in neuroeconomics research is to control for extraneous factors such as salience and response cost (although we note that many factors – e.g., covert attention – cannot be controlled for). As a practical issue, it is difficult to control for alternative interpretations without making the task so convoluted that it becomes unnatural for the animal, in the ethological sense (O’Doherty, 2014). As a result, it is not clear if findings of value computation (if we believe them to be so) in these tasks would naturally translate to how the brain computes choice in naturalistic situations. But the deeper issue remains that value as inferred in most experiments is definitionally a summary of aggregated choice behavior, and therefore, any variable that influences choice behavior will necessarily be correlated with value. It is for this reason that we suggest that a stronger demonstration of scalar value coding in the brain should show mathematical properties such as the additivity of value, or separation from preferences as these are changed without changing value, as suggested above. If the response of neurons to the novel sum of two stimuli each promising 0.1 ml of juice is similar to the response of those same neurons to a different stimulus associated with 0.2 ml of juice, and especially if this response did not change when preference for an option was induced by means such as the mere exposure effect (Schonberg & Katz, 2020), it would be harder to argue that this is due to a shared motor plan, or attentional capture. We note that research to date has shown that the activity of brain areas associated with value, in particular the orbitofrontal cortex, does change when preferences are modified through methods that should, in principle, not change economic value (Botvinik-Nezer et al., 2020).

An alternative view: direct learning of policies

Although reinforcement-learning models have contributed to the assumption that the brain computes values, many reinforcement-learning algorithms do not learn or estimate values for different actions. Instead, they directly learn action policies. In fact, in the reinforcement-learning literature, the goal of an agent is to obtain as much reward as possible by executing optimal actions – calculating values is only one means to achieve that end.

In explaining reinforcement-learning methods, Dayan and Abbott’s (2001) textbook begins with the “direct actor” – an actor that learns actions without computing their values. The Actor-Critic model, a prominent algorithm for reinforcement learning that has been linked to the brain (Barto, 1995; Joel et al., 2002; O’Doherty et al., 2004; Takahashi et al., 2008; Maia, 2010), does exactly that. In this algorithm, a Critic module learns values of states in terms of expected future rewards (here, the state includes all available actions, averaging over choices and explicitly not computing the value of each possible choice), and uses these to compute reward prediction errors. These prediction errors are used to learn an action policy in the Actor module: the probability of actions that are followed by positive prediction errors is increased, and the probability of actions that are followed by negative prediction errors is decreased. Under some reasonable conditions, this model learns correct reward-maximizing policies (Sutton & Barto, 2018). However, the quantities learned by the Actor – tendencies to perform one action over the other – cannot be read out as action values. In particular, due to the way the algorithm learns, ties are broken between equally good actions such that eventually agents learn deterministic policies. To be clear, if four actions were to lead to 1, 2, 3 and 3 drops of juice, respectively, the model may learn to always choose the third option, or to always choose the fourth (both optimal policies). At the end of learning, action weights in the Actor may be 0, 0, 1, 0 respectively, losing all information about value, or relative value. Even before convergence to a deterministic policy, weights may be 0, 0.1, 0.95, 0.2 respectively (or any other combination – weights here do not have to sum up to 1, and these are just illustrative numbers). Importantly, if the basal ganglia indeed implement an Actor-Critic learning algorithm in the brain, there is no sense in which we can glean action values from the Actor or the Critic.
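
A minimal simulation (ours) of this dynamic, using the four-action example above, shows how the Actor’s weights come to identify an optimal action while losing any readable relationship to the actions’ values:

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([1.0, 2.0, 3.0, 3.0])   # drops of juice for the four actions
w = np.zeros(4)                            # Actor: action propensities (not values)
v = 0.0                                    # Critic: value of the single state
alpha_actor, alpha_critic = 0.1, 0.05

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(20000):
    p = softmax(w)
    action = rng.choice(4, p=p)
    delta = rewards[action] - v            # reward prediction error from the Critic
    v += alpha_critic * delta              # Critic tracks the state's expected reward
    w[action] += alpha_actor * delta       # Actor: reinforce actions with positive RPE

print(np.round(softmax(w), 3))  # choice probability piles onto the 3-drop actions,
print(np.round(w, 2))           # but the weights do not mirror the values 1, 2, 3, 3
```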

The Actor-Critic model is only one of a class of reinforcement-learning algorithms that learn policies directly, that is, without calculating option values (e.g., Williams, 1992; Sutton et al., 2000). In their general form, these algorithms maintain an action policy, use experience to evaluate a gradient direction for this policy (that is, what change in policy would increase the overall obtained reward), and change the policy in that direction. Another notable model, developed to explain behavioral patterns in choices, is the Experience-Weighted Attraction model of Camerer & Ho (1999), which interpolates between value calculation and direct policy learning.
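
In equation form (the policy-gradient theorem of Sutton et al., 2000, in our notation), these methods ascend

$$\nabla_{\theta} J(\theta) \;=\; \mathbb{E}_{\pi_{\theta}}\!\left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s)\, Q^{\pi_{\theta}}(s, a) \right],$$

where in practice $Q^{\pi_{\theta}}(s,a)$ is replaced by a sampled return (Williams, 1992) or by a critic’s prediction error; after each update only the policy parameters $\theta$ are retained, so no table of option values needs to exist.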

Recent findings suggest that both behavior and learning signals measured in humans during decision making are more in line with a policy-learning algorithm than with value estimation. For instance, Li & Daw (2011) had participants choose between two options that gave reward with different probabilities. After each choice, both the outcome of the chosen option and the counterfactual outcome of the unchosen option were displayed. Behavior, as well as neural signals corresponding to prediction errors in the basal ganglia, suggested that subjects were updating both options in opposite directions, learning relative choice propensities (a policy) rather than tracking the expected value of each option. In another task in which humans chose between pairs of options and were able to view both the outcome of their choice and the counterfactual outcome of the forgone option, Palminteri et al. (2015) had subjects learn which of two probabilistically rewarding options was better, and which of two probabilistically punishing options was better. At test, subjects were asked to choose between pairs from both the rewarding and the punishing contexts. Surprisingly, when choosing between the less rewarding option and the less punishing option, subjects tended to choose the less punishing option. This is consistent with policy learning, as that option had been the favored option in the punishment context, whereas the less rewarding option had been the disfavored one in the reward context. However, the value of a sometimes-rewarding option is clearly higher than that of a sometimes-punishing option, hence value learning cannot explain this fundamentally suboptimal choice pattern.
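
Schematically (our illustration, not the fitted models of Li & Daw, 2011), the two hypotheses imply different update rules when both outcomes are revealed:

```python
alpha = 0.2  # illustrative learning rate

def value_update(v_chosen, v_unchosen, r_chosen, r_unchosen):
    """Value learning: each option tracks its own expected reward."""
    v_chosen += alpha * (r_chosen - v_chosen)
    v_unchosen += alpha * (r_unchosen - v_unchosen)
    return v_chosen, v_unchosen

def policy_update(w_chosen, w_unchosen, r_chosen, r_unchosen):
    """Policy learning: a single relative error pushes propensities apart."""
    delta = r_chosen - r_unchosen            # chosen vs. counterfactual outcome
    w_chosen += alpha * delta
    w_unchosen -= alpha * delta              # opposite-direction update
    return w_chosen, w_unchosen
```

Only the second rule updates the two options in opposite directions with a single error term, the pattern consistent with the behavioral and striatal signals that Li & Daw (2011) report.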

Direct learning of policies is consistent with basic tenets of decision making, such as the fact that choices are stochastic even when no exploration is necessary or warranted (e.g., choosing between two gambles that are fully described; Khaw et al., 2017a). Because policies are relative quantities (preference for one option implies unfavorability of another, as the probabilities of all choices have to sum to one), they also explain common violations of the independence axiom such as the effects of third-option “decoys” on choice (Soltani et al., 2012), and temporal and spatial contextual influences on choice (Khaw et al., 2017b), although these phenomena can also be explained by valuation models that involve range normalization.

Conclusion

The field of neuroeconomics started with the putative identification of pure value signals (Platt and Glimcher, 1999). The meaning of these signals was disputed early on (Roesch et al., 2003; Maunsell, 2004), but deep questions were set aside as researchers continued to identify brain areas with parts of the computational processes of choice, and in particular, identified the orbitofrontal cortex with the seat of economic value. However, while those debates have subsided, the problems they raised have not been resolved. Progress on these issues will require additional data, but we stress that not every experiment that involves choices also implies valuation, and one must be careful in interpreting data not only because of potential confounds, but also because we must be wary of treating a hypothesis – that the brain computes value – as an axiom. We also suggest the need for additional philosophical work to define value in a way that is – at least in principle – dissociable from other factors that promote choices (Juechems & Summerfield, 2019). In other words, we argue for a return to the productive debates of the early days of the field, twenty years ago (for neuroeconomics), and earlier (for its psychological underpinnings). Bolstered by an additional twenty years of new data, such debates would surely benefit the field of neuroeconomics moving forward. They would also potentially help reconcile conflicting views on what the orbitofrontal cortex might or might not be doing in decision making (Stalnaker et al., 2015).

Acknowledgements

We thank the Gordon Research Conference on Neurobiology of Cognition 2018, organized by Jennifer Groh and David Leopold, for providing the context in which this collaboration was seeded. We thank Sarah Heilbronner, Rei Akaishi, Becket Ebitz and many commentators on Twitter for helpful discussions.

Funding statement

This research was supported by National Institute on Drug Abuse Grants R01 DA038615 (BYH) and R01 DA042065 (YN), and by grants from the John Templeton Foundation (to both YN and BYH). The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation or the National Institute on Drug Abuse.

Footnotes

¹ We consider subjective worth as revealed by preference, following neuroeconomic theory, but note that this limits the focus to only the ‘wanting’, not the ‘liking’, side of value (Berridge, 1996).

Competing interests

The authors have no competing interests to declare.

References

  1. Allais M (1953). Le comportement de l’homme rationnel devant le risque: critique des postulats et axiomes de l’école Américaine. Econometrica: Journal of the Econometric Society, 503546. [Google Scholar]
  2. Ariely D, Loewenstein G, & Prelec D (2003). “Coherent arbitrariness”: Stable demand curves without stable preferences. The Quarterly journal of economics, 118(1), 73–106. [Google Scholar]
  3. Armel KC, Beaumel A, & Rangel A (2008). Biasing simple choices by manipulating relative visual attention. Judgment and Decision making, 3(5), 396–403. [Google Scholar]
  4. Azab H, & Hayden BY (2017). Correlates of decisional dynamics in the dorsal anterior cingulate cortex. PLoS biology, 15(11), e2003091. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Azab H, & Hayden BY (2018). Correlates of economic decisions in the dorsal and subgenual anterior cingulate cortices. European Journal of Neuroscience, 47(8), 979–993. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Azab H, & Hayden BY (2020). Partial integration of the components of value in anterior cingulate cortex. Behavioral Neuroscience, 134(4), 296. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Barto AG (1995). Adaptive critics and the basal ganglia. In Houk J, Davis J, & Beiser D (Eds.), Models of information processing in the basal ganglia (pp. 215–232). Cambridge, MA: MIT Press [Google Scholar]
  8. Bartra O, McGuire JT, & Kable JW (2013). The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage, 76, 412–427. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Berridge KC (1996). Food reward: brain substrates of wanting and liking. Neuroscience & Biobehavioral Reviews, 20(1), 1–25. [DOI] [PubMed] [Google Scholar]
  10. Blanchard TC, Strait CE, & Hayden BY (2015). Ramping ensemble activity in dorsal anterior cingulate neurons during persistent commitment to a decision. Journal of neurophysiology, 114(4), 2439–2449. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Blanchard TC, Piantadosi ST, & Hayden BY (2018). Robust mixture modeling reveals category-free selectivity in reward region neuronal ensembles. Journal of neurophysiology, 119(4), 1305–1318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Botvinik-Nezer R, Salomon T, & Schonberg T (2020). Enhanced bottom-up and reduced top-down fMRI activity is related to long-lasting nonreinforced behavioral change. Cerebral Cortex, 30(3), 858–874. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Brandstätter E, Gigerenzer G, & Hertwig R (2006). The priority heuristic: making choices without trade-offs. Psychological review, 113(2), 409. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Busemeyer JR, & Townsend JT (1993). Decision field theory: a dynamic-cognitive approach to decision making in an uncertain environment. Psychological review, 100(3), 432. [DOI] [PubMed] [Google Scholar]
  15. Camerer C, Loewenstein G, & Prelec D (2005). Neuroeconomics: How neuroscience can inform economics. Journal of economic Literature, 43(1), 9–64. [Google Scholar]
  16. Camerer C, & Hua Ho T (1999). Experience‐weighted attraction learning in normal form games. Econometrica, 67(4), 827–874. [Google Scholar]
  17. Chater N (2018). The mind is flat: The remarkable shallowness of the improvising brain. Yale University Press. [Google Scholar]
  18. Churchland PM (1981). Eliminative materialism and propositional attitudes. the Journal of Philosophy, 78(2), 67–90. [Google Scholar]
  19. Churchland P (1984). Eliminative materialism. Matter and Consciousness, 43–49. [Google Scholar]
  20. Churchland PM (2013). Matter and consciousness. MIT press. [Google Scholar]
  21. Conen KE, & Padoa-Schioppa C (2019). Partial adaptation to the value range in the macaque orbitofrontal cortex. Journal of Neuroscience, 39(18), 3498–3513. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Conen KE, & Padoa-Schioppa C (2015). Neuronal variability in orbitofrontal cortex during economic decisions. Journal of neurophysiology, 114(3), 1367–1381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Dayan P, & Abbott LF (2001). Theoretical neuroscience: computational and mathematical modeling of neural systems. Computational Neuroscience Series. [Google Scholar]
  24. Eisenreich BR, Akaishi R, & Hayden BY (2017). Control without controllers: toward a distributed neuroscience of executive control. Journal of cognitive neuroscience, 29(10), 1684–1698. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Franks NR, Dornhaus A, Fitzsimmons JP, & Stevens M (2003). Speed versus accuracy in collective decision making. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270(1532), 2457–2463. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Glimcher PW, & Fehr E (Eds.). (2013). Neuroeconomics: Decision making and the brain. Academic Press. [Google Scholar]
  27. Gigerenzer G, & Gaissmaier W (2011). Heuristic decision making. Annual review of psychology, 62, 451–482. [DOI] [PubMed] [Google Scholar]
  28. Gul F, & Pesendorfer W (2008). The case for mindless economics. The foundations of positive and normative economics: A handbook, 1, 3–42. [Google Scholar]
  29. Farashahi S, Azab H, Hayden B, & Soltani A (2018). On the flexibility of basic risk attitudes in monkeys. Journal of Neuroscience, 38(18), 4383–4398. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Fellows LK (2006). Deciding how to decide: ventromedial frontal lobe damage affects information acquisition in multi-attribute decision making. Brain, 129(4), 944–952. [DOI] [PubMed] [Google Scholar]
  31. Fishburn PC, & Kochenberger GA (1979). Two‐piece von Neumann‐Morgenstern utility functions. Decision Sciences, 10(4), 503–518. [Google Scholar]
  32. Friedman M (1953). The methodology of positive economics. Essays in positive economics, 3(3), 145–178. [Google Scholar]
  33. Fusi S, Miller EK, & Rigotti M (2016). Why neurons mix: high dimensionality for higher cognition. Current opinion in neurobiology, 37, 66–74. [DOI] [PubMed] [Google Scholar]
  34. Haber SN, & Knutson B (2010). The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology, 35(1), 4–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Harrison GW (2008). Neuroeconomics: A critical reconsideration. Economics & Philosophy, 24(3), 303–344. [Google Scholar]
  36. Hayden BY, & Platt ML (2009). The mean, the median, and the St. Petersburg paradox. Judgment and Decision Making, 4(4), 256. [PMC free article] [PubMed] [Google Scholar]
  37. Hayden BY, & Platt ML (2010). Neurons in anterior cingulate cortex multiplex information about reward and action. Journal of Neuroscience, 30(9), 3339–3346. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Hayden BY, & Moreno-Bote R (2018). A neuronal theory of sequential economic choice. Brain and Neuroscience Advances, 2, 2398212818766675. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Heilbronner SR, & Hayden BY (2016). The description-experience gap in risky choice in nonhuman primates. Psychonomic bulletin & review, 23(2), 593–600. [DOI] [PubMed] [Google Scholar]
  40. Hunt LT, Dolan RJ, & Behrens TE (2014). Hierarchical competitions subserving multiattribute choice. Nature neuroscience, 17(11), 1613–1622. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Hunt LT, Malalasekera WN, de Berker AO, Miranda B, Farmer SF, Behrens TE, & Kennerley SW (2018). Triple dissociation of attention and decision computations across prefrontal cortex. Nature neuroscience, 21(10), 1471–1481. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Izuma K, Matsumoto M, Murayama K, Samejima K, Sadato N, & Matsumoto K (2010). Neural correlates of cognitive dissonance and choice-induced preference change. Proceedings of the National Academy of Sciences, 107(51), 22014–22019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Joel D, Niv Y, & Ruppin E (2002). Actor–critic models of the basal ganglia: New anatomical and computational perspectives. Neural networks, 15(4–6), 535–547. [DOI] [PubMed] [Google Scholar]
  44. Juechems K, & Summerfield C (2019). Where does value come from? Trends in cognitive sciences, 23(10), 836–850. [DOI] [PubMed] [Google Scholar]
  45. Kable JW, & Glimcher PW (2009). The neurobiology of decision: consensus and controversy. Neuron, 63(6), 733–745. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Kahneman D & Tversky A (1979) Prospect theory: an analysis of decisions under risk. Econometrica 47, 263–291. [Google Scholar]
  47. Kahneman D, Slovic SP, Slovic P, & Tversky A (Eds.). (1982). Judgment under uncertainty: Heuristics and biases. Cambridge university press. [DOI] [PubMed] [Google Scholar]
48. Kennerley SW, Dahmubed AF, Lara AH, & Wallis JD (2009). Neurons in the frontal lobe encode the value of multiple decision variables. Journal of Cognitive Neuroscience, 21(6), 1162–1178.
49. Kennerley SW, Behrens TE, & Wallis JD (2011). Double dissociation of value computations in orbitofrontal and anterior cingulate neurons. Nature Neuroscience, 14(12), 1581.
50. Khaw MW, Li Z, & Woodford M (2017). Risk aversion as a perceptual bias (Working Paper No. w23294). National Bureau of Economic Research.
51. Khaw MW, Glimcher PW, & Louie K (2017). Normalized value coding explains dynamic adaptation in the human valuation process. Proceedings of the National Academy of Sciences, 114(48), 12696–12701.
52. Kimmel DL, Elsayed GF, Cunningham JP, & Newsome WT (2020). Value and choice as separable, stable representations in orbitofrontal cortex. Nature Communications, 11, 3466.
53. Klein-Flügge MC, Barron HC, Brodersen KH, Dolan RJ, & Behrens TEJ (2013). Segregated encoding of reward–identity and stimulus–reward associations in human orbitofrontal cortex. Journal of Neuroscience, 33(7), 3202–3211.
54. Knutson B, Adams CM, Fong GW, & Hommer D (2001). Anticipation of increasing monetary reward selectively recruits nucleus accumbens. Journal of Neuroscience, 21(16), RC159.
55. Kobayashi S, de Carvalho OP, & Schultz W (2010). Adaptation of reward sensitivity in orbitofrontal neurons. Journal of Neuroscience, 30(2), 534–544.
56. Levy DJ, & Glimcher PW (2012). The root of all value: A neural common currency for choice. Current Opinion in Neurobiology, 22(6), 1027–1038.
57. Li J, & Daw ND (2011). Signals in human striatum are appropriate for policy update rather than value prediction. Journal of Neuroscience, 31(14), 5504–5511.
58. Lieder F, & Griffiths TL (2020). Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences, 43.
59. Lichtenstein S, & Slovic P (Eds.). (2006). The construction of preference. Cambridge University Press.
60. Loewenstein G, Rick S, & Cohen JD (2008). Neuroeconomics. Annual Review of Psychology, 59, 647–672.
61. Maia TV (2010). Two-factor theory, the actor–critic model, and conditioned avoidance. Learning & Behavior, 38(1), 50–67.
62. Marsh B (2002). Do animals use heuristics? Journal of Bioeconomics, 4(1), 49–56.
63. Martin R (2011). The St. Petersburg paradox. Stanford Encyclopedia of Philosophy.
64. Maunsell JH (2004). Neuronal representations of cognitive state: Reward or attention? Trends in Cognitive Sciences, 8(6), 261–265.
65. Miller KJ, Shenhav A, & Ludvig EA (2019). Habits without values. Psychological Review, 126(2), 292.
66. Mitchell M (2009). Complexity: A guided tour. Oxford University Press.
67. Montague PR, & Berns GS (2002). Neural economics and the biological substrates of valuation. Neuron, 36(2), 265–284.
68. Niv Y (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53(3), 139–154.
69. Niv Y, Daw ND, Joel D, & Dayan P (2007). Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology, 191(3), 507–520.
70. O’Doherty JP (2014). The problem with value. Neuroscience & Biobehavioral Reviews, 43, 259–268.
71. O’Doherty JP (2007). Lights, camembert, action! The role of human orbitofrontal cortex in encoding stimuli, rewards, and choices. Annals of the New York Academy of Sciences, 1121(1), 254–272.
72. O’Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, & Dolan RJ (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304(5669), 452–454.
73. O’Neill M, & Schultz W (2010). Coding of reward risk by orbitofrontal neurons is mostly distinct from coding of reward value. Neuron, 68(4), 789–800.
74. O’Reilly RC (2020). Unraveling the mysteries of motivation. Trends in Cognitive Sciences.
75. Padoa-Schioppa C (2011). Neurobiology of economic choice: A good-based model. Annual Review of Neuroscience, 34, 333–359.
76. Padoa-Schioppa C, & Assad JA (2008). The representation of economic value in the orbitofrontal cortex is invariant for changes of menu. Nature Neuroscience, 11(1), 95–102.
77. Padoa-Schioppa C (2009). Range-adapting representation of economic value in the orbitofrontal cortex. Journal of Neuroscience, 29(44), 14004–14014.
78. Pais D, Hogan PM, Schlegel T, Franks NR, Leonard NE, & Marshall JA (2013). A mechanism for value-sensitive decision-making. PLoS One, 8(9), e73216.
79. Palminteri S, Khamassi M, Joffily M, & Coricelli G (2015). Contextual modulation of value signals in reward and punishment learning. Nature Communications, 6(1), 1–14.
80. Passino KM, Seeley TD, & Visscher PK (2008). Swarm cognition in honey bees. Behavioral Ecology and Sociobiology, 62(3), 401–414.
81. Payne JW, Bettman JR, & Johnson EJ (1992). Behavioral decision research: A constructive processing perspective. Annual Review of Psychology, 43(1), 87–131.
82. Piantadosi ST, & Hayden BY (2015). Utility-free heuristic models of two-option choice can mimic predictions of utility-stage models under many conditions. Frontiers in Neuroscience, 9, 105.
83. Pirrone A, Azab H, Hayden BY, Stafford T, & Marshall JA (2018). Evidence for the speed–value trade-off: Human and monkey decision making is magnitude sensitive. Decision, 5(2), 129.
84. Plassmann H, O’Doherty J, & Rangel A (2007). Orbitofrontal cortex encodes willingness to pay in everyday economic transactions. Journal of Neuroscience, 27(37), 9984–9988.
85. Rangel A, Camerer C, & Montague PR (2008). A framework for studying the neurobiology of value-based decision making. Nature Reviews Neuroscience, 9(7), 545–556.
86. Raposo D, Kaufman MT, & Churchland AK (2014). A category-free neural population supports evolving demands during decision-making. Nature Neuroscience, 17(12), 1784.
87. Reyna VF (2008). A theory of medical decision making and health: Fuzzy trace theory. Medical Decision Making, 28(6), 850–865.
88. Rich EL, & Wallis JD (2016). Decoding subjective decisions from orbitofrontal cortex. Nature Neuroscience, 19(7), 973–980.
89. Rigotti M, Barak O, Warden MR, Wang XJ, Daw ND, Miller EK, & Fusi S (2013). The importance of mixed selectivity in complex cognitive tasks. Nature, 497(7451), 585–590.
90. Roesch MR, & Olson CR (2004). Neuronal activity related to reward value and motivation in primate frontal cortex. Science, 304(5668), 307–310.
91. Roesch MR, Taylor AR, & Schoenbaum G (2006). Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation. Neuron, 51(4), 509–520.
92. Roesch MR, & Olson CR (2003). Impact of expected reward on neuronal activity in prefrontal cortex, frontal and supplementary eye fields and premotor cortex. Journal of Neurophysiology, 90(3), 1766–1789.
93. Rorty R (1970). In defense of eliminative materialism. The Review of Metaphysics, 112–121.
94. Rudebeck PH, Saunders RC, Lundgren DA, & Murray EA (2017). Specialized representations of value in the orbital and ventrolateral prefrontal cortex: Desirability versus availability of outcomes. Neuron, 95(5), 1208–1220.
95. Rushworth MF, Noonan MP, Boorman ED, Walton ME, & Behrens TE (2011). Frontal cortex and reward-guided learning and decision-making. Neuron, 70(6), 1054–1069.
96. Schonberg T, Bakkour A, Hover AM, Mumford JA, Nagar L, Perez J, & Poldrack RA (2014). Changing value through cued approach: An automatic mechanism of behavior change. Nature Neuroscience, 17(4), 625–630.
97. Schonberg T, & Katz LN (2020). A neural pathway for nonreinforced preference change. Trends in Cognitive Sciences.
98. Salomon T, Botvinik-Nezer R, Gutentag T, Gera R, Iwanir R, Tamir M, & Schonberg T (2018). The cue-approach task as a general mechanism for long-term non-reinforced behavioral change. Scientific Reports, 8(1), 1–13.
99. Santos LR, & Rosati AG (2015). The evolutionary roots of human decision making. Annual Review of Psychology, 66, 321.
100. Seeley TD (2010). Honeybee democracy. Princeton University Press.
101. Seeley TD, & Buhrman SC (1999). Group decision making in swarms of honey bees. Behavioral Ecology and Sociobiology, 45(1), 19–31.
102. Seeley TD, Visscher PK, & Passino KM (2006). Group decision making in honey bee swarms: When 10,000 bees go house hunting, how do they cooperatively choose their new nesting site? American Scientist, 94(3), 220–229.
103. Shafir S, Waite TA, & Smith BH (2002). Context-dependent violations of rational choice in honeybees (Apis mellifera) and gray jays (Perisoreus canadensis). Behavioral Ecology and Sociobiology, 51(2), 180–187.
104. Sharot T, Fleming SM, Yu X, Koster R, & Dolan RJ (2012). Is choice-induced preference change long lasting? Psychological Science, 23(10), 1123–1129.
105. Shimojo S, Simion C, Shimojo E, & Scheier C (2003). Gaze bias both reflects and influences preference. Nature Neuroscience, 6(12), 1317–1322.
106. Soltani A, De Martino B, & Camerer C (2012). A range-normalization model of context-dependent choice: A new model and evidence. PLoS Computational Biology, 8(7), e1002607.
107. Stalnaker TA, Cooch NK, & Schoenbaum G (2015). What the orbitofrontal cortex does not do. Nature Neuroscience, 18(5), 620.
108. Stewart N, Chater N, & Brown GD (2006). Decision by sampling. Cognitive Psychology, 53(1), 1–26.
109. Stewart N, & Simpson K (2008). A decision-by-sampling account of decision under risk. In The probabilistic mind: Prospects for Bayesian cognitive science (pp. 261–276).
110. Stevens JR (2016). Intertemporal similarity: Discounting as a last resort. Journal of Behavioral Decision Making, 29(1), 12–24.
111. Sugrue LP, Corrado GS, & Newsome WT (2005). Choosing the greater of two goods: Neural currencies for valuation and decision making. Nature Reviews Neuroscience, 6(5), 363–375.
112. Sugrue LP, Corrado GS, & Newsome WT (2004). Matching behavior and the representation of value in the parietal cortex. Science, 304(5678), 1782–1787.
113. Sutton RS, McAllester DA, Singh SP, & Mansour Y (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (pp. 1057–1063).
114. Sutton RS, & Barto AG (2018). Reinforcement learning: An introduction. MIT Press.
115. Takahashi Y, Schoenbaum G, & Niv Y (2008). Silencing the critics: Understanding the effects of cocaine sensitization on dorsolateral and ventral striatum in the context of an actor/critic model. Frontiers in Neuroscience, 2, 14.
116. Teodorescu AR, Moran R, & Usher M (2016). Absolutely relative or relatively absolute: Violations of value invariance in human decision making. Psychonomic Bulletin & Review, 23(1), 22–38.
117. Tremblay L, & Schultz W (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398(6729), 704–708.
118. Tversky A, & Fox CR (1995). Weighing risk and uncertainty. Psychological Review, 102(2), 269.
119. Tversky A (1972). Elimination by aspects: A theory of choice. Psychological Review, 79(4), 281.
120. Tversky A, & Shafir E (1992). Choice under conflict: The dynamics of deferred decision. Psychological Science, 3(6), 358–361.
121. Tversky A (1969). Intransitivity of preferences. Psychological Review, 76(1), 31.
122. Vlaev I, Chater N, Stewart N, & Brown GD (2011). Does the brain calculate value? Trends in Cognitive Sciences, 15(11), 546–554.
123. Voigt K, Murawski C, & Bode S (2017). Endogenous formation of preferences: Choices systematically change willingness-to-pay for goods. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(12), 1872.
124. Von Neumann J, & Morgenstern O (2007). Theory of games and economic behavior (commemorative edition). Princeton University Press.
125. Wallis JD, & Rich EL (2011). Challenges of interpreting frontal neurons during value-based decision-making. Frontiers in Neuroscience, 5, 124.
126. Wallis JD (2007). Orbitofrontal cortex and its contribution to decision-making. Annual Review of Neuroscience, 30, 31–56.
127. Weber EU, Johnson EJ, Milch KF, Chang H, Brodscholl JC, & Goldstein DG (2007). Asymmetric discounting in intertemporal choice: A query-theory account. Psychological Science, 18(6), 516–523.
128. Williams RJ (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4), 229–256.
129. Wilson RC, Takahashi YK, Schoenbaum G, & Niv Y (2014). Orbitofrontal cortex as a cognitive map of task space. Neuron, 81(2), 267–279.
130. Yoo SBM, Sleezer BJ, & Hayden BY (2018). Robust encoding of spatial information in orbitofrontal cortex and striatum. Journal of Cognitive Neuroscience, 30(6), 898–913.
131. Yoo SBM, & Hayden BY (2018). Economic choice as an untangling of options into actions. Neuron, 99(3), 434–447.
132. Zimmermann J, Glimcher PW, & Louie K (2018). Multiple timescales of normalized value coding underlie adaptive choice behavior. Nature Communications, 9(1), 1–11.
