Author manuscript; available in PMC: 2017 Jan 1.
Published in final edited form as: Curr Top Behav Neurosci. 2016;27:313–333. doi: 10.1007/7854_2015_375

The Computational Complexity of Valuation and Motivational Forces in Decision-Making Processes

A. David Redish 1, Nathan W. Schultheiss 2, Evan C. Carter 3
PMCID: PMC4937458  NIHMSID: NIHMS743604  PMID: 25981912

Abstract

The concept of value is fundamental to most theories of motivation and decision making. However, value has to be measured experimentally. Different methods of measuring value produce incompatible valuation hierarchies. Taking the agent’s perspective (rather than the experimenter’s), we interpret the different valuation measurement methods as accessing different decision-making systems and show how these different systems depend on different information processing algorithms. This identifies the translation from these multiple decision-making systems into a single action taken by a given agent as one of the most important open questions in decision making today. We conclude by looking at how these different valuation measures accessing different decision-making systems can be used to understand and treat decision dysfunction such as in addiction.

Keywords: Neuroeconomics, Valuation, Multiple Decision Theory, Decision-Making


Value-based decision-making processes are integral to adaptive behavior, and accordingly, the concept of value is ubiquitous across decision-making studies from diverse perspectives, including studies of neuroeconomic choice (Glimcher et al. 2008; Kable and Glimcher 2009), optimal foraging theory (Stephens and Krebs 1987), reinforcement learning (Sutton and Barto 1998), and deliberative decision making (Rangel et al. 2008; Rangel and Hare 2010). Operational definitions of value within each of these fields often point to “value” as a common currency that can be applied to objects, actions, and experiences (Kable and Glimcher 2009; Levy and Glimcher 2012). Theoretically, using value as a common currency may ease an agent’s difficulty making choices between very different options. However, value and motivation are hypothetical constructs (MacCorquodale and Meehl 1954), and, as such, cannot be directly measured. Instead, they must be inferred from interpretations of observations. This review will start from the observation that measuring value in different ways can lead to incompatible orderings of valuation—how you measure value changes what things are valued more than others. Valuation is not trans-situational; instead, it is context-dependent.

1 What is Value?

The concept of value can be defined as a quantification of a resource in terms of its costs and benefits, as well as the subjective desire or preference for some quantity of one resource over another. Typically, value is considered to be a derivative property of the relationship between an agent and a given object of desire (Glimcher et al. 2008). Because both human and non-human animals make decisions using similar systems dependent on similar neural structures (Redish 2013), we draw evidence from both human and non-human decision-making studies to reach our conclusions. Among animals generally, the needs of the individual may modulate the value of candidate actions in the world: hunger, for example, predisposes an agent toward consumable rewards. But even though it may be context-dependent, the valuation step is typically taken as a metric of external features of the world given the present condition of the agent (Glimcher et al. 2008, but see Niv et al. 2006b).

Fundamentally, the value of an object to an agent is a multidimensional entity. We rarely choose where to eat dinner based solely on the nutritional content of a particular dish, but rather we integrate the convenience and expense of different restaurants or our own kitchen, the taste or style of foods at local restaurants or the ingredients in our cupboards, the speed of service at different restaurants or the preparation time of foods, our mood, restaurant atmospheres, and other social factors, as well as what we may have had for lunch today or dinner yesterday. Potential costs, as in these examples, are determined by temporal, energetic, and resource constraints, while benefits range from social and hedonic experiences to energetic and health-related issues. Despite the numerous factors that may contribute to the valuations of different restaurants and our kitchen at home, we ultimately choose where to have dinner based on some integration of these factors. Current theories suggest that this process entails value as a means of reaching a common currency to make the choice (Glimcher 2003; Levy and Glimcher 2012; Wunderlich et al. 2012). Taking that idea to its logical conclusion suggests that the choice itself should be understood as the measure of the valuation process (Samuelson 1937; Redish 2013).

1.1 Measuring Value

This observation suggests that we need to take value as revealed through the actions of an agent. Different computational processes may well underlie different decision processes and different effective motivational measures of valuation (Daw et al. 2005; Niv et al. 2006a, b; Rangel et al. 2008; Redish et al. 2008; Montague et al. 2012; van der Meer et al. 2012). There are at least three simple ways of measuring value: revealed preference, willingness to pay, and approach/avoidance. Interestingly, these three measures can produce different orderings of the same choices.

Revealed preferences

The simplest means of measuring whether something is more valued than another is to provide the two (or more) options as a choice and see which one is selected. Preferentially selecting option A over option B implies that value(A) > value(B). Logically, this simple conceptualization implies that value should be transitive [value(A) > value(B) and value(B) > value(C) implies that value(A) > value(C)], at least within the bounds of noise. However, as we will see below, the algorithm used when calculating value in a revealed preferences situation is more complex and the transitive property does not necessarily apply.
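
To make the transitivity claim concrete, here is a minimal sketch (with hypothetical data) of how one might test whether a set of pairwise revealed preferences is transitive; the `prefers` dictionary and option labels are illustrative, not data from any cited study.

```python
from itertools import permutations

def is_transitive(prefers):
    """Check whether a binary preference relation is transitive.

    prefers[(a, b)] is True when option a was chosen over option b in a
    pairwise revealed-preference test; untested pairs count as not preferred.
    """
    options = {x for pair in prefers for x in pair}
    for a, b, c in permutations(options, 3):
        if prefers.get((a, b)) and prefers.get((b, c)) and not prefers.get((a, c)):
            return False  # found a cycle: a > b and b > c, but not a > c
    return True

# Hypothetical data forming a cycle: A > B, B > C, C > A.
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}
print(is_transitive(prefers))  # False -- an intransitive (cyclic) ordering
```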

Willingness to pay

One can directly measure the value of a single thing by asking how much effort or sacrifice one is willing to make to get that thing. Typically, this is measured by requiring a physical effort to obtain an object (such as lever presses to obtain food or drug) or by demanding a sacrifice (such as asking how much money one would be willing to pay for a given item, e.g., a car). Measuring how much one is willing to pay for each of several options should lead to an ordering of preferences—logically, if one is willing to pay more for option A than one is willing to pay for option B, then one would expect A to be preferred to B, but this is not always the case (Lichtenstein and Slovic 2006; Ahmed 2010; Perry et al. 2013). Interestingly, as we will see below, the algorithm used when selecting actions that lead to paying effort to achieve a goal does not apparently apply to selling that goal back (Kahneman et al. 1991).

Humans, with their explicitly accessible linguistic abilities, can report their willingness to pay directly. (You can ask them.) Linguistically reported willingness-to-pay can diverge from a behaviorally observed willingness to pay (Kahneman and Tversky 2000; Lichtenstein and Slovic 2006). For example, many drug users will deny their willingness to pay high costs for drugs, but will, when faced with the drug, pay that high cost (Goldstein 2000).

Approach/avoidance

Even simpler than measuring the willingness to pay or the revelation of preferences, one can measure whether an agent will approach or avoid the option in question. This can be taken as a binary value applied to an option: approach it (positive) or avoid it (negative). One can also measure the speed and vigor with which one approaches the option, which can provide a quantitative measure of that positive or negative component. Interestingly, the behaviors that agents exhibit in simple approach or avoidance tasks tend to reflect species-specific behaviors (such as a rat gnawing on a handle that predicts food reward) (Breland and Breland 1961; Dayan et al. 2006; Rangel et al. 2008; Redish 2013).

Interactions and modulations

It is, of course, possible to construct experiments in which these measures interact with each other—for example, if one measures simple approach to a reward and then places a shock before it, one is measuring the willingness to pay (endure) that shock for that reward. Higher shock levels will (of course) lead to a lower likelihood of taking the option. One can also measure motivation in the modulations of these measures of value, such as the fact that a reminder of a potential reward in one context changes the revealed preferences toward that outcome (a phenomenon known as “Pavlovian-to-instrumental transfer” or PIT, Kruse et al. 1983; Corbit and Janak 2007; Talmi et al. 2008). As another example, it is possible to change the dimensions on which options are compared in a revealed preference task by guiding attention (Plous 1993; Gilovich et al. 2002; Hill 2008), by changing one’s emotional state (Dutton and Aron 1974; Andrade and Ariely 2009), or by making one of the options more concrete (Peters and Büchel 2010; Benoit et al. 2011).

2 Taking the Subject’s Point of View

The problem with the logic laid out at the start of this chapter is that it is derived from the experimenter’s point of view—it assumes that each of the three experimental paradigms measures something explicitly different. (This gets particularly complicated when we start to look at interactions and modulations.) Rather than working backwards from the behavioral experiments to hypothesized constructs, let us take the subject’s point of view to first consider how each of these means of measuring value reflects aspects of the agent’s decision-making process and then work forward from current taxonomies of the decision-making systems.

2.1 Information Processing in Decision-Making Systems

Decisions arise from information processing applied to a combination of (1) information about the current world (processed inferences from perception), (2) past experience (memory, history), and (3) goals and motivations (valuation). Although it is hypothetically possible that all decisions arise from a single fundamental algorithm applied to these three inputs (such as the maximization of subjective value, Samuelson 1937; Glimcher et al. 2008; Kable and Glimcher 2009), this is not necessarily true. Computational analyses of information processing that include calculation time, memory storage and access requirements, and the willingness to generalize suggest that different algorithms will be optimal in different situations (O’Keefe and Nadel 1978; Nadel 1994; Daw et al. 2005; Niv et al. 2006b; Rangel et al. 2008; Redish et al. 2008; van der Meer et al. 2012; Montague et al. 2012; Redish 2013).

Current analyses of the information processing that occurs within decision making suggest a taxonomy of three different processes:1 deliberation between imagined options, procedural action chains, and Pavlovian action-selection systems (Rangel et al. 2008; Redish et al. 2008; Montague et al. 2012; van der Meer et al. 2012; Redish 2013). As a first-order description, the three ways of measuring value tap into each of these three systems. However, the information processing that goes into each of these three systems implies a complexity of valuation that will need to be addressed. Additionally, the three systems can interact to lead to interesting situational dependencies of valuation.

2.1.1 Algorithms of Revealed Preference (Deliberation)

Computationally, deliberation entails the imagination and evaluation (comparison) of future outcomes, a process related to episodic future thinking. Episodic future thinking is the ability to imagine oneself into a specific potential future (Atance and O’Neill 2001; Buckner and Carroll 2007). Humans with hippocampal lesions do not create fully integrated episodic futures (Hassabis et al. 2007), and creating an integrated imagined episodic future activates prefrontal cortex and hippocampus as revealed by fMRI signals (Hassabis and Maguire 2011; Schacter et al. 2008; Schacter and Addis 2011). In rats, the primary evidence for episodic future thinking lies in decoding sequences of firing in hippocampal place cells, which have been revealed to fire in a sequence representing a serial path to the next goal (Pfeiffer and Foster 2013; Wikenheiser and Redish 2015). At choice points, rats will sometimes pause (Tolman 1932) and the hippocampus will represent the sequences to potential goals serially (i.e., episodically) (Johnson and Redish 2007).

Once the representation of that future is created, it must be evaluated. In rats, this evaluation process depends on the ventral striatum or nucleus accumbens (Smith et al. 2009; van der Meer and Redish 2011; Jones et al. 2012). In humans, this evaluation process is thought to include the orbitofrontal and ventromedial prefrontal cortex (O’Doherty et al. 2001; O’Doherty 2004; Coricelli et al. 2005; Hare et al. 2011; Winecoff et al. 2013); however, these interpretations have depended on fMRI signals, which do not have the resolution to determine whether these signals occur early enough to be actually involved in the decision process itself. In rats, fine time-scale analyses have found that orbitofrontal representations do not represent future goals until after a decision has been reached (Steiner and Redish 2012; Stott and Redish 2014). It remains unknown whether the human cortical components are also only active post-decision, or whether this difference is a species-specific difference. Species-specific differences can occur through how the computation is performed (humans tend to be more cortically dependent than rats, Striedter 2005), or they can occur because the tasks being used are different (spatial vs. non-spatial), or they can occur due to anatomical differences in the source of the signal (e.g., medial vs. lateral orbitofrontal cortex).

Nevertheless, the basic computation underlying deliberation is clear: To choose between options A and B, specific potential futures incorporating each option are imagined, evaluated, and compared. This computation requires sufficient understanding of the world to search forward to those imagined outcomes. (Thus, this decision process is often termed “model-based,” as it uses the model of the world to create a forward/imagined outcome.) Deliberation allows for fast learning because it is inherently flexible (knowing that option A will lead to consequence α does not require one to take option A), but it is also slow and computationally expensive because one needs to infer consequence α from one’s knowledge of the world.
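
As an illustration of this “model-based” computation, the following sketch searches a toy world model forward from a choice point and evaluates each imagined future. The model structure, rewards, and function names are our own illustrative assumptions, not an implementation from the cited work.

```python
def deliberate(state, model, reward, depth):
    """Model-based lookahead: imagine each action's future and evaluate it.

    model[state][action] -> next state; reward[state] -> payoff at a state.
    Returns the best (action, value) found by searching `depth` steps ahead.
    Purely illustrative: real deliberation is serial, noisy, and pruned.
    """
    if depth == 0 or state not in model:
        return None, reward.get(state, 0.0)
    best_action, best_value = None, float("-inf")
    for action, next_state in model[state].items():
        _, future = deliberate(next_state, model, reward, depth - 1)
        value = reward.get(state, 0.0) + future  # evaluate the imagined path
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value

# Hypothetical two-arm maze: left leads to food, right to nothing.
model = {"choice": {"left": "food_arm", "right": "empty_arm"}}
reward = {"food_arm": 1.0, "empty_arm": 0.0}
print(deliberate("choice", model, reward, depth=2))  # ('left', 1.0)
```

The flexibility the text describes falls out of this structure: changing one entry in `reward` immediately changes the choice, without any retraining.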

At this time, the selection process that underlies deliberation remains unknown: Is it a test-and-evaluate system in which hypotheses are generated serially and the first one that is good enough is chosen, or is there a direct comparison between options?2 Importantly, the inconsistency between revealed-preference and willingness-to-pay measures of value suggests that deliberation depends on a more direct comparison between options. A serial test-and-evaluate system should produce similar measures whether one or two options are offered, which would lead to similar valuations between the revealed-preference and willingness-to-pay measures, but, as noted at the top of this chapter, these valuation measures often reveal incompatible valuation orderings.

Humans, other primates, and rats have all been observed to change their choices dependent on the set of available choices—the set of choices available changes the options chosen, sometimes incompatibly (such as in the classic case of extremeness aversion, where humans tend to select the middle option, Simonson and Tversky 1992, see Gallistel 1990; Plous 1993; Tremblay and Schultz 1999; Padoa-Schioppa 2009; Rangel and Clithero 2012). Such incompatible choices are cases of intransitivity, in which choice A is chosen over C, C over B, and B over A. Neurophysiological recordings have found that neural representations of value in monkeys are transitive within a block, but not between blocks, a process known as “renormalization” (Tremblay and Schultz 1999; Padoa-Schioppa 2009). One process that can create this effect is for value to be normalized within a block (i.e., value is divided by a function of the average or maximum value available within that block). Neural representations are known to depend on excitatory–inhibitory networks with content-addressable properties, in which inhibitory networks limit the total activation of the excitatory cells (Hertz et al. 1991). Renormalization would be an obvious consequence of these excitatory–inhibitory networks.
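
A minimal sketch of how such within-block divisive normalization could produce value representations that are transitive within a block but incomparable between blocks; the offers and numbers are hypothetical.

```python
def renormalize(block_values):
    """Divisive normalization within a block of offers.

    Each raw value is scaled by the maximum value available in the block,
    so the same offer maps onto different normalized 'values' in different
    blocks -- a minimal sketch of the renormalization described above.
    """
    peak = max(block_values.values())
    return {offer: v / peak for offer, v in block_values.items()}

# The same juice offer (raw value 2) looks large in a lean block and
# small in a rich block: transitive within, but not between, blocks.
print(renormalize({"water": 1, "juice": 2}))        # juice -> 1.0
print(renormalize({"juice": 2, "grape_juice": 8}))  # juice -> 0.25
```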

Algorithmically, the fact that attention to specific features changes the selected option in a multivalued comparison (Plous 1993; Hill 2008; Hare et al. 2011) further suggests that deliberation entails a direct comparison between options. This would, of course, make the deliberative process non-transitive. In deliberation, a specific future is imagined and options within that future are compared (Johnson et al. 2007; Schacter et al. 2008). Imagination, like great art, consists of painting a few specific strokes—thus, the imagined comparison of A and B might focus on one dimension that the two options share, while the comparison of B and C may focus on another dimension, and the comparison of A and C on yet a third.

Further support for the idea that the selection process in deliberation is an actual comparison comes from the fact that concrete futures that are easier to imagine (Trope and Liberman 2003; Schacter et al. 2008) are preferred (Peters and Büchel 2010; Benoit et al. 2011). Consistent with this comparison hypothesis, working memory is related to deliberative abilities—agents with better working memory abilities are more likely to deliberate when given the chance (Burks et al. 2009; Bickel et al. 2011), consider more options (Franco-Watkins et al. 2006), and look further into the future (Bickel et al. 2011). These effects are a direct prediction of the search–evaluate–compare model of deliberative decision making (Kurth-Nelson and Redish 2012).

2.1.2 Algorithms of Willingness to Pay (Procedural)

Deliberation is a slow and laborious process; if you have to act quickly, or if the situation is not changing, then it would be more efficacious and more efficient to cache the best action, so it can be directly recalled. Algorithmically, this process entails a combination of recognition (categorization) processes and associated action chains (Klein 1999; Redish et al. 2007; Dezfouli and Balleine 2012; Redish 2013).
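
For contrast with the deliberative sketch above, here is a minimal model-free caching scheme: the worth of each state–action pair is learned as a running average of experienced outcomes via a delta-rule update (the one-step special case of temporal-difference learning), and recall is a cheap lookup. The task, learning rate, and labels are illustrative assumptions.

```python
import random

def td_update(q, state, action, reward, alpha=0.1):
    """Cache the worth of taking `action` in `state` with a delta-rule
    update: no world model is needed, just repeated experience."""
    key = (state, action)
    q[key] = q.get(key, 0.0) + alpha * (reward - q.get(key, 0.0))

q = {}
for _ in range(1000):  # with enough repetition, the best action is cached
    action = random.choice(["press_lever", "groom"])
    reward = 1.0 if action == "press_lever" else 0.0
    td_update(q, "lever_available", action, reward)

# Recall is a fast lookup -- but inflexible if the world changes,
# because the cache must be relearned through new experience.
print(max(q, key=q.get))  # ('lever_available', 'press_lever')
```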

The recognition process is a form of categorization and parameterization of the world. Anatomically, this categorization can be seen in cortical signals that integrate information to identify the most likely situation (Yang and Shadlen 2007; Redish et al. 2007). Once the parameterized situation has been identified, an action chain can be released through learned mechanisms (Jog et al. 1999; Dezfouli and Balleine 2012; Smith and Graybiel 2013).

In much of the literature, the procedural system is identified as “model-free” because it does not require a model of the transitions that can occur within the world (Daw et al. 2005; Gläscher et al. 2010; Lee et al. 2014), but this is a misnomer. The recognition component of the procedural system requires development of a schema that defines the parameters of the task (Charness 1991; Klein 1999; Redish et al. 2007; Redish 2013; Schmidhuber 2014). An incorrect parameterization will prevent learning and severely reduce the efficacy of the actions selected. As noted by Klein (1999), these learned schemas are a form of expertise, which requires an implicit model of the structure of the world.

Economically, the question asked by the procedural system is fundamentally different from that asked by the deliberative system—the deliberative system asks “Which choice is better?”, while the procedural system asks “How sure am I that this is the right action at this time? How expensive is it?”

As with the deliberative system, attention can modulate procedural actions, by identifying the specific cues and parameters that define a situation (Klein 1999; Redish et al. 2007). And motivational components can guide attention to specific aspects of a given situation.

2.1.3 Algorithms of Approach and Avoidance (Pavlovian)

Computationally, the third decision-making system entails stored situation-recognition and associative processes that release species-specific behaviors (see Rangel et al. 2008 and Redish 2013, for review).3 For example, Pavlov’s dogs learned to salivate on hearing the bell using this system, but could not have learned to apply an arbitrary action without using one of the other systems. Anatomically, this system depends on associations made within the amygdala (LeDoux 2000; Janak and Tye 2015) driving species-specific behaviors in the periaqueductal gray (Bandler and Shipley 1994; McNally et al. 2011), as well as simple approach/avoidance behaviors involving the shell of the accumbens (Flagel et al. 2009; Laurent et al. 2012; Robinson et al. 2014). The specific survival circuits that underlie individual species-specific behaviors are likely to access different neural substrates (LeDoux 2012), but the general Pavlovian learning system can be seen as a valuation system in its own right, depending on different learning and valuation algorithms than deliberative or procedural processes (Rangel et al. 2008; Redish 2013).

As with the deliberative and procedural processes, Pavlovian action-selection systems reveal an underlying valuation process, because response strength is modulated by the magnitude of the available reward and by the needs of the animal. Classic species-specific behaviors include approach and avoidance, freezing or grooming, fighting or fleeing, eating or not, etc. When these behaviors are put in conflict, they can create a value hierarchy: approaching the palatable food, avoiding the unpalatable, fleeing from a larger opponent, fighting a smaller one. Such experimentally imposed behavioral conflicts are representative of the ubiquitous approach-avoidance conflicts an animal faces in nature as it navigates in pursuit of good things while avoiding threats. Since an animal cannot simultaneously approach two good things or avoid two bad things in opposite directions, valuation (relative valuation) may be a necessary part of natural approach-avoidance decisions, and the needs or learning of the animal may modulate these valuations.

Importantly, this value hierarchy is only revealed in situations that activate the Pavlovian action-selection system, which is based on immediately available sensory cues (such as the sound of Pavlov’s bell or the smell of baking bread). This increased valuation of immediate cues can modulate other behaviors (providing, for example, a preference for immediately available, concrete options4).

2.1.4 The Role of Motivation

An important (and open) question is whether there is a separate motivation system that modulates all three of these decision-making systems (deliberative, procedural, and Pavlovian) or whether each of the systems has its own specific modulation system.

Most economic theories suggest that there is a separate valuation system that is called upon by action-selection systems. This separate system is usually identified with the orbitofrontal and ventromedial prefrontal cortices and the nucleus accumbens, modulated by dopamine and other neuromodulator signals (Doya 2000; O’Doherty 2004; Dayan and Niv 2008; van der Meer et al. 2012; Winecoff et al. 2013). Obviously, some (presumably hypothalamic) systems need to identify the intrinsic needs of the animal (such as those indicated by hunger or thirst), and there is evidence that this motivational system is sensitive to cues (pictures of pizza make us hungry) and to learning (such as in conditioned taste-aversion, which is why hospitals provide strangely flavored foods to chemotherapy patients, Bernstein 1978, 1999). The complexity of how and when motivational factors are learned or can occur in response to visceral sensations of homeostatic changes is beyond the scope of this chapter, but can be found in other chapters in this volume, including Waltz and Gold, and Woods and Begg.

2.1.5 The Macro-Agent

Although these decision-making systems entail different computational processes, in the end, there is a single agent that needs to take the action. An open question in the field of decision making is how conflicts between these different decision-making systems are resolved. It is not clear yet whether there is a separate executive that selects between systems or whether there is a mechanism by which the components directly compete for behavioral expression, for example, by intrinsic components within each system that make it more or less likely to be “listenable to” by downstream motor areas.

Most theories have suggested that the decision of which subsystem is allowed to drive behavior depends on an external valuation system that takes the most-valued option from each of the components (Levy and Glimcher 2012; Wunderlich et al. 2012, but see van der Meer et al. 2012), but the inconsistency of value under the different experimental paradigms laid out at the start of this chapter belies this hypothesis. More likely, some other parameter is being used to decide between systems. Suggestions have included expected calculation time (Simon 1955; Gigerenzer and Goldstein 1996; Keramati et al. 2011), or an internal representation of reliability or uncertainty in the valuation calculation (Daw et al. 2005). Interestingly, reliability in valuation could be represented by the intrinsic self-consistency of the value representation, which can vary due to the distributed nature of representation in neural systems. Because neural representations are distributed, the activity of individual neurons within a representation can agree with each other about what is specifically represented or they can disagree with each other. It is possible to quantitatively measure the self-consistency of those representations, and it is possible for downstream structures to use that self-consistency to control the influence of a representation on the activity of that downstream structure (Jackson and Redish 2003; Johnson et al. 2008).
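
A toy illustration of the self-consistency idea: score how well an ensemble’s momentary firing agrees with the firing expected at its own best-decoded value. The decoding scheme, names, and numbers here are illustrative stand-ins, not the published measure of Jackson and Redish (2003).

```python
import numpy as np

def self_consistency(activity, tuning):
    """Score agreement between an ensemble and a single decoded value.

    activity: firing rates of N cells; tuning: N x M expected rates of each
    cell at M candidate represented values. The score is the correlation
    between observed firing and the firing expected at the least-squares
    decoded value -- an illustrative stand-in for published measures.
    """
    errors = ((tuning - activity[:, None]) ** 2).sum(axis=0)
    best = np.argmin(errors)  # the best-matching candidate value
    return np.corrcoef(activity, tuning[:, best])[0, 1]

rng = np.random.default_rng(0)
tuning = rng.random((20, 50))           # 20 cells, 50 candidate values
coherent = tuning[:, 10] + 0.05 * rng.standard_normal(20)
scrambled = rng.permutation(coherent)   # same rates, wrong cells
print(self_consistency(coherent, tuning))   # near 1: cells agree
print(self_consistency(scrambled, tuning))  # lower: cells disagree
```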

However the process is resolved, it is clear that these different algorithmic processes are called upon (win out) under different experimental conditions. We can use this to explain the inconsistencies in the valuation function—a simple willingness-to-pay experiment can be solved by any of the three systems, but providing a choice forces the agent to deliberate, and making a choice concrete tends to access Pavlovian action-selection systems. Similarly, the act of perception and conscious reporting in a linguistic version of the willingness-to-pay task may drive the decision process into more deliberative cognitive processes that can change the valuation.

3 Testing the Theory

This multiple decision-making system explanation of valuation and the observed effects of motivation can explain a number of different phenomena that have been identified over the years.

What is motivated behavior?

The fact that these decision-making systems can create action plans that are in conflict and that the action plans take time to execute implies that there will be internal states of motivation in which an action is desired but not necessarily released yet. We can refer to this state as the “urge” to act.

Particularly intriguing in this light is dysfunction of motivated behaviors such as might occur in Tourette’s syndrome (Kurlan 1993; Leckman and Riddle 2000) or Obsessive-Compulsive Disorder (Goodman et al. 2000). Contrary to popular belief, Tourette’s and OCD do not manifest as dystonic actions that are released before the subject is aware of them, but rather they manifest as “urges” which can be suppressed (Kathmann et al. 2005; Maia and McClelland 2012). Eventually, the effort expended (presumably by one of the other systems) to suppress the urge becomes too much, the urge becomes overwhelming, and the macro-agent releases the motivated action.

Intriguingly, most of the dysfunctional behaviors seen in Tourette’s syndrome are Pavlovian in the sense used here (cursing5, facial expressions, etc.), while the control and suppression of them is effortful, conscious, cognitive, and depends on limited cognitive resources, which suggests a more deliberative process. However, some have found treatment for their Tourette’s tics by channeling them into rhythmic actions and action chains (a presumably procedural process) (Sacks 1985). Whether the motor tics of Tourette’s and other motoric dysfunctions with identifiable urges reflect a dysfunction in Pavlovian, procedural, or some motor component remains unknown.

Sign-tracking and goal-tracking

Another classic example of conflict between decision-making systems arises in a task in which a cue signals reward delivery but the animal does not need to do anything at the cue to receive the reward:6 some animals approach the cue (sign trackers), while others go directly to the reward (goal trackers) (Flagel et al. 2009). Sign trackers are presumably using Pavlovian action-selection systems to approach things that the system has identified as valuable, while goal trackers are using the cue to identify the appropriate action (using either deliberative or procedural systems). A direct prediction of this is that there should be specific neurophysiological differences between sign trackers and goal trackers, particularly in neural structures associated with valuation in the Pavlovian action-selection systems. Indeed, such differences exist: in sign trackers, but not goal trackers, transient dopamine bursts that initially occur at the time of reward transition to occurring at the time of the cue (i.e., reward-prediction-error signals shift to the cue) (Flagel et al. 2011; Lesaint et al. 2014). Does the sign have value? How does the sign motivate actions? It depends on which decision system is being used to take those actions in response to the sign.

Pavlovian and Deliberative Morality

Even social behavior is fundamentally about decision making, and even the most human behaviors (such as questions of morality) are driven by interactions between these multiple decision-making systems. One of the most interesting discoveries of the past several decades has been that human social interactions are fundamentally Pavlovian in the sense used in this chapter—they are species-specific behaviors that we learn the appropriate situations for (Singer et al. 2006, 2009; Hein et al. 2011; Greene 2013). Moral decisions (such as whether to allow one person to die to save five others, Greene et al. 2001, or whether to provide a shock to another person, Milgram 1974) depend greatly on the immediacy of the social interaction (Milgram 1974; Greene et al. 2004; Zak 2008; Haslam and Reicher 2012; Rand et al. 2012). Subjects are more likely to refuse to kill one person to save five others and more likely to refuse to shock another person if there is a social bond between them (Milgram 1974; Greene et al. 2001; Haslam and Reicher 2012). Manipulations that push subjects into more deliberative modes (such as forcing the subject to make the decision in a foreign language) make subjects more willing to apply utilitarian (and nonsocial) calculations (Hoffman et al. 1994; Sanfey 2007; Smith 2009; Costa et al. 2014).

Concrete preferences

A classic motivation task is to put immediate and future rewards in conflict with each other, for example, in the marshmallow task, in which a subject (usually a young child) is offered the choice of one marshmallow immediately or two marshmallows if the first marshmallow remains uneaten for 15 min (Mischel et al. 1989; Mischel 2014). It is likely that the marshmallow task is an example of conflict between Pavlovian and deliberative decision-making systems. While it is tempting to suggest that the ability to wait for the future is fundamentally deliberative, computational models suggest that both procedural and deliberative systems need to have their own discounting functions within them— preferring temporally proximate to temporally distant options (Sutton and Barto 1998; Kurth-Nelson and Redish 2012).

Looking at the algorithm underlying deliberation suggests that the fundamental reason for the preference for temporally proximate options in deliberation may reflect the ability to imagine and positively evaluate that future outcome (Kurth-Nelson and Redish 2012). One direct prediction of that hypothesis is that making the future option more concrete will make subjects more likely to choose it, effectively reducing the discounting function (making subjects discount time more slowly). Experiments have shown this to be the case (Peters and Büchel 2010, 2011; Benoit et al. 2011).
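
A sketch of this idea under a simple hyperbolic discounting form, V = A/(1 + kd), with an added “probability of successfully imagining the outcome” gate as our illustrative stand-in for concreteness; the parameter values are arbitrary.

```python
def discounted_value(reward, delay, k=0.02, p_imagine=1.0):
    """Hyperbolic discounting, V = A / (1 + k*d), gated by the probability
    that the future outcome is successfully imagined and evaluated.
    The p_imagine gate is a hypothetical stand-in for 'concreteness'."""
    return p_imagine * reward / (1.0 + k * delay)

# A vaguely imagined future reward loses to an immediate one; making the
# cue concrete (episodic specificity) tips the choice toward waiting.
now = discounted_value(1.0, delay=0)
vague = discounted_value(2.0, delay=30, p_imagine=0.5)
concrete = discounted_value(2.0, delay=30, p_imagine=1.0)
print(now, vague, concrete)  # 1.0  0.625  1.25
```

Raising `p_imagine` behaves exactly like shallowing the discount curve, which is one way to read the concreteness findings cited above.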

Attention to cues

Perception and valuation are fundamentally intertwined throughout the decision-making process. Valuation judgments accomplished by the deliberative, procedural, and Pavlovian processes of decision-making systems allow flexible decision making under multiple different conditions. Each of these processes, however, contains some form of situation-recognition step either prior to or as an early stage within the decision-making process. During this situation-recognition step, information is derived from the environmental, interoceptive, and proprioceptive cues that define the decision-making context: the identification of available goals to be sought, the potential threats to be avoided, and the state or needs of the agent, as well as numerous other features of the environment registered through varying degrees of attention, including specific details about objects or events and more abstract impressions such as social atmosphere.

The aspect of situation recognition in which available goals and threats are identified parallels the approach-avoidance decision-making process and is likely to be modulated by Pavlovian processes (Phelps et al. 2014). This can be seen in processes by which Pavlovian associations drive deliberative actions, such as Pavlovian-to-instrumental transfer (Corbit and Balleine 2005; Talmi et al. 2008), or in the influence of concrete options on decisions (Trope and Liberman 2003; Peters and Büchel 2010; Kang et al. 2011).

An agent’s perception of a given situation, in addition to the concept of value applied to items within the situation, is fundamentally multidimensional and depends both on the state of the agent as well as the features of the environment. An agent’s emotional or physiological state colors their percepts of both ongoing and remembered experiences. Sexual arousal, for example, changes choice behavior when choosing potential mates and influences what risky behaviors are considered acceptable (Wilson and Daly 2004; Ariely and Loewenstein 2006). Likewise, the emotional state of an individual influences an agent’s responses to a given situation (Dutton and Aron 1974; Andrade and Ariely 2009). Both physiological and emotional factors can be understood to reflect the needs of the agent and influence the decision-making process by impacting the perceived set of available or acceptable courses of action, the expected outcomes of those actions, and the costs that an agent is willing to endure for those outcomes.

In addition to information about one’s own state, sensory cues about external factors (the environment or situation) also fundamentally influence choice behavior by impinging on valuation. The features of an environment that color an agent’s experience of the situation (multidimensional situation recognition) must be integrated from a diversity of perceived situational constraints and available courses of action, including the pursuit or avoidance of goals and threats. The fact that environmental cues that are not relevant to the decision can nonetheless influence choice behavior suggests that situation recognition and the concomitant attention to cues necessary for categorizing situations are integral parts of the decision-making process itself.

The endowment effect

The price at which people are willing to sell an attained item is higher than the price at which people are willing to buy that same item (Kahneman et al. 1991). This effect may be another example of an interaction between decision systems affecting valuation. Specifically, it may be that the Pavlovian system provides a greater contribution to the valuation of an item when that item has already been obtained, a situation in which, most likely, the cues associated with it are more immediately apparent and concrete than they would be if it were not yet owned.

This logic might also explain the well-documented finding that patch-foraging animals (human and non-human) will deviate from the rate-maximizing behavior that is predicted by optimal foraging theory by “over-staying” in a patch of resources rather than leaving to search for a new patch (Nonacs 2001). An intriguing possibility is that the Pavlovian system induces longer patch residence by contributing to the valuation of staying (but not of leaving), much as it might when considering selling (but not buying). Interestingly, human participants have been found to display this overstaying bias during a computerized patch-foraging task in which the leave option did not require additional costs (e.g., energy and time spent traveling), suggesting that overstaying may occur due to an overvaluation of the impending reward rather than an undervaluation of leaving (Carter et al. 2015). A similar process may explain the observation that rats’ aversion to rejecting an offer is positively correlated with the overall quality of the offers in the environment, which is incompatible with optimal foraging theory and standard delay discounting models (Wikenheiser et al. 2013). One possibility is that there is an increased Pavlovian preference for staying at a patch of food (because impending food is cued). This preference may be increased in rich environments, in which rewards are more easily available.
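
To make the foraging logic concrete, here is a sketch of the rate-maximizing leave time (the marginal value theorem) with an optional Pavlovian “stay bonus” modeling cue-driven overvaluation of the impending reward; the gain function and all parameter values are illustrative assumptions.

```python
import math

def gain(t, r0=10.0, tau=5.0):
    """Cumulative reward from a depleting patch (diminishing returns)."""
    return r0 * (1.0 - math.exp(-t / tau))

def leave_time(travel, stay_bonus=0.0, dt=0.01, horizon=60.0):
    """Leave when the patch's instantaneous intake rate (plus any Pavlovian
    'stay' bonus) falls to the long-run average rate. With stay_bonus = 0
    this is the marginal value theorem; the bonus term is a hypothetical
    stand-in for cued overvaluation of staying."""
    t = dt
    while t < horizon:
        marginal = (gain(t + dt) - gain(t)) / dt + stay_bonus
        average = gain(t) / (t + travel)
        if marginal <= average:
            return t
        t += dt
    return horizon

print(leave_time(travel=10.0))                  # rate-maximizing departure
print(leave_time(travel=10.0, stay_bonus=0.2))  # overstays the optimum
```

In this toy, any positive bonus on staying delays departure past the rate-maximizing point, reproducing the overstaying bias without any mis-estimation of travel costs.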

4 Summary and Implications

In this chapter, we have argued that because valuation is a hypothetical construct, it cannot be directly measured, and it must be inferred from observed behaviors. Following from current theories of motivated behavior based on multiple interacting decision-making systems, we have argued that valuation is multidimensional. This multidimensionality complicates decision making, but these interactions can also be useful when targeting treatment and other interventions.

Craving

Craving, for example, is often treated as a paradigmatic case of motivation, but it is useful to ask what craving is within this computational conceptualization. Craving is the computational recognition of a potential outcome with very high value, which means that it must either come from the Pavlovian system (which recognizes a situation and outcome to release a motivated action related to that outcome—such as salivating to the expectation of food) or from the deliberative system (which imagines a future situation).

Fundamentally, craving is transitive: one must crave something. Craving is always goal-directed. Some researchers have suggested that craving is Pavlovian (in the sense used in this chapter) (Skinner and Aubin 2010) and others have suggested that it is deliberative (Tiffany 1999; Tiffany and Wray 2009; Redish and Johnson 2007; Redish et al. 2008). Craving may well be an interaction between the two systems—a deliberative (model-based) computation recognizes a path to a high-value goal, which leads to the motivation of Pavlovian approach.

Because relapse in addiction can occur from any of the three systems, but craving only from the Pavlovian or deliberative, craving should be dissociable from relapse (Sayette et al. 2000; Tiffany and Wray 2009; Redish et al. 2008; Redish 2009). It should be possible to relapse without craving and to crave without relapse. This follows directly from the observations that only drug-seeking arising from Pavlovian (and possibly deliberative) processes will co-occur with craving; procedural drug-seeking will not. This means that a true “habit,” one that does not devalue (Balleine and Dickinson 1998), and one that is often done “non-cognitively” (Tiffany 1990; Redish 2013), will likely be resistant to treatments aimed at reducing craving.

Contingency Management

In contrast, it might be possible to use the different decision-making systems to provide treatment. Another implication of the multi-decision-making system theory is that if we can shift the decision-making question from one valuation measure to another, we might be able to change the decision. One place in which this may be occurring is in the success of contingency management, a treatment used for drug addiction and other behavioral modification processes. In contingency management, a drug user is rewarded for coming in clean to the clinic7 (Petry 2011, and see Walter and Petry in this volume). Historically, the efficacy of contingency management has been explained as an alternate reward which increases the opportunity cost of drug use (Higgins et al. 2002; Stitzer and Petry 2006); however, this depends on how quickly drug-taking falls off as drugs increase in price, an economic concept known as elasticity. Drug-taking is generally far too inelastic to explain the success of contingency management (Bruner and Johnson 2013). We have recently suggested that contingency management provides an opportunity for the user to engage more deliberative decision processes in their decision making (Regier and Redish 2012).
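
To illustrate what inelastic demand means here, a sketch using a standard exponential demand form from behavioral economics (after the exponential demand equation of Hursh and Silberberg); the parameter values are arbitrary, and the point is only that when the elasticity magnitude is below 1, consumption falls proportionally less than price rises.

```python
import math

def demand(price, q0=100.0, alpha=0.001, k=2.0):
    """Exponential demand curve:
    log10 Q = log10 Q0 + k * (exp(-alpha * Q0 * price) - 1).
    All parameter values here are purely illustrative."""
    return q0 * 10 ** (k * (math.exp(-alpha * q0 * price) - 1.0))

def elasticity(price, dp=0.01):
    """Point price-elasticity: %-change in consumption per %-change in price."""
    q1, q2 = demand(price), demand(price + dp)
    return ((q2 - q1) / q1) / (dp / price)

# Inelastic demand: doubling a low price dents consumption by much less
# than 100%, so a modest alternative reward alone cannot explain the
# large drops in use seen under contingency management.
print(round(demand(1.0), 1), round(demand(2.0), 1))
print(round(elasticity(1.0), 3))  # magnitude below 1 => inelastic here
```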

This hypothesis suggests that alternate rewards that are easier to remember and to episodically imagine should provide stronger effects in contingency management. Making the reward more concrete should thus improve contingency management, as should making it more temporally proximal, or larger. Similarly, working memory training (and other methods that provide cognitive resources) that improves episodic future thinking should improve contingency management. A simple first step would be to provide explicit reminders.

To go back to the measures of value with which we began the chapter, we are suggesting that contingency management shifts the valuation of drugs from a willingness-to-pay (or an approach/avoid) valuation process to a revealed-preference valuation process. Animal drug self-administration experiments have found that shifting from willingness-to-pay or approach/avoid tasks to decision-between-options (revealed-preference) tasks reduces drug-taking and drug-seeking, even at very low costs (LeSage et al. 2004; Lenoir et al. 2007; Cantin et al. 2010; Ahmed 2010; Perry et al. 2013).

5 Is Value Still a Valuable Hypothetical Construct?

Because value is not directly measurable, it must be inferred from behavior. As noted above, valuation is inconsistent. In order to understand the underlying microeconomics of decision making, we need to take into account the information processing that underlies decision making in humans and other animals (Padoa-Schioppa 2008; Rangel et al. 2008; Redish 2013). We can explain the inconsistency of valuation through these separate information processing algorithms, each of which provides a different path to motivation and valuation. These theories suggest that the process of valuation may continue to be a useful construct but that the construct of value as a means of identifying a common currency may no longer be useful.

Footnotes

1

Actually, we have argued elsewhere (Redish 2013) that decision making needs to be defined by action selection and that one should thus include reflexes as well in our taxonomy of decision-making systems. However, it is unlikely that the valuation and motivation mechanisms discussed here influence the information processing that goes on in reflexes, and we will leave the Reflex system out of our analyses here.

2

Work on decision making under stressful situations (such as that of fireground commanders) suggests a serial test-and-evaluate system (Klein 1999), but it is unclear whether these experienced commanders are using deliberative or procedural mechanisms.

3

We have chosen to call this system the “Pavlovian” system because it is what Pavlov’s dogs were doing (an association between the bell and the food led to the prewired species-specific salivation behavior on hearing the bell, Pavlov 1927), but it should not be confused with classical definitions of Pavlovian learning based on the experimenter-defined task parameters (that an animal does not need to act in order to receive reward or punishment, Bouton 2007), nor should it be confused with the recent definitions of state [situation] versus state-action reinforcement-learning algorithms (Dayan et al. 2006; Cavanagh et al. 2013).

4

Tempting as it is to try to use this concrete modulation as an explanation for discounting phenomena (in which more temporally proximal options are preferred to equivalently valuable temporally distant options), the data suggest that all three systems have discounting effects within them that interact to produce the delay discounting phenomenon.

5

While a specific curse word is presumably not pre-wired within the human motor plan, all human languages have curse words that humans release at times of stress and pain and that are not supposed to be used in social company. This abstract behavior may well be a part of the species-specific human social construct.

6

This makes such a task “Pavlovian” in the classic sense of the word—the reward is delivered whether the animal acts or not. The fact that the task can be solved by any of the three systems described above and that animals act differently under each decision-making system shows the importance of looking at behavior from the animal’s perspective rather than the experimenter’s.

7

Usually this needs to be verified by a drug-negative urine sample.

Contributor Information

A. David Redish, Department of Neuroscience, University of Minnesota, Minneapolis, USA. redish@umn.edu

Nathan W. Schultheiss, Department of Neuroscience, University of Minnesota, Minneapolis, USA. nschulth@umn.edu

Evan C. Carter, Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, USA. evan.c.carter@gmail.com

References

  1. Ahmed SH. Validation crisis in animal models of drug addiction: beyond non-disordered drug use toward drug addiction. Neurosci Biobehav Rev. 2010;35(2):172–184. doi: 10.1016/j.neubiorev.2010.04.005. [DOI] [PubMed] [Google Scholar]
  2. Andrade EB, Ariely D. The enduring impact of transient emotions on decision making. Organ Behav Hum Decis Process. 2009;109(1):1–8. [Google Scholar]
  3. Ariely D, Loewenstein G. The heat of the moment: the effect of sexual arousal on sexual decision making. J Behav Decis Mak. 2006;19:87–98. [Google Scholar]
  4. Atance CM, O’Neill DK. Episodic future thinking. Trends Cogn Sci. 2001;5(12):533–539. doi: 10.1016/s1364-6613(00)01804-0. [DOI] [PubMed] [Google Scholar]
  5. Balleine BW, Dickinson A. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology. 1998;37(4–5):407–419. doi: 10.1016/s0028-3908(98)00033-1. [DOI] [PubMed] [Google Scholar]
  6. Bandler R, Shipley MT. Columnar organization in the midbrain periaqueductal gray: modules for emotional expression? Trends Neurosci. 1994;17(9):379–389. doi: 10.1016/0166-2236(94)90047-7. [DOI] [PubMed] [Google Scholar]
  7. Benoit RG, Gilbert SJ, Burgess PW. A neural mechanism mediating the impact of episodic prospection on farsighted decisions. J Neurosci. 2011;31(18):6771–6779. doi: 10.1523/JNEUROSCI.6559-10.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Bernstein IL. Learned taste aversions in children receiving chemotherapy. Science. 1978;200(4347):1302–1303. doi: 10.1126/science.663613. [DOI] [PubMed] [Google Scholar]
  9. Bernstein IL. Taste aversion learning: a contemporary perspective. Nutrition. 1999;15(3):229–234. doi: 10.1016/s0899-9007(98)00192-0. [DOI] [PubMed] [Google Scholar]
  10. Bickel WK, Yi R, Landes RD, Hill PF, Baxter C. Remember the future: working memory training decreases delay discounting among stimulant addicts. Biol Psychiatry. 2011;69(3):260–265. doi: 10.1016/j.biopsych.2010.08.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Bouton ME. Learning and behavior: a contemporary synthesis. Sinauer Associates, Massachusetts. 2007 [Google Scholar]
  12. Breland K, Breland M. The misbehavior of organisms. Am Psychol. 1961;16(11):682–684. [Google Scholar]
  13. Bruner N, Johnson M. Demand curves for hypothetical cocaine in cocaine-dependent individuals. Psychopharmacology. 2013:1–9. doi: 10.1007/s00213-013-3312-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Buckner RL, Carroll DC. Self-projection and the brain. Trends Cogn Sci. 2007;11(2):49–57. doi: 10.1016/j.tics.2006.11.004. [DOI] [PubMed] [Google Scholar]
  15. Burks SV, Carpenter JP, Goette L, Rustichini A. Cognitive skills affect economic preferences, strategic behavior, and job attachment. Proc Nat Acad Sci. 2009;106(19):7745–7750. doi: 10.1073/pnas.0812360106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Cantin L, Lenoir M, Augier E, Vanhille N, Dubreucq S, Serre F, Vouillac C, Ahmed SH. Cocaine is low on the value ladder of rats: possible evidence for resilience to addiction. PLoS ONE. 2010;5(7):e11592. doi: 10.1371/journal.pone.0011592. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Carter EC, Pedersen EJ, McCullough ME. Reassessing intertemporal choice: human decision-making is more optimal in a foraging task than in a self-control task. Frontiers Psychol Decis Neurosci. 2015;6:95. doi: 10.3389/fpsyg.2015.00095. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Cavanagh JF, Eisenberg I, Guitart-Masip M, Huys Q, Frank MJ. Frontal theta overrides Pavlovian learning biases. J Neurosci. 2013;33(19):8541–8548. doi: 10.1523/JNEUROSCI.5754-12.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Charness N. Expertise in chess: the balance between knowledge and search. In: Ericsson KA, Smith J, editors. Toward a general theory of expertise: prospects and limits (Chap. 2) Cambridge University Press; Cambridge: 1991. pp. 39–63. [Google Scholar]
  20. Corbit LH, Balleine BW. Double dissociation of basolateral and central amygdala lesions on the general and outcome-specific forms of Pavlovian-instrumental transfer. J Neurosci. 2005;25(4):962–970. doi: 10.1523/JNEUROSCI.4507-04.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Corbit LH, Janak PH. Inactivation of the lateral but not medial dorsal striatum eliminates the excitatory impact of Pavlovian stimuli on instrumental responding. J Neurosci. 2007;27(51):13977–13981. doi: 10.1523/JNEUROSCI.4097-07.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Coricelli G, Critchley HD, Joffily M, O’Doherty JP, Sirigu A, Dolan RJ. Regret and its avoidance: a neuroimaging study of choice behavior. Nat Neurosci. 2005;8:1255–1262. doi: 10.1038/nn1514. [DOI] [PubMed] [Google Scholar]
  23. Costa A, Foucart A, Hayakawa S, Aparici M, Apesteguia J, Heafner J, Keysar B. Your morals depend on language. PLoS ONE. 2014;9(4):e94842. doi: 10.1371/journal.pone.0094842. doi:10.1371/journal.pone.0094842. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Daw ND, Niv Y, Dayan P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci. 2005;8:1704–1711. doi: 10.1038/nn1560. [DOI] [PubMed] [Google Scholar]
  25. Dayan P, Niv Y. Reinforcement learning: the good, the bad and the ugly. Curr Opin Neurobiol. 2008;18(2):185–196. doi: 10.1016/j.conb.2008.08.003. [DOI] [PubMed] [Google Scholar]
  26. Dayan P, Niv Y, Seymour B, Daw ND. The misbehavior of value and the discipline of the will. Neural Networks. 2006;19:1153–1160. doi: 10.1016/j.neunet.2006.03.002. [DOI] [PubMed] [Google Scholar]
  27. Dezfouli A, Balleine B. Habits, action sequences and reinforcement learning. Eur J Neurosci. 2012;35(7):1036–1051. doi: 10.1111/j.1460-9568.2012.08050.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Doya K. In: Metalearning, neuromodulation, and emotion. Hatano G, Okada N, Tanabe H, editors. Affective Minds; Elsevier, Amsterdam: 2000. [Google Scholar]
  29. Dutton DG, Aron AP. Some evidence for heightened sexual attraction under conditions of high anxiety. J Pers Soc Psychol. 1974;30(4):510–517. doi: 10.1037/h0037031. [DOI] [PubMed] [Google Scholar]
  30. Flagel SB, Akil H, Robinson TE. Individual differences in the attribution of incentive salience to reward-related cues: implications for addiction. Neuropharmacology. 2009;56(Suppl. 1):139–148. doi: 10.1016/j.neuropharm.2008.06.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Flagel SB, Clark JJ, Robinson TE, Mayo L, Czuj A, Willuhn I, Akers CA, Clinton SM, Phillips PEM, Akil H. A selective role for dopamine in stimulus-reward learning. Nature. 2011;469(7328):53–57. doi: 10.1038/nature09588. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Franco-Watkins AM, Pashler H, Rickard TC. Does working memory load lead to greater impulsivity? Commentary on Hinson, Jameson and Whitney (2003) J Exp Psychol Learn Mem Cogn. 2006;32(2):443–447. doi: 10.1037/0278-7393.32.2.443. [DOI] [PubMed] [Google Scholar]
  33. Gallistel CR. The organization of learning. MIT Press; Cambridge: 1990. [Google Scholar]
  34. Gigerenzer G, Goldstein DG. Reasoning the fast and frugal way: models of bounded rationality. Psychol Rev. 1996;103:650–669. doi: 10.1037/0033-295x.103.4.650. [DOI] [PubMed] [Google Scholar]
  35. Gilovich T, Griffin D, Kahneman D. Heuristics and biases: the psychology of intuitive judgement. Cambridge University Press; Cambridge: 2002. [Google Scholar]
  36. Gläscher J, Daw N, Dayan P, O’Doherty JP. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron. 2010;66(4):585–595. doi: 10.1016/j.neuron.2010.04.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Glimcher PW. Decisions, uncertainty, and the brain: the science of neuroeconomics. MIT Press; Cambridge: 2003. [Google Scholar]
  38. Glimcher PW, Camerer C, Poldrack RA. Neuroeconomics: decision making and the brain. Academic Press; Massachusetts: 2008. [Google Scholar]
  39. Goldstein A. Addiction: from biology to drug policy. Oxford; New York: 2000. [Google Scholar]
  40. Goodman WK, Rudorfer MV, Maser JD. Obsessive-compulsive disorder: contemporary issues in treatment. Lawrence Earlbaum; Hillsdale: 2000. [Google Scholar]
  41. Greene J. Moral tribes: emotion, reason, and the gap between us and them. Penguin. 2013 [Google Scholar]
  42. Greene JD, Sommerville RB, Nystrom LE, Darley JM, Cohen JD. An fMRI investigation of emotional engagement and moral judgement. Science. 2001;293(5537):2105–2108. doi: 10.1126/science.1062872. [DOI] [PubMed] [Google Scholar]
  43. Greene JD, Nystrom LE, Engell AD, Darley JM, Cohen JD. The neural basis of cognitive conflict and control in moral judgement. Neuron. 2004;44(2):389–400. doi: 10.1016/j.neuron.2004.09.027. [DOI] [PubMed] [Google Scholar]
44. Hare TA, Malmaud J, Rangel A. Focusing attention on the health aspects of foods changes value signals in vmPFC and improves dietary choice. J Neurosci. 2011;31(30):11077–11087. doi: 10.1523/JNEUROSCI.6383-10.2011.
45. Haslam SA, Reicher SD. Contesting the “nature” of conformity: what Milgram and Zimbardo’s studies really show. PLoS Biol. 2012;10(11):e1001426. doi: 10.1371/journal.pbio.1001426.
46. Hassabis D, Maguire EA. The construction system in the brain. In: Bar M, editor. Predictions in the brain: using our past to generate a future. Oxford University Press; Oxford: 2011. pp. 70–82.
47. Hassabis D, Kumaran D, Vann SD, Maguire EA. Patients with hippocampal amnesia cannot imagine new experiences. Proc Natl Acad Sci USA. 2007;104:1726–1731. doi: 10.1073/pnas.0610561104.
48. Hein G, Lamm C, Brodbeck C, Singer T. Skin conductance response to the pain of others predicts later costly helping. PLoS ONE. 2011;6(8):e22759. doi: 10.1371/journal.pone.0022759.
49. Hertz J, Krogh A, Palmer RG. Introduction to the theory of neural computation. Addison-Wesley; Reading: 1991.
50. Higgins ST, Alessi SM, Dantona RL. Voucher-based incentives: a substance abuse treatment innovation. Addict Behav. 2002;27:887–910. doi: 10.1016/s0306-4603(02)00297-6.
51. Hill C. The rationality of preference construction (and the irrationality of rational choice). Minn J Law Sci Technol. 2008;9(2):689–742.
52. Hoffman E, McCabe K, Shachat K, Smith V. Preferences, property rights, and anonymity in bargaining games. Game Econ Behav. 1994;7:346–380.
53. Jackson JC, Redish AD. Detecting dynamical changes within a simulated neural ensemble using a measure of representational quality. Network Comput Neural Syst. 2003;14:629–645.
54. Janak P, Tye K. From circuits to behaviour in the amygdala. Nature. 2015;517:284–292. doi: 10.1038/nature14188.
55. Jog MS, Kubota Y, Connolly CI, Hillegaart V, Graybiel AM. Building neural representations of habits. Science. 1999;286:1745–1749. doi: 10.1126/science.286.5445.1745.
56. Johnson A, Redish AD. Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. J Neurosci. 2007;27(45):12176–12189. doi: 10.1523/JNEUROSCI.3761-07.2007.
57. Johnson A, van der Meer MAA, Redish AD. Integrating hippocampus and striatum in decision-making. Curr Opin Neurobiol. 2007;17(6):692–697. doi: 10.1016/j.conb.2008.01.003.
58. Johnson A, Jackson J, Redish AD. Measuring distributed properties of neural representations beyond the decoding of local variables—implications for cognition. In: Hölscher C, Munk MHJ, editors. Mechanisms of information processing in the brain: encoding of information in neural populations and networks. Cambridge University Press; Cambridge: 2008. pp. 95–119.
59. Jones JL, Esber GR, McDannald MA, Gruber AJ, Hernandez A, Mirenzi A, Schoenbaum G. Orbitofrontal cortex supports behavior and learning using inferred but not cached values. Science. 2012;338(6109):953–956. doi: 10.1126/science.1227489.
60. Kable JW, Glimcher PW. The neurobiology of decision: consensus and controversy. Neuron. 2009;63(6):733–745. doi: 10.1016/j.neuron.2009.09.003.
61. Kahneman D, Tversky A. Choices, values, and frames. Cambridge University Press; Cambridge: 2000.
62. Kahneman D, Knetsch JL, Thaler RH. The endowment effect, loss aversion, and status quo bias. J Econ Perspect. 1991;5(1):193–206.
63. Kang MJ, Rangel A, Camus M, Camerer CF. Hypothetical and real choice differentially activate common valuation areas. J Neurosci. 2011;31(2):461–468. doi: 10.1523/JNEUROSCI.1583-10.2011.
64. Kathmann N, Rupertseder C, Hauke W, Zaudig M. Implicit sequence learning in obsessive-compulsive disorder: further support for the fronto-striatal dysfunction model. Biol Psychiatry. 2005;58(3):239–244. doi: 10.1016/j.biopsych.2005.03.045.
65. Keramati M, Dezfouli A, Piray P. Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Comput Biol. 2011;7(5):e1002055. doi: 10.1371/journal.pcbi.1002055.
66. Klein G. Sources of power: how people make decisions. MIT Press; Cambridge: 1999.
67. Kruse JM, Overmier JB, Konz WA, Rokke E. Pavlovian conditioned stimulus effects upon instrumental choice behavior are reinforcer specific. Learn Motiv. 1983;14(2):165–181.
68. Kurlan R, editor. Handbook of Tourette’s syndrome and related tic and behavioral disorders. Marcel Dekker; New York: 1993.
69. Kurth-Nelson Z, Redish AD. A theoretical account of cognitive effects in delay discounting. Eur J Neurosci. 2012;35:1052–1064. doi: 10.1111/j.1460-9568.2012.08058.x.
70. Laurent V, Leung B, Maidment N, Balleine BW. μ- and δ-opioid-related processes in the accumbens core and shell differentially mediate the influence of reward-guided and stimulus-guided decisions on choice. J Neurosci. 2012;32(5):1875–1883. doi: 10.1523/JNEUROSCI.4688-11.2012.
71. Leckman JF, Riddle MA. Tourette’s syndrome: when habit-forming systems form habits of their own? Neuron. 2000;28:349–354. doi: 10.1016/s0896-6273(00)00114-8.
72. LeDoux JE. Emotion circuits in the brain. Annu Rev Neurosci. 2000;23:155–184. doi: 10.1146/annurev.neuro.23.1.155.
73. LeDoux JE. Rethinking the emotional brain. Neuron. 2012;73:653–676. doi: 10.1016/j.neuron.2012.02.004.
74. Lee SW, Shimojo S, O’Doherty JP. Neural computations underlying arbitration between model-based and model-free learning. Neuron. 2014;81(3):687–699. doi: 10.1016/j.neuron.2013.11.028.
75. Lenoir M, Serre F, Cantin L, Ahmed SH. Intense sweetness surpasses cocaine reward. PLoS ONE. 2007;2(8):e698. doi: 10.1371/journal.pone.0000698.
76. LeSage MG, Burroughs D, Dufek M, Keyler DE, Pentel PR. Reinstatement of nicotine self-administration in rats by presentation of nicotine-paired stimuli, but not nicotine priming. Pharmacol Biochem Behav. 2004;79(3):507–513. doi: 10.1016/j.pbb.2004.09.002.
77. Lesaint F, Sigaud O, Flagel SB, Robinson TE, Khamassi M. Modelling individual differences in the form of Pavlovian conditioned approach responses: a dual learning systems approach with factored representations. PLoS Comput Biol. 2014;10(2):e1003466. doi: 10.1371/journal.pcbi.1003466.
78. Levy DJ, Glimcher PW. The root of all value: a neural common currency for choice. Curr Opin Neurobiol. 2012;22:1–12. doi: 10.1016/j.conb.2012.06.001.
79. Lichtenstein S, Slovic P. The construction of preference. Cambridge University Press; Cambridge: 2006.
80. MacCorquodale K, Meehl PE. Edward C. Tolman. In: Estes W, editor. Modern learning theory. Appleton-Century-Crofts; New York: 1954. pp. 177–266.
81. Maia TV, McClelland JL. A neurocomputational approach to obsessive-compulsive disorder. Trends Cogn Sci. 2012;16(1):14–15. doi: 10.1016/j.tics.2011.11.011.
82. McNally GP, Johansen JP, Blair HT. Placing prediction into the fear circuit. Trends Neurosci. 2011;34(6):283–292. doi: 10.1016/j.tins.2011.03.005.
83. Milgram S. Obedience to authority: an experimental view. Harper Collins; New York: 1974/2009.
84. Mischel W. The marshmallow test: mastering self-control. Little, Brown, and Co; New York: 2014.
85. Mischel W, Shoda Y, Rodriguez ML. Delay of gratification in children. Science. 1989;244(4907):933–938. doi: 10.1126/science.2658056.
86. Montague PR, Dolan RJ, Friston KJ, Dayan P. Computational psychiatry. Trends Cogn Sci. 2012;16(1):72–80. doi: 10.1016/j.tics.2011.11.018.
87. Nadel L. Multiple memory systems: what and why, an update. In: Schacter DL, Tulving E, editors. Memory systems 1994. MIT Press; Cambridge: 1994. pp. 39–64.
88. Niv Y, Daw ND, Dayan P. Choice values. Nat Neurosci. 2006a;9:987–988. doi: 10.1038/nn0806-987.
89. Niv Y, Joel D, Dayan P. A normative perspective on motivation. Trends Cogn Sci. 2006b;10(8):375–381. doi: 10.1016/j.tics.2006.06.010.
90. Nonacs P. State dependent patch use and the marginal value theorem. Behav Ecol. 2001;12:71–83.
91. O’Doherty JP. Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr Opin Neurobiol. 2004;14:769–776. doi: 10.1016/j.conb.2004.10.016.
92. O’Doherty JP, Kringelbach ML, Rolls ET, Hornak J, Andrews C. Abstract reward and punishment representations in the human orbitofrontal cortex. Nat Neurosci. 2001;4:95–102. doi: 10.1038/82959.
93. O’Keefe J, Nadel L. The hippocampus as a cognitive map. Clarendon Press; Oxford: 1978.
94. Padoa-Schioppa C. The syllogism of neuro-economics. Econ Philos. 2008;24:449–457. doi: 10.1017/S0266267108002071.
95. Padoa-Schioppa C. Range-adapting representation of economic value in the orbitofrontal cortex. J Neurosci. 2009;29(44):14004–14014. doi: 10.1523/JNEUROSCI.3751-09.2009.
96. Pavlov I. Conditioned reflexes. Oxford University Press; Oxford: 1927.
97. Perry AN, Westenbroek C, Becker JB. The development of a preference for cocaine over food identifies individual rats with addiction-like behaviors. PLoS ONE. 2013;8(11):e79465. doi: 10.1371/journal.pone.0079465.
98. Peters J, Büchel C. Episodic future thinking reduces reward delay discounting through an enhancement of prefrontal-mediotemporal interactions. Neuron. 2010;66(1):138–148. doi: 10.1016/j.neuron.2010.03.026.
99. Peters J, Büchel C. The neural mechanisms of inter-temporal decision-making: understanding variability. Trends Cogn Sci. 2011;15(5):227–239. doi: 10.1016/j.tics.2011.03.002.
100. Petry NM. Contingency management for substance abuse treatment: a guide to implementing this evidence-based practice. Routledge; London: 2011.
101. Pfeiffer BE, Foster DJ. Hippocampal place-cell sequences depict future paths to remembered goals. Nature. 2013;497:74–79. doi: 10.1038/nature12112.
102. Phelps E, Lempert KM, Sokol-Hessner P. Emotion and decision making: multiple modulatory circuits. Annu Rev Neurosci. 2014;37:263–287. doi: 10.1146/annurev-neuro-071013-014119.
103. Plous S. The psychology of judgment and decision making. McGraw-Hill; New York: 1993.
104. Rand DG, Greene JD, Nowak MA. Spontaneous giving and calculated greed. Nature. 2012;489:427–430. doi: 10.1038/nature11467.
105. Rangel A, Clithero JA. Value normalization in decision making: theory and evidence. Curr Opin Neurobiol. 2012;22(6):970–981. doi: 10.1016/j.conb.2012.07.011.
106. Rangel A, Hare T. Neural computations associated with goal-directed choice. Curr Opin Neurobiol. 2010;20(2):262–270. doi: 10.1016/j.conb.2010.03.001.
107. Rangel A, Camerer C, Montague PR. A framework for studying the neurobiology of value-based decision making. Nat Rev Neurosci. 2008;9:545–556. doi: 10.1038/nrn2357.
108. Redish AD. Implications of the multiple-vulnerabilities theory of addiction for craving and relapse. Addiction. 2009;104(11):1940–1941. doi: 10.1111/j.1360-0443.2009.02746.x.
109. Redish AD. The mind within the brain: how we make decisions and how those decisions go wrong. Oxford University Press; Oxford: 2013.
110. Redish AD, Johnson A. A computational model of craving and obsession. Ann New York Acad Sci. 2007;1104(1):324–339. doi: 10.1196/annals.1390.014.
111. Redish AD, Jensen S, Johnson A, Kurth-Nelson Z. Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. Psychol Rev. 2007;114(3):784–805. doi: 10.1037/0033-295X.114.3.784.
112. Redish AD, Jensen S, Johnson A. A unified framework for addiction: vulnerabilities in the decision process. Behav Brain Sci. 2008;31:415–487. doi: 10.1017/S0140525X0800472X.
113. Regier PS, Redish AD. What is the role of decision-making systems in contingency management? Society for Neuroscience Abstracts. 2012.
114. Robinson MJ, Warlow SM, Berridge KC. Optogenetic excitation of central amygdala amplifies and narrows incentive motivation to pursue one reward above another. J Neurosci. 2014;34(50):16567–16580. doi: 10.1523/JNEUROSCI.2013-14.2014.
115. Sacks O. Witty Ticcy Ray. In: The man who mistook his wife for a hat. Simon and Schuster; 1985.
116. Samuelson PA. A note on measurement of utility. Rev Econ Stud. 1937;4(2):155–161.
117. Sanfey AG. Social decision-making: insights from game theory and neuroscience. Science. 2007;318(5850):598–602. doi: 10.1126/science.1142996.
118. Sayette MA, Shiffman S, Tiffany ST, Niaura RS, Martin CS, Shadel WG. The measurement of drug craving. Addiction. 2000;95(Suppl 2):S189–S210. doi: 10.1080/09652140050111762.
119. Schacter DL, Addis DR. On the nature of medial temporal lobe contributions to the constructive simulation of future events. In: Bar M, editor. Predictions in the brain: using our past to generate a future. Oxford University Press; Oxford: 2011. pp. 58–69.
120. Schacter DL, Addis DR, Buckner RL. Episodic simulation of future events: concepts, data, and applications. Ann New York Acad Sci. 2008;1124:39–60. doi: 10.1196/annals.1440.001.
121. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks. 2015;61:85–117. doi: 10.1016/j.neunet.2014.09.003.
122. Simon H. A behavioral model of rational choice. Q J Econ. 1955;69:99–118.
123. Simonson I, Tversky A. Choice in context: tradeoff contrast and extremeness aversion. J Mark Res. 1992;29(3):281–295.
124. Singer T, Seymour B, O’Doherty JP, Stephan KE, Dolan RJ, Frith CD. Empathic neural responses are modulated by the perceived fairness of others. Nature. 2006;439(7075):466–469. doi: 10.1038/nature04271.
125. Singer T, Critchley HD, Preuschoff K. A common role of insula in feelings, empathy and uncertainty. Trends Cogn Sci. 2009;13(8):334–340. doi: 10.1016/j.tics.2009.05.001.
126. Skinner MD, Aubin H-J. Craving’s place in addiction theory: contributions of the major models. Neurosci Biobehav Rev. 2010;34(4):606–623. doi: 10.1016/j.neubiorev.2009.11.024.
127. Smith V. Rationality in economics: constructivist and ecological forms. Cambridge University Press; Cambridge: 2009.
128. Smith KS, Graybiel AM. A dual operator view of habitual behavior reflecting cortical and striatal dynamics. Neuron. 2013;79(2):361–374. doi: 10.1016/j.neuron.2013.05.038.
129. Smith KS, Tindell AJ, Aldridge JW, Berridge KC. Ventral pallidum roles in reward and motivation. Behav Brain Res. 2009;196(2):155–167. doi: 10.1016/j.bbr.2008.09.038.
130. Steiner A, Redish AD. Orbitofrontal cortical ensembles during deliberation and learning on a spatial decision-making task. Front Neurosci. 2012;6:131. doi: 10.3389/fnins.2012.00131.
131. Stephens DW, Krebs JR. Foraging theory. Princeton University Press; Princeton: 1987.
132. Stitzer M, Petry N. Contingency management for treatment of substance abuse. Annu Rev Clin Psychol. 2006;2:411–434. doi: 10.1146/annurev.clinpsy.2.022305.095219.
133. Stott JJ, Redish AD. A functional difference in information processing between orbitofrontal cortex and ventral striatum during decision-making behavior. Philos Trans R Soc B. 2014;369(1655). doi: 10.1098/rstb.2013.0472.
134. Striedter GF. Principles of brain evolution. Sinauer Associates; Sunderland: 2005.
135. Sutton RS, Barto AG. Reinforcement learning: an introduction. MIT Press; Cambridge: 1998.
136. Talmi D, Seymour B, Dayan P, Dolan RJ. Human Pavlovian-instrumental transfer. J Neurosci. 2008;28(2):360–368. doi: 10.1523/JNEUROSCI.4028-07.2008.
137. Tiffany ST. A cognitive model of drug urges and drug-use behavior: role of automatic and nonautomatic processes. Psychol Rev. 1990;97(2):147–168. doi: 10.1037/0033-295x.97.2.147.
138. Tiffany ST. Cognitive concepts of craving. Alcohol Res Health. 1999;23(3):215–224.
139. Tiffany ST, Wray J. The continuing conundrum of craving. Addiction. 2009;104:1618–1619. doi: 10.1111/j.1360-0443.2009.02588.x.
140. Tolman EC. Purposive behavior in animals and men. Appleton-Century-Crofts; New York: 1932.
141. Tremblay L, Schultz W. Relative reward preference in primate orbitofrontal cortex. Nature. 1999;398(6729):704–708. doi: 10.1038/19525.
142. Trope Y, Liberman N. Temporal construal. Psychol Rev. 2003;110(3):403–421. doi: 10.1037/0033-295x.110.3.403.
143. van der Meer MAA, Redish AD. Ventral striatum: a critical look at models of learning and evaluation. Curr Opin Neurobiol. 2011;21(3):387–392. doi: 10.1016/j.conb.2011.02.011.
144. van der Meer MAA, Kurth-Nelson Z, Redish AD. Information processing in decision-making systems. Neuroscientist. 2012;18(4):342–359. doi: 10.1177/1073858411435128.
145. Wikenheiser AM, Redish AD. Hippocampal theta sequences reflect current goals. Nat Neurosci. 2015;18:289–294. doi: 10.1038/nn.3909.
146. Wikenheiser AM, Stephens DW, Redish AD. Subjective costs drive overly-patient foraging strategies in rats on an intertemporal foraging task. Proc Natl Acad Sci USA. 2013;110(20):8308–8313. doi: 10.1073/pnas.1220738110.
147. Wilson M, Daly M. Do pretty women inspire men to discount the future? Proc R Soc Lond B. 2004;271:S177–S179. doi: 10.1098/rsbl.2003.0134.
148. Winecoff A, Clithero JA, Carter RM, Bergman SR, Wang L, Huettel SA. Ventromedial prefrontal cortex encodes economic value. J Neurosci. 2013;33(27):11032–11039. doi: 10.1523/JNEUROSCI.4317-12.2013.
149. Wunderlich K, Dayan P, Dolan RJ. Mapping value based planning and extensively trained choice in the human brain. Nat Neurosci. 2012;15(5):786–791. doi: 10.1038/nn.3068.
150. Yang T, Shadlen MN. Probabilistic reasoning by neurons. Nature. 2007;447:1075–1080. doi: 10.1038/nature05852.
151. Zak PJ, editor. Moral markets: the critical role of values in the economy. Princeton University Press; Princeton: 2008.
