Abstract
Risky decisions are inherently characterized by the potential to receive gains and losses, and these outcomes have distinct effects on subsequent decision making. One important factor is that individuals engage in loss-chasing, in which receiving a loss is followed by relatively increased risk-taking. Unfortunately, the mechanisms of loss-chasing are poorly understood, despite their potential importance for understanding pathological choice behavior. The goal of the present experiment was to illuminate the mechanisms governing individual differences in loss-chasing and risky-choice behaviors. Rats chose between a low-uncertainty outcome that always delivered a variable amount of reward and a high-uncertainty outcome that probabilistically delivered reward. Loss processing and loss-chasing were assessed in the context of losses-disguised-as-wins (LDWs), which are loss outcomes presented along with gain-related stimuli. LDWs have been suggested to interfere with adaptive decision making in humans and thus may potentially increase loss-chasing. Here, the rats presented with LDWs were riskier, in that they made more choices for the high-uncertainty outcome. A series of non-linear models were fit to individual rats’ data to elucidate the possible psychological mechanisms that best account for individual differences in high-uncertainty choices and loss-chasing behaviors. The models suggested that the rats presented with LDWs were more prone to show a stay bias following high-uncertainty outcomes compared to rats not presented with LDWs. These results collectively suggest that LDWs acquire conditioned reinforcing properties that encourage continued risk-taking and increase loss-chasing following previous high-risk decisions.
Keywords: Risky choice, losses-disguised-as-wins, reinforcement learning, rats
Individuals often choose between a guaranteed safer outcome (e.g., saving money) and a larger yet riskier payoff (e.g., gambling). These risky decision-making scenarios inherently involve the potential to receive gains and losses. While risky losses tend to discourage subsequent risky choice (i.e., lose-shift), risky gains encourage subsequent risky choices in humans and non-human animals (i.e., win-stay; Marshall & Kirkpatrick, 2013, 2015; Stopper & Floresco, 2011; Thaler & Johnson, 1990). Indeed, win-stay/lose-shift behavior has become a key component of theories of learning and valuation that incorporate prediction-error mechanisms (e.g., Bush & Mosteller, 1951; Rescorla & Wagner, 1972; Sutton & Barto, 1998). For example, simple reinforcement learning algorithms account for trial-by-trial decisions (Sutton & Barto, 1998), such that losses reduce the value of an action and the likelihood of repeating it (lose-shift), whereas gains increase that value and likelihood (win-stay).
While win-stay/lose-shift behavior may be a theoretically optimal strategy in ecologically valid environments (see Wilke & Barrett, 2009), some individuals chase losses with increasingly risky behaviors (i.e., lose-stay; see Linnet, Røjskjær, Nygaard, & Maher, 2006). Here, “loss-chasing” is operationally defined as an increased tendency to make risky choices after losses. For example, after losses, individuals may wager significantly more than they had originally planned, while actual versus planned wagering may not considerably differ after gains (Andrade & Iyer, 2009). “Loss-chasing” behavior is a key characteristic of pathological gambling (e.g., Breen & Zuckerman, 1999; Lesieur, 1979), and loss-chasing decisions are associated with elevated activity in the ventromedial prefrontal cortex (Campbell-Meiklejohn, Woolrich, Passingham, & Rogers, 2008) and deficits in the mesolimbic dopamine pathway (Campbell-Meiklejohn et al., 2008). Relatedly, dopamine antagonism reduces loss-chasing in rats (Rogers, Wong, McKinnon, & Winstanley, 2013), and pathological gamblers show greater increases in ventral striatal dopaminergic activity following losses than control participants (Linnet, Peterson, Doudet, Gjedde, & Møller, 2010). Unfortunately, the mechanisms of loss processing are not fully understood (Seymour, Maruyama, & De Martino, 2015). While there are a few recent reports of loss-chasing in animals (Marshall & Kirkpatrick, 2015; Rogers et al., 2013; Tremblay et al., 2014), there has been much difficulty in evaluating loss processing in animal models of risky choice (see Clark et al., 2013; Cocker & Winstanley, 2015). A key challenge is that the operationalization of losses is limited in animal models. Food reward is the primary outcome in such paradigms, but the consumption of reward cannot be readily undone in the same fashion that monetary rewards (or points) can be gained and then lost in human experiments. Thus, it has become critical for animal research to establish a manipulation of “loss” that will have potential implications for choice behavior in humans.
In many risky choice paradigms, reward omission is the primary loss type, and, in accordance with win-stay/lose-shift behavior, reward omission should, in principle, inhibit the continuation of the corresponding behavior, but it may not (see Horsley, Osborne, Norman, & Wells, 2012; also see Domjan, 2010). One possible explanation for this ineffectiveness concerns the feedback that accompanies the behavior that resulted in reward omission. Theoretically, risky gains should encourage continuation of the risky behavior, while losses, such as reward omission, should not (Thorndike, 1911). However, following reward omission, individuals may continue to make risky choices (e.g., Larkin, Jenni, & Floresco, 2016; Marshall & Kirkpatrick, 2013, 2015; St. Onge, Stopper, Zahm, & Floresco, 2012; Stopper & Floresco, 2011; Stopper, Khayambashi, & Floresco, 2013), suggesting a “stay bias” for previous choices (see, e.g., K.-U. Kim, Huh, Jang, Lee, & Jung, 2015). An enhancement of the stay bias could represent a potential mechanism for understanding loss-chasing.
Furthermore, despite reward omission, individuals may maintain high rates of responding for reward during periods when such high rates have minimal (if any) effect on reward delivery (see, e.g., Amsel, 1958; Staddon & Innis, 1966, 1969). That is, while response rates on free operant fixed-interval (FI) schedules of reinforcement typically decrease following reward delivery (Ferster & Skinner, 1957), they remain relatively elevated following reward omission. However, in an FI paradigm, Kello (1972) showed that reward omission coupled with other stimuli typically presented with reward (e.g., houselight offset) systematically reduced the post-omission elevations in response rate with each added stimulus (also see Mellon, Leak, Fairhurst, & Gibbon, 1995). Therefore, the effects of reward omission on subsequent choice behavior may also partially depend on the reception of explicit feedback from the previous decision.
Several experiments in the human gambling literature have investigated the impact of such explicit feedback in the context of losses-disguised-as-wins (LDWs). LDWs occur when individuals receive outcomes that are less than the amount wagered (i.e., a loss) but that are accompanied by the same or similar positive-feedback stimuli as a win. For example, an individual might wager $5 and only win back $3 (a net loss), but the $3 “win” may be presented with the same stimuli that would accompany a $10 win. When exposed to wins, losses, and LDWs, individuals’ heart-rate changes are more similar after losses and LDWs, whereas individuals’ skin-conductance responses are more similar after wins and LDWs (Dixon, Harrigan, Sandhu, Collins, & Fugelsang, 2010), reflecting a complex physiological response to LDWs. Moreover, LDWs have been suggested to create the false belief that individuals are winning more than they objectively are (see Dixon, Collins, Harrigan, Graydon, & Fugelsang, 2015; Jensen et al., 2013; Templeton, Dixon, Harrigan, & Fugelsang, 2015). As “winning” may further promote the corresponding behavior (i.e., win-stay), greater understanding of the psychological mechanisms of risky choice in the face of LDWs has critical implications for individuals who lose in “winning” environments (e.g., pathological gamblers in casinos, drug addicts amongst addicted companions).
Recently, Barrus and Winstanley (2016) reported that accompanying risky gains with stimulus cues (cued task) resulted in more suboptimal choice than in a condition with no stimulus cues (uncued task); in the cued task, but not in the uncued task, choice behavior was modulated by agonism and antagonism of D3 receptors, which have been linked to both risk-taking behavior and addiction (Kreek, Nielsen, Butelman, & LaForge, 2005). Accordingly, the pairing of gain-related stimuli not only with gains (Barrus & Winstanley, 2016), but also with losses (LDWs), may provide crucial insight into mechanisms of pathological gambling such as loss-chasing (see Barrus, Cherkasova, & Winstanley, 2016). However, to our knowledge, there are no reports on the effects of LDWs in rats and the related psychological mechanisms. Thus, the goal of the present study was to use regression modeling along with reinforcement learning (RL) modeling to elucidate the psychological mechanisms of LDW-related effects on loss-chasing within a risky choice paradigm.
Method
Animals
Twenty-four experimentally-naive male Sprague-Dawley rats (Charles River; Kingston, NY) were used in the experiment. They arrived at the facility (Kansas State University; Manhattan, KS) at 21 days of age and began experimentation at approximately 75 days of age. The rats were pair-housed in a red-illuminated colony room set to a reverse 12:12 hr light:dark schedule (lights off at approximately 7:30 am). The rats were tested during the dark phase. There was ad libitum access to water in the home cages and in the experimental chambers. The rats were maintained at approximately 85% of their projected ad libitum weight during the experiment, based on growth-curve charts obtained from the supplier. Their daily food ration was delivered primarily in the experimental chambers, with supplementary feeding in the home cages as necessary. When supplementary feeding was required following an experimental session, the rats were fed approximately 1 hr after being returned to the colony room (see Bacotti, 1976; Smethells, Fox, Andrews, & Reilly, 2012).
Apparatus
The experiment was conducted in 24 operant chambers (Med-Associates; St. Albans, VT) each housed within sound-attenuating, ventilated boxes (74 × 38 × 60 cm). Each chamber (25 × 30 × 30 cm) was equipped with a stainless steel grid floor, two stainless steel walls (front and back), and a transparent polycarbonate side wall, ceiling, and door (see Figure 1A). Two pellet dispensers (ENV-203), mounted on the outside of the operant chamber, were equipped to deliver 45-mg food pellets (Bio-Serv; Flemington, NJ) to a food cup (ENV-200R7) that was centered on the lower section of the front wall. For the rats in Group Extra-Feedback, the tubing connecting one of the pellet dispensers to the food cup was disconnected and the corresponding pellet deliveries were rerouted to a receptacle outside of the operant chamber (Figure 1A; see Freestone, MacInnis, & Church, 2013). Head entries into the food magazine were transduced by an infrared photobeam (ENV-254). Two retractable levers (ENV-112CM) were located on opposite sides of the food cup. The chamber was also equipped with a house light (ENV-215) that was centered at the top of the front wall, as well as two nose poke keys (ENV-119M-1) that were located above the levers. Water was always available from a sipper tube that protruded through the back wall of the chamber. Experimental events were controlled and recorded with 2-ms resolution by the software program MED-PC IV (Tatham & Zurn, 1989).
Figure 1.
Decision-making task. A: Set-up and design of the operant chamber for the task. B: Flow diagram of the task for the Extra-Feedback group. C: Flow diagram of the task for the Normal-Feedback group. Each session began with 8 forced-choice trials followed by a maximum of 100 free-choice trials (Marshall & Kirkpatrick, 2013, 2015). On forced-choice trials, one lever was inserted into the chamber. Each lever corresponded to one of two choices (i.e., low-uncertainty, L-U; high-uncertainty, H-U); lever assignments were counterbalanced across rats. When the lever was pressed on forced-choice trials, a fixed interval (FI) 20-s schedule began; the first lever press after 20 s resulted in lever retraction and food delivery. If the lever corresponded to the L-U outcome, then the L-2 or L-4 outcome was delivered (ps = .50). In the P[0] condition, H-U forced-choice trials involved H-1 and H-11 outcomes (ps = .50); in the P[1] condition, H-U forced-choice trials involved H-0 and H-11 outcomes (ps = .50). Each of these magnitudes for the L-U and H-U choices was presented twice in the forced-choice trials in a random order. Free-choice trials were identical to forced-choice trials with the following exceptions: (1) Both levers were inserted into the chamber; (2) a choice on one of the levers caused the other lever to retract; and, (3) all H-U outcomes (H-0, H-1, and H-11) were probabilistically delivered following H-U choices. The red (gray) and green (white) shades of the choice outcomes reflect losses and gains, such that the dual-shaded outcomes for Group Extra-Feedback reflect the losses-disguised-as-wins (LDW) outcomes. The lights above each lever were nosepoke key lights that could be illuminated. ITI = intertrial interval. Adapted with permission from Stopper and Floresco (2011).
Procedure
Magazine and lever-press training
The rats experienced a random-time 60-s schedule of food deliveries for magazine training, earning approximately 120 pellets in one 2-hr session. The rats then experienced lever-press training with a fixed ratio (FR) 1 schedule of reinforcement followed by a random ratio (RR) 3 schedule and then an RR 5. During FR 1 training, only one lever was inserted at a given time. For RR 3 and RR 5 training, both levers were presented to the rat. Each of these schedules lasted until the rats earned 20 pellets on each lever, such that the corresponding lever was retracted once the criterion was met. For 22 of the 24 rats, lever-press training lasted for two sessions. For the other two rats, a third session of lever-press training was administered, which involved only the RR 3 and RR 5 schedules of reinforcement.
Risky choice task
Each pair of rats was randomly assigned to one of two groups: Group Extra-Feedback and Group Normal-Feedback. Both groups experienced the same choice task (Figures 1B and 1C). Low-uncertainty (L-U) choices always resulted in food, but the amount varied probabilistically between 2 and 4 pellets (ps = .50; low-two, L-2, and low-four, L-4), and high-uncertainty (H-U) choices resulted in either no food being delivered (high-zero, H-0), 1 food pellet (H-1), or 11 food pellets (H-11) with unequal probabilities (Table 1). Thus, the H-U choice could result in two potential losses: H-0, in which no reward was delivered, and H-1, in which the rats received less than what they could have received for an L-U choice.
Table 1.
Probabilities of the high-uncertainty-zero (H-0), H-1, and H-11 outcomes in the P[0] and P[1] conditions for Groups Extra-Feedback and Normal-Feedback. Half of each group experienced one of two probability/condition orders. Across all phases and conditions, the low-uncertainty-two (L-2) and L-4 outcomes were delivered following low-uncertainty choices with probabilities of .50.
| Group | Order | Outcome | Condition 1 (Phases 1–4) | Condition 2 (Phases 1–3) |
|---|---|---|---|---|
| Extra-Feedback (n = 12) | 1 (n = 6) | | P[0] | P[1] |
| | | H-0 | .10, .90, .50, .10 | .05, .25, .45 |
| | | H-1 | .45, .05, .25, .45 | .90, .50, .10 |
| | | H-11 | .45, .05, .25, .45 | .05, .25, .45 |
| | 2 (n = 6) | | P[1] | P[0] |
| | | H-0 | .45, .05, .25, .45 | .90, .50, .10 |
| | | H-1 | .10, .90, .50, .10 | .05, .25, .45 |
| | | H-11 | .45, .05, .25, .45 | .05, .25, .45 |
| Normal-Feedback (n = 12) | 1 (n = 6) | | P[0] | P[1] |
| | | H-0 | .10, .90, .50, .10 | .05, .25, .45 |
| | | H-1 | .45, .05, .25, .45 | .90, .50, .10 |
| | | H-11 | .45, .05, .25, .45 | .05, .25, .45 |
| | 2 (n = 6) | | P[1] | P[0] |
| | | H-0 | .45, .05, .25, .45 | .90, .50, .10 |
| | | H-1 | .10, .90, .50, .10 | .05, .25, .45 |
| | | H-11 | .45, .05, .25, .45 | .05, .25, .45 |
Food delivery to Group Normal-Feedback was naturally accompanied by exposure to external stimuli (i.e., sound and vibration of the pellet dispenser) that matched the amount of food delivered (Figure 1B). In the case of reward omission (H-0), there was no feeder operation. However, with the other outcomes, the feeder operation cues matched the number of pellets delivered (L-2, L-4, H-1, and H-11). Additionally, for Group Normal-Feedback, each operation of the pellet dispenser was accompanied by a 0.1-s illumination of the nose poke key light above the chosen lever, so that these rats experienced synchronized auditory, tactile, and visual stimuli veridically related to the number of pellets received. A 10-s intertrial interval (ITI) began following the last food pellet delivered.
For Group Extra-Feedback, every H-U outcome was also accompanied by 11 operations of the alternative pellet dispenser, but the rats did not receive any pellets from this dispenser; these food pellets were delivered into an external receptacle (Figure 1A). There also were 11 flashing illuminations of the nose poke key light above the H-U lever that were synchronized with the 11 external pellet deliveries (Figure 1A). Thus, all H-U outcomes included explicit multimodal feedback (i.e., light, sound and vibration of the pellet dispenser) associated with 11 pellets, regardless of the size of the actual H-U outcome delivery (Figure 1C). For example, for an H-1 outcome, synchronized operation of the active pellet dispenser, alternative pellet dispenser, and nose poke key light occurred once, followed by 10 more synchronized operations of the alternative pellet dispenser and nose poke key light. For Group Extra-Feedback, the 10-s ITI began following the 11th operation of the alternative pellet dispenser and 11th flash of the nose poke key light. Thus, for Group Extra-Feedback, the H-0 and H-1 outcomes were designed to resemble LDWs. Reward delivery and outcome feedback of the L-U choices made by Group Extra-Feedback were presented as in Group Normal-Feedback.
The task presented to both groups was divided into two conditions (P[0] and P[1]), in which either p(H-0) or p(H-1) was selectively manipulated, respectively. Despite rats’ experiencing the same outcome magnitudes in these different conditions, the selective manipulation of the probabilities of H-0 and H-1 and how these probabilities relate to that of the H-11 outcome have been shown to have considerable effects on loss-chasing in risky environments (Marshall & Kirkpatrick, 2015). In the P[0] condition, p(H-0) was manipulated across sessions, and the remaining probability of H-U outcomes [1 – p(H-0)] was divided equally across H-1 and H-11 outcomes [i.e., p(H-1) = p(H-11)]. In the P[1] condition, p(H-1) was manipulated, such that p(H-0) was equal to p(H-11) throughout this condition (Table 1). Half of the rats in each group experienced the P[0] condition first, while the other half experienced the P[1] condition first.
In Phase 1 of the P[0] condition, p(H-0) was .10 (i.e., p(H-1) = p(H-11) = .45). In Phases 2–4, p(H-0) was equal to .90, .50, and .10, respectively, so that p(H-1) and p(H-11) were equal to .05, .25, and .45, respectively. For the P[1] condition, p(H-1) was equal to .10, .90, .50, and .10 in Phases 1–4, respectively, so that p(H-0) and p(H-11) were equal to .45, .05, .25, and .45 in Phases 1–4, respectively (Table 1). Following exposure to either the P[0] or P[1] conditions, the rats then experienced the P[1] and P[0] conditions, respectively. Here, the P[0] and P[1] conditions mirrored Phases 2–4 of the prior condition, so that p(H-0) and p(H-1), respectively, were equal to .90, .50, and .10 in consecutive phases (Table 1). Each of these seven phases lasted for 10 sessions (i.e., 70 total sessions). Each session began with 8 forced-choice trials followed by a maximum of 100 free-choice trials (Marshall & Kirkpatrick, 2013, 2015), and lasted until all free-choice trials were completed or for approximately 2 hr. Additional details are provided in the caption to Figure 1.
Data analysis: mixed-effects regression models
All rats’ data from all sessions were used in analyses (unless otherwise specified below). Due to equipment issues (i.e., jams/clogs in either of the feeders; Figure 1A), three sessions for one rat, two sessions for a second rat, one session for a third rat, and four sessions for a fourth rat were removed from all analyses (i.e., < 1% of total sessions analyzed). Cursory analyses of mean choice behavior across rats before and after each of these sessions suggested that these issues did not considerably impact subsequent choice behavior. All responses and outcomes were obtained from the raw data using MATLAB (The MathWorks; Natick, MA).
Choice behavior
Given the considerable interactions between both molar manipulations (e.g., probability) and molecular influences (e.g., previous outcome) on risky choice (e.g., Marshall & Kirkpatrick, 2013, 2015), a combined molar-molecular analysis was conducted on choice behavior. Here, analyses involved fitting generalized linear mixed-effects models with a binomial response distribution and a logit link function to the data (e.g., Pinheiro & Bates, 2000; also see Bolker et al., 2008; Hoffman & Rovine, 2007; Schielzeth & Nakagawa, 2013). These analyses were conducted using the Statistics and Machine Learning Toolbox 11.0 in MATLAB 9.1. Choice was the criterion in the regression analyses (L-U = 0, H-U = 1). Because these models included the outcome of the previous choice, the first free-choice trial of each session was not included in analysis. These models are comparable to repeated-measures logistic regression analyses, allowing for parameter estimation as a function of condition (fixed effects) and the individual (random effects). Model selection involved determining the model that minimized the Akaike information criterion (AIC; Akaike, 1973; Burnham & Anderson, 1998), in which the doubled negative log likelihood of the regression model is penalized by twice the number of estimated parameters. In-text reporting of the results focused on theoretically important results, but full model outputs are provided in the Supplemental Materials. Unstandardized regression coefficients (i.e., b-values), which are the regression weights for each variable in the form of log odds ratios, are reported along with the corresponding 95% confidence intervals of these estimates. Unstandardized coefficients are the recommended measures of effect size for logistic regression models (Baguley, 2009). Post-hoc analyses were conducted using the coefTest function in MATLAB.
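To make the structure of these analyses concrete, the following MATLAB sketch fits a binomial generalized linear mixed-effects model with a logit link to a hypothetical trial-level table. The variable names (choice, group, probability, prevOutcome, rat) and the file name are placeholders rather than the actual analysis script, and the fixed-effects structure shown is only one of the candidate models that would be compared by AIC.

```matlab
% Minimal sketch of the mixed-effects choice analysis (hypothetical variable
% and file names; not the original analysis code). Assumes a table with one
% row per free-choice trial.
tbl = readtable('choice_trials.csv');   % columns: rat, group, probability, prevOutcome, choice (0 = L-U, 1 = H-U)
tbl.rat         = categorical(tbl.rat);
tbl.group       = categorical(tbl.group);
tbl.prevOutcome = categorical(tbl.prevOutcome);

% Binomial GLMM with a logit link: fixed effects of group, probability, and
% previous outcome; random intercept and previous-outcome slopes by rat.
glme = fitglme(tbl, ...
    'choice ~ 1 + group*probability + group*prevOutcome + (1 + prevOutcome | rat)', ...
    'Distribution', 'Binomial', 'Link', 'logit');

disp(glme.ModelCriterion.AIC);   % compare AIC across candidate fixed-effects structures
[pVal, F] = coefTest(glme);      % post-hoc contrasts on the fixed effects (as in the text)
```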
Outcome history
This analysis was similar to the logistic regression modelling approach employed by Lau and Glimcher (2005). However, because choice identity (left or right lever; L-U or H-U choice) and outcome magnitude were perfectly correlated here, the analysis specifically focused on the effects of outcome history on subsequent choice (e.g., Kennerley, Walton, Behrens, Buckley, & Rushworth, 2006):
$$\ln\!\left(\frac{P_{HU,N}}{1 - P_{HU,N}}\right) = \gamma_O + \sum_{i=1}^{10} \alpha_i \left(O_{HU,N-i} - O_{LU,N-i}\right) \tag{1}$$
in which γO was the model’s intercept, N was the current trial, i was trial lag, αi was the coefficient estimating the extent to which the ith previous outcome influenced choice on trial N, OHU,N-i was the reward received on trial N-i for making an H-U choice, and OLU,N-i was the reward received on trial N-i for making an L-U choice. Analysis then determined the functional form of the change in regression weights as a function of the past 10 outcomes; accordingly, the first 10 trials of each session were omitted from analysis. Each rat’s individual regression weights were computed from the results of the previous analysis, and subjected to a nonlinear mixed-effects model in R version 3.3.1 (Pinheiro, Bates, DebRoy, & Sarkar, 2016) to determine whether the effects of previous outcomes were better accounted for by a model that assumed exponential or hyperbolic decay. Decaying weights of previous outcomes have been suggested to take either an exponential or hyperbolic form (e.g., Devenport, Hill, Wilson, & Ogden, 1997; Sutton & Barto, 1998). The following models were fit to the outcome history analysis within the nonlinear mixed effects analyses:
$$\alpha_T = A e^{-kT} \tag{2}$$

$$\alpha_T = \frac{A}{1 + kT} \tag{3}$$

$$\alpha_T = A e^{-kT} + \gamma \tag{4}$$

$$\alpha_T = \frac{A}{1 + kT} + \gamma \tag{5}$$
Here, A is the function’s intercept, k is the decay rate of the influence of previous outcomes, T is the lag of the outcome, and γ is the function’s limit as T → ∞. The γ parameter reflects a stay bias that vertically adjusts the function’s limit, ultimately accounting for potential stay biases in choice paradigms (K.-U. Kim et al., 2015). As described below, the effects of the LDWs on choice behavior did not emerge until after the initial phase of the first condition of the experiment (Table 1). Accordingly, to best characterize the effects of LDWs on behavior, the data from this initial phase were not included in analysis.
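As an illustration of how these candidate decay functions (Equations 2–5) could be compared, the sketch below fits each function to one rat’s ten lagged regression weights by least squares using fminsearch. This is a simplification of the nonlinear mixed-effects analysis actually conducted in R, and the weight vector and starting values are placeholders.

```matlab
% Candidate decay functions for the lagged outcome weights (Equations 2-5).
% p holds [A, k] or [A, k, gamma]; T is the outcome lag.
decayFns = { ...
    @(p, T) p(1) .* exp(-p(2) .* T), ...           % exponential (Eq. 2)
    @(p, T) p(1) ./ (1 + p(2) .* T), ...           % hyperbolic (Eq. 3)
    @(p, T) p(1) .* exp(-p(2) .* T) + p(3), ...    % biased exponential (Eq. 4)
    @(p, T) p(1) ./ (1 + p(2) .* T) + p(3)};       % biased hyperbolic (Eq. 5)
starts = {[0.4, 0.5], [0.4, 0.5], [0.4, 0.5, 0.02], [0.4, 0.5, 0.02]};  % arbitrary initial guesses

T = (1:10)';          % outcome lags
w = ratWeights;       % placeholder: 10 x 1 vector of one rat's regression weights
n = numel(w);
aic = nan(1, numel(decayFns));
for m = 1:numel(decayFns)
    sse    = @(p) sum((w - decayFns{m}(p, T)).^2);       % least-squares criterion
    pHat   = fminsearch(sse, starts{m});
    aic(m) = n * log(sse(pHat) / n) + 2 * numel(pHat);   % approximate AIC (Gaussian errors)
end
[~, best] = min(aic);   % index of the best-fitting decay function
```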
Goal-tracking behavior
Goal-tracking behavior was measured to determine whether the cues in Group Extra-Feedback elicited greater goal-tracking behavior compared to Group Normal-Feedback following H-U losses, given that H-0 and H-1 losses were cued as if H-11 outcomes were delivered for Group Extra-Feedback but not for Group Normal-Feedback. Goal-tracking behavior was operationally defined as the latency to the first head entry into the food magazine relative to the lever press that terminated the trial. Shorter latencies were indicative of expectations of more reward delivered, as rats have been shown to collect larger rewards more rapidly than smaller rewards (Zeeb, Robbins, & Winstanley, 2009). Latency would thus be driven by the cues in the environment predicting outcome magnitude. Mean head-entry latency was compared between groups as a function of H-U outcome magnitude (H-0, H-1, H-11). If the LDWs experienced by Group Extra-Feedback did in fact skew expectations of actual food delivery, then this group’s latencies given all H-U outcomes should be relatively comparable. This analysis was similar to the mixed-effects analysis of choice behavior except that this analysis used a gamma response distribution with a log link function. Forced- and free-choice trials were included in this analysis. Trials in which a head entry did not occur before the onset of the next trial were omitted from analyses (9.9% of trials). One rat from Group Normal-Feedback was removed from the analysis due to a malfunction of the head-entry photobeam that resulted in inconsistent recordings. As in the outcome history analyses, the initial phase of the first condition of the experiment was not included in this analysis.
Data analysis: RL models
RL model structure
Choice behavior was subjected to RL modeling with the goal of identifying potential mechanisms behind the effects of LDW-type feedback on H-U choice. The basic structure of the RL models was as follows:
$$\delta_{N,T} = R_{N,T} - Q_{N,T-1} \tag{6}$$

$$Q_{N,T} = Q_{N,T-1} + \alpha_N \, \delta_{N,T} \tag{7}$$

$$P(N_T = HU \mid Q_{LU,T}, Q_{HU,T}) = \frac{1}{1 + e^{-\left[\beta\left(Q_{HU,T} - Q_{LU,T}\right) + \rho\right]}} \tag{8}$$

$$\rho = \gamma_N \left(C_{HU,T-1} - C_{LU,T-1}\right) \tag{9}$$
Equations 6 and 7 correspond to the model’s valuation rule, and Equations 8 and 9, the decision rule. Equation 6 computes the difference between the received outcome and expected outcome (i.e., prediction error), and Equation 7 integrates the prediction error with the previous estimate of value in order to update value estimates. The updated value (QN,T) on trial T for choice N is a function of both the previous estimate of value for that choice (QN,T-1) as well as the prediction error (δN,T) between the previous estimate of value (QN,T-1) and the outcome that was received on trial T for making choice N (RN,T). The prediction error, δN,T, is scaled by a learning-rate parameter, αN ∈ [0,1]. Larger α values reflect faster learning and more rapid adjustments in value. The larger the value of α, the more closely QN,T approximates RN,T.
Equation 8 corresponds to a softmax decision rule (Daw, O'Doherty, Dayan, Seymour, & Dolan, 2006), in which the probability of making an H-U (HU) choice on trial T is a function of the difference between H-U (QHU,T) and L-U value estimates (QLU,T), an inverse-temperature parameter that captures the stochasticity of choice (β; larger values of β reflect greater exploitation of the higher-valued choice), and a stay-bias parameter, ρ, which accounts for the likelihood of repeating the previous choice. Equation 9 states that the value of ρ is a function of the previous choice (CHU,T-1 − CLU,T-1) and the stay bias on choice N (γN). Here, CHU,T-1 equals 1 when an H-U choice was made in trial T-1 (i.e., CLU,T-1 = 0), and CLU,T-1 equals 1 when an L-U choice was made in trial T-1 (i.e., CHU,T-1 = 0).
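For concreteness, one way to implement Equations 6–9 is sketched below as a MATLAB function that steps through a rat’s choices and outcomes, computes the trial-by-trial choice probabilities, and returns the negative log likelihood. The parameter ordering and variable names are illustrative assumptions rather than the authors’ code.

```matlab
% Sketch of the valuation (Eqs. 6-7) and decision (Eqs. 8-9) rules for one rat.
% choices: vector of 0 (L-U) / 1 (H-U); rewards: pellets received on each trial.
% params = [alphaLU, alphaHU, beta, gammaLU, gammaHU] (illustrative ordering).
function nll = rlNegLogLik(params, choices, rewards)
    alpha = params(1:2);           % learning rates for L-U (1) and H-U (2)
    beta  = params(3);             % inverse temperature
    gamma = params(4:5);           % stay biases for L-U (1) and H-U (2)
    Q     = [0, 0];                % value estimates [L-U, H-U]
    nll   = 0;
    prevChoice = NaN;
    for t = 1:numel(choices)
        c = choices(t) + 1;        % 1 = L-U, 2 = H-U
        % Stay bias (Eq. 9): depends on the previous choice and its gamma.
        if isnan(prevChoice)
            rho = 0;
        elseif prevChoice == 2
            rho =  gamma(2);       % previous H-U choice pushes toward H-U
        else
            rho = -gamma(1);       % previous L-U choice pushes toward L-U
        end
        % Softmax decision rule (Eq. 8): probability of an H-U choice.
        pHU = 1 ./ (1 + exp(-(beta * (Q(2) - Q(1)) + rho)));
        pChosen = pHU * (c == 2) + (1 - pHU) * (c == 1);
        nll = nll - log(max(pChosen, eps));
        % Prediction error (Eq. 6) and value update (Eq. 7) for the chosen option.
        delta = rewards(t) - Q(c);
        Q(c)  = Q(c) + alpha(c) * delta;
        prevChoice = c;
    end
end
```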
Six RL models were fit to the data (Table 2). As the primary manipulation in the current experiment involved differential feedback between L-U and H-U choices across groups, the RL models were structured to determine the extent to which Group Extra-Feedback exhibited differential value-updating (learning) of the H-U choice or perseverated more strongly on the H-U choice compared to Group Normal-Feedback. Thus, these models differed with respect to αN, in which value estimates (Q) were updated at the same or different rates depending on the choice (N), and/or the stay-bias parameter, γN, which was either fixed at 0 or allowed to vary (i.e., stay-bias rates following H-U and L-U choices were either identical or allowed to differ).
Table 2.
Reinforcement learning (RL) models. “LU” and “HU” correspond to the learning and stay-bias parameters for the low-uncertainty (L-U) and high-uncertainty (H-U) choices, respectively. While Equations 6–9 were used to fit the data, Equations 7 and 9 correspond to the equations in which the learning (α) and stay-bias (γ) rates were included. ⊥ = “independent of”.
Model | Learning (Eq. 7) | Stay Bias (Eq. 9) |
---|---|---|
Simple | Single (αLU = αHU) | None (γLU = γHU = 0) |
Simple/Biased | Single (αLU = αHU) | Single (γLU = γHU) |
Simple/Dual-Biased | Single (αLU = αHU) | Dual (γLU ⊥ γHU) |
Asymmetric | Dual (αLU ⊥ αHU) | None (γLU = γHU = 0) |
Asymmetric/Biased | Dual (αLU ⊥ αHU) | Single (γLU = γHU) |
Asymmetric/Dual-Biased | Dual (αLU ⊥ αHU) | Dual (γLU ⊥ γHU) |
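The six models in Table 2 differ only in whether the learning rate and stay bias are shared across choices or allowed to differ. One way to express those constraints, shown purely as an illustration, is to expand each model’s reduced parameter vector into the full [αLU, αHU, β, γLU, γHU] set before evaluating the likelihood:

```matlab
% Illustrative mapping from each model in Table 2 to the full parameter set
% [alphaLU, alphaHU, beta, gammaLU, gammaHU] used by the likelihood function.
function full = expandParams(modelName, p)
    switch modelName
        case 'Simple'                   % single alpha, no stay bias
            full = [p(1), p(1), p(2), 0,    0   ];
        case 'Simple/Biased'            % single alpha, single gamma
            full = [p(1), p(1), p(2), p(3), p(3)];
        case 'Simple/Dual-Biased'       % single alpha, separate gammas
            full = [p(1), p(1), p(2), p(3), p(4)];
        case 'Asymmetric'               % separate alphas, no stay bias
            full = [p(1), p(2), p(3), 0,    0   ];
        case 'Asymmetric/Biased'        % separate alphas, single gamma
            full = [p(1), p(2), p(3), p(4), p(4)];
        case 'Asymmetric/Dual-Biased'   % separate alphas, separate gammas
            full = [p(1), p(2), p(3), p(4), p(5)];
    end
end
```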
RL model fitting and selection
The fminsearch optimization algorithm in MATLAB 9.1 was used to fit each RL model to the individual rats’ trial-by-trial choice data via maximum likelihood estimation (Daw, 2011). Model fitting involved the random generation of 1000 sets of uniformly-distributed initial parameters (i.e., 1000 runs) across the following bounds: α ∈ (0,1), β ∈ (0,10), γ ∈ (−4,4). The model’s choices [P(NT=HU | QLU,T, QHU,T)] were compared to the actual choice on each trial (L-U = 0, H-U = 1), as the same reinforcement history experienced by each rat was provided to the model; accordingly, all sessions’ data were included in analysis to account for the entire choice and reinforcement history. The choice probabilities were transformed given the rat’s choice: For an H-U choice, P = P(NT=HU | QLU,T, QHU,T); for an L-U choice, P = 1 − P(NT=HU | QLU,T, QHU,T). Model fit was evaluated by a log likelihood measure (i.e., the sum of the log-transformed probabilities). The best-fitting model had the minimum AIC and α and β values that fell within the plausible closed intervals of [0,1] and [0,10], respectively (see Daw, 2011); there was no strong a priori hypothesis for bounding the fitted values of γ, but the values of γ from the best-fitting RL models fell within an acceptable range of [−2.18, 5.00]. A secondary goodness-of-fit measure, a pseudo-R2 (i.e., omega-squared, ω2), was computed to aid interpretation of the adequacy of model fit on a standardized scale. Here, individual rats’ data were smoothed over a moving 13-trial window, and compared to simulated data (smoothed over a 13-trial window) that were generated using the selected model’s fitted parameters.
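The multi-start fitting procedure could be sketched as follows, reusing the illustrative rlNegLogLik and expandParams helpers from above. Here, choices and rewards are placeholders for one rat’s trial-by-trial data, and the number of runs is reduced for brevity (the reported analysis used 1000 random starts per model).

```matlab
% Multi-start maximum-likelihood fitting for one rat and one model (sketch).
nRuns   = 100;
bestNLL = Inf;
for run = 1:nRuns
    % Random starting values within the initialization bounds:
    % alpha in (0,1), beta in (0,10), gamma in (-4,4).
    p0  = [rand, rand, 10*rand, 8*rand - 4, 8*rand - 4];
    obj = @(p) rlNegLogLik(expandParams('Asymmetric/Dual-Biased', p), choices, rewards);
    [pHat, nll] = fminsearch(obj, p0);
    if nll < bestNLL
        bestNLL  = nll;
        bestPars = pHat;
    end
end
aic = 2 * bestNLL + 2 * numel(bestPars);   % AIC for model comparison
```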
A parameter recovery technique was performed to confirm that the RL models were detecting the data generative parameters. For each condition order (Table 1), 1000 simulations for each of the models were performed. The simulations incorporated the same task structure that the rats experienced (see Table 1). For each simulation, model parameters were randomly sampled from uniform distributions [i.e., α ∈ (0,1), β ∈ (0,10), γ ∈ (−4,4)] and these sampled parameters were used in the valuation and decision rules that determined the simulated choices. The simulated data were then fit with the RL models described above to determine whether the fitted parameter matched the corresponding simulated parameter. For parameter recovery, each simulation was fit over 50 iterations. Model fits to the simulated data with parameters that exceeded the closed intervals of the aforementioned uniform distributions of the RL parameters were excluded from parameter recovery analyses.
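A condensed sketch of the parameter-recovery logic is shown below: sample parameters from the same uniform distributions, simulate choices through the valuation and decision rules, refit, and compare recovered to generating values. Here, simulateRLChoices is a hypothetical helper that would generate choices and outcomes from Equations 6–9 under the task probabilities in Table 1; it is not part of the original analysis code.

```matlab
% Parameter-recovery sketch: simulate data from known parameters, refit, compare.
nSims     = 200;                   % the reported analysis used 1000 simulations per condition order
generated = nan(nSims, 5);
recovered = nan(nSims, 5);
for s = 1:nSims
    pTrue = [rand, rand, 10*rand, 8*rand - 4, 8*rand - 4];
    [simChoices, simRewards] = simulateRLChoices(pTrue);   % hypothetical helper (Eqs. 6-9, Table 1 probabilities)
    obj  = @(p) rlNegLogLik(p, simChoices, simRewards);
    pFit = fminsearch(obj, [rand, rand, 10*rand, 8*rand - 4, 8*rand - 4]);
    generated(s, :) = pTrue;
    recovered(s, :) = pFit;
end
% Recovery summarized by the correlation between generating and fitted values.
recoveryR = diag(corr(generated, recovered));
```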
Results and Discussion
Choice behavior
Rats were initially exposed to a condition in which the probability of the manipulated H-U loss was .10 (i.e., H-0 in P[0]; H-1 in P[1]), such that the probability of the H-11 outcome was .45 (Table 1). Given that the initial .10 probability training phase was only delivered for half of the rats in each group for each condition (see Table 1), and that there were considerable differences in H-U preference between the initial .10 probability phase and the final .10 probability phase of each condition (Figures 2A and 2C), the data from the initial phase and the remaining phases (i.e., Phases 2–4 of Condition 1 and Phases 1–3 of Condition 2; see Table 1) were analyzed separately.
Figure 2.
A, B: Proportion of choices for the high-uncertainty (H-U) outcome for Groups Extra-Feedback and Normal-Feedback as a function of (A) condition (P[0]/P[1]) and (B) the outcome of the previous choice in the initial phase of the first .10 probability condition of the choice task (see Table 1). C, D: Proportion of choice for the H-U outcome as a function of (C) the probability of the primary loss (i.e., H-0 in P[0]; H-1 in P[1]) and (D) the outcome of the previous choice in all phases of both conditions except for the first phase of the first condition (see Table 1). Error bars (+/− 1 SEM) were computed with respect to the estimated marginal means of the fitted generalized linear mixed-effects models. L-4 = low-uncertainty (L-U) choice, 4-pellet outcome; L-2 = L-U choice, 2-pellet outcome; H-0 = H-U choice, 0-pellet outcome; H-1 = H-U choice, 1-pellet outcome; H-11 = H-U choice, 11-pellet outcome.
Figure 2A displays choice behavior as a function of feedback group and condition, while Figure 2B shows H-U choice as a function of feedback group and previous outcome during the initial .10 probability phase. For the regression model, the fixed-effects structure included the overall intercept and previous outcome, and the random-effects structure included intercept and previous outcome. In this initial phase, the rats made significantly more H-U than L-U choices overall, t(20991) = 5.02, p < .001, b = 0.74 [0.45, 1.03]. The absence of feedback group and condition factors from the best-fitting model indicated that these factors did not contribute significantly to model fit.
Analysis also revealed a strong effect of previous outcome on choice in the initial choice phase (Figure 2B). Specifically, the rats were significantly more likely to make L-U choices after L-4 outcomes, t(20991) = −11.73, p < .001, b = −1.22 [−1.42, −1.01], and significantly more likely to make H-U choices after H-0, t(20991) = 8.41, p < .001, b = 0.94 [0.72, 1.16], H-1, t(20991) = 10.76, p < .001, b = 0.85 [0.69, 1.00], and H-11 outcomes, t(20991) = 6.04, p < .001, b = 0.70 [0.47, 0.93]. Post-hoc analyses indicated that the rats were also more likely to make L-U choices after L-2 outcomes, p < .001. Choice behavior did not significantly differ after L-2 and L-4 outcomes, p = .496, or after H-0, H-1, and H-11 outcomes, ps ≥ .142. Overall, both groups behaved comparably in the initial phase of the experiment, which was likely driven by relatively minimal experience with H-U losses (as described below).
Figure 2C shows H-U preference as a function of the probability of the primary loss (H-0 in the P[0] condition; H-1 in the P[1] condition) for both groups in the P[0] and P[1] conditions over the remaining six phases of the experiment.1 Figure 2D shows the proportion of choices for the H-U outcome as a function of the outcome of the previous choice. The best-fitting generalized linear mixed-effects model included the overall intercept, Feedback Group × Condition × Probability, Feedback Group × Condition × Previous Outcome, Condition × Probability × Previous Outcome, and the associated lower-order interactions and main effects as fixed effects, and intercept and previous outcome as random effects. Full model output is shown in Table S1. Here, rats were more likely to make L-U than H-U choices overall (Figure 2C), t(135326) = −10.22, p < .001, b = −1.05 [−1.25, −0.85]. It was predicted above that exposure to LDWs in Group Extra-Feedback would increase preference for the H-U outcome. Indeed, there was a main effect of feedback group on H-U choice, t(135326) = 2.70, p = .007, b = 0.28 [0.08, 0.48], suggesting that LDW exposure did in fact promote greater risk-taking behavior (see Dixon et al., 2015; Jensen et al., 2013). Moreover, there was a significant Feedback Group × Condition × Probability interaction (Figure 2C), t(135326) = 3.52, p < .001, b = 0.14 [0.06, 0.22], as reflected by the differential changes in H-U preference as a function of probability across groups and conditions. The significant group differences here are noteworthy given the lack of significant group differences in the initial phase of Condition 1 (Figure 2A). LDWs may have had less of an effect when they were presented less regularly (see Jensen et al., 2013), but then began to control behavior more after having been presented more frequently (Figure 2C). This suggests that the LDW stimuli had acquired value that further encouraged risk-taking behavior.
Group Extra-Feedback also tended to make more H-U choices after H-U outcomes compared to Group Normal-Feedback, but both groups behaved relatively comparably following L-U outcomes (Figure 2D). Analysis revealed a significant Feedback Group × Condition × Previous Outcome interaction: L-4, t(135326) = 2.21, p = .027, b = 0.05 [0.01, 0.09]; H-0, t(135326) = 2.20, p = .028, b = 0.07 [0.01, 0.13], H-1, t(135326) = −0.10, p = .918, b = −0.003 [−0.06, 0.05], H-11, t(135326) = −3.63, p < .001, b = −0.14 [−0.21, −0.06]. Post-hoc tests indicated that the groups did not differ in H-U choice following L-U outcomes in either condition, ps ≥ .262; in contrast, Group Extra-Feedback made significantly more H-U choices than Group Normal-Feedback following (1) H-1 outcomes in the P[0] condition, p = .049, (2) H-1 outcomes in the P[1] condition, p = .002, and (3) H-11 outcomes in the P[1] condition, p = .006. There was a tendency for Group Extra-Feedback to make more H-U choices than Group Normal-Feedback following H-0 outcomes in the P[0] and P[1] conditions, but these differences were not significant (ps ≈ .100). Therefore, the LDWs had a focused rather than general effect, affecting choice behavior following H-U outcomes. This serves as a critical validation of the current paradigm, as it would be expected that LDWs would selectively promote subsequent risk-taking following choices associated with LDWs (H-U) as opposed to those not associated with LDWs (L-U). Thus, LDWs may causally facilitate loss-chasing propensities.
The significant increase in H-U preference following H-1 outcomes in Group Extra-Feedback versus Group Normal-Feedback is noteworthy, as H-1 outcomes for Group Extra-Feedback may most closely mirror LDWs in casinos. There, an individual gambler may be “tricked” into thinking that s/he won by winning back 10 cents on a 25-cent gamble. In actuality, this is a 15-cent loss, but the loss is disguised as a win because the outcome is accompanied by auditory and visual cues celebrating the “victory.” Thus, the current results (Figures 2C–D) suggest that LDWs can produce a significant increase in loss-chasing behaviors in rats. Indeed, LDWs drive humans to overestimate the frequency of “winning” (Dixon et al., 2015; Jensen et al., 2013; Templeton et al., 2015), which would theoretically lead to elevated risk-taking following these perceived but nonexistent “wins,” thereby taking the form of exaggerated staying behavior following an LDW.
In terms of overall risk-taking, there was a tendency toward suboptimal decision making in the latter .10 probability phases (Figure 2C), which contrasted with the rats’ behavior in the initial .10 probability phase (Figure 2A), suggesting that exposure to the .90 and .50 probability phases elicited greater risk aversion. In the latter .10 probability phases, the rats’ L-U preference led to less overall food reward compared to if they had exclusively made H-U choices. Suboptimal behavior in decision-making tasks in rats has been reported previously (Chow, Smith, Wilson, Zentall, & Beckmann, 2017; Marshall & Kirkpatrick, 2015; also see Magalhães, White, Stewart, Beeby, & van der Vliet, 2012), and ultimately suggests that the expected utility (value) of the outcomes is weighted less heavily in these tasks than other task dimensions (e.g., outcome predictability, frequency of gains versus losses; see Smith, Bailey, Chow, Beckmann, & Zentall, 2016). Assuming that the exposure to two loss magnitudes (H-0, H-1) contributed to greater risk aversion in Group Normal-Feedback (Marshall & Kirkpatrick, 2015), the extra-feedback cues seem to have attenuated risk aversion and the deterring impact of the corresponding losses, ultimately producing greater risk-taking behavior.
As seen in Figure 2D, there was a distinct pattern of results following H-0 and H-1 outcomes that depended on condition. Across both conditions, the tendency to make more H-U choices following losses is not without precedent (e.g., Montes, Stopper, & Floresco, 2015; Stopper & Floresco, 2011), and may reflect perseverative tendencies to repeat previous choices (i.e., stay-biases). Here, in the P[0] condition, rats made significantly more H-U choices after H-1 than after H-0 outcomes, p = .018; the opposite pattern was exhibited in the P[1] condition, but this difference was not significant, p = .517 (also see Marshall & Kirkpatrick, 2015). These differences may indicate that the rats were tracking the probability of the H-11 outcome relative to one of the H-U losses. Specifically, in the P[0] condition, the probabilities of the H-0 and H-11 outcome were anti-correlated: Increases in p(H-0) were associated with decreases in p(H-11). In contrast, the probabilities of H-0 and H-11 were correlated in the P[1] condition. Greater H-U choice following H-0 outcomes than H-1 outcomes in the P[1] condition may have potentially reflected the rats’ tracking p(H-11): The more that they received H-0 outcomes, the more likely they were to receive subsequent H-11 outcomes. Thus, the differential effects following H-0 and H-1 outcomes may indicate that the rats were using a learned model of their environment to make sequential decisions (Huh, Jo, Kim, Sul, & Jung, 2009; also see Daw, Gershman, Seymour, Dayan, & Dolan, 2011; Wunderlich, Symmonds, Bossaerts, & Dolan, 2011). Therefore, assumptions of pure win-stay/lose-shift behavior in terms of the relationship between previous outcomes and subsequent choices may mask richer interpretations of actual behavior. For instance, a “win” (or gain) may be jointly characterized by some weighted interaction between the magnitude of the previous outcome and the probabilistic signaling of the most likely outcome given the same or different choice.
The unique relationship between the previous outcome and subsequent choice across groups suggests that elevated loss-chasing following H-1 outcomes induced by LDWs may be driven by a heightened stay bias following such ambiguous outcomes. As seen in Figure 2D, the groups did not differ following L-U outcomes, but Group Extra-Feedback exhibited greater H-U preference following H-U outcomes. This indicates that the differential feedback did not necessarily increase a general tendency to make H-U choices regardless of the previous outcome, but increased H-U loss-chasing (following H-1 outcomes) as well as the overall increased tendency to repeat an H-U choice. Given that pathological gamblers have been shown to exhibit stay biases (de Ruiter et al., 2009), LDWs may contribute to more biased decision-making.
Ultimately, these data suggest that the LDWs raised the value of the H-U choice and outcome because the multimodal LDW stimulus served as a conditioned reinforcer that increased the likelihood of subsequent H-U choices (Zentall & Stagner, 2011; also see Seo & Lee, 2009; but see Barrus & Winstanley, 2016). Specifically, the probability of making an H-U choice following H-U outcomes may have been increased in Group Extra-Feedback relative to Group Normal-Feedback because the LDW stimuli augmented H-U value. That is, the LDWs may have “disguised” the magnitude of the true outcome, a hypothesis further explored in the analysis of goal-tracking behavior (see below). According to this account, the positive conditioned reinforcing value of the LDW may have masked or competed with processing the delivery of the H-0 and H-1 losses, consequently elevating the likelihood of repeating H-U choices. Accordingly, any influence of conditioned reinforcer value may have been manifested in an elevated stay bias to continue making H-U choices by Group Extra-Feedback. Such an explanation could then account for the overestimations of “winning” given exposure to LDWs (see Dixon et al., 2015; Jensen et al., 2013; Templeton et al., 2015).
Outcome history
To further investigate the relationship between LDWs, previous outcomes, and subsequent choices, analyses were conducted to determine whether the LDW-type feedback affected the decaying influence of past outcomes on subsequent choice, as well as to determine the function that best characterized the form of the decay. It is well-established that the influence of a given previous outcome on subsequent choice decays with each subsequent outcome (e.g., Devenport et al., 1997; Glimcher, 2011). This is a central tenet of RL models, in which the decay is exponential (e.g., Glimcher, 2011). However, other reports have suggested a hyperbolic decay of past events (Devenport et al., 1997). To our knowledge, while there have been several previous analyses of outcome decay (e.g., Kennerley et al., 2006; H. Kim, Lee, & Jung, 2013; H. Kim, Sul, Huh, Lee, & Jung, 2009; Sul, Kim, Huh, Lee, & Jung, 2010), none of them has reported the functional form of the decay, or they have fit only an exponential function to the data (Beeler, Daw, Frazier, & Zhuang, 2010; Rutledge et al., 2009; Zalocusky et al., 2016).
To address this oversight in the literature, analysis determined the regression coefficients of previous outcomes as a function of outcome lag; intuitively, this reflects the relationship between choices and previous outcomes that are progressively more temporally distant rather than just the single most recent previous outcome (e.g., Figure 2D for outcome lag = 1). This analysis is a win-stay/lose-shift analysis, in which more positive slopes reflect a greater likelihood of making H-U choices after greater H-U outcome magnitudes and of making L-U choices after greater L-U outcome magnitudes, reflecting increased staying behavior.
Analyses involved 122,452 observations, and included the overall intercept, a random intercept, and fixed and random effects of the previous 10 outcomes. Table S2 includes the mixed-effects output for the analysis. There was the expected decay in outcome influence (regression coefficients) as a function of previous outcome (Figure 3), consistent with previous studies (e.g., Devenport et al., 1997; Sutton & Barto, 1998). More recent outcomes had greater influence over subsequent choices than more temporally distant outcomes. With the exception of the outcome of lag 8 (p = .055), all regression coefficients were significantly greater than 0, ps ≤ .015 (see Table S2). Thus, the past series of outcomes had at least partial influence over subsequent choice behavior.
Figure 3.
Regression coefficients from the mixed-effects analysis evaluating the effect of the previous 10 outcomes on subsequent risky choice. The curvilinear function within each panel reflects the model fits of the best-fitting biased-exponential model. N = Group Normal-Feedback; E = Group Extra-Feedback.
The random-effects coefficients were analyzed via nonlinear mixed-effects models to determine the functional form of the decay rates of the coefficients (Equations 2–5). The AICs of the best- to worst-fitting models were as follows: biased-exponential (−1157.13), biased-hyperbolic (−1120.43), hyperbolic (−1025.92), and exponential (−913.48).2 Figure 3 shows regression coefficients (data points) and the fit of the biased-exponential (line) model for each rat. The estimates for A, k, and γ were all significantly greater than zero, warranting their inclusion in the model: A, t(211) = 8.87, p < .001, b = 0.38 [0.29, 0.46]; k, t(211) = 54.41, p < .001, b = 0.64 [0.62, 0.66]; and, γ, t(211) = 3.83, p < .001, b = 0.02 [0.01, 0.04]. Group Extra-Feedback exhibited a significantly greater decay rate (k = 0.64) than Group Normal-Feedback (k = 0.58), t(211) = −3.50, p = .001, b = −0.06 [−0.10, −0.03]. There were no significant group differences in the intercept, A, t(211) = −1.52, p = .130, b = −0.09 [−0.21, 0.03], or the stay-bias parameter, γ, t(211) = −0.37, p = .706, b = −0.003 [−0.02, 0.01]. Thus, an exponential decay with a stay-bias parameter, which produced a non-zero asymptote, was the best account of the outcome history data. Examination of Figure 3 indicates that the regression coefficients did not drop to 0, even at a 10-outcome lag, indicating the need for a stay-bias parameter. Interestingly, Zalocusky et al. (2016) recently showed that a similar model also fit their risky choice data well, but these authors did not report competing models. Given the large number of observations used to estimate the regression coefficients and the decay function in the present analysis, these results provide compelling support for an exponential decay of outcome history on subsequent choice behavior, in accordance with reinforcement learning models (e.g., Sutton & Barto, 1998) and neurobiological data (e.g., Bayer & Glimcher, 2005).
The significant group difference in decay rates, k, is interesting, as steeper decay rates have been suggested to be analogous to greater learning rates (see Glimcher, 2011), and steeper learning rates have been suggested to be related to greater risk aversion in the presence of gains (March, 1996). However, Group Extra-Feedback exhibited greater H-U preference (less risk aversion) than Group Normal-Feedback (Figures 2C–D), suggesting that the rate of decay of the outcome-history coefficients may not purely reflect differences in learning/value-updating rates. As the asymptotes of the groups’ outcome-history functions were similar, the greater decay rate (k) in Group Extra-Feedback indicates that this group reached asymptote more quickly, reflecting a disproportionately stronger effect of more recent outcomes in Group Extra-Feedback. In this analysis, there were no significant group differences in the intercept, A, or the stay-bias parameter, γ, so the values of the regression coefficients in Group Extra-Feedback would have needed to decrease by greater amounts to reach an asymptote comparable to that of Group Normal-Feedback. Thus, for Group Extra-Feedback, more recent outcomes would have had relatively greater influences on tendencies to repeat the previous choice, specifically the H-U choice (i.e., an H-U stay bias). Therefore, the steeper outcome-history decay rates in Group Extra-Feedback suggest that the most recent H-U outcomes had disproportionately large effects on subsequent choice. In conjunction with the choice behavior analyses above, these results suggest that the LDWs presented to Group Extra-Feedback promoted greater stay biases on the H-U choice.
Goal-tracking behavior
The analyses thus far have focused on the effects of LDWs on choice. It is possible that LDWs gain their value through conditioned reinforcement due to their repeated pairing with food on H-11 trials. This is consistent with the finding that the LDW effects were not apparent until after the rats had experienced a higher probability of H-11 outcomes. Conditioned reinforcers are stimuli that gain value through associative learning due to repeated pairing with primary reinforcers such as food (Kelleher & Gollub, 1962). If LDWs act as conditioned reinforcers, then exposure to them should produce a heightened expectation of food when feedback is provided, potentially leading to LDW-induced overestimations of winning (see, e.g., Dixon et al., 2015). One commonly used measure of expectation of food receipt is goal-tracking behavior, or checking the food cup at the end of a trial. Goal-tracking was operationally defined as the latency to the first head entry into the food magazine after the lever press that resulted in food delivery at the end of the trial. If LDWs result in a heightened expectation of food delivery, then the rats should show a more rapid onset of their goal-tracking responses. On the other hand, the groups should not differ if they are purely attending to the actual delivery of food pellets to the food magazine (i.e., 0 vs. 1 vs. 11 food pellets). Overall, as Group Extra-Feedback received the same feedback following all H-U outcomes, the rats in this group may be faster to inspect the food magazine compared to Group Normal-Feedback given (1) the greater ambiguity of the H-U feedback stimuli, and (2) the greater reward expectation elicited by the onset of conditioned reinforcement upon trial termination (i.e., when rewards were delivered). Relatedly, as described above, if the LDWs skewed expectations of the magnitude of actual food delivery in Group Extra-Feedback, then their latencies given the H-U outcomes should be relatively similar.
Figure 4 shows the mean latency (s) to the first head entry into the food magazine following completion of the H-U choice trial as a function of feedback group and H-U outcome. Analysis included 20,932 observations, with a fixed-effects structure of overall intercept, feedback group, H-U outcome, and Feedback Group × H-U Outcome; the random-effects structure included intercept and H-U outcome. Across outcomes, the rats in Group Extra-Feedback had shorter latencies to initiate head entry responses compared to Group Normal-Feedback, t(20926) = −7.88, p < .001, b = −0.32 [−0.40, −0.24]. Analysis also revealed a significant Feedback Group × H-U Outcome interaction: H-0, t(20926) = −9.24, p < .001, b = −0.35 [−0.43, −0.28]; H-1, t(20926) = 5.07, p < .001, b = 0.14 [0.09, 0.20]. Post hoc tests indicated that Group Extra-Feedback exhibited faster collection latencies than Group Normal-Feedback across all H-U outcomes, ps ≤ .001, but absolute group differences decreased with increases in H-U outcome magnitude.
Figure 4.
Mean latency (s) to the first head-entry into the food magazine following termination of the choice’s FI as a function of high-uncertainty (H-U) outcome magnitude and group. Error bars (+/− 1 SEM) were computed with respect to the estimated marginal means of the fitted generalized linear mixed-effects models. H-0 = H-U choice, 0-pellet outcome; H-1 = H-U choice, 1-pellet outcome; H-11 = H-U choice, 11-pellet outcome.
The first food pellet and LDW cue of the H-1 and H-11 outcomes were delivered before the mean head-entry latency shown in Figure 4, suggesting that these latencies reflect more reactionary than anticipatory mechanisms. Group Normal-Feedback exhibited an inverse relationship between head-entry latency and H-U outcome magnitude, indicating that head-entry latency reflects an expectation of reward magnitude in the food magazine. Specifically, this group waited nearly 2 s to check the food cup on H-0 trials, but checked the food cup within approximately 0.5 s on H-1 and H-11 trials, indicating that they were reacting to reward omission (which was associated with an absence of feeder and cue light stimuli). The faster collection latencies of Group Extra-Feedback suggest that these rats exhibited a heightened expectation of reward given an H-U choice overall, and this was particularly striking for the H-0 outcome, where they showed no indication of an expectation of reward omission. Further post-hoc tests were conducted to determine the extent of within-group differences in collection latency across H-U outcomes. Here, there were no statistically significant differences across H-U outcomes for Group Extra-Feedback, ps ≥ .263, but there were significant decreases in collection latencies with increases in H-U outcome magnitude (H-0 vs. H-1, H-0 vs. H-11, and H-1 vs. H-11) in Group Normal-Feedback, ps ≤ .001. Therefore, Group Extra-Feedback’s head-entry latencies across H-U outcomes were comparable, suggesting that this group had relatively similar reward expectations across the delivery of all H-U outcomes despite the considerable differences in outcome magnitude. This indicates that the LDW cues undermined this group’s ability to differentiate between reward omission and reward receipt in the fashion shown by Group Normal-Feedback. That is, the rats that were not exposed to the conditioned reinforcement of LDWs (i.e., Group Normal-Feedback) showed differentiated goal-tracking behavior with respect to H-U outcome magnitude. Thus, while Group Normal-Feedback’s behavior appeared to be governed by actual pellet delivery, Group Extra-Feedback’s behavior was more likely driven by the LDW feedback cues. Accordingly, if the conditioned reinforcement of LDWs overrode the actual outcome magnitude of the choice, then these results could potentially account for LDW-induced overestimations of win frequency (see, e.g., Dixon et al., 2015).
The goal-tracking results are particularly intriguing in consideration of the rats’ choice behavior as a function of the previous outcome (Figure 2). Given the non-differential goal-tracking latencies in Group Extra-Feedback, it could be predicted that an expectation of an H-11 outcome and the delivery of an H-0 or H-1 outcome would have produced a large prediction error between expected and received reward (i.e., extinction), which could have ultimately discouraged subsequent H-U choice. In contrast, Group Extra-Feedback exhibited relatively high rates of H-U choice behavior (i.e., loss-chasing) following H-0 and H-1 outcomes compared to Group Normal-Feedback (Figure 2D). Potentially, the presentation of LDWs may have increased H-U choices due to reward magnitude hyposensitivity during gambling (see Lole, Gonsalvez, & Barry, 2015; but also see Lole, Gonsalvez, Barry, & Blaszczynski, 2014). Overall, these results suggest that the acquisition of conditioned reinforcement value of the LDWs with repeated exposure came to diminish differential effects of H-U outcomes on choice (e.g., muted responses to H-0 reward omission). Accordingly, LDWs seemed to have overridden any value-reducing effects of H-0 and H-1 outcomes on H-U value, thereby producing loss-chasing behavior.
Overall, the goal-tracking results provide good face validity to the current animal model of LDWs. Similar behavior has been demonstrated in the rodent slot-machine task (rSMT) (e.g., Winstanley, Cocker, & Rogers, 2011; also see Peters, Hunt, & Harper, 2010). Here, each of three key lights is probabilistically illuminated (i.e., analogues to slot machine reels). When all three key lights are illuminated, the animal has “won” and can press a lever to “collect reward”; a second lever provides the opportunity to gamble again. However, if the collect-reward lever is pressed when any other permutation of key lights is presented, the animal suffers a timeout before the next gamble (i.e., the next opportunity to earn food). While animals consistently respond to collect reward when all lights are illuminated, they still do so more often than not when two of the lights are illuminated (Cocker, Hosking, Murch, Clark, & Winstanley, 2016; Cocker, Le Foll, Rogers, & Winstanley, 2014; Cocker, Lin, Barrus, Le Foll, & Winstanley, 2016; Cocker, Tremblay, Kaur, & Winstanley, 2017; Rafa, Kregiel, Popik, & Rygula, 2016; Winstanley et al., 2011). These results have been interpreted as evidence of near-misses in animal gambling models (e.g., Winstanley et al., 2011), in which losses accompanied by nearly all win-related stimuli (i.e., conditioned reinforcers) are treated as if the individual still won when they actually lost (see Reid, 1986). Moreover, in the rSMT, Cocker et al. (2014) reported that the latencies to press the collect-reward lever following wins and near-misses did not significantly differ. The similarities in collection latency in Group Extra-Feedback across H-U outcomes (Figure 4) cement the current paradigm as a novel animal model of gambling behavior, complementing other animal models that have also reported relatively comparable responses to the delivery of wins and ambiguous losses.
RL models
Parameter recovery
The first step of RL modeling was parameter recovery, in which data were simulated with uniformly distributed parameters and then fit with the RL models to gauge how well the models identified the data-generating parameters. The parameter recovery data are presented in Figure S1. In brief, all models adequately recovered the simulated parameters, indicating that they were all viable models for fitting the data. The RL models were then fit to the trial-by-trial observed data from each rat.
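As an illustration of the parameter-recovery logic described above, the following is a minimal sketch for a generic two-choice softmax Q-learner: parameters are drawn from uniform distributions, choices are simulated, the model is refit by maximum likelihood, and recovered parameters are correlated with the generative ones. The task structure, trial counts, and parameter ranges are assumptions for demonstration and do not reproduce the exact models fit in the present report.

```python
# Hedged parameter-recovery sketch for a simple two-choice softmax Q-learner.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N_TRIALS = 500

def simulate(alpha, beta, n_trials=N_TRIALS):
    """Simulate choices/outcomes; option 1 pays 11 pellets with p = .1, option 0 pays 1-3 pellets."""
    q = np.zeros(2)
    choices, rewards = np.empty(n_trials, int), np.empty(n_trials)
    for t in range(n_trials):
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))   # softmax over two actions
        c = int(rng.random() < p1)
        r = (11.0 if rng.random() < 0.1 else 0.0) if c == 1 else float(rng.integers(1, 4))
        q[c] += alpha * (r - q[c])                          # delta-rule value update
        choices[t], rewards[t] = c, r
    return choices, rewards

def neg_log_lik(params, choices, rewards):
    alpha, beta = params
    q, nll = np.zeros(2), 0.0
    for c, r in zip(choices, rewards):
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        p1 = min(max(p1, 1e-9), 1 - 1e-9)                   # guard against log(0)
        nll -= np.log(p1 if c == 1 else 1.0 - p1)
        q[c] += alpha * (r - q[c])
    return nll

true_params, recovered = [], []
for _ in range(20):
    alpha, beta = rng.uniform(0.05, 0.95), rng.uniform(0.1, 2.0)   # uniform generative parameters
    data = simulate(alpha, beta)
    fit = minimize(neg_log_lik, x0=[0.5, 1.0], args=data,
                   bounds=[(0.01, 1.0), (0.01, 10.0)], method="L-BFGS-B")
    true_params.append([alpha, beta]); recovered.append(fit.x)

true_params, recovered = np.array(true_params), np.array(recovered)
for i, name in enumerate(["alpha", "beta"]):
    r = np.corrcoef(true_params[:, i], recovered[:, i])[0, 1]
    print(f"{name}: recovery correlation r = {r:.2f}")
```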
Model selection
Even with 1000 iterations, the Simple RL model did not converge on a viable solution for 13 rats; the Simple/Biased RL model, 1 rat; the Asymmetric model, 11 rats; and the Asymmetric/Biased model, 6 rats. The Simple/Dual-Biased and Asymmetric/Dual-Biased models converged on a solution for all rats, suggesting that the rats exhibited separate stay-bias rates for L-U and H-U choices. The mean AICs of the Simple/Dual-Biased and Asymmetric/Dual-Biased models were 3413.3 and 3374.4, respectively. Thus, the best model was the Asymmetric/Dual-Biased RL model, indicating that independent L-U and H-U learning and stay-bias rates provided the best account of the data.
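For readers unfamiliar with the selection criterion, a brief sketch of how AIC-based comparison works is shown below. The negative log-likelihoods are placeholders rather than the fitted values reported above; the parameter counts reflect the models' labels as described in this report but should be treated as assumptions.

```python
# Hedged sketch: comparing candidate RL models by AIC computed from each fit's
# minimized negative log-likelihood (nll). Lower AIC indicates a better model.
def aic(nll, n_params):
    """Akaike information criterion: AIC = 2k - 2*ln(L) = 2k + 2*nll."""
    return 2 * n_params + 2 * nll

fits = {                                    # hypothetical fits: (nll, number of free parameters)
    "Simple/Dual-Biased": (1702.6, 4),
    "Asymmetric/Dual-Biased": (1682.2, 5),
}
scores = {name: aic(nll, k) for name, (nll, k) in fits.items()}
best = min(scores, key=scores.get)          # lowest AIC wins
print(scores, "-> best:", best)
```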
Model fit (Asymmetric/Dual-Biased RL)
Figure 5 shows the simulations of the fitted Asymmetric/Dual-Biased RL model parameters against the trial-by-trial data of five different rats. The model fit the rats’ data well, with minimum and maximum ω2 values of .68 and .98, respectively. Table S3 shows the summary statistics of the best-fitting parameter estimates and fit indices of the Asymmetric/Dual-Biased RL model. Due to skew in the distributions of the fitted parameters, non-parametric tests were used in the corresponding statistical analyses (i.e., Wilcoxon signed-ranks tests for paired samples; Wilcoxon rank-sum tests for independent samples).
Figure 5.
Actual choices (Data) and predicted choices from the Asymmetric/Dual-Biased RL model (Model) for individual rats, ordered from top to bottom by how well the model fit the data in terms of omega-squared (ω2). “Min” and “Max” refer to the rats with the lowest and highest ω2, respectively. “25%ile”, “Mdn”, and “75%ile” refer to the 25th, 50th, and 75th percentile of ω2, respectively. The alternating gray and white boxes within each panel represent different conditions and phases (see Table 1). The differential widths of these phase markers reflect individual differences in trials completed within each phase.
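The non-parametric tests described above (and reported in the following paragraphs) can be illustrated with a short scipy sketch. The per-rat parameter values and group labels below are simulated placeholders, not the fitted estimates analyzed here.

```python
# Hedged sketch of paired (signed-ranks) and independent-groups (rank-sum)
# Wilcoxon tests on fitted RL parameters, using placeholder data.
import numpy as np
from scipy.stats import wilcoxon, ranksums

rng = np.random.default_rng(1)
alpha_lu = rng.uniform(0.0, 0.4, size=24)     # hypothetical per-rat L-U learning rates
alpha_hu = rng.uniform(0.2, 0.8, size=24)     # hypothetical per-rat H-U learning rates
group = np.repeat(["normal", "extra"], 12)    # hypothetical group labels

# Paired comparison within rats (Wilcoxon signed-ranks test)
stat, p = wilcoxon(alpha_lu, alpha_hu)
print(f"alpha_LU vs alpha_HU: W = {stat:.1f}, p = {p:.3f}")

# Independent-groups comparison (Wilcoxon rank-sum test)
z, p = ranksums(alpha_hu[group == "extra"], alpha_hu[group == "normal"])
print(f"group difference in alpha_HU: z = {z:.2f}, p = {p:.3f}")
```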
Figure 6 shows the medians of the five parameters for both groups [i.e., two parameters within the Asymmetric/Dual-Biased RL valuation rule (Figure 6A) and three parameters within the Asymmetric/Dual-Biased RL decision rule (Figure 6B)]. L-U learning rates (αLU) were significantly smaller than H-U learning rates (αHU), z = 3.86, p < .001 (Wilcoxon signed-ranks test). Given the constancy of the L-U choice across all conditions, L-U learning rates may have been smaller than H-U learning rates because individuals may more readily incorporate recent events into estimations of subjective value for actions corresponding to greater outcome uncertainty (i.e., H-U choices and outcomes; see Behrens, Woolrich, Walton, & Rushworth, 2007). Additionally, L-U stay biases (γLU) were significantly greater than H-U stay biases (γHU), z = 4.26, p < .001 (Wilcoxon signed-ranks test), with the H-U stay biases even being negative, suggesting a reduced tendency to repeat H-U choices relative to what would be predicted by action values alone. These results reflect general risk aversion (see Kagel, MacDonald, Battalio, White, & Green, 1986), which was likely driven by the frequencies of gains and losses rather than simply differences in value estimates of the two choices (see Horstmann, Villringer, & Neumann, 2012; Marshall & Kirkpatrick, 2015). This is consistent with the mixed-effects regression model, which revealed greater overall L-U choices. Overall, the significant differences in learning and stay-bias rates following L-U and H-U choices support the inclusion of distinct L-U and H-U parameters in the model.
Figure 6.
Median fitted parameter estimates from the valuation rule (A) and decision rule (B) of the Asymmetric/Dual-Biased reinforcement learning (RL) model for Groups Normal-Feedback and Extra-Feedback. Medians are shown because the corresponding statistical analyses involved non-parametric signed-rank and rank-sum tests. Error bars represent the interquartile range. LU = low-uncertainty choice; HU = high-uncertainty choice. * = statistically significant difference (p < .05).
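To clarify how the five parameters of the Asymmetric/Dual-Biased RL model could operate together, the following is a minimal sketch of one plausible formalization: a delta-rule valuation step with choice-specific learning rates (αLU, αHU) and a softmax decision step with an inverse temperature (β) and choice-specific stay-bias terms (γLU, γHU) applied to the previously chosen option. The exact equations used in the present report may differ from this sketch, and the parameter values in the usage example are made up.

```python
# Hedged sketch of an asymmetric, dual-biased RL valuation and decision rule.
import numpy as np

def choice_prob_hu(q, prev_choice, beta, gamma_lu, gamma_hu):
    """Softmax probability of the H-U choice, with a stay bonus for repeating the previous choice."""
    stay = np.array([gamma_lu if prev_choice == 0 else 0.0,   # index 0 = L-U
                     gamma_hu if prev_choice == 1 else 0.0])  # index 1 = H-U
    v = beta * q + stay
    v -= v.max()                                              # numerical stability
    p = np.exp(v) / np.exp(v).sum()
    return p[1]

def update(q, choice, reward, alpha_lu, alpha_hu):
    """Delta-rule update of the chosen action's value with a choice-specific learning rate."""
    alpha = alpha_hu if choice == 1 else alpha_lu
    q[choice] += alpha * (reward - q[choice])
    return q

# Tiny usage example with made-up parameter values and a hypothetical trial sequence
q, prev = np.zeros(2), 0
for choice, reward in [(1, 0.0), (1, 11.0), (0, 2.0)]:
    p_hu = choice_prob_hu(q, prev, beta=0.5, gamma_lu=0.8, gamma_hu=-0.2)
    q = update(q, choice, reward, alpha_lu=0.1, alpha_hu=0.4)
    prev = choice
    print(f"P(H-U) = {p_hu:.2f}, Q = {np.round(q, 2)}")
```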
Additional analyses were conducted to compare parameters across feedback groups. A series of Wilcoxon rank-sum tests revealed that Groups Extra-Feedback and Normal-Feedback did not differ with respect to αLU, z = 0.55, p = .583, αHU, z = 1.36, p = .175, or β, z = 0.72, p = .470. However, Group Extra-Feedback exhibited significantly greater H-U stay biases (γHU) compared to Group Normal-Feedback, z = 2.40, p = .017. There were no group differences in L-U stay biases (γLU), z = 1.59, p = .112. These results indicate that Group Extra-Feedback displayed a greater stay bias following H-U choices than Group Normal-Feedback, which may explain their greater overall H-U choice (see Figures 2C and 2D). Thus, LDWs appear to have elevated stay biases on riskier choices by effectively increasing H-U value via possible conditioned reinforcement properties of the LDW-type feedback.
Implications and Applications
This experiment was designed to investigate the effect of LDW-type feedback on choice behavior, the influence of past outcomes on subsequent choice, and goal-tracking behavior. The current results suggest that LDWs (e.g., cues presented by gambling machines in casinos) may encourage loss chasing by elevating reward expectation and enhancing stay biases for riskier choices. These empirical results were paralleled by fitting multiple RL algorithms to the rats’ data to evaluate the psychological mechanisms of choice behavior across groups. The theory-driven RL models complemented the theory-neutral mixed-effects models, providing a more complete account of the data. Importantly, there are strengths to both approaches: RL can account for individuals’ real-time experiences (i.e., outcome history) without discretizing experimental manipulations (e.g., probability of food), while mixed-effects models permit analysis of group- and individual-level effects of various manipulations on choice. Using only one of these approaches may fail to capture key pieces of information. Although formal model comparison (e.g., AIC) between mixed-effects and RL models would be unjustifiable given differences in the total observations within the models, the integration of both approaches may serve as the most direct way to unveil critical mechanism-manipulation interactions that govern individual differences in decision making (see Katahira, 2015). While RL modeling and mixed-effects analyses may be unable to provide perfect accounts of behavior (i.e., there are an infinite number of potential models), the present report has focused on simple forms of these models, which may inspire future research to more conclusively elucidate the corresponding mechanisms.
Of the RL models, the Asymmetric/Dual-Biased algorithm emerged as the superior model. Indeed, the incorporation of a stay-bias parameter has been shown to provide good model fits to decision-making data (see Ahn et al., 2014; Biele, Erev, & Ert, 2009; Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; Nevo & Erev, 2012; Rutledge et al., 2009; Worthy et al., 2016; Worthy & Maddox, 2014; Worthy, Pang, & Byrne, 2013). Previous models using multiple learning rates have done so in modeling differential learning from gains and losses (e.g., Donahue & Lee, 2015; Frank, Moustafa, Haughey, Curran, & Hutchison, 2007; Niv, Edlund, Dayan, & O'Doherty, 2012). However, in the present context, this would have collapsed across H-U and L-U losses, which were qualitatively distinct in Group Extra-Feedback. Accordingly, the present set of RL models considered differential learning from L-U and H-U choices. While L-U and H-U learning rates (αLU, αHU) did not differ across groups, Group Extra-Feedback exhibited significantly greater stay biases following H-U choices relative to Group Normal-Feedback. If H-U outcomes with more frequent losses than gains inherently deter commission of H-U choices, then win-related LDWs appeared to have reduced such deterrence, ultimately adding subjective value to the associated action. Overall, LDWs may serve as conditioned reinforcers to encourage subsequent risk-taking, even in suboptimal conditions, such as casinos, where losing is highly likely. As the LDW feedback accompanied all H-U outcomes for Group Extra-Feedback, the present data set and RL models cannot effectively dissociate any separate influence of H-U outcome magnitude versus LDWs on risky choice. Future research may consider probabilistically delivering LDW feedback with the occurrence of H-U losses to gauge how individual differences in the influence of conditioned reinforcers moderate loss-chasing and risky-choice behaviors.
The use of complex multimodal stimuli by casinos may promote risky decision making by effectively and strategically supplementing monetary reward with virtual conditioned reinforcers (LDWs). In the initial phase of the current experiment, the H-11 outcome occurred relatively frequently, such that LDW-type feedback (i.e., flashing lights and feeder sounds; Figure 1) occurred more often with the H-11 outcomes, thereby strengthening its conditioned reinforcement value. This allowed the LDWs to promote H-U choice in subsequent conditions in which food delivery was relatively rare. Research has shown that individuals tend to overestimate how often they “won” when LDWs are present in the environment (e.g., Dixon et al., 2015), and that problem gamblers overestimate “win” frequency relative to non-problem gamblers (Dixon et al., 2014). Thus, individuals who become pathological gamblers may attribute greater conditioned reinforcement value to LDWs. The goal-tracking analyses suggested that the LDWs elicited greater expectation of reward in the food magazine, indicating that the LDW stimuli did in fact disguise the H-0 and H-1 losses as wins. Accordingly, the coupling of over-expectation of reward and conditioned reinforcement by LDWs would ultimately promote stay biases that lead to increased loss-chasing behaviors.
Further insight into the mechanisms of trial-by-trial decision making was provided by the outcome-history analyses. Previous reports have employed comparable analyses to investigate how certain experimental manipulations (e.g., brain lesions) impact the number of past choices and/or outcomes that influence subsequent choice behavior (e.g., Kennerley et al., 2006). However, to our knowledge, there have been no reports that have statistically elucidated the functional relationship of these regression coefficients against competing models. For instance, Lau and Glimcher (2005) analyzed the decaying regression coefficients from two primates and noted that the decays were of hyperbolic or exponential form; however, the use of only two subjects prevented them from determining which function provided better fits to the data. With added power in the current analyses, our results suggest that the weights of previous outcomes decay exponentially to a non-zero asymptote as they recede into the past. The non-zero asymptote reflects a stay bias, or a tendency to continue to make H-U choices following previous H-U outcomes. Previous research has provided considerable quantitative and neurobiological support for exponential models in decision making environments (e.g., Bayer & Glimcher, 2005). Interestingly, the assumptions of the Asymmetric/Dual-Biased RL model are congruent with those of the biased-exponential decay model, such that choice behavior was governed by exponentially decaying outcome weights and non-zero stay biases. Note that while the stay-bias parameter in the biased-exponential model accounted for both L-U and H-U outcomes over the past 10 outcomes, the Asymmetric/Dual-Biased RL model assumed independent stay biases for the two choices that locally moderated the propensity to repeat the previous choice. Intuitively, the stay-bias parameter in the biased-exponential model may be viewed as a global perseveration metric of decaying local stay-bias rates from the Asymmetric/Dual-Biased RL model. That there were no significant group differences in the stay-bias parameter in the outcome-history analysis does not weaken the meaningfulness of the significant group differences in H-U stay bias in the RL analyses, but rather provides a finer account of LDW effects on risky choice. Indeed, the outcome-history analysis showed that Group Extra-Feedback was quicker to decay to asymptote than Group Normal-Feedback, which is consistent with the LDWs enhancing the effect of the relatively more recent outcomes in promoting a stay bias. As the group differences in previous outcome effects on risky choice were localized to H-U outcomes (Figure 2D), these differences were thus driven by those in H-U stay biases, as supported by the Asymmetric/Dual-Biased RL model. This essentially reflects a disproportionate LDW-driven H-U stay bias in Group Extra-Feedback. Thus, in conjunction with the empirical and modeling results above, these results complete a novel and comprehensive mechanistic account of LDW-type feedback effects on risky decision making.
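As an illustration of the biased-exponential decay described above, the following sketch fits w(k) = a·exp(−k/τ) + c to a set of outcome-history regression weights, where the asymptote c captures the stay bias. The coefficient values below are hypothetical placeholders, not the weights estimated in this report.

```python
# Hedged sketch of fitting a biased-exponential decay to outcome-history
# regression weights as a function of lag (previous 10 outcomes).
import numpy as np
from scipy.optimize import curve_fit

def biased_exp(k, a, tau, c):
    """Exponential decay with gain a, time constant tau, and non-zero asymptote c (stay bias)."""
    return a * np.exp(-k / tau) + c

lags = np.arange(1, 11)                                    # outcomes 1-10 trials back
weights = np.array([0.90, 0.55, 0.38, 0.30, 0.26, 0.24,    # hypothetical regression coefficients
                    0.23, 0.22, 0.22, 0.21])

params, _ = curve_fit(biased_exp, lags, weights, p0=[1.0, 2.0, 0.2])
a, tau, c = params
print(f"gain a = {a:.2f}, decay constant tau = {tau:.2f}, stay-bias asymptote c = {c:.2f}")
```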
In conclusion, this report has described a novel animal model of LDWs (i.e., Group Extra-Feedback), which has good face validity with respect to the effects of LDWs on humans. Ultimately, these results suggest that LDWs act as conditioned reinforcers to maintain loss-chasing behaviors (see Harrigan, Dixon, MacLaren, Collins, & Fugelsang, 2011) by promoting stay biases following H-U losses, corroborating previous research that has described the critical influence of conditioned reinforcers on behavior in risky environments (e.g., McDevitt, Spetch, & Dunn, 1997; Zentall, 2011, 2014). Interestingly, near-misses have also been described as potential conditioned reinforcers (e.g., Kassinove & Schare, 2001; Reid, 1986), such that gambling devices’ frequent presentations of conditioned reinforcement cues (i.e., LDWs, near-misses) may impair individuals’ abilities to track patterns of “winning” and “losing.” This could in turn promote a stay bias following conditioned reinforcement cue delivery, which would have the appearance of “win-stay” behavior in the face of an actual loss. Also, as LDWs have been suggested to increase how much individuals enjoy gambling (Sharman, Aitken, & Clark, 2015), perhaps by increasing the rewarding value of the outcomes through conditioned reinforcement, future research may consider using the current paradigm while recording ultrasonic vocalizations (USVs) to determine if LDWs result in more appetitive 50-kHz than aversive 22-kHz USVs (see Knutson, Burgdorf, & Panksepp, 2002). Ultimately, while previous research has investigated the potential neural correlates of gain- and loss-processing (e.g., Levin et al., 2012), as well as those of more ambiguous losses (e.g., Habib & Dixon, 2010), the implementation of LDWs in neurobiological analyses of risky choice will considerably advance our understanding of the complexities of loss processing, conditioned reinforcement, value computation, and neurocomputational models of behavior.
Supplementary Material
Acknowledgments
The authors would like to thank Christian Davis for assistance with data collection. The results were part of a doctoral dissertation by Andrew T. Marshall in 2016 at Kansas State University. This research was supported by National Institute of Mental Health Grant 085739 awarded to Kimberly Kirkpatrick and Kansas State University. ATM and KK conceived and designed the experiments. ATM performed experimentation. ATM and KK analyzed the data. ATM and KK wrote the paper.
Footnotes
Note that all sessions’ data were included in figures and analysis, such that the small decreases in H-U preference from probabilities .9 to .5 primarily reflect high rates of H-U preference initially carried over from the end of the initial phase (Figure 2A) rather than an inability to learn the corresponding reinforcement contingencies.
In contrast to traditional measures of fit (e.g., R2), for which larger values indicate better fit, the model with the lowest AIC (here, the most negative AIC) is the model that provides the best fit to the data (e.g., Burnham, Anderson, & Huyvaert, 2011). A model with an AIC that is at least 10 units greater than that of another model has essentially no empirical support (e.g., Burnham & Anderson, 1998).
The results were presented at the 2016 meeting of the Eastern Psychological Association.
References
- Ahn W-Y, Vasilev G, Lee S-H, Busemeyer JR, Kruschke JK, Bechara A, Vassileva J. Decision-making in stimulant and opiate addicts in protracted abstinence: evidence from computational modeling with pure users. Frontiers in Psychology. 2014;5:849. doi: 10.3389/fpsyg.2014.00849.
- Akaike H. Information theory as an extension of the maximum likelihood principle. In: Petrov BN, Csaki F, editors. Second International Symposium on Information Theory. Budapest: Akademiai Kiado; 1973. pp. 267–281.
- Amsel A. The role of frustrative nonreward in noncontinuous reward situations. Psychological Bulletin. 1958;55:102–119. doi: 10.1037/h0043125.
- Andrade EB, Iyer G. Planned versus actual betting in sequential gambles. Journal of Marketing Research. 2009;46:372–383. doi: 10.1509/jmkr.46.3.372.
- Bacotti AV. Home cage feeding time controls responding under multiple schedules. Animal Learning & Behavior. 1976;4:41–44. doi: 10.3758/BF03211983.
- Baguley T. Standardized or simple effect size: what should be reported? British Journal of Psychology. 2009;100:603–617. doi: 10.1348/000712608X377117.
- Barrus MM, Cherkasova M, Winstanley CA. Skewed by cues? The motivational role of audiovisual stimuli in modelling substance use and gambling disorders. Current Topics in Behavioral Neurosciences. 2016;27:507–529. doi: 10.1007/7854_2015_393.
- Barrus MM, Winstanley CA. Dopamine D3 receptors modulate the ability of win-paired cues to increase risky choice in a rat gambling task. The Journal of Neuroscience. 2016;36:785–794. doi: 10.1523/JNEUROSCI.2225-15.2016.
- Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005;47:129–141. doi: 10.1016/j.neuron.2005.05.020.
- Beeler JA, Daw N, Frazier CRM, Zhuang X. Tonic dopamine modulates exploitation of reward learning. Frontiers in Behavioral Neuroscience. 2010;4. doi: 10.3389/fnbeh.2010.00170.
- Behrens TEJ, Woolrich MW, Walton ME, Rushworth MFS. Learning the value of information in an uncertain world. Nature Neuroscience. 2007;10:1214–1221. doi: 10.1038/nn1954.
- Biele G, Erev I, Ert E. Learning, risk attitude and hot stoves in restless bandit problems. Journal of Mathematical Psychology. 2009;53:155–167. doi: 10.1016/j.jmp.2008.05.006.
- Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, White J-SS. Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology & Evolution. 2008;24:127–135. doi: 10.1016/j.tree.2008.10.008.
- Breen RB, Zuckerman M. 'Chasing' in gambling behavior: personality and cognitive determinants. Personality and Individual Differences. 1999;27:1097–1111. doi: 10.1016/S0191-8869(99)00052-5.
- Burnham KP, Anderson DR. Model Selection and Inference: A Practical Information-Theoretic Approach. New York, NY: Springer; 1998.
- Burnham KP, Anderson DR, Huyvaert KP. AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. Behavioral Ecology and Sociobiology. 2011;65:23–35. doi: 10.1007/s00265-010-1029-6.
- Bush RR, Mosteller F. A mathematical model for simple learning. Psychological Review. 1951;58:313–323. doi: 10.1037/h0054388.
- Campbell-Meiklejohn D, Woolrich MW, Passingham RE, Rogers RD. Knowing when to stop: the brain mechanisms of chasing losses. Biological Psychiatry. 2008;63:293–300. doi: 10.1016/j.biopsych.2007.05.014.
- Chow JJ, Smith AP, Wilson AG, Zentall TR, Beckmann JS. Suboptimal choice in rats: incentive salience attribution promotes maladaptive decision-making. Behavioural Brain Research. 2017;320:244–254. doi: 10.1016/j.bbr.2016.12.013.
- Clark L, Averbeck B, Payer D, Sescousse G, Winstanley CA, Xue G. Pathological choice: the neuroscience of gambling and gambling addiction. The Journal of Neuroscience. 2013;33:17617–17623. doi: 10.1523/JNEUROSCI.3231-13.2013.
- Cocker PJ, Hosking JG, Murch WS, Clark L, Winstanley CA. Activation of dopamine D4 receptors within the anterior cingulate cortex enhances the erroneous expectation of reward on a rat slot machine task. Neuropharmacology. 2016;105:186–195. doi: 10.1016/j.neuropharm.2016.01.019.
- Cocker PJ, Le Foll B, Rogers RD, Winstanley CA. A selective role for dopamine D4 receptors in modulating reward expectancy in a rodent slot machine task. Biological Psychiatry. 2014;75:817–824. doi: 10.1016/j.biopsych.2013.08.026.
- Cocker PJ, Lin MY, Barrus MM, Le Foll B, Winstanley CA. The agranular and granular insula differentially contribute to gambling-like behavior on a rat slot machine task: effects of inactivation and local infusion of a dopamine D4 agonist on reward expectancy. Psychopharmacology. 2016;233:3135–3147. doi: 10.1007/s00213-016-4355-1.
- Cocker PJ, Tremblay M, Kaur S, Winstanley CA. Chronic administration of the dopamine D2/3 agonist ropinirole invigorates performance of a rodent slot machine task, potentially indicative of less distractible or compulsive-like gambling behaviour. Psychopharmacology. 2017;234:137–153. doi: 10.1007/s00213-016-4447-y.
- Cocker PJ, Winstanley CA. Irrational beliefs, biases and gambling: exploring the role of animal models in elucidating vulnerabilities for the development of pathological gambling. Behavioural Brain Research. 2015;279:259–273. doi: 10.1016/j.bbr.2014.10.043.
- Daw ND. Trial-by-trial data analysis using computational models. In: Delgado MR, Phelps EA, Robbins TW, editors. Decision Making, Affect, and Learning: Attention and Performance. Vol. 23. 2011. pp. 3–38.
- Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans' choices and striatal prediction errors. Neuron. 2011;69:1204–1215. doi: 10.1016/j.neuron.2011.02.027.
- Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ. Cortical substrates for exploratory decisions in humans. Nature. 2006;441:876–879. doi: 10.1038/nature04766.
- de Ruiter MB, Veltman DJ, Goudriaan AE, Oosterlaan J, Sjoerds Z, van den Brink W. Response perseveration and ventral prefrontal sensitivity to reward and punishment in male problem gamblers and smokers. Neuropsychopharmacology. 2009;34:1027–1038. doi: 10.1038/npp.2008.175.
- Devenport L, Hill T, Wilson M, Ogden E. Tracking and averaging in variable environments: a transition rule. Journal of Experimental Psychology: Animal Behavior Processes. 1997;23:450–460. doi: 10.1037/0097-7403.23.4.450.
- Dixon MJ, Collins K, Harrigan KA, Graydon C, Fugelsang JA. Using sound to unmask losses disguised as wins in multiline slot machines. Journal of Gambling Studies. 2015;31:183–196. doi: 10.1007/s10899-013-9411-8.
- Dixon MJ, Harrigan KA, Sandhu R, Collins K, Fugelsang JA. Losses disguised as wins in modern multi-line video slot machines. Addiction. 2010;105:1819–1824. doi: 10.1111/j.1360-0443.2010.03050.x.
- Dixon MJ, Harrigan KA, Santesso DL, Graydon C, Fugelsang JA, Collins K. The impact of sound in modern multiline video slot machine play. Journal of Gambling Studies. 2014;30:913–929. doi: 10.1007/s10899-013-9391-8.
- Domjan M. The Principles of Learning and Behavior. 6th ed. Belmont, CA: Wadsworth; 2010.
- Donahue CH, Lee D. Dynamic routing of task-relevant signals for decision making in dorsolateral prefrontal cortex. Nature Neuroscience. 2015;18:295–301. doi: 10.1038/nn.3918.
- Dutt V, Gonzalez C. The role of inertia in modeling decisions from experience with instance-based learning. Frontiers in Psychology. 2012;3:177. doi: 10.3389/fpsyg.2012.00177.
- Ferster CB, Skinner BF. Schedules of Reinforcement. New York, NY: Appleton-Century-Crofts, Inc.; 1957.
- Frank MJ, Moustafa AA, Haughey HM, Curran T, Hutchison KE. Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning. Proceedings of the National Academy of Sciences. 2007;104:16311–16316. doi: 10.1073/pnas.0706111104.
- Freestone DM, MacInnis MLM, Church RM. Response rates are governed more by time cues than contingency. Timing & Time Perception. 2013;1:3–20. doi: 10.1163/22134468-00002006.
- Glimcher PW. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proceedings of the National Academy of Sciences. 2011;108:15647–15654. doi: 10.1073/pnas.1014269108.
- Gonzalez C, Dutt V. Instance-based learning: integrating sampling and repeated decisions from experience. Psychological Review. 2011;118:523–551. doi: 10.1037/a0024558.
- Habib R, Dixon MR. Neurobehavioral evidence for the "near-miss" effect in pathological gamblers. Journal of the Experimental Analysis of Behavior. 2010;93:313–328. doi: 10.1901/jeab.2010.93-313.
- Harrigan KA, Dixon M, MacLaren V, Collins K, Fugelsang JA. The maximum rewards at the minimum price: reinforcement rates and payback percentages in multi-line slot machines. Journal of Gambling Issues. 2011;26:11–29. doi: 10.4309/jgi.2011.26.3.
- Hoffman L, Rovine MJ. Multilevel models for the experimental psychologist: foundations and illustrative examples. Behavior Research Methods. 2007;39:101–117. doi: 10.3758/BF03192848.
- Horsley RR, Osborne M, Norman C, Wells T. High-frequency gamblers show increased resistance to extinction following partial reinforcement. Behavioural Brain Research. 2012;229:438–442. doi: 10.1016/j.bbr.2012.01.024.
- Horstmann A, Villringer A, Neumann J. Iowa Gambling Task: there is more to consider than long term outcome. Using a linear equation model to disentangle the impact of outcome and frequency of gains and losses. Frontiers in Neuroscience. 2012;6:61. doi: 10.3389/fnins.2012.00061.
- Huh N, Jo S, Kim H, Sul JH, Jung MW. Model-based reinforcement learning under concurrent schedules of reinforcement in rodents. Learning & Memory. 2009;16:315–323. doi: 10.1101/lm.1295509.
- Jensen C, Dixon MJ, Harrigan KA, Sheepy E, Fugelsang JA, Jarick M. Misinterpreting 'winning' in multiline slot machine games. International Gambling Studies. 2013;13:112–126. doi: 10.1080/14459795.2012.717635.
- Kagel JH, MacDonald DN, Battalio RC, White S, Green L. Risk aversion in rats (Rattus norvegicus) under varying levels of resource availability. Journal of Comparative Psychology. 1986;100:95–100.
- Kassinove JI, Schare ML. Effects of the "near miss" and the "big win" on persistence at slot machine gambling. Psychology of Addictive Behaviors. 2001;15:155–158. doi: 10.1037/0893-164X.15.2.155.
- Katahira K. The relation between reinforcement learning parameters and the influence of reinforcement history on choice behavior. Journal of Mathematical Psychology. 2015;66:59–69. doi: 10.1016/j.jmp.2015.03.006.
- Kelleher RT, Gollub LR. A review of positive conditioned reinforcement. Journal of the Experimental Analysis of Behavior. 1962;5:543–597. doi: 10.1901/jeab.1962.5-s543.
- Kello JE. The reinforcement-omission effect on fixed-interval schedules: frustration or inhibition? Learning and Motivation. 1972;3:138–147. doi: 10.1016/0023-9690(72)90034-3.
- Kennerley SW, Walton ME, Behrens TEJ, Buckley MJ, Rushworth MFS. Optimal decision making and the anterior cingulate cortex. Nature Neuroscience. 2006;9:940–947. doi: 10.1038/nn1724.
- Kim H, Lee D, Jung MW. Signals for previous goal choice persist in the dorsomedial, but not dorsolateral striatum of rats. The Journal of Neuroscience. 2013;33:52–63. doi: 10.1523/JNEUROSCI.2422-12.2013.
- Kim H, Sul JH, Huh N, Lee D, Jung MW. Role of striatum in updating values of chosen actions. The Journal of Neuroscience. 2009;29:14701–14712. doi: 10.1523/JNEUROSCI.2728-09.2009.
- Kim K-U, Huh N, Jang Y, Lee D, Jung MW. Effects of fictive reward on rat's choice behavior. Scientific Reports. 2015;5:8040. doi: 10.1038/srep08040.
- Knutson B, Burgdorf J, Panksepp J. Ultrasonic vocalizations as indices of affective states in rats. Psychological Bulletin. 2002;128:961–977. doi: 10.1037//0033-2909.128.6.961.
- Kreek MJ, Nielsen DA, Butelman ER, LaForge KS. Genetic influences on impulsivity, risk taking, stress responsivity and vulnerability to drug abuse and addiction. Nature Neuroscience. 2005;8:1450–1457. doi: 10.1038/nn1583.
- Larkin JD, Jenni NL, Floresco SB. Modulation of risk/reward decision making by dopaminergic transmission within the basolateral amygdala. Psychopharmacology. 2016;233:121–136. doi: 10.1007/s00213-015-4094-8.
- Lau B, Glimcher PW. Dynamic response-by-response models of matching behavior in rhesus monkeys. Journal of the Experimental Analysis of Behavior. 2005;84:555–579. doi: 10.1901/jeab.2005.110-04.
- Lesieur HR. The compulsive gambler's spiral of options and involvement. Psychiatry: Interpersonal and Biological Processes. 1979;42:79–87. doi: 10.1521/00332747.1979.11024008.
- Levin IP, Xue G, Weller JA, Reimann M, Lauriola M, Bechara A. A neuropsychological approach to understanding risk-taking for potential gains and losses. Frontiers in Neuroscience. 2012;6:1–11. doi: 10.3389/fnins.2012.00015.
- Linnet J, Peterson E, Doudet DJ, Gjedde A, Møller A. Dopamine release in ventral striatum of pathological gamblers losing money. Acta Psychiatrica Scandinavica. 2010;122:326–333. doi: 10.1111/j.1600-0447.2010.01591.x.
- Linnet J, Røjskjær S, Nygaard J, Maher BA. Episodic chasing in pathological gamblers using the Iowa gambling task. Scandinavian Journal of Psychology. 2006;47:43–49. doi: 10.1111/j.1467-9450.2006.00491.x.
- Lole L, Gonsalvez CJ, Barry RJ. Reward and punishment hyposensitivity in problem gamblers: a study of event-related potentials using a principal components analysis. Clinical Neurophysiology. 2015;126:1295–1309. doi: 10.1016/j.clinph.2014.10.011.
- Lole L, Gonsalvez CJ, Barry RJ, Blaszczynski A. Problem gamblers are hyposensitive to wins: an analysis of skin conductance responses during actual gambling on electronic gaming machines. Psychophysiology. 2014;51:556–564. doi: 10.1111/psyp.12198.
- Magalhães P, White KG, Stewart T, Beeby E, van der Vliet W. Suboptimal choice in nonhuman animals: rats commit the sunk cost error. Learning & Behavior. 2012;40:195–206. doi: 10.3758/s13420-011-0055-1.
- March JG. Learning to be risk averse. Psychological Review. 1996;103:309–319.
- Marshall AT, Kirkpatrick K. The effects of the previous outcome on probabilistic choice in rats. Journal of Experimental Psychology: Animal Behavior Processes. 2013;39:24–38. doi: 10.1037/a0030765.
- Marshall AT, Kirkpatrick K. Relative gains, losses, and reference points in probabilistic choice in rats. PLoS ONE. 2015;10:e0117697. doi: 10.1371/journal.pone.0117697.
- McDevitt MA, Spetch ML, Dunn R. Contiguity and conditioned reinforcement in probabilistic choice. Journal of the Experimental Analysis of Behavior. 1997;68:317–327. doi: 10.1901/jeab.1997.68-317.
- Mellon RC, Leak TM, Fairhurst S, Gibbon J. Timing processes in the reinforcement-omission effect. Animal Learning & Behavior. 1995;23:286–296. doi: 10.3758/BF03198925.
- Montes DR, Stopper CM, Floresco SB. Noradrenergic modulation of risk/reward decision making. Psychopharmacology. 2015;232:2681–2696. doi: 10.1007/s00213-015-3904-3.
- Nevo I, Erev I. On surprise, change, and the effect of recent outcomes. Frontiers in Psychology. 2012;3:24. doi: 10.3389/fpsyg.2012.00024.
- Niv Y, Edlund JA, Dayan P, O'Doherty JP. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain. The Journal of Neuroscience. 2012;32:551–562. doi: 10.1523/JNEUROSCI.5498-10.2012.
- Peters H, Hunt M, Harper D. An animal model of slot machine gambling: the effect of structural characteristics on response latency and persistence. Journal of Gambling Studies. 2010;26:521–531. doi: 10.1007/s10899-010-9183-3.
- Pinheiro J, Bates D. Mixed-effects models in S and S-Plus. New York: Springer; 2000.
- Pinheiro J, Bates D, DebRoy S, Sarkar D. nlme: Linear and Nonlinear Mixed Effects Models. 2016. Retrieved from http://CRAN.R-project.org/package=nlme.
- Rafa D, Kregiel J, Popik P, Rygula R. Effects of optimism on gambling in the rat slot machine task. Behavioural Brain Research. 2016;300:97–105. doi: 10.1016/j.bbr.2015.12.013.
- Reid RL. The psychology of the near miss. Journal of Gambling Studies. 1986;2:32–39. doi: 10.1007/BF01019932.
- Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical Conditioning II: Current Research and Theory. New York: Appleton-Century-Crofts; 1972. pp. 64–99.
- Rogers RD, Wong A, McKinnon C, Winstanley CA. Systemic administration of 8-OH-DPAT and eticlopride, but not SCH23390, alters loss-chasing behavior in the rat. Neuropsychopharmacology. 2013;38:1094–1104. doi: 10.1038/npp.2013.8.
- Rutledge RB, Lazzaro SC, Lau B, Myers CE, Gluck MA, Glimcher PW. Dopaminergic drugs modulate learning rates and perseveration in Parkinson's patients in a dynamic foraging task. The Journal of Neuroscience. 2009;29:15104–15114. doi: 10.1523/JNEUROSCI.3524-09.2009.
- Schielzeth H, Nakagawa S. Nested by design: model fitting and interpretation in a mixed model era. Methods in Ecology and Evolution. 2013;4:14–24. doi: 10.1111/j.2041-210x.2012.00251.x.
- Seo H, Lee D. Behavioral and neural changes after gains and losses of conditioned reinforcers. The Journal of Neuroscience. 2009;29:3627–3641. doi: 10.1523/JNEUROSCI.4726-08.2009.
- Seymour B, Maruyama M, De Martino B. When is a loss a loss? Excitatory and inhibitory processes in loss-related decision-making. Current Opinion in Behavioral Sciences. 2015;5:122–127. doi: 10.1016/j.cobeha.2015.09.003.
- Sharman S, Aitken MRF, Clark L. Dual effects of 'losses disguised as wins' and near-misses in a slot machine game. International Gambling Studies. 2015;15:212–223. doi: 10.1080/14459795.2015.1020959.
- Smethells JR, Fox AT, Andrews JJ, Reilly MP. Immediate postsession feeding reduces operant responding in rats. Journal of the Experimental Analysis of Behavior. 2012;97:203–214. doi: 10.1901/jeab.2012.97-203.
- Smith AP, Bailey AR, Chow JJ, Beckmann JS, Zentall TR. Suboptimal choice in pigeons: stimulus value predicts choice over frequencies. PLoS ONE. 2016;11:e0159336. doi: 10.1371/journal.pone.0159336.
- St Onge JR, Stopper CM, Zahm DS, Floresco SB. Separate prefrontal-subcortical circuits mediate different components of risk-based decision making. The Journal of Neuroscience. 2012;32:2886–2899. doi: 10.1523/JNEUROSCI.5625-11.2012.
- Staddon JER, Innis NK. An effect analogous to "frustration" on interval reinforcement schedules. Psychonomic Science. 1966;4:287–288.
- Staddon JER, Innis NK. Reinforcement omission on fixed-interval schedules. Journal of the Experimental Analysis of Behavior. 1969;12:689–700. doi: 10.1901/jeab.1969.12-689.
- Stopper CM, Floresco SB. Contributions of the nucleus accumbens and its subregions to different aspects of risk-based decision making. Cognitive, Affective, and Behavioral Neuroscience. 2011;11:97–112. doi: 10.3758/s13415-010-0015-9.
- Stopper CM, Khayambashi S, Floresco SB. Receptor-specific modulation of risk-based decision making by nucleus accumbens dopamine. Neuropsychopharmacology. 2013;38:715–728. doi: 10.1038/npp.2012.240.
- Sul JH, Kim H, Huh N, Lee D, Jung MW. Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron. 2010;66:449–460. doi: 10.1016/j.neuron.2010.03.033.
- Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998.
- Tatham TA, Zurn KR. The MED-PC experimental apparatus programming system. Behavior Research Methods, Instruments, & Computers. 1989;21:294–302. doi: 10.3758/BF03205598.
- Templeton JA, Dixon MJ, Harrigan KA, Fugelsang JA. Upping the reinforcement rate by playing the maximum lines in multi-line slot machine play. Journal of Gambling Studies. 2015;31:949–964. doi: 10.1007/s10899-014-9446-5.
- Thaler RH, Johnson EJ. Gambling with the house money and trying to break even: the effects of prior outcomes on risky choice. Management Science. 1990;36:643–660. doi: 10.1287/mnsc.36.6.643.
- Thorndike EL. Animal Intelligence. New York: MacMillan; 1911.
- Tremblay M, Cocker PJ, Hosking JG, Zeeb FD, Rogers RD, Winstanley CA. Dissociable effects of basolateral amygdala lesions on decision making biases in rats when loss or gain is emphasized. Cognitive, Affective & Behavioral Neuroscience. 2014;14:1184–1195. doi: 10.3758/s13415-014-0271-1.
- Wilke A, Barrett HC. The hot hand phenomenon as a cognitive adaptation to clumped resources. Evolution and Human Behavior. 2009;30:161–169. doi: 10.1016/j.evolhumbehav.2008.11.004.
- Winstanley CA, Cocker PJ, Rogers RD. Dopamine modulates reward expectancy during performance of a slot machine task in rats: evidence for a 'near-miss' effect. Neuropsychopharmacology. 2011;36:913–925. doi: 10.1038/npp.2010.230.
- Worthy DA, Davis T, Gorlick MA, Cooper JA, Bakkour A, Mumford JA, Maddox WT. Neural correlates of state-based decision-making in younger and older adults. NeuroImage. 2016;130:13–23. doi: 10.1016/j.neuroimage.2015.12.004.
- Worthy DA, Maddox WT. A comparison model of reinforcement-learning and win-stay-lose-shift decision-making processes: a tribute to W. K. Estes. Journal of Mathematical Psychology. 2014;59:41–49. doi: 10.1016/j.jmp.2013.10.001.
- Worthy DA, Pang B, Byrne KA. Decomposing the roles of perseveration and expected value representation in models of the Iowa Gambling Task. Frontiers in Psychology. 2013;4. doi: 10.3389/fpsyg.2013.00640.
- Wunderlich K, Symmonds M, Bossaerts P, Dolan RJ. Hedging your bets by learning reward correlations in the human brain. Neuron. 2011;71:1141–1152. doi: 10.1016/j.neuron.2011.07.025.
- Zalocusky KA, Ramakrishnan C, Lerner TN, Davidson TJ, Knutson B, Deisseroth K. Nucleus accumbens D2R cells signal prior outcomes and control risky decision-making. Nature. 2016;531:642–646. doi: 10.1038/nature17400.
- Zeeb FD, Robbins TW, Winstanley CA. Serotonergic and dopaminergic modulation of gambling behavior as assessed using a novel rat gambling task. Neuropsychopharmacology. 2009;34:2329–2343. doi: 10.1038/npp.2009.62.
- Zentall TR. Maladaptive "gambling" by pigeons. Behavioural Processes. 2011;87:50–56. doi: 10.1016/j.beproc.2010.12.017.
- Zentall TR. Suboptimal choice by pigeons: an analog of human gambling behavior. Behavioural Processes. 2014;103:156–164. doi: 10.1016/j.beproc.2013.11.004.
- Zentall TR, Stagner J. Maladaptive choice behaviour by pigeons: an animal analogue and possible mechanism for gambling (sub-optimal human decision-making behaviour). Proceedings of the Royal Society B. 2011;278:1203–1208. doi: 10.1098/rspb.2010.1607.