The drift diffusion model as the choice rule in inter-temporal and risky choice: A case study in medial orbitofrontal cortex lesion patients and controls

Jan Peters; Mark D’Esposito

doi:10.1371/journal.pcbi.1007615

PLoS Comput Biol. 2020 Apr 20;16(4):e1007615. doi: 10.1371/journal.pcbi.1007615

Jan Peters¹,*, Mark D’Esposito²

Editor: Ulrik R Beierholm³

PMCID: PMC7192518 PMID: 32310962

Abstract

Sequential sampling models such as the drift diffusion model (DDM) have a long tradition in research on perceptual decision-making, but mounting evidence suggests that these models can account for response time (RT) distributions that arise during reinforcement learning and value-based decision-making. Building on this previous work, we implemented the DDM as the choice rule in inter-temporal choice (temporal discounting) and risky choice (probability discounting) using hierarchical Bayesian parameter estimation. We validated our approach in data from nine patients with focal lesions to the ventromedial prefrontal cortex / medial orbitofrontal cortex (vmPFC/mOFC) and nineteen age- and education-matched controls. Model comparison revealed that, for both tasks, the data were best accounted for by a variant of the drift diffusion model including a non-linear mapping from value-differences to trial-wise drift rates. Posterior predictive checks confirmed that this model provided a superior account of the relationship between value and RT. We then applied this modeling framework and 1) reproduced our previous results regarding temporal discounting in vmPFC/mOFC patients and 2) showed in a previously unpublished data set on risky choice that vmPFC/mOFC patients exhibit increased risk-taking relative to controls. Analyses of DDM parameters revealed that patients showed substantially increased non-decision times and reduced response caution during risky choice. In contrast, vmPFC/mOFC damage abolished neither scaling nor asymptote of the drift rate. Relatively intact value processing was also confirmed using DDM mixture models, which revealed that in both groups >98% of trials were better accounted for by a DDM with value modulation than by a null model without value modulation. Our results highlight that novel insights can be gained from applying sequential sampling models in studies of inter-temporal and risky decision-making in cognitive neuroscience.

Author summary

Maladaptive changes in decision-making are associated with many psychiatric and neurological disorders, e.g. when people make impulsive or risky decisions. To understand how such decisions arise, it can be informative to examine not only the choices that people make, but also the response times associated with these decisions. Here we show that response times during impulsive and risky decision-making are well accounted for by a model that was developed to describe perceptual decision-making, the drift diffusion model. Furthermore, we use this model to examine impulsive and risky choice following damage to a core region of the brain's decision-making circuitry, the ventromedial / orbitofrontal cortex. Although this region has repeatedly been shown to contribute to value processing, modeling revealed that lesions to this area do not render response times less dependent on value. Our results highlight that novel insights can be gained from applying such models in studies of impulsive and risky choice in cognitive neuroscience.

Introduction

Understanding the neuro-cognitive mechanisms underlying decision-making and reinforcement learning[1–3] has potential implications for many neurological and psychiatric disorders associated with maladaptive choice behavior[4–6]. Modeling work in value-based decision-making and reinforcement learning often relies on simple logistic (softmax) functions[7,8] to link model-based decision values to observed choices. In contrast, in perceptual decision-making, sequential sampling models such as the drift diffusion model (DDM) that not only account for the observed choices but also for the full response time (RT) distributions have a long tradition[9–11]. Recent work in reinforcement learning[12–15], inter-temporal choice[16,17] and value-based choice[18–21] has shown that sequential sampling models can be successfully applied in these domains.

In the DDM, decisions arise from a noisy evidence accumulation process that terminates as the accumulated evidence reaches one of two response boundaries[9]. In its simplest form, the DDM has four free parameters. The boundary separation parameter α governs how much evidence is required before committing to a decision. The upper boundary is reached when the accumulated evidence rises to α, whereas the lower boundary is reached when the accumulated evidence falls to zero. The drift rate parameter v determines the mean rate of evidence accumulation. A greater drift rate reflects a greater rate of evidence accumulation and thus faster and more accurate responding. In contrast, a drift rate of zero would indicate chance level performance, as the evidence accumulation process would have an equal likelihood of terminating at the upper or lower boundary (for a neutral bias). The starting point or bias parameter z determines the starting point of the evidence accumulation process in units of the boundary separation, and the non-decision time τ reflects components of the RT related to stimulus encoding and/or response preparation that are unrelated to the evidence accumulation process. The DDM can account for a wide range of experimental effects on RT distributions during two-alternative forced choice tasks[9].
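The roles of the four parameters can be illustrated with a small forward simulation. The following is a minimal sketch, not the estimation code used in this study: it assumes a within-trial noise standard deviation of 1, a simple Euler discretization, and a hypothetical function name (`simulate_ddm`) with hypothetical defaults for the step size and time limit.

```python
import numpy as np


def simulate_ddm(v, alpha, z, tau, n_trials=1000, dt=0.001, max_t=10.0, seed=0):
    """Simulate choices and RTs from a basic four-parameter DDM.

    v: drift rate, alpha: boundary separation,
    z: starting point in units of alpha (0..1), tau: non-decision time (s).
    Assumption: unit diffusion noise and Euler steps of size dt.
    Returns (choices, rts); choices are 1 (upper), 0 (lower) or -1 if the
    process did not terminate within max_t (rt is then NaN).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_trials, z * alpha, dtype=float)  # accumulated evidence
    t = np.zeros(n_trials)                         # decision time so far
    choices = np.full(n_trials, -1, dtype=int)
    active = np.ones(n_trials, dtype=bool)
    for _ in range(int(round(max_t / dt))):
        if not active.any():
            break
        noise = rng.standard_normal(int(active.sum()))
        x[active] += v * dt + np.sqrt(dt) * noise
        t[active] += dt
        hit_upper = active & (x >= alpha)  # evidence reached alpha
        hit_lower = active & (x <= 0.0)    # evidence fell to zero
        choices[hit_upper] = 1
        choices[hit_lower] = 0
        active &= ~(hit_upper | hit_lower)
    # Observed RT = decision time + non-decision time
    rts = np.where(choices >= 0, t + tau, np.nan)
    return choices, rts
```

With a neutral starting point (z = 0.5), a drift rate of zero yields roughly equal proportions of upper and lower boundary responses, whereas a positive drift rate yields predominantly upper boundary responses with shorter RTs.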

The application of sequential sampling models such as the DDM has several potential advantages over traditional softmax[7] choice rules. First, including RT data during model estimation may improve both the reliability of the estimated parameters[12] and parameter recovery[13], thereby leading to more robust estimates. Second, taking into account the full RT distributions can reveal additional information regarding the dynamics of decision processes[14,15]. This is of potential interest, in particular in the context of maladaptive behaviors in clinical populations[14,22–25] but also when the goal is to more fully account for how decisions arise on a neural level[10].

In the present case study, we focus on a brain region that has long been implicated in decision-making, reward-based learning and impulse regulation[26,27], the ventromedial prefrontal / medial orbitofrontal cortex (vmPFC/mOFC). Performance impairments on the Iowa Gambling Task are well replicated in vmPFC/mOFC patients[26,28,29]. Damage to vmPFC/mOFC also increases temporal discounting[30,31] (but see[32]) and risk-taking[33–35], impairs reward-based learning[36–38] and has been linked to inconsistent choice behavior[39–41]. Meta-analyses of functional neuroimaging studies strongly implicate this region in reward valuation[42,43]. Based on these observations, we reasoned that vmPFC/mOFC damage might also render RTs during decision-making less dependent on value. In the context of the DDM, this could be reflected in changes in the value-dependency of the drift rate v. In contrast, more general impairments in the processing of decision options, response execution and/or preparation would be reflected in changes in the non-decision time. Interestingly, however, one previous model-free analysis in vmPFC/mOFC patients revealed a similar modulation of RTs by value in patients and controls[40].

The present study therefore had the following aims. The first aim was a validation of the applicability of the DDM as a choice rule in the context of inter-temporal and risky choice. To this end, we first performed a model comparison of variants of the DDM in a data set of nine vmPFC/mOFC lesion patients and nineteen controls. Since recent work on reinforcement learning suggested that the mapping from value differences to trial-wise drift rates might be non-linear[15] rather than linear[14], we compared these different variants of the DDM in our data and ran posterior predictive checks on the winning DDM models to explore the degree to which the different models could account for RT distributions and the relationship between RTs and subjective value. Second, we re-analyzed previously published temporal discounting data in controls and vmPFC/mOFC lesion patients to examine the degree to which our previously reported model-free analyses[30] could be reproduced using a hierarchical Bayesian model-based analysis with the DDM as the choice rule. Third, we used the same modeling framework to analyze previously unpublished data from a risky decision-making task in the same lesion patients and controls to examine whether risk taking in the absence of a learning requirement is increased following vmPFC/mOFC damage. Finally, we explored changes in choice dynamics as revealed by DDM parameters as a result of vmPFC/mOFC lesions, and investigated whether lesions to vmPFC/mOFC impacted the degree to which RTs were sensitive to subjective value differences, both by examining DDM parameters and via DDM mixture models.

Results

Model comparison

We first compared the fit of two previously proposed DDM models with linear (DDM_lin, see Eq 5)[14] and non-linear (DDM_S, see Eq 6 and Eq 7)[15] value-dependent drift-rate scaling in terms of the WAIC and the estimated log predictive density (elpd)[44]. For comparison we also included a null model (DDM₀) with constant drift rate, that is, a model without value modulation. For both temporal discounting data (Table 1) and risky choice / probability discounting data (Table 2), the non-linear drift rate scaling models outperformed linear scaling, and both models fit the data better than the DDM₀. Furthermore, 95% confidence intervals of the differences in elpd between each model and the DDM_S did not overlap, and did not include 0 (Tables 1 and 2, last column), suggesting that the differences in elpd were robust.
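The difference between the two drift-rate mappings can be sketched in a few lines. Eqs 5 to 7 are not reproduced in this section, so the functional forms below are assumptions: a linear mapping that scales the value difference by v_coeff, and a sigmoid mapping bounded at ± v_max of the kind used in the reinforcement learning work cited as [15]. The function names are hypothetical.

```python
import numpy as np


def drift_linear(value_diff, v_coeff):
    """Assumed form of the linear mapping: drift grows without bound."""
    return v_coeff * np.asarray(value_diff, dtype=float)


def drift_sigmoid(value_diff, v_coeff, v_max):
    """Assumed form of the sigmoid mapping: drift saturates at +/- v_max.

    Algebraically equal to v_max * tanh(v_coeff * value_diff / 2).
    """
    m = v_coeff * np.asarray(value_diff, dtype=float)
    return 2.0 * v_max / (1.0 + np.exp(-m)) - v_max
```

Under these assumptions both mappings give a drift rate of zero for equal values, but only the sigmoid mapping caps the drift rate for very large value differences.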

Table 1. Model comparison of drift diffusion models of temporal discounting.

The hyperbolic+shift value function (see Eq 1) corresponds to hyperbolic discounting in the now condition, and a shift parameter that models the decrease in discounting between the now and not now conditions. WAIC–Widely Applicable Information Criterion; elpd–estimated log predictive density; elpd_diff is the difference in elpd between each model and the DDM_S.

Model	Drift rate scaling	Value function	WAIC	-elpd	-elpd_diff [95% CI]
DDM₀	-	-	20939	10472.5	1987.9 [1899.1–2076.7]
DDM_lin	Linear	Hyperbolic+Shift	19602	9805.2	1320.6 [1231.2–1409.9]
DDM_S	Sigmoid	Hyperbolic+Shift	16966	8484.6	-

Table 2. Model comparison of drift diffusion models of risky choice.

The hyperbolic value function (see Eq 2) corresponds to hyperbolic discounting over the odds-against-winning the gamble. WAIC–Widely Applicable Information Criterion; elpd–estimated log predictive density; elpd_diff is the difference in elpd between each model and the DDM_S.

Model	Drift rate scaling	Value function	WAIC	-elpd	-elpd_diff [95% CI]
DDM₀	-	-	11515	5760.3	1162.8 [1094.4–1231.2]
DDM_lin	Linear	Hyperbolic	10422	5222.4	625.0 [546.0–703.9]
DDM_S	Sigmoid	Hyperbolic	9190	4597.4	-
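The value functions named in the captions of Tables 1 and 2 can be sketched as follows. Eqs 1 and 2 are not reproduced in this section, so the code assumes the standard hyperbolic forms with discount rates estimated in log space. The shift parameter for the not now condition is deliberately left out because its sign convention is not stated here. Function names are hypothetical.

```python
import math


def sv_delayed(amount, delay, log_k):
    """Hyperbolic temporal discounting (assumed form): A / (1 + k * D)."""
    return amount / (1.0 + math.exp(log_k) * delay)


def sv_risky(amount, p_win, log_h):
    """Hyperbolic discounting over the odds against winning (assumed form)."""
    odds_against = (1.0 - p_win) / p_win
    return amount / (1.0 + math.exp(log_h) * odds_against)
```

In both cases a higher log discount rate yields a lower subjective value for the same delayed or risky reward.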

Model validation

We then carried out a number of simple sanity checks (see S1 Text) which confirmed that log(k) parameters estimated via standard softmax and via the DDM_S showed good correspondence (S3 Fig). Likewise, minimum and median RT showed the expected associations with model-based non-decision times (S4 Fig) and boundary separation parameters (S5 Fig).

Prediction of binary choice data

We then checked the degree to which the different implementations of the DDM predicted participants’ binary choices. Using each participant’s mean posterior parameters from the hierarchical models, we calculated model-predicted choices and compared these to the observed binary choices. Raw accuracy scores per model and group are listed in Table 3 (temporal discounting) and Table 4 (risky choice), with the softmax models shown for comparison. Numerically, accuracy scores for the DDM_S were higher than for DDM_lin. Indeed, variance-stabilized accuracy values (arcsine-square-root transformed, see Fig 1) were greater for DDM_S compared to DDM_lin for temporal discounting (t₂₇ = -7.43, 95% CI: [-.19, -.11]), with a similar trend for risky choice (t₂₇ = -1.97, 95% CI: [-.09, .002]).
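The variance-stabilizing transformation and the paired comparison can be sketched as follows. This is a minimal illustration with hypothetical function names; the negative t-values reported above are consistent with a DDM_lin minus DDM_S difference, which is the order assumed in the usage note.

```python
import numpy as np


def arcsine_sqrt(p):
    """Arcsine-square-root transform of proportions in [0, 1]."""
    return np.arcsin(np.sqrt(np.asarray(p, dtype=float)))


def paired_t(a, b):
    """Paired-samples t statistic for a - b (df = n - 1)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
```

Calling `paired_t(arcsine_sqrt(acc_lin), arcsine_sqrt(acc_s))` on per-participant accuracy vectors (hypothetical names) returns a negative value when the second model predicts choices more accurately.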

Table 3. Median (range) of the proportion of correctly predicted binary choices for the different temporal discounting models, separately for mOFC patients and controls.

	Softmax	DDM_lin	DDM_S
mOFC patients	.92 (.87-.99)	.90 (.80-.96)	.92 (.89-.99)
Controls	.91 (.78-.96)	.75 (.60-.99)	.91 (.84-.99)

Table 4. Median (range) of the proportion of correctly predicted binary choices for the different risky choice models, separately for mOFC patients and controls.

	Softmax	DDM_lin	DDM_S
mOFC patients	.92 (.82-.97)	.90 (.82-.95)	.92 (.84-.98)
Controls	.92 (.82-.99)	.91 (.79-.99)	.91 (.82-.99)

Fig 1 — Variance-stabilized proportion of trials (arcsine-square root transformed) where each model correctly predicted binary decisions for temporal discounting (a) and risky choice (b).

Posterior predictive checks and prediction of RTs

Next, we carried out posterior predictive checks (see methods section) to 1) examine whether models also differed with respect to their ability to account for the observed RTs (as opposed to only binary choices) and 2) verify that the best-fitting model captured the overall pattern in the data. Posterior predictive checks for the DDM_S for each individual participant in relation to the full RT distributions are shown in the SI for temporal discounting (S1 Fig) and risky choice (S2 Fig). These initial checks revealed that the DDM_S indeed provided a good account of individual RT distributions.

In a second step, we directly compared the ability of the DDM_S and DDM_lin to account for how value modulates RTs. To this end, we binned trials for each subject into five bins according to the subjective value of the LL or risky reward according to Eqs 1 and 2. We then simulated 10k full data sets from the posterior distributions of each model (DDM₀, DDM_lin, DDM_S) and averaged model predicted response times per bin. Results are shown for each participant in Fig 2 for temporal discounting and Fig 3 for risky choice. The DDM₀ does not incorporate values, thus it predicts the same RTs across value bins (horizontal blue lines in Figs 2 and 3). While the DDM_lin could account for some aspects of the association between value and RT in some participants, the DDM_S provided a much better account of this relationship overall.
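The binning step described above can be sketched as follows: trials of one participant are sorted by subjective value, split into five bins of (near) equal size, and the mean value and mean RT are computed per bin. The function name is hypothetical, and the same function could be applied to observed and to simulated RTs.

```python
import numpy as np


def mean_rt_per_value_bin(values, rts, n_bins=5):
    """Mean subjective value and mean RT per equal-size value bin."""
    values = np.asarray(values, dtype=float)
    rts = np.asarray(rts, dtype=float)
    order = np.argsort(values, kind="stable")  # trials sorted by value
    bins = np.array_split(order, n_bins)       # near-equal bin sizes
    bin_values = np.array([values[idx].mean() for idx in bins])
    bin_rts = np.array([rts[idx].mean() for idx in bins])
    return bin_values, bin_rts
```

Averaging the per-bin RTs over many simulated data sets would then give model-predicted curves of the kind plotted against the observed curve for each participant.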

Fig 2 — Trials were binned into five bins of equal sizes according to the subjective value of the larger-later (LL) option for each participant (calculated according to Eq 1). The x-axis in each panel shows the subject-specific mean LL value for each bin. The y-axis denotes observed response times per bin (dotted black lines) and model predicted response times per bin for the different DDM models (blue: DDM₀, red: DDM_lin, orange: DDM_S). Model predicted response times were obtained by averaging over 10k data sets simulated from the posterior distribution of each hierarchical model.

Fig 3 — Trials were binned into five bins of equal sizes according to the subjective value of the risky option for each participant (calculated according to Eq 2). The x-axis in each panel shows the subject-specific mean value of the risky option for each bin. The y-axis denotes observed response times per bin (dotted black lines) and model predicted response times per bin for the different DDM models (blue: DDM₀, red: DDM_lin, orange: DDM_S). Model predicted response times were obtained by averaging over 10k data sets simulated from the posterior distribution of each hierarchical model.

This was in many cases due to the DDM_lin overestimating RTs (underestimating the drift rate) for intermediate value trials and underestimating RTs (overestimating the drift rate) for trials with high value LL or risky options. This effect is most clearly seen in the temporal discounting data (Fig 2) where a greater proportion of value bins fall into the intermediate range. In the supplemental information, we visually compare predicted drift rates between DDM_lin and DDM_S to illustrate this effect (S6 Fig). Taken together, these analyses show that 1) the DDM_S provided an overall superior fit to both temporal discounting and risky choice data and 2) that this was reflected in a better account of both binary choices and the relationship between RTs and value.

Simulations of effects of drift rate components on RT distributions

We next set out to more systematically explore how the two components of the drift rate in the DDM_S (v_max and v_coeff) affect RTs. To this end, we simulated 50 RTs from the DDM_S for each of 400 value differences ranging from zero to ± 20. We ran 30 simulations in total, systematically varying v_max and v_coeff while keeping the other DDM parameters (boundary separation, bias, non-decision time) fixed at mean posterior values of the control group (see Table 5).

Table 5. DDM parameter values used for simulation analyses depicted in Fig 4.

All parameters are the posterior group means of the control group.

	Parameter value
Boundary separation (α)	3.37
Non decision time (τ)	.945
Starting point / bias (z)	.531
Drift rate v (max)	[.5, 1, 1.5, 2.5, 3.5]
Drift rate v (coeff)	[.05, .1, .2, .4, 1, 2]

Simulated RT distributions are shown in Fig 4A, whereas mean simulated RTs and binary choices per value bin are shown in Fig 4B and 4C, respectively. Results from corresponding simulations computed across the actual delay/amount and probability/amount combinations from the tasks are shown in S8 Fig (temporal discounting) and S9 Fig (risky choice). As can be seen in Fig 4A, the effects of v_max on the leading edge of the RT distribution were generally more pronounced for higher values of v_coeff. At the same time, smaller values of v_coeff generally led to heavier-tailed RT distributions. The model of course predicts the longest RTs for trials where values are most similar (the predicted RTs are highest for value differences close to zero, see the dotted lines in the right panels of Fig 4B). But the simulations illustrate an additional effect: both relatively high and relatively low values of v_coeff can make RTs appear insensitive to value differences. For example, for the case of v_coeff = .05, RTs tend to be uniformly slow, and accelerate only slightly for the largest value differences (blue lines in Fig 4B). In contrast, for the highest values of v_coeff, relatively small value differences already give rise to maximal drift rates and thus uniformly fast RTs for all but the smallest value differences (highest conflict).
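The saturation effect described above can be made concrete by expressing the drift rate as a fraction of its asymptote v_max. The sketch below assumes the sigmoid mapping has the form 2·v_max / (1 + exp(−v_coeff·Δ)) − v_max, which is an assumption because Eqs 6 and 7 are not reproduced in this section; the function name is hypothetical.

```python
import math


def fraction_of_max_drift(value_diff, v_coeff):
    """Drift rate divided by v_max under the assumed sigmoid mapping."""
    return 2.0 / (1.0 + math.exp(-v_coeff * value_diff)) - 1.0


# Lowest and highest v_coeff values from Table 5
low = fraction_of_max_drift(20.0, 0.05)   # largest simulated value difference
high = fraction_of_max_drift(2.0, 2.0)    # a small value difference
```

Under this assumed form, v_coeff = .05 leaves the drift rate below half of v_max even at a value difference of 20, whereas v_coeff = 2 brings it above 95% of v_max at a value difference of only 2, in line with the uniformly slow and uniformly fast RT patterns described above.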

Parameter recovery simulations

A further crucial property of a model is that if generating parameters are known, they should be recoverable. As done in previous work[14,15] we therefore carried out parameter recovery analyses for the most complex model (DDM_S). Ten simulated data sets were randomly selected (see methods section) and re-fit using the DDM_S. We then compared the generating (true) parameter values to the estimated values. Subject-level parameters generally recovered well (Figs 5A and 6A). Group level means and standard deviations (calculated based on the precision) generally also recovered well (Fig 5B–5E, Fig 6B–6E), such that in most cases, the 95% highest density intervals of the estimated posterior distributions included the true generating parameter values. For parameters that showed a high variance (e.g. v_coeff and log(k)_now in the patient group) the group-level standard deviations tended to be overestimated.
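The recovery criterion described above, whether the 95% highest density interval of an estimated posterior includes the generating value, can be sketched from posterior samples. This is a minimal illustration that assumes a unimodal posterior; the function names are hypothetical.

```python
import numpy as np


def hdi(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (unimodal case)."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    k = int(np.ceil(mass * n))           # number of samples inside the interval
    widths = s[k - 1:] - s[:n - k + 1]   # width of every candidate interval
    i = int(np.argmin(widths))
    return s[i], s[i + k - 1]


def recovered(samples, true_value, mass=0.95):
    """True if the HDI of the posterior samples contains the generating value."""
    lo, hi = hdi(samples, mass)
    return lo <= true_value <= hi
```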

Fig 5 — a: Recovery of subject-level model parameters pooled across all ten simulations. b/c: true generating group level means (squares) for mOFC patients (b, red) and controls (c, blue) and estimated 95% highest density intervals (lines) per simulation. d/e: generating group level standard deviations (squares) for mOFC patients (d, red) and controls (e, blue) and estimated 95% highest density intervals (lines) per simulation.

Fig 6 — a: Recovery of subject-level model parameters pooled across all ten simulations. b/c: generating group level means (squares) for mOFC patients (b, red) and controls (c, blue) and estimated 95% highest density intervals (lines) per simulation. d/e: generating group level standard deviations (squares) for mOFC patients (d, red) and controls (e, blue) and estimated 95% highest density intervals (lines) per simulation.

Comparison to previous model-free analyses in mOFC patients

We have previously reported that temporal discounting in mOFC lesion patients is more affected by the immediacy of smaller-sooner (SS) rewards than in controls[30]. Our previous analysis revealed this both via an analysis of the area-under-the-curve of the empirical discounting function[45] and by a direct comparison of preference reversals between groups. To further validate the applicability of the DDM in the context of temporal discounting, we next examined whether these effects could be reproduced via the hierarchical DDM_S. Fig 7 shows the group-level posterior distributions of parameter means for all seven parameters. For comparison with our previous results, we first focus on log(k)_now (the discount rate in the baseline now condition, see Fig 7F) and shift_log(k) (the parameter modeling the decrease in discounting in not now trials as compared to now trials, see Fig 7G). The analysis of directional between-subject effects revealed a numerical increase in log(k)_now in the mOFC patient group (Fig 7F, Table 6) and strong evidence for a substantially greater difference in discounting between now and not now trials in the patients (Fig 7G, Table 6). This shows that our results based on model-free summary measures of discounting behavior following mOFC lesions[30] could be reproduced via a hierarchical Bayesian estimation scheme with the DDM_S as the choice rule.

Fig 7 — Top row: posterior distributions of the parameter group means (a: boundary separation, b: non-decision time, c: starting point (bias), d: drift rate (maximum), e: drift rate (coefficient), f: log(k), discount rate in the *now* condition, g: change in log(k) in the *not now* condition) for controls (blue) and mOFC patients (red). Bottom row: Posterior group differences (mOFC patients–controls) for each parameter. Solid horizontal lines indicate highest density intervals (HDI, thick lines: 85% HDI, thin lines: 95% HDI).

Table 6. Summary of group differences in model parameters.

For each parameter and task, we report the mean difference in the group-level posteriors (M_diff: patients–controls) and Bayes Factors testing for directional effects[14,46]. Bayes Factors < .33 indicate evidence for a reduction in the patient group, whereas Bayes Factors > 3 indicate evidence for an increase in the patient group (see Methods section). Standardized effect sizes (Cohen’s d) were calculated based on the posterior group-level estimates of mean and precision (see methods section).

Model parameter	Temporal discounting			Risky Choice
	M_diff	d	BF	M_diff	d	BF
Boundary separation (α)	-.012	-.013	1.03	-.368	-.42	.203
Non decision time (τ)	.184	.44	4.39	.166	.35	3.52
Starting point / bias (z)	-.025	-.49	.196	.017	.28	2.55
Drift rate v (max)	-.184	-.26	.647	-.027	-.075	.739
Drift rate v (coeff)	3.14	1.11	7.43	.033	.63	2.49
Log(k)_now	.734	.33	2.85	-	-	-
Shift_log(k)	.529	2.22	69.9	-	-	-
Log(h)	-	-	-	-.447	-.28	.278
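The directional Bayes Factors in Table 6 can be illustrated with a short sketch. The exact computation is described in the Methods section, which is not part of this section, so the code assumes one common definition: the ratio of posterior mass above zero to posterior mass below zero in the group difference (patients minus controls). The function name is hypothetical.

```python
import numpy as np


def directional_bf(diff_samples):
    """Ratio of posterior mass > 0 to posterior mass < 0 (assumed definition)."""
    d = np.asarray(diff_samples, dtype=float)
    n_pos = np.count_nonzero(d > 0)
    n_neg = np.count_nonzero(d < 0)
    return np.inf if n_neg == 0 else n_pos / n_neg
```

Under this definition, a value of 3 means that an increase in the patient group is three times as probable as a decrease given the posterior, and a value of .33 means the reverse.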

Risk-taking in vmPFC/mOFC patients

Risk-taking on the probability discounting task was quantified via the probability discounting parameter log(h), where higher values indicate a greater discounting of value over probabilities. There was some evidence for a smaller log(h) in vmPFC/mOFC patients (Fig 8F, Table 6), reflecting a relative increase in risk-taking (reduced value discounting over probabilities) as compared to controls.

Fig 8 — Top row: posterior distributions of the parameter group means (a: boundary separation, b: non-decision time, c: starting point (bias), d: drift rate (maximum), e: drift rate (coefficient), f: log(h), probability discount rate) for controls (blue) and mOFC patients (red). Bottom row: Posterior group differences (mOFC patients–controls) for each parameter. Solid horizontal lines indicate highest density intervals (HDI, thick lines: 85% HDI, thin lines: 95% HDI).

Effects of mOFC lesions on diffusion model parameters

Finally, we examined the diffusion model parameters of the DDM_S models in greater detail. First, there was evidence for longer non-decision times in the patient group for both tasks (see Table 6 and Figs 7B and 8B). These effects amounted to on average 184ms for temporal discounting and 166ms for risky choice. Second, the group differences observed for the starting point (bias) parameter largely mirrored group differences observed for discounting behavior. For temporal discounting, controls exhibited a more pronounced bias towards the LL boundary than vmPFC/mOFC patients, who exhibited a largely neutral bias here. For risky choice, controls showed a bias that was numerically shifted towards the safe option compared to vmPFC/mOFC patients. Third, posterior distributions for the boundary separation parameter (alpha) in temporal discounting showed high overlap and the difference distribution was centered at zero (Fig 7A). In contrast, for risky choice, there was evidence for a reduced boundary separation in the vmPFC/mOFC patients (Fig 8A, Table 6).

In the DDM_S, two components of the drift rate can be dissociated: the asymptote of the drift rate scaling function (v_max), that is, the maximum drift rate that is approached as value differences increase, and the scaling factor of the value difference (v_coeff). In both tasks, there was no evidence for a group difference in v_max (see Table 6 and Figs 7D and 8D) and both difference distributions were centered at zero. Across tasks and groups, the value scaling parameter for the drift rate (v_coeff) was generally > 0, reflecting a robust positive effect of value differences on the rate of evidence accumulation (see Figs 7E and 8E). Interestingly, the drift rate scaling parameter (v_coeff) was numerically increased in the vmPFC/mOFC patients for both tasks, an effect that was substantial for temporal discounting. Here, the posterior distribution also had a higher variance compared to the control group, which was driven by 4/9 vmPFC/mOFC patients who had v_coeff estimates that fell substantially beyond the values observed in controls and in the remaining patients (mean v_coeff estimates: P1: 17.89, P3: 8.32, P4: 3.38, P5: 4.70). These extreme cases included the patient with the lowest discount rate (P1 log(k)_now : -10.53) and the patient with the second highest discount rate (P4 log(k)_now : -2.28).

DDM mixture models

Both the model comparison and the posterior predictive checks suggest that choices in vmPFC/mOFC patients were still modulated by value. But the simulations showed that both very high and very low values of v_coeff can produce RTs that are more uniform across value differences: RTs tend to be more uniformly fast for high values of v_coeff, and more uniformly slow for low values. Therefore, we additionally ran a more direct test of value sensitivity following vmPFC/mOFC damage by setting up DDM mixture models (see methods section). In short, these models allowed a proportion of trials to be produced by the DDM₀ and the remaining trials to be produced by the DDM_S, with an additional free parameter λ controlling the mixing proportion. Notably, this analysis is agnostic with respect to the directionality of potential changes in v_max and v_coeff, and instead solely focuses on whether groups differ in the proportion of trials produced by a value-DDM vs. the DDM₀. Posterior distributions for λ are shown in Fig 9. For this analysis, λ was estimated in standard normal space and transformed to the interval [0, 1] via an inverse probit transformation on the subject level. In z-units, the posterior group mean of λ was 3.67 and 4.29 in mOFC patients and controls for the temporal discounting data (Fig 9A), and 5.09 and 4.04 for the risky choice data (Fig 9B). Thus, on average, in both groups >99% of trials were better accounted for by the DDM_S compared to the DDM₀. Because group differences in λ are minuscule in raw proportion units, they were not further examined.
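The back-transformation of λ and the mixing of the two models can be sketched as follows. The inverse probit is the standard normal cumulative distribution function; the mixture function assumes that the transformed λ weights the value DDM and that per-trial likelihoods under each model are already available. Function names are hypothetical.

```python
import math


def inverse_probit(z):
    """Standard normal CDF, mapping z-units to the interval [0, 1]."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture_likelihood(lik_value_ddm, lik_null_ddm, lam):
    """Per-trial likelihood when a proportion lam of trials follows the value DDM."""
    return lam * lik_value_ddm + (1.0 - lam) * lik_null_ddm


# Group means of lambda in z-units as reported in the text
group_means_z = [3.67, 4.29, 5.09, 4.04]
proportions = [inverse_probit(z) for z in group_means_z]
```

Applied to the reported group means, the transformation gives mixing proportions above .99 in every case, which is why differences between groups are negligible in raw proportion units.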

Fig 9 — Top row: posterior distributions of the mixture parameter λ (a: temporal discounting (TD), b: risky choice / probability discounting (PD)) in z-units. Positive values of λ indicate that a greater proportion of trials was better accounted for by DDM_S vs. DDM₀, whereas negative values indicate the reverse. λ was fitted in standard normal space with a group-level uniform prior of [–7, 7] and back-transformed on the subject-level via an inverse probit transformation. Bottom row: Posterior group differences (mOFC patients–controls) for each parameter. Solid horizontal lines indicate highest density intervals (HDI, thick lines: 85% HDI, thin lines: 95% HDI).

Discussion

Here we examined different choice rules for modeling inter-temporal and risky choice / probability discounting in healthy controls and patients with vmPFC/mOFC lesions. For each task, we examined a standard softmax action selection function and three variants of the drift diffusion model (DDM). Across tasks, the data were better accounted for by a DDM with a non-linear mapping of value differences onto trial-wise drift rates (DDM_S) than by a DDM with linear mapping (DDM_lin) or a null model without any value modulation (DDM₀). Following a series of initial sanity checks (see SI), we performed detailed posterior predictive analyses, ran simulations to characterize the behavior of the DDM_S in more detail and performed parameter recovery analyses. We then applied this model to reproduce our previous results on temporal discounting in patients with vmPFC/mOFC lesions[30], to characterize risk-taking behavior in these patients, and to explore group differences in DDM parameters across tasks. Finally, we examined DDM mixture models to test whether vmPFC/mOFC damage affected the proportion of trials that were best described by a value DDM as compared to the DDM₀.

Previous studies have successfully incorporated RTs in the modeling of value-based decision-making, e.g. via the linear ballistic accumulator model[16] or linear regression[13]. Here we build on recent work in reinforcement learning[12,14,15] and examined the degree to which the DDM could serve as the choice rule in temporal discounting and risky choice. In line with a recent model comparison in reinforcement learning[15], our model comparison of linear vs. non-linear value scaling revealed a superior fit of the DDM with non-linear (sigmoid) value scaling both for temporal discounting and risky choice data. Parameter recovery analyses showed that both subject- and group-level parameters generally recovered well. One exception was the group-level variance parameters for parameters with large variability, which tended to be overestimated in some cases (though they still fell within the 95% HDIs). Posterior predictive checks of the best-fitting model revealed a good fit to the overall RT distributions of most individual participants (see S1 and S2 Figs). Given that the DDMs differed in terms of how values impact RTs, we then focused on posterior predictive checks that explicitly examined how value-dependent RTs could be reproduced by the models. While the DDM_lin could account for some aspects of this association in some participants, in most participants the DDM_S provided a superior account of the relationship between values and RTs. Specifically, the DDM_lin in many cases overestimated RTs for smaller value differences, and underestimated RTs for very high value differences (see S6 Fig for an illustration).

One advantage of hierarchical Bayesian parameter estimation is that robust model fits can be obtained with fewer data points than are typically required for maximum likelihood estimation[47,48], and this is also the case for the drift diffusion model[47]. The reason is that in contrast to obtaining single-subject point estimates of parameters (as in maximum likelihood estimation), in hierarchical Bayesian estimation, the group-level distribution of parameters constrains and informs the parameters estimated for each individual participant. One consequence of this is shrinkage[48] or partial pooling, such that in a hierarchical model individual parameter estimates tend to be drawn towards the group mean. While this can improve the predictive accuracy of parameters, there is the possibility that meaningful between-subjects variability is removed[49]. Nonetheless, we believe that for situations with limited data per subject[47], which is a particular issue in studies involving lesion patients, the hierarchical Bayesian parameter estimation is most appropriate.

We examined variants of the DDM in tasks where they have not been applied previously (although other sequential sampling models have[16]). We therefore ran a number of initial sanity checks to validate our modeling results (see S3–S5 Figs). Additionally, analyses of the DDM_S for temporal discounting reproduced our previous model-free results in vmPFC/mOFC patients[30]: discounting behavior following vmPFC/mOFC damage was substantially more affected by SS reward immediacy than in controls, which in the present modeling scheme was reflected in a substantially increased shift_log(k) parameter in the patient group. This reproduction of our previous results strengthens our confidence in the validity of using the DDM as the choice rule in inter-temporal and risky choice.

The temporal discounting task, but not the risky choice task, was comprised of two experimental conditions (immediate vs. delayed smaller-sooner rewards). However, we have refrained from examining condition differences in the DDM parameters in greater detail, and instead only modeled a shift parameter for log(k), rather than for the full set of DDM parameters. This was done for simplicity and in order to keep analyses comparable between tasks. However, how contextual factors and framing effects[50,51] impact choice dynamics during inter-temporal and risky choice will be an interesting future avenue for research.

The stimulus coding scheme (coding the boundaries in terms of LL/risky options vs. SS/safe options) that we adopted here differs from accuracy coding as implemented in recent applications of the DDM to reinforcement learning[14,15] (coding the boundaries in terms of correct vs. incorrect choices), with implications for the interpretation of the DDM parameters. The drift rate v in the present coding scheme (as reflected in v_max and v_coeff) can be interpreted as in classical perceptual decision-making tasks: it reflects the rate of evidence accumulation. In stimulus coding, however, higher drift rates do not directly correspond to better performance (as is the case in accuracy coding), because there is no objectively correct response. Instead, the drift rate parameters reflect a participant’s overall sensitivity to value differences, similar to inverse temperature parameters in softmax models. More importantly, adopting stimulus coding allowed us to estimate a starting point (bias) parameter. In all cases, the estimated bias parameters were relatively close to 0.5 (a neutral bias), but group differences for each task mirrored the results for the choice model parameters. That is, the group that displayed a preference for one option as reflected in the discount rate parameter (e.g. LL rewards in the case of controls) also exhibited a response bias towards that decision boundary. It should be noted that these numerical differences in bias could be attributable to differences in the RT distributions, differences in the binary choices, or both.

We also performed simulations to explore the impact of DDM_S drift rate components on the relationship between subjective value and RTs. These simulations revealed that for very high values of v_coeff the DDM_S produces longer RTs only for the highest conflict choices (green lines in Fig 4; this effect can also be seen in P1 in Fig 2, the participant with the highest v_coeff for temporal discounting of all participants). In contrast, very low values of v_coeff yield RTs that tend to be uniformly longer for all but the easiest (highest value-difference) choices. The implication is that increases and decreases in v_coeff cannot unambiguously be interpreted as increases and decreases in value-sensitivity in RTs. Rather, as the simulations show, value-sensitivity (if interpreted as the degree of RT deceleration with increasing conflict) is maximal for intermediate values of v_coeff. At the same time, the magnitude of this effect depends on v_max.

Our results provide novel insights into the role of the vmPFC/mOFC in value-based decision-making. Our DDM analyses show a comparable maximum drift rate v_max in the two groups for both tasks, while v_coeff was increased in the patients for temporal discounting. However, examination of posterior predictive checks for each individual lesion patient (Figs 2 and 3) shows that RTs were modulated by value in most patients, and that this modulation was better accounted for by the DDM_S than by the DDM_lin. This suggests that value sensitivity of RTs was intact in the patients. This interpretation is corroborated by the DDM mixture model analyses: in both groups, the vast majority of trials was better accounted for by the DDM_S than the DDM₀, with no evidence for a group difference in these mixture proportions. This is in line with an earlier report showing reduced preference consistency but no changes in overall RTs or the value-modulation of RTs in vmPFC/mOFC patients[40]. If one considers the overwhelming evidence of neuroimaging studies showing a prominent role of the vmPFC/mOFC in reward valuation[42,43], it is nonetheless striking that lesions to this region do not negatively impact the value-sensitivity of the evidence accumulation process during value-based decision-making. Our data are therefore more compatible with the idea that vmPFC/mOFC, likely in interaction with other regions[52,53], plays a role in self-control, such that lesions shift preferences towards options with a greater short-term appeal.

Previous work has suggested that damage to vmPFC/mOFC might decrease the temporal stability of value representations, leading to inconsistent preferences[39–41]. There was no evidence in the present data that the lesion patients’ decisions were more “noisy” or “erratic”. Similar to a previous study on temporal discounting[31], choice consistency was high such that the best-fitting DDM_S accounted for about 90% of binary choices in both groups and tasks, suggesting that value representations on a given trial[40] and throughout the course of the testing sessions were relatively stable in both groups. In contrast, results from both tasks revealed an increase in non-decision times in the patient group. Whether this effect is specific to value-based decisions or extends to other choice settings is an open question. However, accounts of perceptual decision-making have typically focused on lateral prefrontal cortex regions[54,55]. Together, these observations suggest that vmPFC/mOFC lesions lead to a slowing of more basic perceptual and/or response-related processes during value-based decision-making, while leaving the effects of value-differences on the evidence accumulation process strikingly intact.

Previous studies have shown increases in risky decision-making following vmPFC/mOFC damage[33,35]. Our finding of attenuated discounting over probabilities in the patients is consistent with these previous results. However, our model-based analysis revealed an additional effect: lesion patients also exhibited reduced response caution during risky choice, reflected in a reduced boundary separation parameter. In contrast, this was not observed for temporal discounting. This suggests that risk taking in vmPFC/mOFC patients might not only be driven by altered preferences, but also by more premature responding.

Taken together, our results demonstrate the feasibility of using the DDM as the choice rule in the context of inter-temporal and risky decision-making. Model comparison revealed that a variant of the DDM that included a non-linear drift rate modulation provided the best fit to the data. We further show that the application of a sequential sampling model revealed additional insights: while the value-dependency of the evidence accumulation process was strikingly unaffected by vmPFC/mOFC damage, we observed a slowing of non-decision times both in temporal discounting and risky choice, with implications for models of decision-making. This modeling framework might provide further insights, e.g. when studying mechanisms underlying context-dependent changes in decision-making[50,56–58] or impairments in decision-making in psychiatric[59] and neurological disorders[6].

Materials & methods

Ethics statement

All participants gave informed written consent, and the study procedure was approved by the local institutional review board of the University of California, Berkeley, USA.

Procedure

We report data from two value-based decision-making tasks: one previously unpublished data set from a risky-choice task and one previously published data set from a temporal discounting task (see below for task details). Data were acquired in nine patients with focal lesions that included medial orbitofrontal cortex and nineteen healthy age- and education-matched controls. The temporal discounting task was always performed first, followed by the risky choice task.

For a detailed account of etiology, socio-demographic information for all participants and lesion location data for the patients, the reader is referred to our previous paper[30].

Temporal discounting task

Here participants performed 224 trials of an inter-temporal choice task involving a series of choices between smaller-but-sooner (SS) and larger-but-later (LL) rewards. On half the trials, the SS reward was available immediately (now condition), whereas on the other half of the trials, the SS reward was available only after a 30d delay (not now condition). In the now condition, the SS reward consisted of $10 available immediately and LL rewards consisted of all combinations of sixteen reward amounts (10.1, 10.2, 10.5, 11, 12, 15, 18, 20, 25, 30, 40, 50, 70, 100, 130, 150 dollars) and seven delays (1, 3, 5, 8, 14, 30, 60 days), yielding 112 trials per condition. Trials for the not now condition were identical, with the exception that an additional delay of 30 days was added to both options, such that in not now trials, the SS reward was always associated with a 30 day delay, and LL reward delays ranged from 31 to 90 days. Trials were presented in randomized order and with a randomized assignment of options to the left/right side of the screen. Options remained on the screen until a response was logged.

Risky choice task

Here participants made a total of 112 choices between a certain (100% probability) $10 reward and larger-but-riskier options. The risky options consisted of all combinations of sixteen reward amounts (10.1, 10.2, 10.5, 11, 12, 15, 18, 20, 25, 30, 40, 50, 70, 100, 130, 150 dollars) and seven probabilities (10%, 17%, 28%, 54%, 84%, 96%, 99%). Trials were presented in randomized order and with a randomized assignment of options to the left/right side of the screen. As in the temporal discounting task, options remained on the screen until a response was logged.

Participants were instructed that all choices from the two tasks were potentially behaviorally relevant. A single trial was pseudo-randomly selected following completion of both tasks, and participants received their choice from that trial as a cash bonus.

Temporal discounting model

Based on previous work on the effect of smaller-sooner (SS) reward immediacy on discounting behavior [60,61], we hypothesized discounting to be hyperbolic relative to the soonest available reward. Previous studies[30,61] fitted separate discount rate parameters to trials with immediate vs. delayed SS rewards. Here we extended this approach by instead fitting a single k-parameter (reflecting discounting in the now condition), and a subject-specific shift parameter s modeling the reduction in log(k) in the not now condition as compared to the now condition:

SV(LL_{t}) = \frac{A_{t}}{1 + \exp(k - I_{t} \cdot s) \cdot IRI_{t}}

(1)

Here, SV is the subjective discounted value of the delayed rewards, A is the amount of the LL reward on trial t, k is the subject specific discount rate for now trials in log-space, I is an indicator variable coding the condition (0 for now trials, 1 for not now trials), s is a subject-specific shift in log(k) between now and not-now conditions and IRI is the inter-reward-interval on trial t. Note that this model corresponds to the elimination-by-aspects model of Green et al. [60].
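
As an illustration of Eq 1, the following minimal Python sketch computes the subjective value of one LL reward on a now trial and on a not now trial. It follows the sign convention of Eq 1 as printed above. The function name `sv_delayed` and all parameter values are hypothetical and were chosen only for illustration; the models in this study were fit in JAGS, not in Python.

```python
import math

def sv_delayed(amount, iri_days, log_k, shift, not_now):
    """Subjective value of the LL reward following Eq 1.

    log_k    : discount rate for now trials, in log-space (k in Eq 1)
    shift    : subject-specific shift s in log(k) for not-now trials
    not_now  : indicator I (0 = now trial, 1 = not-now trial)
    iri_days : inter-reward-interval (IRI) in days
    """
    k_effective = math.exp(log_k - not_now * shift)
    return amount / (1.0 + k_effective * iri_days)

# Hypothetical parameter values, for illustration only
log_k, shift = -4.0, 0.5
sv_now = sv_delayed(40.0, 30.0, log_k, shift, not_now=0)
sv_not_now = sv_delayed(40.0, 30.0, log_k, shift, not_now=1)
print(sv_now, sv_not_now)
```

With a positive shift, the same amount and inter-reward-interval are discounted less steeply on not now trials than on now trials, which is how the single shift parameter captures the effect of SS reward immediacy.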

Risky choice model

Here we applied a simple one-parameter probability discounting model[62,63], where discounting is hyperbolic over the odds-against-winning the gamble:

SV(risky_{t}) = \frac{A_{t}}{1 + \exp(h) \cdot \theta_{t}}, \quad \text{with} \quad \theta_{t} = \frac{1 - p_{t}}{p_{t}}

(2)

Here SV is the subjective discounted value of the risky reward, A is the reward amount on trial t and θ is the odds-against winning the gamble. The probability discount rate h (again fitted in log-space) models the degree of value discounting over probabilities. We also fit the data with a two-parameter model that includes separate parameters for the curvature and elevation of the probability weighting function[64–66]. However, when fitting a two-parameter model at the single subject level, in a number of individual cases the posterior distributions of the curvature and/or elevation parameters were not clearly peaked, suggesting that we likely did not have adequate coverage of the probability and amount dimensions to reliably dissociate these different components of risk preferences. For this reason, we opted for the simpler single-parameter hyperbolic model instead.
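
A minimal Python sketch of Eq 2 is given below. The function name `sv_risky` and the example values are hypothetical. One property that follows directly from Eq 2 is that for log(h) = 0 (i.e. h = 1) the subjective value reduces to the expected value A·p, so that log(h) < 0 corresponds to valuing a gamble above its expected value and log(h) > 0 to valuing it below.

```python
import math

def sv_risky(amount, p_win, log_h):
    """Subjective value of the risky reward following Eq 2.

    log_h : probability discount rate in log-space (h in Eq 2)
    p_win : probability of winning the gamble (0 < p_win <= 1)
    """
    odds_against = (1.0 - p_win) / p_win
    return amount / (1.0 + math.exp(log_h) * odds_against)

# Hypothetical example: a $20 gamble with a 50% chance of winning
for log_h in (-1.0, 0.0, 1.0):
    print(log_h, sv_risky(20.0, 0.5, log_h))
```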

Softmax choice rule

Standard softmax action selection models the probability of choosing the LL reward (or the risky option) on trial t as:

P(LL_{t}) = \frac{e^{\beta \cdot SV(LL_{t})}}{e^{\beta \cdot SV(LL_{t})} + e^{\beta \cdot SV(SS_{t})}}

(3)

Here, SV is the subjective value of the LL reward according to Eq 1 (or the risky reward according to Eq 2) and β is an inverse temperature parameter, modeling choice stochasticity (for β = 0, choices are random and as β increases, choices become more dependent on the option values).
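
For two options, Eq 3 is algebraically identical to a logistic function of the scaled value difference β·(SV(LL) − SV(SS)). The sketch below uses that form because it avoids overflow for large values; the function name `p_choose_ll` is hypothetical.

```python
import math

def p_choose_ll(sv_ll, sv_ss, beta):
    """Probability of choosing the LL (or risky) option following Eq 3."""
    x = beta * (sv_ll - sv_ss)
    # Two branches keep the argument of exp() non-positive
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

print(p_choose_ll(12.0, 10.0, beta=0.0))   # random choice for beta = 0
print(p_choose_ll(12.0, 10.0, beta=1.0))   # value-dependent choice
```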

Drift diffusion choice rule

For the DDMs, we build on earlier work in reinforcement learning[14,15] and inter-temporal choice[13,16]. Specifically, we replaced the softmax action selection rule (see previous section) with the DDM as the choice rule, using the Wiener module[67] for the JAGS software package[68]. In contrast to previous reinforcement learning approaches[14,15] that used accuracy coding for the boundary definitions, we here used stimulus coding, such that the lower boundary was defined as a selection of the SS reward (or the 100% option in the case of risky choice), and the upper boundary as selection of the LL reward (or the risky option in the case of risky choice). This is because we were explicitly interested in modeling a bias towards SS vs. LL options. RTs for choices towards the lower boundary were multiplied by -1 prior to estimation.

We initially used absolute RT cut-offs for trial exclusion[14] such that 0.4s < RT < 10s. However, when using such an absolute cut-off, single fast outlier trials can still force the non-decision-time to adjust to accommodate these observations, which can lead to a massive negative impact on model fit at the individual-subject level. This is also what we observed in two participants when plotting posterior predictive checks from hierarchical models with absolute cut-offs. For this reason, we instead excluded for each participant the slowest and fastest 2.5% of trials from analysis, which eliminated the problem. The RT on trial t is then distributed according to the Wiener first passage time (wfpt):

RT_{t} \sim \mathrm{wfpt}(\alpha, \tau, z, v)

(4)

Here, α is the boundary separation (modeling response caution / the speed-accuracy trade-off), z is the starting point of the diffusion process (modeling a bias towards one of the decision boundaries), τ is the non-decision time (reflecting perceptual and/or response preparation processes unrelated to the evidence accumulation process) and v is the drift rate (reflecting the rate of evidence accumulation). Note that in the JAGS implementation of the Wiener model[67], the starting point z is coded in relative terms and takes on values between 0 and 1. That is, z = .5 reflects no bias, z >.5 reflects a bias towards the upper boundary, and z < .5 a bias towards the lower boundary.
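
The two preprocessing steps described above (percentile-based trial exclusion and sign-coding of RTs by boundary) can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the function name `trim_and_sign` is hypothetical, and the number of trials cut from each tail is assumed to be rounded down.

```python
def trim_and_sign(rts, chose_upper, tail=0.025):
    """Drop the fastest and slowest `tail` fraction of trials, then sign-code RTs.

    rts         : response times in seconds, one per trial
    chose_upper : booleans, True for LL / risky choices (upper boundary)
    Returns signed RTs in the original trial order; lower-boundary
    (SS / safe) choices are multiplied by -1.
    """
    n_cut = int(len(rts) * tail)  # assumption: round down
    order = sorted(range(len(rts)), key=lambda i: rts[i])
    kept = sorted(order[n_cut:len(rts) - n_cut])  # restore trial order
    return [rts[i] if chose_upper[i] else -rts[i] for i in kept]

# Hypothetical data: 40 trials with alternating choices
example_rts = [0.1 * i for i in range(1, 41)]
example_choices = [i % 2 == 0 for i in range(40)]
print(len(trim_and_sign(example_rts, example_choices)))
```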

In a first step, we fit a null model (DDM₀) that included no value modulation. That is, the null model for both the temporal discounting and risky choice data had four free parameters (α, τ, v, and z) that for each participant were constant across trials.

Next, to link the diffusion process to the valuation models (Eq 1, Eq 2), we compared two previously proposed functions linking trial-by-trial variability in value differences to the drift rate. First, we used a linear mapping as proposed by Pedersen et al. (2017)[14]:

v_{t} = v_{coeff} \cdot (SV(LL_{t}) - SV(SS_{t}))

(5)

Here, v_coeff is a free parameter that maps value differences onto the drift rate v and simultaneously transforms value differences to the appropriate scale of the DDM[14]. This implementation naturally gives rise to the effect that highest conflict (when values are highly similar) would be expected to be associated with a drift rate close to zero. For positive values of v_coeff, as SV(SS) increases over SV(LL), the drift rate becomes more negative, reflecting evidence accumulation towards the lower (SS) boundary. The reverse is the case as SV(LL) increases over SV(SS). For the risky choice models, SV(LL) was replaced with SV(risky), and SV(SS) with SV(safe). Second, we also applied an additional non-linear transformation of the scaled value differences via the S-shaped function suggested by Fontanesi et al. (2019) [15]:

v_{t} = S(v_{coeff} \cdot (SV(LL_{t}) - SV(SS_{t})))

(6)

S(m) = \frac{2 \cdot v_{max}}{1 + e^{-m}} - v_{max}

(7)

S is a sigmoid function centered at 0 with m being the scaled value difference from Eq 6, and asymptote ± v_max. Again, effects of choice difficulty on the drift rate naturally arise: for highest decision conflict when SV(SS) = SV(LL), the drift rate would again be zero, whereas for larger value differences, v increases up to a maximum of ± v_max. Table 7 provides an overview of the parameters of the DDM_S model.
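
The two drift rate mappings (Eq 5 and Eqs 6 and 7) can be compared with a short Python sketch. The function names and parameter values are hypothetical. The sigmoid is implemented as v_max·tanh(m/2), which is algebraically equal to Eq 7 and numerically stable for large |m|.

```python
import math

def drift_linear(value_diff, v_coeff):
    """Eq 5: drift rate proportional to the value difference."""
    return v_coeff * value_diff

def drift_sigmoid(value_diff, v_coeff, v_max):
    """Eqs 6 and 7: drift rate saturating at +/- v_max."""
    m = v_coeff * value_diff
    return v_max * math.tanh(m / 2.0)

# Hypothetical parameters; value_diff = SV(LL) - SV(SS)
for value_diff in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(value_diff,
          drift_linear(value_diff, v_coeff=0.5),
          drift_sigmoid(value_diff, v_coeff=0.5, v_max=2.0))
```

Both mappings give a drift rate of zero at maximal conflict, but only the sigmoid mapping bounds the drift rate for large value differences.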

Table 7. Overview of the parameters of the DDM_S models and priors for group means.

Parameter	DDM_S: Temporal discounting	DDM_S: Risk taking	Group-level prior (μ)
α	Boundary separation		Uniform (.01, 5)
τ	Non-decision-time		Uniform (.1, 6)
z	Bias (>.5: LL, < .5: SS)	Bias (>.5: risky, < .5: safe)	Uniform (.1, .9)
v_coeff	Drift rate: value-difference scaling		Uniform (-100, 100)
v_max	Drift rate: maximum		Uniform (0,100)
log(k)	Discount rate now-trials	-	Uniform (-20,3)
shift_log(k)	log(k) reduction not-now	-	Gaussian (0, 2)
log(h)	-	Probability discount rate	Uniform (-10, 10)
λ	Mixture parameter (proportion of DDM_S trials)		Uniform (-7,7)

DDM mixture models

As a further test of whether groups differed with respect to the degree to which RT distributions showed value sensitivity, we also examined mixture models to explore whether the proportion of trials best accounted for by the best-fitting value DDM (DDM_S) vs. the null model (DDM₀) differed between groups. Mixture models contained the full hierarchical parameter sets of both the DDM_S and DDM₀, as well as a mixture parameter λ, such that a proportion λ of trials was allowed to be accounted for by the DDM_S and the remaining proportion 1-λ by the DDM₀. For each group, the prior for the group-level mean of λ was a uniform distribution over [–7, 7]; subject-level parameters were drawn from a normal distribution and transformed via an inverse probit transformation to the interval [0, 1].
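
The transformation of λ and the general logic of a two-component mixture can be sketched in Python as follows. The function names are hypothetical, and the per-trial likelihoods are taken as given inputs rather than computed from the Wiener first passage time density. How the mixture was parameterized in the JAGS code is not detailed here, so the marginal-likelihood form below is an assumption made for illustration.

```python
import math

def inverse_probit(x):
    """Standard normal CDF, mapping the real line onto the unit interval."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mixture_likelihood(lik_value, lik_null, lam):
    """Likelihood of one trial under a mixture of DDM_S and DDM_0."""
    return lam * lik_value + (1.0 - lam) * lik_null

def p_value_model(lik_value, lik_null, lam):
    """Posterior probability that a trial is accounted for by DDM_S."""
    weighted = lam * lik_value
    return weighted / (weighted + (1.0 - lam) * lik_null)

# A prior range of [-7, 7] on the probit scale covers almost all of (0, 1)
print(inverse_probit(-7.0), inverse_probit(0.0), inverse_probit(7.0))
```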

Hierarchical Bayesian models

We used the following model-building procedure. In a first step, models were fit at the single-subject level. After validating that reasonably good fits could be obtained for single-subject data (by ensuring that the $\hat{R}$ statistic was in an acceptable range of $1 \leq \hat{R} \leq 1.01$ and that the posterior distributions were centered at reasonable parameter values), we re-fit all models using a hierarchical framework with separate group-level distributions for controls and patients. We again assessed chain convergence such that values of $1 \leq \hat{R} \leq 1.01$ were considered acceptable for all group- and individual-level parameters. As priors for the group-level hyperparameters we used uniform distributions for means defined over numerically plausible ranges (see Table 7) and gamma distributions with shape and rate parameters .001 for precision. Individual-subject parameters were then drawn from normal distributions with group-level means and precision.

Model estimation and comparison

All models were fit using Markov Chain Monte Carlo (MCMC) as implemented in JAGS[68] with the matjags interface (https://github.com/msteyvers/matjags) for Matlab (The Mathworks) and the JAGS Wiener package[67]. For each model, we ran two chains with a burn-in period of 50k samples and thinning of 2. 10k further samples were then retained for analysis. Chain convergence was assessed via the $\hat{R}$ statistic, where we considered $1 \leq \hat{R} \leq 1.01$ as acceptable values. Relative model comparison was performed using the loo R package[44], and we report both WAIC and the estimated log pointwise predictive density (elpd) which estimates the leave-one-out cross-validation predictive accuracy of the model[44].
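
To make the convergence criterion concrete, the sketch below computes the classic Gelman-Rubin potential scale reduction factor for a single parameter from equal-length chains. This is an illustration only: the function name `gelman_rubin` is hypothetical, the chains are simulated random numbers rather than MCMC output, and the exact variant of the statistic reported by the software used in the study is not specified here.

```python
import math
import random
import statistics

def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between_over_n = statistics.variance(chain_means)  # B / n
    var_plus = (n - 1) / n * within + between_over_n
    return math.sqrt(var_plus / within)

rng = random.Random(0)
# Two chains sampling the same distribution (well mixed)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(2)]
# Same chains, but the second is offset by 1 (chains disagree)
stuck = [mixed[0], [x + 1.0 for x in mixed[1]]]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(rhat_mixed, rhat_stuck)
```

Chains that agree yield a value close to 1, whereas chains centered on different values yield a value well above the 1.01 threshold.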

Posterior predictive checks

Because a superior relative model fit does not necessarily mean that the best-fitting model captures key aspects of the data, we additionally performed posterior predictive checks. To this end, during model estimation, we simulated 10k full datasets from the hierarchical models, based on the posterior distribution of the parameters. We then compared these simulated data to the observed data in two ways. First, to visualize how models accounted for the overall observed RT distributions, a random sample of 1k of the simulated data sets was smoothed via non-parametric density estimation in Matlab (ksdensity.m) and overlaid on the observed RT distributions for each individual participant. Second, we examined how the different DDM models accounted for the observed association between RT and value. To this end, we binned trials into five bins based on the subjective value of the larger-later or risky reward (as per Eqs 1 and 2) for each individual participant, and for these bins again compared observed mean RTs to model-predicted RTs from the simulations.
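
The value-binning step of the second check can be sketched as follows. The function name `mean_rt_per_value_bin` is hypothetical, and equal-count bins are an assumption, since the bin definition is not specified above. The same function would be applied to the observed RTs and to each simulated data set, so that observed and predicted bin means can be compared.

```python
def mean_rt_per_value_bin(values, rts, n_bins=5):
    """Mean RT within equal-count bins of subjective value.

    values : subjective value of the LL or risky option per trial
    rts    : unsigned response time per trial
    Requires at least n_bins trials.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    means = []
    for b in range(n_bins):
        lo = b * len(order) // n_bins
        hi = (b + 1) * len(order) // n_bins
        idx = order[lo:hi]
        means.append(sum(rts[i] for i in idx) / len(idx))
    return means

# Hypothetical trials: RTs are longest for intermediate values
example_values = [float(v) for v in range(10)]
example_rts = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]
print(mean_rt_per_value_bin(example_values, example_rts))
```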

Parameter recovery analyses

For models of decision-making, identifiability of the true data generating parameters is a crucial issue[48]. We therefore conducted parameter recovery simulations for the most complex model, the DDM_S. We selected ten random datasets simulated from the posterior distributions, and re-fit these datasets with the generating model using the same methods as outlined above. The recovery of subject-level parameters was examined by plotting generating parameters against estimated parameters. The recovery of group-level parameters was examined by overlaying the true generating group-level means over the 95% highest-density intervals of the posterior distributions.

Simulating effects of drift rate components on RTs

To gain additional insights into how drift rate components v_max and v_coeff of the DDM_S affect RT distributions and the value-dependency of RTs more specifically, we ran additional simulations. Specifically, we simulated 50 RTs from the DDM_S for each of 400 value differences ranging from zero to ± 20. We ran 30 simulations in total, systematically varying v_max and v_coeff while keeping the other DDM parameters (boundary separation, bias, non-decision time) fixed at mean posterior values of the control group (see Table 6). For each simulated data set, we examined the shape of the overall RT distribution, the degree to which RTs depended on value differences, and the proportion of binary choices (lower vs. upper boundary) as a function of value differences.
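
A scaled-down version of this type of simulation is sketched below. It is not the code used in the study: trials are simulated with a simple Euler-Maruyama approximation of the diffusion process, the diffusion noise is assumed to have unit standard deviation, and all function names and parameter values are hypothetical (they are not the values in Table 6).

```python
import math
import random

def drift_sigmoid(value_diff, v_coeff, v_max):
    """Eqs 6 and 7, written as v_max * tanh(m / 2)."""
    return v_max * math.tanh(v_coeff * value_diff / 2.0)

def simulate_trial(v, alpha, z, tau, rng, dt=0.001, max_t=20.0):
    """Simulate one trial; returns a signed RT (positive = upper boundary),
    or None if no boundary is reached within max_t seconds."""
    x = z * alpha               # relative starting point, as for z above
    t = 0.0
    sd = math.sqrt(dt)          # assumption: unit diffusion noise
    while t < max_t:
        x += v * dt + sd * rng.gauss(0.0, 1.0)
        t += dt
        if x >= alpha:
            return tau + t
        if x <= 0.0:
            return -(tau + t)
    return None

rng = random.Random(1)
alpha, z, tau = 2.0, 0.5, 1.0   # hypothetical fixed parameters
v_coeff, v_max = 0.5, 2.0       # hypothetical drift rate components
summary = {}
for value_diff in (0.0, 2.0, 10.0):
    v = drift_sigmoid(value_diff, v_coeff, v_max)
    signed = [simulate_trial(v, alpha, z, tau, rng) for _ in range(100)]
    signed = [rt for rt in signed if rt is not None]
    mean_rt = sum(abs(rt) for rt in signed) / len(signed)
    p_upper = sum(rt > 0 for rt in signed) / len(signed)
    summary[value_diff] = (mean_rt, p_upper)
    print(value_diff, round(mean_rt, 2), round(p_upper, 2))
```

Repeating the loop over a grid of v_coeff and v_max values, with the remaining parameters held fixed, gives the kind of comparison described above.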

Analysis of group differences

To characterize group differences, we show posterior distributions for all parameters, as well as 85% and 95% highest density intervals for the difference distributions of the group posteriors. We furthermore report Bayes Factors for directional effects[14,46] based on these difference distributions as BF = i/(1−i), where i is the integral of the posterior distribution from 0 to +∞, which we estimated via non-parametric kernel density estimation in Matlab (ksdensity.m). Following common criteria[69], Bayes Factors > 3 are considered positive evidence, and Bayes Factors > 12 are considered strong evidence. Bayes Factors < 0.33 are likewise interpreted as evidence in favor of the alternative model. Finally, we report standardized measures of effect size (Cohen’s d) calculated based on the mean posterior distributions of the group means and the pooled standard deviations across groups.
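
The directional Bayes Factor can be illustrated with a short Python sketch. Note that this is a simplification: instead of integrating a kernel density estimate as described above, the sketch estimates i as the proportion of posterior difference samples above zero, so that BF = i/(1−i) reduces to the ratio of positive to non-positive samples. The function name `directional_bf` and the example samples are hypothetical.

```python
import math

def directional_bf(diff_samples):
    """Directional Bayes Factor i / (1 - i) from posterior difference samples,
    with i estimated as the proportion of samples above zero."""
    n_pos = sum(1 for d in diff_samples if d > 0)
    n_rest = len(diff_samples) - n_pos
    if n_rest == 0:
        return math.inf  # all posterior mass above zero
    return n_pos / n_rest

# Hypothetical difference samples (e.g. patients minus controls)
print(directional_bf([-1.0, 1.0, 2.0, 3.0]))
```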

Code availability

JAGS model code for all models is available on the Open Science Framework (https://osf.io/5rwcu/).

Supporting information

S1 Fig. Posterior predictive plots of the drift diffusion temporal discounting model with non-linear value scaling of the drift rate (DDM_S) for all participants (red–mOFC patients, blue–controls).

Histograms depict the observed RT distributions for each participant. The solid lines are smoothed histograms of the model predicted RT distributions from 1000 individual subject data sets simulated from the posterior distribution of the best-fitting hierarchical model. RTs for smaller-sooner choices are plotted as negative, whereas RTs for larger-later choices are plotted as positive. The x-axes are adjusted to cover the range of observed RTs for each participant.

(TIF)

S2 Fig. Posterior predictive plots of the drift diffusion probability discounting / risky choice model with non-linear value scaling of the drift rate (DDM_S) for all participants (red–mOFC patients, blue–controls).

Histograms depict the observed RT distributions for each participant. The solid lines are smoothed histograms of the model predicted RT distributions from 1000 individual subject data sets simulated from the posterior distribution of the best-fitting hierarchical model. RTs for choices of the safe option are plotted as negative, whereas RTs for risky choices are plotted as positive. The x-axes are adjusted to cover the range of observed RTs for each participant.

(TIF)

S3 Fig. Consistency of model parameters for temporal discounting (TD: a/b) and probability discounting (PD, c) between softmax and DDM_S choice rules.

Scatter plots (controls: blue, mOFC patients: red) show model parameters estimated via a standard softmax choice rule (x-axis) vs. parameters estimated via a drift diffusion model choice rule with non-linear drift rate scaling (DDM_S, y-axis). a) Temporal discounting log(discount rate) for now trials. b) Shift in log(k) between now and not now trials. c) Probability discounting log(discount rate).

(TIF)

S4 Fig. Associations between model-based non-decision time and model-free response times.

Scatter plots (red: mOFC patients, blue: controls) depict associations between model-based non-decision time from the best fitting DDM_S models (x-axis) and minimum RT (a/b) and median RT (c/d) for temporal discounting (a/c) and risky choice / probability discounting (b/d).

(TIF)

S5 Fig. Associations between model-based boundary separations and model-free response times.

Scatter plots (red: mOFC patients, blue: controls) depict associations between model-based boundary separation from the best fitting DDM_S models (x-axis) and minimum RT (a/b) and median RT (c/d) for temporal discounting (a/c) and risky choice / probability discounting (b/d).

(TIF)

S6 Fig. Illustration of the differential effects of linear vs. sigmoid drift rate scaling.

Linear scaling predicts longer RTs (lower drift rates) than sigmoid scaling for all but the greatest value differences, where the effect reverses. The reversal point depends on the drift rate components (DDM_S1: v_max = 1.1786, v_coeff = .997, DDM_S2: v_max = .6, v_coeff = .2). The dashed line marks a value difference of -10, which was the lower bound of value differences in the present experimental design (i.e., the case when the risky or larger-later option was discounted to almost 0).

(TIF)

S7 Fig

Associations between drift rate components and discount rates for temporal discounting (a) and risky choice / probability discounting (b). Top panels show v_max and lower panels show v_coeff.

(TIF)

S8 Fig

Simulated temporal discounting response time distributions (left) and mean predicted response times per value bin (right) for a virtual participant for different values of v_max and v_coeff. See S1 Table (left column) for parameter values.

(TIF)

S9 Fig

Simulated risky choice response time distributions (left) and mean predicted response times per value bin (right) for a virtual participant for different values of v_max and v_coeff. See S1 Table (right column) for parameter values.

(TIF)

S1 Table. Parameter values used for simulation analyses depicted in S8 and S9 Figs.

All parameters are the posterior group means of the control group, with the exception of log(k)_now and the two drift rate modulator variables, which were selected for illustrative purposes.

(DOCX)

S1 Text. Model validation analyses: associations of DDM parameters with model-free measures.

(DOCX)

S2 Text. Associations between drift rate components and discount rates.

(DOCX)

Acknowledgments

We thank Donatella Scabini for help with patient recruitment, Natasha Young for help with testing control subjects and all members of the Peters Lab at University of Cologne for helpful discussions.

Data Availability

Data cannot be shared publicly because participants did not provide consent for having the data posted in a public repository. Data are available from https://zenodo.org/record/3742412 for researchers who meet the criteria for access to confidential data.

Funding Statement

This work was funded by Deutsche Forschungsgemeinschaft (grants PE 1627/4-1 and PE1627/5-1 to J.P.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. O’Doherty JP, Cockburn J, Pauli WM. Learning, Reward, and Decision Making. Annu Rev Psychol. 2017;68: 73–100. doi:10.1146/annurev-psych-010416-044216
2. Rangel A, Camerer C, Montague PR. A framework for studying the neurobiology of value-based decision making. Nat Rev Neurosci. 2008;9: 545–56. doi:10.1038/nrn2357
3. Dolan RJ, Dayan P. Goals and Habits in the Brain. Neuron. 2013;80: 312–325. doi:10.1016/j.neuron.2013.09.007
4. Bickel WK, Jarmolowicz DP, Mueller ET, Koffarnus MN, Gatchalian KM. Excessive discounting of delayed reinforcers as a trans-disease process contributing to addiction and other disease-related vulnerabilities: Emerging evidence. Pharmacol Ther. 2012;134: 287–97. doi:10.1016/j.pharmthera.2012.02.004
5. Gillan CM, Kosinski M, Whelan R, Phelps EA, Daw ND. Characterizing a psychiatric symptom dimension related to deficits in goal-directed control. eLife. 2016;5. doi:10.7554/eLife.11305
6. Chiong W, Wood KA, Beagle AJ, Hsu M, Kayser AS, Miller BL, et al. Neuroeconomic dissociation of semantic dementia and behavioural variant frontotemporal dementia. Brain J Neurol. 2016;139: 578–587. doi:10.1093/brain/awv344
7. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, Massachusetts: MIT Press; 1998.
8. Luce RD. The Choice Axiom after Twenty Years. J Math Psychol. 1977;15: 215–233.
9. Ratcliff R, McKoon G. The diffusion decision model: theory and data for two-choice decision tasks. Neural Comput. 2008;20: 873–922. doi:10.1162/neco.2008.12-06-420
10. Forstmann BU, Ratcliff R, Wagenmakers E-J. Sequential Sampling Models in Cognitive Neuroscience: Advantages, Applications, and Extensions. Annu Rev Psychol. 2016;67: 641–666. doi:10.1146/annurev-psych-122414-033645
11. Usher M, McClelland JL. The time course of perceptual choice: the leaky, competing accumulator model. Psychol Rev. 2001;108: 550–592. doi:10.1037/0033-295x.108.3.550
12. Shahar N, Hauser TU, Moutoussis M, Moran R, Keramati M, NSPN consortium, et al. Improving the reliability of model-based decision-making estimates in the two-stage decision task with reaction-times and drift-diffusion modeling. PLoS Comput Biol. 2019;15: e1006803. doi:10.1371/journal.pcbi.1006803
13. Ballard IC, McClure SM. Joint modeling of reaction times and choice improves parameter identifiability in reinforcement learning models. J Neurosci Methods. 2019;317: 37–44. doi:10.1016/j.jneumeth.2019.01.006
14. Pedersen ML, Frank MJ, Biele G. The drift diffusion model as the choice rule in reinforcement learning. Psychon Bull Rev. 2017;24: 1234–1251. doi:10.3758/s13423-016-1199-y
15. Fontanesi L, Gluth S, Spektor MS, Rieskamp J. A reinforcement learning diffusion decision model for value-based decisions. Psychon Bull Rev. 2019. doi:10.3758/s13423-018-1554-2
16. Rodriguez CA, Turner BM, McClure SM. Intertemporal choice as discounted value accumulation. PloS One. 2014;9: e90138. doi:10.1371/journal.pone.0090138
17. Amasino DR, Sullivan NJ, Kranton RE, Huettel SA. Amount and time exert independent influences on intertemporal choice. Nat Hum Behav. 2019;3: 383–392. doi:10.1038/s41562-019-0537-2
18. Milosavljevic M, Malmaud J, Huth A, Koch C, Rangel A. The drift diffusion model can account for the accuracy and reaction time of value-based choices under high and low time pressure. Judgement Decis Mak. 2010;5: 437–449.
19. Krajbich I, Armel C, Rangel A. Visual fixations and the computation and comparison of value in simple choice. Nat Neurosci. 2010;13: 1292–1298. doi:10.1038/nn.2635
20. Krajbich I, Rangel A. Multialternative drift-diffusion model predicts the relationship between visual fixations and choice in value-based decisions. Proc Natl Acad Sci U S A. 2011;108: 13852–13857. doi:10.1073/pnas.1101328108
21. Krajbich I, Lu D, Camerer C, Rangel A. The attentional drift-diffusion model extends to simple purchasing decisions. Front Psychol. 2012;3: 193. doi:10.3389/fpsyg.2012.00193
22. Pote I, Torkamani M, Kefalopoulou Z-M, Zrinzo L, Limousin-Dowsey P, Foltynie T, et al. Subthalamic nucleus deep brain stimulation induces impulsive action when patients with Parkinson’s disease act under speed pressure. Exp Brain Res. 2016;234: 1837–1848. doi:10.1007/s00221-016-4577-9
23. Limongi R, Bohaterewicz B, Nowicka M, Plewka A, Friston KJ. Knowing when to stop: Aberrant precision and evidence accumulation in schizophrenia. Schizophr Res. 2018. doi:10.1016/j.schres.2017.12.018
24. Herz DM, Little S, Pedrosa DJ, Tinkhauser G, Cheeran B, Foltynie T, et al. Mechanisms Underlying Decision-Making as Revealed by Deep-Brain Stimulation in Patients with Parkinson’s Disease. Curr Biol CB. 2018;28: 1169–1178.e6. doi:10.1016/j.cub.2018.02.057
25. Cavanagh JF, Wiecki TV, Cohen MX, Figueroa CM, Samanta J, Sherman SJ, et al. Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. Nat Neurosci. 2011;14: 1462–7. doi:10.1038/nn.2925
26. Bechara A, Damasio AR, Damasio H, Anderson SW. Insensitivity to future consequences following damage to human prefrontal cortex. Cognition. 1994;50: 7–15. doi:10.1016/0010-0277(94)90018-3
27. Damasio H, Grabowski T, Frank R, Galaburda AM, Damasio AR. The return of Phineas Gage: clues about the brain from the skull of a famous patient. Science. 1994;264: 1102–1105. doi:10.1126/science.8178168
28. Gläscher J, Adolphs R, Damasio H, Bechara A, Rudrauf D, Calamia M, et al. Lesion mapping of cognitive control and value-based decision making in the prefrontal cortex. Proc Natl Acad Sci U S A. 2012;109: 14681–14686. doi:10.1073/pnas.1206608109
29. Bechara A, Damasio H, Tranel D, Anderson SW. Dissociation of working memory from decision making within the human prefrontal cortex. J Neurosci. 1998;18: 428–37. doi:10.1523/JNEUROSCI.18-01-00428.1998
30. Peters J, D’Esposito M. Effects of Medial Orbitofrontal Cortex Lesions on Self-Control in Intertemporal Choice. Curr Biol CB. 2016;26: 2625–2628. doi:10.1016/j.cub.2016.07.035
31. Sellitto M, Ciaramelli E, di Pellegrino G. Myopic Discounting of Future Rewards after Medial Orbitofrontal Damage in Humans. J Neurosci. 2010;30: 16429–16436. doi:10.1523/JNEUROSCI.2516-10.2010
32. Fellows LK, Farah MJ. Dissociable elements of human foresight: a role for the ventromedial frontal lobes in framing the future, but not in discounting future rewards. Neuropsychologia. 2005;43: 1214–1221. doi:10.1016/j.neuropsychologia.2004.07.018
33. Studer B, Manes F, Humphreys G, Robbins TW, Clark L. Risk-Sensitive Decision-Making in Patients with Posterior Parietal and Ventromedial Prefrontal Cortex Injury. Cereb Cortex. 2013.
34. Manes F, Sahakian B, Clark L, Rogers R, Antoun N, Aitken M, et al. Decision-making processes following damage to the prefrontal cortex. Brain. 2002;125: 624–39. doi:10.1093/brain/awf049
35. Clark L, Bechara A, Damasio H, Aitken MR, Sahakian BJ, Robbins TW. Differential effects of insular and ventromedial prefrontal cortex lesions on risky decision-making. Brain. 2008;131: 1311–22. doi:10.1093/brain/awn066
36. Fellows LK, Farah MJ. Ventromedial frontal cortex mediates affective shifting in humans: evidence from a reversal learning paradigm. Brain J Neurol. 2003;126: 1830–1837. doi:10.1093/brain/awg180
37. Camille N, Tsuchida A, Fellows LK. Double dissociation of stimulus-value and action-value learning in humans with orbitofrontal or anterior cingulate cortex damage. J Neurosci Off J Soc Neurosci. 2011;31: 15048–15052. doi:10.1523/JNEUROSCI.3164-11.2011
38. Tsuchida A, Doll BB, Fellows LK. Beyond reversal: a critical role for human orbitofrontal cortex in flexible learning from probabilistic feedback. J Neurosci. 2010;30: 16868–75. doi:10.1523/JNEUROSCI.1958-10.2010
39. Camille N, Griffiths CA, Vo K, Fellows LK, Kable JW. Ventromedial frontal lobe damage disrupts value maximization in humans. J Neurosci. 2011;31: 7527–32. doi:10.1523/JNEUROSCI.6527-10.2011
40. Henri-Bhargava A, Simioni A, Fellows LK. Ventromedial frontal lobe damage disrupts the accuracy, but not the speed, of value-based preference judgments. Neuropsychologia. 2012;50: 1536–1542. doi:10.1016/j.neuropsychologia.2012.03.006
41. Fellows LK, Farah MJ. The role of ventromedial prefrontal cortex in decision making: judgment under uncertainty or judgment per se? Cereb Cortex N Y N 1991. 2007;17: 2669–2674. doi:10.1093/cercor/bhl176
42. Clithero JA, Rangel A. Informatic parcellation of the network involved in the computation of subjective value. Soc Cogn Affect Neurosci. 2014;9: 1289–1302. doi:10.1093/scan/nst106
43. Bartra O, McGuire JT, Kable JW. The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. NeuroImage. 2013;76: 412–427. doi:10.1016/j.neuroimage.2013.02.063
44. Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput. 2017;27: 1413–1432. doi:10.1007/s11222-016-9696-4
45. Myerson J, Green L, Warusawitharana M. Area under the curve as a measure of discounting. J Exp Anal Behav. 2001;76: 235–43. doi:10.1901/jeab.2001.76-235
46. Marsman M, Wagenmakers E-J. Three Insights from a Bayesian Interpretation of the One-Sided P Value. Educ Psychol Meas. 2017;77: 529–539. doi:10.1177/0013164416669201
47. Wiecki TV, Sofer I, Frank MJ. HDDM: Hierarchical Bayesian estimation of the Drift-Diffusion Model in Python. Front Neuroinformatics. 2013;7. doi:10.3389/fninf.2013.00014
48. Farrell S, Lewandowsky S. Computational modeling of cognition and behavior. Cambridge, UK: Cambridge University Press; 2018.
49. Scheibehenne B, Pachur T. Using Bayesian hierarchical parameter estimation to assess the generalizability of cognitive models of choice. Psychon Bull Rev. 2015;22: 391–407. doi:10.3758/s13423-014-0684-4
50. Lempert KM, Phelps EA. The Malleability of Intertemporal Choice. Trends Cogn Sci. 2016;20: 64–74. doi:10.1016/j.tics.2015.09.005
51. Peters J, Büchel C. The neural mechanisms of inter-temporal decision-making: understanding variability. Trends Cogn Sci. 2011;15: 227–239. doi:10.1016/j.tics.2011.03.002
52. Hare TA, Camerer CF, Rangel A. Self-Control in Decision-Making Involves Modulation of the vmPFC Valuation System. Science. 2009;324: 646–648. doi:10.1126/science.1168450
53. Figner B, Knoch D, Johnson EJ, Krosch AR, Lisanby SH, Fehr E, et al. Lateral prefrontal cortex and self-control in intertemporal choice. Nat Neurosci. 2010;13: 538–539. doi:10.1038/nn.2516
54. Rahnev D, Nee DE, Riddle J, Larson AS, D’Esposito M. Causal evidence for frontal cortex organization for perceptual decision making. Proc Natl Acad Sci U S A. 2016;113: 6059–6064. doi:10.1073/pnas.1522551113
55. Heekeren HR, Marrett S, Ungerleider LG. The neural systems that mediate human perceptual decision making. Nat Rev Neurosci. 2008;9: 467–79. doi:10.1038/nrn2374
56. Peters J, Büchel C. Episodic Future Thinking Reduces Reward Delay Discounting through an Enhancement of Prefrontal-Mediotemporal Interactions. Neuron. 2010;66: 138–148. doi:10.1016/j.neuron.2010.03.026
57. Dixon MR, Jacobs EA, Sanders S. Contextual Control of Delay Discounting by Pathological Gamblers. Carr JE, editor. J Appl Behav Anal. 2006;39: 413–422. doi:10.1901/jaba.2006.173-05
58. Lempert KM, Johnson E, Phelps EA. Emotional arousal predicts intertemporal choice. Emot Wash DC. 2016;16: 647–656. doi:10.1037/emo0000168
59. Montague PR, Dolan RJ, Friston KJ, Dayan P. Computational psychiatry. Trends Cogn Sci. 2012;16: 72–80. doi:10.1016/j.tics.2011.11.018
60. Green L, Myerson J, Macaux EW. Temporal Discounting When the Choice Is Between Two Delayed Rewards. J Exp Psychol Learn Mem Cogn. 2005;31: 1121–1133. doi:10.1037/0278-7393.31.5.1121
61. Kable JW, Glimcher PW. An “as soon as possible” effect in human intertemporal decision making: behavioral evidence and neural mechanisms. J Neurophysiol. 2010;103: 2513–31. doi:10.1152/jn.00177.2009
62. Green L, Myerson J. A discounting framework for choice with delayed and probabilistic rewards. Psychol Bull. 2004;130: 769–92. doi:10.1037/0033-2909.130.5.769
63. Peters J, Buchel C. Overlapping and Distinct Neural Systems Code for Subjective Value during Intertemporal and Risky Decision Making. J Neurosci. 2009;29: 15727–15734. doi:10.1523/JNEUROSCI.3489-09.2009
64. Hsu M, Krajbich I, Zhao C, Camerer CF. Neural Response to Reward Anticipation under Risk Is Nonlinear in Probabilities. J Neurosci. 2009;29: 2231–2237. doi:10.1523/JNEUROSCI.5296-08.2009
65. Lattimore PK, Baker JR, Witte AD. The influence of probability on risky choice: a parametric examination. J Econ Behav Organ. 1992; 377–400.
66. Ligneul R, Sescousse G, Barbalat G, Domenech P, Dreher JC. Shifted risk preferences in pathological gambling. Psychol Med. 2012; 1–10.
67. Wabersich D, Vandekerckhove J. Extending JAGS: a tutorial on adding custom distributions to JAGS (with a diffusion model example). Behav Res Methods. 2014;46: 15–28. doi:10.3758/s13428-013-0369-3
68. Plummer M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd international workshop on distributed statistical computing. Technische Universität Wien; 2003. p. 125. Available: http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Drafts/Plummer.pdf
69. Kass RE, Raftery AE. Bayes Factors. J Am Stat Assoc. 1995;90: 773–795. doi:10.1080/01621459.1995.10476572

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007615.r001

Decision Letter 0

Ulrik R Beierholm, Samuel J Gershman

2 Aug 2019

Dear Dr Peters,

Thank you very much for submitting your manuscript 'The drift diffusion model as the choice rule in inter-temporal and risky choice: a case study in medial orbitofrontal cortex lesion patients and controls.' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Note that PLoS will usually require data to be made available freely, if at all possible. Is it possible to anonymise the data, so that it can be made freely available?

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g., by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Ulrik R. Beierholm

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: In the manuscript entitled “The drift diffusion model as the choice rule in inter-temporal and risky choice: a case study in medial orbitofrontal cortex lesion patients and controls”, Peters & D’Esposito present some novel insights by fitting cognitive models to partly unpublished data. These insights include: (1) vmPFC/mOFC damage patients have longer non-decision times and reduced decision caution compared to healthy and age- and education-matched controls; (2) they show increased risk-taking; (3) the best way to account for value-based decisions in an inter-temporal choice task as well as in a risk-taking task is to modify the DDM so that the rate of evidence accumulation is proportional to the difference in subjective value between the two options; (4) the mapping between value-differences and the rate of evidence accumulation is non-linear rather than linear.

The present manuscript offers new insights in addition to the current literature in value-based decision making. However, there are a number of points (major and minor ones) that I think should be addressed by the authors.

Major points:

To test the hypothesis that the “vmPFC/mOFC damage might also render RTs during decision-making less dependent on value”, it would be better to use a mixed-modelling approach rather than simply looking for differences in the drift-value-coefficient between groups of participants. This is because a low vs. high drift-value-coefficient means that decisions are still based on values, but are less sensitive to value differences. It would be interesting to see whether, compared to control subjects, a higher proportion of trials in participants with vmPFC/mOFC damage can be described simply by the null-DDM instead of by the value-DDM. This could be tested with a mixed model, where a parameter (e.g., lambda) controls the proportion of trials that are best described by a null-DDM vs. a value-DDM. This would allow one to formulate and test the hypothesis that lambda is higher in participants with vmPFC/mOFC damage.
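The two-component mixture described in this comment can be sketched in a few lines. This is only an illustration, not the reviewer's or the authors' implementation: the per-trial likelihoods are assumed to come from some DDM density routine that is not shown, `lam` is taken to be the weight of the null DDM, and all names and numbers are hypothetical.

```python
import math

def mixture_loglik(lik_null, lik_value, lam):
    """Log-likelihood of a set of trials under a two-component mixture.

    lik_null, lik_value: per-trial likelihoods under the null DDM and the
    value DDM (hypothetical inputs, e.g. first-passage time densities).
    lam: mixing weight, read here as the proportion of null-DDM trials.
    """
    return sum(math.log(lam * l0 + (1.0 - lam) * l1)
               for l0, l1 in zip(lik_null, lik_value))

def null_responsibility(l0, l1, lam):
    """Posterior probability that one trial was generated by the null DDM."""
    return lam * l0 / (lam * l0 + (1.0 - lam) * l1)

# Toy likelihoods, made up for illustration only (not real data).
lik_null = [0.20, 0.50, 0.10]
lik_value = [0.80, 0.40, 0.90]
resp = [null_responsibility(l0, l1, 0.3)
        for l0, l1 in zip(lik_null, lik_value)]
```

Under this reading, a larger `lam` in the patient group would indicate that more trials are better described without value modulation.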

It seems to me that the null model (DDM0) is too simple, making the comparison with the two value-based DDMs unfair. I strongly suggest fitting separate sets of the 4 parameters (alpha, tau, v, and z) across conditions (Now vs. Not now) for the inter-temporal choice task. Since there are no clear conditions in the risky choice task, I am not sure how this could be done there.

Regarding the prior distributions, I have 2 suggestions: (1) that the priors are properly written and described in the text or appendix, or in the supplementary materials. They should be easily accessible by a reader and should not just be retrievable within the online code. Note that it is also necessary to specify the priors for the individual parameters, depending on the group parameters; (2) uniform priors are not uninformative, especially when restricted to sensible ranges. This suggests that the authors had an idea of which values were sensible and could therefore specify weakly-informative priors (see Gelman et al., 2014). I suggest using “sensibly” centered Cauchy distributions (they are heavy-tailed and therefore weakly informative) for the means and Half-Cauchy distributions for the standard deviations.

The authors should add a parameter recovery section for the winning model (as in, e.g., Pedersen et al., 2017, Fontanesi et al., 2019a, and Fontanesi et al., 2019b). This is quite crucial when proposing new, complex models such as value-based modifications of the DDM.

The posterior predictive checks are hard to assess. To better assess them, the authors should group mean RTs or RT quantiles for SS vs. LL options or for risky vs. safe options by condition and compare them with the same summary statistics on the data (as in, e.g., Fontanesi et al., 2019a, Fontanesi et al., 2019b).

For the Cohen’s d calculation, the authors should use the whole posterior traces for mean and pooled standard deviation, instead of the mean of the posterior distributions. This would make the calculation “more Bayesian” and more reliable.
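A minimal sketch of the suggestion above, assuming that aligned posterior traces of the two group means and group-level standard deviations are available as plain lists. The function names and the equal-weight pooling are assumptions made for illustration, not the authors' procedure.

```python
def cohens_d_trace(mu_a, mu_b, sd_a, sd_b):
    """Effect size computed separately for every posterior sample.

    mu_a, mu_b: posterior samples of the two group means.
    sd_a, sd_b: posterior samples of the two group-level SDs.
    Pooling weights both groups equally (an assumption).
    """
    out = []
    for ma, mb, sa, sb in zip(mu_a, mu_b, sd_a, sd_b):
        pooled = ((sa ** 2 + sb ** 2) / 2.0) ** 0.5
        out.append((ma - mb) / pooled)
    return out

def percentile_interval(samples, lo=0.025, hi=0.975):
    """Crude percentile interval taken from the sorted samples."""
    s = sorted(samples)
    n = len(s)
    return s[int(lo * (n - 1))], s[int(hi * (n - 1))]

# Toy traces, made up for illustration only.
d = cohens_d_trace([1.0, 1.2, 0.8], [0.0, 0.2, -0.2],
                   [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
```

The result is a full posterior distribution of the effect size, which can then be summarised by its mean and an interval rather than by a single point estimate.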

I would exclude from the analyses the correlations between DDM parameter estimates and model-free RT statistics. The DDM is supposed to decompose RT and choice distributions into interpretable parameters, so: (1) some parameters always correlate to some extent with RTs; (2) it is unclear to me what this would add to the interpretation of the model. If the aim was to check for qualitative model fit, the posterior predictive checks should be enough.

Minor points:

Please substitute every occurrence of “reaction time” with “response time”, which is more appropriate in this case (as this is not a mere stimulus reaction task but participants need to integrate information in order to make a decision). In general, I would only write response time (RT) at the first occurrence, and only refer to it as RT or RTs after that.

Regarding model comparison, I would suggest switching to WAIC or LOO, which also allow you to have an estimate of the error in such measures and to assess the difference in model fit between two models more reliably (as in, e.g., Pedersen et al., 2017 and Fontanesi et al., 2019a). This can be easily achieved via the R loo package, which only needs the MCMC traces as input (you could save your analysis output from Matlab and load it in R…).
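To make the quantity concrete, the following sketch computes WAIC on the deviance scale from a samples-by-trials matrix of pointwise log-likelihoods, following the standard definition (log pointwise predictive density minus the summed posterior variance of the log-likelihood). It is a plain Python illustration, not the loo package's code, and it omits the standard error that loo also reports.

```python
import math
import statistics

def waic(loglik):
    """WAIC (deviance scale) from pointwise log-likelihoods.

    loglik[s][i] is the log-likelihood of trial i under posterior sample s.
    """
    n_samples = len(loglik)
    n_trials = len(loglik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_trials):
        column = [loglik[s][i] for s in range(n_samples)]
        # Log of the mean likelihood, computed stably via log-sum-exp.
        top = max(column)
        lppd += top + math.log(
            sum(math.exp(x - top) for x in column) / n_samples)
        # Penalty term: posterior variance of the log-likelihood.
        p_waic += statistics.variance(column)
    return -2.0 * (lppd - p_waic)
```

Lower values indicate better expected out-of-sample fit; in practice the pointwise values would be exported from the sampler rather than typed in by hand.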

I do not understand what it means that the distributions should have a clear peak or a clearly Gaussian shape: not all parameter distributions are supposed to have such a shape, so this shouldn’t be a way to assess model recoverability. In contrast, chain convergence and parameter recovery are.

The “m” in Equation 7 is not the slope of the sigmoid function, as described in the main text, but is the input value-difference that is transformed by the sigmoid function. The slope would then be Vcoeff (Vmod in Fontanesi et al., 2019a).
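The roles of the input and the scaling coefficient can be illustrated with a short sketch. It assumes the sigmoid has the form S(m) = 2·v_max / (1 + exp(−m)) − v_max with m = v_coeff × value difference, as in Fontanesi et al. (2019a); the exact form of Equation 7 is not reproduced in this letter, so this is an assumption, and the parameter values are arbitrary.

```python
import math

def drift_rate(value_diff, v_coeff, v_max):
    """Sigmoid mapping from a value difference to a trial-wise drift rate.

    Assumed form: S(m) = 2 * v_max / (1 + exp(-m)) - v_max,
    evaluated at m = v_coeff * value_diff.
    """
    m = v_coeff * value_diff
    return 2.0 * v_max / (1.0 + math.exp(-m)) - v_max

# Numerical slope of the mapping at a value difference of zero.
h = 1e-6
slope_at_zero = (drift_rate(h, 0.5, 2.0) - drift_rate(-h, 0.5, 2.0)) / (2 * h)
```

Under this assumed form, v_coeff scales how steeply the drift rate rises with the value difference (the slope at zero works out to v_coeff × v_max / 2), while v_max sets the asymptote.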

R-hat statistics should just be between 1 and some value close to 1. So please correct to 1 <= Rhat <= 1.01.
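For reference, the classic potential scale reduction factor can be computed from a set of equal-length chains as sketched below. This is the textbook Gelman-Rubin estimator written in plain Python, not the routine used by JAGS or any specific package, and the toy chains are made up.

```python
import statistics

def gelman_rubin(chains):
    """Classic R-hat for a list of equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean within-chain variance.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: variance of the chain means.
    b_over_n = statistics.variance(chain_means)
    var_hat = (n - 1) / n * w + b_over_n
    return (var_hat / w) ** 0.5

# Two toy chains stuck in different regions give a large R-hat.
rhat_bad = gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
```

Values near 1 indicate that between-chain and within-chain variability agree; with short chains this estimator can fall marginally below 1, so in practice the upper bound is the informative criterion.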

In the results, clarify this sentence: “Since the correlation for shiftlog(k) appeared to be somewhat inflated by the extreme datapoints of the mOFC patients, we re-ran the correlation only in the control group. Here, the correlation was lower but still robust (r=.52).” What are these extreme datapoints? What was the correlation in the mOFC patients?

Very minor points:

In the abstract, refer to “Bayesian parameter estimation” instead of “Bayesian estimation scheme”.

In the introduction, remove the word “usually” in the first sentence of the second paragraph. The DDM can only have 2 response boundaries, otherwise it would be a different model. In the last sentence of the same paragraph, substitute “simple” with “two-alternative forced-choice tasks” (these are what the DDM was made for).

I think it’s a bit misleading to call the DDM boundaries 0 and 1, so if possible I would delete that, or better explain that the lower boundary corresponds to when the accumulated evidence is equal to 0, the upper boundary is when the accumulated evidence is equal to alpha, and the starting point z is half of alpha.
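The boundary convention described in this comment can be made explicit with a small simulation sketch: evidence starts at z × alpha, and a response is recorded when it reaches 0 (lower boundary) or alpha (upper boundary). This is a simple Euler approximation with hypothetical parameter values, not the likelihood-based estimation used in the manuscript.

```python
import random

def simulate_ddm(v, alpha, z=0.5, tau=0.3, dt=0.001, sigma=1.0,
                 rng=None, max_t=10.0):
    """Simulate one trial; returns (choice, rt).

    choice is 1 for the upper boundary (evidence >= alpha), 0 for the
    lower boundary (evidence <= 0), and None if no boundary is reached
    within max_t seconds. rt includes the non-decision time tau.
    """
    rng = rng or random.Random()
    x = z * alpha  # starting point as a fraction of boundary separation
    t = 0.0
    while 0.0 < x < alpha and t < max_t:
        x += v * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    if x >= alpha:
        choice = 1
    elif x <= 0.0:
        choice = 0
    else:
        choice = None
    return choice, tau + t

rng = random.Random(1)
trials = [simulate_ddm(v=5.0, alpha=1.0, rng=rng) for _ in range(200)]
```

With an unbiased starting point (z = 0.5) and a strongly positive drift rate, most simulated trials terminate at the upper boundary.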

Put a comma after every occurrence of “e.g.” (e.g., ).

References:

Fontanesi, L., Gluth, S., Spektor, M.S. et al. Psychon Bull Rev (2019a). https://doi.org/10.3758/s13423-018-1554-2

Fontanesi, L., Palminteri, S. & Lebreton, M. Cogn Affect Behav Neurosci (2019b) 19: 490. https://doi.org/10.3758/s13415-019-00723-1

Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2014) Bayesian data analysis, (3rd edn.) London: Chapman & Hall/ CRC.

Pedersen, M. L., Frank, M. J., & Biele, G. (2017). The drift diffusion model as the choice rule in reinforcement learning. Psychonomic Bulletin & Review, 24(4), 1234–1251.

Reviewer #2: In the current study, the authors use drift-diffusion modeling to predict a combination of choice and RT data from two reinforcement learning tasks (temporal/probability discounting) performed by healthy controls and mOFC/vmPFC lesion patients. The authors suggest that the DDM can be adequately used to describe the observed data, and that it does not fall behind a more conventional RL model that describes only choice behavior. The authors then suggest that (a) a DDM with a non-linear mapping between subjective value and drift rate provides the best fit to the data compared to more conventional DDM models; (b) the groups differ mainly in increased non-decision time and reduced decision threshold for patients vs. controls. The authors report no group differences in value processing.

I believe this is a valuable study, both in the sense that it aims to contribute by examining the applicability of evidence accumulation modeling to describe choice and RT data, and in that it presents interesting results with lesion patients. However, I think there are some issues that need to be further elaborated and explored:

1. The authors conclude that non-linear drift rate modulation provided the best fit to the data. This is a great finding, but I think that it is very important to understand why that is, at the mechanistic level, specifically in terms of the relationship between subjective value and decision-time:

a. At the moment, it is hard to figure out why the non-linear aspect of the DDMs allows a better description for the data. Is this because the RT association with value differences between the two options is stronger for lower value differences? Or maybe this is due to the fact that some participants are less sensitive to the value manipulation (i.e., resulting in a very high Vcoef)?

b. Does the better fit for DDMs vs. DDMlin come only from choice data, or does it provide a better fit to RT data as well? I believe this is important to fully understand what part of the observed data is better explained by DDMs (e.g., it might not be about RTs at all, with DDMs better accounting for choice data alone compared to DDMlin). If it is mostly due to choice data, I am not sure why the use of the DDM is justified here.

c. On the same point as above, are vmax/vcoef estimates reflected differently in different aspects of the RT distribution (e.g., the tail/leading edge of the distribution)? Maybe the authors can also include a simulation where vmax/vcoef are mapped to aspects of the RT distribution (e.g., using ex-Gaussian fitting). This might then be used to show why DDMs actually fit better with RT data compared to DDMlin (hoping it does fit better to both, rather than to choice behavior only).

d. What is the relationship between vcoef/vmax and discounting parameters? Does DDMs provide a better fit because it helps to better capture the RT-SV associations, or rather because it helps account for individuals where such a relationship is actually absent (e.g., maybe by allowing a high vcoef)?

2. The authors suggest two analyses which led them to the conclusion that the DDM can be adequately used to describe value-based decisions (depicted in Fig 4 and Fig 5/6). However, I feel that these analyses are more 'sanity checks' than novel results:

a. The authors report high correlations between the same parameters fitted with either softmax or DDMs. Yet, since both models describe choice behavior, why would we expect the same parameter to differ due to the modeling of the choice and RT combination vs. choice only? I think this needs better justification/explanation. What did you have in mind? Did you expect RTs to change the model's ability to accurately predict choice for some reason? I think the challenge here is not to show that the DDM accounts for choices similarly to softmax models (which means that modeling a combination of choice and RT, as opposed to choice only, doesn't reduce the fit for choice data). The challenge here, in my mind, is to show the benefits of modeling RTs and choices at the same time. Since modeling both RT and choice is more difficult, I think it should be justified by laying down the possible advantages of using such an approach.

b. Fig 5/6. Why would we expect anything other than a positive correlation between nd/th and min/median/(mean?) RT? I appreciate the fact that the authors are including this, but I don't see how this is more than a sanity check. Did you have any reason to believe that a value-based DDM will tamper with the relationship of nd/th with RT? Why?

c. Further re "we re-ran the correlation only in the control group. Here, the correlation was lower but still robust (r=.52)." I would suggest this is very low. Why is that? Maybe it has to do with the recoverability of this parameter specifically, or maybe very low effect/variance between conditions for controls? The point is that this might be unrelated to the difference between DDMs and softmax models.

3. Group differences look very interesting and valuable, but it's not clear whether you use these to validate the use of the DDM, to justify the use of the DDM for value-based decisions, or to add to the mOFC literature per se:

a. If group differences in nd are unrelated to value-based processes, why is this a demonstration of why the DDM can be beneficial here? Couldn't this be done with perceptual decisions as well?

b. The authors report similar vmax, but higher vcoef for the mOFC group. They conclude that "value-differences exert a similar (if not stronger) effect on trial-wise drift rates in vmPFC/mOFC patients compared to controls". I am not sure I understand why this led them to conclude that value-based choice processes are intact in mOFC patients. I think a high vcoef might actually be a way for the model to account for participants who are insensitive to value differences.

c. Is it possible that the difference in starting point is only due to choice data, but not RT?

4. The issue of accuracy vs. stimulus coding is emphasized. However, it was hard for me to follow why this makes a difference. The assignment of the upper boundary to the higher-value option is, to the best of my understanding, strictly technical (both models are perfectly equivalent). I think it's good to note the differences, but I'm not sure I understand why it is emphasized (i.e., given a paragraph in both the intro and the discussion).
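The saturation behavior at issue in point 3b (and in the simulation requested in point 1c) can be illustrated with a short sketch. It assumes a sigmoid drift-rate mapping of the form v = 2·vmax / (1 + exp(−vcoef·ΔSV)) − vmax, which is an assumption here because the mapping is not spelled out in this review; the function names and the illustrative value differences are hypothetical.

```python
import math


def sigmoid_drift(value_diff, v_coef, v_max):
    """Assumed mapping: v = 2*v_max / (1 + exp(-v_coef*value_diff)) - v_max.

    Written as v_max * tanh(m / 2), which is algebraically identical
    and avoids floating-point overflow for large |m|.
    """
    m = v_coef * value_diff
    return v_max * math.tanh(m / 2.0)


def linear_drift(value_diff, v_coef):
    """Linear mapping for comparison: v = v_coef * value_diff."""
    return v_coef * value_diff


if __name__ == "__main__":
    # Illustrative value differences only; not taken from the study.
    value_diffs = [-10.0, -1.0, -0.1, 0.1, 1.0, 10.0]
    for v_coef in (0.1, 1.0, 50.0):
        drifts = [round(sigmoid_drift(d, v_coef, v_max=2.0), 3) for d in value_diffs]
        print(f"v_coef={v_coef}: {drifts}")
```

Under this assumed mapping, a very large vcoef pushes the drift rate close to ±vmax for almost any non-zero value difference, so the drift rate tracks mainly the sign of the difference rather than its magnitude. That is the regime the reviewer's concern refers to.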

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: I did not get access to the data, but the authors stated in the manuscript that data can be provided upon request.

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review?

Reviewer #1: Yes: Laura Fontanesi

Reviewer #2: No

PLoS Comput Biol. 2020 Apr 20;16(4):e1007615. doi: 10.1371/journal.pcbi.1007615.r002

Author response to Decision Letter 0

10 Oct 2019

Attachment

Submitted filename: R1_ResponseLetter_Peters.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007615.r003

Decision Letter 1

Ulrik R Beierholm, Samuel J Gershman

11 Nov 2019

Dear Dr Peters,

Thank you very much for submitting your manuscript, 'The drift diffusion model as the choice rule in inter-temporal and risky choice: a case study in medial orbitofrontal cortex lesion patients and controls.', to PLOS Computational Biology. As with all papers submitted to the journal, yours was fully evaluated by the PLOS Computational Biology editorial team, and in this case, by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

As you can see below, one of the reviewers had a small, but reasonable, request regarding the prior of one of the parameters, which should be easily addressed.

While the data underlying the study cannot be made publicly available, we would encourage you to also find a secondary place for storing data, e.g. an institutional data access portal, should your institution have this.

We would like to ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer and we encourage you to respond to particular issues raised. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text.

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled 'Dataset', 'Figure', 'Table', 'Text', 'Protocol', 'Audio', or 'Video'.

- Funding information in the 'Financial Disclosure' box in the online system.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org.

If you have any questions or concerns while you make these revisions, please let us know.

Sincerely,

Ulrik R. Beierholm

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Thank you for the additional mixed-models analyses. I only have one question about that: why was the lambda parameter constrained to be in [-3, 3] in the standard normal space? Judging by the posteriors, it looks like the distributions are all pushing towards the bound at 3. If possible, I would consider relaxing the prior distribution to accommodate higher values of lambda.

I understand why it’s not feasible to fit separate DDM parameters by condition.

I also appreciate the change in wording regarding the prior distributions.

Thanks for the parameter recovery. It looks like the Bayesian estimation procedure is indeed able to recover parameters well.

Posterior predictive checks are also much improved. I really like how they also provide a better explanation of why the non-linear mapping fits the data better.

Regarding the Cohen’s d calculation, I simply suggested to perform a calculation based on all posterior samples and not just on their summaries (e.g., the mean). So that the result would then be a Cohen’s d distribution, instead of just a point summary. But I understand that this is not a crucial part of the results and this should be enough for future reference purposes.
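The suggestion above can be sketched as follows: the effect size is computed once per posterior sample rather than once from posterior summaries, so the result is itself a distribution. This is a minimal sketch and not the authors' procedure. The function name, the equal-weight pooling of the two group-level standard deviations, and the synthetic draws standing in for MCMC samples are all assumptions made for illustration.

```python
import math
import random
import statistics


def cohens_d_samples(mu_a, mu_b, sd_a, sd_b):
    """Return one Cohen's d per posterior sample.

    Assumes equal-weight pooling of the two group-level SDs:
    d = (mu_a - mu_b) / sqrt((sd_a**2 + sd_b**2) / 2).
    """
    if not (len(mu_a) == len(mu_b) == len(sd_a) == len(sd_b)):
        raise ValueError("all inputs must have the same number of samples")
    return [
        (ma - mb) / math.sqrt((sa ** 2 + sb ** 2) / 2.0)
        for ma, mb, sa, sb in zip(mu_a, mu_b, sd_a, sd_b)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4000
    # Synthetic stand-ins for posterior draws; NOT values from the study.
    mu_pat = [rng.gauss(1.2, 0.15) for _ in range(n)]
    mu_con = [rng.gauss(0.9, 0.10) for _ in range(n)]
    sd_pat = [abs(rng.gauss(0.4, 0.05)) for _ in range(n)]
    sd_con = [abs(rng.gauss(0.3, 0.05)) for _ in range(n)]

    d = sorted(cohens_d_samples(mu_pat, mu_con, sd_pat, sd_con))
    lower, upper = d[int(0.025 * n)], d[int(0.975 * n) - 1]
    print(f"median d = {statistics.median(d):.2f}, "
          f"95% interval = [{lower:.2f}, {upper:.2f}]")
```

Summarizing the resulting list by its median and a central interval conveys the uncertainty in the effect size, which a single point value computed from posterior means would hide.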

I appreciate that the correlation analyses between raw data and DDM parameters are now out of the main text.

Reviewer #2: I would like to thank the authors for revising the manuscript and providing a detailed response. I believe this manuscript, and mainly the demonstration that a nonlin RLDDM can provide a better fit to choice&RT value-based data, serves as an important contribution to the field.

All the best,

Nitzan Shahar.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Reviewer #1: Yes

Reviewer #2: No: Authors mention data will be provided on-demand.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review?

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. 2020 Apr 20;16(4):e1007615. doi: 10.1371/journal.pcbi.1007615.r004

Author response to Decision Letter 1

29 Nov 2019

Attachment

Submitted filename: Peters_ResponseLetterR2.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007615.r005

Decision Letter 2

Ulrik R Beierholm, Samuel J Gershman

19 Dec 2019

Dear Dr Peters,

We are pleased to inform you that your manuscript 'The drift diffusion model as the choice rule in inter-temporal and risky choice: a case study in medial orbitofrontal cortex lesion patients and controls.' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pcompbiol/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process.

One of the goals of PLOS is to make science accessible to educators and the public. PLOS staff issue occasional press releases and make early versions of PLOS Computational Biology articles available to science writers and journalists. PLOS staff also collaborate with Communication and Public Information Offices and would be happy to work with the relevant people at your institution or funding agency. If your institution or funding agency is interested in promoting your findings, please ask them to coordinate their releases with PLOS (contact ploscompbiol@plos.org).

Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Computational Biology.

Sincerely,

Ulrik R. Beierholm

Associate Editor

PLOS Computational Biology

Samuel Gershman

Deputy Editor

PLOS Computational Biology

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007615.r006

Acceptance letter

Ulrik R Beierholm, Samuel J Gershman

15 Apr 2020

PCOMPBIOL-D-19-01092R2

The drift diffusion model as the choice rule in inter-temporal and risky choice: a case study in medial orbitofrontal cortex lesion patients and controls

Dear Dr Peters,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Matt Lyles

Associated Data

Supplementary Materials

(TIF)

Histograms depict the observed RT distributions for each participant. The solid lines are smoothed histograms of the model predicted RT distributions from 1000 individual subject data sets simulated from the posterior distribution of the best-fitting hierarchical model. RTs for choices of the safe option are plotted as negative, whereas RTs for risky choices are plotted as positive. The x-axes are adjusted to cover the range of observed RTs for each participant.

(TIF)

S3 Fig. Consistency of model parameters for temporal discounting (TD: a/b) and probability discounting (PD, c) between softmax and DDM_S choice rules.

(TIF)

S4 Fig. Associations between model-based non-decision time and model-free response times.

(TIF)

S5 Fig. Associations between model-based boundary separations and model-free response times.

(TIF)

S6 Fig. Illustration of the differential effects of linear vs. sigmoid drift rate scaling.

(TIF)

S7 Fig

Associations between drift rate components and discount rates for temporal discounting (a) and risky choice / probability discounting (b). Top panels show v_max and lower panels show v_coeff.

(TIF)

S8 Fig

(TIF)

S9 Fig

(TIF)

S1 Table. Parameter values used for simulation analyses depicted in S8 and S9 Figs.

All parameters are the posterior group means of the control group, with the exception of log(k)_now and the two drift rate modulator variables, which were selected for illustrative purposes.

(DOCX)

S1 Text. Model validation analyses: associations of DDM parameters with model-free measures.

(DOCX)

S2 Text. Associations between drift rate components and discount rates.

(DOCX)

Data Availability Statement

[pcbi.1007615.ref001] 1.O’Doherty JP, Cockburn J, Pauli WM. Learning, Reward, and Decision Making. Annu Rev Psychol. 2017;68: 73–100. 10.1146/annurev-psych-010416-044216 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref002] 2.Rangel A, Camerer C, Montague PR. A framework for studying the neurobiology of value-based decision making. Nat Rev Neurosci. 2008;9: 545–56. 10.1038/nrn2357 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref003] 3.Dolan RJ, Dayan P. Goals and Habits in the Brain. Neuron. 2013;80: 312–325. 10.1016/j.neuron.2013.09.007 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref004] 4.Bickel WK, Jarmolowicz DP, Mueller ET, Koffarnus MN, Gatchalian KM. Excessive discounting of delayed reinforcers as a trans-disease process contributing to addiction and other disease-related vulnerabilities: Emerging evidence. Pharmacol Ther. 2012;134: 287–97. 10.1016/j.pharmthera.2012.02.004 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref005] 5.Gillan CM, Kosinski M, Whelan R, Phelps EA, Daw ND. Characterizing a psychiatric symptom dimension related to deficits in goal-directed control. eLife. 2016;5 10.7554/eLife.11305 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref006] 6.Chiong W, Wood KA, Beagle AJ, Hsu M, Kayser AS, Miller BL, et al. Neuroeconomic dissociation of semantic dementia and behavioural variant frontotemporal dementia. Brain J Neurol. 2016;139: 578–587. 10.1093/brain/awv344 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref007] 7.Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, Massachusetts: MIT Press; 1998. [Google Scholar]

[pcbi.1007615.ref008] 8.Luce RD. The Choice Axiom after Twenty Years. J Math Psychol. 1977;15: 215–233. [Google Scholar]

[pcbi.1007615.ref009] 9.Ratcliff R, McKoon G. The diffusion decision model: theory and data for two-choice decision tasks. Neural Comput. 2008;20: 873–922. 10.1162/neco.2008.12-06-420 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref010] 10.Forstmann BU, Ratcliff R, Wagenmakers E-J. Sequential Sampling Models in Cognitive Neuroscience: Advantages, Applications, and Extensions. Annu Rev Psychol. 2016;67: 641–666. 10.1146/annurev-psych-122414-033645 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref011] 11.Usher M, McClelland JL. The time course of perceptual choice: the leaky, competing accumulator model. Psychol Rev. 2001;108: 550–592. 10.1037/0033-295x.108.3.550 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref012] 12.Shahar N, Hauser TU, Moutoussis M, Moran R, Keramati M, NSPN consortium, et al. Improving the reliability of model-based decision-making estimates in the two-stage decision task with reaction-times and drift-diffusion modeling. PLoS Comput Biol. 2019;15: e1006803 10.1371/journal.pcbi.1006803 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref013] 13.Ballard IC, McClure SM. Joint modeling of reaction times and choice improves parameter identifiability in reinforcement learning models. J Neurosci Methods. 2019;317: 37–44. 10.1016/j.jneumeth.2019.01.006 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref014] 14.Pedersen ML, Frank MJ, Biele G. The drift diffusion model as the choice rule in reinforcement learning. Psychon Bull Rev. 2017;24: 1234–1251. 10.3758/s13423-016-1199-y [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref015] 15.Fontanesi L, Gluth S, Spektor MS, Rieskamp J. A reinforcement learning diffusion decision model for value-based decisions. Psychon Bull Rev. 2019. 10.3758/s13423-018-1554-2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref016] 16.Rodriguez CA, Turner BM, McClure SM. Intertemporal choice as discounted value accumulation. PloS One. 2014;9: e90138 10.1371/journal.pone.0090138 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref017] 17.Amasino DR, Sullivan NJ, Kranton RE, Huettel SA. Amount and time exert independent influences on intertemporal choice. Nat Hum Behav. 2019;3: 383–392. 10.1038/s41562-019-0537-2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref018] 18.Milosavljevic M, Malmaud J, Huth A, Koch C, Rangel A. The drift diffusion model can account for the accuracy and reaction time of value-based choices under high and low time pressure. Judgement Decis Mak. 2010;5: 437–449. [Google Scholar]

[pcbi.1007615.ref019] 19.Krajbich I, Armel C, Rangel A. Visual fixations and the computation and comparison of value in simple choice. Nat Neurosci. 2010;13: 1292–1298. 10.1038/nn.2635 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref020] 20.Krajbich I, Rangel A. Multialternative drift-diffusion model predicts the relationship between visual fixations and choice in value-based decisions. Proc Natl Acad Sci U S A. 2011;108: 13852–13857. 10.1073/pnas.1101328108 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref021] 21.Krajbich I, Lu D, Camerer C, Rangel A. The attentional drift-diffusion model extends to simple purchasing decisions. Front Psychol. 2012;3: 193 10.3389/fpsyg.2012.00193 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref022] 22.Pote I, Torkamani M, Kefalopoulou Z-M, Zrinzo L, Limousin-Dowsey P, Foltynie T, et al. Subthalamic nucleus deep brain stimulation induces impulsive action when patients with Parkinson’s disease act under speed pressure. Exp Brain Res. 2016;234: 1837–1848. 10.1007/s00221-016-4577-9 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref023] 23.Limongi R, Bohaterewicz B, Nowicka M, Plewka A, Friston KJ. Knowing when to stop: Aberrant precision and evidence accumulation in schizophrenia. Schizophr Res. 2018. 10.1016/j.schres.2017.12.018 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref024] 24.Herz DM, Little S, Pedrosa DJ, Tinkhauser G, Cheeran B, Foltynie T, et al. Mechanisms Underlying Decision-Making as Revealed by Deep-Brain Stimulation in Patients with Parkinson’s Disease. Curr Biol CB. 2018;28: 1169–1178.e6. 10.1016/j.cub.2018.02.057 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref025] 25.Cavanagh JF, Wiecki TV, Cohen MX, Figueroa CM, Samanta J, Sherman SJ, et al. Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. Nat Neurosci. 2011;14: 1462–7. 10.1038/nn.2925 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref026] 26.Bechara A, Damasio AR, Damasio H, Anderson SW. Insensitivity to future consequences following damage to human prefrontal cortex. Cognition. 1994;50: 7–15. 10.1016/0010-0277(94)90018-3 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref027] 27.Damasio H, Grabowski T, Frank R, Galaburda AM, Damasio AR. The return of Phineas Gage: clues about the brain from the skull of a famous patient. Science. 1994;264: 1102–1105. 10.1126/science.8178168 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref028] 28.Gläscher J, Adolphs R, Damasio H, Bechara A, Rudrauf D, Calamia M, et al. Lesion mapping of cognitive control and value-based decision making in the prefrontal cortex. Proc Natl Acad Sci U S A. 2012;109: 14681–14686. 10.1073/pnas.1206608109 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref029] 29.Bechara A, Damasio H, Tranel D, Anderson SW. Dissociation Of working memory from decision making within the human prefrontal cortex. J Neurosci. 1998;18: 428–37. 10.1523/JNEUROSCI.18-01-00428.1998 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref030] 30.Peters J, D’Esposito M. Effects of Medial Orbitofrontal Cortex Lesions on Self-Control in Intertemporal Choice. Curr Biol CB. 2016;26: 2625–2628. 10.1016/j.cub.2016.07.035 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref031] 31.Sellitto M, Ciaramelli E, di Pellegrino G. Myopic Discounting of Future Rewards after Medial Orbitofrontal Damage in Humans. J Neurosci. 2010;30: 16429–16436. 10.1523/JNEUROSCI.2516-10.2010 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref032] 32.Fellows LK, Farah MJ. Dissociable elements of human foresight: a role for the ventromedial frontal lobes in framing the future, but not in discounting future rewards. Neuropsychologia. 2005;43: 1214–1221. 10.1016/j.neuropsychologia.2004.07.018 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref033] 33.Studer B, Manes F, Humphreys G, Robbins TW, Clark L. Risk-Sensitive Decision-Making in Patients with Posterior Parietal and Ventromedial Prefrontal Cortex Injury. Cereb Cortex. 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref034] 34.Manes F, Sahakian B, Clark L, Rogers R, Antoun N, Aitken M, et al. Decision-making processes following damage to the prefrontal cortex. Brain. 2002;125: 624–39. 10.1093/brain/awf049 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref035] 35.Clark L, Bechara A, Damasio H, Aitken MR, Sahakian BJ, Robbins TW. Differential effects of insular and ventromedial prefrontal cortex lesions on risky decision-making. Brain. 2008;131: 1311–22. 10.1093/brain/awn066 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref036] 36.Fellows LK, Farah MJ. Ventromedial frontal cortex mediates affective shifting in humans: evidence from a reversal learning paradigm. Brain J Neurol. 2003;126: 1830–1837. 10.1093/brain/awg180 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref037] 37.Camille N, Tsuchida A, Fellows LK. Double dissociation of stimulus-value and action-value learning in humans with orbitofrontal or anterior cingulate cortex damage. J Neurosci Off J Soc Neurosci. 2011;31: 15048–15052. 10.1523/JNEUROSCI.3164-11.2011 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref038] 38.Tsuchida A, Doll BB, Fellows LK. Beyond reversal: a critical role for human orbitofrontal cortex in flexible learning from probabilistic feedback. J Neurosci. 2010;30: 16868–75. 10.1523/JNEUROSCI.1958-10.2010 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref039] 39.Camille N, Griffiths CA, Vo K, Fellows LK, Kable JW. Ventromedial frontal lobe damage disrupts value maximization in humans. J Neurosci. 2011;31: 7527–32. 10.1523/JNEUROSCI.6527-10.2011 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref040] 40.Henri-Bhargava A, Simioni A, Fellows LK. Ventromedial frontal lobe damage disrupts the accuracy, but not the speed, of value-based preference judgments. Neuropsychologia. 2012;50: 1536–1542. 10.1016/j.neuropsychologia.2012.03.006 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref041] 41.Fellows LK, Farah MJ. The role of ventromedial prefrontal cortex in decision making: judgment under uncertainty or judgment per se? Cereb Cortex N Y N 1991. 2007;17: 2669–2674. 10.1093/cercor/bhl176 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref042] 42.Clithero JA, Rangel A. Informatic parcellation of the network involved in the computation of subjective value. Soc Cogn Affect Neurosci. 2014;9: 1289–1302. 10.1093/scan/nst106 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref043] 43.Bartra O, McGuire JT, Kable JW. The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. NeuroImage. 2013;76: 412–427. 10.1016/j.neuroimage.2013.02.063 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref044] 44.Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput. 2017;27: 1413–1432. 10.1007/s11222-016-9696-4 [DOI] [Google Scholar]

[pcbi.1007615.ref045] 45.Myerson J, Green L, Warusawitharana M. Area under the curve as a measure of discounting. J Exp Anal Behav. 2001;76: 235–43. 10.1901/jeab.2001.76-235 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref046] 46.Marsman M, Wagenmakers E-J. Three Insights from a Bayesian Interpretation of the One-Sided P Value. Educ Psychol Meas. 2017;77: 529–539. 10.1177/0013164416669201 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref047] 47.Wiecki TV, Sofer I, Frank MJ. HDDM: Hierarchical Bayesian estimation of the Drift-Diffusion Model in Python. Front Neuroinformatics. 2013;7 10.3389/fninf.2013.00014 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref048] 48.Farrell S, Lewandowsky S. Computational modeling of cognition and behavior. Cambridge, UK: Cambridge University Press; 2018. [Google Scholar]

[pcbi.1007615.ref049] 49.Scheibehenne B, Pachur T. Using Bayesian hierarchical parameter estimation to assess the generalizability of cognitive models of choice. Psychon Bull Rev. 2015;22: 391–407. 10.3758/s13423-014-0684-4 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref050] 50.Lempert KM, Phelps EA. The Malleability of Intertemporal Choice. Trends Cogn Sci. 2016;20: 64–74. 10.1016/j.tics.2015.09.005 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref051] 51.Peters J, Büchel C. The neural mechanisms of inter-temporal decision-making: understanding variability. Trends Cogn Sci. 2011;15: 227–239. 10.1016/j.tics.2011.03.002 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref052] 52.Hare TA, Camerer CF, Rangel A. Self-Control in Decision-Making Involves Modulation of the vmPFC Valuation System. Science. 2009;324: 646–648. 10.1126/science.1168450 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref053] 53.Figner B, Knoch D, Johnson EJ, Krosch AR, Lisanby SH, Fehr E, et al. Lateral prefrontal cortex and self-control in intertemporal choice. Nat Neurosci. 2010;13: 538–539. 10.1038/nn.2516 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref054] 54.Rahnev D, Nee DE, Riddle J, Larson AS, D’Esposito M. Causal evidence for frontal cortex organization for perceptual decision making. Proc Natl Acad Sci U S A. 2016;113: 6059–6064. 10.1073/pnas.1522551113 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref055] 55.Heekeren HR, Marrett S, Ungerleider LG. The neural systems that mediate human perceptual decision making. Nat Rev Neurosci. 2008;9: 467–79. 10.1038/nrn2374 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref056] 56.Peters J, Büchel C. Episodic Future Thinking Reduces Reward Delay Discounting through an Enhancement of Prefrontal-Mediotemporal Interactions. Neuron. 2010;66: 138–148. 10.1016/j.neuron.2010.03.026 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref057] 57.Dixon MR, Jacobs EA, Sanders S. Contextual Control of Delay Discounting by Pathological Gamblers. Carr JE, editor. J Appl Behav Anal. 2006;39: 413–422. 10.1901/jaba.2006.173-05 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref058] 58.Lempert KM, Johnson E, Phelps EA. Emotional arousal predicts intertemporal choice. Emot Wash DC. 2016;16: 647–656. 10.1037/emo0000168 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref059] 59.Montague PR, Dolan RJ, Friston KJ, Dayan P. Computational psychiatry. Trends Cogn Sci. 2012;16: 72–80. 10.1016/j.tics.2011.11.018 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref060] 60.Green L, Myerson J, Macaux EW. Temporal Discounting When the Choice Is Between Two Delayed Rewards. J Exp Psychol Learn Mem Cogn. 2005;31: 1121–1133. 10.1037/0278-7393.31.5.1121 [DOI] [PubMed] [Google Scholar]

[pcbi.1007615.ref061] 61.Kable JW, Glimcher PW. An “as soon as possible” effect in human intertemporal decision making: behavioral evidence and neural mechanisms. J Neurophysiol. 2010;103: 2513–31. 10.1152/jn.00177.2009 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref062] 62.Green L, Myerson J. A discounting framework for choice with delayed and probabilistic rewards. Psychol Bull. 2004;130: 769–92. 10.1037/0033-2909.130.5.769 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref063] 63.Peters J, Buchel C. Overlapping and Distinct Neural Systems Code for Subjective Value during Intertemporal and Risky Decision Making. J Neurosci. 2009;29: 15727–15734. 10.1523/JNEUROSCI.3489-09.2009 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref064] 64.Hsu M, Krajbich I, Zhao C, Camerer CF. Neural Response to Reward Anticipation under Risk Is Nonlinear in Probabilities. J Neurosci. 2009;29: 2231–2237. 10.1523/JNEUROSCI.5296-08.2009 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007615.ref065] 65.Lattimore PK, Baker JR, Witte AD. The influence of probability on risky choice: a parametric examination. J Econ Behav Organ. 1992; 377–400. [Google Scholar]

[pcbi.1007615.ref066] 66.Ligneul R, Sescousse G, Barbalat G, Domenech P, Dreher JC. Shifted risk preferences in pathological gambling. Psychol Med. 2012; 1–10. [DOI] [PubMed] [Google Scholar]

67. Wabersich D, Vandekerckhove J. Extending JAGS: a tutorial on adding custom distributions to JAGS (with a diffusion model example). Behav Res Methods. 2014;46: 15–28. doi:10.3758/s13428-013-0369-3

68. Plummer M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd international workshop on distributed statistical computing. Technische Universität Wien; 2003. p. 125. Available: http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Drafts/Plummer.pdf

69. Kass RE, Raftery AE. Bayes Factors. J Am Stat Assoc. 1995;90: 773–795. doi:10.1080/01621459.1995.10476572

