eLife. 2016 Jun 1;5:e14155. doi: 10.7554/eLife.14155

DYT1 dystonia increases risk taking in humans

David Arkadir 1,*, Angela Radulescu 2,3, Deborah Raymond 4, Naomi Lubarr 4, Susan B Bressman 4, Pietro Mazzoni 5, Yael Niv 2,3,*
Editor: Rui M Costa
PMCID: PMC4951192  PMID: 27249418

Abstract

It has been difficult to link synaptic modification to overt behavioral changes. Rodent models of DYT1 dystonia, a motor disorder caused by a single gene mutation, demonstrate increased long-term potentiation and decreased long-term depression in corticostriatal synapses. Computationally, such asymmetric learning predicts risk taking in probabilistic tasks. Here we demonstrate abnormal risk taking in DYT1 dystonia patients, which is correlated with disease severity, thereby supporting a role for striatal plasticity in shaping choice behavior in humans.

DOI: http://dx.doi.org/10.7554/eLife.14155.001

Research Organism: Human

eLife digest

We learn to choose better options and avoid worse ones through trial and error, but exactly how this happens is still unclear. One idea is that we learn 'values' for options: whenever we choose an option and get more reward than originally expected (for example, if an unappetizing-looking food turns out to be very tasty), the value of that option increases. Likewise, if we get less reward than expected, the chosen option’s value decreases.

This learning process is hypothesized to work via the strengthening and weakening of connections between neurons in two parts of the brain: the cortex and the striatum. In this model, the activity of the neurons in the cortex represents the options, and the value of these options is represented by the activity of neurons in the striatum. Strengthening the connections is thought to increase the value of the stimulus, but this theory has been difficult to test.

In humans, a single genetic mutation causes a movement disorder called DYT1 dystonia, in which muscles contract involuntarily. In rodents, the same mutation causes the connections between the neurons in the cortex and the striatum to become too strong. If the theory about value learning is true, this strengthening should affect the decisions of patients that have DYT1 dystonia.

Arkadir et al. got healthy people and people with DYT1 dystonia to play a game where they had to choose between a 'sure' option and a 'risky' option. Picking the sure option guaranteed the player would receive a small amount of money, whereas the risky option gave either double this amount or nothing. The theory predicts that the double rewards should cause the patients to learn abnormally high values, which would lure them into making risky choices. Indeed, Arkadir et al. found that players with DYT1 dystonia were more likely to choose the risky option, with the people who had more severe symptoms of dystonia having a greater tendency towards taking risks.

Arkadir et al. showed that these results correspond with a model that suggests that people with DYT1 dystonia learn excessively from unexpected wins but show weakened learning after losses, causing them to over-estimate the value of risky choices. This imbalance mirrors the previous results that showed an inappropriate strengthening of the connections between neurons in rodents, and so suggests that similar changes occur in the brains of humans. Thus it appears that the changes in the strength of the connections between neurons translate into changes in behavior.

This pattern of results might also mean that the movement problems seen in people with DYT1 dystonia may be because they over-learn movements that previously led to a desired outcome and cannot sufficiently suppress movements that are no longer useful. Testing this idea will require further experiments.

DOI: http://dx.doi.org/10.7554/eLife.14155.002

Introduction

DYT1 dystonia is a rare, dominantly inherited form of dystonia, caused almost exclusively by a specific deletion of three base pairs in the TOR1A gene (Ozelius et al., 1997). Clinically, DYT1 dystonia is characterized by variable severity of sustained or intermittent muscle contractions that produce abnormal movements. DYT1 dystonia patients have normal intelligence, and post-mortem examination of their brains does not reveal obvious abnormalities or evidence of neurodegeneration (Paudel et al., 2012). Nevertheless, research in two different rodent models of DYT1 dystonia points to the existence of a fundamental deficit in synaptic plasticity. Specifically, brain slices of transgenic rodents expressing the human mutant TOR1A gene show abnormally strong long-term potentiation (LTP; Martella et al., 2009) and weak, or even absent, long-term depression (LTD; Grundmann et al., 2012; Martella et al., 2009) in corticostriatal synapses, as compared to wild-type controls.

Reinforcement learning theory (Sutton and Barto, 1998) hypothesizes that dopamine-dependent synaptic plasticity in corticostriatal networks is the neuronal substrate for learning through trial and error (Barnes et al., 2005; Barto, 1995; Schultz et al., 1997). The core assumptions of this theory are that (1) dopamine release in the striatum signals errors in the prediction of reward, with dopamine levels increasing following successful actions (to signal a positive prediction error) and decreasing when actions fail to achieve the expected outcome (a negative prediction error), (2) fluctuations in dopamine modulate downstream plasticity in recently active corticostriatal synapses such that synapses responsible for positive prediction errors are strengthened through long-term potentiation (LTP), and those that led to disappointment are weakened through long-term depression (LTD) (Reynolds et al., 2001), and (3) the efficacy of corticostriatal transmission affects voluntary action selection.
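
In the simple bandit setting studied here, these three assumptions compress into a single learning rule. As a sketch in our own notation (anticipating the Materials and methods):

\[ \delta = r - V(\text{cue}), \qquad V(\text{cue}) \leftarrow V(\text{cue}) + \eta\,\delta, \]

where r is the obtained reward, V(cue) the learned value of the chosen cue, and η a learning rate; a positive δ maps onto dopamine-gated LTP, and a negative δ onto LTD, at the corticostriatal synapses encoding the chosen option.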

Dopamine’s role as a reinforcing signal for trial-and-error learning is supported by numerous findings (Pessiglione et al., 2006; Schultz et al., 1997; Steinberg et al., 2013), including in humans, where Parkinson’s disease serves as a human model for altered dopaminergic transmission (Frank et al., 2004). However, the contribution of (dopamine modulated) corticostriatal plasticity to shaping action has remained unconfirmed in the behaving organism, as it is not clear that the behavioral effects of altered dopamine signaling in Parkinson’s disease (and other conditions in which dopamine transmission is compromised) indeed stem from the role of dopamine in modulating plasticity. Towards this end, here we test whether DYT1 dystonia, where corticostriatal plasticity is suggested to be altered despite preserved dopaminergic signaling, leads to the behavioral effects predicted by reinforcement learning with imbalanced plasticity. In particular, our predictions stem from considering the effects of intact prediction errors on an altered plasticity mechanism that amplifies the effect of positive prediction errors (i.e., responds to positive prediction errors with more LTP than would otherwise occur in controls) and mutes the effects of negative prediction errors (that is, responds with weakened LTD as compared to controls).

We compared the behavior of DYT1 dystonia patients and healthy controls on an operant-learning paradigm with probabilistic rewards (Niv et al., 2012). Participants learned from trial and error to associate four different visual cues with monetary rewards (Figure 1a), optimizing their gain by selecting one of two cues in choice trials, and choosing the single available cue in forced trials. Three visual cues were associated with payoffs of 0¢, 5¢ and 10¢, respectively, while the fourth cue was associated with an unpredictable payoff of either 0¢ or 10¢ with equal probabilities (henceforth the ‘risky 0/10¢’ cue). Based on the findings in rodents with the DYT1 mutation, we predicted that dystonia patients would learn preferentially from positive prediction errors (putatively due to abnormally strong LTP) and to a much lesser extent from negative prediction errors (due to weak LTD) (Figure 1b). As a result, they should show a stronger tendency than healthy controls to choose the risky cue.
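
The qualitative prediction of Figure 1b can be previewed with a few lines of simulation. The sketch below is ours, not the authors' simulation (which, per the Figure 1 legend, used the actual trial order and fitted group-mean parameters); the parameter values (η = 0.2, κ = 0 versus 0.4) are purely illustrative.

% Illustration of the hypothesis in Figure 1b: with asymmetric learning
% rates, the learned value of a 50/50 0/10 cent cue settles above its
% 5 cent expected value. Parameters are illustrative, not fitted.
rng(1);
eta    = 0.2;                    % base learning rate
kappas = [0, 0.4];               % symmetric control vs. 'DYT1-like' asymmetry
nTrials = 200;
V = zeros(numel(kappas), nTrials);
for k = 1:numel(kappas)
    v = 0;                                   % initial cue value
    for t = 1:nTrials
        r = 10 * (rand < 0.5);               % 0 or 10 cents, equiprobable
        delta = r - v;                       % reward prediction error
        if delta > 0
            v = v + eta * (1 + kappas(k)) * delta;   % amplified 'LTP'
        else
            v = v + eta * (1 - kappas(k)) * delta;   % muted 'LTD'
        end
        V(k, t) = v;
    end
end
plot(V'); yline(5, '--');                    % asymmetric curve settles above 5
legend('\kappa = 0', '\kappa = 0.4', 'Location', 'southeast');
xlabel('risky-cue trials'); ylabel('learned value (cents)');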

Figure 1. Behavioral task and hypothesis.

(a) In ‘choice trials’ two visual cues were simultaneously presented on a computer screen. The participant was required to make a choice within 1.5 s. The chosen option and the outcome then appeared for 1 s, followed by a variable inter-trial interval. (b) Theoretical framework. Top: trials in which the risky cue is chosen and the obtained outcome is larger than expected (trials with a 10¢ outcome) should result in strengthening of corticostriatal connections (LTP), thereby increasing the expected value of the cue and the tendency to choose it in the future. Conversely, outcomes that are smaller than expected (0¢) should cause synaptic weakening (LTD) and a resulting decrease in choice probability. Middle: In DYT1 dystonia patients (red solid), increased LTP combined with decreased LTD is expected to result in an overall higher learned value for the risky cue, as compared to controls (blue dashed). In the model, this is reflected in a higher probability of choosing the risky cue when it is presented together with the sure 5¢ cue (Bottom). Simulations (1000 runs) used the actual order of trials and the mean model parameters of each group as fit to participants’ behavior. Gray shadow in the middle plot denotes trials in the initial training phase.

DOI: http://dx.doi.org/10.7554/eLife.14155.003

Results

We tested 13 patients with DYT1 dystonia (8 women, 5 men, age 20–47, mean 28.6 years; henceforth DYT) and 13 healthy controls (CTL; 8 women, 5 men, age 19–46, mean 28.8 years), matched on an individual basis for sex and age (Mann-Whitney U test for age differences, z = −0.59, df = 24, P = 0.55), all with at least 13 years of education. Patients had no previous neurosurgical interventions for dystonia (including deep brain stimulation) and were tested before their scheduled dose of medication when possible (see Materials and methods). The number of aborted trials was similarly low in both groups (DYT 2.3 ± 2.5, CTL 1.1 ± 1.2, Mann-Whitney z = −1.61, df = 24, P = 0.11) and reaction times were well below the 1.5 s response deadline (DYT 0.78 ± 0.11 s, CTL 0.71 ± 0.10 s, Mann-Whitney z = −1.49, df = 24, P = 0.14), confirming that motor symptoms of dystonia did not interfere with the minimal motor demands of the task.

Both groups quickly learned the task, and showed similarly high probabilities of choosing the best cue in trials in which a pair of sure cues (sure 0¢ vs. sure 5¢, or sure 5¢ vs. sure 10¢) appeared together (mean probability of a correct choice: DYT 0.92 ± 0.08, CTL 0.93 ± 0.05, Mann-Whitney z = 0.08, df = 24, P = 0.94; Figure 2a), as well as in trials in which the risky cue appeared together with either the sure 0¢ or sure 10¢ cue (mean probability correct: DYT 0.84 ± 0.09, CTL 0.89 ± 0.04, Mann-Whitney z = −1.39, df = 24, P = 0.17; Figure 2b).

Figure 2. Learning curves did not differ between the groups.

Mean probabilities (± s.e.m.) of choosing the cue associated with the higher average outcome, (a) among pairs of two sure cues (15 trials per ‘block’) or (b) when the risky 0/10¢ cue was paired with a sure cue of 0¢ or 10¢ value (20 trials per ‘block’), confirmed that both groups quickly learned to choose the best cue in trials in which one cue was explicitly better than the other. These results verify that both groups understood the task instructions and could perform the task similarly well (in terms of choosing and executing their responses fast enough, etc.). Participants evidenced learning of values for deterministically-rewarded cues even in the first choice trials, despite the fact that they were never informed verbally or otherwise of the monetary outcomes associated with each of the cues, and thus could only learn these from experience. However, for cues leading to deterministic outcomes, a little experience can go a long way (Shteingart et al., 2013), and participants received 16 training trials prior to the test phase. Our data suggest that learning in this phase did not differ between the groups: in the first 5 choice trials in the test phase that involved a pair of sure cues, the probability of a correct response was 0.78 ± 0.18 in the DYT group and 0.81 ± 0.07 in the CTL group (Mann-Whitney U test, df = 24, P = 0.59). We verified that this level of performance could result from trial-and-error learning by simulating the behavior of individuals using the best-fit learning rates (see Materials and methods). The simulation confirmed that both groups should show similar rates of success on the first 5 choice trials (DYT 0.81 ± 0.17 probability of a correct choice, CTL 0.87 ± 0.13, Mann-Whitney U test z = 0.88, df = 24, P = 0.38) despite differences in learning rates from positive and negative prediction errors (see Results). Indeed, the model, which started from initial values of 0 and learned only via reinforcement learning, performed on average better than participants.

DOI: http://dx.doi.org/10.7554/eLife.14155.004

On trials in which the risky 0/10¢ cue appeared together with the equal-mean 5¢ sure cue, control participants showed risk-averse behavior, as is typically observed in such tasks (Kahneman and Tversky, 1979; Niv et al., 2012). In contrast, patients with DYT1 dystonia displayed significantly less risk aversion, choosing the risky stimulus more often than controls throughout the experiment (Figure 3a, Mann-Whitney one-sided test for each block separately, all z > 1.68, df = 24, P < 0.05; Friedman’s test for effect of group after correcting for the effect of time: χ² = 16.2, df = 1, P < 0.0001). Overall, the probability of choosing the risky cue was significantly higher among patients with dystonia than among healthy controls (Figure 3b, probability of choosing the risky cue over the sure cue: DYT 0.44 ± 0.18, CTL 0.25 ± 0.20, Mann-Whitney z = 2.33, df = 24, P < 0.05).

Figure 3. Risk taking in DYT1 dystonia patients as compared to healthy sex- and age-matched controls.

(a) Mean proportion (± s.e.m) of choosing the risky 0/10¢ cue over the sure 5¢ cue (15 trials per block) in each of the groups. DYT1 dystonia patients (red solid) were less risk-averse than controls (blue dashed). Results from several randomly-selected participants are plotted in the background to illustrate within-participant fluctuations in risk preference over the course of the experiment, presumably driven by ongoing trial-and-error learning. (b) Overall percentage of choosing the risky 0/10¢ cue throughout the experiment. Horizontal lines denote group means; grey boxes contain the 25th to 75th percentiles. DYT1 dystonia patients showed significantly more risk-taking behavior than healthy controls. (c) Proportion of choices of the risky 0/10¢ cue over the sure 5¢ cue, divided according to the outcome of the previous instance in which the risky cue was selected. Both controls and DYT1 dystonia patients chose the risky 0/10¢ cue significantly more often after a 10¢ ‘win’ than after a 0¢ ‘loss’ outcome, demonstrating the effect of previous outcomes on the current value of the risky 0/10¢ cue due to ongoing reinforcement learning. Error bars: s.e.m. The effect of recent outcomes on the propensity to choose the risky option was evident throughout the task, especially in the DYT group, and was seen after both free choice and forced trials (Figure 3—figure supplement 1), suggesting that participants continuously updated the value of the risky cue based on feedback, and used this learned value to determine their choices. (d) Risk taking was correlated with clinical severity of dystonia (Fahn-Marsden dystonia rating scale). The mean of the control group is denoted in blue for illustration purposes only. Interestingly, the regression line for DYT1 dystonia patients’ risk preference intersected the ordinate (0 severity of symptoms) close to the mean risk preference of healthy controls.

DOI: http://dx.doi.org/10.7554/eLife.14155.005

Figure 3—figure supplement 1. Learning about the risky cue continued throughout the task.

(a) Our experimental design was aimed explicitly at focusing on learning about the risky cue, so that we could analyze learning from positive and negative prediction errors decoupled from initial learning about deterministic cues. As shown in Figure 3c, participants’ tendency to choose the risky 0/10¢ cue over the same-mean 5¢ cue was dynamically adjusted according to experience: if the previous choice of the risky cue was rewarded with 10¢, participants were significantly more likely to choose the risky cue again the next time it was available, as compared to the case in which the previous choice of the risky cue resulted in 0¢. To verify that the value of the risky cue was continuously updated, we calculated the proportion of choices of the risky cue over the sure 5¢ cue after different outcomes of the previous instance in which the risky cue was selected, for different time bins throughout the task (15 risky trials in each). A three-way ANOVA (group × outcome × time-bin) revealed significant effects of group (P < 0.001) and outcome (win or loss; P < 0.001), and no effect of time-bin or interactions. Post-hoc comparisons revealed that the differences between win and loss conditions were significant in all bins for the DYT group only (all Ps < 0.05, two-tailed). The first two bins for the CTL group approached significance (P = 0.054, two-tailed). This analysis showed that DYT patients changed their behavior based on outcomes of the risky cue throughout training. Control participants, on the other hand, evidenced somewhat less learning as the task continued, with their behavior in the last quarter of training settling on a risk-averse policy that was not sensitive to local outcomes. In reinforcement learning, this could result from a gradual decrease of learning rates, which is optimal in a stationary environment. Indeed, the final risk-averse policy was predicted by our model, based on the ratio of positive and negative learning rates. In any case, these results suggest that participants learned to evaluate the risky cue based on experienced rewards, and that the locally fluctuating value of the risky cue affected choice behavior, at least in the first half of the experiment, and for the DYT group, throughout the experiment. (b) Recent work on similar reinforcement learning tasks has shown that choice trials and forced trials may exert different effects on learning (Cockburn et al., 2014). To test for this effect in our data, we examined separately the probability of choosing the risky cue over the sure cue following wins or losses, after either forced or choice trials. Our analysis revealed that choices were significantly dependent upon the previous outcome of the risky cue (P < 0.01, F = 7.45, df = 1 for the main effect of win versus loss; three-way ANOVA with factors outcome, choice and group) but not upon its context (P = 0.38, F = 0.93, df = 1 for the main effect of forced vs. choice trials). Similar to Cockburn et al. (2014), we did observe a numerically smaller effect of the outcome of forced trials (as compared to choice trials) on future choices; however, this was not significant (interaction between outcome and choice: P = 0.46, F = 0.56, df = 1). P values in the figure reflect paired t-tests.

Figure 3—figure supplement 2. Effects of ongoing learning in the simulated data.

Proportion of choices of the risky 0/10¢ cue over the sure 5¢ cue, divided according to the outcome of the previous instance in which the risky cue was selected, according to the asymmetric learning model with parameters fit to each participant’s behavior. The model captures the behavioral findings faithfully.

Figure 3—figure supplement 3. Sex of participants did not affect risk sensitivity in our task.

To avoid any possible sex-dependent bias, we matched the sex of both groups when comparing control participants to DYT1 dystonia patients. Similar to Figure 3b, plotted is the overall percentage of choices of the risky 0/10¢ cue over the sure 5¢ cue throughout the experiment, for individual participants (female, N = 8 in each group, filled dots; male, N = 5 in each group, open dots). A two-way ANOVA (CTL/DYT × male/female) did not reveal a significant main effect of sex (P = 0.08), although this analysis is obviously underpowered. The difference between CTL and DYT remained significant in this analysis (P = 0.01 for the main effect of group).

Figure 3—figure supplement 4. Medication did not affect risk-sensitivity.

To minimize the effect of medication on learning in our task, we tested patients before their scheduled dose of medication to the extent that this was possible. As in Figure 3b, plots show the overall percentage of choices of the risky 0/10¢ cue over the sure 5¢ cue throughout the experiment and its relation to medications and doses. (a) Similar risk-taking behavior (choosing the risky 0/10¢ cue over the sure 5¢ cue) among untreated patients and those taking trihexyphenidyl or baclofen, (b) lack of correlation between risk-taking behavior and the daily dose of trihexyphenidyl (Pearson’s r = 0.19, df = 11, P = 0.526) or (c) baclofen (Pearson’s r = −0.20, df = 11, P = 0.51) all suggested that medication did not contribute significantly to the observed results.

To rule out the possibility that DYT1 patients were simply making choices randomly, causing their behavior to seem indifferent to risk, we divided all 0/10¢ versus 5¢ choice trials according to the outcome of the previous trial in which the risky 0/10¢ cue was chosen. As shown in Figure 3c (see also Figure 3—figure supplement 1), both groups chose the risky 0/10¢ cue significantly more often after a 10¢ ‘win’ than after a 0¢ ‘loss’ outcome (DYT P < 0.005, CTL P < 0.05, Wilcoxon signed-rank test), attesting to intact reinforcement learning in the DYT group (see Figure 3—figure supplement 2 for a reinforcement learning simulation of the same result). If anything, DYT1 dystonia patients showed a greater difference between trials following a win and those following a loss. We next tested for a correlation between risk-taking behavior and the clinical severity of dystonia, as rated on the day of the experiment (see Materials and methods). Patients with more severe dystonia took more risks in our task (Figure 3d, Pearson’s r = 0.62, df = 11, P < 0.05). Risky behavior was not significantly affected by sex (Figure 3—figure supplement 3) or by the patients’ regular medication regimen (Figure 3—figure supplement 4), and the relationship between risk taking and symptom severity held even when controlling for these factors (P < 0.05 for symptom severity when regressing risk taking on symptom severity, age and either of the two medications; when both medications were included in the model, symptom severity lost significance, likely due to the large number of model parameters relative to such a small sample size; age and medication did not reach significance in any of the regressions).
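
For concreteness, the conditional-choice analysis behind Figure 3c can be sketched as follows. This is our reconstruction, not the authors' analysis code, and the per-trial variables (riskyChosen, outcome, isRiskVs5) are hypothetical placeholders.

% Sketch of the Figure 3c analysis: P(choose risky 0/10 cue over sure
% 5 cue), conditioned on the outcome of the previous trial on which the
% risky cue was selected. riskyChosen: logical, risky cue chosen on
% trial t; outcome: payoff obtained on trial t; isRiskVs5: logical,
% trial t offered the risky cue against the sure 5 cent cue.
lastRiskyOutcome = NaN(size(outcome));
last = NaN;
for t = 1:numel(outcome)
    lastRiskyOutcome(t) = last;              % outcome of most recent risky choice
    if riskyChosen(t), last = outcome(t); end
end
afterWin  = isRiskVs5 & (lastRiskyOutcome == 10);
afterLoss = isRiskVs5 & (lastRiskyOutcome == 0);
pRiskyAfterWin  = mean(riskyChosen(afterWin));   % expected to exceed...
pRiskyAfterLoss = mean(riskyChosen(afterLoss));  % ...this, in both groups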

To test whether increased risk taking in DYT1 dystonia could be explained by asymmetry in the effects of positive and negative prediction errors on corticostriatal plasticity, we modeled participants’ choice data using an asymmetric reinforcement-learning model (see Materials and methods) in which the learning rate (η) is modulated by (1 + κ) when learning from positive prediction errors and by (1 − κ) when the prediction error is negative (also called a 'risk-sensitive' reinforcement learning model; Mihatsch and Neuneier, 2002; Niv et al., 2012). Our model also included an inverse-temperature parameter (β) controlling the randomness of choices. This approach exploits fluctuations in each individual’s propensity for risk taking (see Figure 3a) as they update their policy based on the outcomes they experience, to recover the learning rate η and learning asymmetry κ that best fit each participant’s observed behavior.
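
The direction of the predicted effect follows analytically. At the stochastic fixed point of this update rule for the risky cue (a back-of-the-envelope derivation under our reading of the model, not taken from the paper), the expected update is zero:

\[ \tfrac{1}{2}\,\eta(1+\kappa)(10 - V^{*}) + \tfrac{1}{2}\,\eta(1-\kappa)(0 - V^{*}) = 0 \;\;\Longrightarrow\;\; V^{*} = 5(1+\kappa), \]

so the risky cue's learned value settles above the equal-mean sure 5¢ cue when κ > 0 and below it when κ < 0, producing risk seeking and risk aversion, respectively. With the group-mean fits reported below (κ ≈ −0.05 for DYT, κ ≈ −0.34 for CTL), this gives V* ≈ 4.8¢ versus ≈ 3.3¢: near risk-neutrality for patients and clear risk aversion for controls.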

First, we tested whether the asymmetric-learning model is justified, that is, whether it explains participants’ data significantly better than the classical reinforcement-learning model with only learning-rate and inverse-temperature parameters. The results showed that the more complex model was justified for the majority of participants (16 out of 26 participants; DYT 6, CTL 10), and in particular for participants who were risk-averse or risk-taking (but not risk-neutral; Figure 4a).

Figure 4. Model comparison supports the asymmetric learning model.

We compared three alternative models in terms of how well they fit the experimental data. (a) To compare the asymmetric learning model with the classical (symmetric) reinforcement learning (RL) model, we used the likelihood ratio test, which is valid for nested models. Plotted are the log likelihood differences between the asymmetric learning model and the classical RL model. Black line: the minimal difference above which there is a P < 0.05 chance that the additional parameter improves the behavioral fit, as tested via a likelihood ratio test for nested models (dots above this line support the asymmetric learning model). For the majority of participants (16 out of 26; DYT 6, CTL 10) the more complex asymmetric model was justified (chi square test with df = 1, P < 0.05). In particular, and as expected based on Niv et al. (2012), the asymmetric learning model was justified for participants who were either risk-averse or risk-taking, but not those who were risk-neutral. (b) The asymmetric learning model and the nonlinear utility model have the same number of free parameters and therefore could be compared directly using the likelihood of the data under each model. Plotted is the average probability per trial for the asymmetric learning model as compared to the nonlinear utility model, for healthy controls (blue dots) and patients with dystonia (red). Dots above the black line support the asymmetric learning model. The asymmetric learning model fit the majority of participants better than the utility model (15 out of 26; DYT 6, CTL 9) with large differences in likelihoods always in favor of the asymmetric model, and over the entire population the asymmetric learning model performed significantly better (paired one-tailed t-test on the difference in model likelihoods, t = 1.92, df = 25, P < 0.05).

DOI: http://dx.doi.org/10.7554/eLife.14155.010

We then compared the individually fit parameters of the asymmetric model across the two groups. We found significant differences between the groups in the learning asymmetry parameter (DYT −0.05 ± 0.27, CTL −0.34 ± 0.27, Mann-Whitney z = −2.51, df = 24, P < 0.05), but no differences in the other two parameters (learning rate DYT 0.25 ± 0.19, CTL 0.14 ± 0.11, Mann-Whitney z = 1.33, df = 24, P = 0.18; inverse temperature DYT 0.68 ± 0.37, CTL 0.93 ± 0.47, Mann-Whitney z = −1.18, df = 24, P = 0.23). Thus, patients’ behavior was consistent with enhanced learning from positive prediction errors and reduced learning from negative prediction errors as compared to healthy controls, despite the overall rate of learning and the degree of noise in choices (modeled by the inverse-temperature parameter) being similar across groups. A significant correlation was also observed between the learning asymmetry parameter and the severity of dystonia (Pearson’s r = 0.64, df = 11, P < 0.05).

One alternative explanation for our results is that the nonlinearity of subjective utility functions (Kahneman and Tversky, 1979) for small amounts of money differs between DYT1 dystonia patients and controls. However, replicating previous results from a healthy cohort (Niv et al., 2012), formal model comparison suggested that choice behavior in our task is significantly better explained by the asymmetric-learning model above (Figure 4b). Moreover, the impetus for our experiment was an a priori hypothesis regarding risk sensitivity as a consequence of asymmetric learning, based on findings from the mouse model of DYT1 dystonia, which has no straightforward equivalent interpretation in terms of nonlinear utilities. We note also that strongly nonlinear utilities in the domain of small payoffs such as those we used here are generally unlikely (Rabin and Thaler, 2001), again suggesting that risk sensitivity in our experiment is more likely to arise from asymmetric learning. Another alternative explanation for behavior in our task is a win-stay/lose-shift strategy that is perhaps utilized to a different extent by DYT1 patients and controls. However, this model, equivalent to the classical reinforcement-learning model with a learning rate of 1 and only an inverse-temperature parameter, fit the data of 25 out of 26 participants considerably worse than the asymmetric learning model, and therefore was not investigated further.

Discussion

We demonstrated that DYT1 dystonia patients and healthy controls have different profiles of risk sensitivity in a trial-and-error learning task. Our results support the dominant model of reinforcement learning in the basal ganglia, according to which prediction-error modulated LTP and LTD in corticostriatal synapses are responsible for changing the propensity to repeat actions that previously led to positive or negative prediction errors, respectively. Similar to Parkinson’s disease, at first considered a motor disorder but now recognized to also cause cognitive and learning abnormalities, it appears that DYT1 dystonia is not limited to motor symptoms (Fiorio et al., 2007; Heiman et al., 2004; Molloy et al., 2003; Stamelou et al., 2012), and specifically, that the suspected altered balance between LTP and LTD in this disorder has overt, readily measurable effects on behavior.

DYT1 dystonia and Parkinson's disease can be viewed as complementary models for understanding the mechanisms of reinforcement learning in the human brain. In unmedicated Parkinson’s disease patients, learning from positive prediction errors is impaired due to reduced levels of striatal dopamine that presumably signal the prediction errors themselves, whereas learning from negative prediction errors is intact (Frank et al., 2004; Rutledge et al., 2009). This impairment, and the resulting asymmetry that favors learning from negative prediction errors, can be alleviated using dopaminergic medication (Frank et al., 2004; Shohamy et al., 2004). DYT1 dystonia patients, on the other hand, seem to have intact striatal dopamine signaling (Balcioglu et al., 2007; Dang et al., 2006; Grundmann et al., 2007; Zhao et al., 2008), but altered corticostriatal LTP/LTD that favors learning from positive prediction errors.

Our a priori predictions were based on a simplified model of the role of corticostriatal LTP and LTD in reinforcement learning, and the entire picture is undoubtedly more complex. Controversies regarding the functional relationship between the direct and indirect pathways of the basal ganglia (Calabresi et al., 2014; Cui et al., 2013; Kravitz et al., 2012) and the large number of players taking part in shaping synaptic plasticity (Calabresi et al., 2014; Shen et al., 2008) make it hard to pin down the precise mechanism behind reinforcement learning. Indeed, the DYT1 mouse model has also been linked to impaired plasticity in the indirect pathway due to D2 receptor dysfunction (Beeler et al., 2012; Napolitano et al., 2010; Wiecki et al., 2009), which can lead to abnormal reinforcement (Kravitz et al., 2012).

In any case, our findings are compatible with the prominent 'Go'/'NoGo' model of learning and action selection in the basal ganglia (Frank et al., 2004), which incorporates opposing directions of plasticity in the direct and indirect pathways (Collins and Frank, 2014). In particular, current evidence suggests that corticostriatal LTP following positive prediction errors and LTD following negative prediction errors occur in D1 striatal neurons (direct pathway), whereas plasticity in D2-expressing neurons (indirect pathway) is in the opposite direction (Kravitz et al., 2012; Shen et al., 2008). As the direct pathway supports choice (‘Go’) while the indirect pathway supports avoidance (‘NoGo’), under this implementation of reinforcement learning both types of learning eventually lead to the same behavioral outcome: a positive prediction error increases the probability that the action/choice that led to the prediction error will be repeated in the future, and vice versa for negative prediction errors. As such, at the algorithmic level at which our asymmetric learning model was cast, the differences we have shown between dystonia patients and controls would still be expected to manifest behaviorally as diminished risk aversion in dystonia patients.

In particular, our results are compatible with several alternative abnormalities in corticostriatal plasticity in DYT1 dystonia: (a) Abnormally strong LTP/weak LTD in D1-expressing striatal neurons only, with plasticity in the indirect pathway being intact; in this case, learning in the direct pathway would exhibit the abnormal asymmetries we argue for, whereas the indirect pathway would learn as normal. (b) Abnormally strong LTP/weak LTD in D1-expressing striatal neurons and the opposite pattern, abnormally strong LTD and/or weak LTP in D2-expressing striatal neurons of the indirect pathway in DYT1 dystonia. As a result, a positive prediction error would generate extra strong positive learning in the Go pathway, and a similarly large decrease in the propensity to avoid this stimulus due to activity in the 'NoGo' pathway. Conversely, learning from negative prediction errors would generate relatively little decrease in the propensity to 'Go' to the stimulus and little increase in the propensity to 'NoGo'. In both cases, the effect on both pathways would be in the same direction as is seen in the behavioral asymmetry. (c) Finally, abnormalities may exist in both pathways in the same direction (stronger LTP and weaker LTD), but with a larger effect on LTP as compared to LTD. In this case, a positive prediction error would increase 'Go' activity considerably, but not decrease 'NoGo' activity to the same extent. Negative prediction errors, on the other hand, would increase 'NoGo' propensities while decreasing 'Go' propensities to a lesser extent. This type of asymmetry can explain why the rodent studies suggested almost absent (not only weaker) LTD, but nevertheless, patients did not behave as if they did not learn at all from negative prediction errors. Unfortunately, our model and behavioral results cannot differentiate between these three options. We hope that future data, especially from transgenic DYT1 rodents, will clarify this issue.

The relative weighting of positive and negative outcomes shapes risk sensitivity in tasks that involve learning from experience. Humans with preserved basal ganglia function have been shown to be risk-averse in such tasks. We showed that patients with DYT1 dystonia are more risk-neutral, a pattern of behavior that is rational given our reward statistics, and in such tasks in general. While this type of behavior may offer advantages under certain conditions, it may also contribute to impaired reinforcement learning of the motor repertoire and to fixation on actions that were once rewarded. In any case, these reinforcement-learning manifestations of what has been considered a predominantly motor disease provide support for linking corticostriatal synaptic plasticity and overt trial-and-error learning behavior in humans.

Materials and methods

Subjects

Fourteen participants with genetically-proven (c.907_909delGAG) (Ozelius et al., 1997) symptomatic DYT1 dystonia were recruited through the movement-disorders clinics at Columbia University and Beth Israel Medical Centers in New York and through a notice on the website of the Dystonia Medical Research Foundation. Exclusion criteria were age younger than 18 or older than 50 years, and deep brain stimulation or other prior brain surgery for dystonia. A single patient was excluded from further analysis for choosing the left cue in 100% of trials. Thirteen age- and sex-matched healthy participants were recruited among acquaintances of the DYT1 patients and from the Princeton University community. Healthy control participants were not blood relatives of patients with dystonia and did not have clinical dystonia. All patients and healthy controls had at least 13 years of education.

Nine DYT1 dystonia patients took baclofen (n = 6, daily dose 66.7 ± 28.0 mg, range 30–100 mg) and/or trihexyphenidyl (n = 7, daily dose 30.9 ± 25.8 mg, range 12–80 mg) for their motor symptoms. To reduce possible effects of medication, patients were tested before taking their scheduled dose. The median interval between the last dose of medication and testing was 7.5 hr for baclofen (range 1–20 hr) and 13 hr for trihexyphenidyl (range 1–15 hr). Given that the reported plasma half-life of baclofen is 6.8 hr (Wuis et al., 1989) and that of trihexyphenidyl is 3.7 hr (Burke and Fahn, 1985), three patients were tested within one plasma half-life of their last dose of medication. Finally, we found no relation between the sex of participants (Figure 3—figure supplement 3) or medication doses (Figure 3—figure supplement 4) and the relevant behavioral outcomes.

Procedure

All participants gave informed consent and the study was approved by the Institutional Review Boards of Columbia University, Beth Israel Medical Center, and Princeton University. Clinical severity of dystonia was scored immediately after consenting by a movement-disorders specialist (DA), using the Fahn-Marsden dystonia rating scale (Burke et al., 1985). This scale integrates the number of involved body parts, the range of actions that induce dystonia, and the severity of the observed dystonia. One patient was scored 0 because dystonia was not clinically observed on the day of her testing.

Before and after completing the reported task, all participants performed a short (8–9 min), unrelated auditory discrimination task (Baron, 1973; results not reported here) that was not associated with any monetary reward. Participants were informed that the two tasks were not related.

Behavioral task

Four different pseudo-letters served as cues (‘slot machines’) and were randomly allocated to four payoff schedules: sure 0¢, sure 5¢, sure 10¢, and one variable-payoff ‘risky’ stimulus associated with equal probabilities of 0¢ or 10¢ payoffs. Participants were not informed about the payoffs associated with the different cues and had to learn them from trial and error.

Two types of trials were pseudo-randomly intermixed. In ‘choice trials’, two cues were displayed (left and right locations randomized), and the participant was instructed to select one of the two cues by pressing either a left or a right button on a keyboard. The cue that was not selected then disappeared and the payoff associated with the chosen cue was displayed for 1 s. After a variable (uniformly distributed) inter-trial interval of 1–2 s, the next trial began. In ‘forced trials’, only one cue was displayed, on either the left or right side of the screen, and the participant had to indicate its location using the keyboard to obtain its associated outcome. All button presses were timed out after 1.5 s, at which point the trial was aborted with a message indicating that the response was 'too slow', and the inter-trial interval commenced. Participants were instructed to try to maximize their winnings and were paid according to their actual payoffs in the task. On-screen instructions for the task also informed participants that payoffs depended only on the ‘slot machine’ chosen, not on its location or on their history of choices. Throughout the experiment, to minimize motor precision requirements, any of the keys E, W, A, Z, X, D and S (on the left side of the keyboard) served as allowable response buttons for choosing the left cue, and any of the keys I, O, L, <, M, J and K (on the right side of the keyboard) served as allowable response buttons for choosing the right cue. Each set of response keys was marked with stickers of a different color (blue for left keys and red for right keys) to aid their visual identification.

Participants were first familiarized with the task and provided with several observations of the cue–reward mapping in a training phase that included two subparts. The first part involved 16 pseudo-randomly ordered forced trials (four per cue). The second part comprised 10 pseudo-randomly ordered choice trials (two of each of five types of choice trials: 0¢ versus 5¢, 5¢ versus 10¢, 0¢ versus 0/10¢, 5¢ versus 0/10¢ and 10¢ versus 0/10¢).

Before the experimental task began, on-screen instructions informed subjects that they would encounter the same cues as in the training phase. They were briefly reminded of the rules and encouraged to choose the ‘slot machines’ that yielded the highest payoffs, as they would be paid their earnings in this part. The task then consisted of 300 trials (two blocks of 150 trials each, with short breaks after every 100 trials), with choice and forced trials randomly intermixed. Each block comprised 30 'risk' choice trials involving a choice between the 5¢ cue and the 0/10¢ cue, 20 choice trials involving each of the pairs 0¢ versus 0/10¢ and 10¢ versus 0/10¢, 15 choice trials involving each of the pairs 0¢ versus 5¢ and 5¢ versus 10¢, 14 forced trials involving the 0/10¢ cue, and 12 forced trials involving each of the 0¢, 5¢ and 10¢ cues. Trial order was pseudo-randomized in advance and was similar across participants and blocks. Payoffs for the 0/10¢ cue were counterbalanced such that every group of eight consecutive choices of the risky cue included exactly four 0¢ payoffs and four 10¢ payoffs. All task events were controlled using MATLAB (MathWorks, Natick, MA) with Psychtoolbox (Brainard, 1997).
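
As an illustration of the counterbalancing scheme for the risky cue, one plausible pre-generation routine is sketched below (our reading of the description, not the authors' code).

% Pre-generate risky-cue payoffs so that every consecutive group of
% eight risky choices contains exactly four 0-cent and four 10-cent
% outcomes, randomly ordered within each group.
nGroups = 20;                                % more than enough for the session
payoffs = zeros(1, 8 * nGroups);
for g = 1:nGroups
    grp = [zeros(1, 4), 10 * ones(1, 4)];
    payoffs((g-1)*8 + (1:8)) = grp(randperm(8));
end
% payoffs(k) is delivered on the k-th trial in which the participant
% actually selects the risky cue.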

Our modeling and quantification of the effects of abnormal learning from prediction errors rest solely on the risky cue, for which learning presumably continued throughout the experiment. However, one potential worry is that participants did not use trial-and-error learning to evaluate this cue, but rather ‘guessed’ its value using a cognitive system (as in rule-based learning). To evaluate this possibility, we tested for a difference in the propensity to choose the risky cue after a previous win or a loss, throughout the task (see Figure 3c and Figure 3—figure supplement 1).

Modeling

To test the hypothesis that increased risk-taking in DYT1 dystonia was due to an enhanced effect of positive prediction errors and a weak effect of negative prediction errors, we modeled participants’ choice data using an asymmetric reinforcement learning model (also called a risk-sensitive temporal difference reinforcement learning (RSTD) model) (Mihatsch and Neuneier, 2002; Niv et al., 2012). The learning rule in this model is

\[ V_{\text{new}}(\text{cue}) = V_{\text{old}}(\text{cue}) + \eta\,\delta\,(1 \pm \kappa) \]

where V(cue) is the value of the chosen cue, δ = r − V_old(cue) is the prediction error, that is, the difference between the obtained reward r and the predicted reward V_old(cue), η is a learning-rate parameter, and κ is an asymmetry parameter that is applied as (1 + κ) if the prediction error is positive (δ > 0) and as (1 − κ) if the prediction error is negative (δ < 0). This model is fully equivalent to a model with two learning-rate parameters, one for learning when prediction errors are positive and another for learning when prediction errors are negative. Following common practice, we also assumed a softmax (or sigmoid) action selection function:

\[ p(A) = \frac{e^{\beta V(A)}}{e^{\beta V(A)} + e^{\beta V(B)}} \]

where p(A) is the probability of choosing cue A over cue B, and β is an inverse-temperature parameter controlling the randomness of choices (Niv et al., 2012).
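
Putting the two equations together, a minimal implementation of the asymmetric learner and its trial-by-trial likelihood might look as follows. This is a sketch under our reading of the model, not the authors' code; the input format (cue indices 1–4, with a NaN second cue marking forced trials) is our assumption.

function negLL = asymmetricRL(params, cues, choices, rewards)
% Negative log likelihood of one participant's choices under the
% asymmetric (risk-sensitive) RL model. params = [eta, kappa, beta].
% cues:    nTrials x 2 cue indices (1-4); cues(t,2) = NaN on forced trials.
% choices: index of the cue chosen (or forced) on each trial.
% rewards: payoff obtained on each trial (0, 5 or 10 cents).
eta = params(1); kappa = params(2); beta = params(3);
V = zeros(1, 4);                             % cue values initialized to 0
negLL = 0;
for t = 1:numel(choices)
    c = choices(t);
    if ~isnan(cues(t, 2))                    % choice trial: softmax over the pair
        b = beta * V(cues(t, :));
        b = b - max(b);                      % subtract max for numerical stability
        p = exp(b) ./ sum(exp(b));
        negLL = negLL - log(p(cues(t, :) == c));
    end                                      % forced trials contribute learning only
    delta = rewards(t) - V(c);               % prediction error
    if delta > 0
        V(c) = V(c) + eta * (1 + kappa) * delta;   % amplified positive learning
    else
        V(c) = V(c) + eta * (1 - kappa) * delta;   % muted negative learning
    end
end
end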

We fit the free parameters of the model (η, κ, and β) to the behavioral data of individual participants, using data from both training and test trials (a total of 326 trials), as participants learned to associate cues with their outcomes from the first training trial. Cue values were initialized to 0. We optimized model parameters by minimizing the negative log likelihood of the data given different settings of the model parameters, using the MATLAB function fminunc. The explored ranges of the model parameters were [0, 1] for the learning-rate parameter, [−10, 10] for the learning-asymmetry parameter, and [0, 30] for the inverse-temperature parameter. To avoid local minima, for each participant we repeated the optimization 5 times from randomly chosen starting points, keeping the best (maximum likelihood) result. This method is commonly used for temporal difference learning models and is known to be well-behaved (Niv et al., 2012).
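
A sketch of this fitting loop (again ours, not the authors'; note that fminunc is unconstrained, so we read the reported parameter ranges as governing the random starting points):

% Fit one participant: minimize the negative log likelihood from five
% random starting points and keep the best solution.
obj  = @(p) asymmetricRL(p, cues, choices, rewards);
opts = optimoptions('fminunc', 'Display', 'off');
best = Inf;
for rep = 1:5
    p0 = [rand, 20*rand - 10, 30*rand];      % eta~[0,1], kappa~[-10,10], beta~[0,30]
    [p, nll] = fminunc(obj, p0, opts);
    if nll < best, best = nll; bestParams = p; end
end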

Previous work has shown that the asymmetric learning model best explains participants’ behavior in our task (Niv et al., 2012). To replicate those results in our sample population, we compared the asymmetric learning model to three alternative models. The first was a classical reinforcement learning model with no learning asymmetry, V_new(cue) = V_old(cue) + η[R − V_old(cue)]. The second alternative model was based on the classical nonlinear (diminishing) subjective utility of monetary rewards. The idea is that the 10¢ reward may not be subjectively equal to twice the 5¢ reward, therefore engendering risk-sensitive choices in our task. We thus defined learning in a nonlinear utility model as V_new(cue) = V_old(cue) + η[U(R) − V_old(cue)], where U(R) is the subjective utility of reward R. Without loss of generality, we parameterized the utility function over the three possible outcomes (0¢, 5¢ or 10¢) by setting U(0) = 0, U(5) = 5 and U(10) = a×10, where the parameter a could be larger than, equal to, or smaller than 1, and was fit to the data of each participant separately. If losses loom larger than equivalent gains, a should be smaller than 1. Finally, we tested a win-stay/lose-shift strategy model that is equivalent to the classical reinforcement learning model with a learning rate of 1. All models used the softmax choice function with an inverse-temperature parameter β. The parameters of each of the models were fit to each participant’s data as was done for the asymmetric learning model.
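
The nested comparison against the classical model (Figure 4a) then amounts to a likelihood ratio test: the classical model is the asymmetric model with κ fixed at 0, so twice the log-likelihood difference is referred to a chi-square distribution with one degree of freedom. A sketch, where bestClassic is assumed to hold the negative log likelihood from an analogous fit of the κ = 0 model:

% Likelihood ratio test for nested models (asymmetric vs. classical RL).
LLasym    = -best;                           % from the asymmetric fit above
LLclassic = -bestClassic;                    % from the kappa = 0 (classical) fit
lrStat = 2 * (LLasym - LLclassic);           % asymptotically chi-square, df = 1
pValue = 1 - chi2cdf(lrStat, 1);             % asymmetry justified if pValue < 0.05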

Statistical analysis

Because the relevant sets of data were not normally distributed (tested using a Kolmogorov-Smirnov test, P < 0.05), we analyzed the data using the nonparametric Mann-Whitney U test to compare two populations, Wilcoxon signed-rank test for repeated measures tests, and Friedman’s test for non-parametric one-way repeated measures analysis of variance by ranks. All statistical tests were two-sided unless otherwise specified.
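
In MATLAB's Statistics Toolbox these tests correspond to the following calls (a sketch; the data vectors are hypothetical placeholders, and the normality check on standardized data is our assumption about how the Kolmogorov-Smirnov test was applied):

% riskDYT/riskCTL: each participant's overall risky-choice proportion;
% pAfterWin/pAfterLoss: paired conditional proportions per participant;
% riskByBlock: subjects-by-blocks matrix of risky-choice proportions.
[h, pNorm] = kstest(zscore(riskDYT));        % normality check on standardized data
pGroup = ranksum(riskDYT, riskCTL);          % Mann-Whitney U test (two groups)
pPair  = signrank(pAfterWin, pAfterLoss);    % Wilcoxon signed-rank (paired)
pTime  = friedman(riskByBlock, 1, 'off');    % Friedman's test by ranks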

Acknowledgements

This research was supported in part by the Parkinson’s Disease Foundation (DA), the NIH Office of Rare Diseases Research through the Dystonia Coalition (DA) and the National Institute for Psychobiology in Israel (DA), a Sloan Research Fellowship to YN, NIH grant R01MH098861 (AR and YN) and Army Research Office grant W911NF-14-1-0101 (YN & AR). We are grateful to Hagai Bergman, Reka Daniel, Nathaniel Daw, Stanley Fahn, Ann Graybiel, Elliot Ludvig, Rony Paz, Daphna Shohamy, Nicholas Turk-Browne and Jeff Wickens for very helpful comments on previous versions of the manuscript.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Funding Information

This paper was supported by the following grants:

  • Parkinson's Disease Foundation to David Arkadir.

  • National Institutes of Health NIH Office of Rare Diseases Research through the Dystonia Coalition to David Arkadir.

  • National Institute for Psychobiology in Israel, Hebrew University of Jerusalem to David Arkadir.

  • Alfred P. Sloan Foundation Sloan Research Fellowship to Yael Niv.

  • National Institute of Mental Health R01MH098861 to Angela Radulescu, Yael Niv.

  • Army Research Office W911NF-14-1-0101 to Angela Radulescu, Yael Niv.

Additional information

Competing interests

The authors declare that no competing interests exist.

Author contributions

DA, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

AR, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

DR, Facilitated access to patient population, Assistance with patient recruitment, Drafting or revising the article.

NL, Facilitated access to patient population, Assistance with patient recruitment, Drafting or revising the article.

SBB, Facilitated access to patient population, Assistance with patient recruitment, Drafting or revising the article.

PM, Analysis and interpretation of data, Drafting or revising the article.

YN, Conception and design, Analysis and interpretation of data, Drafting or revising the article.

Ethics

Human subjects: All participants gave informed consent and the study was approved by the Institutional Review Boards of Columbia University, Beth Israel Medical Center, and Princeton University.

References

  1. Balcioglu A, Kim MO, Sharma N, Cha JH, Breakefield XO, Standaert DG. Dopamine release is impaired in a mouse model of DYT1 dystonia. Journal of Neurochemistry. 2007;102:783–788. doi: 10.1111/j.1471-4159.2007.04590.x. [DOI] [PubMed] [Google Scholar]
  2. Barnes TD, Kubota Y, Hu D, Jin DZ, Graybiel AM. Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature. 2005;437:1158–1161. doi: 10.1038/nature04053. [DOI] [PubMed] [Google Scholar]
  3. Baron A. Postdiscrimination gradients of human subjects on a tone continuum. Journal of Experimental Psychology. 1973;101:337–342. doi: 10.1037/h0035206. [DOI] [PubMed] [Google Scholar]
  4. Barto AG. Adaptive Critics and the Basal Ganglia. In: Houk J. C, Davis J. L, Beiser D. G, editors. Models of Information Processing in the Basal Ganglia. Cambridge, MA: MIT Press; 1995. pp. 215–232. [Google Scholar]
  5. Beeler JA, Frank MJ, McDaid J, Alexander E, Turkson S, Bernardez Sarria MS, Bernandez MS, McGehee DS, Zhuang X. A role for dopamine-mediated learning in the pathophysiology and treatment of Parkinson's disease. Cell Reports. 2012;2:1747–1761. doi: 10.1016/j.celrep.2012.11.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Brainard DH. The psychophysics toolbox. Spatial Vision. 1997;10:433–436. doi: 10.1163/156856897X00357. [DOI] [PubMed] [Google Scholar]
  7. Burke RE, Fahn S. Pharmacokinetics of trihexyphenidyl after short-term and long-term administration to dystonic patients. Annals of Neurology. 1985;18:35–40. doi: 10.1002/ana.410180107. [DOI] [PubMed] [Google Scholar]
  8. Burke RE, Fahn S, Marsden CD, Bressman SB, Moskowitz C, Friedman J. Validity and reliability of a rating scale for the primary torsion dystonias. Neurology. 1985;35:73–77. doi: 10.1212/WNL.35.1.73. [DOI] [PubMed] [Google Scholar]
  9. Calabresi P, Picconi B, Tozzi A, Ghiglieri V, Di Filippo M. Direct and indirect pathways of basal ganglia: a critical reappraisal. Nature Neuroscience. 2014;17:1022–1030. doi: 10.1038/nn.3743. [DOI] [PubMed] [Google Scholar]
  10. Cockburn J, Collins AG, Frank MJ. A reinforcement learning mechanism responsible for the valuation of free choice. Neuron. 2014;83:551–557. doi: 10.1016/j.neuron.2014.06.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Collins AG, Frank MJ. Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychological Review. 2014;121:337–366. doi: 10.1037/a0037015. [DOI] [PubMed] [Google Scholar]
  12. Cui G, Jun SB, Jin X, Pham MD, Vogel SS, Lovinger DM, Costa RM. Concurrent activation of striatal direct and indirect pathways during action initiation. Nature. 2013;494:238–242. doi: 10.1038/nature11846. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Dang MT, Yokoi F, Pence MA, Li Y. Motor deficits and hyperactivity in Dyt1 knockdown mice. Neuroscience Research. 2006;56:470–474. doi: 10.1016/j.neures.2006.09.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Fiorio M, Gambarin M, Valente EM, Liberini P, Loi M, Cossu G, Moretto G, Bhatia KP, Defazio G, Aglioti SM, Fiaschi A, Tinazzi M. Defective temporal processing of sensory stimuli in DYT1 mutation carriers: a new endophenotype of dystonia? Brain. 2007;130:134–142. doi: 10.1093/brain/awl283. [DOI] [PubMed] [Google Scholar]
  15. Frank MJ, Seeberger LC, O'reilly RC. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science. 2004;306:1940–1943. doi: 10.1126/science.1102941. [DOI] [PubMed] [Google Scholar]
  16. Grundmann K, Reischmann B, Vanhoutte G, Hübener J, Teismann P, Hauser TK, Bonin M, Wilbertz J, Horn S, Nguyen HP, Kuhn M, Chanarat S, Wolburg H, Van der Linden A, Riess O. Overexpression of human wildtype torsinA and human DeltaGAG torsinA in a transgenic mouse model causes phenotypic abnormalities. Neurobiology of Disease. 2007;27:190–206. doi: 10.1016/j.nbd.2007.04.015. [DOI] [PubMed] [Google Scholar]
  17. Grundmann K, Glöckle N, Martella G, Sciamanna G, Hauser TK, Yu L, Castaneda S, Pichler B, Fehrenbacher B, Schaller M, Nuscher B, Haass C, Hettich J, Yue Z, Nguyen HP, Pisani A, Riess O, Ott T. Generation of a novel rodent model for DYT1 dystonia. Neurobiology of Disease. 2012;47:61–74. doi: 10.1016/j.nbd.2012.03.024. [DOI] [PubMed] [Google Scholar]
  18. Heiman GA, Ottman R, Saunders-Pullman RJ, Ozelius LJ, Risch NJ, Bressman SB. Increased risk for recurrent major depression in DYT1 dystonia mutation carriers. Neurology. 2004;63:631–637. doi: 10.1212/01.WNL.0000137113.39225.FA. [DOI] [PubMed] [Google Scholar]
  19. Kahneman D, Tversky A. Prospect Theory: An Analysis of Decision under Risk. Econometrica. 1979;47:263. doi: 10.2307/1914185. [DOI] [Google Scholar]
  20. Kravitz AV, Tye LD, Kreitzer AC. Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nature Neuroscience. 2012;15:816–818. doi: 10.1038/nn.3100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Martella G, Tassone A, Sciamanna G, Platania P, Cuomo D, Viscomi MT, Bonsi P, Cacci E, Biagioni S, Usiello A, Bernardi G, Sharma N, Standaert DG, Pisani A. Impairment of bidirectional synaptic plasticity in the striatum of a mouse model of DYT1 dystonia: role of endogenous acetylcholine. Brain. 2009;132:2336–2349. doi: 10.1093/brain/awp194. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Mihatsch O, Neuneier R. Risk-sensitive reinforcement learning. Machine Learning. 2002;49:267–290. doi: 10.1023/A:1017940631555. [DOI] [Google Scholar]
  23. Molloy FM, Carr TD, Zeuner KE, Dambrosia JM, Hallett M. Abnormalities of spatial discrimination in focal and generalized dystonia. Brain. 2003;126:2175–2182. doi: 10.1093/brain/awg219. [DOI] [PubMed] [Google Scholar]
  24. Napolitano F, Pasqualetti M, Usiello A, Santini E, Pacini G, Sciamanna G, Errico F, Tassone A, Di Dato, Martella G, Cuomo D, Fisone G, Bernardi G, Mandolesi G, Mercuri NB, Standaert DG, Pisani A. Dopamine D2 receptor dysfunction is rescued by adenosine A2A receptor antagonism in a model of DYT1 dystonia. Neurobiology of Disease. 2010;38:434–445. doi: 10.1016/j.nbd.2010.03.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Niv Y, Edlund JA, Dayan P, O'Doherty JP. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.  Journal of Neuroscience. 2012;32:551–562. doi: 10.1523/JNEUROSCI.5498-10.2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Ozelius LJ, Hewett JW, Page CE, Bressman SB, Kramer PL, Shalish C, de Leon D, Brin MF, Raymond D, Corey DP, Fahn S, Risch NJ, Buckler AJ, Gusella JF, Breakefield XO. The early-onset torsion dystonia gene (DYT1) encodes an ATP-binding protein. Nature Genetics. 1997;17:40–48. doi: 10.1038/ng0997-40. [DOI] [PubMed] [Google Scholar]
  27. Paudel R, Hardy J, Revesz T, Holton JL, Houlden H. Review: genetics and neuropathology of primary pure dystonia. Neuropathology and Applied Neurobiology. 2012;38:520–534. doi: 10.1111/j.1365-2990.2012.01298.x. [DOI] [PubMed] [Google Scholar]
  28. Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature. 2006;442:1042–1045. doi: 10.1038/nature05051. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Rabin M, Thaler RH. Anomalies: Risk Aversion. Journal of Economic Perspectives. 2001;15:219–232. doi: 10.1257/jep.15.1.219. [DOI] [Google Scholar]
  30. Reynolds JN, Hyland BI, Wickens JR. A cellular mechanism of reward-related learning. Nature. 2001;413:67–70. doi: 10.1038/35092560. [DOI] [PubMed] [Google Scholar]
  31. Rutledge RB, Lazzaro SC, Lau B, Myers CE, Gluck MA, Glimcher PW. Dopaminergic drugs modulate learning rates and perseveration in Parkinson's patients in a dynamic foraging task. Journal of Neuroscience. 2009;29:15104–15114. doi: 10.1523/JNEUROSCI.3524-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593. [DOI] [PubMed] [Google Scholar]
  33. Shen W, Flajolet M, Greengard P, Surmeier DJ. Dichotomous dopaminergic control of striatal synaptic plasticity. Science. 2008;321:848–851. doi: 10.1126/science.1160575. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Shohamy D, Myers CE, Grossman S, Sage J, Gluck MA, Poldrack RA. Cortico-striatal contributions to feedback-based learning: converging data from neuroimaging and neuropsychology. Brain. 2004;127:851–859. doi: 10.1093/brain/awh100. [DOI] [PubMed] [Google Scholar]
  35. Shteingart H, Neiman T, Loewenstein Y. The role of first impression in operant learning. Journal of Experimental Psychology. 2013;142:476–488. doi: 10.1037/a0029550. [DOI] [PubMed] [Google Scholar]
  36. Stamelou M, Edwards MJ, Hallett M, Bhatia KP. The non-motor syndrome of primary dystonia: clinical and pathophysiological implications. Brain. 2012;135:1668–1681. doi: 10.1093/brain/awr224. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Steinberg EE, Keiflin R, Boivin JR, Witten IB, Deisseroth K, Janak PH. A causal link between prediction errors, dopamine neurons and learning. Nature Neuroscience. 2013;16:966–973. doi: 10.1038/nn.3413. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, Mass: MIT Press; 1998. [Google Scholar]
  39. Wiecki TV, Riedinger K, von Ameln-Mayerhofer A, Schmidt WJ, Frank MJ. A neurocomputational account of catalepsy sensitization induced by D2 receptor blockade in rats: context dependency, extinction, and renewal. Psychopharmacology. 2009;204:265–277. doi: 10.1007/s00213-008-1457-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Wuis EW, Dirks MJ, Termond EF, Vree TB, Van der Kleijn E. Plasma and urinary excretion kinetics of oral baclofen in healthy subjects. European Journal of Clinical Pharmacology. 1989;37:181–184. doi: 10.1007/BF00558228. [DOI] [PubMed] [Google Scholar]
  41. Zhao Y, DeCuypere M, LeDoux MS. Abnormal motor function and dopamine neurotransmission in DYT1 DeltaGAG transgenic mice. Experimental Neurology. 2008;210:719–730. doi: 10.1016/j.expneurol.2007.12.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
eLife. 2016 Jun 1;5:e14155. doi: 10.7554/eLife.14155.014

Decision letter

Editor: Rui M Costa1

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "DYT1 dystonia increases risk taking in humans" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. The reviewers found the behavioral findings interesting. However, they indicate that the model presented needs to be better validated, and that alternative models should be considered to explain the data. For example, given the current claims it is important to verify that the risk-taking effect is demonstrably a learning effect, rather than a pure choice effect or a win-stay/lose-shift (WSLS) type effect that would not implicate striatal plasticity. This was the key concern raised in the reviewer discussion, and the reviewers were clear that publication of the study would be contingent on the outcome of these analyses.

Furthermore, in the current version of the study, there is no evidence that the changes in corticostriatal plasticity mentioned are responsible for the risk taking behavior. So that interpretation should be toned down, unless new evidence is presented. Below are the detailed comments, which should help in preparing a revised version.

Reviewer #1:

I enjoyed reading this relatively brief and focused test of the role of dopamine in learning in human subjects. Several previous reports have exploited the fact that the impact of diminished dopamine on learning can be investigated in Parkinson's disease patients (Frank et al., Science, 2004; Shohamy et al., Brain, 2004; Rutledge et al., J. Neurosci., 2009), but the novelty of this report is that it examines the impact of a dopaminergic/striatal change occurring in a type of dystonia, DYT1 dystonia, that is, in a sense, the opposite way round (there is an increase in LTP and a decrease in LTD in transgenic animals). Perhaps the only other way to conduct a test of this sort is by examining the impact of dopamine increases by L-DOPA treatment (e.g. Pessiglione et al., Nature, 2006).

The task used is one that has been previously used by one of the authors (Niv et al., J. Neuroscience, 2012). The previous report also justifies the modeling approach taken. I therefore had very few questions. I wondered, however, about the following minor points.

1) In the last paragraph of the Introduction. Although I agree that the results are interesting I was not always quite sure precisely how the authors intended to frame them. Perhaps changing or adding a few words in the Introduction or Discussion might help provide the final clarification needed. The current version emphasizes that the dystonic condition is associated with a change in LTP/LTD balance and emphasizes that this might lead to a change in learning from positive prediction errors. By contrast when discussing studies in Parkinson's patients they seem to emphasize dopaminergic changes. Is it not the case that changes in dopamine (or its balance with other neurotransmitters) occur in both Parkinson's patients and dystonic patients, and that as a consequence both would be expected to lead to changes in LTP/LTD ratios? Or is there no evidence of a change in LTP/LTD in Parkinson's disease?

2) While I understand that most patients with this particular condition have normal general intelligence, I wondered if there was any information about intelligence/education matching in the control and patient groups reported here?

3) In the fourth paragraph of the Discussion. It is stated that "Current methodologies cannot yet bridge the gap between cellular level processes (LTP/LTD) and behavior at the level of an animal, let alone a human." Arguably, however, there are techniques that can be used to induce increments and decrements in the strengths of specific connexions even in awake, behaving human subjects (Buch et al., 2011, J. Neuroscience; Johnen et al., 2015, eLife), although their use is limited to the cortex.

Reviewer #2:

This study investigates risk-taking in a reinforcement learning experiment in 13 patients with dystonia and matched controls. Subjects learned to select between pairs of cues (or were forced to pick one). Three cues led deterministically to 0, 5 or 10c, and the last one to 0 or 10c with equal probability. Risk was assessed by the choice between the 5c and the probabilistic cues, which have the same expected value. The authors show that controls are significantly more risk-averse in this task than patients, who appear to be risk neutral. They show evidence that this is not due to patients choosing randomly, but that they are sensitive to previous reward and thus presumably learning the value over time. They argue that the behavioral difference can be accounted for by assuming variable learning rates for positive vs. negative prediction errors, with a greater asymmetry for patients than controls. The authors hypothesize, based on a rodent model showing abnormally high LTP and low LTD, that the observed behavioral effects thus reflect differences in synaptic plasticity.

Overall, this paper is well written and interesting. There are some limitations to the study.

1) The authors overstate the implications of this study in the Abstract, main text and Discussion. For example, the Discussion states that this "may provide the first link between synaptic plasticity and overt learning behavior in humans". This is a stretch: for one, based on the literature they present, there is no direct evidence that DYT1 patients' impairment is indeed a plasticity impairment. Furthermore, the behavioral evidence tying risk taking to learning here is also weak (see next point). These results may provide more evidence, of a different nature, but of a similar degree of "directness" as existing evidence, for example linking dopamine dependent plasticity individual differences to learning behavior in humans.

2) Model validation. Despite the authors' careful effort to validate it, the RSTD is still not very convincing in this data set.

A) While it fits better over the whole group, it does not seem to be the case within the patient group (6/13). This is a problem, as it may indicate that the fit parameters don't reflect important aspects of the behavior within this group, rendering interpretation difficult.

B) This may be a limitation of the task: DYT1 patients appear to be on average risk neutral, such that a task in which the only risk comparison relies on equal average value will render the RSTD not identifiable beyond TD.

C) An important validation of the model would be to confirm with simulations that it can reproduce the behavioral pattern in Figure 3C. This is an important, model-independent data point supporting the learning interpretation, and the model should at a minimum be able to capture this qualitative phenomenon.

D) As the authors point out, there is nearly no learning needed in this task. A natural model to consider would thus be a non-learning utility model, parameterized with the same α parameter as the learning utility model presented here, but assuming known probabilities. While Figure 3C hints at a potential learning effect, it could also reflect a non-learning win-stay lose-shift strategy. Investigating whether this combination accounts for behavior better or worse than RSTD would be an important point in establishing whether the learning interpretation is valid.

3) If the RSTD model is validated, it would be useful to also present results relating to the learning rates directly, not just the difference index, as the authors attempt an interpretation in terms of LTP and/or LTD.

4) It would also be important to know whether previous rewards have differential effects depending on whether the trial was a free or forced choice trial (e.g. Figure 3C). Recent work (Cockburn et al., 2014) has shown that value learning, especially from positive prediction errors, differed between these conditions, and if patients and controls had baseline differences in risk taking for non-learning reasons, this would lead patients to experience more free-choice risky trials than controls, potentially biasing learning. This might be important to rule out.

5) It would be interesting to report patients' behavior in absolute terms, not only relative to controls. It appears that they are on average risk neutral, which is not a suboptimal or irrational thing to do in this task. It might be interesting to discuss why their impairment appears to "correct an imbalance" that is seen in healthy controls, contrary to what is usually observed in patient studies.

Reviewer #3:

Building on animal work showing that DYT1 dystonia animal models are associated with exaggerated LTP and diminished LTD, the authors outline an experiment testing for learning abnormalities in human DYT1 dystonia patients. The authors hypothesize that DYT1 patients will show a positive learning bias owing to increased LTP and/or reduced LTD. Consistent with their hypothesis, DYT1 patients do indeed show atypical responsivity to outcomes in that they do not exhibit risk aversion as did matched controls.

The authors correctly emphasize the chasm between synaptic modification and behavior. The application of genetically linked animal models and human disease states in concert with an algorithmic description of a learning mechanism offers an exciting and constructive path forward. However, this depends critically on a shared mechanism between the animal model and the human brain. I cannot speak to the quality of the animal models or the nature of DYT1 dystonia; however, the text leaves some question as to the shared commonality. The authors point to atypical LTP/LTD in animal models, but abnormal dopamine/acetylcholine transmission in humans. Given the manuscript's emphasis on lessening the gap between synapse and behavior, I feel that the manuscript could benefit from more support linking animal and human disorders (perhaps via behavioral patterns in the animal models, mechanisms through which medications operate etc.).

Of greater concern is whether these results truly inculpate a reinforcement learning mechanism. As outlined by the authors, patients exhibited a strong propensity to pick the risky option again following a win but avoided it following a loss, which is argued to demonstrate outcome sensitivity. But this response pattern does not necessitate a reinforcement learning strategy. These data also appear to be consistent with a win-stay/lose-shift strategy (WS/LS). Given the 50% chance of reward on the risky stimulus, a WS/LS strategy is consistent with the reported lack of risk aversion. Furthermore, the model-generated response pattern illustrated in Figure 1B (bottom panel) suggests that a reinforcement learning strategy becomes increasingly risk averse. There does appear to be some trend of increasing risk aversion in controls, but not so for the patient group. Given that there are no clear response patterns that demonstrate a reinforcement learning strategy (i.e. there are no learning curves etc.), I feel that the results would be more interpretable if a WS/LS model were also included in the analysis and Discussion.

Could observed effects in the patient group be driven by medication withdrawal? The authors offer helpful discussion of medication half-life, but this only serves to demonstrate that effects are probably not driven by the medication itself.

What were the model parameter ranges explored while fitting the data? Within-subject variance (Figure 3A) seems to indicate considerably more variance in the control group (though I understand this to be a subset of the dataset), so I am a bit surprised by the apparent lack of effect in the inverse temperature parameter.

eLife. 2016 Jun 1;5:e14155. doi: 10.7554/eLife.14155.015

Author response


Reviewer #1:

1) In the last paragraph of the Introduction. Although I agree that the results are interesting I was not always quite sure precisely how the authors intended to frame them. Perhaps changing or adding a few words in the Introduction or Discussion might help provide the final clarification needed. The current version emphasizes that the dystonic condition is associated with a change in LTP/LTD balance and emphasizes that this might lead to a change in learning from positive prediction errors. By contrast when discussing studies in Parkinson's patients they seem to emphasize dopaminergic changes. Is it not the case that changes in dopamine (or its balance with other neurotransmitters) occur in both Parkinson's patients and dystonic patients, and that as a consequence both would be expected to lead to changes in LTP/LTD ratios? Or is there no evidence of a change in LTP/LTD in Parkinson's disease?

We thank the reviewer for this comment and apologize for the confusion. Indeed, there is no evidence of altered dopaminergic signaling in dystonia whereas plasticity itself seems intact in Parkinson’s disease. Therefore, the two disorders model alterations in different parts of the hypothesized mechanism of reinforcement learning.

Dopamine release in the striatum is preserved both in patients with DYT1 dystonia and in rodent models of the disease. Autopsies in humans with the disease did not reveal any evidence of loss of midbrain dopaminergic neurons (Furukawa et al., 2000; Rostasy et al., 2003) and striatal dopamine levels are normal (Augood et al., 2002; Furukawa et al., 2000). These levels are also normal in rodent models (Balcioglu et al., 2007; Dang et al., 2006; Grundmann et al., 2007; Zhao et al., 2008). The lack of response of patients with DYT1 dystonia to either dopaminergic agonists or antagonists further supports the assumption that striatal levels of dopamine do not play a role in DYT1 dystonia. The physiological importance of certain impairments, such as reduced dopamine release triggered by amphetamine (Balcioglu et al., 2007) and increased dopamine turnover (Zhao et al., 2008), is still unknown.

This stands in contrast with Parkinson's disease, in which dopamine deficiency (due to progressive death of dopamine neurons in the substantia nigra pars compacta) is the uncontested main player. Indeed, there is no evidence for impaired corticostriatal plasticity (LTP/LTD) in Parkinson's disease, as the therapeutic effect of increasing striatal dopamine levels in this disorder partly attests (Thiele et al., 2014).

To summarize, our claim is that while impairments in either striatal dopamine or corticostriatal plasticity can, according to models of reinforcement learning in the basal ganglia, result in similar behavioral outcomes, one finding does not follow from the other, and they each need to be tested separately. In this sense, our findings are novel and provide non-redundant support for the reinforcement learning model. Based on the Reviewer's suggestion, we have now modified the Introduction of our revised manuscript so as to more clearly state how our study, which investigates the effects of presumed altered plasticity (but intact dopamine signaling), differs from (but relates to) findings from Parkinson's disease, where plasticity is intact but dopamine signaling is altered:

“Dopamine’s role as a reinforcing signal for trial-and-error learning is supported by numerous findings (Pessiglione et al., 2006; Schultz et al., 1997; Steinberg et al., 2013), including in humans, where Parkinson’s disease serves as a human model for altered dopaminergic transmission (Frank et al., 2004). […] In particular, our predictions stem from considering the effects of intact prediction errors on an altered plasticity mechanism that amplifies the effect of positive prediction errors (i.e., responds to positive prediction errors with more LTP than would otherwise occur in controls) and mutes the effects of negative prediction errors (that is, responds with weakened LTD as compared to controls).”

We have also modified the Discussion to make this point clearer:

“DYT1 dystonia and Parkinson's disease can be viewed as complementary models for understanding the mechanisms of reinforcement learning in the human brain. […] DYT1 dystonia patients, on the other hand, seem to have intact striatal dopamine signaling, but altered corticostriatal LTP/LTD that favors learning from positive prediction errors.”

2) While I understand that most patients with this particular condition have normal general intelligence, I wondered if there was any information about intelligence/education matching in the control and patient groups reported here?

We agree with the Reviewer that this information is important. We matched the level of education between groups, and all our patients had normal intelligence and at least 13 years of formal education. This information is given under Materials and methods. We did not perform formal intelligence tests but confirmed that all subjects fully understood the task.

3) In the fourth paragraph of the Discussion. It is stated that "Current methodologies cannot yet bridge the gap between cellular level processes (LTP/LTD) and behavior at the level of an animal, let alone a human." Arguably, however, there are techniques that can be used to induce increments and decrements in the strengths of specific connexions even in awake, behaving human subjects (Buch et al., 2011, J. Neuroscience; Johnen et al., 2015, eLife), although their use is limited to the cortex.

The Reviewer is absolutely correct that the introduction of different TMS protocols (such as paired-pulse stimuli) narrows the gap between LTP/LTD, as demonstrated at the cellular level, and behaving organisms. We also agree with the Reviewer that this technique probably modulates cortical plasticity. We therefore deleted this sentence.

Reviewer #2:

1) The authors overstate the implications of this study in the Abstract, main text and Discussion. For example, the Discussion states that this "may provide the first link between synaptic plasticity and overt learning behavior in humans". This is a stretch: for one, based on the literature they present, there is no direct evidence that DYT1 patients' impairment is indeed a plasticity impairment. Furthermore, the behavioral evidence tying risk taking to learning here is also weak (see next point). These results may provide more evidence, of a different nature, but of a similar degree of "directness" as existing evidence, for example linking dopamine dependent plasticity individual differences to learning behavior in humans.

We thank the Reviewer for this comment, and agree that we might have overstated our findings. We have now adopted the Reviewer's suggestion and attenuated our claim by modifying the relevant sentences in the manuscript. For example, in the Abstract, "implicating striatal plasticity" has been modified to "supporting striatal plasticity", and in the final sentence of the Discussion, "[…] may provide the first link between synaptic plasticity and overt learning behavior" has been modified to "[…] support the link between synaptic plasticity and overt learning behavior".

2) Model validation. Despite the authors' careful effort to validate it, the RSTD is still not very convincing in this data set.

A) While it fits better over the whole group, it does not seem to be the case within the patient group (6/13). This is a problem, as it may indicate that the fit parameters don't reflect important aspects of the behavior within this group, rendering interpretation difficult.

B) This may be a limitation of the task: DYT1 patients appear to be on average risk neutral, such that a task in which the only risk comparison relies on equal average value will render the RSTD not identifiable beyond TD.

We thank the reviewer for this comment. We fully agree with the reviewer's claim that splitting the single learning-rate parameter of the classical TD model into two separate learning rate parameters (positive and negative) in our risk-sensitive temporal difference (RSTD) model is mainly justified for participants who are either risk-averse or risk-taking. The analysis presented in Figure 3—figure supplement 4 supports this claim. This analysis shows that the behavior of risk-averse or risk-taking participants in both groups was better described by our model.

Our behavioral task was designed to test our hypothesis that DYT1 patients would be less risk averse than controls due to relative overweighting of outcomes associated with positive prediction errors. This hypothesis was indeed supported by the behavioral data—this is our main result, and the modeling was only used to illustrate and further clarify the (suggested) provenance of the behavioral findings. That is, we did not design the task to differentiate between models, but rather only to test the risk sensitivity of participants. It is for this reason that in our task we compared a risky and a non-risky option with similar mean values – we thought this was the most direct way to isolate the effect of risk on behavioral decision making.

Given the model-comparison results, an alternative way to frame the results in terms of the models is to say that DYT1 dystonia patients show no difference between learning from positive and negative prediction errors, and thus are better fit by a TD model with a single learning rate, in contrast to healthy controls. We believe that this framing is not more illuminating, and in fact, would suffer from the same criticism as only 7/13 of the patients are better fit with this model. That is, the patient group, as a whole, is “indifferent” to the two models. However, we respectfully disagree with the Reviewer that this means that the model parameters are perhaps unreliable – in participants better fit by the TD model with one learning rate, the RSTD model also showed that the two learning rates were similar, thus those parameters are as reliable. In particular, one way to quantify the similarity between the learning rates is as |η+ − η−| / (η+ + η−), that is, the absolute difference between the two learning rates scaled by their overall magnitude. This similarity metric was < 0.2 for all participants who were better fit by the TD model, in either group (see Author response image 1; circles denote subjects better fit by the RSTD model, stars denote subjects better fit by the simpler TD model).

Author response image 1. Learning rate similarity by participant.


DOI: http://dx.doi.org/10.7554/eLife.14155.011
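For concreteness, this similarity metric can be computed as in the following minimal Python sketch (the learning-rate values shown are hypothetical, not our fitted parameters):

def learning_rate_similarity(eta_pos, eta_neg):
    # Absolute difference between the two learning rates, scaled by
    # their overall magnitude: |eta+ - eta-| / (eta+ + eta-).
    return abs(eta_pos - eta_neg) / (eta_pos + eta_neg)

# Hypothetical participant with nearly symmetric learning rates:
print(learning_rate_similarity(0.22, 0.18))  # 0.1, i.e., below the 0.2 cutoff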

All this having been said, our modeling exercise was not intended as a separate result, but only to aid in the interpretation of the main behavioral result. Indeed, in a previous version of the paper (that we ended up not submitting) we did not include the modeling at all. If the Reviewer and editors feel that the modeling is superfluous or otherwise distracting from the main result, we are happy to remove it altogether.

C) An important validation of the model would be to confirm with simulations that it can reproduce the behavioral pattern in Figure 3C. This is an important, model-independent data point supporting the learning interpretation, and the model should at a minimum be able to capture this qualitative phenomenon.

We thank the Reviewer for this comment. Following the Reviewer’s suggestion, we simulated the behavioral pattern in Figure 3C, using for the RSTD model the learning rates that were fit to each individual based on her/his behavior. As can be seen in Figure 3—figure supplement 2, the simulation qualitatively captured the observed pattern of behavior.
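For readers wishing to reproduce this kind of simulation, a minimal Python sketch follows (the task structure is as described in the manuscript, but the initial values, trial count and parameter values here are illustrative assumptions rather than the individually fitted parameters used for Figure 3—figure supplement 2):

import numpy as np

rng = np.random.default_rng(0)

def simulate_risky_choices(eta_pos, eta_neg, beta, n_trials=200):
    # Simulate choices between a sure 5c cue and a risky 0/10c cue under a
    # risk-sensitive TD learner, tracking the choice of the risky cue as a
    # function of its most recent outcome (win vs. loss).
    v_risky, v_sure = 5.0, 5.0          # assumed initial values
    last_outcome = None
    next_choice_after = {"win": [], "loss": []}
    for _ in range(n_trials):
        p_risky = 1.0 / (1.0 + np.exp(-beta * (v_risky - v_sure)))
        chose_risky = rng.random() < p_risky
        if last_outcome is not None:
            next_choice_after[last_outcome].append(chose_risky)
        if chose_risky:
            outcome = 10.0 if rng.random() < 0.5 else 0.0
            delta = outcome - v_risky   # prediction error
            v_risky += (eta_pos if delta > 0 else eta_neg) * delta
            last_outcome = "win" if outcome > 0 else "loss"
    return (np.mean(next_choice_after["win"]),
            np.mean(next_choice_after["loss"]))

# Control-like (learns more from losses) vs. patient-like (near-symmetric):
print(simulate_risky_choices(eta_pos=0.10, eta_neg=0.30, beta=1.0))
print(simulate_risky_choices(eta_pos=0.25, eta_neg=0.25, beta=1.0))

In both parameter regimes the probability of choosing the risky cue is higher after a win than after a loss, and the asymmetric (control-like) parameters yield fewer risky choices overall.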

D) As the authors point out, there is nearly no learning needed in this task. A natural model to consider would thus be a non-learning utility model, parameterized with the same α parameter as the learning utility model presented here, but assuming known probabilities. While Figure 3C hints at a potential learning effect, it could also reflect a non-learning win-stay lose-shift strategy. Investigating whether this combination accounts for behavior better or worse than RSTD would be an important point in establishing whether the learning interpretation is valid.

We thank the reviewer for this important comment. The reviewer is correct that the results presented in Figure 3C indicate that participants continuously updated (i.e., learned) the value of the risky cue and updated their behavioral policy accordingly. While the learning requirement for the non-risky cues was minimal, it is not clear to us how participants could have known the true probabilities for the risky stimulus absent learning; hence we did not test a model that did not include learning at all.

Inspired by the Reviewer’s comment, we have now tested a win-stay-lose-shift (WSLS) model with learning – this is none other than the TD model with a learning rate of 1 (that is, a model that chooses the risky stimulus after every win and avoids it after every loss, with a softmax parameter that allows for only a tendency, rather than absolute responding according to the predefined WSLS strategy). A likelihood ratio test showed that this model is inferior to the RSTD model for 25 out of the 26 participants (DYT = 12, CTL = 13; p < 0.05, chi-square test with df = 2), as seen in Author response image 2 (points above the diagonal favor the RSTD model; red = DYT, blue = CTL).

Author response image 2. Comparison of risk sensitive (learning asymmetry) model and win-stay-lose-shift model.


DOI: http://dx.doi.org/10.7554/eLife.14155.012
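A minimal Python sketch of this model comparison (the choice and outcome vectors below are hypothetical placeholders and the initial cue values are an assumption; the actual analysis fit each participant's full trial sequence):

import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def neg_log_lik(params, choices, outcomes, wsls=False):
    # Negative log likelihood of risky-vs-sure choices under a TD learner.
    # choices: 1 if the risky cue was chosen; outcomes: 0 or 10 when the
    # risky cue was chosen, nan otherwise.
    if wsls:
        eta_pos = eta_neg = 1.0        # WSLS = TD with a learning rate of 1
        beta = params[0]
    else:
        eta_pos, eta_neg, beta = params
    v_risky, v_sure, ll = 5.0, 5.0, 0.0   # assumed initial values
    for c, o in zip(choices, outcomes):
        p_risky = 1.0 / (1.0 + np.exp(-beta * (v_risky - v_sure)))
        p_risky = np.clip(p_risky, 1e-9, 1 - 1e-9)
        ll += np.log(p_risky if c else 1.0 - p_risky)
        if c:                          # values are updated only when sampled
            delta = o - v_risky
            v_risky += (eta_pos if delta > 0 else eta_neg) * delta
    return -ll

# Hypothetical choice sequence (1 = risky chosen) and risky-cue outcomes:
choices = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
outcomes = np.array([10, 0, np.nan, 10, np.nan, 10, 0, np.nan, 10, 0])

fit_rstd = minimize(neg_log_lik, x0=[0.3, 0.3, 1.0],
                    args=(choices, outcomes), bounds=[(0, 1), (0, 1), (0, 30)])
fit_wsls = minimize(neg_log_lik, x0=[1.0],
                    args=(choices, outcomes, True), bounds=[(0, 30)])
lr_stat = 2 * (fit_wsls.fun - fit_rstd.fun)   # likelihood-ratio statistic
print("p =", chi2.sf(lr_stat, df=2))          # RSTD has two extra free parameters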

Another, more “model-free,” way to show that choices of the risky cue depend on reinforcement learning and not on a heuristic such as WSLS is to test whether choices were sensitive not only to the most recent outcome for this cue, but also to preceding outcomes. Indeed, as shown in Author response image 3, choices in both groups differed depending on the outcome of the risky cue on the last two occasions it was chosen, as would be expected from reinforcement learning. That is, the tendency to choose this cue was highest after two “wins”, lowest after two “losses”, and intermediate after one “win” and one “loss”, with a more recent “win” contributing to a higher propensity to choose the cue again. These results are exactly as would be expected from a reinforcement learning model, as this model computes the value of a cue as a recency-weighted average of the outcomes obtained for that cue. We did not test three trials back and further because the short length of our experiment and the few trials involving the risky cue limit our power once trials are divided into 8 combinations of wins and losses.

Author response image 3. Effect of outcomes of the past two trials on choices of the risky cue.


DOI: http://dx.doi.org/10.7554/eLife.14155.013
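A sketch of this tabulation (hypothetical data; two_back_table is an illustrative helper, not code from our analysis pipeline):

import numpy as np

def two_back_table(choices, outcomes):
    # P(choose risky) conditioned on the outcomes of the risky cue on the
    # last two occasions it was chosen (older outcome listed first).
    history, buckets = [], {}
    for c, o in zip(choices, outcomes):
        if len(history) >= 2:
            buckets.setdefault(tuple(history[-2:]), []).append(bool(c))
        if c:
            history.append("win" if o > 0 else "loss")
    return {k: float(np.mean(v)) for k, v in buckets.items()}

# Reinforcement learning predicts the ordering
# (win, win) > (loss, win) > (win, loss) > (loss, loss):
choices = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
outcomes = [10, 0, None, 10, 10, None, 0, 10, None, 0]
print(two_back_table(choices, outcomes))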

One other piece of information that bears on this question is that after completing the task, we asked participants to verbally associate the different cues with outcomes and their probabilities. Only 9 of the 26 participants (35%) estimated the monetary value of the risky cue correctly (see table below). Among the 21 participants who associated the risky cue with the correct outcomes (0 or 10¢), the mean estimated outcome was not significantly different across groups (DYT: N=10, 5.38 ± 1.37¢; CTL: N=11, 4.44 ± 0.98¢, p=0.09, t test). Moreover, there was no correlation between the reported value and behavioral risk taking (N=21, Pearson's r=0.26, p=0.26), suggesting that the explicit estimation of monetary value at the end of the task and choice behavior throughout the task were not tightly related.

Reported outcome for risky cue                                       DYT1   CTL
Correct answer: P(10) = P(0)                                            4     5
Answered that P(10) > P(0)                                              4     1
Answered that P(10) < P(0)                                              2     5
Could not estimate, or associated the risky cue with an
incorrect 5-cent outcome (0/5/10 or 5/10)                               3     2
Total                                                                  13    13

Thus we feel it is safe to conclude that explicit rule learning, if that took place, was not the dominant process driving choice behavior for the risky cue.

3) If the RSTD model is validated, it would be useful to also present results relating to the learning rates directly, not just the difference index, as the authors attempt an interpretation in terms of LTP and/or LTD.

We agree with the reviewer that these results should be incorporated into the main text. There were no significant differences between the groups in either learning rate on its own, but rather only in the asymmetry index (which quantifies their relative contribution to learning, scaled by the overall rate of learning).

Upon reflection, and based on this comment, we realized that there is another way to specify the RSTD model that more intuitively illustrates this main modeling result – that the differences between the groups were due to learning asymmetry and not to overall levels of learning. The RSTD model can be rewritten with a single learning rate (that determines the rate at which new outcomes change the current value, which is equivalent to the rate at which the impact of previous outcomes is phased out) and a second asymmetry parameter. In fact, this is how the model was originally specified by Mihatsch & Neuneier (2002):

Vnew(cue) = Vold(cue) + η(1 ± κ)δ

where δ is the prediction error, η is the (single) learning rate and κ is an asymmetry parameter that is added or subtracted depending on whether the prediction error is positive or negative, respectively. In this formulation, the learning rates are not different between the groups, but the κ parameter is. This is the same result that we previously reported, as in this specification of the model η+ = η(1 + κ) and η− = η(1 − κ), giving κ = (η+ − η−)/(η+ + η−), which was our asymmetry index that did differ between the groups, and η = (η+ + η−)/2, the average of our two learning rates (that did not differ between the groups). Given that this framing seems clearer, we have now rewritten the model in the manuscript in this form (calling it a learning asymmetry model). As requested, we also provide the mean values of each of the fit parameters for both groups, together with their statistics:

"We found significant differences between the groups in the learning asymmetry parameter (DYT -0.05 ± 0.27, CTL -0.34 ± 0.27, Mann-Whitney z=-2.51, df=24, P<0.05), but no differences in the other two parameters (learning rate DYT1 0.25 ± 0.19, CTL 0.14 ± 0.11, Mann-Whitney z=1.33, df=24, P=0.18; inverse temperature DYT 0.68 ± 0.37, CTL 0.93 ± 0.47, Mann-Whitney z=-1.18, df=24, P=0.23)".

4) It would also be important to know whether previous rewards have differential effects depending on whether the trial was a free or forced choice trial (e.g. Figure 3C). Recent work (Cockburn et al., 2014) has shown that value learning, especially from positive prediction errors, differed between these conditions, and if patients and controls had baseline differences in risk taking for non-learning reasons, this would lead patients to experience more free-choice risky trials than controls, potentially biasing learning. This might be important to rule out.

Given that patients chose the risky cue more often than did controls, and given that there was a fixed number of forced trials that did not depend on group or choices, by necessity DYT group participants chose the risky cue more often in the free choice trials. Therefore, they experienced more learning from the risky cue in choice trials than did the control group. However, it is not clear to us how this could interact with a non-learning account of the results.

In any case, following the Reviewer's comment, and given the relevance of recent work by Cockburn et al. (2014) to studies such as ours, we examined separately the probability of choosing the risky cue over the sure cue following wins or losses, after either forced or choice trials. Our analysis revealed that choices were significantly dependent upon the previous outcome of the risky cue (P<0.01, F=7.45, df=1 for the main effect of win versus loss; 3-way ANOVA with factors outcome, choice and group) but not upon its context (P=0.38, F=0.93, df=1 for the main effect of forced vs. choice trials), as seen in Figure 3—figure supplement 1B. We note, however, that similar to Cockburn et al., we did observe a numerically smaller effect of the outcome of forced trials (as compared to choice trials) on future choices, as can be seen in the smaller difference between wins and losses in forced as compared to choice conditions (the interaction between outcome and choice was not significant – P=0.46, F=0.56, df=1).

To address this important issue in the manuscript, the caption to this Supplemental figure reads: “Recent work on similar reinforcement learning tasks has shown that choice trials and forced trials may exert different effects on learning (Cockburn et al., 2014). […] Similar to Cockburn et al. (2014), we did observe a numerically smaller effect of the outcome of forced trials (as compared to choice trials) on future choices, however this was not significant (interaction between outcome and choice P=0.46, F=0.56, df=1). P values in the figure reflect paired t-tests.”

5) It would be interesting to report patients' behavior in absolute terms, not only relative to controls. It appears that they are on average risk neutral, which is not a suboptimal or irrational thing to do in this task. It might be interesting to discuss why their impairment appears to "correct an imbalance" that is seen in healthy controls, contrary to what is usually observed in patient studies.

We thank the reviewer for this comment. We had reported the absolute behavior of patients:

"Overall, the probability of choosing the risky cue was significantly higher among patients with dystonia than among healthy controls (Figure 3B, probability of choosing the risky cue over the sure cue DYT 0.44 ± 0.18, CTL 0.25 ± 0.20, Mann-Whitney z=2.33, df=24, P<0.05)."

Following the Reviewer’s suggestion, we also now added the following text to the final paragraph of our Discussion:

"Relative weighting of positive and negative outcomes shapes our risk-sensitivity in tasks that involve learning from experience. […] In any case, these reinforcement-learning manifestations of what has been considered predominantly a motor disease provide support for linking corticostriatal synaptic plasticity and overt trial-and-error learning behavior in humans."

Reviewer #3:

Building on animal work showing that DYT1 dystonia animal models are associated with exaggerated LTP and diminished LTD, the authors outline an experiment testing for learning abnormalities in human DYT1 dystonia patients. The authors hypothesize that DYT1 patients will show a positive learning bias owing to increased LTP and/or reduced LTD. Consistent with their hypothesis, DYT1 patients do indeed show atypical responsivity to outcomes in that they do not exhibit risk aversion as did matched controls.

The authors correctly emphasize the chasm between synaptic modification and behavior. The application of genetically linked animal models and human disease states in concert with an algorithmic description of a learning mechanism offers an exciting and constructive path forward. However, this depends critically on a shared mechanism between the animal model and the human brain. I cannot speak to the quality of the animal models or the nature of DYT1 dystonia; however, the text leaves some question as to the shared commonality. The authors point to atypical LTP/LTD in animal models, but abnormal dopamine/acetylcholine transmission in humans. Given the manuscript's emphasis on lessening the gap between synapse and behavior, I feel that the manuscript could benefit from more support linking animal and human disorders (perhaps via behavioral patterns in the animal models, mechanisms through which medications operate etc.).

We thank the Reviewer for this comment, and apologize for the apparent contradiction in the text. We agree with the Reviewer that the gap between animal models and human patients is still wide. An increased LTP/LTD ratio in corticostriatal synapses in rodent models of DYT1 dystonia is a persistent finding across a variety of transgenic models. This LTP/LTD ratio is determined by complex interactions between numerous players, and the role of each of these players is currently unknown (Calabresi et al., 2014). We discuss this issue at length in response 1 to Reviewer 1 – we would like to refer the Reviewer to that discussion.

In addition, and in support of our assumption that the findings regarding the LTP/LTD (im)balance are relevant to humans, the LTP/LTD impairment is rescued by anticholinergic agents (such as trihexyphenidyl) that are routinely given to patients with DYT1 dystonia (Martella et al., 2009). We agree with the Reviewer that further studies are needed to show this relationship conclusively, as so far the evidence is only circumstantial.

Finally, to our knowledge, risk-sensitivity in DYT1 dystonia animal models has never been tested. This is an excellent idea for a future study that we hope our findings will help spur.

Of greater concern is whether these results truly inculpate a reinforcement learning mechanism. As outlined by the authors, patients exhibited a strong propensity to pick the risky option again following a win but avoided it following a loss, which is argued to demonstrate outcome sensitivity. But this response pattern does not necessitate a reinforcement learning strategy. These data also appear to be consistent with a win-stay/lose-shift strategy (WS/LS). Given the 50% chance of reward on the risky stimulus, a WS/LS strategy is consistent with the reported lack of risk aversion. Furthermore, the model-generated response pattern illustrated in Figure 1B (bottom panel) suggests that a reinforcement learning strategy becomes increasingly risk averse. There does appear to be some trend of increasing risk aversion in controls, but not so for the patient group. Given that there are no clear response patterns that demonstrate a reinforcement learning strategy (i.e. there are no learning curves etc.), I feel that the results would be more interpretable if a WS/LS model were also included in the analysis and Discussion.

We thank the Reviewer for this important comment, which was also raised by Reviewer 2. To avoid repetition, we refer the Reviewer to our answer to Reviewer 2's point 2D above, where we test the WSLS model explicitly in one model-based and two model-free analyses.

Regarding the simulation in Figure 1B: we used for this simulation the average values for each of the groups in order to qualitatively demonstrate our hypothesis – that a higher LTP/LTD ratio translates into an increased implicit value of the risky cue and is observed as risky behavior. This point is now clarified in the figure legend of our revised manuscript.

In general, the interaction between choice and valuation in reinforcement learning indeed leads to risk aversion, with more risk aversion as the learning rate increases (see Niv et al., 2002 for a mathematical proof of this claim). Intuitively, if the risky option at some point has a lower value than the sure option, the risky option will be chosen less often and thus its value will not be updated, leaving it lower than the sure option (whose value does not change) for a disproportionately long period of time. Therefore, the results of the model simulation and the control group indeed conform to what would be expected from a reinforcement learner. This finding, on its own, however, cannot fully explain the risk sensitivity of even control participants, as they are not always risk averse. Indeed, in Niv et al. (2012), we compared this account of risk sensitivity to the alternative RSTD model, and showed that the latter model better explains choice behavior. The asymmetric learning in the RSTD model allows one to “correct” the imbalance inherent in reinforcement learning, and to achieve risk neutrality or even risk-seeking behavior by learning more from positive prediction errors than from negative prediction errors. This is exactly what we show here for the DYT group, and it is indeed mirrored in the fit parameters of the model. Therefore, we do not agree with the Reviewer that our reinforcement-learning simulations suggest results that are at odds with the behavior of our participants.
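To illustrate this intuition, consider the following minimal simulation sketch (softmax choice, values initialized at the sure value, a single symmetric learning rate — all illustrative assumptions): a plain TD learner ends up choosing the equal-mean risky option less than half the time.

import numpy as np

rng = np.random.default_rng(1)

def risky_choice_rate(eta=0.3, beta=1.0, n_agents=500, n_trials=200):
    # Fraction of risky choices by a symmetric TD learner choosing between
    # a sure 5c option and an equal-mean 0/10c gamble. The risky value is
    # updated only when sampled, so low values persist disproportionately.
    n_risky = 0
    for _ in range(n_agents):
        v = 5.0                               # assumed initial risky value
        for _ in range(n_trials):
            p = 1.0 / (1.0 + np.exp(-beta * (v - 5.0)))
            if rng.random() < p:
                n_risky += 1
                outcome = 10.0 if rng.random() < 0.5 else 0.0
                v += eta * (outcome - v)      # single, symmetric learning rate
    return n_risky / (n_agents * n_trials)

print(risky_choice_rate())   # reliably below 0.5: emergent risk aversion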

Could observed effects in the patient group be driven by medication withdrawal? The authors offer helpful discussion of medication half-life, but this only serves to demonstrate that effects are probably not driven by the medication itself.

We thank the Reviewer for raising this issue. Indeed, withdrawal symptoms have been described for both baclofen (Terrence and Fromm, 1981) and trihexyphenidyl (McInnis and Petursson, 1985), but they typically appear only after medication has been withdrawn for at least a few days or weeks. Patients with dystonia do not typically experience symptoms of withdrawal before their next dose of medication (as in our study) or even when they skip a scheduled dose. This stands in contrast with the 'early wearing off' phenomenon observed in patients with advanced Parkinson's disease. Moreover, risk-taking behavior (Figure 3—figure supplement 3) was also observed in those patients who were not taking any medications, although this sub-group was very small.

We agree with the Reviewer that such an explanation cannot be absolutely ruled out and are aware of this methodological limitation imposed by our choice to work with human subjects. Still, based on the arguments above, we believe it is not a plausible explanation for our observations. In any case, we have now also performed a linear regression to verify that our results hold even when accounting for medication:

“Risky behavior was not significantly affected by sex (Figure 3—figure supplement 3) or the patient's regime of regular medication (Figure 3—figure supplement 4), and the relationship between risk taking and symptom severity held even when controlling for these factors (p<0.05 for symptom severity when regressing risk taking on symptom severity, age and either of the two medications; including both medications in the model lost significance for symptom severity, likely due to the large number of degrees of freedom for such a small sample size; age and medication did not achieve significance in any of the regressions).”

What were the model parameter ranges explored while fitting the data? Within-subject variance (Figure 3A) seems to indicate considerably more variance in the control group (though I understand this to be a subset of the dataset), so I am a bit surprised by the apparent lack of effect in the inverse temperature parameter.

We thank the reviewer for this important comment. The explored ranges of model parameters were 0-1 for both positive and negative learning rates (now -10 to 10 for the learning asymmetry parameter) and 0-30 for the inverse-temperature parameter. This information is now mentioned in our revised manuscript (Materials and methods; Modeling).

We chose to demonstrate changes in risk-taking policy (due to previous exposure and a degree of randomness captured by our model) by showing the behavior of only a few randomly chosen subjects. This was done for the sake of visual clarity and does not reflect the entire group. We note that in this figure, the within-subject variance reflects the combined effects of exposure to rewards in previous choices (these were not equated trial-by-trial across participants), learning rates (higher learning rates lead to a rapidly changing policy) and noisy behavior represented by the inverse temperature parameter.

Following the Reviewer’s comment, and in order to directly test the within-subject variability in risk-taking (probability of choosing risky cue) across trial bins (shown in Figure 3A) we calculated, for each individual separately, the standard deviation across bins. This value was 0.13 ± 0.07 for both groups.

