Abstract
The ability to learn about other people is crucial for human social functioning. Dopamine has been proposed to regulate the precision of beliefs, but direct behavioural evidence of this is lacking. In this study, we investigate how a high dose of the D2/D3 dopamine receptor antagonist sulpiride impacts learning about other people’s prosocial attitudes in a repeated Trust game. Using a Bayesian model of belief updating, we show that in a sample of 76 male participants sulpiride increases the volatility of beliefs, which leads to higher precision weights on prediction errors. This effect is driven by participants with genetically conferred higher dopamine availability (Taq1a polymorphism) and remains even after controlling for working memory performance. Higher precision weights are reflected in higher reciprocal behaviour in the repeated Trust game but not in single-round Trust games. Our data provide evidence that the D2 receptors are pivotal in regulating prediction error-driven belief updating in a social context.
Subject terms: Social neuroscience, Reward, Learning algorithms
Inferring other people’s intentions from their actions is essential for successful social engagement. Here, the authors show that in social contexts, dopamine D2 receptors are important in regulating uncertainty-driven belief updating.
Introduction
Knowing whom to trust with our money, information, or health is central to our personal well-being1. The ability to form beliefs about other persons’ attitudes from their actions is therefore pivotal for successfully navigating our social world. Inflexible beliefs, particularly about intentions of others, can lead to thoughts of persecution or even paranoid delusions—a hallmark symptom of psychotic disorders2–4. Understanding the neurocomputational substrates of social inference is therefore essential for informing pharmacological treatments of psychotic symptoms.
When learning whether to trust another person, we often do so by observing their behaviour across repeated interactions. How behaviours of others affect our overall beliefs about their trustworthiness largely depends on how certain we are about the attitudes that presumably drive others’ actions5. For instance, if we believe someone will be hostile or friendly towards us with high certainty, any gesture from them will not much change our belief about them. On the other hand, that same gesture from someone whose intentions we are unsure of will likely strongly shift what we think about them. This process of belief updating under uncertainty has been formalised within the Bayesian Inference framework, where beliefs are represented as probability distributions and the degree to which new information affects the updating of beliefs is modulated by the precision (the inverse of uncertainty) of those beliefs6. As in similar computational frameworks7, the belief update is proportional to the deviation of the prediction from the actual outcome, termed a prediction error (PE), weighted by the precision of prior beliefs. On top of this, new information also reduces uncertainty about the outcome. When prior beliefs are highly uncertain, the weight on the PE will be high and beliefs will be highly volatile. Conversely, if beliefs are held with high precision, this leads to a down-regulation of the influence of PE on learning and lowers belief volatility. Inflexibility in forming beliefs about others proportionally to their actions can result from high precision of prior beliefs about others’ attitudes. Yet, the neurocomputational and neurochemical mechanisms regulating the uncertainty of beliefs are poorly understood. In this study, we examined the effects of the antipsychotic drug sulpiride, a D2/D3 dopamine receptor antagonist, on the uncertainty of beliefs about another person’s trustworthiness.
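The precision-weighting described above can be illustrated with a minimal sketch of a single Gaussian belief update. This is a generic conjugate-Gaussian example and not the model fitted in this study; the function name `gaussian_belief_update` and the default observation precision of 1 are assumptions made for illustration only.

```python
def gaussian_belief_update(prior_mean, prior_precision, observation,
                           observation_precision=1.0):
    """One precision-weighted update of a Gaussian belief (illustrative only)."""
    # New information adds precision, i.e. it reduces uncertainty.
    posterior_precision = prior_precision + observation_precision
    # The weight on the prediction error is large when the prior is imprecise.
    weight = observation_precision / posterior_precision
    prediction_error = observation - prior_mean
    posterior_mean = prior_mean + weight * prediction_error
    return posterior_mean, posterior_precision, weight


# The same friendly gesture (observation = 1) shifts an uncertain belief
# far more than a belief that is held with high certainty.
certain_mean, _, certain_weight = gaussian_belief_update(0.0, 10.0, 1.0)
unsure_mean, _, unsure_weight = gaussian_belief_update(0.0, 0.1, 1.0)
```

In this sketch the weight on the PE equals the share of total precision contributed by the new observation, so a precise prior down-weights the PE and an imprecise prior up-weights it.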
Seminal studies in animals have established that mesolimbic dopaminergic circuits carry PE signals that drive belief updating in various contexts8–10. However, dopaminergic midbrain neurons have been shown to be involved in various probabilistic computations that go well beyond phasic signalling of surprising rewarding events. Dopamine responses scale with outcome variance11,12 and reflect temporal and perceptual precision13–15. Several computational accounts of brain function suggest that uncertainty or precision coding is the main unifying feature of dopamine’s involvement in belief updating16–19. Through encoding of uncertainty of beliefs about the world and what action to perform, dopamine receptors are thought to adjust the weights on PEs and control action selection20,21. But while there is evidence for the involvement of dopamine receptors in processing uncertainty in action selection22–25, evidence for their causal role in regulating the uncertainty of social beliefs and adjusting weights on PEs is lacking.
Dopamine receptors within the corticostriatal circuitry are ideally positioned to regulate PE-related signal propagation and encode precision26,27. One possible neurobiological substrate of precision is proposed to be the post-synaptic gain of neuronal populations reporting PEs, where synaptic gain refers to the amplification or attenuation of the pre-synaptic signal on the post-synaptic cell20,28. Post-synaptic D1 and D2 type dopamine receptors in the striatum have complementary effects on synaptic gain26,29. D1-like receptors increase the excitability of post-synaptic neurons, whereas D2-type attenuate signal propagation and decrease synaptic gain29. A prediction from this is that dopamine binding to D1 receptors would promote PE propagation and increase belief updating. In contrast, D2 receptor stimulation would reduce post-synaptic responses and attenuate changes in beliefs, leading to belief rigidity30,31. In line with this reasoning, when learning about others, blocking D2 receptors should increase the volatility (or rate of change) of beliefs.
Although some studies indeed showed that blocking D2-type receptors enhanced learning from positive feedback32, led to pronounced PE-related activity in the striatum33, and enhanced performance34,35, there is also evidence for attenuated PE coding and greater variability in choice selection25,36,37. The inconsistencies of these findings raise several important considerations. First, when alternative choices are available, it is often unclear whether increased switching between available choice options arises from drug effects on belief updating per se or from the effects on decision-making strategies (see for instance38). Second, D2 dopamine receptors have a higher affinity for dopamine39 and doses of D2 antagonists commonly used in studies with healthy participants might not be sufficiently high to block the D2 receptor driven regulation of the PE signal40. Third, administration of compounds binding to dopamine receptors can have different and even opposing effects on learning and decision-making, depending on genetic variation in baseline dopamine function23,41,42. Fourth, integrating novel information with prior experiences also relies on working memory43, which is also affected by drugs that target dopamine receptors44. And finally, beyond the methodological limitations of previous work, most studies with dopamine receptor antagonists have examined learning about abstract stimulus-outcome associations using secondary rewards, which makes the translation to more complex social interactions questionable.
In light of these considerations, the present study administered a relatively high dose of the selective D2/D3 receptor antagonist sulpiride (800 mg) or placebo in a randomised double-blind parallel group design to 78 male participants, preselected based on their Taq1a polymorphism. The drug dose was chosen to maximise the blockade of postsynaptic dopamine D2 receptors while still being safe45. Most previous work used doses of 400 mg which leads to an occupation of ~30% of D2 receptors46. Using 800 mg leads to more than 60% occupancy and increases the likelihood of sufficiently blocking the effect of D2 receptors. Furthermore, as mentioned above, the effect of D2 antagonists often interacts with baseline variation on dopamine function42,47. Taq1a polymorphism is one of the most widely investigated genetic variations of the D2 receptor. Individuals with at least one A1 minor allele have been shown to have higher presynaptic dopamine availability48 and reduced D2 receptor density in some subdivisions of the striatum49,50. Blocking D2 receptors might therefore have a stronger effect on belief updating in that genetic subgroup.
We investigated social learning by asking the participants to learn about other players’ trustworthiness through a repeated Trust game (Fig. 1a). In the Trust game the investor may choose to transfer any portion of their monetary endowment to the trustee51. The transferred points are then multiplied by the experimenter before being passed on to the trustee. The trustee can then either reciprocate in a way that equalises the payoff of the two players or betray and keep everything. Participants in our study played 25 rounds of the Trust game as investors against two other players that were preprogrammed to mostly equalise (“good trustee”) or mostly betray (“bad trustee”). Importantly, we told the participants that the other players had given their answers weeks before the study day. Therefore, the trustees’ decision to equalise or betray did not depend on the participant’s investment. With this procedure, we increased the likelihood that participants’ investments reflected the degree of uncertainty they had about the other player’s response and were not confounded by strategic considerations or exploratory action policies. By asking the participants to learn about a stable feature, we also ensured that participants’ behaviour did not reflect differences in beliefs about task volatility (the likelihood that the other person changed their mind), which might have obscured more basic processes related to forming beliefs about others.
The main goal of the study was to test the hypothesis that blocking D2-type receptors increases belief updates by reducing the precision of beliefs about others, whereby we also hypothesised that this effect would be more pronounced in participants with genetically conferred higher endogenous dopamine levels. The Results section of the paper is structured as follows: we first looked at how sulpiride affected investment updates and how this effect was moderated by the Taq1a genotype. We then examined how the updates related to the back-transfer of the trustee, by examining the effects of the drug and genotype on reciprocal behaviour. We then turned to computational modelling to determine how sulpiride affects the course of each participant’s uncertainty around the other player’s trustworthiness. To evaluate to what degree the effects of sulpiride were due to actions on working memory we included data on spatial working memory performance into our parameter estimation process. Finally, to control for effects of sulpiride on sensitivity to social feedback unrelated to learning, we surveyed data from two single-round social interaction tasks, targeting positive and negative reciprocal behaviour.
With this we show that sulpiride has a profound effect on how healthy males update their investments: participants under sulpiride changed their investments more from one trial to the next, and this effect is driven by increased uncertainty around the other player’s actions. Results from both behavioural analysis and computational modelling support the claim that the drug effect on belief updating was more pronounced in participants with higher endogenous striatal dopamine levels (indexed by the Taq1a polymorphism).
Results
D2/D3 receptor antagonism increases investment updates
We employed a Bayesian multi-level linear model predicting absolute change in investment from the previous trial, including variables for Treatment (sulpiride or placebo), Trial and their interaction as predictors (refer to supplementary material for outcomes of alternative models). Figure 1b shows that, following sulpiride administration, participants on average updated their investments more than participants in the placebo group (b = 0.633, 95% Credibility Interval (CrI) [0.117, 1.115], proportion of the posterior distribution of the regression coefficient below 0 being P(b < 0) = 0.005), with an effect size d = 0.239 (95% CrI [0.045, 0.42]). The difference in investment updates was most apparent in the last trial of the task (b = 0.863, 95% CrI [0.289, 1.411], P(b < 0) = 0.002, d = 0.325, 95% CrI [0.109, 0.531]) and we also found a small effect size on the Trial × Treatment interaction (b = 0.457, 95% CrI [−0.069, 0.99], P(b < 0) = 0.047, d = 0.172, 95% CrI [−0.026, 0.373]). As participants learned about the trustees, trial-to-trial changes in investment became smaller, and this decrease over time was less pronounced in the sulpiride group.
To examine whether the effects of the drug were moderated by the Taq1a polymorphism we ran another model including a variable for Taq1a-specific genotype and Trustee as predictors with the four-way interaction between the two new variables, Treatment and Trial, including a random intercept and slope for the Trustee (Supplementary Fig. 1a, Supplementary Table 4). Again, we found a main effect of treatment (b = 0.595, 95% CrI [0.112, 1.098], P(b < 0) = 0.008), and a significant three-way interaction between Treatment, Genotype and Trial number (b = 0.053, 95% CrI [0.01, 0.098], P(b < 0) = 0.007), but found no credible evidence of a two-way interaction Treatment × Genotype (b = −0.284, 95% CrI [−1.266, 0.708], P(b > 0) = 0.287). These analyses suggest that on average sulpiride affected investment updates comparably across both genotype groups, but in contrast to the A2 homozygotes, the effect in the A1+ group was time dependent.
D2/D3 receptor antagonism increases sensitivity to social feedback in the A1+ group in the repeated trust game
To further understand how investment updates related to back-transfer from the trustee, we defined reciprocal trials as trials where participants either increased investments (or repeated the maximal investment of 10 points) following positive feedback or decreased investments (or repeated an investment of 0 points) following a betrayal (Fig. 1c, for exact definition see Supplementary Note 2). We found that sulpiride led to a higher proportion of reciprocal trials (= 0.339, 95% CrI [0.048, 0.661], P( < 0) = 0.012). This effect was significant in the A1+ group (= 0.469, 95% CrI [0.052, 0.914], P( < 0) = 0.015) but we found no credible evidence for an effect in the A1- group (= 0.209, 95% CrI [−0.212, 0.643], P( < 0) = 0.162); however, we also found no credible evidence that there was a difference of drug effects between the two genotype groups ( = −0.263, 95% CrI [−0.867, 0.329], P( > 0) = 0.186). Furthermore, we found some support for a dose-response effect, whereby sulpiride serum levels in the blood correlated with reciprocal trials in the A1+ group (b = 0.185, 95% CrI [−0.04, 0.41], P(r < 0) = 0.05), but found no credible evidence for a correlation in the A1- group (Supplementary Table 1). Similar, albeit weaker, effects were found when we examined to what extent the signed investment update was dependent on the back-transfer and how this differed across the drug and genotype groups (Supplementary Fig. 1b).
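As an illustration of the definition of reciprocal trials given above, the following sketch classifies trials from a sequence of investments and trustee feedback. It follows only the verbal definition in the text; the exact definition is in the Supplementary Notes and may differ in detail. The function names and the example data are hypothetical.

```python
def is_reciprocal(prev_investment, investment, prev_feedback_positive,
                  max_investment=10):
    """Classify one trial using the verbal definition (illustrative only)."""
    if prev_feedback_positive:
        # Increase after positive feedback, or stay at the maximum.
        return (investment > prev_investment
                or investment == prev_investment == max_investment)
    # Decrease after a betrayal, or stay at zero.
    return (investment < prev_investment
            or investment == prev_investment == 0)


def proportion_reciprocal(investments, feedback_positive):
    """Share of reciprocal trials, counted from the second trial onwards."""
    flags = [
        is_reciprocal(investments[t - 1], investments[t], feedback_positive[t - 1])
        for t in range(1, len(investments))
    ]
    return sum(flags) / len(flags)


# Hypothetical sequence: investments (0-10 points) and trustee feedback
# (True = equalised, False = betrayed).
example_investments = [5, 7, 10, 10, 4, 6, 0]
example_feedback = [True, True, True, False, False, False, True]
example_share = proportion_reciprocal(example_investments, example_feedback)
```

In the hypothetical sequence, only the increase from 4 to 6 points after a betrayal fails the criterion, so five of the six classified trials count as reciprocal.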
No credible evidence of an effect of D2/D3 receptor antagonism on average investment behaviour or overall performance
Next, we investigated whether this higher change of investments from one trial to the next is reflected in average investment patterns (Fig. 2a, for more detailed plots see Supplementary Fig. 2). Using an ordinal logistic model predicting investments from Treatment, Genotype, Trustee and Trial variables, with a random slope for Trustee and Trial we found no credible evidence of a difference between sulpiride and placebo on average investment behaviour either in the A1+ group ( = −0.011, 95% CrI [−1.95, 1.796], P( > 0) = 0.494, = 0.089, 95% CrI [−2.256, 2.47], P( < 0) = 0.47), nor in the A1- group ( = −0.496, 95% CrI [−2.353, 1.367], P( > 0) = 0.297, = 0.821, 95% CrI [−1.508, 3.257], P( < 0) = 0.244). We also found no credible evidence of a difference in initial investments across the four drug and genotype groups (Supplementary Fig. 2). The overall initial investment was estimated to be, on average, 6.33 (95% CrI [3.129, 9.687]), suggesting that most participants expected a positive back-transfer initially. In line with this, the slope when playing against the good trustee was positive (b = 0.086, 95% CrI [−0.008, 0.187], P(b < 0) = 0.036), but not as pronounced as the slope when playing against the bad trustee (b = −0.153, 95% CrI [−0.284, −0.027], P(b > 0) = 0.009). While we found no credible evidence of a difference between slopes across the drug groups in the A1+ participants ( = −0.058, 95% CrI [−0.184, 0.065], P( > 0) = 0.179, = 0.044, 95% CrI [−0.125, 0.21], P( < 0) = 0.297), we did observe an increase in the slope following sulpiride administration in the A1- group when playing against the bad trustee ( = 0.159, 95% CrI [−0.007, 0.336], P( < 0) = 0.03) but not when playing against the good trustee ( = 0.069, 95% CrI [−0.06, 0.19], P( < 0) = 0.14). Similarly, we found no credible evidence of a difference across drug and genotype groups regarding how many points they earned when playing against either trustee (Fig. 2b, Supplementary Table 13).
In summary, we found no credible effect of sulpiride on investing behaviour on average, but we do find some evidence in support of sulpiride increasing sensitivity to social feedback when learning about others. To determine whether and how this behavioural pattern relates to the uncertainty of participants’ beliefs about the other persons’ trustworthiness, we explicitly modelled the participants’ trial-by-trial evolution of beliefs with a Bayesian belief model.
Computational framework
The belief model uses a hierarchical Gaussian filter (HGF) to generate trial-wise sequences of participants’ beliefs about the trustworthiness of two trustees as well as the uncertainty (or precision) surrounding those beliefs (Fig. 3)6,52. We estimated a participant-specific parameter, called belief volatility, that describes how each participant’s precision of beliefs evolved over time and consequently determined the relative rigidity (or malleability) of beliefs. More specifically, on each trial, we approximate the latent belief about the trustworthiness of the other player as a Gaussian distribution with a specific mean and variance. Higher belief volatility implies higher variance (or lower precision) of trial-by-trial belief estimates. Importantly, the dynamic learning rate on the PE is proportional to the expected variance or inversely proportional to the precision of beliefs and is therefore referred to as a “precision-weight”. Low precision of prior beliefs leads to higher precision-weighted learning rates and stronger shifts in beliefs throughout the task (see two example belief trajectories with different belief volatility values in Fig. 3b).
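To make the relation between belief volatility and the precision-weight concrete, the following is a minimal sketch of a belief trajectory under binary feedback. It assumes the standard second-level update equations of a binary HGF with a fixed volatility parameter (here named `omega`); the exact equations, priors and parameterisation used in this study are given in the Methods and may differ. The starting values `mu0` and `sigma0` and the example outcome sequence are arbitrary illustrative choices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hgf_binary_trajectory(outcomes, omega, mu0=0.0, sigma0=1.0):
    """Simplified binary HGF belief updates with fixed volatility (sketch)."""
    mu, sigma = mu0, sigma0          # belief mean (logit scale) and variance
    means, learning_rates = [], []
    for u in outcomes:               # u = 1 (reciprocated) or 0 (betrayed)
        # Prediction: belief variance grows by exp(omega) between trials.
        sigma_hat = sigma + math.exp(omega)
        p_hat = sigmoid(mu)          # predicted probability of reciprocation
        # Update: posterior precision, then precision-weighted PE update.
        posterior_precision = 1.0 / sigma_hat + p_hat * (1.0 - p_hat)
        sigma = 1.0 / posterior_precision
        prediction_error = u - p_hat
        mu = mu + sigma * prediction_error   # sigma acts as the learning rate
        means.append(mu)
        learning_rates.append(sigma)
    return means, learning_rates


# Same feedback sequence, two assumed volatility values.
feedback = [1, 1, 0, 1, 1, 1, 0, 1]
_, rates_high = hgf_binary_trajectory(feedback, omega=-1.0)
_, rates_low = hgf_binary_trajectory(feedback, omega=-4.0)
```

Under these assumptions, the higher volatility value keeps the belief variance, and therefore the weight on each PE, larger throughout the sequence, which is the qualitative pattern described above.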
The beliefs about trustworthiness are mapped onto the probability of positive or negative feedback with an inverse logistic function. Because D2 receptor activity is linked to choice uncertainty and action variability22,25, we also included a choice precision parameter that determined the non-linear mapping from beliefs to the investments. Higher choice precision implies an investment distribution centred around extremes (i.e. investing 0 and 10), and lower values imply a more dispersed investment distribution and more uncertainty or stochasticity in action selection. It thus mirrors the stochastic aspect of the inverse temperature parameter in the softmax equation often used in non-ordinal (e.g. binary) choice tasks. Finally, how beliefs about the probability of a positive back-transfer affect investment behaviour is determined through an ordinal logistic likelihood function. The degree to which inferred trustworthiness correlates with investments is determined by another parameter called the trustworthiness slope. Crucially, the computational parameters of the model represent distinct behavioural patterns and can be recovered reliably (Fig. 3c). To determine how noisy trials are represented in the model, we defined mistake trials as trials where participants either decreased their investment after a positive back-transfer or increased their investment after a betrayal (for exact definition see Supplementary Note 1). Importantly, we observed that belief volatility correlates with reciprocity (r = 0.277, t = 2.476, df = 74, p = 0.016, Fig. 3d), confirming that higher trial-by-trial uncertainty of beliefs leads to a higher chance of reciprocal behaviour. The log-transformed choice precision parameter correlates negatively with the proportion of mistake trials (r = −0.592, t = −6.3254, df = 74, p < 0.001, Fig. 3d), implying that lower choice precision reflects higher randomness in investment selection. We also generated predictions from the posterior distributions of the parameters and confirmed that the model captures the crucial aspects of behaviour (Fig. 3e, f). Average beliefs about the other player’s trustworthiness, grouped for each trustee, are plotted in Fig. 3g.
We compared this model to an HGF model without the choice precision parameter and a Rescorla-Wagner (RW) model. The HGF model without the choice precision parameter has been used previously when modelling both social53 and non-social54 learning. In this model the non-linear mapping from beliefs to probabilities is modulated by a coupling parameter (see Methods for details). The RW model is a simple Q-learning model with separate static learning rates for gains (positive outcomes) and for losses (negative outcomes). All models used the same ordinal-logistic likelihood function and were compared based on their trial-by-trial predictive accuracy through leave-one-out cross-validation information criterion (LOOIC) and expected log predictive density (ELPD). We found that the HGF model with the choice precision parameter outperforms both models (Supplementary Fig. 3a). We also compared the models across trials and across trustees with the LOOIC and by looking at the correlations of predicted investments with actual behaviour (Supplementary Fig. 3b, c). Interestingly, performance of both models varies similarly across trials with the HGF performing better across the whole task, particularly for investments against the good trustee.
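For comparison with the dynamic learning rate of the HGF, a Rescorla-Wagner update with separate static learning rates for gains and losses can be sketched as follows. This is a generic textbook form, not the fitted model: the parameter names `alpha_gain` and `alpha_loss`, the starting value of 0.5 and the example inputs are assumptions, and the ordinal-logistic likelihood that links values to investments is omitted.

```python
def rescorla_wagner(outcomes, alpha_gain, alpha_loss, initial_value=0.5):
    """Value trajectory with outcome-specific static learning rates (sketch)."""
    value = initial_value
    values = []
    for outcome in outcomes:         # 1 = positive back-transfer, 0 = betrayal
        prediction_error = outcome - value
        # Static learning rate chosen by outcome type, not by uncertainty.
        alpha = alpha_gain if outcome == 1 else alpha_loss
        value = value + alpha * prediction_error
        values.append(value)
    return values


# Hypothetical learner that updates faster after gains than after losses.
example_values = rescorla_wagner([1, 0], alpha_gain=0.5, alpha_loss=0.1)
```

Unlike the precision-weight in the HGF, the learning rates here do not change over trials, so any drug effect can only appear as a shift in these fixed rates.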
D2/D3 receptor antagonism increases belief volatility
For parameter estimation, we embedded the HGF derived equations in a hierarchical Bayesian model which allowed us to estimate the drug and genotype effects on all computational parameters in one inferential step55,56. Through this analysis, we found a main effect of sulpiride on volatility of beliefs (b = 0.831, 95% CrI [0.115, 1.533], P(b < 0) = 0.01, d = 0.65, 95% CrI [0.088, 1.283], Fig. 4a), and an interaction effect of sulpiride with the genotype (b = −1.506, 95% CrI [−2.649, −0.411], P(b > 0) = 0.004, d = −1.175, 95% CrI [−2.238, −0.306]). In fact, the effect of sulpiride on belief volatility is driven by the A1+ allele carriers (b = 1.598, 95% CrI [0.727, 2.465], d = 1.25, 95% CrI [0.533, 2.119]) while we found no credible evidence of an effect in the A1- group (b = 0.076, 95% CrI [−0.874, 0.985], d = 0.06, 95% CrI [−0.683, 0.783]).
The key consequence of higher belief volatility is that it leads to lower precision of prior beliefs and therefore of predictions, which has a direct effect on the learning rates. Indeed, we found credible evidence that participants under sulpiride have higher average precision-weights (d = 0.452, 95% CrI [0.081, 0.704], P(d < 0) = 0.008, Fig. 4b), particularly in the A1+ group (d = 1.042, 95% CrI [0.225, 1.424], P(d < 0) = 0.003), but little credible evidence for an effect in the A2 homozygotes (d = −0.202, 95% CrI [−0.482, 0.103], P(d > 0) = 0.089) with a significant interaction effect (d = 1.244, 95% CrI [0.335, 1.714], P(d < 0) = 0.001). Importantly, in the A1+ group, this effect of sulpiride on precision-weighting correlated with sulpiride serum levels in the blood (b = 0.356, 95% CrI [0.045, 0.663], P(b < 0) = 0.013, Fig. 4c).
Looking at potential asymmetries when dealing with uncertainty around beliefs about trustworthy or untrustworthy partners, we find that, on average, beliefs about the bad trustee were more volatile (d = 0.412, 95% CrI [0.031, 0.827], P(d < 0) = 0.018). When examining the drug effects, we observed that in the A1+ group, the difference in belief volatility between placebo and sulpiride is apparent in interactions with both trustees (dbad = 1.62, 95% CrI [0.731, 2.644], P(dbad < 0) < 0.001, dgood = 0.689, 95% CrI [−0.089, 1.575], P(dgood < 0) = 0.042), but is higher for the bad trustee (dgood-bad = −0.923, 95% CrI [−1.754, −0.121], P(dgood-bad > 0) = 0.01, Supplementary Fig. 4b, c). Interestingly, this analysis also showed that in the A1- group, there is a significant interaction of sulpiride and trustee effects (dgood-bad = −1.453, 95% CrI [−2.529, −0.51], P(dgood-bad > 0) = 0.001, Supplementary Fig. 4b, c), whereby we find little credible evidence that the effects of sulpiride on belief volatility are higher for the bad trustee (dbad = 0.711, 95% CrI [−0.21, 1.669], P(dbad < 0) = 0.066), and even negative for the good trustee (dgood = −0.742, 95% CrI [−1.74, 0.091], P(dgood > 0) = 0.042). At this point, we also note that our model suggests that participants expected the trustee to reciprocate (d = 0.717, 95% CrI [0.406, 1.023], P(d < 0) = 0.001) with initial inferred probability of reciprocation being 0.67 (95% CrI [0.60, 0.73]). However, we found no credible evidence of a difference between treatment groups in initial beliefs about the trustworthiness either overall (d = −0.166, 95% CrI [−0.697, 0.37], P(b > 0) = 0.272, Supplementary Fig. 4), in the A1+ group (d = 0.111, 95% CrI [−0.585, 0.802], P(d < 0) = 0.383), or in the A1- group (d = −0.441, 95% CrI [−1.164, 0.287], P(d > 0) = 0.112).
We then compared the results from the HGF model to those of the RW model (Supplementary Fig. 5). The evidence points in the same direction, with some evidence for sulpiride leading to higher learning rates overall (d = 0.315, 95% CrI [−0.087, 0.729], P(d < 0) = 0.064, Supplementary Fig. 5a), an effect driven by the A1+ participants (d = 0.593, 95% CrI [0.03, 1.16], P(d < 0) = 0.02), with no credible evidence of an effect in the A1- participants (d = 0.037, 95% CrI [−0.524, 0.605], P(d < 0) = 0.453), and little credible evidence of a difference between the effects in the two genotype groups (d = −0.553, 95% CrI [−1.343, 0.235], P(d > 0) = 0.079). Further, the effect of the drug in the A1+ group was observed both when learning about positive outcomes (d = 0.697, 95% CrI [0.029, 1.369], P(d < 0) = 0.021, Supplementary Fig. 5b) and when learning about negative outcomes (d = 0.488, 95% CrI [−0.059, 1.044], P(d < 0) = 0.039, Supplementary Fig. 5c). However, the difference across the two types of learning rates was not as pronounced as in the HGF model.
D2/D3 receptor antagonism increases choice uncertainty
In addition to the effect on belief volatility, sulpiride also increases choice uncertainty by decreasing the choice precision parameter (b = −1.049, 95% CrI [−1.6, −0.502], P(b < 0) < 0.001, d = −0.979, 95% CrI [−1.535, −0.455], Fig. 5a), with smaller effects in the A1+ group (b = −0.646, 95% CrI [−1.272, −0.033], P(x > 0) = 0.02, d = −0.608, 95% CrI [−1.206, −0.031]) and more prominent effects in the A2 group (b = −1.44, 95% CrI [−2.261, −0.639], P(b < 0) < 0.001, d = −1.351, 95% CrI [−2.133, −0.601]). Since lower values of choice precision correlated with a higher proportion of mistake trials, we examined how sulpiride affected the proportion of mistake trials and found that it on average increased the number of mistake trials (blogodds = 1.172, 95% CrI [0.443, 1.992], P(blogodds < 0) < 0.001, Fig. 5b), an effect driven by the A1- group (blogodds = 1.876, 95% CrI [0.781, 3.032], P(blogodds < 0) < 0.001) with no credible evidence for an effect in the A1+ group (blogodds = 0.468, 95% CrI [−0.535, 1.537], P(blogodds < 0) = 0.184), and some evidence for an interaction effect (blogodds = 0.885, 95% CrI [−0.041, 1.857], P(blogodds < 0) = 0.03). The effect of sulpiride on the proportion of mistake trials in the A1- group was proportional to the blood serum levels (blogodds = 0.607, 95% CrI [0.089, 1.142], P(blogodds < 0) = 0.011) with no credible evidence of a correlation in the A1+ group (blogodds = −0.328, 95% CrI [−0.818, 0.143], P(blogodds >0) = 0.085). This parameter also determines the skew in the distribution of investments, whereby higher values make extreme investments more likely (Fig. 5c, d). From the perspective of an expected-utility-maximising agent, extreme investments are optimal (Supplementary Note 2). Individuals with higher choice precision therefore behave more like rational agents and take the uncertainty of the outcome less into consideration when choosing investments.
Sulpiride also increased the trustworthiness slope parameter (b = 1.459, 95% CrI [0.532, 2.42], P(b < 0) < 0.001, d = 0.941, 95% CrI [0.331, 1.58]), further supporting the assertion that sulpiride increased the degree to which beliefs about trustworthiness influenced participants’ investments. In sum, the overall results from the computational modelling suggest that sulpiride treatment led to higher choice uncertainty (lower choice precision), which was related to increased mistakes in the A1− group specifically. We found strong support for sulpiride increasing belief volatility and precision-weights on PEs, an effect that was driven by the A1+ group, whereas we found no credible evidence of an effect in the A1− group.
Effects of sulpiride on belief updating remain after accounting for working memory performance
In the repeated Trust game, participants must remember the trustees’ responses to previous trials. Higher choice stochasticity could therefore be due to poorer working memory. Furthermore, the inability to remember outcomes of past trials might increase the reliance on the previous trial and thereby cause increased learning rates and belief volatility. To determine to what degree our findings were influenced by the possible effects of sulpiride on working memory, we included data from a spatial working memory (WM) task performed in the same sample and previously published57. In the spatial WM task, participants uncover ‘tokens’ from sets of boxes, whereby they need to remember which boxes were previously searched and what the outcomes of those searches were (see Methods for details). As Naef et al. report57, sulpiride had a detrimental effect on working memory performance, whereby participants in both genotype groups performed more errors (opened boxes previously already opened) in more challenging task trials (trials with 10 or 12 boxes).
In the present study, we first investigated whether the model parameters are influenced by WM performance. To do so, we re-estimated the hierarchical model that included only WM data at the group level without the drug and genotype variables. As can be seen from Fig. 6a, the belief volatility parameter was associated with a higher number of errors in the WM task (d = 0.658, 95% CrI [0.334, 1.01], P(d < 0) < 0.001), and the choice precision parameter negatively correlated with the number of errors (d = −0.811, 95% CrI [−1.087, −0.541], P(d > 0) < 0.001). This implies that poorer working memory performance is related to higher choice and belief uncertainty. Importantly, however, when plotting the residuals of the model parameters (unexplained variance after accounting for WM effects), the impact of sulpiride on belief volatility in the A1+ group is still present (Fig. 6b). To obtain posterior distributions of drug and genotype effects after accounting for WM data, we re-estimated the parameters of the model, this time including WM data as well as drug and genotype variables. We found that including WM information in the hierarchical model only slightly changed the inference about the effect of sulpiride on belief updating. The main effect of sulpiride on belief volatility was now somewhat less certain, with the 95% CrI including values below 0 (d = 0.56, 95% CrI [−0.052, 1.211], P(d < 0) = 0.036), but the effect in the A1+ group was still present (b = 0.852, 95% CrI [0.116, 1.61], P(b < 0) = 0.014, d = 0.694, 95% CrI [0.093, 1.369]). Similarly, posterior intervals of sulpiride effects on choice precision after including WM data were comparable to those without WM data, with the main effect remaining negative (d = −1.034, 95% CrI [−1.634, −0.461], P(d > 0) < 0.001); the evidence for the effect was substantial in the A1− group (d = −1.562, 95% CrI [−2.423, −0.722], P(d > 0) < 0.001) and less so in the A1+ group (d = −0.512, 95% CrI [−1.168, 0.133], P(d > 0) = 0.056).
An important final step was to exclude the possibility that this increase in updating was due to increased sensitivity to social feedback in general, or due to decreased desire to maximise outcomes. For this, we turned to data from single-round social interaction games that measure learning-independent positive and negative reciprocity.
No credible evidence for an effect of D2/D3 receptor antagonism on single-round reciprocal behaviour
In the single-round interaction games, the participants played slightly modified versions of the Trust game. In the positive reciprocity game, they played the trustee and could reward the investor for their decision (Fig. 7a). In the negative reciprocity game, they played as investor and could punish the trustee (Fig. 7b). We found no credible evidence of a difference between sulpiride and placebo, either in the amount of reward (Back-transfer) in the positive reciprocity game (b = −0.023, 95% CrI [−6.605, 6.263], d = 0.000, 95% CrI [−0.032, 0.03], Fig. 7c) or in punishing behaviour in the negative reciprocity game (b = 1.552, 95% CrI [−0.903, 3.98], P(x < 0) = 0.106, d = 0.2, 95% CrI [−0.114, 0.513], Fig. 7d). This implies that the drug effect on reciprocal behaviour in the Repeated Trust Game was not due to higher sensitivity to social feedback, or to less rational behaviour.
Discussion
Inferring attitudes of others is fundamental to our social functioning, but the neurocomputational mechanisms of the updating of beliefs about others are not well understood. We show that blocking D2/D3 dopamine receptors by sulpiride has a profound effect on how healthy participants process uncertainty in a social context. When playing as investors in the Repeated Trust Game, participants given a high dose of this D2/D3 receptor antagonist changed their investment more from one trial to the next. Using a hierarchical Gaussian filter to explicitly model the evolution of participants’ beliefs about the trustworthiness of the trustees, we show that sulpiride increased belief volatility. This implies that for the participants under sulpiride, the beliefs about the trustworthiness of others were held with less precision (i.e. with higher uncertainty), which in turn caused increased precision weights on PEs. This effect was more pronounced in participants with at least one minor A1 allele of the Taq1a polymorphism, associated with higher endogenous striatal dopamine levels. The increase in precision weights on PEs in that genetic subgroup scaled with the sulpiride serum levels in the blood. As a consequence, sulpiride led to higher reciprocal behaviour (increased investment after positive back-transfer and decreased investment after negative back-transfer), but only in the repeated Trust game, whereas we found no credible evidence for an effect in single-round interactions. The effect on the repeated Trust game was present even after controlling for working memory performance. Moreover, sulpiride decreased the value of the parameter of the model that codes for deterministic action selection policies (choice precision), implying higher uncertainty about investment selection. The effect was present in both genotype groups.
On the neurophysiological level, it has been proposed that precision is encoded through the post-synaptic gain (i.e. amplification or blunting of presynaptic neuronal input) of neurons that propagate PE signals28. Our results are in accordance with the idea that dopamine binding to D1 receptors of the medium spiny neurons in the striatum increases the gain on PE signals, while binding to D2 receptors decreases gain through disinhibition of the so called indirect pathway26,29. Within this framework increased precision-weights following D2 antagonism can be explained by more dopamine being available to bind to D1-like receptors, a claim that is further substantiated by the observation that the effect of sulpiride was stronger in participants with genetically conferred higher presynaptic dopamine availability and lower D2 receptor density. These findings extend previous studies highlighting the role of dopamine receptors in coding precision or uncertainty in various contexts, such as perceptual and risk-based decision making24,31,58. In particular, previous work has shown that sulpiride decreased the perceived precision of temporal expectations59. In a task where participants were explicitly told about the variance of outcomes, they adapted their behaviour accordingly, which led to more optimal choice performance60. This behavioural pattern was accompanied by adaptive PE signals in the midbrain and the ventral striatum. Under 600 mg of sulpiride, both the PE scaling as well as the adaptive PE coding in the midbrain and partially in the striatum were reduced61. This suggests that D2 receptors likely play a general role in uncertainty coding across various task modalities and contexts.
Our finding that blocking D2/D3 receptors increases learning rates may seem to be at odds with previous work showing that D2/D3 antagonists reduced performance in other learning tasks and attenuated prediction error signals in the striatum36,37 as well as with studies showing no effect of D2/D3 antagonism on learning rates34,37, even when using similar computational frameworks33,62. It is thus important to note that A1 is a minor allele of the Taq1a polymorphism, meaning that in most other studies participants were likely predominantly A2 homozygotes. We observed a more general effect of D2/D3 receptor antagonism on choice uncertainty that was more prominent in A2 homozygotes and was related to a higher number of mistake trials in that subgroup of participants, although the number of mistakes was not high enough to reduce investment on average. Furthermore, participants could invest on an ordinal 11-point scale, which allowed us to capture smaller belief shifts that might either be missed in learning tasks with categorical choice options or be attributed to a different choice selection policy. For example, the participants in our study also performed a standard probabilistic two-bandit task afterwards, where participants in the A1+ group under sulpiride compared to placebo continued to switch between choice options, which was explained by increased choice stochasticity, parametrised through the soft-max decision temperature25. Further, it is also plausible that the processing of uncertainty in a social context is different from that in a non-social context. People might be inherently more motivated to reduce uncertainty about others, so that they can (for instance) classify them more definitely as being a friend or foe5. For example, in one study, stress increased the choice to gamble in a non-social context but decreased the likelihood to invest in one-shot Trust games63.
Furthermore, patients with basolateral amygdala damage show markedly impaired belief updating in a repeated Trust game, but seem to have no trouble learning about non-social rewards through a task matched in difficulty and reward size64. It is therefore plausible that the results we found are specific to the social context and might not translate to learning about non-social cues.
One fundamental distinction that separates risky decision-making under social compared to non-social conditions is an aversion to betrayal65. People are less risk-taking in social interactions and might be particularly sensitive to indications of untrustworthy interaction partners66. Using a similar model to ours, previous work has shown that belief volatility was higher when assessing (morally) bad agents53, an effect that was present in our data as well, with participants having higher belief volatility when playing against the bad trustee. Although the drug effects on belief volatility were present across both trustees, the effects were stronger for the bad trustee. One reason for this could be that because participants initially expected higher trustworthiness and higher rates of positive back transfers, there was more to learn when playing against the bad trustee and, therefore, more variance across investment behaviour. This asymmetric increase in sensitivity to negative outcomes would also be in line with the notion that D1 and D2 receptors in the striatum contribute to positive and negative outcome processing via the “Go” and “No-Go” pathways, respectively32,67. According to this circuit model, D2 antagonism mimics the dopaminergic ‘dip’ that occurs following negative reinforcement and therefore enhances learning to avoid actions with a negative outcome. However, contrary to what we observed, this model also predicts that blocking post-synaptic D2 receptors should decrease positive prediction error propagation.
When interpreting our findings within the Go/No-Go framework, it should be noted that in the repeated Trust game in this study, participants have no agency over the valence of the outcome (positive or negative back-transfer), and investments are possible only on an ordinal scale. Similarly, the RW model we used in our study should also be interpreted with this in mind. Mirroring the effects in the HGF model, sulpiride increased learning rates in the RW model for both positive and negative outcomes. In multi-arm bandit tasks or Go/No-Go tasks where the reinforcement learning framework is often used to explain choice selection, the learning rate reflects the “Law of Effect” whereby actions that lead to positive (negative) outcomes are more (less) likely to get repeated68. A positive outcome following a specific investment choice in the repeated Trust game will lead to higher investment (if possible). A higher learning rate simply reflects the change in the expected response of the trustee. It is therefore related to the degree to which beliefs about trustworthiness change (on average across trials). Generally, the crucial distinction between RW and Bayesian models is that the latter assumes that agents consider the uncertainty of outcomes when updating beliefs. With this, Bayesian models such as the HGF or the Kalman filter can account for phenomena where non-Bayesian reinforcement learning models fail, such as latent inhibition and sensory preconditioning18,69. Given the relative increase in the overall and trial-by-trial predictive performance of the HGF model that includes choice uncertainty over the RW, our data support the notion that uncertainty about the outcomes and which actions to take affects choice behaviour in the repeated Trust game.
One important factor that could confound increased belief volatility and learning rates following sulpiride administration is working memory. Previous work has shown that individual differences in working memory capacity contribute to behavioural variability in reinforcement learning tasks43,70 whereby decreased memory capacity might lead to a higher salience of more recent outcomes and therefore higher learning rates71. We also find support for this notion in our data, whereby poorer WM performance was strongly linked to higher belief volatility and higher choice uncertainty. However, despite sulpiride decreasing WM capacity in our cohort, including WM data in the model did not affect inference about sulpiride’s effect on belief volatility, nor on choice stochasticity. This increase in choice uncertainty or stochasticity under sulpiride is therefore likely not due to failures in working memory capacity. Instead, it could have been due to participants being less motivated to maximise outcomes, and therefore less likely to behave as a rational “homo economicus”72. Were that the case in our study, one would expect a different behavioural pattern in the single-shot Trust games. Participants under sulpiride should behave less as rational agents and therefore would be less likely to punish betrayals and reward trusting behaviour73. Note that D2 receptors generally do play a role in motivation. For example, optogenetic excitation and inhibition of D2 receptors in the ventral striatum of rats is reported to respectively increase and decrease motivation74. It is possible that the increased action variability resulted from reduced motivation, or increased noise in belief updating and not in choice selection per se75.
What speaks against this interpretation is that the overall performance in the task was not reduced following sulpiride administration for either of the genetic subgroups, suggesting that the investment selection under sulpiride was not random and instead reflected uncertainty about which investment to choose when interacting with the other player.
Indeed, variability in investment selection following sulpiride administration is well in line with what we know about the role of dopamine receptors in action selection. Stimulation of D2 receptors through endogenous dopamine leads to inhibition of the indirect (No-Go) pathway and increases the probability of repeating the same action. Accordingly, blockade of postsynaptic D2 receptors increases the probability of performing competing actions and therefore promotes randomness in action selection76. For example, in macaques, microinfusion of D2 (but not D1) receptor antagonists into the dorsal striatum led to increased choice stochasticity77 and a similar pattern was observed in D2 receptor knockout mice78. In humans, a recent positron emission tomographic imaging study showed that D2 receptor availability in the striatum correlated with deterministic decision-making strategies, represented either through decision temperature within reinforcement learning or through policy precision within active inference22.
The key idea of active inference models is to extend the Bayesian generative models of beliefs about the states of the world, to include beliefs about preferred states, therefore casting both action and perception as an inference problem79,80. An active inference agent thus prefers actions that minimise the statistical distance (relative entropy) between the distributions of desired and predicted future states. The expected precision of a policy, in the context of our task, controls the confidence with which the participants selected a certain action, which can explain the more variable investment we observed in the sulpiride group. Within this framework, we can interpret the effects of sulpiride in our study as reflecting a more general role of D2 receptors in coding precision of both beliefs and action policies, thus extending previous theoretical and experimental work on the involvement of dopamine in modulating precision in predictive coding schemes24,28,81.
Our findings might be particularly relevant for understanding the effects of antipsychotic medication in patients with psychosis, a disorder characterised by rigid beliefs of persecution, underlined by a profound lack of trust in others82,83. Previous studies with repeated Trust games showed that patients with psychosis have lower initial trust and find it hard to change their beliefs3,84. Neurocomputational accounts of delusions suggest that hyperactivity of D2 receptors in patients leads to increased precision of beliefs that results in rigid convictions held with high confidence20,85 and a recent paper shows that higher belief instability in patients with schizophrenia predicts responses to psychotherapeutic treatment86. This suggests that decreasing belief rigidity through D2 antagonism could be an essential contributor to the success of adjunct psychosocial treatment. However, there are profound differences between the effects of repeated use of antipsychotics in patients and acute D2 antagonism in healthy participants. For example, rodent studies show that although in healthy animals D2 receptor antagonism increases the activity of midbrain dopaminergic cells, this pattern is reversed in an established animal model of schizophrenia87. Furthermore, despite rapid receptor blockade by D2 receptor antagonists, the inhibition of excessive dopaminergic signalling proposed to underlie the therapeutic effects develops only after weeks of treatment88,89. Our data also suggest that the therapeutic effect could be larger in patients who are A1 allele carriers of the Taq1a polymorphism. Yet, there is no evidence for this90, despite higher dopamine synthesis being the most likely biomarker of psychotic symptoms91,92 and a predictor of response to antipsychotic treatment93. Translating our findings to clinical practice will require more work with targeted patient populations.
Several important limitations should be kept in mind when considering the generalisability of the findings in this study. First, the sample was limited to male participants. This restriction was initially motivated by the notion that including female participants would require more than doubling the sample size due to increased variance of dopamine availability across the ovarian cycle. However, recent work shows no support for this94. Given that there are important sex differences in responses to antipsychotics, both in terms of efficacy and side-effect profiles95, future work should prioritise studies in females. Second, despite our hypothesis-driven approach, the sample size of the genetic subgroups was small. The drug-gene interactions we report should be interpreted with this in mind. We note however that we did find a main effect of sulpiride on increased investment change and on belief volatility in a sample size comparable to other pharmacological studies with a between-subject design96.
In conclusion, we show that blocking D2 dopamine receptors increases the flexibility of beliefs when learning about others. This finding importantly contributes to our understanding of how the brain infers the attitudes of other people. By mapping out the connection between alterations in the dopaminergic system with specific computational substrates this study not only contributes to the advancement of our knowledge of how the brain performs inference, but also to our understanding of when it fails to appropriately do so.
Methods
Participants
The study was performed in accordance with the Declaration of Helsinki and approved by the National Research Ethics Committee of Hertfordshire (11/EE/0111). Data were collected from 78 male participants, aged between 19 and 44 years (mean = 32.1), recruited from a large panel of participants who were genotyped and screened for mental and physical health (Cambridge BioResource). Only participants with no history of neurological or psychiatric disorder were included in the study. Participants were stratified based on the Taq1A genotype into two groups: participants carrying at least one A1 allele (A1+), and A2 allele homozygotes (A1−).
Procedure
After arrival, participants underwent another psychiatric screening and an alcohol test to exclude alcohol consumption on the study day. After an assessment of general intelligence (National Adult Reading Test), participants signed an informed consent form before they were administered a single oral dose of either 800 mg of sulpiride or placebo in a randomised, double-blind fashion. We used a parallel-group design, because complex behavioural tasks (like the Trust game) have practice (repetition) effects that can confound the results of within-group pharmacological experiments. Sulpiride plasma concentration is expected to peak after 3 h, with a plasma half-life of about 12 h97,98. Before behavioural testing, participants waited for 3 h in a quiet room, where they were allowed to read a newspaper. To monitor the effects of the pharmacological manipulation, blood pressure, heart rate, mood and drug effects were assessed prior to drug administration and after the 3 h waiting period. Similarly, blood samples to determine the serum levels were taken at both time points. After the blood draws (at around 3 h 20’) the behavioural testing started with the social interaction tasks presented here, which included a repeated Trust game and positive and negative reciprocity tasks, and were followed by a working memory task and an instrumental learning task, both published elsewhere25,57. Two participants were excluded from the analysis: one felt uncomfortable in the room, and one did not sufficiently understand the instructions of the social interaction tasks. This led to the following group distributions: among A1 allele carriers, 17 received placebo and 21 received sulpiride; among A2 homozygotes, 21 received placebo and 17 received sulpiride. Participants were matched across the four groups for age, body mass index, general and verbal intelligence (Table 1, all p > 0.30).
Participants received a monetary compensation of £50 plus the extra money earned in the behavioural tasks.
Table 1.
Genotype | Treatment | N | IQ | sd | Verbal IQ | sd | BMI | sd |
---|---|---|---|---|---|---|---|---|
A1+ | Placebo | 17 | 120.2 | 5.333 | 120.335 | 5.913 | 26.098 | 3.125 |
A1+ | Sulpiride | 21 | 120.21 | 7.334 | 120.355 | 8.133 | 26.595 | 5.74 |
A1− | Placebo | 21 | 120.05 | 6.089 | 120.17 | 6.76 | 24.604 | 4.415 |
A1− | Sulpiride | 17 | 117.659 | 7.494 | 117.518 | 8.318 | 25.064 | 5.393 |
sd standard deviation.
Sulpiride serum concentration measurements
The level of serum sulpiride was determined by high-performance liquid chromatography. This method utilises fluorescence endpoint detection with prior solvent extraction. The excitation and emission wavelengths were 300 and 360 nm, respectively. Both intra- and inter-assay coefficients of variation (CVs) were 10% and the limit of detection was 5–10 ng/ml.
Prolactin level assessment
The prolactin level was measured using a commercial immunoradiometric assay (MP Biomedicals, Santa Ana, CA, USA), 3 h after capsule ingestion. Prolactin levels were expected to increase with blocking postsynaptic D2 receptors97. The intra- and inter-assay coefficients of variation were 4.2% and 8.2%, respectively, and the limit of detection was 0.5 ng ml−1 (ref. 25). We found that sulpiride administration significantly increased blood plasma prolactin levels (change = 33.1 mg/ml, p < 0.001), and this increase was significantly higher than the change in the placebo group (change = −0.91 mg/ml, Mann-Whitney test for differences p < 0.001). Data for three participants were excluded due to blood contamination.
Side-effects and mood assessments
Side effects were assessed with a neurovegetative list99, 3 h after drug intake. Mood was assessed with a visual analogue scale at baseline and 3 h after drug intake. Items in the visual analogue scales (VAS) were alert/drowsy, calm/excited, strong/feeble, muzzy/clear-headed, well coordinated/clumsy, lethargic/energetic, contented–discontented, troubled–tranquil, mentally slow/quick-witted, tense/relaxed, attentive/dreamy, incompetent/proficient, happy/sad, antagonistic/amicable, interested/bored and withdrawn/gregarious. The factors “alertness”, “contentedness”, and “calmness” were calculated from these items100. Data from one participant were excluded due to technical issues. We found no credible evidence of drug effects on mood, heart rate, blood pressure or self-reported side-effects (for details see Supplementary Material).
Repeated trust game
In the Trust game51 an investor (Player A) decides how much money they want to transfer to the other player, called the trustee (Player B). The trustee receives the investment, which is tripled by the experimenter, and decides how to split the acquired sum. We used a multi-round version of the task101, where the interchange between the investor and the trustee was repeated across 25 trials. At the beginning of each trial both players were endowed with 10 points, to avoid investments motivated by inequality aversion. Each point converted to two pence at the end of the experiment. The participants could invest points on a scale from 0 to 10 and the trustee could respond in a binary fashion, by either equalising the payoff, or defecting by keeping all the points in the trial for themselves. Participants played as investors against two pre-programmed trustees: one defected in 7 out of 25 trials (the good agent) and the other defected in 18 out of 25 trials (the bad agent). The feedback was pseudorandomized separately for each participant and was interleaved such that no more than two consecutive trials with the same trustee were allowed. To increase ecological validity, the participants were led to believe that they were playing against two actual people who had given their answers several weeks before the testing, and that their decisions would impact the payoff of these participants. All paradigms were programmed in Visual Basic.
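The payoff rule of a single trial can be made concrete with a short sketch. It assumes that "equalising" means splitting the combined points of both players evenly, and that "defecting" leaves the investor with only the points they did not invest; the function and argument names are illustrative and do not come from the original task code, which was written in Visual Basic.

```python
def trust_game_payoffs(investment, trustee_equalises, endowment=10, multiplier=3):
    """Return (investor_points, trustee_points) for one trial of the Trust game."""
    if not 0 <= investment <= endowment:
        raise ValueError("investment must lie between 0 and the endowment")
    investor = endowment - investment              # points the investor keeps
    trustee = endowment + multiplier * investment  # endowment plus tripled transfer
    if trustee_equalises:
        # Assumed meaning of "equalising": both players end with the same amount.
        share = (investor + trustee) / 2
        return share, share
    # Assumed meaning of "defecting": the trustee keeps everything they received.
    return float(investor), float(trustee)


def points_to_pence(points, pence_per_point=2):
    """Convert points to pence (two pence per point in the repeated Trust game)."""
    return points * pence_per_point
```

Under these assumptions, investing 5 points yields 15 points each if the trustee equalises, and 5 versus 25 points if the trustee defects, which is why a higher investment signals more trust.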
Positive and negative reciprocity games
In the positive reciprocity game, the two players need to distribute 800 points. First, player A is offered a distribution whereby they get 800 points and player B gets 0 points. They can decide to either keep all the points or delegate the decision on how to divide the points to player B. If the decision is to delegate, player B can decide on any point distribution between the two players. Participants in our study played as player B sequentially against 7 different people playing as player A. The negative reciprocity game is like a Trust game in which defecting behaviour of the trustee can be punished by the investor. Both players are first endowed with 10 points. Player A then decides to either transfer their endowment (all 10 points) or transfer nothing. The transfer of player A is quadrupled by the experimenter. Player B can then decide to either keep everything for themselves or to equalise the payoff. Following the decision of player B, both players are endowed with another 20 points and player A can spend each of these 20 points to penalise player B’s outcome, whereby each point player A spends this way deducts three points from player B’s outcome. Participants in our study played as player A against 7 different people playing as player B. The actions of people playing player B were pre-programmed so that 5 out of 7 defected.
In both games the participants were told that the players have given their answers already days before the testing. Each point converts to 0.2 pence for the positive reciprocity game and 4 pence for the negative reciprocity game.
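The sequence of moves in the negative reciprocity game can be sketched as follows. This is an illustration under stated assumptions: "equalising" is taken to mean an even split of the combined first-stage points, the punishment stage is applied regardless of player B's choice, and all names are hypothetical.

```python
def negative_reciprocity_payoffs(a_transfers, b_equalises, penalty_points,
                                 endowment=10, multiplier=4,
                                 second_endowment=20, penalty_factor=3):
    """Return (player_a_points, player_b_points) for one round."""
    if not 0 <= penalty_points <= second_endowment:
        raise ValueError("penalty points must lie between 0 and the second endowment")
    # Stage 1: player A transfers all of the endowment or nothing.
    transfer = endowment if a_transfers else 0
    a = endowment - transfer
    b = endowment + multiplier * transfer  # the transfer is quadrupled
    # Stage 2: player B keeps everything or equalises (assumed: even split).
    if b_equalises:
        a = b = (a + b) / 2
    # Stage 3: both receive 20 more points; each point A spends costs B three.
    a += second_endowment - penalty_points
    b += second_endowment - penalty_factor * penalty_points
    return a, b
```

For example, if player A transfers, player B defects and player A spends 10 penalty points, the sketch returns 10 points for player A and 40 for player B, so punishing is costly for the punisher, which is what makes it a measure of negative reciprocity.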
Spatial working memory task
In the Spatial working memory task the participants were required to search through a spatial array of coloured square boxes for a hidden ‘token’57 using a tablet. Participants have to touch a box to open it in order to reveal whether the token is in the box or not. When a token is found, the search starts again, whereby no token will be hidden in the same box twice in the same trial. The performance measure is the number of search errors, defined as errors committed when participants choose a box that has already had a token in that trial. There are 3, 4, 6, 8, 10 or 12 boxes in each trial, making the task progressively harder. The three-box searches were used as practice trials and successful completion of them was a requirement for progressing onto the main test. The other difficulties each appeared three times. One subject did not complete the working memory task and was removed from the analysis.
Behavioural analysis
Behavioural analysis was done with Bayesian multilevel (generalised) linear regression55, fitted with the brms package in R102 through RStudio. All models were run with 4 chains, 3000 iterations each with 800 warm-up. The quality of chain convergence was inspected visually based on trace plots of main fixed effects, and a threshold on the Gelman-Rubin R̂ statistic for each parameter was set to 1.01103. Throughout the behavioural analysis we z-scored the dependent variables (across the whole group), coded the Treatment variable as 0 (placebo) and 1 (sulpiride), Back-transfer as 1 (equalise) and −1 (betray), Genotype as 0 (A1−) and 1 (A1+) and centred the trustee variable (0.5 for good, and −0.5 for bad trustee). All random effects were modelled as a multivariate normal distribution, thereby evaluating the correlation between the effects as well as pooling information across the effects. Priors used are depicted in Table 2. The effect sizes were calculated by dividing the regression coefficients by the square root of the summed variances of the residuals and of all random effects104. All models were also refitted with the lme4 package105 or nlme package106, and the results of those models are reported in the supplementary material.
Table 2.
Parameter | Prior |
---|---|
Standard Deviations | |
Regression Coefficients | |
Prior for the correlation matrix | |
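The effect size described in the behavioural analysis (a regression coefficient divided by the square root of the summed residual and random-effect variances) can be written as a small helper. This is a minimal sketch with hypothetical argument names; in practice the standard deviations would come from the fitted multilevel model, and the computation would typically be repeated for every posterior draw to obtain a credible interval.

```python
import math


def standardised_effect(coefficient, residual_sd, random_effect_sds):
    """Divide a regression coefficient by the square root of the summed variances.

    residual_sd: residual standard deviation of the model.
    random_effect_sds: standard deviations of all random effects.
    """
    total_variance = residual_sd ** 2 + sum(sd ** 2 for sd in random_effect_sds)
    return coefficient / math.sqrt(total_variance)


# Illustrative numbers only (not taken from the study):
# standardised_effect(10.0, 3.0, [4.0]) divides 10 by sqrt(9 + 16) = 5.
```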
Analysing investment behaviour
All model summary tables are in the supplementary material. The effect of sulpiride on absolute change in investment from one trial to the next was evaluated with a model predicting absolute change from sulpiride and trial, with random intercepts for each participant. We also reran the model including a participant-level slope for the trial and found that it did not affect inference about the main effect, but did increase the uncertainty around the interaction term. Next, Genotype and Trustee were included as group-level predictors, as well as a random slope for the Trustee for each participant. Since the dependent variable is bounded at 0, the same analysis was done again with the dependent variable shifted by 1 and log transformed. This did not affect the conclusion of the model.
To analyse relative changes in investment the z-scored relative change from one trial to the next was predicted from the variable for Back-transfer (coded as −1 and 1), Treatment, Genotype, Trustee as well as their interactions, with a participant-level random intercept and slope for the Trustee.
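The dependent variables used in these models can be derived from a sequence of investments as sketched below. The sketch assumes that changes are computed between consecutive investments in a single sequence (for instance, the trials with one trustee), which is an assumption about the preprocessing rather than something the text specifies; the function names are illustrative.

```python
import math
import statistics


def investment_changes(investments):
    """Return relative change, absolute change and log(absolute change + 1)."""
    relative = [later - earlier
                for earlier, later in zip(investments, investments[1:])]
    absolute = [abs(change) for change in relative]
    # Shift by 1 before taking the log because absolute change is bounded at 0.
    log_shifted = [math.log(change + 1) for change in absolute]
    return relative, absolute, log_shifted


def z_score(values):
    """Standardise values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(value - mean) / sd for value in values]
```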
The reciprocal and mistake trials were analysed with a multilevel logistic regression model including predictors Treatment, Genotype, Trustee, and their interaction, again with a random intercept and slope for the effects of the Trustee for each participant.
To analyse average investments, we used a multilevel ordinal-logistic regression model, with Treatment, Genotype, Trustee, Trial and their interaction, with a random intercept and slope for the effects of the Trustee for each participant.
Models analysing the single-round reciprocity tasks predicted Punishment (negative reciprocity) and Back transfer (positive reciprocity) from Treatment, Genotype and their interaction, including a random intercept per participant.
Computational modelling
We first defined a generative model of the evolution of beliefs about the other players’ trustworthiness as a Gaussian random walk. The belief volatility parameter describes the degree to which these beliefs can change from one trial to the next. We then used the Hierarchical Gaussian Filter (HGF) to invert this model6.
Generative model
The generative model describes the evolution of beliefs about the other person’s trustworthiness as a Gaussian random walk whose step size is given by the belief volatility parameter. In particular, at each trial the belief about the other player’s trustworthiness is defined as
1 |
where the belief volatility is a participant-level parameter. The mapping from the trustworthiness beliefs to the probability of a positive Back-transfer occurs through a sigmoid transform. So, at each trial we define:
2 |
3 |
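The generative model described in this section can be simulated in a few lines. This is a sketch under assumptions: the step-size parameter is treated as the variance of the Gaussian step, the sigmoid is the standard logistic function, and the names `step_variance` and `x0` are illustrative rather than the notation of the original equations.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real-valued belief to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_trustee(n_trials, step_variance, x0=0.0, seed=0):
    """Simulate trustworthiness as a Gaussian random walk with binary outcomes."""
    rng = random.Random(seed)
    x = x0
    trustworthiness, back_transfers = [], []
    for _ in range(n_trials):
        # Random-walk step: the new state is centred on the previous state.
        x = rng.gauss(x, math.sqrt(step_variance))
        # Sigmoid transform gives the probability of a positive back-transfer.
        p_positive = sigmoid(x)
        back_transfers.append(1 if rng.random() < p_positive else 0)
        trustworthiness.append(x)
    return trustworthiness, back_transfers
```

A larger step variance lets the latent trustworthiness drift further between trials, which is the sense in which this parameter expresses belief volatility.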
Model inversion and update equations
To define the inferred participant-level belief trajectories, the generative model is inverted using Hierarchical Gaussian Filtering6,52. The HGF approximates full Bayesian inference using variational Bayes to arrive at trial-level update equations that resemble those of a Kalman filter69,107. In particular, the weights (learning rates) on the PEs are determined by the precision of prior beliefs as well as the uncertainty about the outcome. The HGF provides inferred posterior distributions of participants’ belief trajectories as Gaussians through the mean and variance or its inverse, the precision in the update equations for both time series:
4 |
5 |
6 |
7 |
8 |
where is the prediction error, is the precision weight (learning rate) that is determined by the expected precision of prediction (), and and are free parameters in the model. This is a so-called recognition or perceptual model and describes our beliefs about the belief of the participants. To map the beliefs about the trustworthiness of the other person ()on to the probability of feedback we used two versions of non-linear mapping from beliefs to probabilities:
9 |
10 |
where is a probability weighting function on the unit interval, with another free parameter that determines the skew of the function. We estimated either or and fixed the other parameter.
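The precision-weighted updating performed by the recognition model can be illustrated with the textbook update for a two-level binary HGF without a volatility level, in the general form given in ref. 6. This is a hedged sketch rather than the authors' implementation: the variable names, the use of `exp(omega)` as the step variance and the omission of the non-linear probability mapping are assumptions of the example, and the exact parameterisation used in the study may differ.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hgf_binary_update(mu_prev: float, sigma_prev: float, outcome: int, omega: float):
    """One trial of a two-level binary HGF update (assumed textbook form).

    mu_prev, sigma_prev: posterior mean and variance of the belief after the
    previous trial; omega: log step variance (belief volatility).
    Returns the new mean, new variance, prediction error and precision weight.
    """
    mu_hat = sigmoid(mu_prev)                        # predicted probability
    delta = outcome - mu_hat                         # prediction error
    pi_hat = 1.0 / (sigma_prev + math.exp(omega))    # expected precision of the prediction
    pi_post = pi_hat + mu_hat * (1.0 - mu_hat)       # posterior precision
    weight = 1.0 / pi_post                           # precision weight (learning rate)
    mu_post = mu_prev + weight * delta               # precision-weighted belief update
    return mu_post, 1.0 / pi_post, delta, weight


# Higher belief volatility lowers the expected precision and raises the weight.
_, _, _, weight_low = hgf_binary_update(0.0, 1.0, 1, omega=-2.0)
_, _, _, weight_high = hgf_binary_update(0.0, 1.0, 1, omega=0.0)
```

In this form a larger `omega` inflates the predicted variance, so the same prediction error moves the belief further, which is the sense in which higher belief volatility yields higher precision weights.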
We then mapped the probability of feedback to the behaviour of the participant with a response model defined through a likelihood function. Because investments occur on an ordinal scale, we used the ordered logistic link function108:
11 |
where the response model comprises a vector of intercepts and a noise parameter that (similarly to the inverse temperature in the softmax equation) determines to what degree the belief about the other person’s trustworthiness determines investment behaviour. The ordered-logit model estimates 10 intercepts that determine the mapping from the linear term to the ordinal investments. Ideally, we would estimate all 10 intercepts for each subject, which was not feasible with our data. We therefore estimated the 10 intercepts across all subjects, added one subject-level intercept to the linear term, and assisted the model in accounting for the various investment distributions with the non-linear mapping from beliefs to probabilities enabled either by or . In the winning HGF model, was fixed to and was estimated. In the second HGF model, was fixed to and was estimated as a free parameter.
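The ordered logistic link can be sketched in a few lines of Python. The example assumes the common convention in which the cumulative probability of a response at or below category k is the logistic function of the k-th intercept minus the linear term; the names `eta` (linear term) and `cutpoints` (ordered intercepts), as well as the numerical values, are hypothetical and are not taken from the fitted model.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(eta: float, cutpoints: list) -> list:
    """Category probabilities under an ordered logistic model.

    cutpoints must be sorted in increasing order; K cutpoints define
    K + 1 ordinal categories (10 intercepts give 11 investment levels).
    """
    # Cumulative probabilities P(y <= k); the last category closes at 1.
    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)   # P(y = k) as a difference of cumulatives
        previous = value
    return probs


cutpoints = [-2.0 + 0.4 * k for k in range(10)]   # illustrative intercepts
probs_low = ordered_logit_probs(-1.0, cutpoints)
probs_high = ordered_logit_probs(1.0, cutpoints)
```

Raising the linear term shifts probability mass towards higher categories, so a stronger belief in the other person's trustworthiness translates into larger expected investments.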
We also compared both HGF models to a simple Rescorla-Wagner model109 with separate learning rates for positive and negative feedback. Given a learning rate and the action value of the chosen action on each trial, we defined the updating equation as:
12 |
13 |
while using the same likelihood function as for the HGF models. We estimated a different learning rate for positive and negative outcomes.
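A Rescorla-Wagner update with outcome-dependent learning rates can be written as the following minimal Python sketch. It assumes binary feedback coded as 1 (positive) and 0 (negative); the function and argument names are hypothetical, and the response model that maps values to investments is left out.

```python
def rw_update(value: float, outcome: int, alpha_pos: float, alpha_neg: float) -> float:
    """Rescorla-Wagner update with separate learning rates.

    alpha_pos is applied after positive feedback (outcome == 1) and
    alpha_neg after negative feedback (outcome == 0).
    """
    prediction_error = outcome - value
    alpha = alpha_pos if outcome == 1 else alpha_neg
    return value + alpha * prediction_error


def rw_trajectory(outcomes, alpha_pos, alpha_neg, initial_value=0.5):
    """Apply the update across a sequence of outcomes and return all values."""
    values, value = [], initial_value
    for outcome in outcomes:
        value = rw_update(value, outcome, alpha_pos, alpha_neg)
        values.append(value)
    return values


values = rw_trajectory([1, 1, 0, 1, 0], alpha_pos=0.2, alpha_neg=0.6)
```

Unlike the HGF, the learning rates here are fixed across trials, so the weight on a prediction error does not depend on the current uncertainty of the belief.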
Parameter estimation
The model parameters were estimated in one hierarchical Bayesian model. This approach reduces overfitting55, pools information across different levels (drug groups and participants) and allows us to estimate both participant- and group-level parameters in one inferential step. This means that we estimate the effects of our drug manipulation on all relevant computational parameters in one model, which at the same time leads to more stable parameter estimates110. Models were implemented in Stan111, using R as the programming language and RStudio as the integrated development environment. Each candidate model was run with four independent chains and 3000 iterations (800 warm-up). Convergence of the sampling chains was assessed with the Gelman-Rubin statistic103, whereby we considered values smaller than or equal to as acceptable.
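For readers unfamiliar with the convergence check, the basic Gelman-Rubin statistic compares between-chain and within-chain variance for one parameter. The Python sketch below implements that basic form only; Stan's own diagnostics use refined variants (for example, split chains), and the function name and toy chains are hypothetical.

```python
import math
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of equally long lists of post-warm-up draws.
    Values close to 1 indicate that the chains agree with each other.
    """
    n_draws = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # Mean of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # Between-chain variance, scaled by the number of draws per chain.
    between = n_draws * statistics.variance(chain_means)
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return math.sqrt(pooled / within)


agreeing = [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]]
disagreeing = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
rhat_ok = gelman_rubin(agreeing)
rhat_bad = gelman_rubin(disagreeing)
```

Chains that explore different regions inflate the between-chain term and push the statistic well above 1, which is the pattern the convergence criterion is designed to flag.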
The intercepts from the response model were estimated on the group level. This determined a general mapping from the probability to Investment. The four participant-level parameters were modelled as a multivariate Gaussian distribution:
14 |
where the distribution is defined by a covariance matrix and a vector of means. The parameter denotes the modelled difference in between the good and the bad Trustee. The matrix was factored into a diagonal matrix with standard deviations and the correlation matrix . The prime denotes the parameters in estimation space, whereby was estimated in log space because it is bounded below by 0. The vector included all group-level regression coefficients for the drug, genotype, and their interaction. The priors for group-level means of non-transformed parameters were weakly informative; for , estimated in log space, the prior was . The prior for group-level standard deviations was more regularising, with , and the prior for the correlation matrix was . The prior for group level was set to something above 0, because chains that sampled from areas too close to 0 usually got stuck in that area. An overview of the priors across the three models is given in Table 3.
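The factorisation of the covariance matrix into standard deviations and a correlation matrix, and the back-transformation of a parameter estimated in log space, can be illustrated as follows. The sketch is generic: the function names and the numerical values are hypothetical and do not correspond to the priors or estimates of the study.

```python
import math


def covariance_from_sd_and_corr(sds, corr):
    """Rebuild a covariance matrix as diag(sds) * corr * diag(sds)."""
    k = len(sds)
    return [[sds[i] * corr[i][j] * sds[j] for j in range(k)] for i in range(k)]


def to_natural_scale(value_in_estimation_space: float, log_scale: bool) -> float:
    """Map a parameter from estimation space back to its natural scale.

    Parameters bounded below by 0 are assumed to be estimated in log space,
    so they are exponentiated; unbounded parameters are returned unchanged.
    """
    if log_scale:
        return math.exp(value_in_estimation_space)
    return value_in_estimation_space


sds = [2.0, 3.0]                      # illustrative group-level standard deviations
corr = [[1.0, 0.5], [0.5, 1.0]]       # illustrative correlation matrix
cov = covariance_from_sd_and_corr(sds, corr)   # [[4.0, 3.0], [3.0, 9.0]]
```

Separating scales from correlations allows different priors to be placed on each, which is why the standard deviations and the correlation matrix receive their own priors.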
Table 3.
HGF with | HGF with | RW | |
Hyperpriors (random effects) |
|
|
|
Priors of group level (fixed) effects |
|
To control for the degree to which working memory affects inference about the effects of drug and genotype, we additionally estimated another model that included the number of search errors (z-scored) as a covariate on the group-level effect. To calculate the subject-level residuals after accounting for spatial working memory data, we ran another model that did not include drug data, meaning that the only group-level parameter affecting the group-level means was the regression coefficient for spatial working memory.
Model validation and comparison
For parameter recovery, five parameter sets were drawn using each participant’s mean and standard deviation and used to simulate data. The same model was then fitted to the simulated data, and the re-estimated parameters were correlated with the simulated ones. Further, posterior distributions of parameters were used to simulate data and check whether the crucial aspects of behaviour are captured by the model. A trial-based Leave-One-Out Information Criterion (LOOIC) was used to compare the three models112 using the loo package in R. The LOOIC approximates the out-of-sample predictive accuracy for each trial, with lower LOOIC scores indicating better out-of-sample prediction accuracy. In addition, the performance of the models was compared with the “loo_compare” function, which returns the difference (and its standard error) between each model and the best-performing model in terms of expected predictive accuracy on a log scale (ELPD).
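The relation between trial-wise ELPD contributions, the LOOIC and the model difference with its standard error can be sketched in Python. The example assumes that the pointwise ELPD values have already been obtained (in the study they come from the loo package in R); the function names and the toy numbers are hypothetical.

```python
import math
import statistics


def looic(pointwise_elpd):
    """LOOIC is -2 times the summed pointwise ELPD; lower is better."""
    return -2.0 * sum(pointwise_elpd)


def elpd_difference(elpd_a, elpd_b):
    """ELPD difference (model A minus model B) and its standard error.

    The standard error is sqrt(n) times the sample standard deviation of the
    trial-wise differences, the usual normal approximation for paired
    comparisons of pointwise ELPD values.
    """
    diffs = [a - b for a, b in zip(elpd_a, elpd_b)]
    n = len(diffs)
    return sum(diffs), math.sqrt(n * statistics.variance(diffs))


elpd_a = [-0.5, -0.7, -0.6]   # toy trial-wise values for model A
elpd_b = [-0.9, -0.8, -1.0]   # toy trial-wise values for model B
diff, se = elpd_difference(elpd_a, elpd_b)
```

A positive difference that is large relative to its standard error favours model A; with real data the comparison runs over all trials of all participants.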
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
This paper is dedicated to the memory of Christoph Eisenegger. This research work was funded by the Vienna Science and Technology Fund (WWTF) with a grant (VRG13-007) awarded to Christoph Eisenegger and Claus Lamm. Christoph Eisenegger was also supported by the Swiss National Science Foundation (PA00P1_134135). We are grateful for the participation of all NIHR Cambridge BioResource (CBR) participants and thank the Cambridge BioResource staff for their help with participant recruitment. We also thank members of the Cambridge BioResource SAB and Management Committee for their support given to our study and the National Institute for Health Research Cambridge Biomedical Research Centre for funding. The paper has significantly improved from the constructive and insightful remarks of the three reviewers. Open access funding was provided by University of Vienna and Durham University.
Author contributions
Study Design: C.E., M.N., U.M., L.C., and T.W.R.; Computational modelling: N.M., and C.M.; Software: M.N., and N.M.; Data analysis: N.M., and C.M.; Data Collection: C.E., M.N., and U.M.; Medical cover: U.M.; Resources: T.W.R., C.E. and C.L.; Data curation: C.E., N.M. and M.N.; Writing—Original draft: N.M.; Writing—reviewing and editing: all authors; Visualisation: N.M.; Supervision: M.N., T.W.R., C.M., and C.L.; Funding acquisition: C.E., and C.L.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Data availability
All data from the experiment are available online (doi: 10.5281/zenodo.7779029).
Code availability
The analysis scripts are available online (https://github.com/nacemikus/belief-volatility-da-trustgame.git).
Competing interests
L.C. has received royalties from Cambridge Cognition Ltd. relating to neurocognitive testing. U.M. discloses consultancy for Janssen-Cilag, Lilly, Heptares and Shire, and educational funding from AstraZeneca, Bristol-Myers Squibb, Janssen-Cilag, Lilly, Lundbeck and Pharmacia-Upjohn. T.W.R. discloses consultancy with Cambridge Cognition Ltd and a research grant with Shionogi Inc. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Deceased: Christoph Eisenegger.
These authors jointly supervised this work: Claus Lamm, Michael Naef.
Contributor Information
Nace Mikus, Email: nace.mikus@univie.ac.at.
Claus Lamm, Email: claus.lamm@univie.ac.at.
Michael Naef, Email: michael.naef@durham.ac.uk.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-023-39823-5.
References
- 1. Meyer-Lindenberg A, Tost H. Neural mechanisms of social risk for psychiatric disorders. Nat. Neurosci. 2012;15:663–668. doi: 10.1038/nn.3083.
- 2. Wellstein KV, et al. Inflexible social inference in individuals with subclinical persecutory delusional tendencies. Schizophr. Res. 2020;215:344–351. doi: 10.1016/j.schres.2019.08.031.
- 3. Gromann PM, et al. Trust versus paranoia: Abnormal response to social reward in psychotic illness. Brain. 2013;136:1968–1975. doi: 10.1093/brain/awt076.
- 4. Diaconescu AO, Wellstein KV, Kasper L, Mathys CD, Stephan KE. Hierarchical bayesian models of social inference for probing persecutory delusional ideation. J. Abnorm. Psychol. 2020;129:556–569. doi: 10.1037/abn0000500.
- 5. FeldmanHall O, Shenhav A. Resolving uncertainty in a social world. Nat. Hum. Behav. 2019;3:426–435. doi: 10.1038/s41562-019-0590-x.
- 6. Mathys CD, Daunizeau J, Friston KJ, Stephan KE. A bayesian foundation for individual learning under uncertainty. Front. Hum. Neurosci. 2011;5:39. doi: 10.3389/fnhum.2011.00039.
- 7. Sutton, R. S. & Barto, A. G. Reinforcement learning: an introduction. UCL,Computer Sci. Dep. Reinf. Learn. Lect. 1054 (2017). 10.1109/TNN.1998.712192
- 8. Montague PR, Dayan P, Sejnowski TJ. A framework for mesencephalic predictive Hebbian learning. J. Neurosci. 1996;16:1936–1947. doi: 10.1523/JNEUROSCI.16-05-01936.1996.
- 9. Schultz W. Predictive Reward Signal of Dopamine Neurons. J. Neurophysiol. 1998;80:1–27. doi: 10.1152/jn.1998.80.1.1.
- 10. Matsumoto M, Hikosaka O. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature. 2009;459:837–841. doi: 10.1038/nature08028.
- 11. Tobler PN, Fiorillo CD, Schultz W. Adaptive coding of reward value by dopamine neurons. Science. 2005;307:1642–1645. doi: 10.1126/science.1105370.
- 12. Schultz W, et al. Explicit neural signals reflecting reward uncertainty. Philos. Trans. R. Soc. L. B Biol. Sci. 2008;363:3801–3811. doi: 10.1098/rstb.2008.0152.
- 13. Fiorillo, C. D., Newsome, W. T. & Schultz, W. The temporal precision of reward prediction in dopamine neurons. Nat. Neurosci. 11, 966–973 (2008).
- 14. Fiorillo CD, Tobler PN, Schultz W. Evidence that the delay-period activity of dopamine neurons corresponds to reward uncertainty rather than backpropagating TD errors. Behav. Brain Funct. 2005;1:1–5. doi: 10.1186/1744-9081-1-7.
- 15. De Lafuente V, Romo R. Dopamine neurons code subjective sensory experience and uncertainty of perceptual decisions. Proc. Natl Acad. Sci. USA. 2011;108:19767–19771. doi: 10.1073/pnas.1117636108.
- 16. Gershman SJ, Uchida N. Believing in dopamine. Nat. Rev. Neurosci. 2019;20:703–714. doi: 10.1038/s41583-019-0220-7.
- 17. Friston KJ, Stephan KE, Montague PR, Dolan JR. Computational psychiatry: The brain as a phantastic organ. Lancet Psychiatry. 2014;1:148–158. doi: 10.1016/S2215-0366(14)70275-5.
- 18. Gershman SJ. Dopamine, Inference, and Uncertainty. Neural Comput. 2018;3326:3311–3326. doi: 10.1162/neco_a_01023.
- 19. Mikhael JG, Bogacz R. Learning Reward Uncertainty in the Basal Ganglia. PLoS Comput. Biol. 2016;12:1–28. doi: 10.1371/journal.pcbi.1005062.
- 20. Adams RA, Stephan KE, Brown HR, Frith CD, Friston KJ. The Computational Anatomy of Psychosis. Front. Psychiatry. 2013;4:1–26. doi: 10.3389/fpsyt.2013.00047.
- 21. Babayan BM, Uchida N, Gershman SJ. Belief state representation in the dopamine system. Nat. Commun. 2018;9:1891. doi: 10.1038/s41467-018-04397-0.
- 22. Adams RA, et al. Variability in Action Selection Relates to Striatal Dopamine 2/3 Receptor Availability in Humans: A PET Neuroimaging Study Using Reinforcement Learning and Active Inference Models. Cereb. Cortex. 2020;30:3573–3589. doi: 10.1093/cercor/bhz327.
- 23. Eisenegger C, et al. DAT1 Polymorphism Determines L-DOPA Effects on Learning about Others’ Prosociality. PLoS ONE. 2013;8:e67820. doi: 10.1371/journal.pone.0067820.
- 24. Schwartenbeck P, FitzGerald THB, Mathys CD, Dolan JR, Friston KJ. The dopaminergic midbrain encodes the expected certainty about desired outcomes. Cereb. Cortex. 2015;25:3434–3445. doi: 10.1093/cercor/bhu159.
- 25. Eisenegger C, et al. Role of Dopamine D2 Receptors in Human Reinforcement Learning. Neuropsychopharmacology. 2014;39:2366–2375. doi: 10.1038/npp.2014.84.
- 26. Yao WD, Spealman RD, Zhang J. Dopaminergic signaling in dendritic spines. Biochem. Pharmacol. 2008;75:2055–2069. doi: 10.1016/j.bcp.2008.01.018.
- 27. Friston KJ. Hierarchical models in the brain. PLoS Comput. Biol. 2008;4:1000211. doi: 10.1371/journal.pcbi.1000211.
- 28. Friston KJ, et al. Dopamine, affordance and active inference. PLoS Comput. Biol. 2012;8:e1002327. doi: 10.1371/journal.pcbi.1002327.
- 29. Frank MJ. Dynamic dopamine modulation in the basal ganglia: A neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism. J. Cogn. Neurosci. 2005;17:51–72. doi: 10.1162/0898929052880093.
- 30. Adams RA, Huys QJM, Roiser JP. Computational Psychiatry: Towards a mathematically informed understanding of mental illness. J. Neurol. Neurosurg. Psychiatry. 2016;87:53–63. doi: 10.1136/jnnp-2015-310737.
- 31. Adams, R. A., Vincent, P., Benrimoh, D., Friston, K. J. & Parr, T. Everything is connected: Inference and attractors in delusions. Schizophr. Res. (2021). 10.1016/j.schres.2021.07.032
- 32. Frank MJ, O’Reilly RC. A mechanistic account of striatal dopamine function in human cognition: Psychopharmacological studies with cabergoline and haloperidol. Behav. Neurosci. 2006;120:497–517. doi: 10.1037/0735-7044.120.3.497.
- 33. Iglesias S, et al. Cholinergic and dopaminergic effects on prediction error and uncertainty responses during sensory associative learning. Neuroimage. 2021;226:117590. doi: 10.1016/j.neuroimage.2020.117590.
- 34. Jocham G, Klein TA, Ullsperger M. Dopamine-Mediated Reinforcement Learning Signals in the Striatum and Ventromedial Prefrontal Cortex Underlie Value-Based Choices. J. Neurosci. 2011;31:1606–1613. doi: 10.1523/JNEUROSCI.3904-10.2011.
- 35. Eyny YS, Horvitz JC. Opposing roles of D1 and D2 receptors in appetitive conditioning. J. Neurosci. 2003;23:1584–1587. doi: 10.1523/JNEUROSCI.23-05-01584.2003.
- 36. Pessiglione, M., Seymour, B., Flandin, G., Dolan, J. R. & Frith, C. D. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature (2006). 10.1038/nature05051
- 37. Jocham G, Klein TA, Ullsperger M. Differential Modulation of Reinforcement Learning by D2 Dopamine and NMDA Glutamate Receptor Antagonism. J. Neurosci. 2014;34:13151–13162. doi: 10.1523/JNEUROSCI.0757-14.2014.
- 38. Zhang L, Lengersdorff L, Mikus N, Gläscher J, Lamm C. Using reinforcement learning models in social neuroscience: Frameworks, pitfalls, and suggestions. Soc. Cogn. Affect. Neurosci. 2020;15:695–707. doi: 10.1093/scan/nsaa089.
- 39. Richfield EK, Penney JB, Young AB. Anatomical and affinity state comparisons between dopamine D1 and D2 receptors in the rat central nervous system. Neuroscience. 1989;30:767–777. doi: 10.1016/0306-4522(89)90168-1.
- 40. Bressan RA, et al. Is regionally selective D2/D3 dopamine occupancy sufficient for atypical antipsychotic effect? A in vivo quantitative [123I]epidepride SPET study of amisulpride-treated patients. Am. J. Psychiatry. 2003;160:1413–1420. doi: 10.1176/appi.ajp.160.8.1413.
- 41. Cools R, D’Esposito M. Inverted-U-shaped dopamine actions on human working memory and cognitive control. Biol. Psychiatry. 2011;69:e113–e125. doi: 10.1016/j.biopsych.2011.03.028.
- 42. Cohen MX, Krohn-Grimberghe A, Elger CE, Weber B. Dopamine gene predicts the brain’s response to dopaminergic drug. Eur. J. Neurosci. 2007;26:3652–3660. doi: 10.1111/j.1460-9568.2007.05947.x.
- 43. Collins AGE, Frank MJ. How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. Eur. J. Neurosci. 2012;35:1024–1035. doi: 10.1111/j.1460-9568.2011.07980.x.
- 44. Mehta MA, Sahakian BJ, McKenna PJ, Robbins TW. Systemic sulpiride in young adult volunteers simulates the profile of cognitive deficits in Parkinson’s disease. Psychopharmacol. (Berl.) 1999;146:162–174. doi: 10.1007/s002130051102.
- 45. Takano A, et al. The antipsychotic sultopride is overdosed - A PET study of drug-induced receptor occupancy in comparison with sulpiride. Int. J. Neuropsychopharmacol. 2006;9:539–545. doi: 10.1017/S1461145705006103.
- 46. Mehta MA, Montgomery AJ, Kitamura Y, Grasby PM. Dopamine D2 receptor occupancy levels of acute sulpiride challenges that produce working memory and learning impairments in healthy volunteers. Psychopharmacol. (Berl.) 2008;196:157–165. doi: 10.1007/s00213-007-0947-0.
- 47. Westbrook A, et al. Dopamine promotes cognitive effort by biasing the benefits versus costs of cognitive work. Science. 2020;367:1362–1366. doi: 10.1126/science.aaz5891.
- 48. Laakso A, et al. The A1 allele of the human D2 dopamine receptor gene is associated with increased activity of striatal L-amino acid decarboxylase in healthy subjects. Pharmacogenet. Genom. 2005;15:387–391. doi: 10.1097/01213011-200506000-00003.
- 49. Gluskin BS, Mickey BJ. Genetic variation and dopamine D2 receptor availability: A systematic review and meta-analysis of human in vivo molecular imaging studies. Transl. Psychiatry. 2016;6:e747. doi: 10.1038/tp.2016.22.
- 50. Smith CT, et al. The impact of common dopamine D2 receptor gene polymorphisms on D2/3 receptor availability: C957T as a key determinant in putamen and ventral striatum. Transl. Psychiatry. 2017;7:e1091. doi: 10.1038/tp.2017.45.
- 51. Berg, J., Dickhaut, J. & McCabe, K. Trust, reciprocity, and social history. Games Econ. Behav. (1995). 10.1006/game.1995.1027
- 52. Mathys CD, et al. Uncertainty in perception and the Hierarchical Gaussian Filter. Front. Hum. Neurosci. 2014;8:825. doi: 10.3389/fnhum.2014.00825.
- 53. Siegel JZ, Mathys CD, Rutledge RB, Crockett MJ. Beliefs about bad people are volatile. Nat. Hum. Behav. 2018;2:750–756. doi: 10.1038/s41562-018-0425-1.
- 54. Adams RA, Napier G, Roiser JP, Mathys CD, Gilleen J. Attractor-like dynamics in belief updating in schizophrenia. J. Neurosci. 2018;38:9471–9485. doi: 10.1523/JNEUROSCI.3163-17.2018.
- 55. McElreath, R. Statistical rethinking: A bayesian course with examples in R and stan. Statistical Rethinking: A Bayesian Course with Examples in R and Stan (2018). 10.1201/9781315372495
- 56. Kruschke, J. K. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan, second edition. (Academic Press, 2014). 10.1016/B978-0-12-405888-0.09999-2
- 57. Naef M, et al. Effects of dopamine D2/D3 receptor antagonism on human planning and spatial working memory. Transl. Psychiatry. 2017;7:e1107. doi: 10.1038/tp.2017.56.
- 58. Guggenmos M, Wilbertz G, Hebart MN, Sterzer P. Mesolimbic confidence signals guide perceptual learning in the absence of external feedback. Elife. 2016;5:1–19. doi: 10.7554/eLife.13388.
- 59. Tomassini A, Ruge D, Galea JM, Penny W, Bestmann S. The role of dopamine in temporal uncertainty. J. Cogn. Neurosci. 2016;28:96–110. doi: 10.1162/jocn_a_00880.
- 60. Diederen KMJ, Spencer T, Vestergaard MD, Fletcher PC, Schultz W. Adaptive Prediction Error Coding in the Human Midbrain and Striatum Facilitates Behavioral Adaptation and Learning Efficiency. Neuron. 2016;90:1127–1138. doi: 10.1016/j.neuron.2016.04.019.
- 61. Diederen KMJ, et al. Dopamine modulates adaptive prediction error coding in the human midbrain and striatum. J. Neurosci. 2017;37:1708–1720. doi: 10.1523/JNEUROSCI.1979-16.2016.
- 62. Marshall L, et al. Pharmacological Fingerprints of Contextual Uncertainty. PLoS Biol. 2016;14:1–31. doi: 10.1371/journal.pbio.1002575.
- 63. FeldmanHall O, Raio CM, Kubota JT, Seiler MG, Phelps EA. The Effects of Social Context and Acute Stress on Decision Making Under Uncertainty. Psychol. Sci. 2015;26:1918–1926. doi: 10.1177/0956797615605807.
- 64. Rosenberger LA, et al. The Human Basolateral Amygdala Is Indispensable for Social Experiential Learning. Curr. Biol. 2019;29:3532–3537. doi: 10.1016/j.cub.2019.08.078.
- 65. Fehr E. On The Economics and Biology of Trust. J. Eur. Econ. Assoc. 2009;7:235–266. doi: 10.1162/JEEA.2009.7.2-3.235.
- 66. Bohnet I, Greig F, Herrmann B, Zeckhauser R. Betrayal aversion: Evidence from brazil, china, oman, switzerland, turkey, and the united states. Am. Econ. Rev. 2008;98:294–310. doi: 10.1257/aer.98.1.294.
- 67. Frank MJ, Seeberger LC, O’Reilly RC. By carrot or by stick: Cognitive reinforcement learning in Parkinsonism. Science. 2004;306:1940–1943. doi: 10.1126/science.1102941.
- 68. Thorndike, E. L. Animal Intelligence; Experimental Studies (Macmillan, New York, 1911).
- 69. Gershman SJ. A Unifying Probabilistic View of Associative Learning. PLoS Comput. Biol. 2015;11:1–20. doi: 10.1371/journal.pcbi.1004567.
- 70. Collins AGE, Brown JK, Gold JM, Waltz JA, Frank MJ. Working Memory Contributions to Reinforcement Learning Impairments in Schizophrenia. J. Neurosci. 2014;34:13747–13756. doi: 10.1523/JNEUROSCI.0989-14.2014.
- 71. Collins AGE, Ciullo B, Frank MJ, Badre D. Working Memory Load Strengthens Reward Prediction Errors. J. Neurosci. 2017;37:4332–4342. doi: 10.1523/JNEUROSCI.2700-16.2017.
- 72. Camerer, C. F. Behavioral Game Theory: Experiments in Strategic Interaction (Princeton University Press, Princeton, 2003).
- 73. Fehr E, Fischbacher U, Gächter S. Strong reciprocity, human cooperation, and the enforcement of social norms. Hum. Nat. 2002;13:1–25. doi: 10.1007/s12110-002-1012-7.
- 74. Soares-Cunha C, et al. Activation of D2 dopamine receptor-expressing neurons in the nucleus accumbens increases motivation. Nat. Commun. 2016;7:11829. doi: 10.1038/ncomms11829.
- 75. Findling C, Skvortsova V, Dromnelle R, Palminteri S, Wyart V. Computational noise in reward-guided learning drives behavioral variability in volatile environments. Nat. Neurosci. 2019;22:2066–2077. doi: 10.1038/s41593-019-0518-9.
- 76. Sridharan D, Prashanth PS, Chakravarthy VS. The role of the basal ganglia in exploration in A neural model based on reinforcement learning. Int. J. Neural Syst. 2006;16:111–124. doi: 10.1142/S0129065706000548.
- 77. Lee E, Seo M, Dal Monte O, Averbeck BB. Injection of a dopamine type 2 receptor antagonist into the dorsal striatum disrupts choices driven by previous outcomes, but not perceptual inference. J. Neurosci. 2015;35:6298–6306. doi: 10.1523/JNEUROSCI.4561-14.2015.
- 78. Kwak, S. et al. Role of dopamine D2 receptors in optimizing choice strategy in a dynamic and uncertain environment. Front Behav. Neurosci. 8, 368 (2014).
- 79. Friston KJ, et al. The anatomy of choice: active inference and agency. Front. Hum. Neurosci. 2013;7:1–18. doi: 10.3389/fnhum.2013.00598.
- 80. Friston KJ, et al. Active inference and epistemic value. Cogn. Neurosci. 2015;6:187–214. doi: 10.1080/17588928.2015.1020053.
- 81. Nour MM, et al. Dopaminergic basis for signaling belief updates, but not surprise, and the link to paranoia. Proc. Natl Acad. Sci. 2018;115:E10167–E10176. doi: 10.1073/pnas.1809298115.
- 82. Fuchs T. The intersubjectivity of delusions. World Psychiatry. 2015;14:178–179. doi: 10.1002/wps.20209.
- 83. Freeman D. Persecutory delusions: a cognitive perspective on understanding and treatment. Lancet Psychiatry. 2016;3:685–692. doi: 10.1016/S2215-0366(16)00066-3.
- 84. Fett AKJ, et al. To trust or not to trust: The dynamics of social interaction in psychosis. Brain. 2012;135:976–984. doi: 10.1093/brain/awr359.
- 85. Sterzer P, et al. The Predictive Coding Account of Psychosis. Biol. Psychiatry. 2018;84:634–643. doi: 10.1016/j.biopsych.2018.05.015.
- 86. Hauke DJ, et al. Increased Belief Instability in Psychotic Disorders Predicts Treatment Response to Metacognitive Training. Schizophr. Bull. 2022;48:826–838. doi: 10.1093/schbul/sbac029.
- 87. Valenti O, Cifelli P, Gill KM, Grace AA. Antipsychotic drugs rapidly induce dopamine neuron depolarization block in a developmental rat model of schizophrenia. J. Neurosci. 2011;31:12330–12338. doi: 10.1523/JNEUROSCI.2808-11.2011.
- 88. Kapur S, Agid O, Mizrahi R, Li M. How antipsychotics work-from receptors to reality. NeuroRx. 2006;3:10–21. doi: 10.1016/j.nurx.2005.12.003.
- 89. Grace AA, Bunney BS, Moore H, Todd CL. Dopamine-cell depolarization block as a model for the therapeutic actions of antipsychotic drugs. Trends Neurosci. 1997;20:31–37. doi: 10.1016/S0166-2236(96)10064-3.
- 90. Zhang JP, Lencz T, Malhotra AK. D2 receptor genetic variation and clinical response to antipsychotic drug treatment: A meta-analysis. Am. J. Psychiatry. 2010;167:763–772. doi: 10.1176/appi.ajp.2009.09040598.
- 91. Howes OD, et al. The Nature of Dopamine Dysfunction in Schizophrenia and What This Means for Treatment. Arch. Gen. Psychiatry. 2012;69:776–786. doi: 10.1001/archgenpsychiatry.2012.169.
- 92. Fusar-Poli P, Meyer-Lindenberg A. Striatal presynaptic dopamine in schizophrenia, part II: Meta-analysis of [18F/11C]-DOPA PET studies. Schizophr. Bull. 2013;39:33–42. doi: 10.1093/schbul/sbr180.
- 93. Jauhar S, et al. Determinants of treatment response in first-episode psychosis: an 18F-DOPA PET study. Mol. Psychiatry. 2019;24:1502–1512. doi: 10.1038/s41380-018-0042-4.
- 94. Petersen N, et al. Striatal dopamine D2-type receptor availability and peripheral 17β-estradiol. Mol. Psychiatry. 2021;26:2038–2047. doi: 10.1038/s41380-020-01000-1.
- 95. Hoekstra S, et al. Sex differences in antipsychotic efficacy and side effects in schizophrenia spectrum disorder: results from the BeSt InTro study. npj Schizophr. 2021;7:39. doi: 10.1038/s41537-021-00170-3.
- 96. Martins D, Mehta MA, Prata DP. The “highs and lows” of the human brain on dopaminergics: Evidence from neuropharmacology. Neurosci. Biobehav. Rev. 2017;80:351–371. doi: 10.1016/j.neubiorev.2017.06.003.
- 97. Wiesel F-A, Alfredsson G, Ehrnebo M, Sedvall G. Prolactin response following intravenous and oral sulpiride in healthy human subjects in relation to sulpiride concentrations. Psychopharmacol. (Berl.) 1982;76:44–47. doi: 10.1007/BF00430753.
- 98. Bressolle F, Bres J, Fauré‐Jeantis A. Absolute bioavailability, rate of absorption, and dose proportionality of sulpiride in humans. J. Pharm. Sci. 1992;81:26–32. doi: 10.1002/jps.2600810106.
- 99. Rush CR, Stoops WW, Hays LR, Glaser PEA, Hays LS. Risperidone attenuates the discriminative-stimulus effects of d-amphetamine in humans. J. Pharmacol. Exp. Ther. 2003;306:195–204. doi: 10.1124/jpet.102.048439.
- 100. Eisenegger C, et al. Dopamine receptor D4 polymorphism predicts the effect of L-DOPA on gambling behavior. Biol. Psychiatry. 2010;67:702–706. doi: 10.1016/j.biopsych.2009.09.021.
- 101. King-Casas B. Getting to Know You: Reputation and Trust in a Two-Person Economic Exchange. Science. 2005;308:78–83. doi: 10.1126/science.1108062.
- 102. Bürkner, P.-C. brms: An R Package for Bayesian Multilevel Models Using Stan. J. Stat. Softw. (2017). 10.18637/jss.v080.i01
- 103. Gelman, A. & Rubin, D. B. Inference from iterative simulation using multiple sequences. Stat. Sci. (1992). 10.1214/ss/1177011136
- 104. Nalborczyk L, Batailler C, Loevenbruck H, Vilain A, Bürkner P-C. An introduction to bayesian multilevel models using brms: A case study of gender effects on vowel variability in standard Indonesian. J. Speech Lang. Hear. Res. 2019;62:1225–1242. doi: 10.1044/2018_JSLHR-S-18-0006.
- 105. Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. arXiv Prepr. arXiv1406.5823 (2014).
- 106. Pinheiro J, Bates D, DebRoy S, Sarkar D, Team RC. Linear and nonlinear mixed effects models. R. Packag. Version. 2007;3:1–89.
- 107. Kakade S, Dayan P. Acquisition and extinction in autoshaping. Psychol. Rev. 2002;109:533–544. doi: 10.1037/0033-295X.109.3.533.
- 108. Bürkner P-C, Vuorre M. Ordinal Regression Models in Psychology: A Tutorial. Adv. Methods Pract. Psychol. Sci. 2019;2:77–101. doi: 10.1177/2515245918823199.
- 109. Rescorla, R. & Wagner, A. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. in Classical conditioning: current research and theory, 2 (1972). 10.1101/gr.110528.110
- 110. Ahn, W.-Y., Haines, N. & Zhang, L. Revealing Neurocomputational Mechanisms of Reinforcement Learning and Decision-Making With the hBayesDM Package. Comput. Psychiatry (2017). 10.1162/cpsy_a_00002
- 111. Carpenter, B. et al. Stan: A probabilistic programming language. J. Stat. Softw. (2017). 10.18637/jss.v076.i01
- 112. Vehtari, A., Gelman, A. & Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. (2017). 10.1007/s11222-016-9696-4