Humans use forward thinking to exploit social controllability

Soojung Na; Dongil Chung; Andreas Hula; Ofer Perl; Jennifer Jung; Matthew Heflin; Sylvia Blackmore; Vincenzo G Fiore; Peter Dayan; Xiaosi Gu

doi:10.7554/eLife.64983

eLife. 2021 Oct 29;10:e64983. doi: 10.7554/eLife.64983


Editors: Catherine Hartley¹⁰, Christian Büchel¹¹

PMCID: PMC8555988 PMID: 34711304

Abstract

The controllability of our social environment has a profound impact on our behavior and mental health. Nevertheless, neurocomputational mechanisms underlying social controllability remain elusive. Here, 48 participants performed a task where their current choices either did (Controllable), or did not (Uncontrollable), influence partners’ future proposals. Computational modeling revealed that people engaged a mental model of forward thinking (FT; i.e., calculating the downstream effects of current actions) to estimate social controllability in both Controllable and Uncontrollable conditions. A large-scale online replication study (n=1342) supported this finding. Using functional magnetic resonance imaging (n=48), we further demonstrated that the ventromedial prefrontal cortex (vmPFC) computed the projected total values of current actions during forward planning, supporting the neural realization of the forward-thinking model. These findings demonstrate that humans use vmPFC-dependent FT to estimate and exploit social controllability, expanding the role of this neurocomputational mechanism beyond spatial and cognitive contexts.

Research organism: Human

Introduction

Humans do not always have influence over the environments which they occupy. A lack of controllability has a profound impact on mental health, as has been demonstrated by decades of research on uncontrollable stress, pain, and learned helplessness (Maier and Seligman, 1976; Maier and Watkins, 2005; Overmier, 1968; Weiss, 1968). Conversely, high levels of controllability have been associated with better mental health outcomes such as higher subjective well-being (Lachman and Weaver, 1998) and less negative affect (Maier and Seligman, 2016; Southwick and Southwick, 2018). For humans, one of the most important types of controllability we need to track concerns our social environment. Doing this could be one of the roles of the various neural systems whose involvement in social cognition is supported by mounting evidence (Atzil et al., 2018; Dunbar and Shultz, 2007). Nevertheless, despite the importance, the neurocomputational mechanisms underlying social controllability have not been systematically investigated.

Based on previous work demonstrating the computational mechanisms of controllability in non-social environments, here we hypothesize that people use mental models to exploit social controllability, for instance via forward simulation. In non-social contexts, it has been proposed that controllability quantifies the extent to which the acquisition of outcomes, and particularly desired outcomes, can be influenced by the choice of actions (Huys and Dayan, 2009; Dorfman and Gershman, 2019; Ligneul, 2021). In these non-social settings, agents need to learn the association between actions and state (event) transitions and potential outcomes in order to simulate future possibilities (Pezzulo et al., 2013; Szpunar et al., 2014) and make decisions (Daw et al., 2011; Dolan and Dayan, 2013; Doll et al., 2015; Gläscher et al., 2010). It has also been hypothesized that both under- and over-estimation of controllability could be detrimental to behavior (Huys and Dayan, 2009) depending on the complexity of the environment. Yet, it remains unknown whether this is true for social controllability.

Studies on strategic decision-making (Camerer, 2011) have provided initial insight into the possible mechanisms underlying social controllability and influence. For example, Hampton et al. (2008) showed that people can learn the influence of their own actions on others during an iterative inspection game, and that the medial prefrontal cortex (mPFC) tracked expected reward given the degree of expected influence. In other types of strategic games such as bargaining, it has been suggested that individuals differ drastically in their ability to manage their social images and exert influence on others, a behavioral phenomenon subserved by underlying neural differences in prefrontal regions (Bhatt et al., 2010). Furthermore, by applying an interactive partially observable Markov decision process model, Hula et al. (2015) found that humans are able to use forward planning and mentally simulate future interactions in an iterative trust game. All of these studies suggest that learning the structure of the social environment is crucial for exerting influence, yet none has systematically examined the computational underpinnings of social controllability in a group setting, where an agent plays with multiple other players who together constitute a more social-like environment.

Neurally, along with recent findings about its role in providing a representational substrate for cognitive tasks (Behrens et al., 2018; Niv, 2019; Schuck et al., 2016), the ventromedial prefrontal cortex (vmPFC) has been shown to signal expected values across a wide range of settings (Boorman et al., 2009; Kable and Glimcher, 2007; FitzGerald et al., 2009; Behrens et al., 2008; Bartra et al., 2013; Venkatraman et al., 2009). The majority of studies have focused on the role of the vmPFC in encoding the subjective values of non-social choices (Boorman et al., 2009; FitzGerald et al., 2009; Kable and Glimcher, 2007; Venkatraman et al., 2009). Nevertheless, accumulating evidence also points to a central role of the vmPFC in computing the value of social choices (Behrens et al., 2008; Hampton et al., 2008; Hiser and Koenigs, 2018), such as expected values computed based on learned influence (Hampton et al., 2008). A recent meta-analysis suggests that both social and non-social subjective values reliably activate the vmPFC (Bartra et al., 2013). Thus, we expect that the vmPFC will also play an important role in social controllability, where the value of future events should be simulated and computed.

In the current study, we hypothesize that humans exploit social controllability by implementing forward thinking (FT) and mentally simulating future interactions. In particular, we consider the long-lasting effect that one’s current interaction with one other person can have on future interactions with many others who constitute the social environment, for instance by developing a reputation. We predict that social agents will use forward planning to take into account not only decision variables related to the present interaction with a current partner, but also those related to future interactions with other partners from the same milieu. Finally, we hypothesize that the choice values integrating the planned paths would be signaled in the vmPFC.

We used computational modeling and functional magnetic resonance imaging (fMRI; n=48), in the context of a social exchange paradigm (see Figure 1 and Materials and methods), to test the hypothesis that FT serves as a mechanism for social controllability. Furthermore, we replicated our computational findings in a large-scale online study involving more geographically diverse participants (n=1342). Both in-person and online participants completed an economic exchange task where they did (Controllable) or did not (Uncontrollable) influence their partners’ proposals of monetary offers in the future (see Figure 1a and b, and Materials and methods for details). Participants were told that they were playing with members of two different teams, one for each of the two controllability conditions (in a counterbalanced order across subjects); in fact, they played with a computer algorithm in both cases. Supplementary file 2 contains the task instructions given to participants. To directly compare the impact of social versus non-social contexts on individuals’ decision strategies, we further administered a matched controllability experiment where participants were explicitly told that they were playing against a computer algorithm (Figure 2—figure supplement 1 and Supplementary file 1a).

Figure 1. — (a) Participants played a social exchange task based on the ultimatum game. There were two blocks: one ‘Controllable’ condition and one ‘Uncontrollable’ condition. The order of the conditions was counterbalanced across participants. Each block had 40 (fMRI sample) or 30 (online sample) trials. In each trial, participants needed to decide whether to accept or reject the split of $20 proposed by virtual members of a team. In the fMRI study, participants rated their emotions after their choice in 60% of the trials. Upon the completion of the game, participants rated their subjective beliefs about controllability for each block. (b) Schematic of offer generation under the Controllable condition (an offer is the portion of the split proposed to the participant). If participants accepted the offer at trial t, the next offer at trial t+1 decreased by d ∈ {0, 1, 2} (1/3 chance each). If they rejected the offer, the next offer increased by d ∈ {0, 1, 2} (1/3 chance each). No such contingency existed in the Uncontrollable condition, where the offers were randomly drawn from a Gaussian distribution (μ=5, σ=1.2, rounded to the nearest integer, max=8, min=2) and participants’ behaviors had no influence on future offers.

Participants played against each team as the responder in a social exchange game adapted from the ultimatum game (Camerer, 2011) (single-shot games with 40 different partners (rounds) per team for the fMRI sample, and 30 rounds for the online sample). In the Uncontrollable condition, on each round, participants were offered a split of $20 from their partners and asked to decide whether to accept or reject the offer. Unbeknownst to participants, the actual offer was randomly drawn from a normal distribution (rounded and restricted to be between $2 and $8 (inclusive) for the fMRI sample and between $1 and $9 (inclusive) for the online sample; the first offer was always $5). Here, participants’ current choices had no influence on the next offers from their partners. The Controllable condition was the same except that participants could exert control over their partners using their own actions. Specifically, participants’ current decisions (i.e., to accept or reject the offer) influenced the next offers from their partners in a systematic manner. Subject only to being between $1 and $9 (inclusive), partners increased the next offer by $0, $1, or $2 (probability of ⅓ each, subject to the constraints) if the participant rejected the present offer, and decreased the next offer by $0, $1, or $2 (probability of ⅓ each, again subject to the constraints) if the participant accepted the current offer (Figure 1b and Materials and methods). Again, the starting offer was $5. At the end of the task, after all the trials were completed, we asked participants to rate how much control they believed they had over their partners’ offers in each condition using a 0–100 scale to measure their perceived action-offer contingency (‘self-reported/perceived controllability’ hereafter). In the fMRI study, on 60% of the trials, participants were also asked about their emotional state (‘How do you feel?’) on a scale of 0 (unhappy) to 100 (happy) after they made a choice (i.e., 24 ratings per condition; see Figure 1—figure supplement 1).
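The offer-generation rules described above can be illustrated with a short simulation. This is a minimal sketch, not the task code used in the study: the function names, the seed, and the `policy` callback are hypothetical, and clipping out-of-range offers to the allowed bounds is an assumption, since the text only states that changes were "subject to the constraints." The Gaussian parameters (μ=5, σ=1.2) and the $2 to $8 bounds are those given for the fMRI sample.

```python
import random

def next_offer_controllable(offer, rejected, rng, lo=1, hi=9):
    """Controllable condition: offer rises by d after a rejection, falls by d after an acceptance."""
    d = rng.choice([0, 1, 2])            # each step size has probability 1/3
    proposal = offer + d if rejected else offer - d
    return max(lo, min(hi, proposal))    # assumption: out-of-range proposals are clipped

def next_offer_uncontrollable(rng, mu=5.0, sigma=1.2, lo=2, hi=8):
    """Uncontrollable condition: offer is drawn independently of the participant's choice."""
    draw = round(rng.gauss(mu, sigma))
    return max(lo, min(hi, draw))        # assumption: clipping rather than resampling

def simulate_block(policy, controllable, n_trials=40, seed=0):
    """Return the sequence of offers seen by a participant following `policy`.

    `policy(offer)` returns True to reject and False to accept (hypothetical interface).
    """
    rng = random.Random(seed)
    offers = [5]                         # the first offer is always $5
    for _ in range(n_trials - 1):
        rejected = policy(offers[-1])
        if controllable:
            offers.append(next_offer_controllable(offers[-1], rejected, rng))
        else:
            offers.append(next_offer_uncontrollable(rng))
    return offers

# Example: a participant who rejects every offer below $7.
offers_c = simulate_block(lambda offer: offer < 7, controllable=True)
offers_u = simulate_block(lambda offer: offer < 7, controllable=False)
```

Under these assumptions, a rejection-heavy policy can only push offers upward in the Controllable block, whereas the same policy leaves the Uncontrollable offers unaffected.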

Note that participants were not instructed about the statistics of the task environment nor the nature of the condition they were playing, although the instruction about the existence of two separate teams was provided to encourage participants to learn contingent rules and norms within each condition (Supplementary file 2). If participants were able to detect social controllability correctly within each condition, they would show strategic decisions that exert appropriate levels of control over others’ subsequent choices.

Results

Participants distinguished between controllable and uncontrollable environments

We first examined whether participants’ choices were sensitive to the difference in controllability between the two social environments, noting that there was no explicit instruction about this difference. Our primary measures here were the offer sizes participants received in each condition, their rejection behavior, and their self-reported controllability. If individuals learned the action-offer contingency of the controllable environment, we should observe that (1) offers received under the Controllable condition would be pushed up to a higher level than those under the Uncontrollable condition; (2) people would need to reject more offers to obtain larger future offers under the Controllable than the Uncontrollable condition; and (3) people would report higher self-reported controllability for the Controllable than for the Uncontrollable condition.

First, we found that despite the same starting offer of $5, participants indeed received higher offers over time under the Controllable compared to the Uncontrollable condition (mean_C=5.9, mean_U=4.8, t(47.45)=4.33, p<0.001; Figure 2a1, a2), indicating that individuals in general successfully exerted influence over the offers made by partners when they were given control.

Figure 2. — (a1) Participants raised the offers along the trials when they had control (Controllable), compared to when they had no control (Uncontrollable). (a2) The mean offer size was higher for the Controllable (C) than Uncontrollable (U) condition (mean_C=5.9, mean_U=4.8, t(47.45)=4.33, p<0.001). (b1) Overall rejection rates were not different between the two conditions (mean_C=50.8%, mean_U=49.1%, t(67.87)=0.43, p=0.67). (b2) However, participants were more likely to reject middle and high offers when they had control (low ($1–3): mean_C=77%, mean_U=87%, t(22)=–1.35, p=0.19; middle ($4–6): mean_C=66%, mean_U=45%, t(47)=5.41, p<0.001; high ($7–9): mean_C=28%, mean_U=8%, t(72.50)=4.00, p<0.001). Each offer bin for the Controllable in (b2) represents 23, 48, and 41 participants who were proposed the corresponding offers at least once, whereas each bin for the Uncontrollable represents all 48 participants. The t-test for each bin was conducted for those who had the corresponding offers for both conditions. (c) The self-reported controllability ratings were higher for the Controllable than Uncontrollable condition (mean_C=65.9, mean_U=43.7, t(74.55)=4.10, p<0.001; eight participants were excluded due to missing data). (d) Response times were longer for the Controllable than the Uncontrollable condition (mean_C=1.75±0.38, mean_U=1.53±0.38; paired t-test t(47)=4.34, p<0.001), suggesting that participants were likely to engage more deliberation during decision-making in the Controllable condition. A paired t-test was used for the rejection rates for low and middle offers and the self-reported controllability ratings. The t-statistics for the mean offer size, overall rejection rate, rejection rate for high offers, and self-reported controllability are from two-sample t-tests assuming unequal variance using Satterthwaite’s approximation according to the results of the F-tests for equal variance. Error bars and shades represent SEM; ***p<0.001; n.s. indicates not significant. For (a2, b1, c, d), each line represents a participant and each bold line represents the mean.
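The fractional degrees of freedom reported above (e.g., t(47.45)) follow from the unequal-variance two-sample t-test with Satterthwaite’s approximation. The sketch below shows the standard formula using only the Python standard library; the function name and the toy data are illustrative and are not taken from the study.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t-test assuming unequal variances.

    Returns the t-statistic and the Satterthwaite-approximated degrees of freedom.
    """
    va = variance(a) / len(a)   # squared standard error of the mean of sample a
    vb = variance(b) / len(b)   # squared standard error of the mean of sample b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy data (illustrative only): two samples with clearly different spreads.
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The approximated degrees of freedom always lie between the smaller sample size minus one and the pooled value n1+n2-2, which is why they are generally not integers.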


Next, we examined the rejection patterns from the two conditions. On average, rejection rates in the two conditions were comparable (mean_C=50.8%, mean_U=49.1%, t(67.87)=0.43, p=0.67; Figure 2b1). By separating the trials each individual experienced into three levels of offer sizes (low: $1–3, medium: $4–6, and high: $7–9) and then aggregating across all individuals, we further examined whether rejection rates varied as a function of offer size. We found that participants were more likely to reject medium to high ($4–9) offers in the Controllable condition, while they showed comparable rejection rates for the low offers ($1–3) between the two conditions (low ($1–3): mean_C=77%, mean_U=87%, t(22)=–1.35, p=0.19; middle ($4–6): mean_C=66%, mean_U=45%, t(47)=5.41, p<0.001; high ($7–9): mean_C=28%, mean_U=8%, t(72.50)=4.00, p<0.001; Figure 2b2; see Figure 2—figure supplement 2 for rejection rates by each offer size). These results suggest that participants behaved in a strategic way to utilize their influence over the partners. One possible confound is that individuals may have experienced different affective states in the two conditions and changed their choice behaviors. However, this seemed unlikely because there was no significant difference in emotional rating between the Controllable and the Uncontrollable conditions (Figure 1—figure supplement 1).
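The binning step in this analysis can be sketched as follows. This is an illustrative reconstruction rather than the authors’ analysis code: the function names and the data layout (one list of offers and one list of accept/reject choices per participant and condition) are assumptions. Bins that a participant never encountered are simply absent from the result, which mirrors the fact that only participants who received the corresponding offers contribute to each bin.

```python
def offer_bin(offer):
    """Assign an offer (in dollars) to the low, medium, or high bin used in the text."""
    if 1 <= offer <= 3:
        return "low"
    if 4 <= offer <= 6:
        return "medium"
    if 7 <= offer <= 9:
        return "high"
    raise ValueError(f"offer out of range: {offer}")

def rejection_rates_by_bin(offers, rejected):
    """Per-participant rejection rate within each offer bin that was actually encountered."""
    counts = {}                                   # bin -> (number of trials, number of rejections)
    for offer, was_rejected in zip(offers, rejected):
        label = offer_bin(offer)
        n, k = counts.get(label, (0, 0))
        counts[label] = (n + 1, k + int(was_rejected))
    return {label: k / n for label, (n, k) in counts.items()}

# Toy example for one participant in one condition (illustrative values only).
rates = rejection_rates_by_bin([2, 3, 5, 5, 8], [True, True, True, False, False])
```

Group-level rates would then be obtained by averaging these per-participant values within each bin and condition.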

As additional evidence that participants distinguished the controllability between conditions, we compared self-reported beliefs about controllability between the two conditions. Indeed, participants reported higher self-reported controllability for the Controllable than the Uncontrollable condition (mean_C=65.9, mean_U=43.7, t(74.55)=4.10, p<0.001; Figure 2c). Besides the clear indication of individuals’ recognition of the difference in controllability between conditions, the mean level of self-reported controllability for the Uncontrollable condition was 43.7%, which was still substantially higher than their actual level of controllability on future offers made by the partners (0%). This result might suggest that participants could develop an illusory sense of control when they had no actual influence over their partners’ offers.

In addition, we examined response times as an exploratory analysis and found that participants took longer to make their decisions in the Controllable condition than in the Uncontrollable condition (mean_C=1.75±0.38, mean_U=1.53±0.38; paired t-test t(47)=4.34, p<0.001; Figure 2d). This result again suggests that participants differentiated the controllability between conditions. Taken together, these findings demonstrate that participants were able to perceive and exploit their influence in a social environment when they had it, although they developed an illusion of control, at least to some degree, even when controllability did not exist. We delineate the computational mechanisms underlying these behaviors in the next sections.

Participants used forward thinking to exploit social controllability

We constructed computational models of participants’ choices and sought to investigate what cognitive processes might underlie people’s ability to exploit social controllability. Previous studies on value-based decision-making have shown that people can use future-oriented thinking and mentally simulate future scenarios when their current actions have an impact on the future (Daw et al., 2011; Gläscher et al., 2010; Lee et al., 2014; Moran et al., 2019). Relying on this framework, we hypothesized that individuals use FT to estimate the impact of their behavior on future social interactions.

To test this hypothesis, we constructed a set of FT models which assume that an agent computes the value of each action (here, accepting or rejecting) by summing up the current value (CV) and the future value (FV) based on her estimation of the amount of controllability she has over the social interactions. These models also incorporate social norm adaptation (Gu et al., 2015) to characterize how individuals’ aversion thresholds to unfairness are adjusted by observing the counterpart teams’ proposals (Fehr and Schmidt, 1999) (see Materials and methods for details). The key individual-level parameter of interest in this model is the ‘expected influence,’ δ, representing the amount of offer change that participants thought they would induce by rejecting the current offer (see Materials and methods). We constrained the range of δ to −$2 to $2 using a sigmoid function, in order to match the range participants observed in the Controllable condition ($0–2) and to encompass what could happen in the Uncontrollable condition (−$2 to $0). Moreover, we considered the number of steps one calculates into the future (i.e., planning horizon; Figure 3a). We compared models that considered from one to four steps further in the future in addition to standalone social learning (‘0-step;’ also see Figure 3—figure supplement 5 for comparison with model-free [MF] learning). The 0-step model only considers the utility at the current state. All other components, including the utility function of the immediate rewards and the variable initial norm and norm learning incorporated in the utility function, are shared across all the candidate models. In model fitting, we excluded the first five trials (out of 40 for the fMRI sample and 30 for the online sample) to exclude initial exploratory behaviors and to focus on stable estimation of controllability. We also excluded the last five trials because subjects might adopt a different strategy toward the end of the interaction (e.g., ‘cashing out’ instead of trying to raise the offers higher).
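To make the structure of the FT models concrete, the sketch below combines a sigmoid-bounded expected influence with a deterministic n-step look-ahead in which action value is the sum of current and discounted future value. It is a simplified illustration under stated assumptions, not the fitted model: the utility form (payoff minus a penalty for offers below the norm), the discount factor `gamma`, the symmetric decrease of simulated offers after acceptance, the fixed norm during simulation, and all function names are assumptions introduced here; the actual specification is given in Materials and methods.

```python
import math

def bounded_delta(x, limit=2.0):
    """Map an unconstrained real x to an expected influence within [-limit, +limit] via a sigmoid."""
    return limit * (2.0 / (1.0 + math.exp(-x)) - 1.0)

def utility(offer, norm, alpha):
    """Assumed utility of accepting: payoff minus alpha times the shortfall from the norm."""
    return offer - alpha * max(norm - offer, 0.0)

def action_value(offer, reject, delta, norm, alpha, steps, gamma=0.8):
    """Current value plus discounted future value over `steps` simulated future offers.

    Rejecting yields no immediate payoff but shifts the simulated next offer by +delta;
    accepting yields the utility of the offer and shifts it by -delta (assumed symmetry).
    `gamma` is a hypothetical discount factor chosen for illustration.
    """
    current = 0.0 if reject else utility(offer, norm, alpha)
    if steps == 0:
        return current                      # 0-step model: current utility only
    next_offer = offer + delta if reject else offer - delta
    future = max(action_value(next_offer, r, delta, norm, alpha, steps - 1, gamma)
                 for r in (False, True))    # follow the best simulated choice at each future step
    return current + gamma * future

# With no look-ahead, accepting a $5 offer is worth more than rejecting it;
# with a 2-step horizon and delta=1.5, rejecting becomes the higher-valued action.
v_accept_0 = action_value(5, False, 1.5, norm=5, alpha=0.5, steps=0)
v_reject_2 = action_value(5, True, 1.5, norm=5, alpha=0.5, steps=2)
v_accept_2 = action_value(5, False, 1.5, norm=5, alpha=0.5, steps=2)
```

In this toy setting, a larger positive δ increases the simulated payoff of rejecting now, which is the intuition behind using δ as an index of how much control an agent believes she has.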

Figure 3. — (a) The figure depicts how individuals’ simulated value of the offers evolves contingent upon the choices along the future steps under the Controllable condition. Future simulation was assumed to be deterministic (only one path is simulated instead of all paths being visited in a probabilistic manner). The solid and thicker arrows represent an example of a simulated path. To examine how many steps along the temporal horizon participants might simulate to exert control, we tested the candidate models considering from zero to four steps of the future horizon. (b) For both the Controllable and Uncontrollable conditions, the forward thinking (FT) models better explained participants’ behavior than the 0-step model. The 2-step FT model was selected for further analyses, because the improvement in the DIC score (Draper’s Information Criteria; Draper, 1995) was marginal for the models including further simulations (paired t-test comparing 2-step FT model with (i) 0-step Controllable: t(47)=–4.45, p<0.0001, Uncontrollable: t(47)=–4.21, p<0.001; (ii) 1-step Controllable: t(47)=–4.41, p<0.0001, Uncontrollable: t(47)=–3.01, p<0.001; (iii) 3-step Controllable: t(47)=0.39, p=0.70, Uncontrollable: t(47)=–0.04, p=0.97; (iv) 4-step Controllable: t(47)=0.06, p=0.95, Uncontrollable: t(47)=–0.12, p=0.91). (c) The choices predicted by the 2-step FT model were matched with individuals’ actual choices with an average accuracy rate of 83.7% for the Controllable and 90.1% for the Uncontrollable. Each bold black line represents mean accuracy rate. (d) The levels of expected influence drawn from the 2-step FT model were higher for the Controllable than the Uncontrollable (mean_C=1.33, mean_U=0.98, t(47)=2.90, p<0.01). Each line represents a participant and each bold line represents the mean. (e) The expected influence was positively correlated between the Controllable and the Uncontrollable conditions (R=0.30, p<0.05). (f) The self-reported controllability was not significantly correlated between the conditions (R=–0.18, p=0.26). (g) Under the Controllable condition, expected influence correlated with mean offers (R=0.78, p<<0.0001). Each dot represents a participant. Error bars and shades represent SEM; ****p<0.0001; ***p<0.001; **p<0.01; *p<0.05. C, Controllable; U, Uncontrollable.


The results showed that for both conditions (Controllable, Uncontrollable), all FT models explained participants’ choices significantly better than the standalone norm learning model without FT (0-step model) (Gu et al., 2015), as indexed by Draper’s Information Criteria (DIC) (Draper, 1995) scores averaged across individuals (paired t-test comparing 2-step FT model with 0-step model Controllable: t(47)=–4.45, p<0.0001; Uncontrollable: t(47)=–4.21, p<0.001; Figure 3b). In addition, not all parameters were recoverable in parameter recovery analysis using the 0-step model (e.g., sensitivity to norm violation; Controllable: r=–0.03, p=0.82; Uncontrollable: r=0.20, p=0.15), whereas all the parameters from the FT models were identifiable (see Figure 3—figure supplement 3a-j for parameter recovery of the 2-step model). These results suggest that participants engaged in future-oriented thinking and, specifically, calculated how their current choice might affect subsequent social interactions, regardless of the actual level of controllability of the environment.

The FT models with longer planning horizons tended to show smaller DIC scores (i.e., better model fit), but the fit improvement became marginal after two steps (paired t-test comparing 2-step FT model with (i) 1-step Controllable: t(47)=–4.41, p<0.0001, Uncontrollable: t(47)=–3.01, p<0.001; (ii) 3-step Controllable: t(47)=0.39, p=0.70, Uncontrollable: t(47)=–0.04, p=0.97; (iii) 4-step Controllable: t(47)=0.06, p=0.95, Uncontrollable: t(47)=–0.12, p=0.91; Figure 3b). The 2-step FT model predicted participants’ choices with an average accuracy rate of 83.7% for the Controllable and 90.1% for the Uncontrollable condition (Figure 3c), which was higher than the 1-step model for the Controllable condition (Controllable 78.4% (t(47)=–3.63, p<0.001), Uncontrollable 88.7% (t(47)=–1.45, p=0.15)) and comparable with the models with longer planning horizons (3-step model: Controllable 84.0% (t(47)=0.20, p=0.84), Uncontrollable 90.7% (t(47)=0.62, p=0.53); 4-step model: Controllable 84.0% (t(47)=0.21, p=0.84), Uncontrollable 90.2% (t(47)=0.09, p=0.93)). In particular, our parameter of interest, expected influence δ, was in general better identified and recovered for the 2-step model (Controllable r=0.87, Uncontrollable r=0.79) than for the other models (1-step model: Controllable r=0.80, Uncontrollable r=0.68; 3-step model: Controllable r=0.81, Uncontrollable r=0.68; 4-step model: Controllable r=0.89, Uncontrollable r=0.68). We thus used parameters from the 2-step FT model for subsequent analyses (see Table 1 for a full list of parameters from this model).
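The model comparisons above rest on paired t-tests over per-participant DIC scores. A minimal sketch of that statistic, using only the Python standard library, is shown below; the function name is hypothetical and the DIC values are invented purely for illustration. The sign convention assumed here (2-step score minus comparison score) means that a negative t indicates a lower, and therefore better, DIC for the 2-step model.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t-test statistic for matched samples a and b; returns (t, degrees of freedom)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))   # mean difference over its standard error
    return t, n - 1

# Invented per-participant DIC scores (illustrative only, not data from the study).
dic_2step = [40.1, 35.7, 52.3, 44.0, 38.9]
dic_0step = [45.6, 36.2, 60.8, 47.5, 43.1]
t_stat, df = paired_t(dic_2step, dic_0step)
```

With 48 participants, as in the fMRI sample, the same computation gives the 47 degrees of freedom reported in the text.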

Table 1. Parameter estimates from the 2-step forward thinking (FT) model.

	Inverse temperature	Sensitivity to norm violation	Initial norm	Adaptation rate	Expected influence
Mean (SD)	β	α	f0	ε	δ
Controllable
fMRI sample	8.33 (8.55)	0.76 (0.29)	8.21 (7.14)	0.24 (0.24)	1.33 (0.79)
Online sample	9.77 (8.54)	0.74 (0.29)	9.01 (7.26)	0.32 (0.31)	1.34 (0.84)
Uncontrollable
fMRI sample	10.38 (8.84)	0.79 (0.31)	8.84 (6.96)	0.29 (0.24)	0.98 (0.62)
Online sample	12.94 (7.66)	0.78 (0.23)	9.07 (6.31)	0.24 (0.24)	0.90 (1.06)


It might seem counterintuitive that participants engaged a 2-step FT model to estimate the future impact of their current choices under the Uncontrollable condition. However, as in most real-life situations where the controllability of our social interactions is unknown or uncertain, participants were not explicitly told about the uncontrollability of the environment. Indeed, they incorrectly estimated that they could exert at least some control (Figure 2c). Thus, we infer that individuals attempted to make strategic decisions under the belief that they had some control over the social environment, independent of the actual controllability.

Given that participants were successful in raising offers in the Controllable condition (Figure 2a), we predicted that the expected influence parameter $δ$ would differ between the two conditions. Indeed, we found that the expected influence parameter estimates drawn from the 2-step FT model were higher for the Controllable than for the Uncontrollable condition (mean_C=1.33, mean_U=0.98, t(47)=2.90, p<0.01; Figure 3d), indicating that participants simulated greater levels of controllability when environments were in fact controllable than when they were uncontrollable. Interestingly, despite the systematic difference between the two conditions, the expected influence was still positively correlated between the conditions (r=0.30, p<0.05; Figure 3e), suggesting a trait-like characteristic of the parameter. This is in contrast with the self-reported belief about controllability, which was not correlated between the conditions (r=–0.18, p=0.26; Figure 3f; correlation between expected influence and self-reported controllability is listed in Figure 4—figure supplement 4a-d). Furthermore, we observed a positive association between expected influence and task performance during the Controllable condition (r=0.78, p<<0.0001; Figure 3g). This result suggests that those who simulated a greater level of controllability were able to raise the offers higher, indicating the beneficial effect of doing so.

Comparison with a non-social controllability task

To investigate whether our results are specific to the social domain, we ran a non-social version of the task in which participants (n=27) played the same game with the instruction of ‘playing with computer’ instead of ‘playing with virtual human partners.’ Using the same computational models, we found not only that participants exhibited similar choice patterns (Figure 2—figure supplement 1a-c), but also that the 2-step FT model was still favored in the non-social task (Figure 2—figure supplement 1d,e) and that δ was still higher for the Controllable than the Uncontrollable condition (Figure 2—figure supplement 1f, mean_C=1.31, mean_U=0.75, t(26)=2.54, p<0.05).

A closer examination of subjective data revealed two interesting differences in the non-social task compared to the social task. First, participants’ subjective report of controllability did not differentiate between conditions in the non-social task (Figure 2—figure supplement 1g; mean_C=62.7, mean_U=56.9, t(25)=0.78, p=0.44), which suggests that the social aspect of an environment might have a unique effect on subjective beliefs about controllability. Second, inspired by previous work demonstrating the impact of reward prediction errors (RPEs) on emotional feelings (Rutledge et al., 2014), we examined the impact of norm PE (nPE) on emotion ratings for the non-social and social contexts using a mixed effect regression model (Supplementary file 1a). We found a significant interaction between social context and nPE ($β$=0.52, p<0.05), suggesting that the non-social context reduced the impact of nPE on emotional feelings. Taken together, these new results suggest that despite a similar involvement of FT in exploiting controllability, the social context had a considerable impact on subjective experience during the task.

Replication of behavioral and computational findings in a large-scale online study

To test the replicability and generalizability of our findings, we recruited 1342 participants from Prolific (http://prolific.co), an online survey platform, and had them play the game online (see Materials and methods for details). Notably, this online sample was more demographically diverse than the ‘healthy’ fMRI sample, because we recruited them without any pre-screening or geographical constraints within the United States. Despite the greater level of diversity, the three model-agnostic findings remained robust. First, we found that the offer size increased throughout the trials under the Controllable condition, replicating the results from the fMRI sample (mean_C=6.0, mean_U=5.0, t(1,341)=20.29, p<<0.0001; Figure 4a). Second, the rejection pattern was different between the two conditions, with a more flattened rejection curve for the Controllable than for the Uncontrollable condition (low ($1–3): mean_C=66%, mean_U=86%, t(741.54)=–12.28, p<<0.0001; middle ($4–6): mean_C=67%, mean_U=59%, t(2,606)=5.96, p<<0.0001; high ($7–9): mean_C=47%, mean_U=15%, t(1,925)=31.67, p<<0.0001; Figure 4b). Specifically, the online participants rejected more medium and high offers under the Controllable than the Uncontrollable condition, similar to the fMRI participants. Furthermore, for low offers, online participants showed significantly lower rejection rates under the Controllable than the Uncontrollable condition, a trend that was not statistically significant in the fMRI sample. Third, online participants reported higher perceived control for the Controllable than the Uncontrollable condition, as fMRI participants did (mean_C=58.3, mean_U=25.6, t(2,579)=27.93, p<<0.0001; Figure 4c).
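The fractional degrees of freedom reported above (e.g., t(741.54)) reflect the unequal-variance two-sample t-test with Satterthwaite’s approximation that is named in the Figure 4 caption. As a minimal sketch of that formula only (the function name `welch_t` and the toy inputs are hypothetical, and this is not the authors’ analysis code), the statistic and its approximate degrees of freedom can be computed as follows:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Unequal-variance (Welch) t statistic with Satterthwaite degrees of freedom.

    `a` and `b` are two independent samples (sequences of numbers).
    """
    # Squared standard errors of the two sample means
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Satterthwaite's approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

# Toy data, for illustration only (not from the study)
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
```

With equal sample sizes and equal variances, the approximation reduces to the usual n1+n2–2 degrees of freedom; otherwise it falls below that value, which is why non-integer degrees of freedom can appear.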

Figure 4. — (a) Online participants successfully increased the offer under the Controllable condition as fMRI participants did (mean_C=6.0, mean_U=5.0, t(1,341)=20.29, p<<0.0001). (b) Rejection rates binned by offer sizes differed between the two conditions in the online sample (low ($1–3): mean_C=66%, mean_U=86%, t(741.54)=–12.28, p<<0.0001; middle ($4–6): mean_C=67%, mean_U=59%, t(2,606)=5.96, p<<0.0001; high ($7–9): mean_C=47%, mean_U=15%, t(1,925)=31.67, p<<0.0001). (c) Online participants reported higher self-reported controllability for the Controllable than Uncontrollable (mean_C=58.3, mean_U=25.6, t(2,579)=27.93, p<<0.0001). (d) Consistent with the fMRI sample, expected influence was higher for the Controllable than the Uncontrollable for the online sample (mean_C=1.34, mean_U=0.90, t(1,341)=12.97, p<<0.0001). (e) The expected influence was correlated between the two conditions (r=0.18, p<<0.0001). (f) The self-reported controllability showed negative correlation between the two conditions for the online sample (r=–0.10, p<0.001). (g) The significant correlation between expected influence and mean offers under the Controllable was replicated in the online sample (r=0.50, p<<0.0001). Each dot represents a participant. The t-statistics for the mean offer size, binned rejection rate, and self-reported controllability are from two-sample t-tests assuming unequal variance using Satterthwaite’s approximation according to the results of F-tests for equal variance. Error bars and shades represent SEM. For (c, d), each line represents a participant and each bold line represents the mean. C, controllable; U, Uncontrollable.

Next, we tested whether the 2-step FT model performs as well for the large online sample as for the fMRI sample. First, we assessed the accuracy rate of the 2-step FT model’s choice prediction; its mean was 80.2% for the Controllable and 93.9% for the Uncontrollable condition (Figure 4—figure supplement 1a). The parameters of the 2-step FT model were identifiable for the online sample as well (Figure 4—figure supplement 1b-k). Not only the model performance, but also the individual estimation results revealed consistency between the two heterogeneous samples. The parameter estimates for the online sample were comparable with those for the fMRI sample, as shown in Table 1. The expected influence drawn from the 2-step FT model was higher for the Controllable than the Uncontrollable condition (mean_C=1.34, mean_U=0.90, t(1,341)=12.97, p<<0.0001; Figure 4d). Yet, consistent with the fMRI sample, the parameters for the two conditions were correlated (r=0.18, p<<0.0001; Figure 4e). The self-reported controllability showed a negative correlation between the conditions (r=–0.10, p<0.001; Figure 4f). In addition, the expected influence was positively correlated with the mean offer size (r=0.50, p<<0.0001; Figure 4g). Taken together, our independent large-scale replication results show that the proposed forward thinking model explains the decision processes involved in social controllability in a general population.

The vmPFC computed summed choice values from the 2-step FT model

A computational model that could explain cognitive processes should not only fit choice behavior well, but also be represented by neurobiological substrates in the brain (i.e., biological plausibility) (Cohen et al., 2017; O’Doherty et al., 2007; Wilson and Collins, 2019). Accordingly, we expected that the total (both current and future) choice values estimated by the 2-step FT model, but not those from the 0-step model (only CVs), would be signaled in the vmPFC, a brain region that is known to process subjective values (Bartra et al., 2013; Hiser and Koenigs, 2018) during both social and non-social decision-making. To test this hypothesis, we regressed at the individual level trial-by-trial simulated normalized total values (TVs) of the chosen option drawn from the 2-step FT model (or the 0-step model in a separate GLM) as parametric modulators against event-related blood-oxygen-level-dependent (BOLD) responses recorded during fMRI (see Materials and methods). These analyses showed that the BOLD signals in the vmPFC tracked the value estimates drawn from the 2-step planning model across both conditions (P_FDR<0.05, k>50; Figure 5a, Supplementary file 1e), and there was no significant difference between the two conditions (P_FDR<0.05). In contrast, BOLD responses in the vmPFC did not track the trial-by-trial value estimates from the 0-step model, even at a more liberal threshold (p<0.005 uncorrected, k>50; Figure 5b, Supplementary file 1f). We also conducted model comparison at the neural level using the MACS toolbox (see Figure 5—figure supplement 3 for details) and found that the vmPFC encoded TVs rather than only CV or FV.

Figure 5. — (a) The vmPFC parametrically tracked mentally simulated values of the chosen actions drawn from the 2-step forward thinking (FT) model in both conditions (P_FDR<0.05, k>50). (b) No activation was found in the brain including the vmPFC in relation with the value signals estimated from the 0-step model at a more liberal threshold (p<0.005, uncorrected, k>50). (c) The vmPFC ROI coefficients for the 2-step FT’s value estimates were significantly greater than 0 for both the Controllable and Uncontrollable conditions (Controllable: mean_C=0.29, t(47)=1.96, p<0.05 (one-tailed); Uncontrollable: mean_U=0.24, t(47)=2.14, p<0.05 (one-tailed)) whereas the coefficients from the same ROI for 0-step’s value estimates were not significant for either condition (Controllable: mean_C=0.09, t(46)=0.69, p=0.25 (one-tailed); Uncontrollable: mean_U=0.12, t(46)=1.17, p=0.12 (one-tailed)). The vmPFC coefficients were significantly higher under the 2-step model than the 0-step model for both the Controllable and Uncontrollable conditions (Controllable: t(46)=1.81, p<0.05 (one-tailed); Uncontrollable: t(46)=2.04, p<0.05 (one-tailed)). The coefficients were extracted from an 8-mm-radius sphere centered at [6, 52, −16] based on a meta-analysis study that assessed neural signatures in the ultimatum game (Feng et al., 2015). Error bars represent SEM; *p<0.05; n.s. indicates not significant. C, Controllable; ROI, region-of-interest; U, Uncontrollable.

These whole-brain analysis results were further corroborated by a set of independent region-of-interest (ROI) analyses. Specifically, we created a vmPFC ROI based on the peak coordinate from an independent meta-analysis on social decision making (an 8-mm-radius sphere centered at [6, 52, −16]) (Feng et al., 2015) and extracted parameter estimates from the mask. This analysis showed that the vmPFC ROI coefficients for the choice values were significantly greater for the 2-step model than for the 0-step model regardless of the condition (Controllable: t(46)=1.81, p<0.05 (one-tailed); Uncontrollable: t(46)=2.04, p<0.05 (one-tailed); Figure 5c). Indeed, the ROI coefficients based on the 2-step model were significantly larger than zero for each condition (Controllable: mean_C=0.29, t(47)=1.96, p<0.05 (one-tailed); Uncontrollable: mean_U=0.24, t(47)=2.14, p<0.05 (one-tailed); Figure 5c) whereas these coefficients for the choice values (CV only) based on the 0-step model were not significant for either condition (Controllable: mean_C=0.09, t(46)=0.69, p=0.25 (one-tailed); Uncontrollable: mean_U=0.12, t(46)=1.17, p=0.12 (one-tailed); Figure 5c). These findings suggest that individuals engaged the vmPFC to compute the projected total (current and future) values of their choices during FT. Furthermore, vmPFC signals were comparable between the two conditions in both the whole-brain analysis and the ROI analyses. Consistent with our behavioral modeling results, these neural results further support the notion that humans computed summed choice values regardless of the actual controllability of the social environment.

In addition, we examined whether norm prediction errors (nPEs) and norm estimates themselves from the 2-step FT model were tracked in the brain. We found that nPEs were encoded in the ventral striatum (VS; [4, 14, −14]) and the right anterior insula (rAI; [32, 16, −14]) for the Controllable condition (Figure 5—figure supplement 4a), while these signals were found in the anterior cingulate cortex (ACC; [2, 46, 16]) for the Uncontrollable condition (Figure 5—figure supplement 4b) at P_FWE <0.05, small volume corrected. All three regions have been suggested to encode prediction errors in other norm learning tasks (Xiang et al., 2013). We further contrasted the whole-brain map of the two conditions and found that the VS ([4, 14, −14]) and the rAI ([32, 16, −14]) had significantly greater BOLD responses for the Controllable than the Uncontrollable condition (P_FWE<0.05, small volume corrected; Figure 5—figure supplement 4c) whereas the ACC ([2, 46, 16]) response under the Uncontrollable condition was not significantly greater than the Controllable condition at the same threshold (Figure 5—figure supplement 4d). We also found that internal norm-related BOLD signals were tracked in the VS ([10, 16, −2]) for the Controllable condition (Figure 5—figure supplement 4a), and in the rAI ([28, 16, −6]) and the amygdala ([18, −6, −8]) for the Uncontrollable condition (Figure 5—figure supplement 5b) at P_FWE<0.05, small volume corrected. However, the difference between the conditions was not statistically significant in the whole-brain contrast (Figure 5—figure supplement 5c-d). Taken together, these results suggest that the controllability level of the social interaction modulates neural encoding of internal norm representation and adaptation, expanding our previous knowledge about the computational mechanisms of norm learning (Gu et al., 2015; Xiang et al., 2013).

Finally, in an exploratory analysis, we examined the behavioral relevance of these neural signals in the vmPFC beyond the tracking of trial-by-trial values. Recall that despite the significant activations of the vmPFC in both conditions, individuals still exhibited different levels of self-reported controllability and the expected influence. Furthermore, there was condition-dependent discrepancy between the self-reported controllability and the expected influence (Figure 4—figure supplement 4a-d). Thus, we examined whether neural encoding of value in the vmPFC might relate to this discrepancy depending on the controllability of the environment. To do this, we assessed the correlation between extracted parameter estimates from the vmPFC and the disconnection between the belief and the expected influence (i.e., the ‘biased belief’ computed by subtracting the normalized expected influence from the normalized self-reported controllability). We found that the correlation between vmPFC-encoded value signals and the belief-behavior disconnection was indeed dependent on the condition (difference in slope: Z=2.40, p<0.05). Specifically, vmPFC signals were positively correlated with the disconnection between self-reported controllability and expected influence in the uncontrollable environment (r=0.35, p<0.05; Figure 5—figure supplement 1a), but not in the controllable environment (r=–0.14, p=0.38; Figure 5—figure supplement 1b). These results suggest that the meaning of vmPFC encoding of value signals could be context-dependent—and that heightened vmPFC signaling in uncontrollable situations is related to overly optimistic beliefs about controllability.
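The ‘biased belief’ measure and the slope comparison described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors’ code: ‘normalized’ is taken here to mean z-scoring across participants, and the text does not say which test produced Z=2.40. A Fisher r-to-z comparison that treats the two correlations as independent is used below as one plausible candidate; with the reported r values and n=48 it gives approximately 2.40. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def zscore(xs):
    """Standardize values across participants (assumed meaning of 'normalized')."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - m) / sd for x in xs]

def biased_belief(self_report, expected_influence):
    """Normalized self-reported controllability minus normalized expected influence."""
    return [b - d for b, d in zip(zscore(self_report), zscore(expected_influence))]

def fisher_z_difference(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations, treated as independent (assumption)."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Correlations reported in the text: Uncontrollable r=0.35, Controllable r=-0.14, n=48 each
z, p = fisher_z_difference(0.35, 48, -0.14, 48)
```

Because the same participants contributed to both conditions, a test for dependent correlations would be a reasonable alternative; the independent-samples form is shown only because it is the simplest to state.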

Discussion

For social animals like humans, it is crucial to be able to exploit the controllability of our social interactions and to consider the long-term social effects of our current choices. This study provides a mechanistic account for how humans identify and use social controllability. In two independent samples of human participants, we demonstrate that (1) humans are capable of exploiting the controllability of their social interactions and exert influence on social others when given the opportunity, and that (2) they do so by engaging a mental model of FT and calculating the downstream effects of their current social choices. Using a model-based fMRI analytic approach, we demonstrate that the vmPFC represents combined signals of CV and FV during forward social planning, and that this neural value representation was positively associated with belief-behavior disconnect in the Uncontrollable condition. These findings demonstrate that people use vmPFC-dependent FT to guide social choices, expanding the role of this neurocomputational mechanism beyond subjective valuation.

FT is an important high-level cognitive process that is frequently associated with abstract reasoning (Hegarty, 2004), planning (Szpunar et al., 2014), and model-based control (Constantinescu et al., 2016; Daw et al., 2011; Gläscher et al., 2010; Schuck et al., 2016; Wang et al., 2018). Also known as prospection, FT has been suggested to involve four intertwined modes: mental simulation, prediction, intention, and planning (Szpunar et al., 2014). All four modes are likely to have taken place in our study, as our FT model implies that a social decision-maker mentally simulates social value functions into the future, predicts how her action would affect the following offers from partners, sets a goal of increasing future offers, and plans steps ahead to achieve the goal. Future studies will be needed to disentangle the neurocomputational mechanisms underlying each of these modes.

Critically relevant to the current study, previous research suggests that humans can learn and strategically exploit controllability during various forms of exchanges with others (Bhatt et al., 2010; Camerer, 2011; Hampton et al., 2008; Hula et al., 2015). The current study is in line with this literature and expands beyond existing findings. Here, we show that humans can also exploit controllability and exert their influence even when interacting with a series of other players (as opposed to a single other player as tested in previous studies). Furthermore, our 2-step FT model captures the explicit magnitude of controllability in individuals’ mental models of an environment, which can be intuitively compared to subjective, psychological controllability. Finally, our 2-step FT model simultaneously incorporates aversion to norm violation and norm adaptation, two important parameters guiding social adaptation (Fehr, 2004; Gu et al., 2015; Spitzer et al., 2007; Zhang and Gläscher, 2020). These individual- and social-specific parameters will be crucial for examining social deficits in various clinical populations in future studies.

Our key parameter from the FT model is δ, the expected influence that individuals would be mentally simulating during decision processes. We found that individuals who showed higher δ performed better in terms of achieving higher offers from their partners under the Controllable condition, suggesting a direct association between FT and performance in strategic social interaction. Although δ was higher in the Controllable than the Uncontrollable condition, one surprising finding is that people’s behavior was better explained by the 2-step FT model than the 0-step (no planning) model even for the Uncontrollable condition. In addition, we did not find any significant differences in vmPFC encoding of controllability between the conditions. These results suggest that participants still expected some level of influence (controllability) over their partners’ offers even when the environment was in fact uncontrollable. Furthermore, δ was positively correlated between the conditions, indicating the stability of the mentally simulated controllability across situations within an individual. We speculate that people still attempted to simulate future interactions in uncontrollable situations due to their preference and tendency to control (Leotti and Delgado, 2014; Shenhav et al., 2016).

Our modeling result was corroborated by neural findings of simulated total choice value encoding in the vmPFC regardless of the actual social controllability of conditions. There are currently at least two distinct views about the role of the vmPFC. The first view considers the vmPFC to encode a generic value signal (e.g., the common currency; Levy and Glimcher, 2012), including the value of social information (Behrens et al., 2008; Chung et al., 2015) and anticipatory utility (Iigaya et al., 2020). An alternative theory suggests that the vmPFC represents mental maps of state space (Schuck et al., 2016) and of conceptual knowledge (Constantinescu et al., 2016), in addition to other ‘map’-encoding brain structures such as the hippocampus (O’keefe and Nadel, 1978; Tavares et al., 2015) and entorhinal cortex (Stensola et al., 2012). Of course, these views of the vmPFC might not necessarily be in contradiction. Instead, the exact function of the vmPFC could depend on the specific setup of the task environment. In the particular case of our task, as explicit values are inherently embedded in each state (i.e., each interaction), the vmPFC computed a summed value of not only the current state, but also future states. That is, both types of computations could be required to calculate the total downstream values of current social choices in our experimental setup. We also found that the vmPFC signal was amplified by illusory beliefs only when the social environment was uncontrollable (but not when the environment was controllable), suggesting that the behavioral relevance of value-encoding in the vmPFC is context-dependent. Taken together, our neural results illustrate a role of the vmPFC in constructing the TVs (both CV and FV) of current actions as humans engaged in forward planning during social exchange, and show that these vmPFC-encoded value signals can be counterproductive and relate to exaggerated illusory beliefs about controllability when the environment does not allow control.

Given our results, it is compelling to design tasks that focus on the way that subjects learn the model of an environment (in our terms, acquiring a value for the parameter $δ$ ) in early trials or build complex models of their partners’ minds (as in a cognitive hierarchy; Camerer et al., 2004). Indeed, even though, in our task, the straightforward model based on norm-adjustment characterized participants’ behavior well, there are more sophisticated alternatives that are used to characterize interpersonal interactions, such as the framework of interactive partially-observable Markov decision processes (Gmytrasiewicz and Doshi, 2005; Gu et al., 2015; Xiang et al., 2012). These might provide additional insights into the sorts of probing that our subjects presumably attempted in early trials to gauge controllability (and the ways this differs in both the Controllable and the Uncontrollable conditions between subjects who do and do not suffer from substantial illusions of control). The framework would also allow us to examine whether our subjects thought that their partners built a model of them themselves (as in theory-of-mind or a cognitive hierarchy; Camerer et al., 2004), which would add extra richness to the interaction, and allow us to capture individual trajectories regarding social interactions in a finer detail—if, for instance, our subjects might have become irritated (Hula et al., 2015) at their partners’ unwillingness to respond to their social signaling under the Uncontrollable condition.

The current study has the following limitations. First, due to the nature of the study design (i.e., reduction in uncertainty within the sequence of offers might be an inherent feature of controllability), the distributions of overall offers were not completely matched between conditions, which may have affected individuals’ beliefs about their controllability. We did not find evidence that uncertainty or autocorrelation affected the expected influence or self-reported controllability (Supplementary file 1g), and, as noted, a reduction in uncertainty might be inherent to controllability. Still, future experimental designs that dissociate change in uncertainty from change in controllability may better address potentially different effects of controllability and uncertainty on choice behavior and neural responses. Second, the lack of explicit instruction about the controllability of each condition in our study may have affected the extent to which individuals exploited controllability and developed an illusion of control. Future studies implementing explicit instructions might be better suited to examine controllability-specific behaviors and neural substrates.

In summary, the current study provides a mechanistic account for how people exploit the controllability of their social environment. Our finding expands the roles of the vmPFC and model-based planning beyond spatial and cognitive processes. The implications of these findings could be far-reaching and multifaceted, as the proposed model not only showcases how FT can help optimize normative social behavior, as often required during strategic social interaction (e.g., negotiation, reputation building, and social networking), but may also help us understand how aberrant computation of social controllability may contribute to mental health symptoms and deterioration of group cooperation and trust in future studies.

Materials and methods

MRI participants

The study was approved by the Institutional Review Boards of the University of Texas at Dallas and the University of Texas Southwestern Medical Center (S.N., V.G.F., and X.G.’s previous institute, where data were collected). The required sample size, computed with G*Power 3.1.9.4 assuming a paired two-tailed t-test with an effect size of 0.5, an alpha of 0.05, and a power of 0.95, was 54. Fifty-six healthy adults (38 female, age=27.3±9.2 years, 3 left-handed) were recruited in the Dallas-Fort Worth metropolitan area. Participants provided written informed consent and completed this study. Five participants were excluded due to loss of behavioral data caused by computer crashes, one participant was excluded due to fMRI data loss, one participant was excluded due to excessive in-scanner head motion, and one participant was excluded due to poor quality of parameter recovery. The final sample had 48 healthy adults (33 female, age=27.6±9.1 years, 3 left-handed). Participants were paid a reward randomly drawn from the outcomes of this task, in addition to their baseline compensation calculated by time and travel distance.

Online participants

The study was approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai. Participants were recruited from Prolific (http://prolific.co), an online survey platform. A total of 1499 adults (734 female, age=35.1±13.1 years) provided online consent and completed this study. The online participant data were part of a larger study examining social cognition. We excluded 14 participants because of duplication of their data files and 143 additional participants because they had flat responses (accepted all or rejected all offers) for all the rounds within at least one condition. The final sample had 1342 adults (649 female, age=34.5 ± 12.8 years; report of demographics excluded another 21 participants who typed in an incorrect ID for the demographics survey and whose task data were intact but could not be linked to demographic data). Participants were paid 10% of the reward drawn from a random trial of this task, in addition to $7.25 of the baseline compensation and the bonuses from the tasks other than the current social exchange game, which were not part of this study.

Experimental paradigm: laboratory version

We designed an economic exchange task to probe social controllability based on an ultimatum game. This task consisted of two blocks, each representing an experimental condition (‘Controllable’ vs. ‘Uncontrollable’). In both conditions, participants were offered a split of $20 by a partner and decided whether to accept or reject the proposed offer from the partner. If a participant accepted the proposal, the participant and the partner split the money as proposed. If a participant rejected the proposal, both the participant and the partner received nothing. At the beginning of each block, participants were instructed that they would play the games with members of Team A or Team B. This instruction allowed participants to perceive players in each block as a group with a coherent norm, rather than as random individuals. However, participants were not told how the players in each team would behave, so that they would need to learn the action-offer contingency. There were 40 trials in each block (for fMRI participants). In 60% of the trials, participants were also asked to rate their feelings after they made a choice.

In the Uncontrollable condition, participants played a typical ultimatum game: the offers were randomly drawn from a truncated Gaussian distribution (μ=$5, σ=$1.2, rounded to the nearest integer, max=$8, min=$2) on the fly using the MATLAB function ‘normrnd’ and ‘round.’ Thus, participants’ behaviors had no influence on the future offers. Importantly, in the Controllable condition, participants could increase the next offer from their partner by rejecting the current offer, or decrease the next offer by accepting the present offer in a probabilistic fashion (⅓ chance of ±$2, ⅓ chance of ±$1, ⅓ chance of no change; the range of the offers for Controllable was between $1 and $9 [inclusive]—the range was not matched for the two conditions by mistake) (Figure 1b). We designed this manipulation based on the finding that reputation plays a crucial role in social exchanges (Fehr, 2004; King-Casas et al., 2005; Knoch et al., 2009); thus, in a typical ultimatum game, accepting any offers (although considered perfectly rational by classic economic theories; Becker, 2013) will develop a reputation of being ‘cheap’ and eventually lead to reduced offers, while the rejection response can serve as negotiation power and will force the partner to increase offers. At the end of each condition, participants also rated how much control they perceived using a sliding bar (from 0% to 100%).
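The two offer-generating rules can be sketched in Python as below. This is a simplified illustration rather than the original MATLAB task code. Two points are assumptions: out-of-range Gaussian draws are redrawn (the text does not say how the truncation was implemented), and the function names are hypothetical.

```python
import random


def uncontrollable_offer(rng, mu=5.0, sigma=1.2, low=2, high=8):
    """Draw one offer from a rounded Gaussian restricted to [low, high].

    ASSUMPTION: out-of-range draws are redrawn; the source only states
    that the distribution was truncated.
    """
    while True:
        offer = round(rng.gauss(mu, sigma))
        if low <= offer <= high:
            return offer


def controllable_next_offer(rng, offer, accepted, low=1, high=9):
    """Next offer in the Controllable condition.

    The step is $0, $1, or $2 with probability 1/3 each; rejecting moves
    the offer up, accepting moves it down, bounded to [low, high].
    """
    step = rng.choice([0, 1, 2])
    next_offer = offer - step if accepted else offer + step
    return min(max(next_offer, low), high)
```

Under this rule, a participant who keeps rejecting can only hold or raise the next offer until it reaches the $9 ceiling, whereas offers in the Uncontrollable condition do not depend on past choices.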

Experimental paradigm: online version

For the online study, we revised certain perceptual features of the task in order to better maintain participants’ attention and ensure data quality, while maintaining the main structure of the task (Appendix 1). First, we reduced the number of trials from 40 to 30, considering both the minimal number of trials needed for modeling and the initial finding that behaviors typically stabilize after about five trials. We also introduced avatars in addition to partners’ names to make the online interactions more engaging, and made minor revisions to the instructions to further emphasize that participants might or might not influence their partners’ offers (but still without telling them how they might influence the offers or which team might be influenced). Finally, to remove unintended inter-individual variability in offers for the Uncontrollable condition, we pre-determined the offer amounts under Uncontrollable (offers=[$1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9], mean=$5.0, std=$2.3, min=$1, and max=$9) and randomized their order.
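The summary statistics of the pre-determined offer list can be checked directly. The sketch below uses only the Python standard library; the seed is an arbitrary choice for illustration, and the original randomization procedure is not described in the text.

```python
import random
import statistics

# Pre-determined Uncontrollable offers for the online study (in dollars).
offers = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5,
          5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9]

n_offers = len(offers)
mean_offer = statistics.mean(offers)
sd_offer = statistics.stdev(offers)  # sample standard deviation (n - 1)

# Randomize the presentation order without changing the set of offers.
rng = random.Random(0)  # arbitrary seed, for reproducibility of this sketch
order = offers[:]
rng.shuffle(order)
```

The reported standard deviation of $2.3 corresponds to the sample (n−1) estimate; the population estimate would round to $2.2.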

Computational modeling

We hypothesized that people would estimate their social controllability by using the consequential future outcomes to compute action values. To test this hypothesis, we constructed an FT value function with different horizons: zero to four steps of forward planning, where the zero-step version represents the no-FT model.

First, we assumed that participants correctly understood the immediate rules of the task as follows:

a_{i} \in \{0, 1\}

(1)

r_{i} = \begin{cases} 0 & \text{if } a_{i} = 0 \\ s_{i} & \text{if } a_{i} = 1 \end{cases}

(2)

$a_{i}$ represents the action that a participant takes at the $i$th trial, where 0 represents rejection and 1 represents acceptance. $r_{i}$ is the reward a participant receives at the $i$th trial depending on $a_{i}$. Participants receive nothing if they reject, whereas they receive the offered amount, $s_{i}$, if they accept.

Similar to our previous work on norm adaptation (Gu et al., 2015), we assumed that people are averse to norm violations, defined as the difference between the actual offer received and one’s internal norm/expectation of the offers. Thus, the subjective utility of the expected immediate reward was constructed as follows.

Here, $U$, the utility, is a function of the reward and $f$ (internal norm) at the $i$th trial. The internal norm, which will be discussed in detail in the next paragraph, is an evolving reference value that determines the magnitude of subjective inequality. $α$ (‘sensitivity to norm violation,’ $0 \leq α \leq 1$) represents the degree to which an individual is averse to norm violation. We assumed that if one rejected the offer and received nothing, aversion would not be involved as the individual already understood the task rule that rejection would lead to a zero outcome. Given that, if there is only one isolated trial, participants will choose to accept or reject by comparing $U (s_{i}, f_{i})$ (because $r_{i} = s_{i}$ if one accepts an offer) and $U (0, f_{i}) = 0$ (because $r_{i} = 0$ if one rejects).

For the internal norm updating, as our previous study (Gu et al., 2015) showed that Rescorla-Wagner (RW) (Sutton and Barto, 2018) models fit better than Bayesian update models, we used RW norm updates to capture how people learn the group norm throughout the trials as follows.

f_{i} = f_{i - 1} + ε (s_{i} - f_{i - 1})

(3)

Here, $ε$ is the norm adaptation rate ($0 \leq ε \leq 1$), the individual learning parameter that determines the extent to which the norm prediction error ($s_{i} - f_{i - 1}$) is reflected in the posterior norm. The initial norm was set as a free parameter ($\$0 \leq f_{0} \leq \$20$).
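The norm update in Equation 3 and a norm-violation utility can be sketched as follows. The update rule follows Equation 3 directly. The utility function is an assumption: this section describes the utility only in words, so the sketch uses a Fehr-Schmidt-style form, r − α·max(f − r, 0) for accepted offers and 0 for rejections, which matches the verbal description but may differ from the authors' exact expression. Function names and example values are hypothetical.

```python
def utility(reward, norm, alpha):
    """Subjective utility of a reward relative to the internal norm.

    ASSUMED form (not given explicitly in this section):
    U(r, f) = r - alpha * max(f - r, 0) if r > 0, and 0 if r = 0.
    """
    if reward <= 0:
        return 0.0
    return reward - alpha * max(norm - reward, 0.0)


def update_norm(prev_norm, offer, epsilon):
    """Rescorla-Wagner norm update (Equation 3)."""
    return prev_norm + epsilon * (offer - prev_norm)


# Hypothetical example: initial norm f_0 = $10, adaptation rate 0.5.
norm = 10.0
norm_trace = []
for offer in [4, 4, 6]:
    norm = update_norm(norm, offer, epsilon=0.5)
    norm_trace.append(norm)
```

In this example the norm moves from $10 toward the observed offers ($7, then $5.5, then $5.75), so the same $4 offer is judged less harshly on later trials.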

Next, we formulated internal valuation as follows.

Δ Q_{i} = v |_{a_{i} = 1} - v |_{a_{i} = 0}

(4)

$Δ Q_{i}$, the difference between the value of accepting ($v |_{a_{i} = 1}$) and the value of rejecting ($v |_{a_{i} = 0}$), determines the probability of taking either action at the $i$th trial. Importantly, we incorporated a forward thinking procedure in the calculation of $v$. For an $n$-step forward thinking model, $v |_{a_{i}}$ was calculated as follows.

v |_{a_{i}} = U (r_{i}, f_{i}) + \sum_{j = 1}^{n} γ^{j} \times U (\hat{E} (r_{i + j} | a_{i}, {\underline{a}}_{i + 1}, \ldots, {\underline{a}}_{i + j}), f_{i})

(5)

\hat{E} (s_{k + 1}) = \begin{cases} s_{k} + δ & \text{if } a_{k} \text{ or } {\underline{a}}_{k} = 0 \\ \max (s_{k} - δ, 1) & \text{if } a_{k} \text{ or } {\underline{a}}_{k} = 1 \end{cases}

(6)

{\underline{a}}_{k} = \begin{cases} 1 & \text{if } U (\hat{E} (s_{k}), f_{k}) > 0 \\ 0 & \text{otherwise} \end{cases}

(7)

Given a hypothetical action $a_{i}$ in the current ($i$th) trial, $v$ is the sum of the expected future reward utility assuming simulated future actions, $\underline{a}$. We used the term $\hat{E}$ to represent an expected value in individuals’ perception and estimation. We assumed that, in an individual’s FT, her hypothetical action at a future trial (${\underline{a}}_{k}$) increases or decreases the hypothetical next offer ($\hat{E} (s_{k + 1})$) by $δ$ (‘expected influence,’ $-\$2 \leq δ \leq \$2$). Here, we assumed a symmetric change ($δ$) for either action, so the change applies to both rejection and acceptance with the same magnitude but in opposite directions. Given the structure of the task, we restricted $| δ | \leq \$2$ in inference. Note that the main behavioral results (statistical testing results in Figures 2–4) remain true even if we exclude the subjects who showed negative deltas (Figure 3—figure supplement 4, Figure 4—figure supplement 3). We assumed that individuals knew that offers would not go below $1 because an offer of $0 would make their choices (accept or reject) indistinguishable. Although actual offers had an upper limit ($9), we did not set any upper limit for individuals’ hypothetical offers because there is no evidence that individuals would reason so, at least until they repeatedly encounter offers of $9, and even then they might or might not rule out the possibility of being offered more than $9. We assumed that simulated future actions (${\underline{a}}_{k}$) are deterministic, contingent on the subjective utility of the immediately following rewards ($U (\hat{E} (r_{k}), f_{k})$); this is a form of 1-level reasoning in a cognitive hierarchy (Camerer et al., 2004). The FVs computed through expected influence were discounted by $γ$, the temporal discounting factor. We fixed $γ$ at 0.8, the empirical mean across the participants from one initial round of estimation, in order to avoid collinearity with the parameter of our interest, $δ$.
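Equations 4–7 can be read as a short simulation loop, sketched below. This is an illustrative reading rather than the authors' implementation. Three points are assumptions: the utility function uses a Fehr-Schmidt-style form (the explicit expression is not shown in this section), the norm is held fixed at its current value during the simulation, and a simulated rejection contributes zero future utility. All function names and example values are hypothetical.

```python
def utility(reward, norm, alpha):
    """ASSUMED form: U(r, f) = r - alpha * max(f - r, 0) if r > 0, else 0."""
    if reward <= 0:
        return 0.0
    return reward - alpha * max(norm - reward, 0.0)


def action_value(action, offer, norm, alpha, delta, n_steps, gamma=0.8):
    """Value of accepting (1) or rejecting (0) under n-step forward thinking (Eq. 5)."""
    reward = offer if action == 1 else 0.0
    value = utility(reward, norm, alpha)
    sim_offer, sim_action = offer, action
    for j in range(1, n_steps + 1):
        # Eq. 6: the previous (real or simulated) action shifts the next offer.
        if sim_action == 0:
            sim_offer = sim_offer + delta
        else:
            sim_offer = max(sim_offer - delta, 1.0)
        # Eq. 7: simulated action is accept only if the utility is positive.
        u = utility(sim_offer, norm, alpha)
        sim_action = 1 if u > 0 else 0
        # Eq. 5: discounted utility of the simulated outcome (zero if rejected).
        value += gamma ** j * (u if sim_action == 1 else 0.0)
    return value


def delta_q(offer, norm, alpha, delta, n_steps, gamma=0.8):
    """Eq. 4: value of accepting minus value of rejecting."""
    return (action_value(1, offer, norm, alpha, delta, n_steps, gamma)
            - action_value(0, offer, norm, alpha, delta, n_steps, gamma))
```

With the hypothetical values offer=$2, norm=$5, α=0.5, and δ=$2, the zero-step model gives a positive value difference (accept), whereas the 2-step model gives a negative one (reject), which illustrates how forward thinking can favor rejecting a low offer to obtain higher offers later.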

We modeled the probability of accepting the offer using the softmax function as follows:

P_{i} (a_{i} = 1) = \frac{1}{1 + e^{- β Δ Q_{i}}}

(8)

Here, $β$ (‘inverse temperature,’ $0 \leq β \leq 20$ ) indicates how strictly people base their choices on the estimated value difference between accepting and rejecting. The lower the inverse temperature is, the more exploratory the choices are.
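The choice rule in Equation 8 is a logistic function of the value difference between accepting and rejecting (Equation 4). A minimal sketch is given below; the function name is hypothetical, and the two-branch form is used only to avoid numerical overflow for large negative arguments.

```python
import math


def p_accept(dq, beta):
    """Probability of accepting given the value difference dq and inverse temperature beta."""
    x = beta * dq
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Algebraically identical form that avoids overflow when x is very negative.
    ex = math.exp(x)
    return ex / (1.0 + ex)
```

When the value difference is zero, the probability of accepting is 0.5 regardless of the inverse temperature; as the inverse temperature decreases, probabilities move toward 0.5 for any fixed value difference.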

We fit the model to individual choice data for the middle trials (30 trials for the fMRI sample and 20 trials for the online sample), excluding the first and the last five trials. The first five trials were excluded because participants might still be learning the contingency between their actions and the outcomes. The last five trials were also excluded because, during those trials, the room to increase the offers becomes smaller and thus participants had less incentive to reject offers as the interactions were close to the end (Gneezy et al., 2003).
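The trial window used for fitting can be sketched as below. The slicing reproduces the trial counts stated in the text. The negative log-likelihood is shown as one common objective for fitting choice models; it is an assumption here, since this section does not state the objective or optimizer that was used. Function names are hypothetical.

```python
import math


def middle_trials(sequence, n_skip=5):
    """Drop the first and last n_skip trials of one condition."""
    return sequence[n_skip:len(sequence) - n_skip]


def negative_log_likelihood(choices, p_accepts, eps=1e-12):
    """Sum of -log P(observed choice); choices are 1 (accept) or 0 (reject).

    ASSUMPTION: this objective is illustrative; the fitting procedure is
    not specified in this section.
    """
    total = 0.0
    for choice, p in zip(choices, p_accepts):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= math.log(p if choice == 1 else 1.0 - p)
    return total


fmri_window = middle_trials(list(range(1, 41)))    # trials 6-35 of 40
online_window = middle_trials(list(range(1, 31)))  # trials 6-25 of 30
```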

fMRI data acquisition and pre-processing

Anatomical and functional images were collected on a Philips 3T MRI scanner. High-resolution structural images were acquired using the MP-RAGE sequence (voxel size=1 mm×1 mm×1 mm). Functional scans were acquired while the participants completed the task in the scanner. The detailed settings were as follows: repetition time (TR)=2000 ms; echo time (TE)=25 ms; flip angle=90°; 38 slices; voxel size: 3.4 mm×3.4 mm×4.0 mm. The functional scans were preprocessed using standard statistical parametric mapping (SPM12, Wellcome Department of Imaging Neuroscience; https://www.fil.ion.ucl.ac.uk/spm/) algorithms, including slice timing correction, co-registration, normalization with a resampled voxel size of 2 mm×2 mm×2 mm, and smoothing with an 8 mm Gaussian kernel. A temporal high-pass filter with a cutoff of 128 s (1/128 Hz) was applied to the fMRI data, and temporal autocorrelation was modeled using a first-order autoregressive function.

fMRI general linear modeling

To find the BOLD responses that are correlated with the value estimates from the 2-step FT model and the 0-step model, we conducted two separate GLMs, one for each model. We specified each GLM with a parametric modulator of the chosen actions’ values estimated from the corresponding model, normalized within a subject, at the individual level using SPM12. The event regressors were (1) offer onset, (2) choice submission, (3) outcome onset, and (4) emotion rating submission of the Controllable and Uncontrollable conditions. The parametric modulator was entered at the event of choice submission. In addition, six motion parameters of each condition were included as covariates. After individual model estimation, we generated the contrast images of whole-brain coefficient estimates with the contrast weight of 1 to value estimates of both the Controllable and Uncontrollable conditions. At the group level, we conducted a one-sample t-test of the aforementioned individual whole-brain contrast images at P_FDR<0.05 and k>50. We also conducted cross-validated Bayesian model selection (cvBMS) at the neural level using the MACS toolbox in SPM (Soch and Allefeld, 2018) in order to confirm that the vmPFC encoded TVs rather than only CVs or FVs. We considered four different GLMs: (i) the GLM with TV (our original GLM), (ii) the GLM with both CV and FV without orthogonalization (CV & FV), (iii) the GLM with only CV, and (iv) the GLM with only FV. All value estimates were extracted from the 2-step FT model. We computed the cross-validated log model evidence (cvLME) for each model at the individual level and computed the exceedance probability (EP) of each model at the group level. For the ROI analysis, the vmPFC ROI (an 8-mm-radius sphere centered at [6, 52, −16]) was chosen from an independent meta-analysis study (Feng et al., 2015) in which the coordinate was presented as showing greater activation for fair offers than unfair offers in the ultimatum game context. ROIs were extracted using the MarsBaR toolbox (Brett et al., 2002).

Acknowledgements

XG is supported by the National Institute on Drug Abuse (Grant numbers: R01DA043695 and R21DA049243) and the National Institute of Mental Health (Grant numbers: R21MH120789, R01MH124115, and R01MH122611). This study was supported by a faculty startup grant to XG from the University of Texas, Dallas (where XG previously worked). DC is supported by UNIST internal funding (1.180073.01) and the National Research Foundation of Korea [Grant number: NRF-2018R1D1A1B07043582]. VGF is funded by the Mental Illness Research, Education, and Clinical Center (MIRECC VISN 2) at the James J. Peter Veterans Affairs Medical Center, Bronx, NY. PD is supported by the Max Planck Society and the Alexander von Humboldt Foundation. The authors thank Jae Shin for building the task website. The data in this study were used in a dissertation as partial fulfillment of the requirements for a PhD degree at the Graduate School of Biomedical Sciences at Mount Sinai.

Appendix 1

Task design for online study

The task proceeded as shown in Appendix 1—figure 1.

Screen #6–11: Practice rounds.
Screen #14–15: Team assignment. Displayed at the beginning of each condition.
Screen #16–20: One round of the actual task; repeated 30 times for each team (condition).
- The order of partners (avatars and names) was randomized.
- Duration:
  - Screen #16 (avatar): 1.5–2.5 s; jittered
  - Screen #17 (choice): self-paced
  - Screen #18 (post-choice), #19 (outcome), #20 (fixation): 1 s

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Xiaosi Gu, Email: xiaosi.gu@mssm.edu.

Catherine Hartley, New York University, United States.

Christian Büchel, University Medical Center Hamburg-Eppendorf, Germany.

Funding Information

This paper was supported by the following grants:

National Institute on Drug Abuse R01DA043695 to Xiaosi Gu.
National Institute on Drug Abuse R21DA049243 to Xiaosi Gu.
National Institute of Mental Health R01MH124115 to Xiaosi Gu.
National Institute of Mental Health R01MH123069 to Xiaosi Gu.
Max Planck Society to Peter Dayan.
Alexander von Humboldt Foundation to Peter Dayan.
National Institute of Mental Health R21MH120789 to Xiaosi Gu.
National Institute of Mental Health R01MH122611 to Xiaosi Gu.
Ulsan National Institute of Science and Technology 1.180073.01 to Dongil Chung.
National Research Foundation of Korea NRF-2018R1D1A1B07043582 to Dongil Chung.
Mental Illness Research, Education, and Clinical Center (MIRECC VISN 2), James J. Peter Veterans Affairs Medical Center to Vincenzo G Fiore.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Investigation, Funding acquisition, Methodology.

Conceptualization, Funding acquisition, Writing – review and editing.

Funding acquisition, Writing – review and editing.

Conceptualization.

Investigation.

Writing – review and editing.

Software, Funding acquisition, Writing – review and editing.

Conceptualization, Funding acquisition, Methodology, Supervision, Writing – original draft, Writing – review and editing.

Ethics

All fMRI participants provided written informed consent and all online participants provided online consent. The fMRI study was approved by the Institutional Review Board of the University of Texas at Dallas (IRB 15-77) and the University of Texas Southwestern Medical Center (STU 072015-031) (S.N., V.G.F., and X.G.'s previous institute where data were collected). Analyses of the original fMRI data collected at UT Dallas were covered by a Data Use Agreement between UT Dallas and the Icahn School of Medicine at Mount Sinai (ISMMS) (#19C7073) and an IRB protocol approved by the ISMMS (HS#: 18-00728). The online study was approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai (determined exempt; IRB-18-01301).

Additional files

Supplementary file 1. Supplementary tables.

elife-64983-supp1.docx (38.7 KB, docx)

Supplementary file 2. Task instructions.

elife-64983-supp2.docx (36.9 KB, docx)

Transparent reporting form

elife-64983-transrepform1.pdf (333.6 KB, pdf)

Data availability

The fMRI and behavioral data and analysis scripts are accessible at https://github.com/SoojungNa/social_controllability_fMRI (copy archived at swh:1:rev:8ea1fb4fe6cbd625f9a25fe292f82fc953f8c713).

References

Atzil S, Gao W, Fradkin I, Barrett LF. Growing a social brain. Nature Human Behaviour. 2018;2:624–636. doi: 10.1038/s41562-018-0384-6.
Bartra O, McGuire JT, Kable JW. The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. NeuroImage. 2013;76:412–427. doi: 10.1016/j.neuroimage.2013.02.063.
Becker GS. The Economic Approach to Human Behavior. University of Chicago press; 2013.
Behrens TEJ, Hunt LT, Woolrich MW, Rushworth MFS. Associative learning of social value. Nature. 2008;456:245–249. doi: 10.1038/nature07538.
Behrens TEJ, Muller TH, Whittington JCR, Mark S, Baram AB, Stachenfeld KL, Kurth-Nelson Z. What is a cognitive map? Organizing knowledge for flexible behavior. Neuron. 2018;100:490–509. doi: 10.1016/j.neuron.2018.10.002.
Bhatt MA, Lohrenz T, Camerer CF, Montague PR. Neural signatures of strategic types in a two-person bargaining game. PNAS. 2010;107:19720–19725. doi: 10.1073/pnas.1009625107.
Boorman ED, Behrens TEJ, Woolrich MW, Rushworth MFS. How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action. Neuron. 2009;62:733–743. doi: 10.1016/j.neuron.2009.05.014.
Brett M, Anton JL, Valabregue R, Poline JB. 8th International Conference on Functional Mapping of the Human Brain. Region of interest analysis using an SPM toolbox; Sendai, Japan. 2002.
Camerer CF, Ho TH, Chong JK. A cognitive hierarchy model of games. The Quarterly Journal of Economics. 2004;119:861–898. doi: 10.1162/0033553041502225.
Camerer CF. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton University Press; 2011.
Chung D, Christopoulos GI, King-Casas B, Ball SB, Chiu PH. Social signals of safety and risk confer utility and have asymmetric effects on observers’ choices. Nature Neuroscience. 2015;18:912–916. doi: 10.1038/nn.4022.
Cohen JD, Daw N, Engelhardt B, Hasson U, Li K, Niv Y, Norman KA, Pillow J, Ramadge PJ, Turk-Browne NB, Willke TL. Computational approaches to fMRI analysis. Nature Neuroscience. 2017;20:304–313. doi: 10.1038/nn.4499.
Constantinescu AO, O’Reilly JX, Behrens TEJ. Organizing conceptual knowledge in humans with a gridlike code. Science. 2016;352:1464–1468. doi: 10.1126/science.aaf0941.
Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans’ choices and striatal prediction errors. Neuron. 2011;69:1204–1215. doi: 10.1016/j.neuron.2011.02.027.
Dolan RJ, Dayan P. Goals and habits in the brain. Neuron. 2013;80:312–325. doi: 10.1016/j.neuron.2013.09.007.
Doll BB, Duncan KD, Simon DA, Shohamy D, Daw ND. Model-based choices involve prospective neural activity. Nature Neuroscience. 2015;18:767–772. doi: 10.1038/nn.3981.
Dorfman HM, Gershman SJ. Controllability governs the balance between Pavlovian and instrumental action selection. Nature Communications. 2019;10:1–8. doi: 10.1038/s41467-019-13737-7.
Draper D. Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society. 1995;57:45–70. doi: 10.1111/j.2517-6161.1995.tb02015.x.
Dunbar RIM, Shultz S. Evolution in the social brain. Science. 2007;317:1344–1347. doi: 10.1126/science.1145463.
Fehr E, Schmidt KM. A theory of fairness, competition, and cooperation. The Quarterly Journal of Economics. 1999;114:817–868. doi: 10.1162/003355399556151.
Fehr E. Human behaviour: don’t lose your reputation. Nature. 2004;432:449–450. doi: 10.1038/432449a.
Feng C, Luo YJ, Krueger F. Neural signatures of fairness-related normative decision making in the ultimatum game: a coordinate-based meta-analysis. Human Brain Mapping. 2015;36:591–602. doi: 10.1002/hbm.22649.
FitzGerald THB, Seymour B, Dolan RJ. The role of human orbitofrontal cortex in value comparison for incommensurable objects. The Journal of Neuroscience. 2009;29:8388–8395. doi: 10.1523/JNEUROSCI.0717-09.2009.
Gläscher J, Daw N, Dayan P, O’Doherty JP. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron. 2010;66:585–595. doi: 10.1016/j.neuron.2010.04.016.
Gmytrasiewicz PJ, Doshi P. A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research. 2005;24:49–79. doi: 10.1613/jair.1579.
Gneezy U, Haruvy E, Roth AE. Bargaining under a deadline: Evidence from the reverse ultimatum game. Games and Economic Behavior. 2003;45:347–368. doi: 10.1016/S0899-8256(03)00151-9.
Gu X, Wang X, Hula A, Wang S, Xu S, Lohrenz TM, Knight RT, Gao Z, Dayan P, Montague PR. Necessary, yet dissociable contributions of the insular and ventromedial prefrontal cortices to norm adaptation: computational and lesion evidence in humans. The Journal of Neuroscience. 2015;35:467–473. doi: 10.1523/JNEUROSCI.2906-14.2015.
Guinote A. How Power Affects People: Activating, Wanting, and Goal Seeking. Annual Review of Psychology. 2017;68:353–381. doi: 10.1146/annurev-psych-010416-044153.
Hampton AN, Bossaerts P, O’Doherty JP. Neural correlates of mentalizing-related computations during strategic interactions in humans. PNAS. 2008;105:6741–6746. doi: 10.1073/pnas.0711099105.
Hegarty M. Mechanical reasoning by mental simulation. Trends in Cognitive Sciences. 2004;8:280–285. doi: 10.1016/j.tics.2004.04.001.
Hiser J, Koenigs M. The multifaceted role of the ventromedial prefrontal cortex in emotion, decision making, social cognition, and psychopathology. Biological Psychiatry. 2018;83:638–647. doi: 10.1016/j.biopsych.2017.10.030.
Hula A, Montague PR, Dayan P, Gershman S. Monte carlo planning method estimates planning horizons during interactive social exchange. PLOS Computational Biology. 2015;11:e1004254. doi: 10.1371/journal.pcbi.1004254.
Huys QJM, Dayan P. A Bayesian formulation of behavioral control. Cognition. 2009;113:314–328. doi: 10.1016/j.cognition.2009.01.008.
Iigaya K, Hauser TU, Kurth-Nelson Z, O’Doherty JP, Dayan P, Dolan RJ. The value of what’s to come: Neural mechanisms coupling prediction error and the utility of anticipation. Science Advances. 2020;6:eaba3828. doi: 10.1126/sciadv.aba3828.
Kable JW, Glimcher PW. The neural correlates of subjective value during intertemporal choice. Nature Neuroscience. 2007;10:1625–1633. doi: 10.1038/nn2007.
King-Casas B, Tomlin D, Anen C, Camerer CF, Quartz SR, Montague PR. Getting to know you: reputation and trust in a two-person economic exchange. Science. 2005;308:78–83. doi: 10.1126/science.1108062.
Knoch D, Schneider F, Schunk D, Hohmann M, Fehr E. Disrupting the prefrontal cortex diminishes the human ability to build a good reputation. PNAS. 2009;106:20895–20899. doi: 10.1073/pnas.0911619106.
Lachman ME, Weaver SL. The sense of control as a moderator of social class differences in health and well-being. Journal of Personality and Social Psychology. 1998;74:763–773. doi: 10.1037//0022-3514.74.3.763.
Lee SW, Shimojo S, O’Doherty JP. Neural computations underlying arbitration between model-based and model-free learning. Neuron. 2014;81:687–699. doi: 10.1016/j.neuron.2013.11.028.
Leotti LA, Delgado MR. The value of exercising control over monetary gains and losses. Psychological Science. 2014;25:596–604. doi: 10.1177/0956797613514589.
Levy DJ, Glimcher PW. The root of all value: a neural common currency for choice. Current Opinion in Neurobiology. 2012;22:1027–1038. doi: 10.1016/j.conb.2012.06.001.
Ligneul R. Prediction or Causation? Towards a Redefinition of Task Controllability. Trends in Cognitive Sciences. 2021;25:431–433. doi: 10.1016/j.tics.2021.02.009.
Maier SF, Seligman ME. Learned helplessness: theory and evidence. Journal of Experimental Psychology. 1976;105:3–46. doi: 10.1037/0096-3445.105.1.3.
Maier SF, Watkins LR. Stressor controllability and learned helplessness: the roles of the dorsal raphe nucleus, serotonin, and corticotropin-releasing factor. Neuroscience & Biobehavioral Reviews. 2005;29:829–841. doi: 10.1016/j.neubiorev.2005.03.021.
Maier SF, Seligman MEP. Learned Helplessness at Fifty: Insights from Neuroscience. Psychological Review. 2016;123:349–367. doi: 10.1037/rev0000033.
Moran R, Keramati M, Dayan P, Dolan RJ. Retrospective model-based inference guides model-free credit assignment. Nature Communications. 2019;10:750. doi: 10.1038/s41467-019-08662-8.
Niv Y. Learning task-state representations. Nature Neuroscience. 2019;22:1544–1553. doi: 10.1038/s41593-019-0470-8.
Overmier JB. Interference with avoidance behavior: failure to avoid traumatic shock. Journal of Experimental Psychology. 1968;78:340–343. doi: 10.1037/h0026365.
O’Doherty JP, Hampton A, Kim H. Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences. 2007;1104:35–53. doi: 10.1196/annals.1390.022.
O’keefe J, Nadel L. The Hippocampus as a Cognitive Map. Oxford: Clarendon Press; 1978.
Pezzulo G, Rigoli F, Chersi F. The mixed instrumental controller: using value of information to combine habitual choice and mental simulation. Frontiers in Psychology. 2013;4:92. doi: 10.3389/fpsyg.2013.00092.
Rutledge RB, Skandali N, Dayan P, Dolan RJ. A computational and neural model of momentary subjective well-being. PNAS. 2014;111:12252–12257. doi: 10.1073/pnas.1407535111.
Schuck NW, Cai MB, Wilson RC, Niv Y. Human orbitofrontal cortex represents a cognitive map of state space. Neuron. 2016;91:1402–1412. doi: 10.1016/j.neuron.2016.08.019.
Shenhav A, Cohen JD, Botvinick MM. Dorsal anterior cingulate cortex and the value of control. Nature Neuroscience. 2016;19:1286–1291. doi: 10.1038/nn.4384.
Soch J, Allefeld C. MACS - a new SPM toolbox for model assessment, comparison and selection. Journal of Neuroscience Methods. 2018;306:19–31. doi: 10.1016/j.jneumeth.2018.05.017.
Southwick FS, Southwick SM. The Loss of a Sense of Control as a Major Contributor to Physician Burnout: A Neuropsychiatric Pathway to Prevention and Recovery. JAMA Psychiatry. 2018;75:665–666. doi: 10.1001/jamapsychiatry.2018.0566.
Spitzer M, Fischbacher U, Herrnberger B, Grön G, Fehr E. The neural signature of social norm compliance. Neuron. 2007;56:185–196. doi: 10.1016/j.neuron.2007.09.011.
Stensola H, Stensola T, Solstad T, Frøland K, Moser MB, Moser EI. The entorhinal grid map is discretized. Nature. 2012;492:72–78. doi: 10.1038/nature11649.
Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT press; 2018.
Szpunar KK, Spreng RN, Schacter DL. A taxonomy of prospection: introducing an organizational framework for future-oriented cognition. PNAS. 2014;111:18414–18421. doi: 10.1073/pnas.1417144111.
Tavares RM, Mendelsohn A, Grossman Y, Williams CH, Shapiro M, Trope Y, Schiller D. A map for social navigation in the human brain. Neuron. 2015;87:231–243. doi: 10.1016/j.neuron.2015.06.011.
Venkatraman V, Payne JW, Bettman JR, Luce MF, Huettel SA. Separate neural mechanisms underlie choices and strategic preferences in risky decision making. Neuron. 2009;62:593–602. doi: 10.1016/j.neuron.2009.04.007.
Wang JX, Kurth-Nelson Z, Kumaran D, Tirumala D, Soyer H, Leibo JZ, Hassabis D, Botvinick M. Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience. 2018;21:860–868. doi: 10.1038/s41593-018-0147-8.
Weiss JM. Effects of coping responses on stress. Journal of Comparative and Physiological Psychology. 1968;65:251–260. doi: 10.1037/h0025562.
Wilson RC, Collins AG. Ten simple rules for the computational modeling of behavioral data. eLife. 2019;8:e49547. doi: 10.7554/eLife.49547.
Xiang T, Ray D, Lohrenz T, Dayan P, Montague PR, Sporns O. Computational phenotyping of two-person interactions reveals differential neural response to depth-of-thought. PLOS Computational Biology. 2012;8:e1002841. doi: 10.1371/journal.pcbi.1002841.
Xiang T, Lohrenz T, Montague PR. Computational substrates of norms and their violations during social exchange. The Journal of Neuroscience. 2013;33:1099–108a. doi: 10.1523/JNEUROSCI.1642-12.2013.
Zhang L, Gläscher J. A brain network supporting social influences in human decision-making. Science Advances. 2020;6:eabb4159. doi: 10.1126/sciadv.abb4159.

eLife. doi: 10.7554/eLife.64983.sa1

Decision letter

Editor: Catherine Hartley¹

Reviewed by: Catherine Hartley², Romain Ligneul³

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

We believe that this examination of the neural and cognitive processes through which social controllability influences decision-making will be of interest to cognitive neuroscientists interested in the computational mechanisms involved in planning and social decision-making. The additional analyses and thorough revisions made to the manuscript have substantially strengthened the paper, and the conclusions and interpretations presented here are well supported by the data.

Decision letter after peer review:

Thank you for submitting your article "Humans Use Forward Thinking to Exploit Social Controllability" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Catherine Hartley as the Reviewing Editor and Reviewer #1, and Christian Büchel as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Romain Ligneul (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our policy on revisions we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

In this manuscript, Na and colleagues examine the influence of perceived controllability on choice patterns in an ultimatum game task. In a task variant introducing both controllable and uncontrollable conditions, they find that endowing participants with controllability increases rejection rates in a way which enables social transaction to converge towards fairer offers. In order to clarify the cognitive underpinnings of this finding, the authors fit computational models that varied the degree to which the anticipation of future offers – controllable or not – could influence the decisions relative to current offers. Under the best fitting model, social controllability (the influence of current decision on future offers) is used to adjust the expected value of their decision to accept or reject offers, suggesting that subjects used a controllability-dependent forward planning process. The fMRI results suggest that the cognitive operations underlying this forward planning process might depend in part on computations within the ventromedial prefrontal cortex, in which BOLD activation correlated with total (current and future) values computed under such a model.

In most studies using the UG, researchers look at how accept/reject behavior changes conditioned on the offer size, and the games are typically one-shot. Na et al. extend this work by looking at how the behavior of receivers could, in turn, affect future offers by the proposer, in an interactive manner. A further strength is that the authors were able to replicate the behavioral and modeling findings in a separate large online sample, and all data and analyses are made available online so that others could make use of them. Overall, the analyses are carefully performed and largely in support of the key conclusions. However, the reviewers felt that some aspects of the analysis could be further developed, refined, or clarified.

Revisions:

1) The background and rationale for the current study could be laid out more clearly in the introduction. The authors should explain what controllability means and why it is important. The introduction would also benefit from inclusion of some important behavioral and neural findings in the literature regarding controllability in non-social contexts. The discussion of "model-based planning" may not be so relevant here. In the 2-step task (Daw et al., 2011), participants need to learn a task transition structure and use this learned knowledge to plan future actions. But in the current task, there is no such abstract structure to learn. A discussion of the role of simulating future events/outcomes (e.g., counterfactual simulation) may be more appropriate than a focus on model-based planning. The authors may also want to include key studies and findings on strategic decision-making and theory of mind. The neural hypotheses should also be introduced, or if the authors didn't have a priori hypotheses, it could be explicitly stated that it is an exploratory study if indeed the case. If the vmPFC is indeed the area of interest a priori, then the authors should provide justification for this hypothesis.

2) It would be helpful to clearly assess and discuss the commonalities and differences in results between the social and non-social versions of the task, and their implications for interpreting the findings. It would be beneficial to see the computational model comparison applied to the non-social control experiment, as well. Critically, is the 2-step model still favored?

3) The analysis of overall rejection rates (Figure 2b1) is slightly puzzling with respect to the results reported in Figure 2a1 and 2b2. Indeed, Figure 2a1 shows that participants encountered a much higher proportion of middle and high offers in the controllable condition (due to their control over offers) and Figure 2b2 shows a very significant increase in rejection rates for these two types of offers but only a modest decrease for low offers. In addition, the offers in the uncontrollable condition seem to vary in a systematic fashion across time and to be very rarely below $3. In this context, I wonder how mean rejection rates can possibly be equal across controllability conditions. Still regarding rejection rates, it also seems that the uncontrollable condition was associated with a much greater inter-individual variability in rejection rates, hence suggesting that controllability reduced variability in the type of strategy used to solve the task. The authors should (i) clarify how offers of the uncontrollable conditions were generated, (ii) discuss and perhaps try to explain (and relate to other findings) the different inter-individual variability in rejection rates across conditions.

4) In the behavioral analyses, what is the rationale for grouping the offer sizes into three bins rather than using the exact levels of offer sizes? Do the key results hold if exact values are used?

5) It would be helpful to include an analysis of response times. Indeed, one would expect forward planning to be associated with lengthened decision times and correspondingly, for the δ parameter (or strategizing depth, or controllable condition) to be associated with longer decision times (e.g. Keramati et al., Plos Comp. Biol., 2011). Furthermore, it was recently shown that perceived task controllability increases decision times, even in the absence of forward value computations (Ligneul et al., Biorxiv). It is also good practice to include decision times as a control parametric regressor when analyzing brain activities related to a variable potentially correlated with them. Furthermore, one could expect longer reaction times for more conflicting decisions (i.e. closer valuations of reject/accept offers).

6) The authors refer to the δ parameter as "modeled controllability"; however, the model doesn't provide any account of the process of estimating controllability from observed outcomes (see Gershman and Dorfman 2019, Nature Communications or Ligneul et al., 2020, Biorxiv for examples of such models), but only reflects the impact of controllability on value computations, or the monetary amount of "expected influence" in each condition. An augmented model might include a computation of controllability, with the δ parameter controlling the extent to which estimated controllability promotes forward planning. Even if the authors don't fit such a model, they should explicitly acknowledge that their algorithm does not implement any form of controllability estimation, and might consider calling δ a "forward planning parameter". In addition, it is unclear why the authors chose to constrain the δ parameter to fluctuate between -$2 and $2 (rather than between $0 and $2, in line with their experimental design, or with even broader bounds) and what a negative δ would imply. Also, would it make sense to exclude participants with a negative δ in addition to those with a δ greater than 2? Do all results hold under these exclusions?

7) While the authors performed a parameter recovery analysis, they did not report cross-parameter correlations, which are important for interpreting the best-fitting parameters in each condition. Furthermore, it is good practice to perform model recovery analyses on top of parameter recovery analyses (Wilson and Collins, 2019, eLife; Palminteri et al., 2017, TiCS) in order to make sure that the task can actually distinguish the models included in the model comparison. As a result, the conclusions based on model comparison and parameter values (that is, a significant part of the empirical results) are uncertain. The cross-correlation between parameters and model recovery analysis should be reported as a confusion matrix.

8) The parameters of the adaptive social norm model exhibit fairly poor recoverability, particularly in the controllable condition. The motivation for using this model is that it provided the best fit to subjects' data in a prior uncontrollable ultimatum game task, but perhaps such adaptive judgment is not capturing choice behavior well here. It would be helpful to see a comparison of this model with one that has a static parameter capturing each individual's subjective inequity norm.

9) The authors stated that future actions are deterministic (line 576) contingent on the utility following the immediate reward. If so, is Figure 3a still valid? If all future actions are deterministic, there should be only one path from the current to the future, rather than a tree-like trajectory.

10) The MF model, and the rationale for its inclusion in the set of models compared, needs to be explained more clearly. The MF model appears to include no intercept to define a base probability of accepting versus rejecting offers, which makes it hard to compare with the other models in which the initial norm parameter may mimic such an intercept.

11) The fact that the vmPFC encoded total future + current value (2-step) and not current value (0-step) suggests that it might be specifically involved in computing future values but the authors do not report directly the relationship between its activity and future values. How correlated are the values from the 0-step model and the 2-step model? And more importantly, if vmPFC is associated with TOTAL value but not the current value, should that mean the vmPFC is associated with the future value only? It might make more sense to decompose the current value and future value both from the winning 2-step model, and construct them into the same GLM without orthogonalization.

12) The vmPFC result contrast averages across the controllable and uncontrollable conditions (line 629). Why did the authors do so? Wouldn't it be better to see whether the "total value" is represented differently between the two conditions?

13) The analysis of the relation between the vmPFC β weights and the difference between self-reported controllability beliefs and model-derived controllability estimates (Figure 5 d and e) is not adequately previewed. The hypothesis for why vmPFC activity might track this metric is unclear. Moreover, the relation between the two in the uncontrollable condition is somewhat weak. The authors should report the relation between vmPFC β weights and each component of the difference score (modeled and self-report controllability), and clearly motivate their intuition for why vmPFC activation might be related to that metric. If the authors feel strongly that this analysis is important to include, it would be meaningful to see whether the brain data could help explain behavioral data. For example, a simple GLM could serve this purpose: mean_offer ~ β(vmPFC) + self-report_controllability + model_controllability. Note that the authors need to state the exploratory nature if they decide to run this type of analysis.

14) The authors might also report the neural correlates of the internal norm and the norm prediction error (line 544). If the participants indeed acquired the social controllability through learning, they might form different internal norms in the two conditions, hence the norm prediction error might also differ.

15) Specific aspects of the experimental design may have influenced the observed results in ways that were not controlled. For example, it is not only the magnitude and controllability of outcomes that differed between the controllable and uncontrollable conditions, but also the uncertainty. It is possible that the less variable offers encountered in the controllable condition may have driven some of the results. The authors should acknowledge the possible role of autocorrelation and uncertainty in the behavioral and modeling results.

16) Moreover, asking participants to repeatedly rate their perception of controllability almost certainly influenced and exacerbated the impact of this factor on choices. It would have been very useful to perform a complementary online study excluding these ratings to ensure that controllability-dependent effects are still evident in such a case.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Humans Use Forward Thinking to Exploit Social Controllability" for further consideration by eLife. Your revised article has been evaluated by Christian Büchel (Senior Editor), Catherine Hartley (Reviewing Editor), and the two original reviewers.

As you will read, the reviewers are in agreement that your manuscript has been substantially strengthened by these revisions, but there are some remaining issues that need to be addressed.

A primary concern is that the manuscript does not provide sufficiently strong support for the claim that the vmPFC supports forward planning, particularly in light of the new neuroimaging analyses performed as part of this revision. Reviewer 3 has a concrete suggestion for how this claim might be strengthened with a model comparison analysis. If further evidence for the claim is not found/provided, it should be tempered. Reviewer 2 also questions whether it is useful and sensible to retain the MF model in the set of compared models, and both reviewers note a few areas where clarification, greater methodological detail, or further interpretation are warranted.

Please carefully consider each of the reviewers' suggestions as you revise your manuscript.

Reviewer #2:

The authors have revised their manuscript considerably and addressed a number of concerns raised in the initial review, with their additional analyses and detailed clarification. I particularly appreciate that the authors took the courage to dive into the direct comparison of findings between the social and non-social groups, which provided new insights. Furthermore, the revised Introduction is more thought-provoking with relevant literature included. Now the conclusions are better supported as it stands, and these findings are certainly going to be exciting additions to the literature of social decision neuroscience.

Here I have a few additional points, more for clarification.

(1) In response to comment #2, the authors might unpack the significant interaction result, to explicitly show "that the non-social context reduced the impact of nPE on emotional feelings." Also in the same LME model, I am curious about the significant "Controllable × social task (***)" interaction (β = -5.06). Does this mean, being in the Controllable + Social group, the emotion rating is lower? How would the authors interpret this finding?

(2) In response to comment #5 regarding response time with the additional LME analyses, I wonder which distribution function was used? We know that RT data is commonly positively skewed, so a log-normal or a shifted log-normal should be more accurate.

(3) I retain my initial comment regarding the inclusion of the MF model. The task is deterministic – participants get what appears if they accept and 0 if reject. In fact, the model is making a completely different prediction: according to the Q-value update, if the participant chose an "accept" and then indeed received a reward, then they should repeat "accept". But in the current task design, such a "positive feedback" would make the participants feel they are perhaps too easy to play with, and will be more likely to choose "reject" on the next trial. In essence, the MF model is not even capturing the behavioral pattern of the task, hence it does not seem to be a good baseline model. Rather, the 0-step model is okay enough to be the reference model.

Reviewer #3:

The authors have made very significant efforts to respond to a diversity of concerns and to amend their paper accordingly. The revised version is thus more complete and I believe that the main argument of the paper has been made stronger.

In many cases, the authors have appropriately adjusted their language in order to better align their conclusions with the data (e.g. renaming the δ parameter expected influence parameter) and I think that this paper can constitute an interesting addition to the field.

However, I am still slightly skeptical about the reach of neuroimaging results and I believe that some limitations of the paradigm may be more explicitly discussed.

A. Neuroimaging.

The authors have performed valuable additional analyses regarding the norm and norm prediction errors signals which can be of interest for the field. But I believe that our main concerns about vmPFC effects have not been fully addressed. Indeed, the authors still write that the vmPFC constructs "the total values (both current and future) of current actions as humans engaged in forward planning during social exchange". However, when splitting the analysis of current and future values, the encoding of future values was found in the insula whereas the vmPFC only encoded current values. The authors claim that the lack of encoding of total values derived from the 0-step FT model constitutes evidence in favor of forward planning, but it could be that this lack of evidence is driven by a poorer fit of current (rather than total) values by this simpler model. In order to better substantiate their claim about vmPFC's role, the authors may want to perform a model comparison at the neural level by comparing GLMs (using for example the MACS toolbox) including current value only, current value and future value, future value only or total value. Alternatively, they could analyze the first-level residuals produced by GLMs including alternatively current value, future value and total value (all based on FT-2). If their interpretation is correct, GLMs equipped with a parametric regressor for total value should be associated with smaller residuals in the vmPFC.

Regarding the behavior-belief disconnection analysis, I think that it would be more sensical to study the ratio rather than the difference between behavior and subjective reports, since these two measures are qualitatively different. Finally, it might be worth providing the reader with a brief discussion of the other neural substrates uncovered by the most recent analyses (dmPFC, insula, striatum, etc.).

B. Behavioral paradigm.

I believe that the authors should provide a few more details in the methods and acknowledge a few limitations in their discussion.

First, unless I am mistaken, the method used to decide on block order (i.e. C or U first) was not reported. Was the "illusion of control" in the uncontrollable condition driven by the subset of participants who passed the controllable block first? If this is the case, then it might add some plausibility to the interpretation of subjective controllability ratings in the uncontrollable condition as an "illusion of control" (persistence of a control prior). In other words, I think that the authors should refrain from interpreting the raw value of these ratings as an illusion of control (perhaps not all participants understood the meaning of the rating, perhaps they were too lazy to move the cursor until 0, etc.).

While it does not necessarily imply an illusion of control, the fact that participants still relied on forward planning in the uncontrollable condition (as indexed by the expected value parameter) is presumably what prevented the authors from really isolating the neural substrates of strategic controllability-dependent forward planning, and it might thus be mentioned as a limitation of the paradigm.

I believe that it is also important to mention explicitly the fact that a third and a quarter of the data was excluded from the analyses of behavioral and fMRI data (i.e. first and last five trials of each block) respectively and the rationale for this exclusion may be discussed.

The authors wrote that "a task that carefully controls for uncertainty and autocorrelation confounds would help better understanding the accumulative effects of social controllability", which is a good start, but it would be in my opinion important to explicitly acknowledge that changes in controllability were confounded with changes in uncertainty about upcoming offers.

I would be curious to hear the authors' insight about why participants in the online study (and to some extent in the lab) accepted the low offers more often in the controllable condition. It seems somehow counterintuitive and could mean that participants behaved in a more "automatic" and perseverative way in the controllable condition.

Related to this last point, is it possible that the δ parameter (or expected influence) simply captures a perseverative tendency in rejection/acceptance of offers? This might explain the disconnection between behavior and belief, as well as the positive value of this parameter in the uncontrollable condition, correlated to that of the controllable one. That perseveration increases in the controllable condition would be logical (since that condition allows participants to reach their goal by doing so) and it would therefore still be of interest in the context of this social controllability study. Perhaps the authors could exclude this possibility by adding a perseveration mechanism to their model, as is often done in the RL literature?

eLife. 2021 Oct 29;10:e64983. doi: 10.7554/eLife.64983.sa2

Author response

Revisions:

1) The background and rationale for the current study could be laid out more clearly in the introduction. The authors should explain what controllability means and why it is important. The introduction would also benefit from inclusion of some important behavioral and neural findings in the literature regarding controllability in non-social contexts. The discussion of "model-based planning" may not be so relevant here. In the 2-step task (Daw et al., 2011), participants need to learn a task transition structure and use this learned knowledge to plan future actions. But in the current task, there is no such abstract structure to learn. A discussion of the role of simulating future events/outcomes (e.g., counterfactual simulation) may be more appropriate than a focus on model-based planning. The authors may also want to include key studies and findings on strategic decision-making and theory of mind. The neural hypotheses should also be introduced, or if the authors didn't have a priori hypotheses, it could be explicitly stated that it is an exploratory study if indeed the case. If the vmPFC is indeed the area of interest a priori, then the authors should provide justification for this hypothesis.

Thank you for this suggestion. We fully agree that the background and rationale for the study could be more clearly laid out. We have now re-written both Introduction and Discussion sections to include literature more relevant to (non-social) controllability (e.g. Huys and Dayan, 2009), as well as future simulation (e.g. Szpunar et al., 2014), strategic decision-making (e.g. Hampton et al., 2008, Bhatt et al., 2010), and theory of mind (e.g. Hula et al., 2015) instead of focusing exclusively on model-based planning (whilst noting that simulating potential future outcomes is a prominent method of model-based planning in both artificial and natural systems). Furthermore, we have re-constructed the paragraphs about our neural hypothesis and why the vmPFC was our region of a priori interest.

Line 16: “Based on previous work demonstrating the computational mechanisms of controllability in non-social environments, here we hypothesize that people use mental models to track and exploit social controllability, for instance via forward simulation. In non-social contexts, it has been proposed that controllability quantifies the extent to which the acquisition of outcomes, and particularly desired outcomes, can be influenced by the choice of actions (Huys and Dayan, 2009; Dorfman and Gershman, 2019; Ligneul, 2021). […] Lastly, we hypothesize that the choice values integrating the planned paths would be signaled in the vmPFC.”

Line 399: “Critically relevant to the current study, previous research suggests that humans can learn and strategically exploit controllability during various forms of exchanges with others (Bhatt et al., 2010; Camerer, 2011; Hampton et al., 2008; Hula et al., 2015). The current study is in line with this literature and expands beyond existing findings. Here, we show that humans can also exploit controllability and exert their influence even when interacting with a series of other players (as opposed to a single other player as tested in previous studies). Furthermore, our 2-step FT model captures the explicit magnitude of controllability in individuals’ mental models of an environment, which can be intuitively compared to subjective, psychological controllability. Finally, our 2-step FT model simultaneously incorporates aversion to norm violation and norm adaptation, two important parameters guiding social adaptation (Fehr, 2004; Gu et al., 2015; Spitzer et al., 2007; Zhang and Gläscher, 2020). These individual- and social- specific parameters will be crucial for examining social deficits in various clinical populations in future studies.”

2) It would be helpful to clearly assess and discuss the commonalities and differences in results between the social and non-social versions of the task, and their implications for interpreting the findings. It would be beneficial to see the computational model comparison applied to the non-social control experiment, as well. Critically, is the 2-step model still favored?

Thank you for this comment. Following your suggestion, we have now applied our computational models to the non-social version of the task in a new set of analyses. These new analyses revealed overlapping, yet distinct mechanisms of social and non-social controllability, as detailed below.

First, we found that the 2-step FT model was still favored in the non-social task. We also found that the estimated δ was still higher for the Controllable condition than the Uncontrollable condition in the non-social version of the task (see new Figure 2—figure supplement 1). These findings suggest that forward thinking could be a fundamental mechanism for humans to exert control across social and non-social domains.

Second, we discovered several striking differences in people’s subjective states between social and non-social contexts. As already reported in the original paper, subjective beliefs about controllability were significantly different between the social and non-social contexts. When playing against computer algorithms (i.e., non-social context), participants reported a similar level of perceived controllability (~50%) for both the Controllable and Uncontrollable conditions (Figure 2c), although, as evidenced by model-based analyses, they mentally simulated a higher level of control in the Controllable condition than in the Uncontrollable condition. This is in sharp contrast to the social task results, where participants reported perceiving higher controllability for the Controllable condition (65.9%) compared to the Uncontrollable condition (43.7%) (difference of the self-reported controllability between the conditions (Controllable – Uncontrollable): mean_{social task} = 22.1; mean_{non-social task} = 0.3; paired t-test t(19) = -2.98, P < .01; Figure 2-figure supplement 1g).

Third, another major difference between the social and non-social contexts was observed in people’s emotional ratings. Previously, it was demonstrated that reward prediction errors (PE) experienced by individuals were significantly associated with the trajectories of self-reported emotional feelings (Rutledge et al., 2014). Building on this finding and our previous finding of how the non-social context reduced subjective reports of control, we hypothesized that this emotional engagement would be modulated by the social context, such that the association between PE and emotional ratings would be weaker in the non-social than in the social context. To test this, in our new analysis, we ran a mixed effect GLM predicting emotion ratings with norm prediction errors, task types (social and non-social), interactions between the norm prediction errors and the task types, and other controlling variables such as offers, conditions, and individual random effects ('emotion rating ~ offer + norm prediction error + condition + task + task*(offer + norm prediction error + condition) + (1 + offer + norm prediction error | subject)'). As expected, the impact of norm prediction errors on happiness ratings was reduced in the non-social context compared to the social context (Supplementary file 1a; significant interaction of nPE x social context).
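For readers less familiar with this formula notation, the regression above can be written out explicitly. The following is a sketch under standard linear mixed-effects assumptions (Gaussian residuals and jointly Gaussian random effects) that are not stated in the text; the indices i (trial) and j (subject), the coefficient labels, and the coding of condition and task as indicator variables are notational assumptions introduced here.

```latex
\begin{aligned}
\text{rating}_{ij} ={}& \beta_0 + \beta_1\,\text{offer}_{ij} + \beta_2\,\text{nPE}_{ij}
  + \beta_3\,\text{condition}_{ij} + \beta_4\,\text{task}_{ij} \\
&+ \beta_5\,\text{task}_{ij}\,\text{offer}_{ij}
  + \beta_6\,\text{task}_{ij}\,\text{nPE}_{ij}
  + \beta_7\,\text{task}_{ij}\,\text{condition}_{ij} \\
&+ u_{0j} + u_{1j}\,\text{offer}_{ij} + u_{2j}\,\text{nPE}_{ij} + \varepsilon_{ij}, \\
(u_{0j}, u_{1j}, u_{2j})^{\top} \sim{}& \mathcal{N}(\mathbf{0}, \Sigma),
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

In this notation, the nPE x social context interaction corresponds to the coefficient labeled \beta_6, and the terms u_{0j}, u_{1j}, u_{2j} are the subject-level random intercept and random slopes for offer and nPE.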

Taken together, these new results demonstrate both overlapping and distinct processes involved in social vs. non-social controllability. Despite a similar involvement of forward thinking in choice behaviors, we speculate that participants might have considered a computer algorithm to be more objective than a human player. Therefore, despite their similar ability to make choices and influence future outcomes, participants did not perceive the Controllable condition to be more controllable than the Uncontrollable condition; nor did they consider the PE signals to be as impactful on their feelings of happiness in the non-social condition as in the social condition. This set of findings further supports the notion that subjective states could be detached from action or planning per se, and that the social context modulates the relationship between subjective states and choices.

We have now added all these results and discussion points to the revised manuscript.

Line 253: “Comparison with a non-social controllability task. To investigate whether our results are specific to the social domain, we ran a non-social version of the task in which participants (n=27) played the same game with the instruction of “playing with computer” instead of “playing with virtual human partners”. Using the same computational models, we found not only that participants exhibited similar choice patterns (Figure 2—figure supplement 1a-c), but also that the 2-step FT model was still favored in the non-social task (Figure 2—figure supplement 1d,e) and that δ was still higher for the Controllable than the Uncontrollable condition (Figure 2—figure supplement 1f, mean_C = 1.31, mean_U = 0.75, t(26) = 2.54, P < 0.05).

Interestingly, a closer examination of subjective data revealed two differences in the non-social task compared to the social task. First, participants’ subjective report of controllability did not differentiate between conditions in the non-social task (Figure 2—figure supplement 1g; mean_C = 62.7, mean_U = 56.9, t(25) = 0.78, P = 0.44), which suggests that the social aspect of an environment might have a unique effect on subjective beliefs about controllability. Second, inspired by previous work demonstrating the impact of reward prediction errors (PE) on emotional feelings (Rutledge et al., 2014), we examined the impact of norm PE (nPE) on emotion ratings for the non-social and social contexts using a mixed effect regression model (Supplementary file 1a). We found a significant interaction between social context and nPE (β = 0.52, P < 0.05), suggesting that the non-social context reduced the impact of nPE on emotional feelings. Taken together, these new results suggest that despite a similar involvement of forward thinking in exploiting controllability, the social context had a considerable impact on subjective experience during the task.”

3) The analysis of overall rejection rates (Figure 2b1) is slightly puzzling with respect to the results reported Figure 2a1 and 2b2. Indeed, Figure 2a1 shows that participants encountered a much higher proportion of middle and high offers in the controllable condition (due to their control over offers) and Figure 2b2 shows a very significant increase in rejection rates for these two types of offers but only a modest decrease for low offers. In addition, the offers in the uncontrollable condition seem to vary in a systematic fashion across time and to be very rarely below 3$. In this context, I wonder how mean rejection rates can possibly be equal across controllability conditions. Still regarding rejection rates, it also seems that the uncontrollable condition was associated with a much greater inter-individual variability in rejection rates, hence suggesting that controllability reduced variability in the type of strategy used to solve the task. The authors should (i) clarify how offers of the uncontrollable conditions were generated, (ii) discuss and perhaps try to explain (and relate to other findings) the different inter-individual variability in rejection rates across conditions.

Thank you for these comments. Per your suggestion, we now clarify how the offers were generated for the Uncontrollable condition and discuss the different variance in rejection rates across conditions.

First, the offers were randomly drawn from a truncated Gaussian distribution (μ = $5, σ = $1.2, min = $2, max = $8; on the fly rather than predetermined) in the Uncontrollable condition for the fMRI sample. As a result, individuals had a slightly different set of offers for the Uncontrollable condition as well as for the Controllable condition, where people’s controllability level differed. That is, the number of trials in each offer bin differed across bins and individuals. To calculate the binned rejection rates of our entire sample (Figure 2b2), we first calculated the mean rejection rate for each bin for each individual, and then aggregated across all individuals. This approach, rather than aggregating all the trials across all individuals at once, gives the same weight to each individual within a binned subsample. Thus, the depicted overall rejection rates (the thick line in Figure 2b1) are different from the simple average of the binned rejection rates. All these data are made available through a repository (https://github.com/SoojungNa/social_controllability_fMRI) for any readers who might be interested in exploring them further. We have now added these clarifications in the revised manuscript.
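
The difference between the two aggregation schemes described above can be illustrated with a minimal sketch. This is not the analysis code used for the paper: the trial data are made-up toy values, and the function names (`offer_bin`, `subject_weighted_rate`, `pooled_rate`) are hypothetical. Only the bin boundaries (low: $1-3, medium: $4-6, high: $7-9) are taken from the manuscript.

```python
from statistics import mean

# Made-up toy data: (subject, offer in dollars, rejected as 0/1).
trials = [
    ("s1", 2, 1), ("s1", 5, 1), ("s1", 5, 0), ("s1", 5, 0), ("s1", 5, 0),
    ("s2", 5, 1), ("s2", 8, 0),
]

def offer_bin(offer):
    """Map an offer to the three bins used in the manuscript."""
    if offer <= 3:
        return "low"
    if offer <= 6:
        return "medium"
    return "high"

def subject_weighted_rate(trials, bin_name):
    """Mean rejection rate per subject first, then averaged across subjects."""
    per_subject = {}
    for subject, offer, rejected in trials:
        if offer_bin(offer) == bin_name:
            per_subject.setdefault(subject, []).append(rejected)
    return mean(mean(choices) for choices in per_subject.values())

def pooled_rate(trials, bin_name):
    """Rejection rate over all trials in the bin, ignoring who produced them."""
    return mean(r for _, offer, r in trials if offer_bin(offer) == bin_name)

print(subject_weighted_rate(trials, "medium"))
print(pooled_rate(trials, "medium"))
```

In this toy example the subject with many medium-offer trials dominates the pooled rate, whereas the subject-weighted rate gives both subjects equal influence, which is why the two summaries need not agree.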

Line 132: “Next, we examined the rejection patterns from the two conditions. On average, rejection rates in the two conditions were comparable (mean_C = 50.8%, mean_U = 49.1%, t(67.87) = 0.43, P = 0.67; Figure 2b1). By separating the trials each individual experienced into three levels of offer sizes (low: $1-3, medium: $4-6, and high: $7-9) and then aggregating across all individuals, we further examined whether rejection rates varied as a function of offer size. We found that participants were more likely to reject medium to high ($4-9) offers in the Controllable condition, while they showed comparable rejection rates for the low offers ($1-3) between the two conditions (low ($1-3): mean_C = 77%, mean_U = 87%, t(22) = -1.35, P = 0.19; middle ($4-6): mean_C = 66%, mean_U = 45%, t(47) = 5.41, P < 0.001; high ($7-9): mean_C = 28%, mean_U = 8%, t(72.50) = 4.00, P < 0.001; Figure 2b2; see Figure 2—figure supplement 2 for rejection rates by each offer size). These results suggest that participants behaved in a strategic way to utilize their influence over the partners.”

In Methods, we also clarify how offers were generated as copied below.

Line 514: “Experimental paradigm: laboratory version. (…) In the Uncontrollable condition, participants played a typical ultimatum game: the offers were randomly drawn from a truncated Gaussian distribution (μ = $5, σ = $1.2, rounded to the nearest integer, max = $8, min = $2) on the fly using the MATLAB function ‘normrnd’ and ‘round’. Thus, participants’ behaviors had no influence on the future offers. (…)

Experimental paradigm: online version. (…) Lastly, to remove unintended inter-individuals variability in offers for the Uncontrollable condition, we pre-determined the offer amounts under Uncontrollable (offers = [$1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9], mean = $5.0, std = $2.3, min = $1, max = $9) and randomized the order of them.”
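
As an illustration of the on-the-fly offer generation described for the laboratory version, the following is a minimal sketch written in Python rather than MATLAB. The truncation rule is an assumption: the text states only that offers came from a truncated Gaussian and were rounded, so the sketch redraws any value that falls outside the $2 to $8 range after rounding. The function name `draw_offer`, the seed, and the number of trials drawn are hypothetical.

```python
import math
import random

def draw_offer(rng, mu=5.0, sigma=1.2, lo=2, hi=8):
    """Draw one integer offer; out-of-range draws are redrawn (assumed rule)."""
    while True:
        # Round half up, which matches MATLAB's round() for positive values.
        offer = math.floor(rng.gauss(mu, sigma) + 0.5)
        if lo <= offer <= hi:
            return offer

rng = random.Random(0)                          # seed chosen only for reproducibility
offers = [draw_offer(rng) for _ in range(30)]   # 30 is an illustrative trial count
print(offers)
```

Because each offer is drawn independently of the participant's previous choice, accepting or rejecting cannot change later offers in this scheme.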

Second, we agree with the reviewers that it is interesting to consider whether controllability could have reduced inter-individual variability in rejection rates, given that the variability in rejection rates was indeed smaller for the Controllable condition (F(47, 47) = 207.89, P << 0.0001). We consider this convergence in choice strategy as further proof that humans are able to exploit their environment if there is controllability, whereas being more variable under the uncontrollable condition could be a form of undirected exploration (Wilson et al., 2014), which would result in a greater inter-individual variability.

Additionally, we directly tested if our results hold even after controlling for the variability across individuals. Specifically, we ran a mixed-effect logistic regression predicting choices (accept/reject) using the offer, the condition, and the interaction terms as predictors, and individual subjects as random effects ('choice ~ 1 + offer + condition + offer*condition + (1 + offer + condition + offer*condition | subject)'). The results (Supplementary file 1b) were consistent with our originally reported results, such that there is a significant offer effect (β = 1.82, P <.001), no condition effect (β = -0.58, P = 0.57), and no interaction effect (β = -0.30, P = 0.10; Controllable was coded as 1 and Uncontrollable as 0). That is, choices were sensitive to offers but the sensitivity marginally reduced under the Controllable condition compared to the Uncontrollable condition. These results showed that our main results still held even after accounting for the individual variability. We have added these results and relevant discussions in supplementary information (Supplementary file 1b).

4) In the behavioral analyses, what is the rationale for grouping the offer sizes into three bins rather than using the exact levels of offer sizes? Do the key results hold if exact values are used?

We reported and displayed the binned rejection rates mainly for two reasons: (1) binning gave better-matched subsample sizes across bins and (2) by mistake, the range of the offers was not identical between conditions for the fMRI sample ($1-9 for the Controllable condition; $2-8 for the Uncontrollable condition). We now present the rejection rates by each offer size and have added the result as Figure 2—figure supplement 2 in the supplementary information. As depicted in Figure 2—figure supplement 2, the key results hold regarding rejection rates between conditions and across offer sizes.

Furthermore, as addressed in the previous question, we added the mixed-effect logistic regression results in the paper (Supplementary file 1b). Note that this approach is statistically more stringent, and that it showed consistent results with the original binned results. Yet, we would prefer to keep the original results in the main text, because we believe the original simple analysis would help readers to gain a better and more intuitive understanding of the results.

5) It would be helpful to include an analysis of response times. Indeed, one would expect forward planning to be associated with lengthened decision times and correspondingly, for the δ parameter (or strategizing depth, or controllable condition) to be associated with longer decision times (e.g. Keramati et al., Plos Comp. Biol., 2011). Furthermore, it was recently shown that perceived task controllability increases decision times, even in the absence of forward value computations (Ligneul et al., Biorxiv). It is also good practice to include decision times as a control parametric regressor when analyzing brain activities related to a variable potentially correlated with them. Furthermore, one could expect longer reaction times for more conflicting decisions (i.e. closer valuations of reject/accept offers).

We thank the reviewers for the suggestion. We have now conducted new behavioral as well as fMRI analyses on the response times (RT) and have included these results in the supplementary information.

First, as the reviewers suggested, RT was indeed longer for the Controllable condition than the Uncontrollable condition, suggesting that controllability may involve more contemplation (Figure 2-figure supplement 3a; mean_c = 1.75 ± 0.38, mean_u = 1.53 ± 0.38; paired t-test t(47) = 4.34, P < 0.001). However, in neither condition was RT significantly correlated with individuals’ expected influence parameter δ (Figure 2-figure supplement 3b-c) or with individuals’ self-reported controllability ratings (Figure 2-figure supplement 3d-e).

Second, following the reviewers’ suggestions, we have now conducted a new set of fMRI analyses to include trial-by-trial RT as the first parametric regressor followed by our main parametric regressor (chosen values) in the original GLM. We did not find any significant neural activation related to the RT (P_FDR <.05). However, consistent with the reported result in the original submission, the vmPFC chosen value signals were still significant at P_FDR <.05 and k > 50 after controlling for any potential RT effects (Figure 2-figure supplement 3f; peak coordinate [0, 54, -2]).

We also ran a mixed effect linear regression to examine whether or not trial-by-trial response time is correlated with the values of chosen actions as below: RT ~ 1+ condition + chosen values + condition * chosen values + (1+ chosen values | subject).

As shown in Supplementary file 1c, we found that neither the chosen value (β = 0.00, P = .63) nor the interaction term (β = -0.00, P = .43) had a significant effect on RT, while the condition effect was significant (β = 0.21, P <.001; consistent with Figure 2-figure supplement 3a). This analysis suggests that in our task, RT did not have a significant relationship with chosen value, the main parametric modulator of interest. Based on these additional results, we decided to keep our original fMRI results in the main text, but to add the new results to SI (Figure 2—figure supplement 3, Supplementary file 1c).

Finally, we ran another mixed effect linear regression to examine how “conflict” ( = chosen value – unchosen value) might impact RT: RT ~ 1+ condition + conflict + condition * conflict + (1+ condition + conflict + condition * conflict | subject)

Both the conflict (β = -0.04, P <.005) and the condition (β = 0.13, P <.001) had significant impacts on RT, while there was no interaction effect (β = 0.03, P = .10) (Supplementary file 1d). This result suggests that conflict did have a significant impact on RT. We have now added this to SI (Supplementary file 1d).

6) The authors refer to the δ parameter "modeled controllability", however the model doesn't provide any account of the process of estimating controllability from observed outcomes (see Gershman and Dorfman 2019, Nature Communications or Ligneul et al., 2020, Biorxiv for examples of such models), but only reflects the impact of controllability on value computations, or the monetary amount of "expected influence" in each condition. An augmented model might include a computation of controllability, with the δ parameter controlling the extent to which estimated controllability promotes forward planning. Even if the authors don't fit such a model, they should explicitly acknowledge that their algorithm does not implement any form of controllability estimation, and might consider calling δ a "forward planning parameter". In addition, it is unclear why the authors chose to constrain the δ parameter to fluctuate between -2 and 2$ (rather than between 0 and 2$, in line with their experimental design, or with even broader bounds) and what a negative δ would imply. Also, would it make sense to exclude participants with a negative δ in addition to those with a δ greater than 2? Do all results hold under these exclusions?

Thank you and we fully agree with your suggestion. We have now revised the term “modelled controllability” to “expected influence” throughout the manuscript.

We constrained the magnitude of the δ to be within $2 based on the experimental design of the Controllable condition, where the true δ can only be $2 or less. The first (and the last) 5 trials were excluded from model fitting, and we assumed that individuals’ δ would have been properly learned before participants could truly exploit the controllability of the environment during the majority of trials. Therefore, their expectation about their influence would not be completely off from the true experiences (e.g., expecting a δ of $9 after they have only seen their partners’ offer changes of $1 or $2). It would certainly be interesting in future work to adjust the design of the task, for instance with fluctuating degrees of controllability, to be able to gain more purchase on the learning itself.

The reason why we initially allowed a negative range for the δ is because in the Uncontrollable condition, due to its randomly sampled offers, subsequent offers could drop after a rejection choice and increase after an acceptance response (i.e., opposite direction from the controllable condition). For completeness and following the reviewers’ suggestion, we have now rerun the analysis without those who had a negative δ for the Controllable condition (6.3% of the fMRI sample; 5.7% of the online sample). The behavioral results still held for both the fMRI (Figure 3-figure supplement 4) and the online samples (Figure 4-figure supplement 3). Specifically, there were statistically significant differences between the two conditions in (i) the offer size (fMRI sample: t(44) = 5.05, P <.001; online sample: t(1,265) = 22.94, P <.001), (ii) the rejection rates for the middle (fMRI sample: t(44) = 5.33, P <.001; online sample: t(1,265) = 10.23, P <.001) and high offers (fMRI sample: t(38) = 4.68, P <.001; online sample: t(934) = 31.40, P <.001), (iii) the perceived controllability (fMRI sample: t(36) = 3.67, P <.001; online sample: t(1,265) = 26.23, P <.001), and (iv) the δ (fMRI sample: t(44) = 5.14, P <.001; online sample: t(1,265) = 19.54, P <.001). In addition, (v) the δ was positively correlated between the two conditions (fMRI sample r = .40, P <.01; online sample: r = .25, P <.001), and (vi) the δ and the mean offers were positively correlated (fMRI sample r = .86, P <.001; online sample: r = .71, P <.001). We now provide this set of analyses and results in SI (Figure 3—figure supplement 4 and Figure 4—figure supplement 3).

7) While the authors performed a parameter recovery analysis, they did not report cross-parameter correlations, which are important for interpreting the best-fitting parameters in each condition. Furthermore, it is good practice to perform model recovery analyses on top of parameter recovery analyses (Wilson and Collins, 2019, eLife; Palminteri et al., 2017, TiCS) in order to make sure that the task can actually distinguish the models included in the model comparison. As a result, the conclusions based on model comparison and parameters values (that is, a significant part of the empirical results) are uncertain. The cross-correlation between parameters and model recovery analysis should be reported as a confusion matrix.

Thank you and we fully agree with your suggestion. We have now added the cross-parameter correlations as well as the model recovery results as confusion matrices in the paper.

Figure 4—figure supplement 2 illustrates the cross-parameter correlations. We did not find any strong correlations (r > 0.5) between parameters. The α (sensitivity to norm prediction error) and the F0 (initial norm) were moderately correlated (r = -0.39) under the Controllable condition for the fMRI sample. However, these parameters were still independently identifiable (parameter recovery α: r = 0.57, P < 0.001, F0: r = 0.66, P < 0.001; please see Figure 3—figure supplement 3b-c).

We have now added model recovery results in the supplementary information (added as Figure 3—figure supplement 1). To examine whether our task design is sensitive enough to distinguish the models, we simulated each model where we fixed the inverse temperature at 10 and constrained the δ to be positive (between [0 2], inclusive) to make it similar to the actual empirical behavior we found. Other parameters were randomly sampled within the originally assumed range. We ran 100 iterations of simulation where in each iteration, behavioral choices of 48 individuals (which is equal to our fMRI sample size) were simulated. Next, we fit each model to each model’s simulated data where all the settings were identical to the original settings. Consistent with our original method, we calculated average DIC scores and determined the winning model of each iteration. In this new set of analyses, we focused on three distinct types of models in our model space, namely, model-free (MF), no FT (0-step), and FT (2-step) models. Note that we chose the 2-step FT model as a representative of the FT models due both to its simplicity (i.e., per Occam’s razor) and to the similarity amongst the FT models. This additional model recovery analysis suggests that our task can actually distinguish the models.
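
The tallying step of this model recovery procedure (pick the model with the lowest average DIC in each iteration, then summarize wins per generating model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the paper: the DIC values are made-up numbers, the simulation and fitting steps are omitted, and `winning_model` and `confusion_matrix` are hypothetical helper names.

```python
MODELS = ["MF", "0-step", "2-step"]

def winning_model(mean_dic):
    """Return the model with the lowest average DIC (lower = better fit)."""
    return min(mean_dic, key=mean_dic.get)

def confusion_matrix(iterations):
    """iterations: list of (generating_model, {fitted_model: mean DIC}).

    Returns, for each generating model, the proportion of iterations
    in which each fitted model won.
    """
    counts = {g: {f: 0 for f in MODELS} for g in MODELS}
    for generating, mean_dic in iterations:
        counts[generating][winning_model(mean_dic)] += 1
    matrix = {}
    for g, row in counts.items():
        total = sum(row.values())
        matrix[g] = {f: (n / total if total else 0.0) for f, n in row.items()}
    return matrix

# Made-up DIC values, for illustration only.
iterations = [
    ("MF", {"MF": 100.0, "0-step": 120.0, "2-step": 115.0}),
    ("MF", {"MF": 118.0, "0-step": 119.0, "2-step": 110.0}),
    ("0-step", {"MF": 130.0, "0-step": 101.0, "2-step": 105.0}),
    ("2-step", {"MF": 125.0, "0-step": 122.0, "2-step": 99.0}),
]
matrix = confusion_matrix(iterations)
for generating in MODELS:
    print(generating, matrix[generating])
```

A task that distinguishes the models well yields a matrix whose diagonal entries are close to 1.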

8) The parameters of the adaptive social norm model exhibit fairly poor recoverability, particularly in the controllable condition. The motivation for using this model is that it provided the best fit to subjects' data in a prior uncontrollable ultimatum game task, but perhaps such adaptive judgment is not capturing choice behavior well here. It would be helpful to see a comparison of this model with one that has a static parameter capturing each individual's subjective inequity norm.

Thank you for this insightful suggestion. Following your suggestion, we have now examined an alternative 2-step model that has a static norm throughout all trials within each condition. The DIC scores of the alternative model (“static”) were higher than the original 2-step FT model using Rescorla-Wagner learning (“RW”) in both conditions (smaller DIC score indicates a better model fit; added as Figure 3—figure supplement 2). This new analysis suggests that the adaptive norm model still offers a better account for behavior than the static norm model.
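
The contrast between an adaptive and a static norm can be made concrete with a generic Rescorla-Wagner-style sketch. The update form below is the textbook delta rule and is an assumption here; the exact norm update and its parameterization are those given in the paper's Methods and are not reproduced in this response. The names `rw_norms`, `static_norms`, and `learning_rate` are hypothetical.

```python
def rw_norms(offers, f0, learning_rate):
    """Norm held before each offer, updated toward the offer after each trial."""
    norms = []
    norm = f0
    for offer in offers:
        norms.append(norm)
        # Delta rule: move the norm by a fraction of the discrepancy.
        norm += learning_rate * (offer - norm)
    return norms

def static_norms(offers, f0):
    """Norm fixed at its initial value for every trial."""
    return [f0] * len(offers)

offers = [4, 6, 6, 7]   # made-up offers in dollars
print(rw_norms(offers, f0=5.0, learning_rate=0.5))
print(static_norms(offers, f0=5.0))
```

Setting `learning_rate` to 0 makes the adaptive sketch identical to the static one, so the static norm can be viewed as a special case of the adaptive update.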

9) The authors stated that future actions are deterministic (line 576) contingent on the utility following the immediate reward. If so, is Figure 3a still valid? If all future actions are deterministic, there should be only one path from the current to the future, rather than a tree-like trajectory.

Thank you for bringing up this point. Figure 3a was indeed erroneous and we have now revised it by highlighting one path to represent our model better.

10) The MF model, and the rationale for its inclusion in the set of models compared, needs to be explained more clearly. The MF model appears to include no intercept to define a base probability of accepting versus rejecting offers, which makes it hard to compare with the other models in which the initial norm parameter may mimic such an intercept.

We called it the MF model because it updates Q-values in light of the cached values of the following state (as opposed to the 0-step model, which only considers the current state), yet without a state transition component (as opposed to the FT models that reflect state transitions, which we conceptualized as controllability). All other components including the utility function of the immediate rewards, and the variable initial norm and norm learning incorporated in the utility function are shared across all the candidate models. In common with the other candidate models, the MF model also includes an initial norm parameter, which captures the base probability of accepting versus rejecting offers. We have now revised the manuscript to clarify the possible confusion.

Line 181: “We compared models that considered from one to four steps further in the future in addition to standalone social learning (‘0-step’) and model-free, non-social reinforcement learning (‘MF’). The 0-step model only considers the utility at the current state. The MF model updates and caches the choice values that reflect the following states, yet without a state transition component (as opposed to the FT models that reflect state transitions, which we conceptualized as controllability). All other components including the utility function of the immediate rewards, and the variable initial norm and norm learning incorporated in the utility function are shared across all the candidate models.”

11) The fact that the vmPFC encoded total future + current value (2-step) and not current value (0-step) suggests that it might be specifically involved in computing future values but the authors do not report directly the relationship between its activity and future values. How correlated are the values from the 0-step model and the 2-step model? And more importantly, if vmPFC is associated with TOTAL value but not the CURRENT value, should that mean the vmPFC is associated with the FUTURE value only? It might make more sense to decompose the current value and future value both from the winning 2-step model, and construct them into the same GLM without orthogonalization.

Thank you for this comment. We originally considered the model-based fMRI analysis to be a biological validation of the models more than a way to delineate the different neural substrates of current and future values. That is, the lack of significant neural signals tracking values from the 0-step model may suggest that the 0-step model is less plausible than the 2-step model at the neurobiological level, despite the fact that value estimates are highly correlated between the two models (mean correlation coefficient was 0.74 for the Controllable condition and 0.84 for the Uncontrollable condition). Nevertheless, we agree that it would be interesting to examine if the current and future value terms under the same (2-step FT) model might be encoded by different neural substrates. Thus, we have now run a new set of GLMs with both the current and the future values without orthogonalization. We found that the current value-alone signal was encoded in the vmPFC (peak voxel [2, 52, -4]) and the dmPFC ([2, 50, 18]), and the future value-alone signal was tracked by the right anterior insula ([34, 22, -12]), at the threshold of P <.001, uncorrected. Although these results did not survive the more stringent threshold applied to the main results (P_FDR <.05, k > 50), all survived the small volume correction at P_SVC <.05. Figure 5-figure supplement 2 was displayed at P <.005, uncorrected, k > 15. Together with our main result, these results indicate that the vmPFC encodes both current and total values estimated from the 2-step FT model; and that current and future value signals also had distinct neural substrates (dmPFC and insula). We have now added this new analysis in SI (Figure 5—figure supplement 2).

12) The vmPFC result contrast averages across the controllable and uncontrollable conditions (line 629). Why did the authors do so? Wouldn't it be better to see whether the "total value" is represented differently between the two conditions.

We apologize for not being clear about this point in the previous version of the paper. We showed the averages because there was no significant difference between the two conditions either in the whole brain analysis (P_FDR <.05) or in the ROI analyses (Figure 5c). This result might seem puzzling at first glance; however, it is consistent with our computational modelling results in the sense that people simulated 2 steps regardless of the actual controllability of the environment and that they needed to engage the vmPFC to do so. We have now added more clarification and discussion on this finding in the revised manuscript.

Line 325: “These analyses showed that the BOLD signals in the vmPFC tracked the value estimates drawn from the 2-step planning model across both conditions (P_FDR < 0.05, k > 50; Figure 5a, Supplementary file 1e), and there was no significant difference between the two conditions (P_FDR < 0.05). In contrast, BOLD responses in the vmPFC did not track the trial-by-trial value estimates from the 0-step model, even at a more liberal threshold (P < 0.005 uncorrected, k > 50; Figure 5b, Supplementary file 1f). (…) These findings suggest that individuals engaged the vmPFC to compute the projected total (current and future) values of their choices during forward thinking. Furthermore, vmPFC signals were comparable between the two conditions both in the whole brain analysis and the ROI analyses. Consistent with our behavioral modeling results, these neural results further support the notion that humans computed summed choice values regardless of the actual controllability of the social environment.”

Line 419: “In addition, we did not find any significant differences in neural value encoding between the conditions. These results suggest that participants still expected some level of influence (controllability) over their partners even when the environment was in fact uncontrollable. Furthermore, δ was positively correlated between the conditions, indicating the stability of the mentally simulated controllability across situations within an individual. We speculate that people still attempted to simulate future interactions in uncontrollable situations due to their preference and tendency to control (Leotti and Delgado, 2014; Shenhav et al., 2016).”

13) The analysis of the relation between the vmPFC β weights and the difference between self-reported controllability beliefs and model-derived controllability estimates (Figure 5 d and e) is not adequately previewed. The hypothesis for why vmPFC activity might track this metric is unclear. Moreover, the relation between the two in the uncontrollable condition is somewhat weak. The authors should report the relation between vmPFC β weights and each component of the difference score (modeled and self-report controllability), and clearly motivate their intuition for why vmPFC activation might be related to that metric. If the authors feel strongly that this analysis is important to include, it would be meaningful to see whether the brain data could help explain behavioral data. For example, a simple GLM could serve this purpose: mean_offer ~ β(vmPFC) + self-report_controllability + model_controllability. Note that the authors need to state the exploratory nature if they decide to run this type of analysis.

Thank you for this comment. This was indeed an exploratory analysis. Thus, we now have clarified this in the revised manuscript and moved relevant results to SI. We sought to explore the neural correlates of the belief-behavior disconnection because we saw a discrepancy between the self-reported belief and the δ parameter particularly for the Controllable condition (r = .004, P = .98; although a moderate correlation exists for the Uncontrollable condition: r = -.43, P <.01) which we had not expected, and seemed rather interesting. We therefore explored the relationship between individuals’ neural sensitivity to total decision values in the vmPFC and each of the two measures, and examined whether the vmPFC sensitivity mediates the relationship between the measures. However, the vmPFC ROI coefficients were not correlated with each separate component, either the perceived controllability (Controllable condition: r = .05, P = .76; Uncontrollable condition: r = .21, P = .18) or the δ (Controllable condition: r = .23, P = .11; Uncontrollable condition: r = -.27, P = .07). To clarify the explorative nature of our analysis, we edited corresponding sections as follows.

Line 351: “Furthermore, in an exploratory analysis, we examined the behavioral relevance of these neural signals in the vmPFC beyond the tracking of trial-by-trial values. [...] These results suggest that the meaning of vmPFC encoding of value signals could be context dependent – and that heightened vmPFC signaling in uncontrollable situations is related to overly optimistic beliefs about controllability.”

Furthermore, we conducted the linear regression suggested by the reviewers. The results showed that only the δ, not the vmPFC ROI coefficient or perceived controllability predicted the mean offer size (Author response table 1). However, this regression does not examine the relationship between neural sensitivity and the belief-behavior disconnection, and thus we did not add the result to the revised manuscript.

Author response table 1.

	Estimate	SE	t	p
Intercept	2.87	0.61	4.70	0.00
vmPFC	-0.26	0.16	-1.60	0.12
PC	0.01	0.01	1.51	0.14
δ	1.76	0.21	8.34	0.00

14) The authors might also report the neural correlates of the internal norm and the norm prediction error (line 544). If the participants indeed acquired the social controllability through learning, they might form different internal norms in the two conditions, hence the norm prediction error might also differ.

Thank you for this excellent prompt. We have now conducted a new set of whole brain analyses to examine the neural correlates of norm prediction errors and internal norms by entering them as parametric modulators in two separate GLMs.

The norm prediction error signals were found in the ventral striatum (VS; [4, 14, -14]) and the right anterior insula ([32, 16, -14]) for the Controllable condition, and in the anterior cingulate cortex (ACC; [2, 46, 16]) for the Uncontrollable condition at P_FWE <.05, small volume corrected. These regions have been suggested to encode prediction errors in a similar norm-learning context (Xiang et al., 2013). Next, we contrasted the two conditions and found that the ventral striatum ([4, 14, -14]) and the right anterior insula ([32, 16, -14]) activations were significantly greater for the Controllable condition than for the Uncontrollable condition (P_FWE <.05, small volume corrected) whereas the ACC ([2, 46, 16]) activation under the Uncontrollable condition was not significantly greater than under the Controllable condition at the same threshold. Figure 5—figure supplement 3 was displayed at P <.05, uncorrected, k > 120.

The norm-related BOLD signals were found in the ventral striatum ([10, 16, -2]) for the Controllable condition, and in the right anterior insula ([28, 16, -6]) and the amygdala ([18, -6, -8]) for the Uncontrollable condition at P_FWE <.05, small volume corrected. However, the whole brain contrast showed no difference between the conditions. Figure 5—figure supplement 4 was displayed at P <.01, uncorrected, k > 50.

(15) Specific aspects of the experimental design may have influenced the observed results in ways that were not controlled. For example, it is not only the magnitude and controllability of outcomes that differed between the controllable and uncontrollable conditions, but also the uncertainty. It is possible that the less variable offers encountered in the controllable condition may have driven some of the results. The authors should acknowledge the possible role of autocorrelation and uncertainty in behavioral and modeling results.

Thank you for bringing up this point. Autocorrelation and uncertainty are indeed inherent features of having control. To address this concern, we examined the relationship between uncertainty/autocorrelation and our key outcome variables. To this end, we first operationalized uncertainty as the standard deviation of offers within each condition (“offer SD”). We entered the offer SD with the condition variable in a regression predicting the δ parameter (δ ~ 1 + offer SD + condition) and the self-reported perceived controllability (perceived control ~ 1 + offer SD + condition). The results showed that the condition still predicted the δ (β = 0.36, P <.05; Supplementary file 1g a) whereas the offer SD had no significant impact on the δ (β = -0.03, P = 0.92; Supplementary file 1g a). Similarly, for the self-reported controllability, the condition (β = 21.09, P <.001; Supplementary file 1g b) had a significant effect whereas the offer SD did not (β = 15.60, P = .16; Supplementary file 1g b).

Next, to examine the autocorrelation issue, we computed the sample autocorrelation of offers using the ‘autocorr’ function in MATLAB. Indeed, 30 out of 48 subjects showed significant autocorrelation at lag 1 (“ACF1”) for the Controllable condition whereas none had significant autocorrelation for the Uncontrollable condition. Next, we entered ACF1 into a regression model similar to the one stated above. We found that for δ, the condition effect became marginal (β = 0.46, P = .07; Supplementary file 1g c), and there still was no significant ACF1 effect (β = -0.21, P = .59; Supplementary file 1g c). For the perceived controllability, the condition effect was marginal (β = 18.14, P = .05; Supplementary file 1g d) but the ACF1 effect was not significant (β = 7.93, P = .58; Supplementary file 1g d). We have added these results in the supplementary information (Supplementary file 1g).
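For readers who want to see what a lag-1 sample autocorrelation involves, the following is a small sketch in plain Python. It uses the standard sample autocorrelation formula; the significance rule (|r| greater than 2/√N, a common large-sample white-noise bound) is an assumption made for illustration and is not confirmed by the text as the exact criterion applied. The function names and the toy offer sequence are hypothetical.

```python
from math import sqrt

def acf(x, lag):
    """Sample autocorrelation of sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return ck / c0

def exceeds_white_noise_bound(r, n, num_std=2.0):
    """Assumed criterion: |r| larger than num_std standard errors (1/sqrt(n) each)."""
    return abs(r) > num_std / sqrt(n)

# Hypothetical offer sequence that drifts steadily upward, as offers might
# when a participant's rejections push the proposer's offers up.
offers = [1, 2, 3, 4, 5, 6, 7, 8]
acf1 = acf(offers, 1)
```

A steadily drifting sequence like this one yields a large positive ACF1, whereas offers drawn independently on each trial would be expected to hover near zero.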

In summary, we did not find evidence that uncertainty or autocorrelation had a major impact on our controllability measures. Yet, controlling for autocorrelation weakened the impact of the controllability condition. We added this as a limitation of the study in the Discussion.

Line 466: “In addition, a task that carefully controls for uncertainty and autocorrelation confounds would help better understanding the accumulative effects of social controllability. Although we did not find evidence that uncertainty or autocorrelation affected the expected influence or self-reported controllability, we found that the impact of the condition on the expected influence and the self-reported perceived controllability became marginal when controlling for the autocorrelation (P = 0.07 for the expected influence; for self-reported controllability P = 0.05) (Supplementary file 1g).”

(16) Moreover, asking participants to repeatedly rate their perception of controllability almost certainly influenced and exacerbated the impact of this factor on choices. It would have been very useful to perform a complementary online study excluding these ratings to ensure that controllability-dependent effects are still evident in such a case.

Sorry for the lack of clarity in our original writing. In fact, participants were not repeatedly asked to rate the perception of controllability within the trials. This was on purpose because we shared the same concern you raised here. As such, we intentionally avoided this issue by only asking participants to rate their perception of controllability at the end of the task. We have now clarified this point in the revised manuscript.

Line 98: “At the end of the task, after all the trials were completed, participants rated how much control they believed they had over their partners’ offers in each condition using a 0-100 scale (‘self-reported controllability’ hereafter). In the fMRI study, on 60% of the trials, participants were asked about their emotional state (“How do you feel?”) on a scale of 0 (unhappy) to 100 (happy) after they made a choice (i.e., 24 ratings per condition; see Figure 1—figure supplement 1).”

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

A primary concern is that the manuscript does not provide sufficiently strong support for the claim that the vmPFC supports forward planning, particularly in light of the new neuroimaging analyses performed as part of this revision. Reviewer 3 has a concrete suggestion for how this claim might be strengthened with a model comparison analysis. If further evidence for the claim is not found/provided, it should be tempered. Reviewer 2 also questions whether it is useful and sensible to retain the MF model in the set of compared models, and both reviewers note a few areas where clarification, greater methodological detail, or further interpretation are warranted.

Please carefully consider each of the reviewers' suggestions as you revise your manuscript.

Thank you and all reviewers for your thoughtful comments. We are thrilled that all reviewers find the revised manuscript significantly improved. We are especially grateful for these two final suggestions made by reviewers. Following Reviewer 3’s suggestion, we have now implemented a neural model comparison using the recommended MACS toolbox; this analysis further substantiated our finding related to the vmPFC and relevant methods/results have been added to the revised manuscript. Following Reviewer 2’s suggestion, we have removed the MF model from the main text. Finally, we have also made every effort to address all remaining concerns. Please see our point-by-point response below.

Reviewer #2:

The authors have revised their manuscript considerably and addressed a number of concerns raised in the initial review, with their additional analyses and detailed clarification. I particularly appreciate that the authors took the courage to dive into the direct comparison of findings between the social and non-social groups, which provided new insights. Furthermore, the revised Introduction is more thought-provoking with relevant literature included. Now the conclusions are better supported as it stands, and these findings are certainly going to be exciting additions to the literature of social decision neuroscience.

Here I have a few additional points, more for clarification.

(1) In response to comment #2, the authors might unpack the significant interaction result, to explicitly show "that the non-social context reduced the impact of nPE on emotional feelings." Also in the same LME model, I am curious about the significant "Controllable × social task (***)" interaction (β = -5.06). Does this mean, being in the Controllable + Social group, the emotion rating is lower? How would the authors interpret this finding?

Thank you for this suggestion. To unpack the interaction effect of norm prediction error and task type, we conducted residual correlations and plotted the mean coefficient in Figure 2-figure supplement 1h. Specifically, we used the regression coefficients from the original mixed-effect regression ('emotion rating ~ offer + norm prediction error + condition + task + task*(offer + norm prediction error + condition) + (1 + offer + norm prediction error | subject)') and calculated the residual, which should be explained by the differential impact of nPE between social and non-social tasks. Correlation coefficients between the residuals and nPE were plotted for each task condition. Note that the non-social group was coded as the reference group (0 for the group identifier) in the regression. We also added this figure in our manuscript as Figure 2—figure supplement 1h. This result indicates that the impact of nPE on emotion was stronger in the Social than in the non-social Computer task, revealing an interesting joint contribution of PE and the social context to subjective states. We speculate that this is due to the interpersonal nature of the social version of the game and suggest that the cause of this effect deserves further investigation in future studies.

Regarding the Controllable × social task interaction, we applied a similar residual approach and found that emotion rating was lower in the Social and Controllable condition compared to other condition × task type combinations (Figure 2-figure supplement 1i). We speculate that exerting control over other people – compared to not needing to exert control over other people or playing with computer partners – might be more effortful (as shown by our RT results). Intentionally decreasing other people’s portion of money might also induce a sense of guilt. We have now added this figure in Figure 2—figure supplement 1i.

(2) In response to comment #5 regarding response time with the additional LME analyses, I wonder which distribution function was used? We know that RT data is commonly positively skewed, so a log-normal or a shifted log-normal should be more accurate.

Thank you for your suggestion. Our previous results were based on a normal distribution. Following the reviewer’s suggestion, we have now re-run the regressions using a log-normal function. New results related to “chosen value” were similar to the previous version (Author response table 2) in that only the condition effect was significant (Author response table 3).

Author response table 2.

Name	Estimate	SE	t	DF	p-value
Intercept	1.52	0.06	24.43	2860	0.000
Condition (***)	0.21	0.06	3.71	2860	0.000
Chosen value	0.00	0.01	0.48	2860	0.630
Condition × chosen value	0.00	0.01	-0.80	2860	0.426

Author response table 3.

Name	Estimate	SE	t	DF	p-value
Intercept	1.60	0.06	28.72	2860	0.000
Condition (***)	0.13	0.03	4.39	2860	0.000
Conflict (**)	-0.04	0.01	-2.98	2860	0.003
Condition × conflict	0.03	0.02	1.66	2860	0.096

We ran a mixed effect linear model (RT ~ 1+ condition + chosen values + condition * chosen values + (1+ chosen values | subject)) to test whether chosen values predict response times. We found that neither the chosen value coefficient (β = 0.00, P = .63) nor the interaction term (β = -0.00, P = .43) was significant, while the condition effect was significant (β = 0.21, p <.001; consistent with Figure 2—figure supplement 3a). *** P < 0.001

We ran a mixed effect linear model (RT ~ 1+ condition + conflict + condition * conflict + (1+ conflict | subject)) to test whether conflicts (values of the chosen action – values of the unchosen action) affect response times. Both the conflict (β = -0.04, p <.005) and the condition (β = 0.13, p <.001) had significant impacts, while there was no interaction effect (β = 0.03, p = .10), suggesting that conflict did have a significant impact on RT. ** P < 0.01; *** P < 0.001.

The new regression analysis examining “conflict” also showed similar results to the previous version (Author response table 3) except that the condition × conflict interaction effect became significant (Supplementary file 1d) from being marginal in the previous version. We replaced the previous results with these new results.

(3) I retain my initial comment regarding the inclusion of the MF model. The task is deterministic – participants get what appears if they accept and 0 if reject. In fact, the model is making a completely different prediction: according to the Q-value update, if the participant chose an "accept" and then indeed received a reward, then they should repeat "accept". But in the current task design, such a "positive feedback" would make the participants feel they are perhaps too easy to play with, and will be more likely to choose "reject" on the next trial. In essence, the MF model is not even capturing the behavioral pattern of the task, hence it does not seem to be a good baseline model. Rather, the 0-step model is okay enough to be the reference model.

Thank you for the suggestion. We agree that the MF model clearly did not capture real subjects’ behaviors during this social interaction game and have now removed the MF results from the main text. For completeness, we kept MF-relevant descriptions and results in Figure 3—figure supplement 5 for readers who might be interested.

Reviewer #3:

The authors have made very significant efforts to respond to a diversity of concerns and to amend their paper accordingly. The revised version is thus more complete and I believe that the main argument of the paper has been made stronger.

In many cases, the authors have appropriately adjusted their language in order to better align their conclusions with the data (e.g. renaming the δ parameter expected influence parameter) and I think that this paper can constitute an interesting addition to the field.

However, I am still slightly skeptical about the reach of neuroimaging results and I believe that some limitations of the paradigm may be more explicitly discussed.

A. Neuroimaging.

The authors have performed valuable additional analyses regarding the norm and norm prediction errors signals which can be of interest for the field. But I believe that our main concerns about vmPFC effects have not been fully addressed. Indeed, the authors still write that the vmPFC constructs "the total values (both current and future) of current actions as humans engaged in forward planning during social exchange". However, when splitting the analysis of current and future values, the encoding of future values was found in the insula whereas the vmPFC only encoded current values. The authors claim that the lack of encoding of total values derived from the 0-step FT model constitutes evidence in favor of forward planning, but it could be that this lack of evidence is driven by a poorer fit of current (rather than total) values by this simpler model. In order to better substantiate their claim about vmPFC's role, the authors may want to perform a model comparison at the neural level by comparing GLMs (using for example the MACS toolbox) including current value only, current value and future value, future value only or total value. Alternatively, they could analyze the first-level residuals produced by GLMs including alternatively current value, future value and total value (all based on FT-2). If their interpretation is correct, GLMs equipped with a parametric regressor for total value should be associated with smaller residuals in the vmPFC.

Thank you for this insightful comment, which has further helped strengthen the paper. Following your suggestion, we performed a model comparison at the neural level using the MACS toolbox. We compared four different GLMs: (i) the GLM with total value (TV) as one regressor (our original GLM), (ii) the GLM with current and future value (CVandFV) as two regressors (without orthogonalization), (iii) the GLM with only current value (CV), and (iv) the GLM with only future value (FV). All value regressors were estimated from the 2-step forward thinking (FT) model. We found that our original GLM with total value had a clearly higher exceedance probability in the vmPFC compared to the other candidate GLMs. We added these results in our manuscript as Figure 5—figure supplement 3 and briefly discussed them in the Results section.

Line 333 (line numbers are based on the clean version): We also conducted model comparison at the neural level using the MACS toolbox (see Figure 5—figure supplement 3 for details) and found that the vmPFC encoded total values rather than only current or future values.

Regarding the behavior-belief disconnection analysis, I think that it would be more sensical to study the ratio rather than the difference between behavior and subjective reports, since these two measures are qualitatively different.

Thank you for the suggestion. To clarify, both measures were standardized before we calculated the distance. Our measure is simply the signed distance between the standardized belief and the standardized δ, or the signed and scaled “disconnect distance” as illustrated in Author response image 1 (our measure = √(2* [disconnect distance]²)). This approach has been commonly used in the cognitive neuroscience literature (Carter et al., 2012; Chung et al., 2015; Zhu et al., 2014).

We believe that a Euclidean distance is a better way to operationalize the “disconnect” than a ratio, because a ratio leads to uneven weights. For example, we intend the disconnections for [behavior: 1 vs belief: 2] and [behavior: 2 vs belief: 1] to have equal magnitude but opposite signs. The suggested ratio measure generates different magnitudes (1/2=0.5 and 2/1=2), whereas the Euclidean distance captures the difference in signs while keeping the magnitude of the distances the same, as we intended (1-2=-1 and 2-1=1).
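The contrast between a signed difference and a ratio can be made concrete with a short sketch in plain Python. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the sign convention (behavior minus belief) is taken from the worked numbers above, and the "disconnect distance" is assumed to be the perpendicular distance from the point (behavior, belief) to the identity line, which is what makes √(2 × distance²) equal to the absolute difference.

```python
from math import sqrt, isclose
from statistics import mean, stdev

def zscore(values):
    """Standardize a list of values to mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def signed_disconnect(behavior_z, belief_z):
    """Signed difference between standardized behavior (delta) and standardized belief."""
    return behavior_z - belief_z

def perpendicular_distance(behavior_z, belief_z):
    """Assumed 'disconnect distance': distance from the point to the identity line."""
    return abs(behavior_z - belief_z) / sqrt(2)

# The two mirrored cases from the text.
d1 = signed_disconnect(1, 2)   # behavior below belief
d2 = signed_disconnect(2, 1)   # behavior above belief
ratio1, ratio2 = 1 / 2, 2 / 1  # the ratio treats the same two cases asymmetrically

# sqrt(2 * distance^2) gives back the magnitude of the signed difference.
magnitude = sqrt(2 * perpendicular_distance(1, 2) ** 2)
```

The two differences have equal size and opposite sign, while the two ratios (0.5 and 2) are not symmetric around any common reference on a linear scale.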

Finally, it might be worth providing the reader with a brief discussion of the other neural substrates uncovered by the most recent analyses (dmPFC, insula, striatum, etc.).

Thank you for the suggestion. We discussed the findings in the Results section.

Line 356: “In addition, we examined whether norm prediction errors (nPEs) and norm estimates themselves from the 2-step FT model were tracked in the brain. We found that nPEs were encoded in the ventral striatum (VS; [4, 14, -14]) and the right anterior insula (rAI; [32, 16, -14]) for the Controllable condition (Figure 5—figure supplement 4a), while these signals were found in the anterior cingulate cortex (ACC, [2, 46, 16]) for the Uncontrollable condition (Figure 5—figure supplement 4b) at P_FWE <.05, small volume corrected. […] Taken together, these results suggest that the controllability level of the social interaction modulates neural encoding of internal norm representation and adaptation, expanding our previous knowledge about the computational mechanisms of norm learning (Gu et al., 2015; Xiang et al., 2013).”

B. Behavioral paradigm.

I believe that the authors should provide a few more details in the methods and acknowledge a few limitations in their discussion.

First, unless I am mistaken, the method used to decide on block order (i.e. C or U first) was not reported. Was the "illusion of control" in the uncontrollable condition driven by the subset of participants who passed the controllable block first? If this is the case, then it might add some plausibility to the interpretation of subjective controllability ratings in the uncontrollable condition as an "illusion of control" (persistence of a control prior). In other words, I think that the authors should refrain from interpreting the raw value of these ratings as an illusion of control (perhaps not all participants understood the meaning of the rating, perhaps they were too lazy to move the cursor until 0, etc.).

The condition order was counterbalanced (see manuscript lines 76 and 875). Here, we also address the two different aspects of your comment below.

1) Regarding examining a potential order effect and the proposed “Bayesian”-like explanation for the reported behavior: This is an interesting and appealing hypothesis that we have now examined. However, we did not identify any order effect in terms of the expected influence or the self-reported controllability (Supplementary file 1h), suggesting that there was no evidence that “illusion of control” was induced by completing the controllable block first. Priors of controllability might still exist in the subjects – but based on our data, such a prior would be more likely due to pre-existing individual differences (e.g. childhood experience in environmental controllability) than task-induced priors per se. Although we did not survey these individual differences, future studies could systematically investigate this interesting question. We have added this to our discussion.

Although we did not find any order effect on the expected influence parameter or self-reported belief, future studies would be needed to probe task-induced priors more thoroughly. PC represents self-reported perceived controllability. C represents the Controllable condition and U represents the Uncontrollable condition.

2) Regarding potentially noisy subject reports: Indeed, whenever we include subjects’ self-reports in empirical studies, there is always the risk that these reports are potentially noisy and reflect non-task related factors (e.g., too lazy to move the cursor). In fact, this has been a longstanding topic for discussion in the history of psychology and cognitive neuroscience (e.g. see Stone, Arthur A., et al., The science of self-report: Implications for research and practice. 1999 for an overview). However, most researchers agree that subjective data are still valuable as they offer unique insights – and perhaps the most direct insight – into one’s subjective world, which is also specific to human studies. Based on our results, we suggest that both measures (objective and subjective) contribute to the illusion of control. For the purpose of this manuscript, we have clarified these points throughout the text and toned down any direct claim based on the sole association between self-reports and illusion of control.

While it does not necessarily imply an illusion of control, the fact that participants still relied on forward planning in the uncontrollable condition (as indexed by the expected value parameter) is presumably what prevented the authors from really isolating the neural substrates of strategic controllability-dependent forward planning, and it might thus be mentioned as a limitation of the paradigm.

Thank you for this comment. We added this limitation in the Discussion section.

Line 501: “Second, the lack of clear instruction in different controllability conditions in our study may have affected the extent to which individuals exploit controllability and develop illusion of control. Future studies implementing explicit instructions might be better suited to examine controllability-specific behaviors and neural substrates.”

I believe that it is also important to mention explicitly the fact that a third and a quarter of the data was excluded from the analyses of behavioral and fMRI data (i.e. first and last five trials of each block) respectively and the rationale for this exclusion may be discussed.

Thank you for this comment. We added the total number of the trials in the paragraph so that the proportion to be excluded is clearer.

Line 193: “In model fitting, we excluded the first five trials out of 40 trials for the fMRI sample (30 trials for the online sample) to exclude initial exploratory behaviors and to focus on stable estimation of controllability. We also excluded the last five trials because subjects might adopt a different strategy towards the end of the interaction (e.g. “cashing out” instead of trying to raise the offers higher).”
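The exclusion rule quoted above amounts to simple slicing, and a short sketch makes the "a third and a quarter" proportions explicit. The function name `trim_trials` and the use of integer trial indices are assumptions for illustration only.

```python
def trim_trials(trials, n_drop=5):
    """Drop the first and last n_drop trials of a block, keeping the middle."""
    return trials[n_drop:len(trials) - n_drop]

fmri_block = list(range(1, 41))    # 40 trials per block in the fMRI sample
online_block = list(range(1, 31))  # 30 trials per block in the online sample

fmri_kept = trim_trials(fmri_block)
online_kept = trim_trials(online_block)

# Proportion of trials excluded from model fitting in each sample.
fmri_excluded = 1 - len(fmri_kept) / len(fmri_block)
online_excluded = 1 - len(online_kept) / len(online_block)
```

With five trials removed at each end, 30 of 40 trials remain in the fMRI sample (a quarter excluded) and 20 of 30 remain in the online sample (a third excluded).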

The authors wrote that "a task that carefully controls for uncertainty and autocorrelation confounds would help better understanding the accumulative effects of social controllability", which is a good start, but it would be in my opinion important to explicitly acknowledge that changes in controllability were confounded with changes in uncertainty about upcoming offers.

Thank you for this comment. We revised this part in the discussion.

Line 496: “We did not find evidence that uncertainty or autocorrelation affected the expected influence or self-reported controllability and that reduction in uncertainty might be an inherent feature to controllability (Supplementary file 1g). Still, future experimental designs which dissociate change in uncertainty from change in controllability may better address potentially different effects of controllability and uncertainty on choice behavior and neural responses.”

I would be curious to hear the authors' insight about why participants in the online study (and to some extent in the lab) accepted the low offers more often in the controllable condition. It seems somehow counterintuitive and could mean that participants behaved in a more "automatic" and perseverative way in the controllable condition.

Related to this last point, is it possible that the δ parameter (or expected influence) simply captures a perseverative tendency in rejection/acceptance of offers? This might explain the disconnection between behavior and belief, as well as the positive value of this parameter in the uncontrollable condition, correlated to that of the controllable one. That perseveration increases in the controllable condition would be logical (since that condition allows participants to reach their goal by doing so) and it would therefore still be of interest in the context of this social controllability study. Perhaps the authors could exclude this possibility by adding a perseveration mechanism to their model, as it is often done in the RL literature?

Thank you for this comment. First, regarding why online participants seemed to accept low offers more often in the Controllable condition, we would like to point out that a certain type of subject was over-represented in the low offer bin: those who had low expected influence (δ) and thus ended up with more low offers (yet were still more likely to accept these low offers). Individuals who had high δ were much less likely – sometimes never – to reach the lowest offers and were thus under-represented in the “low offer” bin in the Controllable condition. In contrast to the Controllable condition, everyone experienced the same number of low offers in the Uncontrollable condition. Thus, the over-representation of “low δ” and under-representation of “high δ” subjects is the main reason why low offers are more accepted in the Controllable condition.

Second, both our previous RT analysis and new analysis of shift ratio provide evidence that perseverance is likely to be less, rather than more, prevalent in the Controllable condition. Habitual behaviors and behavioral perseverance are generally associated with faster time to respond (e.g., see Hardwick et al., "Time-dependent competition between goal-directed and habitual response preparation." Nature human behaviour 3.12 (2019): 1252-1262; Keramati, Dezfouli, and Piray. "Speed/accuracy trade-off between the habitual and the goal-directed processes." PLoS computational biology 7.5 (2011): e1002055). Here, we found that human subjects showed longer, instead of shorter, RT in the Controllable condition (Figure 2d), suggesting that they were likely to engage in more deliberation to exploit controllability.

We also conducted a new analysis to examine the shift ratio (i.e., the number of trials where the choice was shifted from the previous trial divided by the total number of trials). We found that the shift ratio was higher, rather than lower, for the Controllable condition than the Uncontrollable condition (mean_C = 52.5%, mean_U = 36.2%, t(47) = 6.62, P <.001), and the shift ratio was not correlated between the two conditions (R = .24, P = .10). Together with the RT analysis, these results suggest that subjects were less likely to be habitual in the Controllable condition. We have now added the new shift ratio analysis in Figure 2—figure supplement 4.
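The shift ratio defined above is straightforward to compute; the following plain-Python sketch follows the stated definition (shifts divided by the total number of trials). The function name, the 1/0 coding of accept/reject, and the two toy choice sequences are hypothetical.

```python
def shift_ratio(choices):
    """Number of trials whose choice differs from the previous trial, divided by total trials."""
    shifts = sum(1 for prev, cur in zip(choices, choices[1:]) if prev != cur)
    return shifts / len(choices)

# Hypothetical choice sequences (assumed coding: 1 = accept, 0 = reject).
perseverative = [1, 1, 1, 1, 1, 1, 1, 1]  # never switches
alternating = [1, 0, 1, 0, 1, 0, 1, 0]    # switches on every trial after the first

low = shift_ratio(perseverative)
high = shift_ratio(alternating)
```

A purely perseverative responder produces a ratio of zero, so a higher shift ratio in the Controllable condition is the opposite of what a simple perseveration account would predict.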

Associated Data

Supplementary Materials

Supplementary file 1. Supplementary tables.

elife-64983-supp1.docx (38.7 KB, docx)

Supplementary file 2. Task instructions.

elife-64983-supp2.docx (36.9 KB, docx)

Transparent reporting form

elife-64983-transrepform1.pdf (333.6 KB, pdf)

Data Availability Statement

The fMRI and behavioral data and analysis scripts are accessible at https://github.com/SoojungNa/social_controllability_fMRI (copy archived at swh:1:rev:8ea1fb4fe6cbd625f9a25fe292f82fc953f8c713).

[bib1] Atzil S, Gao W, Fradkin I, Barrett LF. Growing a social brain. Nature Human Behaviour. 2018;2:624–636. doi: 10.1038/s41562-018-0384-6. [DOI] [PubMed] [Google Scholar]

[bib2] Bartra O, McGuire JT, Kable JW. The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. NeuroImage. 2013;76:412–427. doi: 10.1016/j.neuroimage.2013.02.063. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Becker GS. The Economic Approach to Human Behavior. University of Chicago press; 2013. [Google Scholar]

[bib4] Behrens TEJ, Hunt LT, Woolrich MW, Rushworth MFS. Associative learning of social value. Nature. 2008;456:245–249. doi: 10.1038/nature07538. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Behrens TEJ, Muller TH, Whittington JCR, Mark S, Baram AB, Stachenfeld KL, Kurth-Nelson Z. What is a cognitive map? Organizing knowledge for flexible behavior. Neuron. 2018;100:490–509. doi: 10.1016/j.neuron.2018.10.002. [DOI] [PubMed] [Google Scholar]

[bib6] Bhatt MA, Lohrenz T, Camerer CF, Montague PR. Neural signatures of strategic types in a two-person bargaining game. PNAS. 2010;107:19720–19725. doi: 10.1073/pnas.1009625107. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] Boorman ED, Behrens TEJ, Woolrich MW, Rushworth MFS. How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action. Neuron. 2009;62:733–743. doi: 10.1016/j.neuron.2009.05.014. [DOI] [PubMed] [Google Scholar]

[bib8] Brett M, Anton JL, Valabregue R, Poline JB. 8th International Conference on Functional Mapping of the Human Brain. Region of interest analysis using an SPM toolbox; Sendai, Japan. 2002. [Google Scholar]

[bib9] Camerer CF, Ho TH, Chong JK. A cognitive hierarchy model of games. The Quarterly Journal of Economics. 2004;119:861–898. doi: 10.1162/0033553041502225. [DOI] [Google Scholar]

[bib10] Camerer CF. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton University Press; 2011. [Google Scholar]

[bib11] Chung D, Christopoulos GI, King-Casas B, Ball SB, Chiu PH. Social signals of safety and risk confer utility and have asymmetric effects on observers’ choices. Nature Neuroscience. 2015;18:912–916. doi: 10.1038/nn.4022. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] Cohen JD, Daw N, Engelhardt B, Hasson U, Li K, Niv Y, Norman KA, Pillow J, Ramadge PJ, Turk-Browne NB, Willke TL. Computational approaches to fMRI analysis. Nature Neuroscience. 2017;20:304–313. doi: 10.1038/nn.4499. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] Constantinescu AO, O’Reilly JX, Behrens TEJ. Organizing conceptual knowledge in humans with a gridlike code. Science. 2016;352:1464–1468. doi: 10.1126/science.aaf0941. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans’ choices and striatal prediction errors. Neuron. 2011;69:1204–1215. doi: 10.1016/j.neuron.2011.02.027. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] Dolan RJ, Dayan P. Goals and habits in the brain. Neuron. 2013;80:312–325. doi: 10.1016/j.neuron.2013.09.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] Doll BB, Duncan KD, Simon DA, Shohamy D, Daw ND. Model-based choices involve prospective neural activity. Nature Neuroscience. 2015;18:767–772. doi: 10.1038/nn.3981. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib17] Dorfman HM, Gershman SJ. Controllability governs the balance between Pavlovian and instrumental action selection. Nature Communications. 2019;10:1–8. doi: 10.1038/s41467-019-13737-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] Draper D. Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society. 1995;57:45–70. doi: 10.1111/j.2517-6161.1995.tb02015.x. [DOI] [Google Scholar]

[bib19] Dunbar RIM, Shultz S. Evolution in the social brain. Science. 2007;317:1344–1347. doi: 10.1126/science.1145463. [DOI] [PubMed] [Google Scholar]

[bib20] Fehr E, Schmidt KM. A theory of fairness, competition, and cooperation. The Quarterly Journal of Economics. 1999;114:817–868. doi: 10.1162/003355399556151. [DOI] [Google Scholar]

[bib21] Fehr E. Human behaviour: don’t lose your reputation. Nature. 2004;432:449–450. doi: 10.1038/432449a. [DOI] [PubMed] [Google Scholar]

[bib22] Feng C, Luo YJ, Krueger F. Neural signatures of fairness-related normative decision making in the ultimatum game: a coordinate-based meta-analysis. Human Brain Mapping. 2015;36:591–602. doi: 10.1002/hbm.22649. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] FitzGerald THB, Seymour B, Dolan RJ. The role of human orbitofrontal cortex in value comparison for incommensurable objects. The Journal of Neuroscience. 2009;29:8388–8395. doi: 10.1523/JNEUROSCI.0717-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Gläscher J, Daw N, Dayan P, O’Doherty JP. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron. 2010;66:585–595. doi: 10.1016/j.neuron.2010.04.016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Gmytrasiewicz PJ, Doshi P. A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research. 2005;24:49–79. doi: 10.1613/jair.1579. [DOI] [Google Scholar]

[bib26] Gneezy U, Haruvy E, Roth AE. Bargaining under a deadline: Evidence from the reverse ultimatum game. Games and Economic Behavior. 2003;45:347–368. doi: 10.1016/S0899-8256(03)00151-9. [DOI] [Google Scholar]

[bib27] Gu X, Wang X, Hula A, Wang S, Xu S, Lohrenz TM, Knight RT, Gao Z, Dayan P, Montague PR. Necessary, yet dissociable contributions of the insular and ventromedial prefrontal cortices to norm adaptation: computational and lesion evidence in humans. The Journal of Neuroscience. 2015;35:467–473. doi: 10.1523/JNEUROSCI.2906-14.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] Guinote A. How Power Affects People: Activating, Wanting, and Goal Seeking. Annual Review of Psychology. 2017;68:353–381. doi: 10.1146/annurev-psych-010416-044153. [DOI] [PubMed] [Google Scholar]

[bib29] Hampton AN, Bossaerts P, O’Doherty JP. Neural correlates of mentalizing-related computations during strategic interactions in humans. PNAS. 2008;105:6741–6746. doi: 10.1073/pnas.0711099105. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] Hegarty M. Mechanical reasoning by mental simulation. Trends in Cognitive Sciences. 2004;8:280–285. doi: 10.1016/j.tics.2004.04.001. [DOI] [PubMed] [Google Scholar]

[bib31] Hiser J, Koenigs M. The multifaceted role of the ventromedial prefrontal cortex in emotion, decision making, social cognition, and psychopathology. Biological Psychiatry. 2018;83:638–647. doi: 10.1016/j.biopsych.2017.10.030. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] Hula A, Montague PR, Dayan P, Gershman S. Monte Carlo planning method estimates planning horizons during interactive social exchange. PLOS Computational Biology. 2015;11:e1004254. doi: 10.1371/journal.pcbi.1004254. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] Huys QJM, Dayan P. A Bayesian formulation of behavioral control. Cognition. 2009;113:314–328. doi: 10.1016/j.cognition.2009.01.008. [DOI] [PubMed] [Google Scholar]

[bib34] Iigaya K, Hauser TU, Kurth-Nelson Z, O’Doherty JP, Dayan P, Dolan RJ. The value of what’s to come: Neural mechanisms coupling prediction error and the utility of anticipation. Science Advances. 2020;6:eaba3828. doi: 10.1126/sciadv.aba3828. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib35] Kable JW, Glimcher PW. The neural correlates of subjective value during intertemporal choice. Nature Neuroscience. 2007;10:1625–1633. doi: 10.1038/nn2007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib36] King-Casas B, Tomlin D, Anen C, Camerer CF, Quartz SR, Montague PR. Getting to know you: reputation and trust in a two-person economic exchange. Science. 2005;308:78–83. doi: 10.1126/science.1108062. [DOI] [PubMed] [Google Scholar]

[bib37] Knoch D, Schneider F, Schunk D, Hohmann M, Fehr E. Disrupting the prefrontal cortex diminishes the human ability to build a good reputation. PNAS. 2009;106:20895–20899. doi: 10.1073/pnas.0911619106. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] Lachman ME, Weaver SL. The sense of control as a moderator of social class differences in health and well-being. Journal of Personality and Social Psychology. 1998;74:763–773. doi: 10.1037//0022-3514.74.3.763. [DOI] [PubMed] [Google Scholar]

[bib39] Lee SW, Shimojo S, O’Doherty JP. Neural computations underlying arbitration between model-based and model-free learning. Neuron. 2014;81:687–699. doi: 10.1016/j.neuron.2013.11.028. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib40] Leotti LA, Delgado MR. The value of exercising control over monetary gains and losses. Psychological Science. 2014;25:596–604. doi: 10.1177/0956797613514589. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib41] Levy DJ, Glimcher PW. The root of all value: a neural common currency for choice. Current Opinion in Neurobiology. 2012;22:1027–1038. doi: 10.1016/j.conb.2012.06.001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib42] Ligneul R. Prediction or Causation? Towards a Redefinition of Task Controllability. Trends in Cognitive Sciences. 2021;25:431–433. doi: 10.1016/j.tics.2021.02.009. [DOI] [PubMed] [Google Scholar]

[bib43] Maier SF, Seligman ME. Learned helplessness: theory and evidence. Journal of Experimental Psychology. 1976;105:3–46. doi: 10.1037/0096-3445.105.1.3. [DOI] [Google Scholar]

[bib44] Maier SF, Watkins LR. Stressor controllability and learned helplessness: the roles of the dorsal raphe nucleus, serotonin, and corticotropin-releasing factor. Neuroscience & Biobehavioral Reviews. 2005;29:829–841. doi: 10.1016/j.neubiorev.2005.03.021. [DOI] [PubMed] [Google Scholar]

[bib45] Maier SF, Seligman MEP. Learned Helplessness at Fifty: Insights from Neuroscience. Psychological Review. 2016;123:349–367. doi: 10.1037/rev0000033. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib46] Moran R, Keramati M, Dayan P, Dolan RJ. Retrospective model-based inference guides model-free credit assignment. Nature Communications. 2019;10:750. doi: 10.1038/s41467-019-08662-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib47] Niv Y. Learning task-state representations. Nature Neuroscience. 2019;22:1544–1553. doi: 10.1038/s41593-019-0470-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib48] Overmier JB. Interference with avoidance behavior: failure to avoid traumatic shock. Journal of Experimental Psychology. 1968;78:340–343. doi: 10.1037/h0026365. [DOI] [PubMed] [Google Scholar]

[bib49] O’Doherty JP, Hampton A, Kim H. Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences. 2007;1104:35–53. doi: 10.1196/annals.1390.022. [DOI] [PubMed] [Google Scholar]

[bib50] O’Keefe J, Nadel L. The Hippocampus as a Cognitive Map. Oxford: Clarendon Press; 1978. [Google Scholar]

[bib51] Pezzulo G, Rigoli F, Chersi F. The mixed instrumental controller: using value of information to combine habitual choice and mental simulation. Frontiers in Psychology. 2013;4:92. doi: 10.3389/fpsyg.2013.00092. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib52] Rutledge RB, Skandali N, Dayan P, Dolan RJ. A computational and neural model of momentary subjective well-being. PNAS. 2014;111:12252–12257. doi: 10.1073/pnas.1407535111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib53] Schuck NW, Cai MB, Wilson RC, Niv Y. Human orbitofrontal cortex represents a cognitive map of state space. Neuron. 2016;91:1402–1412. doi: 10.1016/j.neuron.2016.08.019. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib54] Shenhav A, Cohen JD, Botvinick MM. Dorsal anterior cingulate cortex and the value of control. Nature Neuroscience. 2016;19:1286–1291. doi: 10.1038/nn.4384. [DOI] [PubMed] [Google Scholar]

[bib55] Soch J, Allefeld C. MACS - a new SPM toolbox for model assessment, comparison and selection. Journal of Neuroscience Methods. 2018;306:19–31. doi: 10.1016/j.jneumeth.2018.05.017. [DOI] [PubMed] [Google Scholar]

[bib56] Southwick FS, Southwick SM. The Loss of a Sense of Control as a Major Contributor to Physician Burnout: A Neuropsychiatric Pathway to Prevention and Recovery. JAMA Psychiatry. 2018;75:665–666. doi: 10.1001/jamapsychiatry.2018.0566. [DOI] [PubMed] [Google Scholar]

[bib57] Spitzer M, Fischbacher U, Herrnberger B, Grön G, Fehr E. The neural signature of social norm compliance. Neuron. 2007;56:185–196. doi: 10.1016/j.neuron.2007.09.011. [DOI] [PubMed] [Google Scholar]

[bib58] Stensola H, Stensola T, Solstad T, Frøland K, Moser MB, Moser EI. The entorhinal grid map is discretized. Nature. 2012;492:72–78. doi: 10.1038/nature11649. [DOI] [PubMed] [Google Scholar]

[bib59] Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; 2018. [Google Scholar]

[bib60] Szpunar KK, Spreng RN, Schacter DL. A taxonomy of prospection: introducing an organizational framework for future-oriented cognition. PNAS. 2014;111:18414–18421. doi: 10.1073/pnas.1417144111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib61] Tavares RM, Mendelsohn A, Grossman Y, Williams CH, Shapiro M, Trope Y, Schiller D. A map for social navigation in the human brain. Neuron. 2015;87:231–243. doi: 10.1016/j.neuron.2015.06.011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib62] Venkatraman V, Payne JW, Bettman JR, Luce MF, Huettel SA. Separate neural mechanisms underlie choices and strategic preferences in risky decision making. Neuron. 2009;62:593–602. doi: 10.1016/j.neuron.2009.04.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib63] Wang JX, Kurth-Nelson Z, Kumaran D, Tirumala D, Soyer H, Leibo JZ, Hassabis D, Botvinick M. Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience. 2018;21:860–868. doi: 10.1038/s41593-018-0147-8. [DOI] [PubMed] [Google Scholar]

[bib64] Weiss JM. Effects of coping responses on stress. Journal of Comparative and Physiological Psychology. 1968;65:251–260. doi: 10.1037/h0025562. [DOI] [PubMed] [Google Scholar]

[bib65] Wilson RC, Collins AG. Ten simple rules for the computational modeling of behavioral data. eLife. 2019;8:e49547. doi: 10.7554/eLife.49547. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib66] Xiang T, Ray D, Lohrenz T, Dayan P, Montague PR, Sporns O. Computational phenotyping of two-person interactions reveals differential neural response to depth-of-thought. PLOS Computational Biology. 2012;8:e1002841. doi: 10.1371/journal.pcbi.1002841. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib67] Xiang T, Lohrenz T, Montague PR. Computational substrates of norms and their violations during social exchange. The Journal of Neuroscience. 2013;33:1099–1108a. doi: 10.1523/JNEUROSCI.1642-12.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib68] Zhang L, Gläscher J. A brain network supporting social influences in human decision-making. Science Advances. 2020;6:eabb4159. doi: 10.1126/sciadv.abb4159. [DOI] [PMC free article] [PubMed] [Google Scholar]
