Abstract
Precisely neuromodulating deep brain regions could bring transformative advancements in both neuroscience and treatment. We demonstrate that non-invasive transcranial ultrasound stimulation (TUS) can selectively modulate deep brain activity and affect learning and decision making, comparable to deep brain stimulation (DBS). We tested whether TUS could causally influence neural and behavioural responses by targeting the nucleus accumbens (NAcc) using a reinforcement learning task. Twenty-six healthy adults completed a within-subject TUS–fMRI experiment with three conditions: TUS to the NAcc, dorsal anterior cingulate cortex (dACC), or Sham. After TUS, participants performed a probabilistic learning task during fMRI. TUS-NAcc altered BOLD responses to reward expectation in the NAcc and surrounding areas. It also affected reward-related behaviours, including win–stay strategy use, learning rate following rewards, learning curves, and repetition rates of rewarded choices. DBS-NAcc perturbed the same features, confirming target engagement. These findings establish TUS as a viable approach for non-invasive deep-brain neuromodulation.
Subject terms: Decision, Neurological disorders, Learning algorithms
This study shows that non-invasive ultrasound to the human nucleus accumbens can modulate deep brain activity and enhance reward-guided learning, offering a potential alternative to invasive neuromodulation therapies.
Introduction
Brain-related health conditions affect one in four individuals worldwide1. Precise neuromodulation holds the potential to complement or even surpass current treatments by offering a targeted, nonpharmacological approach that directly modulates neural activity, providing personalised therapeutic treatment2,3. Harnessing ultrasound waves, typically employed in diagnostics, allows for the precise targeting of specific brain regions4. The method, called transcranial ultrasound stimulation (TUS), focuses ultrasound beams through the skull onto the brain, safely altering neural activity noninvasively5–8. TUS can reach deep brain regions with millimetric resolution and modulate discrete cell types within specific regions without affecting overlying brain areas9–11.
Repetitive TUS produces neural changes that outlast the stimulation period itself. It has, therefore, been possible to design ‘offline’ TUS protocols with effects lasting hours and resembling early phase neuroplasticity and outlasting concurrent peripheral confounds8,12. Recent proof-of-concept studies using repetitive TUS at 10 Hz in non-human primates show changes in functional connectivity when TUS is targeted at cortical and subcortical regions13. These studies have been supplemented by others demonstrating that TUS also induces changes in cognition and behaviour14–19. In parallel, when careful precautions are taken to limit the transmission loss caused by the skull20,21, recent studies in humans have demonstrated that TUS can effectively and precisely modulate activity and neurochemistry in deep parts of the cortex while participants are at rest22. Yet it remains to be determined whether applying TUS to deep subcortical structures during active cognitive engagement, such as during decision making and learning, can yield localised and specific changes in behaviour and neural function. If this is possible, it not only opens new avenues for testing causal hypotheses regarding human brain function but also raises the possibility of refined therapeutic applications.
Although studies in non-human primates have laid important groundwork by demonstrating that TUS can modulate behaviour through subcortical stimulation16,23,24, translating these findings to humans presents non-trivial challenges. Structural differences in skull geometry and density affect ultrasound propagation and targeting precision, complicating direct comparisons. Moreover, human studies have so far operated under far more conservative acoustic energy limits due to regulatory constraints, most notably those set by the FDA for diagnostic ultrasound prior to the ITRUSST guidelines25,26. These constraints, which apply to energy deposition even before the ultrasound beam reaches the skull, have led to significantly lower dosing in human TUS protocols relative to those used in animal studies. As a result, demonstrating behavioural effects in humans under these conditions is not only technically challenging but also critically important for establishing translational relevance.
Here we focus on the nucleus accumbens (NAcc) in the ventral striatum, a deep subcortical region implicated in reward-guided learning in humans and other animals27. It is a key target of amygdala and mesolimbic dopaminergic projections28,29. Such inputs allow NAcc to guide behaviour based on reward outcomes expected after choices are made, and to facilitate learning and adaptation by signalling disparities between the predicted rewards and actual rewards. Midbrain dopamine neurons projecting onto NAcc carry both anticipated reward and prediction error signals, which can be described with classical reinforcement learning models29–33. In humans, functional magnetic resonance imaging (fMRI) studies align with this hypothesis, with evidence for reinforcement learning-based prediction error activity in midbrain33 and NAcc34–41.
Building on our own and others’ previous work in humans and macaques, we sought to test whether TUS, using a system specifically designed to deliver stimulation deep in the human brain, could manipulate reward-related activity in NAcc. We then examined whether neural changes were associated with a change at a behavioural level. We were guided by studies that have carried out causal manipulations of NAcc in macaques42–47. In aggregate, these studies emphasise the importance of stochastic rather than deterministic reward schedules for investigating the NAcc and show that NAcc interventions in primates specifically affect reward-guided, as opposed to loss-guided, aspects of behaviour. We therefore used a related task in the current investigation and took special care to assess the degree to which behavioural changes were specific to reward-guided aspects of learning and decision making.
To establish the specificity of any effects found after NAcc stimulation, we included the dorsal anterior cingulate cortex (dACC) as an active control site. The dACC and NAcc are sometimes co-active and aspects of their roles in cognition are related, but crucially, there are also important differences14,48–50 and if any impact of the dACC intervention occurred, it was expected to manifest differently. Including the dACC, therefore, allowed us to test whether any observed behavioural or neural effects were specific to NAcc stimulation, as opposed to a more general response to TUS.
Whilst previous work has now demonstrated behavioural and neurophysiological modulation after TUS, it remains a novel stimulation strategy, particularly in humans. This is in contrast to electrical stimulation using deep brain stimulation (DBS) electrodes, which has become the standard of care for a number of neurological conditions, whilst also being used to treat psychiatric conditions and chronic pain51. The widespread clinical application of DBS has allowed for concurrent examination of micro, meso and macroscale circuit alterations that are associated with symptom response and behavioural modulation52. Given this knowledge of DBS, we therefore include DBS of the NAcc as a control stimulation method to probe possible mechanisms of action of TUS by directly comparing TUS effects with DBS effects.
In the current study, we took care to employ an especially cautious set of TUS parameters when testing human participants. We were therefore open to the possibility that changes in one of these parameters might lead to a facilitative or a disruptive effect, even while retaining the expectation that any effects would be specific to reward-guided learning and decision making. Our hypothesis was that non-invasive TUS could be used to causally modulate behaviour by selectively targeting the NAcc in humans. We predicted that stimulating the NAcc would specifically alter reward-guided learning, consistent with its role in processing reward prediction errors. In contrast, we did not expect similar behavioural effects following stimulation of the dACC, based on previous investigations of dACC stimulation in non-human primates and the activity patterns that have been reported in this region in both humans and non-human primates14,48–50. Our previous findings using magnetic resonance spectroscopy also showed minimal changes in GABA concentrations following TUS to the dACC, possibly due to the small focal area stimulated within a large and functionally heterogeneous Brodmann area. This distinction between regions enabled a stringent test of neural specificity.
By stimulating the NAcc during a probabilistic reversal learning task, known to robustly activate this region, we sought to demonstrate that TUS can drive specific, behaviourally meaningful modulation in a deep subcortical area in humans. This provides an important tool for causal manipulation in human neuroscience, enabling a better understanding of the functional roles of circumscribed brain regions. Such evidence would also represent a critical step toward establishing TUS as a targeted, nonpharmacological intervention for neuropsychiatric conditions.
Results
Transcranial ultrasound stimulation of the nucleus accumbens in humans
Twenty-six healthy adults participated in a within-subject repeated measures design with four visits, including an initial screening and MRI acquisition for planning TUS (session 1). Each of the subsequent three visits included TUS targeted at NAcc (TUS-NAcc), dACC (TUS-dACC), or a no-sonication (Sham) condition. Visits were spaced at least one week apart and administered in a counterbalanced order across participants (sessions 2–4; Fig. 1a). During each of these sessions, participants first received TUS (or Sham) and were then placed in an MRI scanner and engaged in a probabilistic reversal learning task. Recordings of neural activity began ~10 min after the end of TUS application, when any potential auditory effects of the stimulation had dissipated.
Fig. 1. TUS intervention.
a 26 participants attended four sessions. During the first, participants practiced a short version of the task and underwent anatomical MRI scans, after being assigned a randomised TUS condition order for the subsequent three TUS-MRI sessions (sessions 2–4). During these, participants underwent TUS targeted at the region determined by the condition order. This was followed by ~1 h of MRI during which they played the probabilistic learning task. At the start of each block, participants saw three symbols that would appear in that block. On each trial, participants saw two of the three symbols and were required to indicate the symbol that was most likely to lead to a reward. After a delay, they were presented with either a tick (reward), or a cross (no reward). Participants were presented with 320 trials divided into four task blocks. Every 25 trials, a reversal happened, forcing participants to relearn cue–reward contingencies. At the end of each session, participants filled in a questionnaire about any symptoms related to TUS. This was repeated at least 24 h after each session. b Target trajectories for neuronavigation and TUS: 80 s of 5-Hz repetitive TUS (pulse duration (PD) = 20 ms and pulse repetition interval (PRI) = 200 ms). c Visualisation of the positioning of the transducer on the head and of the bone conductive headphones. d Averaged pressure maps across all participants restricted to each focal volume (as defined by −6 dB). The top row represents the average pressure map (FSLeyes NIH fire colour) across all individuals for the dACC while the bottom row represents the average pressure map for the NAcc (n = 26). e Example of individual acoustic simulations for both brain targets. f Transcranial mechanical index at the maximum peak in the head, usually in the skull (left panel), ISPPA at the intended brain target coordinate (middle panel), and occurrence of side effects after TUS–fMRI of the two TUS targets (right panel) (n = 26). 
Box plots show the mean and the standard error (bounds of the box). Data from each individual participant are presented as small black circles. Source data are provided as a Source Data file.
Four task blocks were presented on average ~15, 28, 35, and 48 min after TUS (Fig. 1a). Through reward feedback, participants learned to choose the symbol with the highest probability of reward14,34,35. We employed the same task in previous work in humans34,35 and macaques14,15. After 25 trials, the high reward probability was assigned to a different symbol, prompting participants to adapt (i.e. a ‘reversal’ in reward contingencies occurred). Because the choice–reward associations were stochastic as opposed to deterministic, the task was similar to one previously found to be sensitive to NAcc intervention in macaques47.
Figure 1 summarises the TUS approach, personalised planning, and results of acoustic simulation of TUS (Supplementary Tables 1 and 2). Targeting of the left NAcc and dACC was based on Montreal Neurological Institute (MNI) coordinates, individually adjusted using T1-weighted MRI scans during target and transducer placement planning and acoustic simulation (Fig. 1b–d). Repetitive TUS comprised a 5 Hz-patterned protocol with a 10% duty cycle applied for 80 s. We planned the TUS intervention for each individual participant to maximise target engagement and minimise losses through the skull while ensuring safety prior to conducting the three TUS sessions. This was achieved by positioning the transducer on each individual skull model while assessing TUS trajectory and in situ intensity and pressure. Importantly, we used a bespoke NeuroFUS system with a deep steering range (between 27.3 and 82.6 mm), making it possible to reach human NAcc (‘Methods’). Although this was planned using a T1-weighted-informed personalised pseudo-computed tomography (CT) image21, other methods may achieve higher precision, for example using an ultrashort echo time image53 or CT scans.
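The pulsing parameters quoted above and in the Fig. 1b caption (pulse duration 20 ms, pulse repetition interval 200 ms, 80 s train) can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and it assumes a regular train of rectangular pulses.

```python
# Pulsing parameters as stated in the text and the Fig. 1b caption.
PULSE_DURATION_S = 0.020             # PD = 20 ms
PULSE_REPETITION_INTERVAL_S = 0.200  # PRI = 200 ms
TRAIN_DURATION_S = 80.0              # total protocol length


def protocol_summary(pd_s, pri_s, train_s):
    """Derive repetition frequency, duty cycle, and pulse count.

    Assumes a regular train of rectangular pulses (an illustrative
    simplification, not a description of the device's waveform).
    """
    n_pulses = round(train_s / pri_s)
    return {
        "prf_hz": 1.0 / pri_s,           # pulse repetition frequency
        "duty_cycle": pd_s / pri_s,      # fraction of time the beam is on
        "n_pulses": n_pulses,
        "on_time_s": n_pulses * pd_s,    # cumulative sonication time
    }


summary = protocol_summary(
    PULSE_DURATION_S, PULSE_REPETITION_INTERVAL_S, TRAIN_DURATION_S
)
print(summary)
```

Under these assumptions the numbers agree with the protocol description: a 5 Hz repetition frequency and a 10% duty cycle, giving 400 pulses and roughly 8 s of cumulative sonication over the 80 s train.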
The spatial peak pulse average intensity (ISPPA) in water was maintained at 35 W/cm² across participants22,54. In the sham TUS, no stimulation was delivered. Instead, participants heard a sound mimicking the TUS protocol sound, played via bone conduction headphones (Fig. 1c). Post-study verbal prompts revealed no discernible differences in perception between sessions. Figure 1d displays averaged pressure maps across individuals for each target region, while Fig. 1e presents a single participant. The acoustic simulation parameters and output for all study participants can be found in Supplementary Tables 1 and 2. Both NAcc and dACC were associated with similar maximum transcranial mechanical indices (MItc), ISPPA at focus (in the region of the target), and occurrence of side effects (Fig. 1f; ‘Methods’). We took care to remain within guidelines for human ultrasound exposure as defined by ITRUSST26. The results of our acoustic simulations predominantly indicated a maximum skull temperature rise below 2 °C. In cases where this threshold was exceeded, we calculated the Cumulative Equivalent Minutes at 43 °C (CEM43), a metric reflecting both duration and intensity of heating relative to 43 °C, the critical threshold for thermal cell damage. We ensured CEM43 values remained well below 0.25 (ref. 26). In our study, CEM43 was always below 0.1.
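For readers unfamiliar with the thermal dose metric, CEM43 is conventionally computed by summing R^(43 − T) over time, with R = 0.5 at or above 43 °C and R = 0.25 below it. The sketch below applies that conventional formula to a uniformly sampled temperature trace. The function name, the 37 °C baseline, and the example trace are assumptions for illustration; the text does not specify how the simulation software evaluated this quantity.

```python
def cem43(temps_c, dt_s):
    """Cumulative equivalent minutes at 43 degC for a temperature trace.

    temps_c: temperatures in degC, sampled every dt_s seconds.
    Uses the conventional constants R = 0.5 (T >= 43 degC) and
    R = 0.25 (T < 43 degC); these are assumptions, not values
    taken from the study.
    """
    total_min = 0.0
    for temp in temps_c:
        r = 0.5 if temp >= 43.0 else 0.25
        total_min += (dt_s / 60.0) * r ** (43.0 - temp)
    return total_min


# Hypothetical worst-case trace: a 2 degC rise above an assumed 37 degC
# baseline, held for the full 80 s protocol and sampled once per second.
example_dose = cem43([39.0] * 80, dt_s=1.0)
print(f"CEM43 = {example_dose:.4f} min")
```

Because the dose scales exponentially with temperature, a sustained rise of a few degrees from body temperature contributes very little to CEM43, which is why the metric is useful when a simple temperature threshold is marginally exceeded.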
TUS-NAcc induces specific changes in reward-related behaviours
During the task, participants’ responses were probabilistic and aligned with the principles of a reinforcement learning mechanism. Model comparison revealed that a two learning rates model (2LRs)—for learning from rewarded and non-rewarded outcomes—fitted behaviours better than a simple Rescorla–Wagner (RW) model (BICRW = 385.78, BIC2LRs = 293.48).
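The model comparison above can be illustrated with a minimal sketch of a two-learning-rate value update and a BIC computation. Everything beyond the two learning rates is an assumption made for illustration: the softmax choice rule, the inverse temperature `beta`, the initial value `q0`, and the trial encoding are hypothetical and are not taken from the authors' fitting procedure.

```python
import math


def neg_log_likelihood(trials, alpha_reward, alpha_noreward, beta,
                       n_options=3, q0=0.5):
    """Negative log-likelihood of choices under a two-learning-rate model.

    trials: list of (offered_pair, choice, reward) with reward in {0, 1}.
    The softmax choice rule, beta, and q0 are illustrative assumptions.
    """
    q = [q0] * n_options
    nll = 0.0
    for offered, choice, reward in trials:
        other = offered[0] if offered[1] == choice else offered[1]
        # Two-option softmax: probability of the option actually chosen.
        p_choice = 1.0 / (1.0 + math.exp(-beta * (q[choice] - q[other])))
        nll -= math.log(p_choice)
        # Separate learning rates for rewarded and non-rewarded outcomes.
        alpha = alpha_reward if reward == 1 else alpha_noreward
        q[choice] += alpha * (reward - q[choice])
    return nll


def bic(nll, n_params, n_trials):
    """Bayesian information criterion; lower values indicate a better fit."""
    return n_params * math.log(n_trials) + 2.0 * nll


# Toy data: (symbols offered, symbol chosen, outcome).
toy_trials = [((0, 1), 0, 1), ((0, 2), 0, 0), ((1, 2), 2, 1)]
nll_2lr = neg_log_likelihood(toy_trials, 0.4, 0.1, beta=3.0)
print(bic(nll_2lr, n_params=3, n_trials=len(toy_trials)))
```

Setting `alpha_reward` equal to `alpha_noreward` recovers a simple Rescorla–Wagner update, so the two models are nested and the BIC penalty for the extra parameter determines whether the additional learning rate is justified.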
One of our main goals was to determine whether TUS-NAcc induces specific changes in reward-related behaviours in the hour following TUS. Given the importance ascribed to NAcc in reward-guided behaviour in macaque lesion studies43–47, we focused on win–stay behaviours and ran a regression analysis across blocks, participants, and conditions (‘Methods’, Analysis A; Supplementary Table 4). This revealed a main effect of condition (ANOVA: F2,75 = 3.3, p = 0.042). Post-hoc t-tests revealed a stronger relationship between reward and subsequent win–stay behaviour after TUS-NAcc compared to Sham (t25 = −2.75, p = 0.011 Bonferroni corrected, Cohen’s d = 0.53, full results in Supplementary Table 3) and compared to TUS-dACC (t25 = −3.52, p = 0.001 Bonferroni corrected, Cohen’s d = 0.69). There was no significant difference between TUS-dACC and Sham (t25 = −0.57, p = 0.573, Cohen’s d = 0.11; Fig. 2a).
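As a rough illustration of what a win–stay measure captures, the sketch below computes the probability of repeating a choice after a rewarded versus a non-rewarded trial. It is a simplified descriptive index, not the regression used in Analysis A. The trial encoding and the rule of counting only trials on which the previously chosen symbol is offered again are assumptions.

```python
def stay_rates(trials):
    """Return (P(stay | reward), P(stay | no reward)).

    trials: list of (offered_pair, choice, reward) with reward in {0, 1}.
    Only trials on which the previously chosen symbol is offered again
    count as stay opportunities (an assumption for this sketch, since
    two of three symbols are shown on each trial).
    """
    counts = {1: [0, 0], 0: [0, 0]}  # reward -> [stays, opportunities]
    for previous, current in zip(trials, trials[1:]):
        _, prev_choice, prev_reward = previous
        offered, choice, _ = current
        if prev_choice not in offered:
            continue  # no opportunity to repeat the previous choice
        counts[prev_reward][1] += 1
        counts[prev_reward][0] += int(choice == prev_choice)

    def rate(pair):
        stays, opportunities = pair
        return stays / opportunities if opportunities else float("nan")

    return rate(counts[1]), rate(counts[0])


toy_trials = [
    ((0, 1), 0, 1),  # chose 0, rewarded
    ((0, 2), 0, 1),  # stayed with 0, rewarded
    ((0, 1), 1, 0),  # switched to 1, not rewarded
    ((1, 2), 1, 0),  # stayed with 1, not rewarded
    ((0, 2), 2, 1),  # symbol 1 not offered, so this trial is skipped
]
win_stay, lose_stay = stay_rates(toy_trials)
print(win_stay, lose_stay)
```

A larger gap between the two rates indicates that reward more strongly determines whether a choice is repeated, which is the kind of dependence the regression betas in Fig. 2a quantify.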
Fig. 2. Behavioural and model results.
a Win–stay analyses revealed an increase in the relationship between reward and subsequent win–stay behaviours after TUS-NAcc. Each dot represents an individual participant’s mean beta estimate for the specified condition (n = 26). b Left panel: Time course of TUS-NAcc effects on reward-related behaviour, showing the difference between TUS-NAcc and Sham conditions for each of the four post-TUS testing blocks. The TUS-NAcc-induced reward-related changes were most prominent in the middle of the approximately 1-h post-TUS period. This effect was not observed in the right panel, which shows the comparison between TUS-dACC and Sham; no significant time window emerged in that contrast (n = 26). c Learning curves for the high-probability option across all reversal blocks. A 5-trial running average was applied to smooth trial-by-trial variability, which results in a slight shift in the apparent reversal point—appearing earlier than the actual reversal at trial 24 (trial 25 marks the start of the new reversal period). The shaded area represents the standard error of the mean. d Same as (c) but showing only the first reversal block. The three conditions (TUS-NAcc, TUS-dACC, and Sham) are presented stacked and with transparency to allow for direct visual comparison. e Higher learning rates after reward feedback in the TUS-NAcc condition. Each dot represents an individual participant’s estimated learning rate for the corresponding condition (n = 26). f We did not observe any change in learning rates after non-reward feedback after TUS-NAcc (n = 26). g Increase in rate of repetition of low probability options after reward showing maladaptive behaviour after TUS-NAcc condition compared to both TUS-dACC and Sham conditions (n = 26). n.s. non-significant. p < 0.1; *p < 0.05; **p < 0.01; ***p < 0.001; # significant windows. Exact p values are presented in the main text and in the supplementary material. Source data are provided as a Source Data file. 
a, e, f, g Statistical significance was determined using one-way ANOVA and two-sided t-tests. b Single two-sided t-tests were employed for each window. No correction for multiple comparisons was applied.
To test the temporal dynamics of TUS effects post-sonication, we examined the difference between TUS-NAcc and Sham for each of the four testing blocks collected approximately 15-, 28-, 35-, and 48-min post-TUS (averaged across participants and conditions). Similar TUS-NAcc effects were apparent at all times and the differences compared to Sham were even statistically significant at the block level at 28- and 35-min post-TUS. There was, however, no statistically significant difference between dACC and Sham at any time point (Fig. 2b; post-hoc tests in Supplementary Table 3).
Having found an increase in win–stay behaviours, we then looked at the learning curves across trials during a reversal period in the blocks identified in the previous analysis as demonstrating especially clear significant effects. We computed the rate of choice of the high probability option and found that participants were more likely to select the high probability option at the end of a reversal period after TUS-NAcc, compared to TUS-dACC and Sham (mixed-effect model: t5924 = −2.74, p = 0.006; Fig. 2c; Supplementary Table 4). This was particularly true for the first reversal (t1478 = −3.97, p = 0.0007; Fig. 2d; Supplementary Table 4).
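The Fig. 2c caption notes that the 5-trial running average makes the reversal appear slightly earlier than it occurs. A short sketch shows why, assuming a centred window; the source does not state how the window was aligned, and the function name and the idealised choice sequence are hypothetical.

```python
def running_average(values, window=5):
    """Centred running average; edge windows are truncated.

    The centred alignment is an assumption made for this sketch.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Idealised choices: the high-probability option is chosen on all 25
# trials before the reversal and never on the 5 trials after it.
choices = [1] * 25 + [0] * 5
curve = running_average(choices)
print(curve[22:27])
```

With a centred window, the smoothed curve starts to fall at trial 24 (index 23) because that window already includes the first post-reversal trial, even though the underlying choices only change at the reversal itself.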
To further characterise the behavioural effects of TUS, we conducted an additional analysis of trials involving two low-probability options. This revealed increased switching after TUS-dACC, consistent with disrupted counterfactual evaluation14 (see Supplementary Fig. 1 and Supplementary Results). We also examined post-learning accuracy, perseveration errors, and post-error adjustments. These exploratory analyses did not reveal statistically significant differences between conditions. Specifically, post-learning accuracy showed no significant main effect of TUS (mixed-effects model [PostLearning_Acc ~1 + TUS_session + (1 | sub)]: t₇₅ = 1.27, p = 0.209), though we observed a trend toward an effect in the NAcc condition (post-hoc t-test, NAcc vs. Sham: p = 0.07). Perseveration errors also did not differ significantly between TUS conditions (mixed-effects model [Perf_Err ~1 + TUS_session + (1 | sub)]: t₇₅ = 0.363, p = 0.717), nor did post-error adjustment (mixed-effects model [Post_Err ~1 + TUS_session + (1 | sub)]: t₇₅ = 0.165, p = 0.869). These analyses are reported for completeness and to guide future investigations.
To look at the influence of reward history on learning, we fitted reinforcement learning models to the behavioural data and repeated all previous analyses with our best fitting reinforcement learning model estimates (Supplementary Fig. 2; Supplementary Tables 5–7).
Having established that TUS-NAcc impacted choices associated with the high probability option, we tested whether, overall, participants exhibited higher learning rates after TUS-NAcc, as estimated from the reinforcement learning model. This is indeed what we found. Learning rates linked with positive feedback were higher after TUS-NAcc compared to TUS-dACC and Sham (ANOVA: F2,75 = 5.68, p = 0.005; post hoc t-tests: TUS-dACC vs. TUS-NAcc: t25 = −2.46, p = 0.021; Cohen’s d = 0.48, TUS-NAcc vs. Sham: t25 = −3, p = 0.006; Cohen’s d = 0.59; TUS-dACC vs. Sham: t25 = 0.96, p = 0.34, Fig. 2e, Supplementary Table 7). Learning rates associated with non-reward feedback were not different across conditions (Fig. 2f). Although ventral striatum (including NAcc) activity reflects both aversive and rewarding stimuli, interventions in the ventral striatum of macaques, such as lesions, affect only reward-based learning44. Ventral striatal activity is present in deterministic and stochastic learning situations, but, again, as in the present study, only the latter are affected by ventral striatal lesions43.
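The effect sizes reported alongside the paired t-tests are numerically consistent with the common within-subject convention d = |t| / √n, with n = 26. The text does not state which formula was used, so the sketch below is only a consistency check under that assumption, and the function name is hypothetical.

```python
import math


def cohens_d_from_paired_t(t_value, n):
    """Effect size for a paired design, computed as |t| / sqrt(n).

    This convention is an assumption; the source does not state which
    formula produced the reported effect sizes.
    """
    return abs(t_value) / math.sqrt(n)


# t statistics for the reward learning-rate contrasts reported above.
d_dacc_vs_nacc = cohens_d_from_paired_t(-2.46, 26)
d_nacc_vs_sham = cohens_d_from_paired_t(-3.0, 26)
print(round(d_dacc_vs_nacc, 2), round(d_nacc_vs_sham, 2))
```

The same relation approximately reproduces the win–stay effect sizes (for example, t25 = −3.52 gives about 0.69), which suggests that a single convention was applied throughout.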
In general, win–stay strategies are adaptive because, by definition, they lead to the repetition of previously successful choices. Such a strategy may not be adaptive, however, after reward is experienced for choosing an option with a low average rate of reward. We, therefore, also investigated the rate of win–stay strategy when participants made choices between two options with low average reward rates. We found a main effect of condition (ANOVA: F2,75 = 3.75, p = 0.02) on the rate of maladaptive choices. Participants did not repeat the same low probability choice on the subsequent trial when a high probability symbol was on the screen in the Sham (t25 = −0.3, p = 0.76) or TUS-dACC condition (t25 = 0.85, p = 0.39), but they did after TUS-NAcc (t25 = 3.24, p = 0.003; Fig. 2g, Supplementary Table 8).
Task-related TUS change in brain activity
Our behavioural analyses revealed an overall increase in reward sensitivity, which we explored in our main fMRI analysis by investigating the parametric blood oxygen level dependent (BOLD) responses to both reward expectation and delivery. Our main hypothesis was that there would be a task-related change in BOLD in the region targeted with TUS, which we first investigated with a region of interest (ROI) analysis (‘Methods’ and Supplementary Fig. 3). This revealed a clear increase in the parametric response to reward expectation in the NAcc in the TUS-NAcc condition compared to the Sham and TUS-dACC conditions (ANOVA: F2,75 = 7.15, p = 0.001; post hoc t-tests: dACC-NAcc: t25 = −3.92, p = 0.0006; Cohen’s d = 0.77; NAcc-sham: t25 = −2.38, p = 0.024; Cohen’s d = 0.47; dACC-sham: t25 = −1.34, p = 0.19; Cohen’s d = 0.26; Fig. 3a, Supplementary Table 9). A similar analysis revealed a main effect of condition on the response to reward delivery in dACC (ANOVA: F2,75 = 3.12, p = 0.049). However, the t-tests revealed that the comparison between TUS-dACC and Sham was not significant (t25 = −1.47, p = 0.15; Fig. 3b), although the difference between TUS-dACC and TUS-NAcc was (t25 = −2.90, p = 0.007).
Fig. 3. ROI-based analysis and whole-brain differences.
a Enhancement of the parametric BOLD representation of reward expectation in the NAcc after TUS-NAcc compared to Sham and TUS-dACC (n = 26). b Although a main effect was observed in the dACC ROI for an increase in the BOLD representation of reward delivery after TUS-dACC, this effect was not significantly different from that seen in the Sham condition in the post-hoc test. TUS-dACC compared to TUS-NAcc was, however, significant (n = 26). c The same analysis at the whole-brain level revealed that TUS-NAcc not only increased the NAcc signalling of reward anticipation in the ROI but also in the adjacent striatum. Evidence of an even more distributed network of areas in which reward signalling was more prominent became apparent when TUS-NAcc was compared to TUS-dACC. However, consistent with (b), reward expectation-related activity remained prominent in the NAcc and adjacent striatum. d Similarly, when contrasting the whole-brain maps between TUS-dACC to TUS-NAcc and Sham, we found strong evidence for BOLD changes in the dACC and medial frontal regions adjacent to the sonication site. This was not the case when contrasting TUS-NAcc to sham. FDR-corrected Z values of the differences between conditions are shown in red/orange and the averaged sonication site for dACC and NAcc are shown in green/blue. n.s. non-significant. p < 0.1; *p < 0.05; **p < 0.01; ***p < 0.001; # significant windows. Exact p values are available in the main text as well as in the supplementary material. Source data are provided as a Source Data file. For a, b statistical significance was determined using one-way ANOVA and two-sided t-tests. No correction for multiple comparisons was applied.
This ROI analysis was followed by a whole brain analysis, which allowed us to investigate the full extent of the neural activity difference between TUS conditions. This analysis confirmed increased reward expectation-related activity after TUS-NAcc compared to Sham, not only in NAcc itself, when it was the sonicated region, but also in the adjacent striatum, thalamus, amygdala, precuneus, and PCC (Z > 2.3; FDR corrected; Fig. 3c, left panel). The analysis also revealed increased reward expectation-related activity after TUS-NAcc compared to TUS-dACC, not just in NAcc but also in vmPFC, lOFC, insula, thalamus, putamen, precuneus, and PCC (Z > 2.3; FDR-corrected; Fig. 3c, middle panel). In dACC, however, there was no statistically significant difference for reward expectation between TUS-dACC and Sham conditions (Fig. 3c, right panel).
However, a whole-brain repeated measures ANOVA did identify an increased response to reward delivery after TUS-dACC compared to Sham in a distributed network adjacent to the sonicated region, in the adjacent dACC and medial PFC (Z > 2.3; FDR-corrected; Fig. 3d, right panel). Although the ROI analysis had not revealed any reward-related activity differences precisely at the TUS-dACC target, the immediately adjacent dACC and medial prefrontal cortex did exhibit changes when the whole brain analysis was performed. The whole brain approach also revealed a stronger response to reward delivery after TUS-dACC compared to TUS-NAcc in a distributed network surrounding the dACC TUS focus (Z > 2.3; FDR-corrected; Fig. 3d, middle panel; compare dACC and NAcc bars in 3b). There was no statistically significant difference between reward delivery-related activity after TUS-NAcc and Sham (Fig. 3d, left panel).
On the day of stimulation and the following day, participants were prompted to report any adverse events by filling out a 4-point rating scale questionnaire (‘Methods’)55. We found no difference between TUS and Sham conditions on the day of, and the day after, the sessions (Supplementary Fig. 4). However, upon qualitative inspection, it became evident that participants experienced various MRI-related issues, including sleepiness, neck pain, headache and blurry vision, which were attributed to the prolonged period of lying down and the lighting conditions inside the MR scanner.
DBS-NAcc impacts the same reward-related behavioural indices as TUS-NAcc
So far, we have shown a correspondence between the effects of TUS-NAcc in humans and NAcc lesions in macaques43,44; both affect positive outcome-related behaviour on a probabilistic reward learning task. In a final experiment, we confirmed that direct high-frequency electrical DBS of bilateral human NAcc (Fig. 4a) also affected the same positive outcome-related behaviours in the probabilistic learning task. Three patients with DBS electrodes implanted in the NAcc performed the probabilistic reversal learning task, in a double-blinded, counterbalanced fashion, around 10 min after DBS was turned ON or OFF. The results show that DBS patients were more likely to select the high probability option when DBS was turned OFF than ON (Fig. 4b; mixed-effect model: t452 = −5.65, p = 2.7e-08). DBS-ON was associated with a significant reduction in reward sensitivity (Fig. 4c; mixed-effect model: t4 = −5.06, p = 0.007), which seemed to happen immediately after condition onset, suggesting a blunting of the striatal response to reward, in line with previous DBS-NAcc research56.
Fig. 4. DBS-NAcc investigation.
a Reconstruction of the DBS electrodes implanted in the bilateral NAcc to treat anorexia on a 3D coronal view of an MNI template brain with the NAcc core and shell presented in light blue and dark blue, respectively. b Learning curves associated with the high probability option are presented across the whole session (i.e. four reversal periods). Each participant (n = 3) had a total of four sessions during ON and OFF DBS (so a total of 16 reversals for each condition). A 5-trial running average was applied to smooth trial-by-trial variability, which results in a slight shift in the apparent reversal point, appearing earlier than the actual reversal at trial 24 (trial 25 marks the start of the new reversal period). The shaded area represents the standard error of the mean. The learning curve when the DBS is OFF is presented on the left, when DBS is ON, in the middle, and in healthy participants (from the Sham-TUS group), on the right. c Win–stay analyses revealed a decrease in the relationship between reward and subsequent win–stay behaviours after DBS-NAcc. Each dot represents an individual participant’s mean beta estimate for a block and the specified condition (n = 3, four blocks each). The shaded area represents the standard deviation around the mean. The Sham-TUS is also shown for comparison (n = 26, healthy group). Source data are provided as a Source Data file.
Again, as after human TUS-NAcc and macaque NAcc lesions43,44, DBS-NAcc altered reward outcome-related behaviours. The same reward outcome-related behavioural indices were affected by both TUS-NAcc and DBS-NAcc; however, it is important to note that the direction of change for each index was opposite for TUS-NAcc and DBS-NAcc. It was also noticeable that the baseline reward sensitivity levels of the patients in the DBS-OFF state (Fig. 4b, c), all of whom had been treated with DBS for anorexia nervosa, were higher than those of healthy participants in Sham (Mann–Whitney test between DBS-OFF and healthy TUS-Sham; U-stat = 67; p = 0.045). In fact, DBS-ON seemed to lower patient reward sensitivity to the healthy level (Mann–Whitney test between DBS-ON and healthy TUS-Sham; U-stat = 58; p = 0.2).
Discussion
This study aimed to capitalise on the high spatial resolution of TUS and its capacity to reach deep regions in the brain to target the NAcc in humans in the context of probabilistic reversal learning. A total of 26 healthy participants were enrolled in a within-subject repeated measures design experiment involving repetitive TUS and subsequent fMRI. After the application of 5 Hz patterned TUS for 80 s in a counterbalanced fashion to the NAcc (TUS-NAcc), the dACC (TUS-dACC), or no sonication (Sham), participants performed a probabilistic reversal learning task in the MRI scanner, which started on average 15 min post sonication, when any potential auditory or somatosensory effects of stimulation had dissipated. We used both direct measures of reward sensitivity and model-based estimates of the expected value associated with each potential choice stimulus. The models were also used to examine prediction errors when participants received feedback to indicate if the choice was rewarded or unrewarded in analyses of both behaviour and neural activity. We then compared these results with those of electrical DBS to the NAcc (DBS-NAcc) in a rare cohort of patients with electrodes in the NAcc.
With careful individualised TUS planning, using an estimate of each participant’s skull image to achieve an optimised trajectory, it was possible to show that TUS-NAcc has neural effects that are most prominent in the region stimulated. These effects are associated with changes in behavioural indices similar to those emphasised in previous NAcc lesion studies42–45 and to those observed with DBS-NAcc in this study. Indeed, we found significant alterations in reward-related behaviours, including alterations in the tendency to adopt a win–stay strategy, and a changed learning curve for the rewarding option.
While TUS offers high spatial precision, its effects can extend beyond the immediate target due to both anatomical proximity and the broader functional connectivity of the stimulated region. Although the protocol was carefully optimised for focal delivery, and behavioural effects were limited to one stimulation site, whole-brain analyses revealed more distributed neural activity when directly comparing the two active TUS conditions. Importantly, these effects appeared more spatially confined when each condition was compared to Sham, suggesting that the broader patterns reflect differential engagement of distinct networks rather than nonspecific or global activation.
This distinction is critical for interpreting the specificity of TUS effects. Localised behavioural outcomes, when paired with distributed neural changes, point toward functionally specific modulation within a larger interacting system. Indeed, evidence from the DBS literature suggests that stimulation of a single target can modulate activity of distinct neural networks in opposing directions, with both spatially57 and temporally distinct dynamics58,59. Furthermore, small variations in electrode position can account for widespread differences in network engagement across a variety of neuroanatomical targets60–63. Therefore, while the spatial targeting of TUS remains a key strength, its influence should be interpreted not only in anatomical terms but also through the lens of circuit-level dynamics. Future work combining TUS with high-temporal-resolution methods or connectivity analyses could further elucidate the causal relationships between focal stimulation and distributed neural responses.
Importantly, however, while the impact of TUS to any brain region is likely to be mediated through the connectional network that area has with the rest of the brain, the connectional network of each area is unique64. Two areas, A and B, might share connections with one another, and the effect of stimulation to either area might partly be mediated by a change in activity induced in the other; nevertheless, some aspects of each area’s connectional network will always be distinct. This point was underlined in the current study when the effects of TUS to a distinct brain region, dACC, were studied. Although there are some similarities in the activity patterns found in dACC and NAcc, as well as in other areas that project to both areas, such as the dopaminergic midbrain, there are also important differences49. When TUS was applied to dACC in the current study, it did not change the aspects of reversal task performance that were affected by NAcc TUS. Previous studies examining the effect of lesions, microstimulation, or TUS to dACC and adjacent perigenual anterior cingulate cortex have, however, demonstrated other alterations in behaviour: changes in the ability to track the value of counterfactual choices (choices that were not taken on the current trial but to which the animal might switch in the future)14, changes in learning from stochastic rewards and errors when an extended history of outcomes65 or uncertainty48 must be taken into account, and changes in cost-benefit integration and motivation for task engagement23,50,66–68. Some of these effects are partly mediated via striatal regions adjacent to, and linked to, the NAcc67,68.
Regarding the polarity difference between the TUS and DBS results, perhaps the most likely explanation is that the specific DBS and TUS parameters employed had opposing physiological effects. The specific TUS parameters used here are thought to be excitatory22,69. Conversely, though a simplification, high-frequency DBS is thought of as functionally inhibitory70. The behavioural effects we observed here are in keeping with the historical development of DBS for movement disorders, where high-frequency DBS was observed to have the same clinical effect as lesions in non-human primates71,72. Indeed, whilst high frequency subthalamic DBS is known to improve symptoms of Parkinson’s disease with an associated reduction of pathological beta power73, low frequency DBS to the same region increases beta power74 with a correlated worsening of symptoms75. Similarly, low-frequency TUS has recently been reported to increase beta power in the same network76. Nonetheless, the simplicity and directness of translation from excitatory and inhibitory physiological changes to behavioural facilitation and impairment is unclear. Other factors might also be at play, namely the difference in baseline reward sensitivity seen between the healthy participants and the DBS cohort, all of whom had anorexia nervosa. It is known that reward sensitivity can be heightened in eating disorders77,78 and particularly anorexia nervosa79 compared to healthy controls, and that the ability of an individual to learn from rewarding stimuli is reduced after DBS-NAcc80. Therefore, the DBS-NAcc results reported here are in keeping with those previously described in the literature. It may be possible to identify TUS parameters that determine whether TUS exerts enhancing or disruptive effects. 
However, not only might the effects depend on many features of the TUS (intensity, frequency, and patterning of stimulation), but they might also depend on the anatomical structure of the brain region investigated and the baseline behavioural tendencies of participants. Given the great interest in the possibility of TUS-based therapies2, such factors merit careful consideration before TUS is employed in patients whose conditions may include a range of changes in reward sensitivity, such as anorexia, substance use, bipolar disorder, or depression.
One limitation of the current study is the use of a constant free-field intensity for all participants, which may not effectively induce significant biological effects for some participants. Future research could focus on developing individualised neuromodulation protocols to maintain a constant in situ intensity for each participant. Additionally, customising TUS parameters based on an individual’s baseline reward sensitivity could enhance efficacy. Such personalised approaches are important for both research and therapeutic applications. As the field advances and safety data accumulate, using intensities closer to those in animal models may also yield stronger TUS effects on cognition and brain activity. Finally, another limitation of the current study is the use of unilateral stimulation, which, although guided by current safety considerations for novel deep brain targets, does not allow us to fully assess potential lateralised effects in behaviour or neural response.
This study provides evidence that TUS is a non-invasive method that can, in humans, manipulate activity in a deep subcortical region to induce early phase neuroplasticity8. Many deep subcortical regions and subdivisions play crucial and specific roles in regulating fundamental behaviours and cognitive functions81. These regions are difficult to access using traditional neuromodulation techniques, and their causal roles in human cognition remain largely untested. Therefore, this study opens a potentially new and large space in which to examine hypotheses about human brain activity and its relationship with behaviours, with important lessons for future studies in neuropsychiatric conditions.
Methods
Participants
Twenty-six healthy volunteers (14 female; sex and gender aligned by self-report, with no participants identifying as non-binary) aged between 20 and 65 years (mean = 36.3, s.d. ± 13.3) participated in the study. Sex and gender information was collected based on self-report, and consent was obtained for reporting and sharing individual-level data. Participants were screened for contraindications to TUS and MRI22, had no current diagnosis of neurological or psychiatric disorders, and were free of psychoactive medications at the time of the study. Written informed consent was obtained from all participants after experimental procedures were explained in full. The study was approved by the University of Plymouth Faculty of Health Staff Research Ethics and Integrity Committee (reference ID: 2487; date: 13/12/2021). Participants received £110 in total, including a performance bonus of £10 for completing all study sessions. Travel expenses were also reimbursed up to £10 per session. All healthy volunteer study sessions took place at the Brain Research & Imaging Centre in Plymouth, United Kingdom.
Three participants (all female) aged between 31 and 61 (mean = 42, s.d. ± 13.5) treated with DBS of the NAcc for severe and enduring anorexia nervosa participated in the study (Supplementary Table 10). Participants were screened for contraindications to turning off their DBS device for the duration of the experimental session, and written informed consent was obtained. This part of the study was approved by the University of Oxford (reference ID: 12209; date: 05/03/2020). Participants were reimbursed for research-related travel expenses.
All 26 participants took part in all three TUS conditions (TUS-NAcc, TUS-dACC, and Sham) in a within-subjects design, with the order of conditions counterbalanced across participants.
TUS protocol and procedure
Study design
The study design is summarised in Fig. 2a. Participants completed a behavioural practice and MRI only session followed by three TUS and MRI sessions, which were spaced at least one week apart and at the same time of day for each participant. During the behavioural practice and MRI session, participants familiarised themselves with the reversal learning task and underwent a short series of MRI scans, including a high-resolution T1-weighted MRI. This anatomical scan was used to derive a participant-specific head model for neuronavigation and acoustic simulations to plan TUS target and transducer placement for the subsequent TUS and MRI sessions. Participants were then assigned a randomised order of TUS conditions for their TUS and MRI sessions. During the TUS and MRI sessions, participants underwent 80 s of TUS followed by a series of MRI scans, including interleaved blocks of fMRI and MR spectroscopy scans, during which they performed the task (Fig. 1a). MR spectroscopy data will be the subject of another report. The TUS conditions were either active TUS applied to the left nucleus accumbens (TUS-NAcc), active TUS applied to the left dorsal anterior cingulate cortex (TUS-dACC), or Sham, where the transducer was placed as if to target the medial frontal cortex, but no ultrasound was delivered (Fig. 1b).
Target location for ultrasound
The left NAcc target was centred at MNI coordinates x = −9, y = 11, z = −7, and was identified based on an 80% probability atlas from the Harvard–Oxford Subcortical Structural Atlas supplied with FSL. The left dACC target was centred at MNI coordinates x = −5, y = 24, z = 30. During target and transducer placement planning, the targets were identified using an initial co-registration to MNI space and adjusted based on each individual’s T1-weighted MRI.
Acoustic simulations
The output of our NeuroFUS transducer was previously measured with a hydrophone setup in a water tank (see Yaakub et al.22 for details). We used the k-Wave Toolbox82 (version 1.4) and custom scripts21,22 in MATLAB (R2020b, MathWorks, Inc.) for our simulations, with an individual skull model estimated from each participant’s T1-weighted MRI21,22. The code to quantify the pseudo-CT and to run the simulations is available at refs. 83,84. The simulated transducer was modelled on the physical properties of the bespoke four-element NeuroFUS transducer, with optimised phases obtained using k-Plan (https://dispatch.k-plan.io). We set our simulation grid size to a 256 × 256 × 256 matrix centred on the midpoint between the transducer and focus with a grid spacing of 0.5 mm (i.e., 6 points per wavelength at 500 kHz).
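The quoted grid resolution can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a reference sound speed of 1500 m/s (water), which the text does not state, and all variable names are hypothetical.

```python
# Points per wavelength (PPW) for the simulation grid described above.
# Assumption: reference sound speed of 1500 m/s (water); the text does not
# say which sound speed underlies the quoted PPW value.
SOUND_SPEED_M_S = 1500.0   # assumed reference medium
FREQUENCY_HZ = 500e3       # transducer central frequency (from the text)
GRID_SPACING_M = 0.5e-3    # grid spacing (from the text)
GRID_POINTS = 256          # grid points per axis (from the text)

wavelength_m = SOUND_SPEED_M_S / FREQUENCY_HZ
points_per_wavelength = wavelength_m / GRID_SPACING_M
grid_extent_mm = GRID_POINTS * GRID_SPACING_M * 1e3

print(f"wavelength = {wavelength_m * 1e3:.1f} mm")
print(f"points per wavelength = {points_per_wavelength:.1f}")
print(f"grid extent per axis = {grid_extent_mm:.0f} mm")
```

Under this assumption the wavelength is 3 mm, which gives the 6 points per wavelength stated in the text and a grid that spans 128 mm along each axis.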
TUS protocol
A bespoke CTX-500 NeuroFUS TPO system (Brainbox Ltd., Cardiff, UK) with a four-element annular transducer (diameter = 64 mm, central frequency = 500 kHz, and steering range between 27.3 and 82.6 mm) was used to deliver 5 Hz pulse repetition frequency repetitive TUS (pulse duration = 20 ms, pulse repetition interval = 200 ms, total duration = 80 s, total number of pulses = 400). The term bespoke reflects the custom specification of the transducer at the time of purchase in 2019, prior to the release of the standardised CTX-500 series. At that time, Sonic Concepts invited user-defined design parameters, including steering range. We selected a steering range (27.3–82.6 mm) tailored for targeting deep structures such as the human NAcc. As such, this transducer differs from later commercial CTX-500 units, which feature shallower, fixed ranges. The target free-field spatial-peak pulse-average intensity (ISPPA) was kept constant at 35 W/cm2 for all participants; this is the intensity before the beam passes through the skull and soft tissue. Transcranial acoustic and thermal simulations (see ‘Acoustic simulations’ section for details) were performed both during planning and after each session to confirm that transcranial intensities remained within the limits of the ITRUSST safety guidelines for TUS26.
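The pulse timing parameters above are mutually consistent, as the following short sketch shows. The duty cycle and the free-field temporal-average intensity are derived here for illustration and are not values reported in the text; the temporal average assumes rectangular pulses and says nothing about in situ intensity.

```python
# Consistency check of the pulsing scheme described above.
PULSE_DURATION_S = 0.020      # from the text
PULSE_REPETITION_S = 0.200    # pulse repetition interval, from the text
TOTAL_DURATION_S = 80.0       # from the text
ISPPA_FREE_FIELD = 35.0       # W/cm^2, free-field target, from the text

prf_hz = 1.0 / PULSE_REPETITION_S                      # pulse repetition frequency
duty_cycle = PULSE_DURATION_S / PULSE_REPETITION_S     # fraction of time "on"
n_pulses = round(TOTAL_DURATION_S / PULSE_REPETITION_S)

# Derived quantity (not stated in the text): free-field temporal-average
# intensity for rectangular pulses, ISPTA = ISPPA * duty cycle.
ispta_free_field = ISPPA_FREE_FIELD * duty_cycle

print(f"PRF = {prf_hz:.0f} Hz, duty cycle = {duty_cycle:.0%}, pulses = {n_pulses}")
print(f"free-field ISPTA = {ispta_free_field:.1f} W/cm^2")
```

The 200 ms interval reproduces the 5 Hz repetition frequency and the 400 pulses stated in the text, with a 10% duty cycle.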
To ensure good ultrasound transmission, we spread a layer of ultrasound transmission gel (Aquasonic 100, Parker Laboratories Inc.) into the hair where the transducer would be placed, making sure that air bubbles were smoothed or combed out. The head was not shaved. A 2 cm gel pad (Aquaflex, Parker Laboratories Inc.) was used between the transducer and the head. Neuronavigation was performed with Brainsight v 2.5 (Rogue Research Inc., Montréal, Québec, Canada) using the anatomical T1-weighted MR images from each participant. The focal depth read off Brainsight during each session was entered on the NeuroFUS TPO before stimulation was delivered. Once stimulation had begun, the trajectory was sampled with Brainsight and used in confirmatory acoustic simulations performed after each session. At the end of, and on the day after, each TUS session, participants were asked to report any adverse effects they thought were associated with TUS using a 4-point scale questionnaire (provided in the Supplementary Material). Open-ended responses encouraged them to describe any experiences and to say whether they thought these were related to the study procedures. Participants were blinded, while the experimenters were not.
Sham TUS was delivered in the same way as active TUS, except that the NeuroFUS TPO unit was turned off and no stimulation was delivered. A sound mimicking the sound produced by the envelope of the TUS protocol was played via bone conduction headphones (see ref. 22 for details) placed bilaterally approximately 2 cm posterior and superior to the temples. The same headphones were worn for all the conditions, including the NAcc and dACC active TUS conditions, but the sound was not played during those. We verbally prompted participants at the end of the study to disclose whether they had felt any difference between the sessions. Participants reported that they were unable to discern any differences between the sessions and had not suspected a Sham condition.
DBS protocol and procedure
Study design
Participants completed a behavioural practice session followed by one session with DBS OFF and one session with DBS ON, the order of which was randomised and counterbalanced. One participant had DBS ON then OFF, and two participants had DBS OFF then ON. These sessions took place on the same day. During the behavioural practice, participants familiarised themselves with the reversal learning task. DBS was turned off for a 30 min washout period prior to the experimental sessions, and both participants and assessors were blinded to condition allocation (DBS ON or DBS OFF). Participants then undertook four blocks of the reversal learning task in each condition, with 100 trials in each block. DBS ON refers to their DBS device being turned on at their usual therapeutic settings; for all participants this was at 130 Hz (Supplementary Table 10).
DBS surgery
Participants were recruited into this study after completing a pilot study of DBS to the NAcc to treat severe and enduring anorexia nervosa (trial registration number: NCT01924598). Surgery was performed under general anaesthesia. A 2.7 mm twist drill craniotomy was made, and the electrode lead inserted bilaterally into the NAcc. All patients received intraoperative imaging to confirm electrode positioning was within the target, and the electrode was repositioned in real time if that was not the case. All electrode positions were confirmed using pre-operative MRI fused with post-operative CT with distal contact in NAcc and proximal contacts in the anterior limb of the internal capsule (ALIC). Target selection was based on anatomical/stereotactic references. All participants in this study had Medtronic 3387 electrodes with the Medtronic Activa RC model 37612, which is a constant voltage stimulator.
MRI data acquisition and pre-processing
MRI scans were acquired on a Siemens MAGNETOM Prisma 3T scanner (VE11E, Siemens Healthineers, Erlangen, Germany) with a 32-channel head coil. The scans in this study included a T1-weighted MPRAGE sequence acquired in the sagittal plane (repetition time (TR) = 2100 ms, echo time (TE) = 2.26 ms, inversion time = 900 ms, flip angle (FA) = 8°, GRAPPA acceleration factor = 2, field of view = 256 × 256 mm, number of slices = 176, voxel size = 1 × 1 × 1 mm3), two GE-EPI fMRI scans lasting approximately 10 min each, during which participants performed the probabilistic reversal learning task (acquisition plane tilted 30° clockwise from the line parallel to the AC–PC line, 1400 ms TR, 30 ms TE, 67° FA, 2.5 mm slice thickness, no slice gap, multi-band acceleration factor of 2, and 60 interleaved slices of 96 × 96 matrix size, giving a voxel size of 2.5 × 2.5 × 2.5 mm3), and field maps for fMRI distortion correction. FMRI pre-processing was performed using tools from the FMRIB Software Library v6.0 (FSL; www.fmrib.ox.ac.uk/fsl). Pre-processing included MCFLIRT motion correction, B0 inhomogeneity correction (effective EPI echo spacing = 0.49 ms, EPI TE = 30 ms, unwarp direction = −y, signal loss threshold = 10%), brain extraction, spatial smoothing (5 mm FWHM), and high pass filtering (0.01 Hz).
Probabilistic reversal learning task
Healthy participants performed a probabilistic reversal learning task during four blocks of MR acquisitions (two of which were fMRI acquisitions). The task consisted of two runs of 100 trials each (presented during the fMRI scans) and two runs of 60 trials each (no fMRI), giving a total of 320 trials across four blocks. Three cues were presented per block, resulting in a total of 12 cues across the experiment (adapted from ref. 34). Additional stimuli included a tick to represent a reward, a cross to represent no reward, and a fixation cross.
In each task block, a different subset of three abstract symbols (e.g. A, B, and C) was randomly selected from the full set of 12 symbols. One of the three symbols would be associated with a 70% chance of obtaining a reward (‘high’ reward probability symbol) while the remaining two symbols were each associated with a 30% chance of obtaining a reward (‘low’ reward probability symbols). During each trial, participants were shown two of the three symbols and asked to select the symbol that they thought was associated with the highest probability of obtaining a reward. Participants were not informed of the exact reward probabilities assigned to each symbol but were instead asked to learn to choose the symbol that was more likely to lead to a reward through trial and error (i.e. by making use of the outcome of past decisions). Therefore, there was only one option associated with a high reward rate, while the other two were associated with lower reward rates. This procedure has two advantages. First, because the best option is not presented on every trial, participants had to make different types of choices on different trials depending on which options were available. Second, because on a third of the trials participants had to choose between the two least rewarding symbols, even after they had learnt the cue–reward contingencies, this manipulation allowed us to achieve a more even distribution of reward and non-reward outcomes.
After making their decision, participants were shown either a tick to represent a reward on that trial or a cross to represent no reward. Participants were told that the number of rewarded trials would be counted across all the blocks and sessions, and that they would be awarded up to £10 as a performance bonus at the end of the study, depending on how well they performed during the task. In the present task, outcomes were either reward or non-reward (i.e. reward omission); we use the term ‘non-reward’ throughout to reflect the absence of positive feedback without implying an explicit negative or punishing outcome.
To prevent participants from searching for non-existent patterns and to reduce cognitive load, we presented the three possible pair combinations of the three symbols in a fixed order (i.e. AB, BC and CA). However, the presentation of the symbols on the left or right of the fixation cross each time was randomised. Participants were explicitly informed about this manipulation. After every 25 trials in each block, a ‘reversal’ would be introduced whereby the high reward probability was reassigned to a different one of the three symbols used in that block. Participants would then have to learn to identify the ‘new’ high probability symbol out of the three. Participants were informed that reversals would happen several times during each block, but were not told of the exact frequency of reversals. The stochastic nature of the task and the reversals in choice–reward associations also ensured a need for constant assessment of each option’s value and constant changes in choice selection.
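The block structure described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the task code used in the study: the function name `make_block`, the 0-based trial index, the seed, and the rule that the new high-probability symbol is drawn uniformly from the other two symbols are all hypothetical.

```python
import random

def make_block(n_trials=100, reversal_every=25, p_high=0.7, p_low=0.3, seed=0):
    """Generate an illustrative trial schedule for one three-symbol block."""
    rng = random.Random(seed)
    symbols = ["A", "B", "C"]
    pair_cycle = [("A", "B"), ("B", "C"), ("C", "A")]  # fixed pair order (from the text)
    high = rng.choice(symbols)
    trials = []
    for t in range(n_trials):  # t is 0-based here (assumption)
        if t > 0 and t % reversal_every == 0:
            # Reversal: the high-probability role moves to a different symbol.
            high = rng.choice([s for s in symbols if s != high])
        pair = list(pair_cycle[t % 3])
        rng.shuffle(pair)  # left/right placement is randomised
        reward_prob = {s: (p_high if s == high else p_low) for s in pair}
        trials.append({"trial": t, "left": pair[0], "right": pair[1],
                       "high": high, "reward_prob": reward_prob})
    return trials

block = make_block()
n_without_high = sum(tr["high"] not in (tr["left"], tr["right"]) for tr in block)
print(f"trials without the high-probability symbol: {n_without_high} of {len(block)}")
```

Because the pairs cycle in a fixed order, roughly one trial in three shows only the two low-probability symbols, which is the property the text uses to balance reward and non-reward outcomes.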
Figure 1a shows the sequence of events for an example trial. At the start of each block of trials, the three abstract symbols selected for that block were shown on the screen for 5 s so that participants could familiarise themselves with the symbols they would see for the duration of that block. Each trial began with a fixation cross shown for a random delay of 1–1.2 s. This was followed by the presentation of two of the three symbols on either side of the fixation cross for 1.25 s, during which time participants were instructed to select one of the symbols by pressing a button. The fixation cross flickered for 100 ms after participants made their selection to indicate that their response was registered. The fixation cross was then shown again with a random delay of 1–1.2 s before the outcome of the trial, either a tick for a reward or a cross for no reward, was shown in the centre of the screen for 0.75 s. Trials in which participants failed to respond within 1.25 s were followed by a ‘Lost trial’ message shown in the middle of the screen in place of the tick or cross outcome.
Stimuli display
The probabilistic reversal learning task was presented on an MRI-compatible LCD screen (BOLDscreen 32 AVI, 32-inch screen, resolution = 1920 × 1080, refresh rate = 120 Hz, Cambridge Research Systems Ltd., Rochester, Kent, UK) placed 1 m behind the MRI scanner, which participants viewed using a mirror attached to the head coil. The experiment was presented using the Presentation software (Neurobehavioural Systems Inc., Berkeley, CA, USA) run on a Windows machine. Responses were collected using an MRI-compatible fibre optic response pad (model: HHSC-2 × 4-C, Current Designs Inc., Philadelphia, PA, USA) placed in the participants’ right hand. Participants selected the symbol on the right with their index finger and the symbol on the left with their middle finger.
Modelling of behavioural data
Models
Model 1
To create trial-wise estimates of the expected value (reward prediction) and prediction error, we first used a Rescorla–Wagner model, within the reinforcement learning framework, using each participant’s behavioural choices and feedback. Specifically, the algorithm assigned each choice i (for example selecting the symbol A) an expected value $V_i$, which was updated via a prediction error, $\delta$, as follows (1):
$$V_i(t+1) = V_i(t) + \alpha\,\delta(t) \tag{1}$$
where $\alpha$ is a learning rate that determines the influence of the prediction error on the updating of the symbol’s expected value. The prediction error is calculated as (2):
$$\delta(t) = r(t) - V_i(t) \tag{2}$$
where $r(t)$ represents the outcome obtained on that trial (0 or 1). The expected values of the unselected stimulus (e.g., B) and the stimulus not shown on trial t (e.g., C) were not updated.
Model 2
The main limitation of the classical Rescorla–Wagner model is the implied symmetry with which reward and non-reward feedback update a choice value estimate. This contradicts evidence that learning from reward and non-reward feedback has different effects on behaviour and decision-making85. We thus implemented an alternative asymmetric learning model which discriminates based on outcome valence, with two learning rates, one for reward and one for non-reward feedback38,86.
With all models we used a SoftMax decision function in which, on each trial t, the probability of choosing a stimulus (e.g. A) over the alternative shown (e.g. B) was given by (3):
$$P_A(t) = \sigma\big(\beta\,[V_A(t) - V_B(t)]\big) \tag{3}$$
where $\sigma$ is the logistic function, and $\beta$ represents the degree of choice stochasticity (i.e. the exploration/exploitation parameter). Choice probability of the unchosen stimulus (e.g. B) and of the stimulus not shown on trial t (e.g. C) were not updated.
The reinforcement learning model estimated expected value (V), prediction error (δ), and separate learning rates (α+ for reward, α− for non-reward) for each participant, block, and TUS condition. While expected value reflects the ongoing belief about how rewarding each option is, prediction error reflects the mismatch between expected and actual outcomes, and drives updating of value estimates. These model-derived variables were used to examine latent learning processes and were analysed separately from directly observed behaviours, such as win–stay strategies.
Estimates from these models were used in both behavioural regressions and as parametric modulators in fMRI analyses, enabling us to dissociate stimulus-response learning from value-based decision signals.
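The two models can be summarised in a short sketch that replays a sequence of observed choices. It is a minimal illustration under stated assumptions rather than the authors' code (which is in MATLAB): the function names, the initial value `v0 = 0.5`, and the use of β as a multiplier on the value difference are assumptions. Setting `alpha_pos == alpha_neg` gives the symmetric Model 1.

```python
import math

def choice_probability(v_chosen, v_unchosen, beta):
    """Logistic (two-option softmax) probability of the chosen option."""
    return 1.0 / (1.0 + math.exp(-beta * (v_chosen - v_unchosen)))

def run_model(trials, alpha_pos, alpha_neg, beta, v0=0.5):
    """Replay observed choices through an asymmetric Rescorla-Wagner learner.

    trials: list of (chosen, unchosen, reward) tuples, with reward in {0, 1}.
    Returns the expected value of the chosen option before the update, the
    prediction error on each trial, and the log likelihood of the choices.
    """
    values = {}  # expected value per symbol; v0 is an assumed starting value
    chosen_values, errors, log_lik = [], [], 0.0
    for chosen, unchosen, reward in trials:
        v_c = values.setdefault(chosen, v0)
        v_u = values.setdefault(unchosen, v0)
        log_lik += math.log(choice_probability(v_c, v_u, beta))
        delta = reward - v_c                      # prediction error
        alpha = alpha_pos if reward == 1 else alpha_neg
        values[chosen] = v_c + alpha * delta      # only the chosen option is updated
        chosen_values.append(v_c)
        errors.append(delta)
    return chosen_values, errors, log_lik

# Toy sequence (hypothetical data, for illustration only).
toy_trials = [("A", "B", 1), ("B", "C", 0), ("A", "C", 1)]
chosen_values, errors, log_lik = run_model(toy_trials, alpha_pos=0.4, alpha_neg=0.2, beta=3.0)
print(chosen_values, errors, log_lik)
```

The per-trial `chosen_values` and `errors` correspond to the kind of quantities that the text describes entering the behavioural regressions and the fMRI parametric modulators.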
Model fitting and comparison of models
We estimated parameters individually for each participant, block, and stimulation condition (TUS-NAcc, TUS-dACC, and Sham), including $\beta$ and the single or multiple learning rates $\alpha$. We initially determined reasonably good parameters by a grid search while applying the following parameter constraints: $\beta > 0$ and $0.1 < \alpha < 0.9$. The best parameters from the grid search were then used as starting points for a simplex optimisation procedure, which determined the final parameter estimates. As a goodness-of-fit measure, we used the log likelihood of the observed choices over all trials T given the model and its parameters: $LL = \sum_{t=1}^{T} \log P(y_t \mid \theta)$, where $P(y_t \mid \theta)$ denotes the probability of choice y in trial t given the model’s parameter set θ. Predicted choice probabilities were calculated based on 1000 simulations per parameter set (combinations of the free parameters), whereby in each simulation the model determined the choices used to update reward expectations (as opposed to observed choices). To obtain estimates of expected values and prediction errors, the two estimated participant-specific parameters were then re-entered into the reinforcement learning algorithm, this time based on participants’ observed choices.
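The grid-search stage can be illustrated as follows. This sketch is a simplification and not the procedure used in the study: it scores each parameter pair by the likelihood of the observed choices directly, whereas the text describes choice probabilities estimated from 1000 simulations per parameter set, and it omits the subsequent simplex refinement (which in Python could be done with, for example, `scipy.optimize.minimize` using `method="Nelder-Mead"`). The grid limits for β, the starting value `v0`, the function names and the toy data are assumptions.

```python
import math
from itertools import product

def neg_log_lik(trials, alpha, beta, v0=0.5):
    """Negative log likelihood of observed choices under a single-alpha model."""
    values, nll = {}, 0.0
    for chosen, unchosen, reward in trials:
        v_c = values.setdefault(chosen, v0)
        v_u = values.setdefault(unchosen, v0)
        p = 1.0 / (1.0 + math.exp(-beta * (v_c - v_u)))
        nll -= math.log(max(p, 1e-12))            # guard against log(0)
        values[chosen] = v_c + alpha * (reward - v_c)
    return nll

def grid_search(trials, alphas, betas):
    """Return (best_nll, best_alpha, best_beta) over the parameter grid."""
    return min((neg_log_lik(trials, a, b), a, b) for a, b in product(alphas, betas))

# Grid respecting the constraints in the text: 0.1 < alpha < 0.9 and beta > 0.
alphas = [0.1 + 0.05 * i for i in range(1, 16)]   # 0.15 ... 0.85
betas = [0.5 * i for i in range(1, 21)]           # 0.5 ... 10 (upper limit assumed)

# Hypothetical choice data: (chosen, unchosen, reward).
toy_trials = [("A", "B", 1), ("A", "B", 1), ("B", "A", 0), ("A", "B", 1),
              ("A", "C", 1), ("C", "A", 0), ("A", "C", 1)] * 3

best_nll, best_alpha, best_beta = grid_search(toy_trials, alphas, betas)
print(f"alpha = {best_alpha:.2f}, beta = {best_beta:.1f}, LL = {-best_nll:.2f}")
```

The best grid point would then serve as the starting point for the simplex optimiser, as described in the text.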
To determine the best fitting model, we performed classical model comparison. Specifically, for each model, we first estimated the subject-wise Bayesian Information Criterion (BIC) as follows (4):
$$\mathrm{BIC} = -2\,LL + d\,\log(n) \tag{4}$$
Here, the goodness of fit of a model ($-2\,LL$) is penalised by the complexity term ($d\,\log(n)$) where the number of free parameters in the model d is scaled by the number of data points n (i.e. trials). We then computed the sum of the subject-wise BIC for each model and compared the model-wise BIC estimates (lower estimates indicating better fit). The code to run these models and fitting procedure is available at ref. 87.
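As an illustration of the comparison step, the sketch below computes subject-wise BIC values and sums them per model, using the standard form of the criterion. The log likelihoods are made-up numbers chosen only to show the calculation, and the parameter counts (two for a single learning rate plus β, three for the asymmetric model) are assumptions based on the model descriptions.

```python
import math

def bic(log_lik, n_params, n_trials):
    """Bayesian Information Criterion: -2*LL + d*log(n)."""
    return -2.0 * log_lik + n_params * math.log(n_trials)

def summed_bic(log_liks, n_params, n_trials):
    """Sum of subject-wise BIC values for one model (lower is better)."""
    return sum(bic(ll, n_params, n_trials) for ll in log_liks)

# Hypothetical subject-wise log likelihoods for two participants.
ll_model1 = [-60.0, -58.0]   # symmetric model: alpha, beta (d = 2, assumed)
ll_model2 = [-57.0, -55.0]   # asymmetric model: alpha+, alpha-, beta (d = 3, assumed)
n_trials = 100

bic1 = summed_bic(ll_model1, n_params=2, n_trials=n_trials)
bic2 = summed_bic(ll_model2, n_params=3, n_trials=n_trials)
print(f"summed BIC, model 1: {bic1:.2f}")
print(f"summed BIC, model 2: {bic2:.2f}")
print("preferred model:", 1 if bic1 < bic2 else 2)
```

The extra learning rate in the asymmetric model is only favoured when its gain in log likelihood outweighs the added penalty of log(n) per participant.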
Behavioural analysis
Behavioural data were saved in standard ASCII data files and were analysed using custom-written code in MATLAB (R2023a, MathWorks, Inc., https://uk.mathworks.com/). The relationships between Reward, Prediction Error, win–stay behaviours, and task structure were evaluated using linear regression. Statistical testing and post hoc analyses were performed using the default functions in MATLAB, namely anova, fitglm, fitlme and ttest. We ran two series of analyses that were very similar in nature. The first used only the behaviours observed during the task, without estimates from the reinforcement learning model. The second used estimates from the reinforcement learning model.
Analysis A
The first analysis in the series was concerned with the relationship between reward and subsequent win–stay behaviours. We thus ran a regression model given by Eq. (5):
$$\mathrm{WinStay}_{t+1} = \beta_0 + \beta_1\,\mathrm{Reward}_t + \beta_2\,\mathrm{choiceStickiness}_t + \beta_3\,\mathrm{isHPscreen} + \varepsilon \tag{5}$$
where $\mathrm{WinStay}_{t+1}$ is coded as +1 for Win–Stay, −1 for Win–Switch and 0 otherwise on trial t + 1; Reward is coded as 1 if a reward is received and 0 if not on trial t; choiceStickiness is coded as 1 if choice has been repeated and 0 otherwise on trial t; and isHPscreen is coded as 1 if there was a high probability symbol on the screen and 0 otherwise. In essence, this is equivalent to looking at the proportion of win–stay behaviours while controlling for other features of the task and behaviour, such as any general tendency to choose repetition regardless of reward (choiceStickiness). We then subjected the regression weights to an analysis of variance (ANOVA) such that (6):
$$\beta \sim \mathrm{condition} \tag{6}$$
where $\beta$ denotes a regression weight from Eq. (5) and condition is a categorical variable (TUS-dACC, TUS-NAcc or Sham). Further, three two-tailed two-sample t-tests were used for all possible TUS-dACC, TUS-NAcc and Sham pairs, applying Bonferroni correction. Cohen’s d was calculated for a paired-samples t-test by dividing the mean difference by the standard deviation of the difference.
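The effect size and the multiple-comparison correction described above amount to the following calculations. This is a minimal sketch with made-up numbers: the function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify it.

```python
import statistics

def cohens_d_paired(x, y):
    """Mean of the paired differences divided by their standard deviation."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / statistics.stdev(diffs)  # sample SD assumed

def bonferroni(p_values):
    """Bonferroni-adjusted p values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical regression weights for four participants in two conditions.
weights_condition_a = [0.5, 0.7, 0.6, 0.9]
weights_condition_b = [0.4, 0.5, 0.6, 0.6]
d = cohens_d_paired(weights_condition_a, weights_condition_b)

# Hypothetical uncorrected p values for the three pairwise comparisons.
adjusted = bonferroni([0.01, 0.04, 0.5])
print(f"Cohen's d = {d:.2f}")
print("adjusted p values:", [round(p, 3) for p in adjusted])
```

With three pairwise comparisons, each uncorrected p value is multiplied by three before being compared with the 0.05 threshold.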
Analysis B
The second analysis tested the difference between the regression weights from Analysis A for NAcc vs. Sham and dACC vs. Sham for each of the four blocks (two fMRI and two task-only). As these blocks were acquired at different times, this allowed us to plot the effect identified in Analysis A over time. We used two-tailed paired t-tests to check the significance of this difference for each block and considered p < 0.05 to be significant. We did not correct for multiple comparisons as we used predefined hypotheses and tests.
Analysis C
The third analysis examined the rate at which the high probability symbol was chosen within the window that was significant in Analysis B, first for the runs within a block, averaged across blocks, and then for the first run of a block, averaged across blocks.
We then repeated Analyses A, B and C with estimates from the reinforcement learning model. In Analysis A, we used the prediction error at t instead of the reward at t. In Analysis C, we used the expected value associated with the high probability symbol instead of the choice rate associated with the high probability symbol and presented it for all four blocks.
Task-based fMRI analysis
fMRI data were pre-processed and analysed using FEAT (FMRI Expert Analysis Tool) Version 6.00, part of FSL (FMRIB’s Software Library, www.fmrib.ox.ac.uk/fsl). Pre-processing included motion correction, B0 field inhomogeneity correction, brain extraction, spatial smoothing (5 mm FWHM) and high-pass filtering (0.01 Hz). fMRI data were co-registered to MNI standard space via a linear transform to the individual’s high-resolution T1-weighted MRI and a non-linear transform to the MNI template.
ROI analyses
We hypothesised that we would find reward-related BOLD changes within the targeted areas of neuromodulation. For the NAcc site, we used the bilateral NAcc defined anatomically by the probabilistic Harvard–Oxford subcortical structural atlases. For the dACC site, we created a sphere around the maximum peak TUS intensity (ISPPA in situ) across participants extracted from our acoustic simulation (see Supplementary Fig. 4).
Statistical analyses of BOLD data were then performed using a fixed-effects approach within the framework of a GLM, as implemented in FSL (using the FEAT module). For each block, we ran a GLM in the form (7):
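$$\mathrm{BOLD} = \beta_1\,\mathrm{UnmodFeedback} + \beta_2\,\mathrm{RewDel} + \beta_3\,\mathrm{RewExp} + \beta_4\,\mathrm{ReactionTime} + \beta_5\,\mathrm{Lost} + \varepsilon \tag{7}$$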
where BOLD is a T × 1 (T time samples) column vector containing the time series data for a given voxel; UnmodFeedback is an unmodulated regressor (all event amplitudes set to 1) locked at the time of outcome (that is, when the tick/cross appeared) as a boxcar regressor with a duration of 100 ms; RewDel is a simple categorical regressor for reward delivery (amplitudes set to +1 for rewarded outcomes) as a boxcar regressor with a duration of 100 ms; RewExp is a fully parametric regressor whose event amplitudes were modulated by the expected probabilistic reward associated with the chosen option, locked at time of decision, as a boxcar regressor with a duration of 100 ms; ReactionTime is a boxcar regressor with a duration modulated by reaction time, locked at time of decision; and Lost is an unmodulated regressor for all lost trials, locked at time of decision. In addition, we included six nuisance regressors, one for each of the motion parameters (three rotations and three translations).
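To make the regressor definitions concrete, the sketch below formats events in the three-column text layout that FEAT accepts for custom explanatory variables (onset, duration and weight, with times in seconds). The trial timings, reaction times and expected-reward values are hypothetical, the helper name is invented, and including only rewarded trials in RewDel is an assumption about how that regressor was built. Convolution with the haemodynamic response is left to FEAT and is not shown.

```python
BOXCAR_S = 0.1  # 100 ms boxcar duration, as stated in the text


def ev_rows(onsets, durations, weights):
    """Format events as three-column rows: onset, duration, weight."""
    return [
        f"{on:.3f}\t{dur:.3f}\t{w:.3f}"
        for on, dur, w in zip(onsets, durations, weights)
    ]


# Hypothetical values for three trials of one block (seconds).
decision_onsets = [2.0, 10.5, 19.0]
outcome_onsets = [5.0, 13.5, 22.0]
reaction_times = [0.8, 1.1, 0.65]
expected_reward = [0.5, 0.62, 0.7]  # e.g. taken from a learning model
rewarded = [True, False, True]

n = len(decision_onsets)
# Assumption: RewDel contains rewarded outcomes only, each with weight +1.
rewarded_onsets = [on for on, r in zip(outcome_onsets, rewarded) if r]
m = len(rewarded_onsets)

evs = {
    # Unmodulated, locked to outcome.
    "UnmodFeedback": ev_rows(outcome_onsets, [BOXCAR_S] * n, [1.0] * n),
    # Categorical reward delivery, locked to outcome.
    "RewDel": ev_rows(rewarded_onsets, [BOXCAR_S] * m, [1.0] * m),
    # Parametric reward expectation, locked to decision.
    "RewExp": ev_rows(decision_onsets, [BOXCAR_S] * n, expected_reward),
    # Duration set by reaction time, locked to decision.
    "ReactionTime": ev_rows(decision_onsets, reaction_times, [1.0] * n),
}
```

Each list in `evs` could be written to its own text file, one row per line, and supplied to FEAT as a custom three-column EV.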
From each of the two reward-related regressors (RewExp and RewDel), we extracted beta coefficients from the NAcc and dACC by back-projecting the ROIs described above (Supplementary Fig. 4) from standard space into each individual’s EPI (functional) space by applying the inverse transformations as estimated during registration. For the two reward-related regressors, we computed the average beta coefficients from all voxels in the back-projected ROIs and across participants to test the overall BOLD response profile of the ROIs as a function of both reward delivery and reward expectation.
Whole-brain analyses
We ran a single-group tripled t-test, which corresponds to a repeated measures ANOVA with one fixed factor with three levels and one random factor. Fitting such a mixed effects model with ordinary least squares (OLS) (as implemented in FEAT) requires an assumption of compound symmetry, that is, equal variances across conditions and equal intra-subject covariances, such that Cov(scan1,scan2) = Cov(scan1,scan3) = Cov(scan2,scan3). For these whole-brain fMRI results, all images were thresholded using a one-sided t-test and subsequently FDR-corrected at p < 0.05.
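The compound symmetry assumption can be written out for the three scans of one participant. In the expression below, $\sigma^2$ (a common variance) and $\rho$ (a common intra-subject correlation) are symbols introduced here for illustration only and do not appear in the original text:

```latex
\Sigma = \sigma^{2}
\begin{pmatrix}
1 & \rho & \rho \\
\rho & 1 & \rho \\
\rho & \rho & 1
\end{pmatrix},
\qquad
\mathrm{Cov}(\mathrm{scan}_i, \mathrm{scan}_j) = \rho\,\sigma^{2}
\quad \text{for all } i \neq j .
```

Under this structure every pair of conditions shares the same covariance, which is the equality stated in the text.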
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
The authors thank Dr Maryann Noonan and Dr Marco Wittmann for fruitful discussions on the probabilistic task; research assistants Ema Darrieutort and Joshua Marquez for their work on the safety data; all study participants for taking part in the study; and the Brain Research & Imaging Centre (BRIC) MRI radiographers for their help with scanning. This research was supported by a UKRI Medical Research Council Future Leaders Fellowship, BBSRC, Neuromod+/EPSRC and ARIA grants (MR/T023007/1, BB/Y001494/1, EP/W035057/1 and SCNI-PR01-P15) (to E.F.F.). Scanning for this study was supported by the Brain Research & Imaging Centre (BRIC). Effort for N.S.P. was supported by the National Institute of Mental Health (U01 MH123427) and US Dept of Veterans Affairs (I50 RX002864). The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the funders.
Author contributions
S.N.Y. and E.F.F. conceived this research and designed the study. J.E., A.P.D.Z., and A.L.G. led the DBS studies and J.E. carried out the DBS experiments. N.S.P. advised on ultrasound safety of subcortical regions. J.R. advised on MRI acquisition. N.B. and M.R. advised on data quality and analysis. S.N.Y., E.B., M.L., N.B., and E.F.F. acquired data for the study. S.N.Y., J.E., N.B. and E.F.F. analysed the data. S.N.Y., J.E., N.B., M.R., and E.F.F. wrote the manuscript with input from all authors. All authors reviewed the final manuscript.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
The raw and processed MR data and acoustic simulation data generated in this study have been deposited in the Open Science Framework database under the CC-BY Attribution 4.0 License (https://osf.io/j34qz/, https://osf.io/w3mev/, and https://osf.io/vst9y/). Source data are provided with this paper.
Code availability
FSL can be downloaded from https://fsl.fmrib.ox.ac.uk/fsl/fslwiki. The code to generate pseudo-CTs from T1-weighted images and the in-house acoustic simulation code based on k-Wave (MATLAB) can both be found on GitHub (https://github.com/sitiny/mr-to-pct: 10.5281/zenodo.7110246 and https://github.com/sitiny/BRIC_TUS_Simulation_Tools: 10.5281/zenodo.8027240, respectively). k-Wave can be downloaded from http://www.k-wave.org/. The reinforcement learning code is available on GitHub (https://github.com/efouragnan/RL_models: 10.5281/zenodo.16682779). For any further enquiries regarding the data, please contact Elsa F. Fouragnan.
Competing interests
E.F.F. is a consultant for Attune Neuroscience. N.S.P. is on the scientific advisory boards of Pulvinar Neuro and Grey Matter Neurosciences, and consultant to Motif Neurotech. All the other authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Siti N. Yaakub, John Eraifej, Nadège Bault.
These authors jointly supervised this work: Matthew F. S. Rushworth, Elsa F. Fouragnan.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-025-65080-9.
References
- 1. Silberberg, D., Anand, N. P., Michels, K. & Kalaria, R. N. Brain and other nervous system disorders across the lifespan - global challenges and opportunities. Nature 527, S151–S154 (2015).
- 2. Mahoney, J. J. et al. Low-intensity focused ultrasound targeting the bilateral nucleus accumbens as a potential treatment for substance use disorder: a first-in-human report. Biol. Psychiatry 94, e41–e43 (2023).
- 3. Yan, H. et al. Nucleus accumbens: a systematic review of neural circuitry and clinical studies in healthy and pathological states. J. Neurosurg. 138, 337–346 (2023).
- 4. Yoo, S., Mittelstein, D. R., Hurt, R. C., Lacroix, J. & Shapiro, M. G. Focused ultrasound excites cortical neurons via mechanosensitive calcium accumulation and ion channel amplification. Nat. Commun. 13, 493 (2022).
- 5. O’Reilly, M. A. Exploiting the mechanical effects of ultrasound for noninvasive therapy. Science 385, eadp7206 (2024).
- 6. Darmani, G. et al. Non-invasive transcranial ultrasound stimulation for neuromodulation. Clin. Neurophysiol. 135, 51–73 (2022).
- 7. Koutsoumpari, N. et al. Ultrasound neuromodulation reveals distinct roles of the dorsal anterior cingulate cortex and anterior insula in Pavlovian biases. Preprint at 10.1101/2025.06.12.659273 (2025).
- 8. Bault, N., Yaakub, S. N. & Fouragnan, E. Early-phase neuroplasticity induced by offline transcranial ultrasound stimulation in primates. Curr. Opin. Behav. Sci. 56, 101370 (2024).
- 9. Murphy, K. R. et al. A tool for monitoring cell type–specific focused ultrasound neuromodulation and control of chronic epilepsy. Proc. Natl. Acad. Sci. USA 119, e2206828119 (2022).
- 10. Fan, J. M. et al. Thalamic transcranial ultrasound stimulation in treatment resistant depression. Brain Stimul. Basic Transl. Clin. Res. Neuromodulation 17, 1001–1004 (2024).
- 11. Martin, E., Roberts, M., Grigoras, I. F. et al. Ultrasound system for precise neuromodulation of human deep brain circuits. Nat. Commun. 16, 8024 (2025).
- 12. Blackmore, D. G., Razansky, D. & Götz, J. Ultrasound as a versatile tool for short- and long-term improvement and monitoring of brain function. Neuron 111, 1174–1190 (2023).
- 13. Folloni, D. et al. Manipulation of subcortical and deep cortical activity in the primate brain using transcranial focused ultrasound stimulation. Neuron 101, 1109–1116.e5 (2019).
- 14. Fouragnan, E. F. et al. The macaque anterior cingulate cortex translates counterfactual choice value into actual behavioral change. Nat. Neurosci. 10.1038/s41593-019-0375-6 (2019).
- 15. Folloni, D. et al. Ultrasound modulation of macaque prefrontal cortex selectively alters credit assignment–related activity and behavior. Sci. Adv. 7, eabg7700 (2021).
- 16. Bongioanni, A. et al. Activation and disruption of a neural mechanism for novel choice in monkeys. Nature 591, 270–274 (2021).
- 17. Kubanek, J. et al. Remote, brain region-specific control of choice behavior with ultrasonic waves. Sci. Adv. 6, eaaz4193 (2020).
- 18. Deffieux, T. et al. Low-intensity focused ultrasound modulates monkey visuomotor behavior. Curr. Biol. 23, 2430–2433 (2013).
- 19. Priestley, L. et al. Dorsal raphe nucleus controls motivation-state transitions in monkeys. Sci. Adv. 11, eads1236 (2025).
- 20. Attali, D. et al. Three-layer model with absorption for conservative estimation of the maximum acoustic transmission coefficient through the human skull for transcranial ultrasound stimulation. Brain Stimul. 16, 48–55 (2023).
- 21. Yaakub, S. N. et al. Pseudo-CTs from T1-weighted MRI for planning of low-intensity transcranial focused ultrasound neuromodulation: an open-source tool. Brain Stimul. 16, 75–78 (2023).
- 22. Yaakub, S. N. et al. Transcranial focused ultrasound-mediated neurochemical and functional connectivity changes in deep cortical regions in humans. Nat. Commun. 14, 5318 (2023).
- 23. Khalighinejad, N. et al. A basal forebrain-cingulate circuit in macaques decides it is time to act. Neuron 105, 370–384.e8 (2020).
- 24. Munoz, F. et al. Long term study of motivational and cognitive effects of low-intensity focused ultrasound neuromodulation in the dorsal striatum of nonhuman primates. Brain Stimul. 15, 360–372 (2022).
- 25. Murphy, K. R. et al. A practical guide to transcranial ultrasonic stimulation from the IFCN-endorsed ITRUSST consortium. Clin. Neurophysiol. 171, 192–226 (2025).
- 26. Aubry, J.-F. et al. ITRUSST Consensus on Biophysical Safety for Transcranial Ultrasonic Stimulation. Preprint at 10.48550/arXiv.2311.05359 (2023).
- 27. Arabadzhiyska, D. H. et al. A common neural account for social and nonsocial decisions. J. Neurosci. 42, 9030–9044 (2022).
- 28. Volkow, N. D., Wise, R. A. & Baler, R. The dopamine motive system: implications for drug and food addiction. Nat. Rev. Neurosci. 18, 741–752 (2017).
- 29. Averbeck, B. B. & Costa, V. D. Motivational neural circuits underlying reinforcement learning. Nat. Neurosci. 20, 505–512 (2017).
- 30. Schultz, W. Dopamine reward prediction-error signalling: a two-component response. Nat. Rev. Neurosci. 17, 183–195 (2016).
- 31. Sutton, R. S. & Barto, A. G. Introduction to reinforcement learning (MIT Press, 1998).
- 32. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
- 33. D’Ardenne, K., McClure, S. M., Nystrom, L. E. & Cohen, J. D. BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319, 1264–1267 (2008).
- 34. Fouragnan, E., Retzler, C., Mullinger, K. & Philiastides, M. G. Two spatiotemporally distinct value systems shape reward-based learning in the human brain. Nat. Commun. 6, 8107 (2015).
- 35. Fouragnan, E., Queirazza, F., Retzler, C., Mullinger, K. J. & Philiastides, M. G. Spatiotemporal neural characterization of prediction error valence and surprise during reward learning in humans. Sci. Rep. 7, 1–18 (2017).
- 36. Fouragnan, E., Retzler, C. & Philiastides, M. G. Separate neural representations of prediction error valence and surprise: evidence from an fMRI meta-analysis. Hum. Brain Mapp. 39, 2887–2906 (2018).
- 37. Gläscher, J., Daw, N., Dayan, P. & O’Doherty, J. P. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66, 585–595 (2010).
- 38. Niv, Y., Edlund, J. A., Dayan, P. & O’Doherty, J. P. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain. J. Neurosci. 32, 551–562 (2012).
- 39. Palminteri, S., Khamassi, M., Joffily, M. & Coricelli, G. Contextual modulation of value signals in reward and punishment learning. Nat. Commun. 6, 8096 (2015).
- 40. Tobler, P. N., O’Doherty, J. P., Dolan, R. J. & Schultz, W. Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J. Neurophysiol. 97, 1621–1632 (2007).
- 41. O’Doherty, J. et al. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452–454 (2004).
- 42. Rothenhoefer, K. M. et al. Effects of ventral striatum lesions on stimulus-based versus action-based reinforcement learning. J. Neurosci. 37, 6902–6914 (2017).
- 43. Costa, V. D., Dal Monte, O., Lucas, D. R., Murray, E. A. & Averbeck, B. B. Amygdala and ventral striatum make distinct contributions to reinforcement learning. Neuron 92, 505–517 (2016).
- 44. Taswell, C. A., Costa, V. D., Murray, E. A. & Averbeck, B. B. Ventral striatum’s role in learning from gains and losses. Proc. Natl. Acad. Sci. USA 115, E12398–E12406 (2018).
- 45. Taswell, C. A., Janssen, M., Murray, E. A. & Averbeck, B. B. The motivational role of the ventral striatum and amygdala in learning from gains and losses. Behav. Neurosci. 137, 268–280 (2023).
- 46. Giarrocco, F. et al. Motor system-dependent effects of amygdala and ventral striatum lesions on explore-exploit behaviors. J. Neurosci. 44, e1206232023 (2024).
- 47. Vicario-Feliciano, R., Murray, E. A. & Averbeck, B. B. Ventral striatum lesions do not affect reinforcement learning with deterministic outcomes on slow time scales. Behav. Neurosci. 131, 385–391 (2017).
- 48. Banaie Boroujeni, K. et al. Anterior cingulate cortex causally supports flexible learning under motivationally challenging and cognitively demanding conditions. PLoS Biol. 20, e3001785 (2022).
- 49. Klein-Flügge, M. C., Bongioanni, A. & Rushworth, M. F. S. Medial and orbital frontal cortex in decision-making and flexible behavior. Neuron 110, 2743–2770 (2022).
- 50. Amemori, S. et al. Microstimulation of primate neocortex targeting striosomes induces negative decision-making. Eur. J. Neurosci. 51, 731–741 (2020).
- 51. Krauss, J. K. et al. Technology of deep brain stimulation: current status and future directions. Nat. Rev. Neurol. 17, 75–87 (2021).
- 52. Neumann, W.-J., Steiner, L. A. & Milosevic, L. Neurophysiological mechanisms of deep brain stimulation across spatiotemporal resolutions. Brain 146, 4456–4468 (2023).
- 53. Grodzki, D. M., Jakob, P. M. & Heismann, B. Ultrashort echo time imaging using pointwise encoding time reduction with radial acquisition (PETRA). Magn. Reson. Med. 67, 510–518 (2012).
- 54. Klein-Flügge, M. C., Fouragnan, E. F. & Martin, E. The importance of acoustic output measurement and monitoring for the replicability of transcranial ultrasonic stimulation studies. Brain Stimul. Basic Transl. Clin. Res. Neuromodulation 17, 32–34 (2024).
- 55. Legon, W. A retrospective qualitative report of symptoms and safety from transcranial focused ultrasound for neuromodulation in humans. Sci. Rep. 10, 5573 (2020).
- 56. Shivacharan, R. S. et al. Pilot study of responsive nucleus accumbens deep brain stimulation for loss-of-control eating. Nat. Med. 28, 1791–1796 (2022).
- 57. Shen, L. et al. Subthalamic nucleus deep brain stimulation modulates 2 distinct neurocircuits. Ann. Neurol. 88, 1178–1193 (2020).
- 58. Kahan, J. et al. Deep brain stimulation has state-dependent effects on motor connectivity in Parkinson’s disease. Brain 142, 2417–2431 (2019).
- 59. Eraifej, J. et al. Modulation of limbic resting-state networks by subthalamic nucleus deep brain stimulation. Netw. Neurosci. 7, 478–495 (2023).
- 60. Horn, A. et al. Connectivity predicts deep brain stimulation outcome in Parkinson disease. Ann. Neurol. 82, 67–78 (2017).
- 61. Li, N. et al. A unified connectomic target for deep brain stimulation in obsessive-compulsive disorder. Nat. Commun. 11, 3364 (2020).
- 62. Raghu, A. L. B. et al. Pallido-putaminal connectivity predicts outcomes of deep brain stimulation for cervical dystonia. Brain 144, 3589–3596 (2021).
- 63. Ríos, A. S. et al. Optimal deep brain stimulation sites and networks for stimulation of the fornix in Alzheimer’s disease. Nat. Commun. 13, 7707 (2022).
- 64. Passingham, R. E., Stephan, K. E. & Kötter, R. The anatomical basis of functional localization in the cortex. Nat. Rev. Neurosci. 3, 606–616 (2002).
- 65. Kennerley, S. W., Walton, M. E., Behrens, T. E. J., Buckley, M. J. & Rushworth, M. F. S. Optimal decision making and the anterior cingulate cortex. Nat. Neurosci. 9, 940–947 (2006).
- 66. Amemori, K. & Graybiel, A. M. Localized microstimulation of primate pregenual cingulate cortex induces negative decision-making. Nat. Neurosci. 15, 776–785 (2012).
- 67. Amemori, S., Graybiel, A. M. & Amemori, K. Cingulate microstimulation induces negative decision-making via reduced top-down influence on primate fronto-cingulo-striatal network. Nat. Commun. 15, 4201 (2024).
- 68. Grohn, J. et al. General mechanisms of task engagement in the primate frontal cortex. Nat. Commun. 15, 4802 (2024).
- 69. Murphy, K. R. et al. Optimized ultrasound neuromodulation for non-invasive control of behavior and physiology. Neuron 10.1016/j.neuron.2024.07.002 (2024).
- 70. Blumenfeld, Z. & Brontë-Stewart, H. High frequency deep brain stimulation and neural rhythms in Parkinson’s Disease. Neuropsychol. Rev. 25, 384–397 (2015).
- 71. Benazzouz, A. et al. Effect of high-frequency stimulation of the subthalamic nucleus on the neuronal activities of the substantia nigra pars reticulata and ventrolateral nucleus of the thalamus in the rat. Neuroscience 99, 289–295 (2000).
- 72. Limousin, P. & Martinez-Torres, I. Deep brain stimulation for Parkinson’s disease. Neurotherapeutics 5, 309 (2008).
- 73. Eusebio, A. et al. Deep brain stimulation can suppress pathological synchronisation in parkinsonian patients. J. Neurol. Neurosurg. Psychiatry 82, 569–573 (2011).
- 74. Brown, P. et al. Effects of stimulation of the subthalamic area on oscillatory pallidal activity in Parkinson’s disease. Exp. Neurol. 188, 480–490 (2004).
- 75. Eusebio, A. et al. Effects of low-frequency stimulation of the subthalamic nucleus on movement in Parkinson’s disease. Exp. Neurol. 209, 125–130 (2008).
- 76. Darmani, G. et al. Individualized non-invasive deep brain stimulation of the basal ganglia using transcranial ultrasound stimulation. Nat. Commun. 16, 2693 (2025).
- 77. Neveu, R. et al. Preference for safe over risky options in binge eating. Front. Behav. Neurosci. 10, 10.3389/fnbeh.2016.00065 (2016).
- 78. Neveu, R. et al. Improved planning abilities in binge eating. PLoS ONE 9, e105657 (2014).
- 79. Glashouwer, K. A., Bloot, L., Veenstra, E. M., Franken, I. H. A. & de Jong, P. J. Heightened sensitivity to punishment and reward in anorexia nervosa. Appetite 75, 97–102 (2014).
- 80. Schüller, T. et al. Internal capsule/nucleus accumbens deep brain stimulation increases impulsive decision making in obsessive-compulsive disorder. Biol. Psychiatry Cognit. Neurosci. Neuroimaging 8, 281–289 (2023).
- 81. Klein-Flügge, M. C. et al. Relationship between nuclei-specific amygdala connectivity and mental health dimensions in humans. Nat. Hum. Behav. 6, 1705–1722 (2022).
- 82. Treeby, B. E. & Cox, B. T. k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields. J. Biomed. Opt. 15, 021314 (2010).
- 83. Yaakub, S. N. mr-to-pct for TUS acoustic simulations. Zenodo. Version 1. 10.5281/zenodo.7110246.
- 84. Yaakub, S. N. BRIC TUS Simulation Tools. Version 2.0.0. 10.5281/zenodo.8027240 (2022).
- 85. van den Bos, W., Güroğlu, B., van den Bulk, B. G., Rombouts, S. A. R. B. & Crone, E. A. Better than expected or as bad as you thought? The neurocognitive development of probabilistic feedback processing. Front. Hum. Neurosci. 3, 52 (2009).
- 86. Gershman, S. J. Do learning rates adapt to the distribution of rewards? Psychon. Bull. Rev. 22, 1320–1327 (2015).
- 87. Fouragnan, E. F. RL modelling. Zenodo. Version 1. 10.5281/zenodo.16682779 (2025).