Author manuscript; available in PMC: 2023 Feb 1.
Published in final edited form as: Trends Neurosci. 2021 Dec 15;45(2):96–105. doi: 10.1016/j.tins.2021.11.007

Reinforcement learning detuned in addiction: integrative and translational approaches

Stephanie M Groman 1,2,*, Summer L Thompson 2, Daeyeol Lee 3, Jane R Taylor 2,4,5
PMCID: PMC8770604  NIHMSID: NIHMS1764657  PMID: 34920884

Abstract

Suboptimal decision-making strategies have been proposed to contribute to the pathophysiology of addiction. Decision-making, however, arises from a collection of computational components that can independently influence behavior. Disruptions in these different components can lead to decision-making deficits that appear similar behaviorally, but differ at the computational, and likely the neurobiological, level. Here, we discuss recent studies that have employed computational approaches to investigate the decision-making processes underlying addiction. Studies in animal models have found that value updating following positive, but not negative, outcomes is predictive of drug use, whereas value updating following negative, but not positive, outcomes is disrupted following drug self-administration. We contextualize these findings with studies on the circuit and biological mechanisms of decision-making to develop a framework for revealing the biobehavioral mechanisms of addiction.

Keywords: dopamine, mGlu5, orbitofrontal cortex, decision-making, amygdala, nucleus accumbens

Computational biomarkers for elucidating the neurobiological mechanisms of addiction

Over the last decade, there has been a growing interest in the use of computational analyses for elucidating the neurobiology of decision-making processes and, specifically, for understanding the pathology of mental illnesses, including addiction [1–7]. In this review, we discuss recent evidence from human and animal studies that supports the dissociation of the reinforcement-learning processes that are predictive of problematic drug use (e.g., persistent elevations in drug use, escalation in drug consumption, etc.) from those that are disrupted following drug exposure. We contextualize these findings with recent studies focusing on the neural circuits and biological mechanisms involved in these reinforcement-learning processes, and discuss how genetic, developmental, and environmental factors known to impact these same neurobiological mechanisms could, therefore, contribute to addiction pathology. We argue that the integration of computational approaches with emerging neuroscience techniques, such as in vivo calcium imaging, genetically encoded optical sensors, and single cell RNA sequencing, will offer a powerful repertoire of tools for identifying mechanistic links between biology and behavior to provide new insights into the pathological mechanisms of addiction, as well as other mental disorders.

Addiction as a disorder of altered decision-making reinforcement-learning mechanisms

Maladaptive patterns of drug-seeking and -taking, which are symptomatic of addiction, have been proposed to result, in part, from disruptions in the neural systems that enable behavior to be flexible and goal-directed [8]. Substance-dependent individuals have difficulties abstaining from drug use despite both the desire to do so and the negative consequences associated with continued drug use. Thus, addiction is hypothesized by some to be a disorder involving poor decision-making that emerges as a consequence of persistent and sustained use of drugs of abuse [9]. Recent evidence from human and animal studies, however, has demonstrated that decision-making problems that are present prior to any drug use can be predictive of subsequent drug-taking behaviors [10–13]. Thus, the decision-making impairments observed in some substance-dependent individuals may, in part, represent an addiction susceptibility phenotype [13,14]. Suboptimal decision-making could be a key component in the development and persistence of addiction in a subset of individuals and, therefore, a useful phenotype for identifying the biological mechanisms that contribute to the pathological stages and cycles of addiction.

Decision-making, however, is a multifaceted process that involves several computations. When conceptualized within the framework of reinforcement learning, these various computational components are likely characterized by distinct systems in the brain (Table 1; [15]). For example, action values that guide choice behavior may be updated differently depending on whether a given action was performed or not, whether the outcome of the chosen action was appetitive or aversive, or whether the updating relies on the actual choice outcomes (e.g., model-free learning) or prospective reasoning about the consequences of future actions (e.g., model-based learning) [2]. Disruptions in any one of these computations can produce similar decision-making impairments that likely differ at the neurobiological level [16,17]. In particular, the reinforcement-learning mechanisms that are disrupted following drug exposure may differ from those that are predictive of drug use and clarifying the neurobiological basis of these differences could reveal key insights into the pathology and heterogeneity of addiction [18,19].

Table 1: Reinforcement-learning algorithms used to interrogate the decision-making processes that are affected by addiction.

| Reinforcement-learning algorithm | Free parameters | Chosen versus unchosen action | Value updating function: rewarded | Value updating function: no reward | Softmax |
| Reinforcement-learning model with a single learning rate* | $\alpha$: learning rate; $\beta$: inverse temperature (e.g., choice stochasticity) | Chosen | $V_{x,t+1} = V_{x,t} + \alpha (R_t - V_{x,t})$ | $V_{x,t+1} = V_{x,t} + \alpha (R_t - V_{x,t})$ | |
| | | Unchosen | $V_{x,t+1} = V_{x,t}$ | $V_{x,t+1} = V_{x,t}$ | |
| Reinforcement-learning model with two learning rates | $\alpha$: learning rate ($\alpha_{+}$ following reward, $\alpha_{0}$ following no reward); $\beta$: inverse temperature (e.g., choice stochasticity) | Chosen | $V_{x,t+1} = V_{x,t} + \alpha_{+} (1 - V_{x,t})$ | $V_{x,t+1} = V_{x,t} + \alpha_{0} (0 - V_{x,t})$ | |
| | | Unchosen | $V_{x,t+1} = V_{x,t}$ | $V_{x,t+1} = V_{x,t}$ | |
| Forgetting reinforcement-learning model* | $\gamma$: value maintenance; $\Delta_{+}$: updating following reward; $\Delta_{0}$: updating following no reward | Chosen | $V_{x,t+1} = \gamma V_{x,t} + \Delta_{+}$ | $V_{x,t+1} = \gamma V_{x,t} + \Delta_{0}$ | |
| | | Unchosen | $V_{x,t+1} = \gamma V_{x,t}$ | $V_{x,t+1} = \gamma V_{x,t}$ | |
| Differential forgetting reinforcement-learning model | $\gamma_{C}$: value maintenance for chosen action; $\gamma_{U}$: value maintenance for unchosen action; $\Delta_{+}$: updating following reward; $\Delta_{0}$: updating following no reward | Chosen | $V_{x,t+1} = \gamma_{C} V_{x,t} + \Delta_{+}$ | $V_{x,t+1} = \gamma_{C} V_{x,t} + \Delta_{0}$ | |
| | | Unchosen | $V_{x,t+1} = \gamma_{U} V_{x,t}$ | $V_{x,t+1} = \gamma_{U} V_{x,t}$ | |

* Value updating for the reinforcement-learning model with a single learning rate is mathematically equivalent to that for the forgetting reinforcement-learning model:

$V_{x,t+1} = V_{x,t} + \alpha (R_t - V_{x,t}) = (1 - \alpha) V_{x,t} + \alpha R_t$

Set $\gamma = (1 - \alpha)$ and $\Delta = \alpha R_t$:

$V_{x,t+1} = \gamma V_{x,t} + \Delta$
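As an illustration only, the value-updating rules of Table 1 can be written as a few small Python functions. This is a minimal sketch rather than the authors' analysis code: the function and argument names are hypothetical, rewards are assumed to be coded as 1 (rewarded) or 0 (no reward), and the final lines simply check the footnote's equivalence numerically for the chosen action.

```python
def update_single_lr(v, reward, alpha):
    """Chosen-action update with one learning rate: V <- V + alpha * (R - V)."""
    return v + alpha * (reward - v)


def update_two_lr(v, reward, alpha_pos, alpha_zero):
    """Chosen-action update with separate learning rates for reward and no reward."""
    if reward:
        return v + alpha_pos * (1.0 - v)
    return v + alpha_zero * (0.0 - v)


def update_forgetting(v, reward, gamma, delta_pos, delta_zero):
    """Chosen-action update in the forgetting model: V <- gamma * V + Delta."""
    return gamma * v + (delta_pos if reward else delta_zero)


def update_differential_forgetting(values, chosen, reward,
                                   gamma_c, gamma_u, delta_pos, delta_zero):
    """Update every action value: the chosen action decays by gamma_c and
    receives Delta; each unchosen action only decays by gamma_u."""
    delta = delta_pos if reward else delta_zero
    return [gamma_c * v + delta if i == chosen else gamma_u * v
            for i, v in enumerate(values)]


# Footnote check (chosen action only): with gamma = 1 - alpha and
# Delta = alpha * R, the forgetting update matches the single-learning-rate update.
alpha, v0 = 0.4, 0.3  # arbitrary illustrative values
for r in (0, 1):
    same = abs(update_single_lr(v0, r, alpha)
               - update_forgetting(v0, r, 1 - alpha, alpha * 1, alpha * 0)) < 1e-12
    print(r, same)
```

Note that the equivalence concerns the chosen action: for unchosen actions the single-learning-rate model leaves the value unchanged, whereas the forgetting model multiplies it by γ.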

Recent studies have investigated the role of reinforcement-learning mechanisms in addiction-like pathology in rats by assessing flexible choice behavior in dynamic environments before and after drug self-administration [10,20–22]. For example, our group has found that the ability of rats to make adaptive choices in a probabilistic reversal learning task was both predictive of problematic patterns of psychostimulant use and impaired following drug self-administration. However, these phenomena were found to be mediated by different reinforcement-learning computations. Specifically, value updating following positive, but not negative, outcomes was predictive of escalation in cocaine use [10] and initial strength of methamphetamine reinforcement [20], whereas value updating following negative, but not positive, outcomes was disrupted following either cocaine or methamphetamine self-administration [10,20,21]. Relationships between reinforcement-learning computations and specific aspects of addiction behavior have also been observed in mice following a single high dose of cocaine [23] and in studies involving human participants. For example, blunted neural responses to anticipated rewards during early adolescence are predictive of problematic drug use [12,24], whereas negative prediction error signaling and sensitivity to negative outcomes are impaired in substance-dependent individuals [9,25–31]. These reinforcement-learning phenotypes could, therefore, be useful for dissociating the neurobiological mechanisms of drug use susceptibility from the neurobiological abnormalities that occur as a consequence of drug use.
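To make the separation between positive- and negative-outcome updating concrete, the following sketch simulates an agent on a two-option probabilistic reversal learning task using the forgetting model of Table 1. Everything task-specific here is an assumption for illustration and is not taken from the cited studies: the reward probabilities (0.8 and 0.2), the block length, the parameter values, the standard two-option softmax choice rule, and the names `softmax_choice_prob` and `simulate_reversal_task`.

```python
import math
import random


def softmax_choice_prob(v_a, v_b, beta):
    """Probability of choosing option A under a two-option softmax
    with inverse temperature beta (assumed choice rule)."""
    return 1.0 / (1.0 + math.exp(-beta * (v_a - v_b)))


def simulate_reversal_task(delta_pos, delta_zero, gamma=0.8, beta=3.0,
                           n_trials=400, block_len=50,
                           p_high=0.8, p_low=0.2, seed=0):
    """Return the fraction of trials on which the agent chose the option
    with the higher reward probability (hypothetical task settings)."""
    rng = random.Random(seed)
    values = [0.0, 0.0]
    best = 0          # index of the currently better option
    n_best = 0
    for t in range(n_trials):
        if t > 0 and t % block_len == 0:
            best = 1 - best  # reversal: reward probabilities swap
        p_a = softmax_choice_prob(values[0], values[1], beta)
        choice = 0 if rng.random() < p_a else 1
        rewarded = rng.random() < (p_high if choice == best else p_low)
        delta = delta_pos if rewarded else delta_zero
        # Forgetting model: both values decay by gamma; only the chosen
        # option receives the outcome-dependent increment.
        values = [gamma * v + delta if i == choice else gamma * v
                  for i, v in enumerate(values)]
        n_best += (choice == best)
    return n_best / n_trials


# Varying delta_pos and delta_zero independently changes how the agent
# responds to rewarded versus unrewarded outcomes.
print(simulate_reversal_task(delta_pos=0.5, delta_zero=-0.1))
print(simulate_reversal_task(delta_pos=0.5, delta_zero=0.0))
```

In a sketch like this, `delta_pos` and `delta_zero` are separate parameters, so two simulated agents could perform similarly overall while differing in which outcome type drives their value updates.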

In the following, we describe evidence that these reinforcement-learning computations are controlled by distinct neural circuits, associated with different biological systems, and related to select addiction-related behaviors. We argue that the integration of computational approaches with neuroscience techniques in animal models can generate the mechanistic bridges linking biology to behavior that are necessary for understanding human psychopathologies, such as addiction (Figure 1). Although our discussion focuses on the neurobiological underpinnings of appetitive and aversive reinforcement learning – motivated in part because these domains have been extensively studied in humans and animal models of addiction – the computational mechanisms that underlie addiction pathology in humans are likely to be more complex. The action values that guide choice behavior, for example, are computed by multiple learning systems in the brain [32] that are known to be altered in drug-exposed individuals [20,33,34]. These learning systems are likely also involved in addiction-related decision-making impairments that are governed by other neurochemical systems than those discussed here. Moreover, certain reinforcement-learning impairments may be exacerbated or only observable during periods of drug abstinence [26,35], suggesting that decision-making deficits may, in part, be state-dependent and more dynamic than implied by common pre- versus post-drug comparisons in animal studies.

Figure 1: Biobehavioral mechanisms mediating the emergence and persistence of addiction-relevant behaviors.

Left, drug use susceptibility: Neurodevelopmental deviations – possibly driven by early life experiences, genetics, or sex hormones – may disrupt the formation of select neural circuits (e.g., amygdala-OFC) and impact the signaling mechanisms (e.g., midbrain D3 receptor expression) that control the degree of value updating that occurs following a positive outcome. Impairments in positive-feedback updating that lead to inflexible patterns of decision-making may accelerate escalation in drug use and, therefore, the development of inflexible patterns of drug use. Right, drug use consequences: Persistent exposure to drugs of abuse, however, alters a different set of neural circuits (e.g., OFC-NAc) and signaling mechanisms (e.g., mPFC mGlu5) that control the degree of value updating that occurs following a negative outcome. Drug-induced disruptions in negative-feedback updating may lead to drug-taking behaviors that appear insensitive to negative consequences and, therefore, could lead to compulsive behavior. The transition between susceptibility and addiction is likely to be governed by various factors (pink box), including age of initial drug use, duration of drug use, patterns of drug taking, and cycles of abstinence/use. Dashed boxes and lines represent outstanding questions and/or relationships that could be examined in future investigations.

A main goal of this review is to generate a mechanistic framework to highlight how computational approaches can bridge the divide between results collected in animals vs. humans, by linking biochemical mechanisms and neural circuits with select reinforcement-learning computations. Nevertheless, many questions remain, such as whether the dissociation among these reinforcement-learning computations is as clear in humans as it is in rats; which aspects of drug use are problematic in animals; and whether the nature of such problems differs between drugs of abuse. We argue that the present framework provides a foundation by which future findings can be integrated to address these outstanding questions and to identify other key mechanisms of addiction, including phases of addiction pathology, craving, and relapse.

Orbitofrontal circuits control outcome-mediated value updating

The orbitofrontal cortex (OFC) has been established as a critical brain region in adaptive decision-making, or the ability to adjust choice behavior in response to changes in the environment. Disruptions in OFC function are believed to be a mechanism by which decision-making impairments may emerge in some substance-dependent individuals [36–38]. Electrophysiological and neuroimaging studies have reported that multiple reinforcement-learning computations reside in the OFC [39–43] and that this heterogeneity may reflect diverse connections between the OFC and other brain regions. For example, the OFC sends and receives projections from the amygdala and nucleus accumbens (NAc) [44], which have also been found to encode signals predicted by the same reinforcement-learning computations observed in the OFC [45]. Individual OFC circuits may implement select computational steps important for guiding choice behavior [46,47]. Determining which OFC circuits control individual reinforcement-learning computations could, therefore, elucidate the circuit-level mechanisms impacted in addiction.

The connection between the OFC and amygdala is of particular interest in both decision-making and addiction research. Connectivity between the amygdala and OFC is associated with adaptive decision-making in humans [48], and studies in animals have demonstrated that lesions to the amygdala or OFC disrupt reinforcement-learning computations [17,49–51]. Moreover, structural and functional abnormalities observed in the amygdala and OFC of substance-dependent individuals [52] are related to their decision-making impairments. Whether these amygdala-OFC circuit disruptions were present prior to drug use or emerged as a function of drug use is not fully known, but recent studies have found that low functional connectivity between the amygdala and OFC is associated with greater weekly alcohol intake in binge-drinking young adults [53] and predictive of future alcohol use [54], suggesting that the amygdala-OFC circuit mediates drug use susceptibility. Given our group’s finding that poor value updating following a positive outcome is predictive of greater drug use, the amygdala-OFC circuit may control the degree to which values are updated following a positive outcome and do so in a directionally-specific manner (amygdala projections to the OFC versus OFC projections to the amygdala). Indeed, we recently confirmed that ablation of amygdala projections to the OFC, but not OFC projections to the amygdala, attenuated the degree to which values were updated following a rewarded outcome [16]. Together, these data suggest that amygdala projections to the OFC are likely to control the decision-making processes associated with drug use susceptibility.

In addition to the amygdala, the OFC also sends projections to the NAc and is believed to play a key role in regulating the reinforcement-learning signals observed in the NAc [55]. Although NAc activity has largely been associated with positive reward prediction errors, a subset of NAc neurons fire selectively to cues predictive of aversive outcomes [56,57]. Furthermore, negative outcomes may be encoded in NAc neurons receiving input from the OFC. Specifically, activity of OFC neurons projecting to the striatum, including the NAc, appears to encode unrewarded choices to a greater degree than rewarded choices [55], and ablation of OFC neurons projecting to the NAc disrupts the ability of rats to integrate negative outcomes into value estimates [16]. Therefore, drug-induced abnormalities in OFC projections to the NAc may be a mechanism by which disruptions in the integration of negative outcomes into value computations emerge in substance-dependent individuals and animals chronically exposed to drugs of abuse. Indeed, diffusion-weighted imaging studies have observed lower fractional anisotropy (FA) in the OFC [58,59] and NAc [59] of substance-dependent individuals [60], which is negatively related to the duration of drug use [59]. Persistent use of drugs of abuse that leads to impairments in value updating following negative outcomes may be due to disruptions in OFC projections to the NAc and is likely a core feature of addiction psychopathology.

In summary, these data suggest that separable OFC circuits may have distinct roles throughout pathological stages of addiction and could provide insights into the heterogeneity of addiction pathology. Developmental abnormalities in the formation of the circuit from the amygdala to OFC, mediated by environmental, hormonal, and/or genetic factors, could alter value updating following positive outcomes and predispose individuals to greater rates of escalation in drug-taking behaviors. We recently reported that value updating following positive outcomes increased during adolescence and, notably, that the rate of this increase was predictive of value updating in adulthood [61]. Neural changes that occur during adolescence, such as the formation and stabilization of amygdala projections to the OFC, are, therefore, likely to be key mediators of susceptibility in developing problematic drug use. In contrast, the degree to which drugs of abuse alter OFC projections to the NAc and consequently disrupt value updating following negative outcomes may be mediated not only by genetic mechanisms, but also by patterns or quantity of drug use, or pharmacodynamic properties of the drug. For example, not all individuals that take drugs of abuse have decision-making impairments and/or develop compulsive patterns of drug use, which might suggest that the OFC-NAc circuit may be more or less susceptible across individuals to the effects of drugs of abuse. Variation in the response of the OFC-NAc circuit to drug use may represent an additional level of vulnerability and explain why decision-making problems are not ubiquitously observed in substance-dependent populations [62].

Neurochemical mechanisms associated with outcome-mediated value updating

Investigations into the neurochemical mechanisms of decision-making and addiction have suggested that abnormal dopamine and glutamate signaling may underlie the reinforcement-learning disruptions observed in substance-dependent individuals. For example, midbrain dopamine D3 receptor availability and cortical metabotropic glutamate receptor 5 (mGlu5) availability are altered in drug-dependent individuals [64,65] and also related to decision-making functions in both drug-naïve and drug-exposed individuals [64,66]. Although these neurochemical abnormalities are presumed to be a consequence of persistent drug use, variation in D3 and/or mGlu5 receptor signaling prior to drug exposure may mediate separable reinforcement-learning computations that regulate different drug-taking behaviors. Therefore, these receptor systems may also serve as potential biological targets for reducing drug use susceptibility and/or treating drug-induced decision-making deficits.

Activity of midbrain dopamine neurons, which are thought to encode the prediction errors [67,68] for guiding choice behavior and value attribution [68,69], is sensitive to D3 receptor signaling. Elevated midbrain D3 receptor availability in stimulant-dependent individuals may, therefore, attenuate the reward-related signals generated by dopamine neurons [70] that likely guide the degree of value updating following an appetitive outcome [71]. Midbrain D3 receptor availability may control the degree of value updating following a rewarding outcome, and, therefore, be predictive of drug use. Indeed, midbrain D3 receptor availability is negatively correlated with the degree of value updating for positive outcomes [10,66] and predictive of the rate of escalation in cocaine self-administration [10] in rats. Notably, midbrain D3 receptor availability and value updating following a positive outcome are not affected by cocaine self-administration [21], suggesting that heightened D3 receptor availability observed in substance-dependent individuals may predate initiation of drug use. Collectively, these data indicate that midbrain D3 receptor availability influences reward-related value updating and could potentially serve as a biomarker for assessing addiction susceptibility in humans.

The role of mGlu5 receptor signaling in decision-making and reinforcement-learning computations is not well understood, but evidence suggests that mGlu5 receptors may mediate the ability to adjust behaviors under conditions when contingencies change [72,73]. For example, increasing mGlu5 receptor signaling facilitates extinction learning and remediates stress-induced reversal-learning deficits in animals [72,74], whereas genetic deletion of mGlu5 receptors in mice impairs extinction and reversal learning [75,76]. These data suggest that mGlu5 receptors may be involved in dynamic behaviors that require integration of negative outcomes into action values and possibly be the mechanism by which drug-induced impairments in decision-making emerge in substance-dependent individuals. mGlu5 receptor availability is lower across multiple brain regions in recently abstinent, stimulant-dependent individuals and in rats following limited cocaine exposure [65,77,78]. Recent studies in alcohol-dependent individuals, and in rats with an extended history of cocaine self-administration, however, have observed greater prefrontal mGlu5 receptor availability compared to controls [21,79,80] and have found that the degree of increase in mGlu5 receptor availability is related to the quantity of drug use [21]. Moreover, in rats, cocaine-induced changes in prefrontal mGlu5, but not midbrain D3, receptor availability are related to drug-induced impairments in negative outcome updating and relapse-like behaviors [21], suggesting that dysregulation of the mGlu5 receptors is likely to be involved in the decision-making problems and high rates of recidivism observed in substance-dependent individuals [81].

These data support an overarching scheme wherein the neurochemical abnormalities observed in substance-dependent individuals are associated with select reinforcement-learning mechanisms that underlie different stages of addiction pathology. Greater midbrain D3 receptor availability prior to any drug exposure is associated with disruptions in value updating following a positive outcome and is predictive of escalation in drug use [10,66]. Therefore, midbrain D3 receptor availability could be a biomarker of addiction susceptibility, and we expect that D3 receptor signaling would be altered in animal models known to have greater drug use susceptibility. Indeed, D3 antagonists have been reported to reduce heightened ethanol consumption in genetic rat models of alcohol use disorder [82,83]. We propose that the D3 receptor may be a mechanistic point of convergence mediating both genetic and environmental susceptibility to drug use, likely by influencing value updating following positive outcomes. Drug-induced adaptations of mGlu5 receptor signaling in the medial prefrontal cortex (mPFC), however, are associated with drug-induced impairments in value updating following a negative outcome that we hypothesize lead to the development of compulsive drug-taking behaviors [63]. Although no studies (to our knowledge) have directly examined the role of mGlu5 receptors in compulsive drug-taking behaviors, there is evidence that mGlu5 receptor antagonists attenuate compulsive-like behaviors in a mouse model of obsessive-compulsive disorder [84] that is known to have decision-making impairment [85]. Normalizing mGlu5 receptor signaling in the mPFC could, therefore, attenuate the compulsive patterns of drug-seeking and -taking in substance-dependent individuals by restoring aberrant value updating systems.

Interplay between circuit and biochemical mechanisms

The circuit and biochemical mechanisms mediating addiction-relevant reinforcement-learning computations are likely integrated. For example, high midbrain D3 receptor availability is associated with reduced functional connectivity between the OFC and neural networks known to be involved in cognitive control [86], suggesting that D3-expressing midbrain neurons may send projections to the same OFC circuits involved in decision-making functions. It is possible that D3-expressing midbrain neurons project to the same OFC neurons that receive input from the amygdala and these signals, jointly, encode the degree of updating that occurs following a positive outcome. Studies that utilize state-of-the-art circuit mapping techniques to visualize, manipulate, and/or record OFC neurons targeted by both amygdala and D3-expressing midbrain neurons could help to elucidate the mechanistic relationships between neurobiology and circuits.

While the relationship between mPFC mGlu5 receptor availability and OFC connectivity remains to be studied, greater mGlu5 receptor availability in the superior frontal cortex is associated with reduced functional connectivity between the superior frontal cortex and subcortical regions, such as the parahippocampal gyrus [87]. Drug-induced increases in mPFC mGlu5 receptor availability may, therefore, disrupt connectivity between mPFC and subcortical areas, such as the NAc, that could also be involved in poor value updating following a negative outcome or control a distinct type of aversive learning mechanism. Alternatively, mGlu5-positive mPFC neurons may project to the OFC and directly modulate activity within the OFC-NAc circuit. Dense collateral projections exist between the mPFC and OFC [88–90] and previous work has demonstrated that the OFC can influence mPFC drive of the amygdala [91]. Drug-induced disruptions in negative outcome updating may not be due to impairments in the OFC-NAc circuit, but rather to disruptions in mPFC control of the OFC-NAc circuit. Future studies integrating transsynaptic viral techniques and in vivo calcium imaging in animal models of addiction could help to disentangle these circuit and neurobiological interactions and investigate regional cortical network activity.

Concluding remarks

Insights from computational analyses are crucial to development of a translational framework for elucidating the neurobiological mechanisms of addiction and other neuropsychiatric disorders. Descriptions of how reinforcement-learning computations are linked with neural, cellular, and molecular phenotypes will help to identify unique and shared mechanisms of decision-making processes across mental illnesses [1]. Translational studies that use parallel, multidimensional assessments of behavior combined with computational interrogation of reinforcement-learning mechanisms [92] will provide a powerful basis for cross-validation between human studies and animal models. For example, identification of the anatomical circuits responsible for encoding specific reinforcement-learning computations with invasive approaches in animals (e.g., calcium imaging, genetically encoded optical sensors, optogenetics) could be used to develop mechanistic neuroimaging biomarkers for assessing addiction pathology in humans. Single-cell RNA sequencing and/or protein analyses of reinforcement-learning circuits could identify new genetic markers of addiction susceptibility or consequence that could guide genetic comparisons in existing human datasets [93] and potentially identify new therapeutic targets for treating addiction. As the palette of optical and other neuroscience techniques expands and our understanding of how activity in neural circuits is encoded and regulated by neuromodulators improves, the relationship between circuit dynamics, behavior and computational biomarkers will continue to provide insights into addiction neurobiology.

Outstanding questions.

  • Are there multiple biological paths leading to disruptions in value updating for positive outcomes and, therefore, drug use susceptibility? For example, is addiction risk in some individuals more heavily controlled by genetic versus environmental mechanisms and could this knowledge help to identify individualized biomarkers of addiction risk? Could this also be used to develop individualized treatment regimens to improve prognosis?

  • What are the signaling mechanisms that regulate decision-making computations? How might these signaling pathways be controlled by genetic factors associated with addiction?

  • Are latent variables in reinforcement learning algorithms similarly processed for drug versus non-drug rewards such that value-updating mechanisms are conserved? Are identical neural circuits used?

  • How are distinct reinforcement learning mechanisms within separable neural circuits integrated? Do cortico-cortical networks modulate cortico-subcortical computations? How might this cortical control be altered by persistent drug exposure?

  • How does variation in select reinforcement-learning computations emerge in individuals, and are these computations stable across time? Is this variation linked to specific developmental and/or genetic mechanisms that could be therapeutically targeted to prevent escalation in drug use and the detrimental effects of drug use?

  • Can neurostimulation or behavioral/cognitive methods be employed to modify reinforcement-learning computations such that “meta-learning” strategies can be used to provide resiliency to drug use?

  • Can pharmacotherapies be used to alter latent variables and behavior such that negative and positive value-updating function can be targeted selectively and optimally? Could this be evidence for clinical utility of a theory-driven computational psychiatry approach?

Highlights.

Maladaptive decision making in substance-dependent populations reflects both preexisting risk factors and drug-induced adaptations in specific neurobiological processes.

Delineating the relevant decision-making processes is essential for identifying biological markers for addiction susceptibility and for developing novel targets for the treatment of addiction.

New neuroscience techniques probing learning mechanisms demonstrated that different decision-making computations are mediated by distinct neural, circuit, and cellular systems.

Computational biomarkers of decision-making functions are key for revealing the neurobiological mechanisms of addiction.

Acknowledgements

This work was funded by NIDA DA051598 (SMG), DA051977 (SMG), DA041480 (JRT), and DA043443 (JRT), and NIAAA AA012870 (JRT) and AA029454 (SLT). Additional support was provided by the State of Minnesota through its support of the Medical Discovery Team on Addiction at the University of Minnesota and the State of Connecticut, Department of Mental Health and Addiction Services through its support of the Ribicoff Research Facilities. The work described in this manuscript does not express the views of the Department of Mental Health and Addiction Services or the States of Connecticut or Minnesota. The views and opinions expressed are those of the authors.

Footnotes

Declaration of interests

The authors declare no competing interests in relation to this work.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • 1.Corlett P and Fletcher P (2014) Computational psychiatry: a Rosetta Stone linking the brain to mental illness. The lancet. Psychiatry 1, 399–402 [DOI] [PubMed] [Google Scholar]
  • 2.Lee D (2013) Decision making: from neuroscience to psychiatry. Neuron 78, 233–248 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Redish AD et al. (2008) A unified framework for addiction: vulnerabilities in the decision process. Behav. Brain Sci. 31, 415–37; discussion 437–87 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Groman S et al. (2021) Unlocking the reinforcement-learning circuits of the orbitofrontal cortex. Behav. Neurosci. 135, 120–128 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Feeney EJ et al. (2017) Explaining Delusions: Reducing Uncertainty Through Basic and Computational Neuroscience. Schizophr. Bull. 43, 263–272 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Voon V et al. (2017) Model-Based Control in Dimensional Psychiatry. Biol. Psychiatry 82, 391–400 [DOI] [PubMed] [Google Scholar]
  • 7.Liu S et al. (2020) Translation of Computational Psychiatry in the Context of Addiction. JAMA Psychiatry 77, 1099–1100 [DOI] [PubMed] [Google Scholar]
  • 8.Jentsch JD and Taylor JR (1999) Impulsivity resulting from frontostriatal dysfunction in drug abuse: implications for the control of behavior by reward-related stimuli. Psychopharmacol. 146, 373–390 [DOI] [PubMed] [Google Scholar]
  • 9.Verdejo-Garcia A et al. (2018) Stages of dysfunctional decision-making in addiction. Pharmacol. Biochem. Behav. 164, 99–105 [DOI] [PubMed] [Google Scholar]
  • 10.Groman S et al. (2020) Midbrain D3 Receptor Availability Predicts Escalation in Cocaine Self-administration. Biol. Psychiatry 88, 767–776 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Perry JL et al. (2008) Impulsive choice as a predictor of acquisition of IV cocaine self-administration and reinstatement of cocaine-seeking behavior in male and female rats. Exp. Clin. Psychopharmacol. 16, 165–177 [DOI] [PubMed] [Google Scholar]
  • 12.Blair MA et al. (2018) Blunted Frontostriatal Blood Oxygen Level–Dependent Signals Predict Stimulant and Marijuana Use. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 3, 947–958 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Chen H et al. (2021) Model-Based and Model-Free Control Predicts Alcohol Consumption Developmental Trajectory in Young Adults: A 3-Year Prospective Study. Biol. Psychiatry 89, 980–989 [DOI] [PubMed] [Google Scholar]
  • 14.Ahmed SH (2018) Individual decision-making in the causal pathway to addiction: contributions and limitations of rodent models. Pharmacol. Biochem. Behav. 164, 22–31 [DOI] [PubMed] [Google Scholar]
  • 15.Averbeck B and O’Doherty J (2021) Reinforcement-learning in fronto-striatal circuits. Neuropsychopharmacology 10.1038/S41386-021-01108-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Groman SM et al. (2019) Orbitofrontal Circuits Control Multiple Reinforcement-Learning Processes. Neuron 103, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Costa VD et al. (2016) Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning. Neuron 92, 505–517 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Gueguen MC et al. (2021) Computational theory-driven studies of reinforcement learning and decision-making in addiction: what have we learned? Curr. Opin. Behav. Sci. 38, 40–48 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Sweis BM et al. (2018) Beyond simple tests of value: measuring addiction as a heterogeneous disease of computation-specific valuation processes. Learn. Mem. 25, 501–512 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Groman SM et al. (2019) Model-Free and Model-Based Influences in Addiction-Related Behaviors. Biol. Psychiatry 85, 936–945 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Groman SM et al. (2020) Dysregulation of Decision Making Related to Metabotropic Glutamate 5, but Not Midbrain D3, Receptor Availability Following Cocaine Self-administration in Rats. Biol. Psychiatry 88, 777–787 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Zhukovsky P et al. (2019) Withdrawal from escalated cocaine self-administration impairs reversal learning by disrupting the effects of negative feedback on reward exploitation: a behavioral and computational analysis. Neuropsychopharmacology 44, 2163–2173 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Diao Z et al. (2021) Single Exposure to Cocaine Impairs Reinforcement Learning by Potentiating the Activity of Neurons in the Direct Striatal Pathway in Mice. Neurosci. Bull. 37, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Büchel C et al. (2017) Blunted ventral striatal responses to anticipated rewards foreshadow problematic drug use in novelty-seeking adolescents. Nat. Commun. 8, 1–11 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Ersche KD et al. (2016) Carrots and sticks fail to change behavior in cocaine addiction. Science 352, 1468–71 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Parvaz MA et al. (2015) Impaired Neural Response to Negative Prediction Errors in Cocaine Addiction. J. Neurosci. 35, 1872–1879 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Tanabe J et al. (2013) Reduced neural tracking of prediction error in substance-dependent individuals. Am. J. Psychiatry 170, 1356–1363 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Robinson A et al. (2021) Are methamphetamine users compulsive? Faulty reinforcement learning, not inflexibility, underlies decision making in people with methamphetamine use disorder. Addict. Biol. 26, [DOI] [PubMed] [Google Scholar]
  • 29.Lim T et al. (2021) Impaired learning from negative feedback in stimulant use disorder: Dopaminergic modulation. Int. J. Neuropsychopharmacol. 10.1093/IJNP/PYAB041 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Zhong N et al. (2020) Smaller Feedback-Related Negativity (FRN) Reflects the Risky Decision-Making Deficits of Methamphetamine Dependent Individuals. Front. Psychiatry 11, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Smith R et al. (2020) Imprecise action selection in substance use disorder: Evidence for active learning impairments when solving the explore-exploit dilemma. Drug Alcohol Depend. 215, 108208. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Niv Y (2009) Reinforcement learning in the brain. J. Math. Psychol. 53, 139–154 [Google Scholar]
  • 33.Sebold M et al. (2014) Model-Based and Model-Free Decisions in Alcohol Dependence. Neuropsychobiology 70, 122–131 [DOI] [PubMed] [Google Scholar]
  • 34.Voon V et al. (2015) Disorders of compulsivity: a common bias towards learning habits. Mol. Psychiatry 20, 345–352 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Wang JM et al. (2019) In Cocaine Dependence, Neural Prediction Errors During Loss Avoidance Are Increased With Cocaine Deprivation and Predict Drug Use. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 4, 291–299 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Schoenbaum G et al. (2006) Orbitofrontal cortex, decision-making and drug addiction. Trends Neurosci. 29, 116–24 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Bolla KI et al. (2003) Orbitofrontal cortex dysfunction in abstinent cocaine abusers performing a decision-making task. Neuroimage 19, 1085–94 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Volkow ND et al. (1993) Decreased dopamine D2 receptor availability is associated with reduced frontal metabolism in cocaine abusers. Synapse 14, 169–177 [DOI] [PubMed] [Google Scholar]
  • 39.Kennerley SW and Wallis JD (2009) Encoding of reward and space during a working memory task in the orbitofrontal cortex and anterior cingulate sulcus. J. Neurophysiol. 102, 3352–64 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Sul JH et al. (2010) Distinct Roles of Rodent Orbitofrontal and Medial Prefrontal Cortex in Decision Making. Neuron 66, 449–460 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Abe H and Lee D (2011) Distributed Coding of Actual and Hypothetical Outcomes in the Orbital and Dorsolateral Prefrontal Cortex. Neuron 70, 731–741 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Stalnaker TA et al. (2018) Orbitofrontal neurons signal reward predictions, not reward prediction errors. Neurobiol. Learn. Mem. 153, 137–143 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Massi B et al. (2018) Volatility Facilitates Value Updating in the Prefrontal Cortex. Neuron 99, 598–608.e4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Haber SN et al. (1995) The orbital and medial prefrontal circuit through the primate basal ganglia. J. Neurosci. 15, 4851–67 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Morrison SE et al. (2011) Different Time Courses for Learning-Related Changes in Amygdala and Orbitofrontal Cortex. Neuron 71, 1127–1140 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Frank MJ and Claus ED (2006) Anatomy of a decision: Striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychol. Rev. 113, 300–326 [DOI] [PubMed] [Google Scholar]
  • 47.Stalnaker T et al. (2021) Orbitofrontal State Representations Are Related to Choice Adaptations and Reward Predictions. J. Neurosci. 41, 1941–1951 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Cohen MX et al. (2008) Amygdala tractography predicts functional connectivity and learning during feedback-guided decision-making. Neuroimage 39, 1396–1407 [DOI] [PubMed] [Google Scholar]
  • 49.Gourley SL et al. (2016) The Medial Orbitofrontal Cortex Regulates Sensitivity to Outcome Value. J. Neurosci. 36, 4600–13 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Izquierdo A et al. (2013) Basolateral amygdala lesions facilitate reward choices after negative feedback in rats. J. Neurosci. 33, 4105–4109 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Izquierdo A and Murray EA (2004) Combined unilateral lesions of the amygdala and orbital prefrontal cortex impair affective processing in rhesus monkeys. J. Neurophysiol. 91, 2023–2039 [DOI] [PubMed] [Google Scholar]
  • 52.Ma N et al. (2010) Addiction related alteration in resting-state brain connectivity. Neuroimage 49, 738–744 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Crane NA et al. (2018) Amygdala-orbitofrontal functional connectivity mediates the relationship between sensation seeking and alcohol use among binge-drinking adults. Drug Alcohol Depend. 192, 208–214 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Peters S et al. (2017) Amygdala-orbitofrontal connectivity predicts alcohol use two years later: a longitudinal neuroimaging study on alcohol use in adolescence. Dev. Sci. 20, e12448. [DOI] [PubMed] [Google Scholar]
  • 55.Hirokawa J et al. (2019) Frontal cortex neuron types categorically encode single decision variables. Nature 576, 446–451 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Schoenbaum G and Setlow B (2003) Lesions of nucleus accumbens disrupt learning about aversive outcomes. J. Neurosci. 23, 9833–41 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Setlow B et al. (2003) Neural Encoding in Ventral Striatum during Olfactory Discrimination Learning. Neuron 38, 625–636 [DOI] [PubMed] [Google Scholar]
  • 58.Qiu Y et al. (2013) Progressive White Matter Microstructure Damage in Male Chronic Heroin Dependent Individuals: A DTI and TBSS Study. PLoS One 8, e63212. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Li Y et al. (2017) Microstructures in striato-thalamo-orbitofrontal circuit in methamphetamine users. Acta Radiol. 58, 1378–1385 [DOI] [PubMed] [Google Scholar]
  • 60.Bracht T et al. (2021) The role of the orbitofrontal cortex and the nucleus accumbens for craving in alcohol use disorder. Transl. Psychiatry 11, 1–10 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Moin Afshar N et al. (2020) Reinforcement learning during adolescence in rats. J. Neurosci. 40, 5857–5870. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Frazer K et al. (2018) Assessing cognitive functioning in individuals with cocaine use disorder. J. Clin. Exp. Neuropsychol. 40, 619–632 [DOI] [PubMed] [Google Scholar]
  • 63.Keip AJ et al. (2019) Unidirectional ablation of orbitofrontal-nucleus accumbens projections decreases sensitivity to negative outcomes in methamphetamine self-administering rats. Soc. Neurosci. Annu. Meet. Chicago, IL. [Google Scholar]
  • 64.Payer DE et al. (2014) Heightened D3 dopamine receptor levels in cocaine dependence and contributions to the addiction behavioral phenotype: a positron emission tomography study with [11C]-(+)-PHNO. Neuropsychopharmacology 39, 311–8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Martinez D et al. (2014) Imaging glutamate homeostasis in cocaine addiction with the metabotropic glutamate receptor 5 positron emission tomography radiotracer [(11)C]ABP688 and magnetic resonance spectroscopy. Biol. Psychiatry 75, 165–71 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Groman SM et al. (2016) Dopamine D3 Receptor Availability Is Associated with Inflexible Decision Making. J. Neurosci. 36, 6732–41 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Fiorillo CD et al. (2003) Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 [DOI] [PubMed] [Google Scholar]
  • 68.Saunders BT et al. (2018) Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072–1083 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Parker NF et al. (2016) Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat. Neurosci. 19, 845–854 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Verharen JPH et al. (2018) A neuronal mechanism underlying decision-making deficits during hyperdopaminergic states. Nat. Commun. 9, 731. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Barrus MM and Winstanley CA (2016) Dopamine D3 Receptors Modulate the Ability of Win-Paired Cues to Increase Risky Choice in a Rat Gambling Task. J. Neurosci. 36, 785–94 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.André MAE et al. (2015) The metabotropic glutamate receptor, mGlu5, is required for extinction learning that occurs in the absence of a context change. Hippocampus 25, 149–158 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Fontanez-Nuin DE et al. (2011) Memory for Fear Extinction Requires mGluR5-Mediated Activation of Infralimbic Neurons. Cereb. Cortex 21, 727–735 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Joffe ME et al. (2019) Mechanisms underlying prelimbic prefrontal cortex mGlu3/mGlu5-dependent plasticity and reversal learning deficits following acute stress. Neuropharmacology 144, 19–28 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Xu J et al. (2009) mGluR5 has a critical role in inhibitory learning. J. Neurosci. 29, 3676–84 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Zeleznikow-Johnston AM et al. (2018) Touchscreen testing reveals clinically relevant cognitive abnormalities in a mouse model of schizophrenia lacking metabotropic glutamate receptor 5. Sci. Rep. 8, 1–10 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Milella MS et al. (2014) Limbic system mGluR5 availability in cocaine dependent subjects: A high-resolution PET [11C]ABP688 study. Neuroimage 98, 195–202 [DOI] [PubMed] [Google Scholar]
  • 78.de Laat B et al. (2018) Glutamatergic Biomarkers for Cocaine Addiction: A Longitudinal Study Using MR Spectroscopy and mGluR5 PET in Self-Administering Rats. J. Nucl. Med. 59, 952–959 [DOI] [PubMed] [Google Scholar]
  • 79.Gobin C et al. (2019) Neurobiological substrates of persistent working memory deficits and cocaine-seeking in the prelimbic cortex of rats with a history of extended access to cocaine self-administration. Neurobiol. Learn. Mem. 161, 92–105 [DOI] [PubMed] [Google Scholar]
  • 80.Hillmer A et al. Longitudinal imaging of metabotropic glutamate 5 receptors during alcohol abstinence. Neuropsychopharmacology [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Petzold J et al. (2021) Targeting mGlu5 for Methamphetamine Use Disorder. Pharmacol. Ther. 224, [DOI] [PubMed] [Google Scholar]
  • 82.Thanos PK et al. (2005) The selective dopamine D3 receptor antagonist SB-277011-A attenuates ethanol consumption in ethanol preferring (P) and non-preferring (NP) rats. Pharmacol. Biochem. Behav. 81, 190–7 [DOI] [PubMed] [Google Scholar]
  • 83.Amancio-Belmont O et al. (2020) Maternal separation plus social isolation during adolescence reprogram brain dopamine and endocannabinoid systems and facilitate alcohol intake in rats. Brain Res. Bull. 164, 21–28 [DOI] [PubMed] [Google Scholar]
  • 84.Ade KK et al. (2016) Increased Metabotropic Glutamate Receptor 5 Signaling Underlies Obsessive-Compulsive Disorder-like Behavioral and Striatal Circuit Abnormalities in Mice. Biol. Psychiatry 80, 522–533 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Manning EE et al. (2018) Impaired instrumental reversal learning is associated with increased medial prefrontal cortex activity in Sapap3 knockout mouse model of compulsive behavior. Neuropsychopharmacology 10.1038/s41386-018-0307-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Cole DM et al. (2012) Orbitofrontal connectivity with resting-state networks is associated with midbrain dopamine D3 receptor availability. Cereb. Cortex 22, 2784–93 [DOI] [PubMed] [Google Scholar]
  • 87.Kim JH et al. (2019) In vivo metabotropic glutamate receptor 5 availability-associated functional connectivity alterations in drug-naïve young adults with major depression. Eur. Neuropsychopharmacol. 29, 278–290 [DOI] [PubMed] [Google Scholar]
  • 88.Murphy MJM and Deutch AY (2018) Organization of afferents to the orbitofrontal cortex in the rat. J. Comp. Neurol. 526, 1498–1526 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Barreiros IV et al. (2021) Organization of Afferents along the Anterior–posterior and Medial–lateral Axes of the Rat Orbitofrontal Cortex. Neuroscience 460, 53–68 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Hoover WB and Vertes RP (2011) Projections of the medial orbital and ventral orbital cortex in the rat. J. Comp. Neurol. 519, 3766–3801 [DOI] [PubMed] [Google Scholar]
  • 91.Chang C and Grace AA (2018) Inhibitory Modulation of Orbitofrontal Cortex on Medial Prefrontal Cortex–Amygdala Information Flow. Cereb. Cortex 28, 1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Reed EJ et al. (2020) Paranoia as a deficit in non-social belief updating. Elife 9, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Karlsson Linnér R et al. (2021) Multivariate analysis of 1.5 million people identifies genetic associations with traits related to self-regulation and addiction. Nat. Neurosci. 24, 1367–1376 [DOI] [PMC free article] [PubMed] [Google Scholar]
