Journal of Neurophysiology. 2014 Sep 3;113(10):3459–3461. doi: 10.1152/jn.00600.2014

Intelligence moderates reinforcement learning: a mini-review of the neural evidence

Chong Chen
PMCID: PMC4455485  PMID: 25185818

Abstract

Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human strivings, has progressed significantly recently. However, the overlap of these two lines of research, namely, how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence.

Keywords: intelligence, reinforcement learning, prediction error, model based


The computation theory of reinforcement learning (RL) has been one of the most frequently used instruments for explaining decision making and learning (Daw et al. 2011; Smittenaar et al. 2013). RL is the process of optimizing reward under the reinforcement of prediction error (PE), the difference between received and expected value, and it constitutes a major form of experience learning essential for all human strivings. Dopamine neurons were found to signal PE: their activity is enhanced by positive PE but suppressed by negative PE. Subjects use these PE signals to update value functions and guide their future behavior. This type of RL is termed model-free or habitual/automatic RL, since subjects rely entirely on PE, i.e., on historical trial-and-error experience, to guide learning. In contrast, in model-based or goal-directed/controlled RL, upon gating signals of PE, subjects use higher order cognitive maps or learned models to predict future outcomes and thus can update value functions more flexibly (Daw et al. 2011; Smittenaar et al. 2013).
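As a minimal sketch of the model-free update described above, the following Python snippet computes a PE as received minus expected value and moves the value estimate toward the outcome. The delta-rule form, the function name, and the learning rate `alpha` are assumptions chosen for illustration; the text does not specify a particular update rule.

```python
def model_free_update(expected_value, received_value, alpha=0.1):
    """Return (prediction_error, updated_value) for one trial.

    alpha is a hypothetical learning rate, not a value taken from
    any of the reviewed studies.
    """
    prediction_error = received_value - expected_value  # PE = received - expected
    updated_value = expected_value + alpha * prediction_error
    return prediction_error, updated_value


# A better-than-expected outcome gives a positive PE and raises the estimate;
# a worse-than-expected outcome gives a negative PE and lowers it.
pe_pos, v_up = model_free_update(expected_value=0.5, received_value=1.0)
pe_neg, v_down = model_free_update(expected_value=0.5, received_value=0.0)
```

In this sketch the only input to learning is the PE itself, which is what makes the scheme model-free; a model-based learner would additionally consult a learned model of the task when revising its values.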

Recent neuroimaging studies have greatly elucidated the neural basis of RL: model-free RL typically involves dopamine-rich striatum, whereas model-based RL also extends to dorsolateral prefrontal cortex (PFC) (Smittenaar et al. 2013), medial PFC (Daw et al. 2011), anterior cingulate cortex (ACC), and orbitofrontal cortex (Rushworth et al. 2012), among other regions. Notably, most of these areas have also been linked to intelligence, another key factor contributing to human strivings. The volume of left striatum is positively correlated with IQ (MacDonald et al. 2014), whereas dorsolateral and medial PFC and ACC together with several other parietofrontal areas constitute the structural and functional substrates of intelligence (Deary et al. 2010). Indeed, further investigation does reveal a promising association between intelligence and RL.

It has been suggested that RL may reflect stable individual difference as a trait (Cohen 2007). It is most likely that intelligence accounts for this stable trait, given that intelligence consistently predicts performance in almost all fields of human strivings, especially learning and education (Nisbett 2009). Furthermore, intelligence is commonly perceived as consisting of crystallized intelligence (the whole of stored knowledge that can be used to solve problems) and fluid intelligence (the acquired ability to solve novel problems that depend little on stored knowledge) (Nisbett 2009), both of which could be used to construct cognitive maps and learned models, thus contributing to model-based RL.

In light of the above reasoning, the purpose of this Neuro Forum is to review three functional magnetic resonance imaging studies (Hawes et al. 2014; Schlagenhauf et al. 2013; Van den Bos et al. 2012) that have examined the link between intelligence and RL.

General Findings

Van den Bos et al. (2012) studied a sample of adolescents (age 13–16 yr, mean 14.39 yr; 23 males, 22 females; IQ 70–130, as measured by the similarities and block design subscales of the Wechsler Intelligence Scale for Children) using a probabilistic learning task. The task included 50 AB and 50 CD trials, and the feedback was probabilistic in that choosing A led to positive feedback in 80% of AB trials, whereas choosing B led to positive feedback in only 20% of AB trials. Similarly, choosing C and D led to positive feedback in 70% and 30% of CD trials, respectively. Since choosing A or C led to positive feedback more often, receiving negative feedback after choosing A or C would be unexpected and generate a negative PE. Since choosing B or D led to negative feedback more often, receiving positive feedback after choosing B or D would generate a positive PE.
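The labelling of feedback events described above can be sketched in a few lines of Python. The probabilities come from the task description; the function name, the dictionary, and the three label strings are hypothetical and only illustrate the categorical logic, not the authors' analysis code.

```python
# Probability of positive feedback for each stimulus, as described for the task.
P_POSITIVE = {"A": 0.8, "B": 0.2, "C": 0.7, "D": 0.3}


def classify_feedback(choice, positive_feedback):
    """Label one outcome as expected or as a PE-generating surprise."""
    mostly_rewarded = P_POSITIVE[choice] > 0.5
    if mostly_rewarded and not positive_feedback:
        return "negative PE"  # negative feedback after choosing A or C
    if not mostly_rewarded and positive_feedback:
        return "positive PE"  # positive feedback after choosing B or D
    return "expected"


labels = [classify_feedback(c, fb) for c, fb in
          [("A", True), ("A", False), ("B", True), ("D", False)]]
```

Under this scheme a trial counts as a PE event only when the feedback contradicts the more frequent outcome for the chosen stimulus.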

The authors then examined the blood oxygen level-dependent (BOLD) signals following positive and negative PEs. They found that higher IQ was related to accentuated activation in right dorsolateral PFC and dorsal ACC following positive PEs; however, this was not the case for negative PEs. Furthermore, educational level moderated the correlation such that dorsolateral PFC activity was correlated with IQ in both pre-vocational (mean IQ 91.3 ± 2.4; r = 0.54) and pre-university subjects (mean IQ 107 ± 2.0; r = 0.59), but dorsal ACC activity was only correlated with IQ in pre-university subjects (r = 0.44).

Behaviorally, the authors observed that IQ was positively correlated with the number of correct choices in the last 60 trials (r = 0.48). In addition, IQ was positively correlated with win-stay choice strategy after expected positive feedback (r = 0.54) and negatively correlated after unexpected positive feedback (r = −0.43). Clearly, after the rules of the task are recognized, keeping the same choice after expected positive feedback and shifting the choice after unexpected positive feedback would be a more optimal strategy.
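A win-stay measure of the kind reported above can be sketched as follows. This is a simplified illustration that assumes the trials of a single stimulus pair are given in order; the function name and the toy sequence are hypothetical and are not data from the study.

```python
def win_stay_rate(choices, positive_feedback):
    """Fraction of positively reinforced trials followed by the same choice.

    choices: chosen stimuli for one stimulus pair, in trial order.
    positive_feedback: booleans of the same length as choices.
    Returns None when no trial before the last one had positive feedback.
    """
    wins = 0
    stays = 0
    for t in range(len(choices) - 1):
        if positive_feedback[t]:
            wins += 1
            if choices[t + 1] == choices[t]:
                stays += 1
    return stays / wins if wins else None


# Toy sequence (illustrative only): a win on A followed by A (stay),
# then a win on B followed by A (shift).
rate = win_stay_rate(["A", "A", "B", "A"], [True, False, True, True])
```

Restricting the loop to wins on A or C, or to wins on B or D, would give the separate rates after expected and unexpected positive feedback.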

Schlagenhauf et al. (2013) used a reversal learning task that is frequently used to generate PEs. Subjects (age 22–61 yr, mean 36.9 ± 12.4 yr; 28 males) were asked to choose 1 of 2 abstract targets for 200 trials, and reward and loss were determined by three types of rules. In rule type 1, a reward was delivered if less than 80% of the chosen target had been rewarded, and a punishment occurred otherwise. In rule type 2, the probability reversed to 20%, whereas in rule type 3 it switched to 50%. The rule types changed after 16 trials or 10 trials if subjects had made 70% correct choices. A temporal sequence of PEs was calculated for each subject, and BOLD neuronal activity was modeled with trial-by-trial PE as the modulator. In this way the authors extracted the mean neural PE signals from bilateral ventral striatum. Fluid IQ was derived from factor analysis of nine tests targeting cognitive speed, attention and executive function, working memory, episodic memory, and reasoning. Crystallized IQ was measured with a verbal knowledge test.
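To make the idea of a trial-by-trial PE sequence concrete, the sketch below derives one PE per trial from a subject's choices and outcomes in a two-target task. The simple delta rule, the coding of reward and loss as +1 and -1, the learning rate `alpha`, and the zero starting values are all assumptions for illustration; the learning model actually fitted by the authors is not described here.

```python
def prediction_error_series(choices, outcomes, alpha=0.2, initial_value=0.0):
    """Compute one PE per trial for a two-target task with a delta rule.

    choices: sequence of target indices (0 or 1).
    outcomes: sequence of +1 (reward) or -1 (loss), same length as choices.
    alpha and initial_value are hypothetical settings.
    """
    values = [initial_value, initial_value]  # current value estimate per target
    errors = []
    for choice, outcome in zip(choices, outcomes):
        pe = outcome - values[choice]  # received minus expected value
        values[choice] += alpha * pe   # only the chosen target is updated
        errors.append(pe)
    return errors


# Toy input (illustrative only): two rewards on target 0, then a loss on target 1.
pes = prediction_error_series([0, 0, 1], [1, 1, -1], alpha=0.5)
```

A series like `pes` is what would then enter the imaging model as the trial-wise modulator of the feedback regressor.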

The hidden rules of the complex reversal learning task might have prevented those with higher IQ from fulfilling their potential and thus achieving better performance, because the authors failed to find associations between IQ scores and correct responses. However, they did find a significant positive correlation between fluid IQ and mean BOLD PE signal in bilateral ventral striatum, even after controlling for age. Further analysis revealed that attention and reasoning underlay this correlation. Because the authors extracted the mean PE signals without differentiating positive and negative PEs, this finding might result from a positive correlation between IQ and striatal signals of positive PE and/or a negative correlation between IQ and striatal signals of negative PE.

Unlike the previous two studies, Hawes et al. (2014) employed a manipulated task so that each subject (age 18–38 yr, mean 22 yr; 94 males; IQ 95.5–148.0, as measured by the Wechsler Abbreviated Scale of Intelligence) underwent the same reinforcement history. Subjects had to guess whether an upcoming number would be low (1, 2, 3) or high (4, 5, 6) and received a $2 reward for each correct guess and −$1 punishment for each incorrect guess. Unknown to the subjects, however, the computer responded in such a way that each subject received the same sequence of 20 rewards and 20 losses.
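Because every subject received the same 20 rewards and 20 losses, the total payoff is fixed regardless of the guesses made. The short calculation below works this out from the amounts given above; the variable names are hypothetical.

```python
REWARD = 2      # dollars gained per correct guess
LOSS = -1       # dollars lost per incorrect guess
N_REWARDS = 20  # rewards in the fixed sequence
N_LOSSES = 20   # losses in the fixed sequence

net_earnings = N_REWARDS * REWARD + N_LOSSES * LOSS
mean_outcome_per_trial = net_earnings / (N_REWARDS + N_LOSSES)
```

Under these numbers the net payoff is positive, with an average of half a dollar per trial.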

The authors found that %BOLD changes in striatum (caudate) following loss were positively predicted by IQ (β = 0.87) (less negative %BOLD change with higher IQ), which did not change significantly after the authors controlled for caudate volume, age, and working memory performance. Furthermore, %BOLD change following loss was correlated with IQ in posterior cingulate cortex (r = 0.20, P < 0.051, which should be taken with caution due to statistical issues), whereas %BOLD changes following reward and loss were both correlated with IQ in ventromedial PFC and left inferior frontal cortex (r ranges from 0.30 to 0.36).

Given the pseudorandom nature of the task, we simply have no idea what predictions subjects would have made. Nevertheless, since subjects would have been expecting reward before receiving feedback, reward feedback would generate a positive PE and loss feedback would generate a negative PE. The findings by Hawes et al. (2014) thus could be interpreted as indicating that more intelligent subjects showed enhanced neural signals of positive PE in ventromedial PFC and left inferior frontal cortex, and lessened neural signals of negative PE in ventromedial PFC, left inferior frontal cortex, striatum, and posterior cingulate cortex.

Alternatively, these BOLD signals might reflect subjects' emotional response. Both ventromedial PFC and striatum are involved in the receipt of reward and loss, the activity of which may reflect the perceived value magnitude as well as the resulting subjective experience (Diekhof et al. 2012). Moreover, posterior cingulate cortex is also implicated in happy and sad moods (Nielsen et al. 2005). It is likely that intelligence enhances positive emotions following reward and buffers negative emotions following loss. The latter is consistent with the observation that lower IQ exposes individuals to higher risk of developing posttraumatic stress disorder (Bomyea et al. 2012). However, this interpretation in terms of emotion is not in conflict with that of PE, because emotion and PE may coexist.

Behaviorally, Hawes et al. (2014) demonstrated that subjects with higher IQ considered more historical information. Specifically, more intelligent subjects were influenced by feedback one and two periods back; i.e., they tended to guess three combinations (high-high-high, high-high-low, etc.) in three trials in a row as the potential rule of the task, whereas less intelligent subjects were primarily influenced by feedback only one period back; i.e., they tended to guess two combinations (high-high, high-low, low-high, low-low) in two trials in order as the rule.
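The difference between the two candidate rule spaces can be illustrated by enumerating them. The sketch below lists every high/low pattern over a window of two or three trials; the function and variable names are hypothetical and serve only to show how the number of candidate patterns grows with the amount of history considered.

```python
from itertools import product

LEVELS = ("high", "low")


def candidate_patterns(window):
    """List every high/low pattern spanning `window` consecutive trials."""
    return ["-".join(p) for p in product(LEVELS, repeat=window)]


two_trial = candidate_patterns(2)    # patterns using feedback one period back
three_trial = candidate_patterns(3)  # patterns using one and two periods back
```

Moving from a two-trial to a three-trial window doubles the number of candidate patterns from four to eight, which gives a sense of the extra information that subjects tracking longer histories must hold in mind.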

IQ and RL: a Summary of the Neural Findings

Despite employing different tasks among different populations, these three studies provided pioneering insights into the role of intelligence in RL. Specifically, Van den Bos et al. (2012) demonstrated that higher IQ was associated with accentuated activation in right dorsolateral PFC and dorsal ACC following positive PEs, especially in more intelligent subjects, whereas Schlagenhauf et al. (2013) and Hawes et al. (2014) suggested that higher IQ may be associated with enhanced neural signals following positive PE in striatum, ventromedial PFC, and left inferior frontal cortex, and lessened neural signals of negative PE (less reduced activation) in striatum, ventromedial PFC, left inferior frontal cortex, and posterior cingulate cortex.

Explanation and Implication

Given the tasks used by these studies, we could not differentiate model-free and model-based RL in the above findings (Daw et al. 2011). Furthermore, although originally it was proposed that striatal PE signals reflect exclusively model-free RL, more recent research has revealed that striatum encodes model-based RL, as well (Daw et al. 2011). In contrast, signals in dorsolateral and medial PFC and dorsal ACC may indicate model-based RL (Daw et al. 2011; Rushworth et al. 2012; Smittenaar et al. 2013). Thus the above findings suggest that intelligence enhances model-based RL, although it may also improve model-free RL, confirming the observation that RL reflects stable individual difference as a trait (Cohen 2007).

In other words, the enhanced brain activation following positive PEs and less reduced activation following negative PEs may reflect the fact that higher IQ subjects, especially those with higher fluid IQ, were actively processing information to construct cognitive maps using model-based RL. This is especially true in the context of the following literature. Dorsolateral PFC contributes to working memory, whereas ACC monitors conflict (Deary et al. 2010; Van den Bos et al. 2012). Left inferior frontal cortex is related to semantic search and selection among competitive representations (Bookheimer 2002), whereas posterior cingulate cortex encodes and retrieves episodic memory (Nielsen et al. 2005).

This explanation fits well with the fact that when facing complex and difficult problems, people with higher IQ generally make more effort and show higher brain activation (Deary et al. 2010). It is also in line with and may well explain the behaviors observed by Van den Bos et al. (2012) that more intelligent subjects achieved better performance and showed more optimal shifting behaviors after receiving positive feedback, and by Hawes et al. (2014) that more intelligent subjects considered more historical information.

Moreover, based on the association between IQ and striatal signals, it is also likely that intelligence amplifies model-free RL and thus both positive and negative PEs, but the enhanced negative PEs were buffered or even overridden by model-based RL, resulting in the distinct findings regarding positive and negative PEs.

Finally, a more recent study (Lee et al. 2014) has shown that inferior frontal cortex encodes the reliability of PE signals and may act as an arbitrator determining whether model-free or model-based RL takes control. Consequently, the accentuated activation of left inferior frontal cortex in more intelligent subjects found by Hawes et al. (2014) suggests a possibility that intelligence enhances this arbitration process.

Outlook

Though limited, converging evidence supports a promising role of intelligence in RL. It will be stimulating for future studies to confirm and further elucidate this role. As a major concern, to clarify and differentiate the effect of intelligence on model-free and model-based RL, future studies should use tasks that dissociate these two types of RL (Daw et al. 2011). Furthermore, since intelligence may affect positive and negative PEs in different ways, it is also preferable to analyze them separately. Finally, because the reviewed studies were correlational in nature, future research could employ experimental manipulation of neural signals or intelligence training to establish causality (Nisbett 2009).

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author.

AUTHOR CONTRIBUTIONS

C.C. conception and design of research; C.C. analyzed data; C.C. drafted manuscript; C.C. approved final version of manuscript.

ACKNOWLEDGMENTS

I thank Peter Dayan and Nathaniel Daw for their shared ideas and discussion, and Atsuhito Toyomaki for comments on the previous manuscript.

REFERENCES

  1. Bomyea J, Risbrough V, Lang AJ. A consideration of select pre-trauma factors as key vulnerabilities in PTSD. Clin Psychol Rev 32: 630–641, 2012.
  2. Bookheimer S. Functional MRI of language: new approaches to understanding the cortical organization of semantic processing. Annu Rev Neurosci 25: 151–188, 2002.
  3. Cohen MX. Individual differences and the neural representations of reward expectation and reward prediction error. Soc Cogn Affect Neurosci 2: 20–30, 2007.
  4. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans' choices and striatal prediction errors. Neuron 69: 1204–1215, 2011.
  5. Deary IJ, Penke L, Johnson W. The neuroscience of human intelligence differences. Nat Rev Neurosci 11: 201–211, 2010.
  6. Diekhof EK, Kaps L, Falkai P, Gruber O. The role of the human ventral striatum and the medial orbitofrontal cortex in the representation of reward magnitude–an activation likelihood estimation meta-analysis of neuroimaging studies of passive reward expectancy and outcome processing. Neuropsychologia 50: 1252–1266, 2012.
  7. Hawes DR, DeYoung CG, Gray JR, Rustichini A. Intelligence moderates neural responses to monetary reward and punishment. J Neurophysiol 111: 1823–1832, 2014.
  8. Lee SW, Shimojo S, O'Doherty JP. Neural computations underlying arbitration between model-based and model-free learning. Neuron 81: 687–699, 2014.
  9. MacDonald PA, Ganjavi H, Collins DL, Evans AC, Karama S. Investigating the relation between striatal volume and IQ. Brain Imaging Behav 8: 52–59, 2014.
  10. Nielsen FA, Balslev D, Hansen LK. Mining the posterior cingulate: segregation between memory and pain components. Neuroimage 27: 520–532, 2005.
  11. Nisbett RE. Intelligence and How to Get It: Why Schools and Cultures Count. New York: Norton, 2009.
  12. Rushworth MF, Kolling N, Sallet J, Mars RB. Valuation and decision-making in frontal cortex: one or many serial or parallel systems? Curr Opin Neurobiol 22: 946–955, 2012.
  13. Schlagenhauf F, Rapp MA, Huys QJ, Beck A, Wüstenberg T, Deserno L, Buchholz HG, Kalbitzer J, Buchert R, Bauer M, Kienast T, Cumming P, Plotkin M, Kumakura Y, Grace AA, Dolan RJ, Heinz A. Ventral striatal prediction error signaling is associated with dopamine synthesis capacity and fluid intelligence. Hum Brain Mapp 34: 1490–1499, 2013.
  14. Smittenaar P, FitzGerald TH, Romei V, Wright ND, Dolan RJ. Disruption of dorsolateral prefrontal cortex decreases model-based in favor of model-free control in humans. Neuron 80: 914–919, 2013.
  15. Van den Bos W, Crone EA, Guroglu B. Brain function during probabilistic learning in relation to IQ and level of education. Dev Cogn Neurosci 2, Suppl 1: S78–S89, 2012.

Articles from Journal of Neurophysiology are provided here courtesy of American Physiological Society
