Highlights
- We use a ‘virtual maze’ to elicit the reward positivity in 8–23-year-olds.
- Reward positivity amplitude is comparable across children, adolescents and adults.
- Children have longer latencies for P200, N200 and P300 ERP components.
- We propose that dopamine reinforcement signals develop early to guide behavior.
Keywords: Cognitive control, Reinforcement learning, Development, Reward positivity, Anterior cingulate cortex, Dopamine
Abstract
Children and adolescents learn to regulate their behavior by utilizing feedback from the environment, but exactly how this ability develops remains unclear. To investigate this question, we recorded the event-related brain potential (ERP) from children (8–13 years), adolescents (14–17 years) and young adults (18–23 years) while they navigated a “virtual maze” in pursuit of monetary rewards. The amplitude of the reward positivity, an ERP component elicited by feedback stimuli, was evaluated for each age group. A current theory suggests the reward positivity is produced by the impact of reinforcement learning signals carried by the midbrain dopamine system on anterior cingulate cortex, which utilizes the signals to learn and execute extended behaviors. We found that the three groups produced a reward positivity of comparable size despite relatively longer ERP component latencies for the children, suggesting that the reward processing system reaches maturity early in development. We propose that early development of the midbrain dopamine system facilitates the development of extended goal-directed behaviors in anterior cingulate cortex.
1. Introduction
Impulsive behaviors are a hallmark of childhood and adolescence but typically subside in adulthood. This transition is thought to arise from the asynchronous development of two neural systems: an earlier-maturing “bottom-up” system motivated by immediate rewards, followed by a later-maturing “top-down” system for cognitive control that regulates impulsive behavior (Casey et al., 2005, Casey et al., 2008, Spear, 2013, Geier, 2013). Brain regions supporting inhibitory control, such as prefrontal cortex (PFC) and dorsal anterior cingulate cortex (ACC), exhibit protracted development (Fuster, 2002, Geier, 2013) and increasing task-relevant activation (Ordaz et al., 2013) throughout this period. Consistent with dual-systems models of control (Hofmann et al., 2009), PFC is believed to facilitate execution of task-appropriate behavior by applying control signals that bias information processing in the basal ganglia (BG) and other brain areas (Miller and Cohen, 2001). By contrast, although ACC is central to several theories of cognitive control, its specific function remains controversial (Mars et al., 2011).
We have recently proposed that ACC motivates the selection and execution of extended goal-directed behaviors according to principles of hierarchical reinforcement learning (Holroyd and Yeung, 2012). On this account, ACC temporally integrates the value of reward signals carried by the midbrain dopamine (DA) system to learn which tasks are most worth performing, and then selects particular tasks for execution based on the learned values. Once a task is selected, ACC directs PFC to apply top-down control over task execution by the BG and other brain areas (Holroyd and Yeung, 2012, Holroyd, 2013; see also Holroyd and McClure, submitted for publication, Umemoto and Holroyd, submitted for publication). This theory extends a previous proposal that ACC uses reward prediction error (RPE) signals carried by the midbrain DA system to learn the value of action policies (Holroyd and Coles, 2002, Holroyd and Yeung, 2012). It has been suggested that phasic increases in DA activity encode positive RPE signals, which indicate that ongoing events are better than expected, and that phasic decreases in DA activity encode negative RPE signals, which indicate that ongoing events are worse than expected (Schultz et al., 1997); such signals can shape behavior adaptively according to principles of reinforcement learning (Sutton and Barto, 1998). We might therefore expect both ACC and DA to play key roles in the development of behavioral regulation.
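To make the notion of a reward prediction error concrete, the sketch below implements a simple delta rule in the spirit of Sutton and Barto (1998): the error is the obtained reward minus the expected reward, and the expectation moves a fraction `alpha` of the way toward each outcome. This is an illustrative assumption, not the hierarchical model of Holroyd and Yeung (2012); the function name `rpe_update` and the learning rate are hypothetical.

```python
def rpe_update(value, reward, alpha=0.1):
    """Return (prediction_error, updated_value) after one outcome.

    Minimal delta-rule sketch: a positive error means the outcome was better
    than expected, a negative error means it was worse than expected.
    """
    delta = reward - value            # reward prediction error
    return delta, value + alpha * delta


# Under a 50% reward rate the expectation hovers near 0.5, so rewards yield
# positive errors and non-rewards yield negative errors of similar size.
value = 0.5
for reward in [1, 0, 1, 0]:
    delta, value = rpe_update(value, reward)
    print(f"reward={reward} delta={delta:+.3f} value={value:.3f}")
```

In this toy learner the sign of `delta` plays the role attributed to phasic DA increases and decreases; nothing here models ACC or task selection.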
The ability to learn from reinforcement continues to develop into adolescence in parallel with the development of self-regulatory control (Crone et al., 2004, Huizinga et al., 2006, van den Bos et al., 2012). During this period, connections between PFC and striatum are refined through pruning and enhanced axonal connectivity (Rubia, 2012). Further, the relatively prolonged development of ACC (Crone et al., 2008, Fjell et al., 2012) appears to be responsible for age-related improvements in self-regulation (Velanova et al., 2008). Although the development of the DA system is complex and poorly understood, changes in the relative density of DA receptors in cortical and subcortical structures have been observed (Wahlstrom et al., 2010). Additionally, it has been proposed that increases in tonic DA levels during adolescence encourage exploratory behaviors, allowing for greater exposure to rewarding stimuli (Luciana et al., 2012). Research with rodents has also indicated that tonic dopamine levels code for average reward rate, which may be important for motivating behavior (Niv, 2007) and for promoting cognitive flexibility (Floresco, 2013). As learning from explicit rewards has been shown to depend on phasic DA responses (Schultz, 2013), the simultaneous maturation of the ACC and DA systems may facilitate the development of a cognitive mechanism for reinforcement learning and control.
This developmental trajectory may be evident in a component of the event-related brain potential (ERP) called the reward positivity, which we have proposed reflects the impact of DA RPE signals on ACC for the purpose of adaptive decision making (Holroyd and Coles, 2002, Walsh and Anderson, 2012). Also known as the feedback error-related negativity or feedback-related negativity, the reward positivity appears around 250 ms following the presentation of feedback stimuli, is characterized by a frontal–central scalp distribution, and is sensitive to the valence of feedback stimuli (Miltner et al., 1997). Recent developments of this idea hold that the difference between ERPs elicited by positive and negative feedback results from dopaminergic modulation of the amplitude of the N200, a negative-going ERP component produced in ACC that is generated by unexpected task-relevant events. According to this position, unexpected rewards produce a phasic increase in DA that suppresses the N200, resulting in the reward positivity (Holroyd et al., 2008b; see also Baker and Holroyd, 2011, Hajihosseini and Holroyd, 2013).
The reward positivity provides a means for assessing the developmental trajectory of behavioral regulation, but to date only a few studies have examined this ERP component in typically developing children and adolescents. In preschool-aged children, Mai and colleagues (2011) found no difference between the amplitudes of the ERPs elicited by positive and negative feedback. Eppinger et al. (2009) reported that, relative to young adults, 10–12-year-old children produced larger N200 amplitudes to negative feedback, whereas Hämmerer et al. (2011) observed that 9–11-year-old children produced larger N200 amplitudes to both positive and negative feedback. Of four studies that examined the reward positivity in adolescents and young adults, three reported no difference between adolescents (aged 13–14, 16–17 and 15–17 years, respectively) and young adults (Hämmerer et al., 2011, Santesso et al., 2011, Yi et al., 2012), and the fourth found that male adolescents (aged 14–17 years) produced a relatively smaller reward positivity (Zottoli and Grose-Fifer, 2012).
These mixed results could stem in part from varying approaches to measuring the reward positivity (see Section 4 below), or from the use of tasks with relatively complex schedules of reward probability and magnitude that could exacerbate the potential for overlap with other, non-reward-related ERP components (San Martin, 2012). Given that the reward positivity is said to index neural systems critical to the development of self-regulation, that it is used increasingly to study atypical development (e.g., Holroyd et al., 2008a), and that ERP morphology differs widely between children and adults (Johnstone et al., 2005, Coch and Gullick, 2012), it is important to establish how the reward positivity develops in a typical population. For these reasons, we recorded the ERP from children, adolescents and young adults as they searched for rewards in a relatively engaging “virtual maze” task that produces a canonical reward positivity (Baker and Holroyd, 2009). We predicted that reward positivity amplitude would increase with age, reflecting the developing maturity of the cognitive control system.
2. Method
2.1. Participants
For the purposes of statistical comparison, 60 participants were categorized into three groups based on age: 20 children aged 8–13 years (10.0 ± 1.7 years, 11 males), 20 adolescents aged 14–17 years (15.6 ± 1.0 years, 10 males), and 20 adults aged 18–23 years (19.7 ± 1.4 years, 7 males). Two additional participants were excluded due to incomplete data. Children and adolescents were recruited through a local newspaper ad, fliers posted throughout the community and Facebook event advertisements. The adult sample was obtained through the University of Victoria psychology participant pool. All participants received a performance-related bonus of CDN $5 at the end of the task (see below). In addition, at the conclusion of the experiment, university students received course credit, adolescents received CDN $14 ($7.00/h), and children and their parents received small honoraria of CDN $5 and CDN $10, respectively, for their time. All participants were asked to provide informed consent and/or assent as approved by the local research ethics committee. None of the participants reported a history of head injury or concussion; all participants were right-handed and had normal or corrected-to-normal vision. This experiment was conducted in accordance with the ethical standards prescribed in the 1964 Declaration of Helsinki.
2.2. Task
Participants performed a “virtual maze” pseudo trial-and-error learning task that has been described in detail previously (Baker and Holroyd, 2009). Briefly, participants navigated through a computer-based T-maze by selecting right or left turns. A stimulus at the end of each alley indicated whether the participant earned 5 cents (reward) or 0 cents (no reward) on that trial. Participants were encouraged to maximize their earnings by choosing the alley where they believed the reward was located on that trial. The reward and no-reward feedback stimuli were images of an apple and of an orange, with the assignment of fruit to outcome counterbalanced across participants. Participants completed a total of 200 trials. The probability of finding a reward on each trial was 50%, a standard probability used to elicit a robust reward positivity (Nieuwenhuis et al., 2004). To foster task engagement, participants were given their accumulated earnings halfway through the task and the remainder at the end of the task (CDN $5 in total).
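The feedback schedule can be sketched as follows. Because every participant received the same CDN $5 for 200 trials at 5 cents per reward, we assume here that exactly half of the trials were rewarded, in a shuffled order that did not depend on the participant's left/right choice. That assumption, the helper names (`make_schedule`, `payouts`, `feedback_images`) and the even/odd counterbalancing rule are hypothetical, not details taken from Baker and Holroyd (2009).

```python
import random

N_TRIALS = 200
REWARD_CENTS = 5


def make_schedule(n_trials=N_TRIALS, seed=None):
    """Shuffled outcomes (True = reward) with exactly 50% rewarded trials.

    Assumption: outcomes are fixed in advance and ignore the participant's choice.
    """
    n_reward = n_trials // 2
    schedule = [True] * n_reward + [False] * (n_trials - n_reward)
    random.Random(seed).shuffle(schedule)
    return schedule


def payouts(schedule, cents=REWARD_CENTS):
    """Earnings in cents paid at the halfway point and at the end of the task."""
    half = len(schedule) // 2
    return sum(schedule[:half]) * cents, sum(schedule[half:]) * cents


def feedback_images(participant_id):
    """Hypothetical counterbalancing: even IDs see apples as rewards, odd IDs oranges."""
    if participant_id % 2 == 0:
        return {"reward": "apple", "no_reward": "orange"}
    return {"reward": "orange", "no_reward": "apple"}


schedule = make_schedule(seed=1)
first, second = payouts(schedule)
print(first, second, first + second)
```

Under a fixed 50% schedule the total payout is always 500 cents (CDN $5), while the split between the two installments varies with the shuffle.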
2.3. Data acquisition and analysis
EEG was recorded using BrainVision Recorder software (Brain Products GmbH, Munich, Germany) in accordance with the extended international 10–20 system (Jasper, 1958). For the adults and adolescents, a montage of 36 electrode sites was used; for the children, a reduced montage of 19 electrode sites was used to minimize participant sitting time. Signals were acquired using Ag/AgCl ring electrodes mounted in a nylon cap with an abrasive, conductive gel. For the purpose of artifact correction, the horizontal electrooculogram (EOG) was recorded from the external canthi of both eyes, and the vertical EOG was recorded from the suborbit of the right eye and electrode channel Fp2. Two electrodes were placed on the right and left mastoids, and inter-electrode impedances were maintained below 10 kΩ. The EEG data were sampled at a rate of 250 Hz and amplified by low-noise electrode differential amplifiers with a frequency response of DC 0.017–67.5 Hz (90 dB/octave roll-off).
Post-processing was performed using BrainVision Analyzer software (Brain Products GmbH, Munich, Germany). The EEG data were filtered using a 4th-order digital Butterworth filter with a passband of 0.10–20 Hz. An 800 ms epoch of data extending from 200 ms before to 600 ms after the onset of each feedback stimulus was extracted for analysis. Ocular artifacts were corrected using the eye movement correction algorithm described by Gratton et al. (1983). EEG data were re-referenced to linked mastoid electrodes and baseline corrected by subtracting from each sample the average activity recorded at that electrode during the 200 ms interval preceding stimulus onset. Muscular and other artifacts were removed using a ±150 μV level threshold and a ±35 μV step threshold as rejection criteria. The Hjorth nearest-neighbor correction was applied to individual channels with excessively noisy data. For each participant and electrode, ERPs were then created by averaging the single-trial EEG separately for each type of feedback (reward, no-reward). Finally, grand averages were created for each condition by averaging these ERPs across all participants within each age group.
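The baseline-correction, artifact-rejection and averaging steps can be sketched for a single electrode as below. This is a simplified NumPy illustration, not BrainVision Analyzer's implementation: filtering, ocular correction, re-referencing and Hjorth interpolation are omitted, and we assume that the level criterion applies to absolute voltage and the step criterion to the change between consecutive samples. Function names are hypothetical.

```python
import numpy as np

N_SAMPLES = 200          # 800 ms epoch (-200 to +596 ms) at 250 Hz
BASELINE_SAMPLES = 50    # the 200 ms preceding feedback onset


def baseline_correct(epochs):
    """Subtract each trial's mean pre-stimulus voltage; epochs is trials x samples (µV)."""
    return epochs - epochs[:, :BASELINE_SAMPLES].mean(axis=1, keepdims=True)


def artifact_free(epochs, level_uv=150.0, step_uv=35.0):
    """Boolean mask of trials within ±level_uv whose sample-to-sample change never exceeds step_uv."""
    within_level = np.all(np.abs(epochs) <= level_uv, axis=1)
    within_step = np.all(np.abs(np.diff(epochs, axis=1)) <= step_uv, axis=1)
    return within_level & within_step


def condition_erps(epochs, is_reward):
    """Baseline-correct, reject artifacts, then average reward and no-reward trials."""
    epochs = baseline_correct(epochs)
    keep = artifact_free(epochs)
    return epochs[keep & is_reward].mean(axis=0), epochs[keep & ~is_reward].mean(axis=0)


# Simulated single-electrode data: 200 noisy trials plus one large artifact.
rng = np.random.default_rng(0)
epochs = rng.normal(0.0, 5.0, size=(200, N_SAMPLES))
epochs[3, 120] += 400.0
is_reward = np.arange(200) % 2 == 0
reward_erp, no_reward_erp = condition_erps(epochs, is_reward)
```

Rejection is applied before trials are split by condition, so the same criteria govern both averages.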
2.4. Statistical analysis
For each participant and each channel the average ERP waveform elicited by reward feedback was subtracted from that of the corresponding no-reward feedback to create a difference wave (Holroyd and Coles, 2002, Miltner et al., 1997). The reward positivity was measured as the difference between reward and no reward conditions at channel FCz, where it typically reaches maximum amplitude (Walsh and Anderson, 2012). Reward positivity amplitude was measured as the mean activity of the difference wave within a 250–350 ms window post-stimulus, as determined by maximal reward positivity amplitude in the grand average difference waves. Reward positivity latency was defined as the time when the difference wave was most negative at channel FCz within the 250–350 ms window following feedback onset.
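A minimal sketch of this difference-wave measurement for one participant at FCz follows, assuming ERPs sampled at 250 Hz over the −200 to +596 ms epoch implied by the recording parameters. The function name and the synthetic waveform are illustrative only.

```python
import numpy as np

FS = 250
TIMES_MS = -200 + np.arange(200) * 1000 / FS   # -200, -196, ..., 596 ms


def reward_positivity(reward_erp, no_reward_erp, window=(250, 350)):
    """Mean amplitude (µV) and latency (ms) of the no-reward minus reward difference wave.

    Because reward is subtracted from no-reward, the reward positivity appears
    as a negative deflection, so its latency is taken at the most negative sample.
    """
    diff = no_reward_erp - reward_erp
    in_window = (TIMES_MS >= window[0]) & (TIMES_MS <= window[1])
    amplitude = diff[in_window].mean()
    latency = TIMES_MS[in_window][np.argmin(diff[in_window])]
    return amplitude, latency


# Synthetic example: the no-reward ERP is up to 3 µV more negative around 300 ms.
reward_erp = np.zeros(200)
no_reward_erp = -3.0 * np.exp(-((TIMES_MS - 300) / 30) ** 2)
amplitude, latency = reward_positivity(reward_erp, no_reward_erp)
print(f"{amplitude:.2f} µV at {latency:.0f} ms")
```

Mean amplitude over a fixed window is less sensitive to noise than a single peak value, which is one reason to report it alongside the peak latency.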
Visual inspection of the ERPs suggested greater variability in reward positivity (difference-wave) latency for the children than for the adults and adolescents (see below). Therefore, for the purpose of illustration, we created new grand averages from the latency-jitter-corrected (LJC) ERPs. For each age group, LJC grand average ERPs were created by time-locking each participant's ERP at channel FCz to that participant's reward positivity maximum and averaging these ERPs across participants, within a window extending from 250 ms before to 250 ms after the maximum. Likewise, LJC grand average scalp distributions were created by averaging, across the participants in each age group, the scalp distributions at the time of each participant's reward positivity maximum. Note that across-participant LJC modifies the appearance of the grand average but does not affect the underlying statistics.
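The alignment step behind the LJC grand averages can be sketched as follows, assuming each participant's peak sample (the most negative point of the difference wave in the 250–350 ms window) is already known. At 250 Hz we approximate the ±250 ms window with ±62 samples (±248 ms). The names are hypothetical, and, as noted above, the alignment affects only the display.

```python
import numpy as np

FS = 250
HALF_WIDTH = 250 * FS // 1000   # 62 samples, i.e. about ±250 ms at 250 Hz


def jitter_corrected_average(erps, peak_indices, half_width=HALF_WIDTH):
    """Grand average after aligning each participant's ERP on its own peak.

    erps: participants x samples array (one age group, one channel).
    peak_indices: each participant's peak sample index.
    Returns 2 * half_width + 1 samples centred on the aligned peak.
    """
    n_samples = erps.shape[1]
    segments = []
    for erp, i in zip(erps, peak_indices):
        if i - half_width < 0 or i + half_width >= n_samples:
            raise ValueError("peak too close to the epoch edge for this window")
        segments.append(erp[i - half_width:i + half_width + 1])
    return np.mean(segments, axis=0)


# Three synthetic participants whose negative peaks fall at different samples.
samples = np.arange(200)
peaks = [115, 125, 135]
erps = np.array([-3.0 * np.exp(-((samples - p) / 8.0) ** 2) for p in peaks])
aligned = jitter_corrected_average(erps, peaks)
```

Averaging the same three waveforms without alignment would smear their peaks into a shallower, broader deflection, which is the visual effect the correction removes.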
For the purpose of comparison we also analyzed the amplitudes and latencies of other ERP components that occur in the “raw” ERPs during the time period of the reward positivity but that are removed by the difference wave approach. First, to determine raw ERP component amplitudes, we extracted the mean voltages in three 100 ms windows (100–200 ms, 200–300 ms, and 300–400 ms), which correspond roughly to the timing of the components of interest, at channels FCz and Pz, which were selected because of their observed sensitivity to N200/reward positivity amplitude and P300 amplitude, respectively (Holroyd et al., 2008b). Mean voltages were analyzed separately for the reward and no-reward conditions, as well as collapsed across conditions. In instances when Mauchly's test indicated that the assumption of sphericity had been violated, degrees of freedom were corrected using Huynh-Feldt estimates.
Second, to determine raw ERP component latencies, we identified the latencies of the peak amplitudes of the P200 (Coch and Gullick, 2012), N200 and P300 (Donchin and Coles, 1988) components within time windows that were specific to each component and age group (Table 1). N200 latency was measured separately for the reward and no-reward conditions. Because P200 and P300 amplitudes are not typically sensitive to feedback valence, the latencies of these components were identified from ERPs averaged across the reward and no-reward conditions.
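Both raw-ERP measures can be sketched for a single participant and channel as below. We assume the 250 Hz, −200 to +596 ms time base implied by the recording parameters, use the component windows listed in Table 1, treat each 100 ms window as half-open so that no sample is counted twice, and collapse conditions by averaging the two condition ERPs. All names and the synthetic waveforms are illustrative.

```python
import numpy as np

FS = 250
TIMES_MS = -200 + np.arange(200) * 1000 / FS

# Search window (ms) and polarity of each component (+1 = positive peak).
COMPONENTS = {"P200": ((150, 250), 1), "N200": ((250, 350), -1), "P300": ((300, 600), 1)}


def window_means(erp, windows=((100, 200), (200, 300), (300, 400))):
    """Mean voltage in each half-open [start, end) window, in ms."""
    return [erp[(TIMES_MS >= lo) & (TIMES_MS < hi)].mean() for lo, hi in windows]


def peak_latency(erp, component):
    """Latency (ms) of the most positive (P) or most negative (N) sample in the window."""
    (lo, hi), sign = COMPONENTS[component]
    in_window = (TIMES_MS >= lo) & (TIMES_MS <= hi)
    return TIMES_MS[in_window][np.argmax(sign * erp[in_window])]


def gauss(center, width):
    """Unit-height Gaussian bump used to build synthetic components."""
    return np.exp(-((TIMES_MS - center) / width) ** 2)


# Synthetic P2-N2-P3 sequence; the no-reward ERP has the larger N200.
reward_erp = 4 * gauss(200, 20) - 2 * gauss(300, 20) + 6 * gauss(400, 40)
no_reward_erp = 4 * gauss(200, 20) - 4 * gauss(300, 20) + 6 * gauss(400, 40)
collapsed = (reward_erp + no_reward_erp) / 2

print(window_means(collapsed))
print(peak_latency(collapsed, "P200"), peak_latency(no_reward_erp, "N200"),
      peak_latency(collapsed, "P300"))
```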
Table 1. Peak latencies (ms) of raw ERP components by age group, mean (SD).
Component | Measurement window (ms) | Children | Adolescents | Adults | F(2,57)
---|---|---|---|---|---
P200 | 150–250 | 233 (18) | 205 (26) | 205 (31) | 16.10***
N200 reward | 250–350 | 320 (34) | 290 (43) | 285 (39) | 4.67*
N200 no reward | 250–350 | 320 (32) | 286 (34) | 279 (33) | 8.76***
P300 | 300–600 | 448 (108) | 389 (78) | 362 (34) | 3.17**
* p < .05; ** p < .01; *** p < .001.
3. Results
ERP data contaminated by artifacts were discarded (<1.5% of the data in each age group). Additionally, 23% of the adults’, 14% of the adolescents’, and 14% of the children's total trial data were corrected for eye movement artifacts.
3.1. Reward positivity
Fig. 1 (left column) illustrates ERPs to reward and no-reward feedback recorded at channel FCz, and the associated difference waves and scalp distributions, for the child (A), adolescent (B), and adult (C) age groups. The reward positivity was significantly different from zero for all of the groups (Children: −1.6 μV, t(19) = −2.86, p < .05, confidence interval = [−3.0 μV, −0.4 μV]; Adolescents: −2.6 μV, t(19) = −3.69, p < .005, confidence interval = [−4.0 μV, −1.1 μV]; Adults: −2.0 μV, t(19) = −4.08, p < .005, confidence interval = [−3.1 μV, −1.0 μV]), but the groups differed in their scalp distributions: the grand averages were maximal at channel PO8 (−2.1 μV) for the children, at channel FC2 (−2.9 μV) for the adolescents, and at channel CPz (−2.4 μV) for the adults. However, in none of the three groups did the voltage at the distribution maximum differ significantly from the voltage at channel FCz (p > .05).
ANOVA on reward positivity amplitude revealed no significant differences across the age groups, F(2,57) = .73, p > .05, ηp² = .03 (Fig. 2A). A comparable ANOVA on reward positivity latency also revealed no differences across groups (Children: 300 ms, SD = 34 ms; Adolescents: 297 ms, SD = 30 ms; Adults: 285 ms, SD = 29 ms), F(2,57) = .48, p > .05, ηp² = .02. When the data were collapsed across age groups, further exploratory analyses revealed no significant sex differences in reward positivity amplitude, t(58) = −.73, p > .05, or latency, t(58) = −1.16, p > .05.
3.2. Latency jitter correction
Although mean reward positivity latencies were roughly equivalent across groups, visual inspection of the ERP grand averages in Fig. 1 (left column) suggested latency jitter in the timing of the reward positivity for the children. To explore this possibility, the grand average ERPs, difference waves and scalp distributions were corrected for across-participant jitter in the latency of the reward positivity difference waves (right columns of Figs. 1 and 2). The corrected ERPs reveal a typical reward positivity for all three age groups: the adjusted deflection was maximal at channel FCz for the children and adults and at channel FC1 for the adolescents, but the latter value did not differ significantly from that recorded at channel FCz, t(19) = −.20, p > .05. Note that LJC affects the appearance of the grand average waveforms and scalp distributions but not the associated statistics; the corrected images suggest that the apparent visual group differences in reward positivity amplitude (Fig. 2A) and scalp distribution (Fig. 1, left column) were due to across-participant latency jitter.
3.3. Raw ERP analysis
ANOVA applied to the latencies of the P200, N200 and P300 in the raw ERPs indicated a difference across the age groups for each ERP component of interest, with children exhibiting the longest latencies overall (Table 1). As a further exploratory analysis, we compared group differences in the raw ERPs by conducting a mixed-design repeated-measures MANOVA on average ERP amplitude with a between-subject factor of group (adults, adolescents, children) and within-subject factors of condition (reward, no-reward), channel (FCz, Pz) and time (100–200 ms, 200–300 ms, 300–400 ms). All statistically significant main effects and interactions are listed in Table 2; all remaining main effects and interactions were not statistically significant. As there was no significant interaction between condition and group, F(2,57) = .08, p > .05, ηp² = .003, we combined the data for the reward and no-reward conditions for several follow-up analyses (Fig. 3A and B).
Table 2. Statistically significant main effects and interactions in the analysis of raw ERP amplitude.
Effect | F (df) | ηp²
---|---|---
Time | 169.14 (1.75, 99.7)*** | .75
Condition | 14.22 (1, 57)*** | .20
Channel | 114.42 (1, 57)*** | .67
Group × time | 3.72 (4, 114)** | .12
Time × condition | 15.53 (2, 114)*** | .21
Time × channel | 86.43 (2, 114)*** | .60
Group × time × channel | 4.34 (4, 114)** | .13
Time × channel × condition | 8.91 (1.57, 89.4)** | .13
** p < .01; *** p < .001.
Separate ANOVAs for the data recorded at channels FCz and Pz, with group as a between-subject factor and time as a within-subject factor, revealed a significant interaction between time and group at channel FCz (Fig. 3C), F(6,110) = 8.13, p < .001, ηp² = .31, but not at channel Pz (Fig. 3D), F(6,110) = 1.71, p > .05, ηp² = .09. Follow-up analyses of this interaction at channel FCz revealed significant effects of group during the 100–200 ms time window, F(2,57) = 12.12, p < .001, ηp² = .30, and during the 200–300 ms time window, F(2,57) = 3.35, p < .05, ηp² = .11, and a trend during the 300–400 ms window, F(2,57) = 2.98, p = .059, ηp² = .10. These effects were driven by children exhibiting lower (more negative-going) mean amplitudes than adults and adolescents during the 100–200 ms window (children = −1.1 μV, adolescents = 0.8 μV, adults = 2.0 μV) and the 300–400 ms window (children = 3.6 μV, adolescents = 6.0 μV, adults = 7.4 μV), and more positive mean amplitudes than adults and adolescents during the 200–300 ms window (children = 6.4 μV, adolescents = 3.6 μV, adults = 5.2 μV) (Fig. 3A and C).
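Because partial eta squared is SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error), each reported ηp² can be cross-checked from its F ratio and degrees of freedom. The helper below is our own illustration, not part of the original analysis pipeline.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom.

    eta_p^2 = F * df_effect / (F * df_effect + df_error)
    """
    return f_value * df_effect / (f_value * df_effect + df_error)


# Group effect at FCz in the 100-200 ms window: F(2,57) = 12.12.
print(round(partial_eta_squared(12.12, 2, 57), 2))
```

For example, F(2,57) = 12.12 implies ηp² ≈ .30, matching the value reported for the 100–200 ms window at FCz. Because a Huynh-Feldt correction scales both degrees of freedom by the same epsilon, the corrected df in Table 2 give the same ηp² as the uncorrected ones would.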
4. Discussion
How children explore and learn from their environment is governed by the developmental trajectory of neural systems for reinforcement learning and cognitive control. Here we assessed these changes using the reward positivity, an ERP component that is proposed to index the impact of DA signals for reinforcement learning on an ACC mechanism for cognitive control (Holroyd and Coles, 2002). We found that reward positivity amplitude did not differ significantly across age groups whose mean ages ranged from about 10 to 20 years. Apparent morphological differences across groups in the raw ERPs, especially over frontal-central areas of the scalp (Fig. 3), appear to reflect differences in ERP component latencies rather than amplitudes (Fig. 1, Fig. 2). For illustrative purposes, we created latency-jitter-corrected images of the waveforms and corresponding scalp distributions that highlight the similarity of the reward positivity across groups.
Contrary to our prediction, these results suggest that the reinforcement learning system reaches maturity in children as young as 10 years of age (the mean age of our youngest group; SD = 1.7 years). Our prediction was based on assumed rates of maturation of the DA system and ACC. However, this subject is still relatively unexplored, and some findings are equivocal. Although there is evidence of continued development of ACC and the DA system (Ordaz et al., 2013, Kuhn et al., 2010) in line with our prediction, one post-mortem study found that development of the DA system reached a plateau by approximately 9 years of age (Haycock et al., 2003). Different regions of ACC also appear to develop at different rates, with caudal and dorsal regions developing earlier than ventromedial regions (Kelly et al., 2009), which would be consistent with the adult-like reward positivity that we observed in our youngest participants.
The few previous studies that examined the reward positivity in children and adolescents yielded mixed findings. Our results are consistent with those of Santesso et al. (2011) and Yi et al. (2012), who reported that N200 amplitude to reward and no-reward feedback did not differ significantly between adolescents and young adults, but these studies did not include younger adolescents or children. By contrast, Hämmerer et al. (2011) reported that the difference in ERPs to gains and losses (measured as a ratio score rather than as a difference wave) was smaller for 9–11-year-old children than for adolescents (13–14 years) and young adults (20–30 years), but did not differ between adolescents and adults. Hämmerer et al. also reported that N200 amplitudes to gains and losses were inversely correlated with age. Similar findings were reported by Eppinger et al. (2009), who observed that the N200 to incorrect feedback was larger in 10–12-year-old children than in older participants, and by Zottoli and Grose-Fifer (2012), who reported that 14–17-year-old male adolescents produced larger N200s than adults in response to both gain and loss stimuli.
Numerous methodological differences across studies make comparisons difficult. Whereas we applied a difference wave approach to isolate the difference between electrophysiological responses to positive and negative feedback, previous studies analyzed the ERPs to reward and no-reward conditions separately. Suggestively, the studies that reported age-related differences in ERPs used a peak-to-peak measurement approach (Eppinger et al., 2009, Hämmerer et al., 2011, Zottoli and Grose-Fifer, 2012), whereas those that did not find a significant effect of age used a peak amplitude approach (Santesso et al., 2011, Yi et al., 2012). A concern with both approaches is that they are relatively susceptible to component overlap (Luck, 2005), which exacerbates measurement artifacts when ERPs that differ in latency or scalp distribution are compared across groups. Here we found that the latencies of several raw ERP components were significantly longer for children than for adolescents and adults, in line with previous findings that children have longer latencies for many ERP components, including the P200 (Johnstone et al., 2005) and N200 (Lamm et al., 2006, Cragg et al., 2009). Further, latency jitter appears to be responsible for producing a temporal pattern of mean voltages in the uncorrected, raw ERPs that differed significantly between the children and the older participants (see Fig. 3). Note that LJC of the raw ERPs revealed a typical P2-N2-P3 sequence in the children (Fig. 1D). The difference wave approach used here may have minimized component overlap, a potential confound associated with the measurement techniques used in previous studies.
Further complicating across-study comparisons is the use of different reinforcement schedules. Children often perform more poorly on probabilistic learning tasks than do adolescents and young adults and for this reason may be less motivated to complete the tasks. For example, Hämmerer et al. (2011) found that children had more difficulty on a probabilistic learning task and also produced a smaller reward positivity. The reduced reward positivity may therefore have resulted from differential engagement of other executive or attentional processes, rather than from differences in the strength of reinforcement learning signals per se. For instance, children who require relatively more trials to reach a learning criterion may have more difficulty sustaining their motivation and attention throughout the task. In the current experiment, all participants completed the same number of trials, which equated the time required to pay attention. Additionally, the 50% reward and no-reward feedback probabilities ensured equal exposure to reward and no-reward feedback across groups, providing an unbiased baseline for assessing the activity of the feedback processing system (Miltner et al., 1997, Holroyd and Coles, 2002).
Consistent with our prediction that reward positivity amplitude would increase with age, Mai et al. (2011) found that the mean amplitudes of ERPs to positive and negative feedback did not differ significantly in 4- and 5-year-old children engaged in a guessing game with 50% probabilistic negative vs. positive feedback. They concluded that the feedback processing system in young children is not yet fully developed (but see Berger et al., 2006). Further, it has been proposed that reinforcement learning signals reach adult maturity in older children and adolescents, such that developmental differences in older children on reinforcement learning tasks result from suboptimal utilization of the signals by a still-immature executive control system (Hämmerer and Eppinger, 2012, van den Bos et al., 2012). Evaluated in this context, our findings suggest that the reinforcement learning system develops relatively quickly between the ages of 5 and 8 years and becomes fully “on-line” by about 8–10 years of age.
This stage of development is characterized by the formation of increasingly complex, hierarchically organized goal-directed actions that span increasingly long periods of time, reflecting the development of self-regulation and a shift from reliance on immediate to delayed reinforcement to guide behavior (Barkley, 1997). A repertoire of relatively elementary behaviors may be learned via a trial-and-error learning process facilitated by an intrinsic motivation to explore (Singh et al., 2005), and subsequently recombined as building blocks to form hierarchical actions that address more difficult problems (Elman et al., 1996). Once learned, hierarchically organized behaviors can enhance computational efficiency by allowing groups of relatively simple actions (like “filling a pot with water” and “placing it on the stove”) to be manipulated at higher levels of abstraction (such as “cooking dinner”) (Botvinick et al., 2009). Particular high-level behaviors can then be selected and deployed according to their learned values, a process that we have previously suggested is mediated by the DA-ACC interface (Holroyd and Yeung, 2012), is reflected in the amplitude of the reward positivity (Holroyd and Coles, 2002), and can be used to apply top-down inhibitory control over other neural systems such as the striatum (Holroyd and McClure, submitted for publication). DA reward signals can reinforce activity at every level of the hierarchy (Holroyd and Coles, 2002, Frank and Badre, 2012) and are ideally positioned to sculpt hierarchical representations in prefrontal structures throughout development (Quartz, 2003). These signals likely shape a reservoir of schemas in ACC that map task contexts and events onto appropriate actions (Euston et al., 2012).
Understood in this context, our present finding that reward positivity amplitude reaches maturity in children as young as 10 years of age suggests that the DA system can facilitate the formation of hierarchical behaviors relatively early in development, providing the framework for self-regulated control over complex behaviors.
Conflict of interest statement
None declared.
Funding
This study was financially supported by a Canadian Institutes of Health Research (CIHR) Operating Grant (#86467). CIHR did not have any role in the design of the study, the collection, analysis or interpretation of data, the writing of the report, or the decision to submit the manuscript for publication.
Acknowledgments
This research was supported by a Canadian Institutes of Health Research Operating Grant (#86467). We would like to thank the research assistants in the Learning and Cognitive Control Laboratory for help with data collection. We also thank the participants and their families for participating in this study.
Contributor Information
Carmen N. Lukie, Email: clukie@uvic.ca.
Somayyeh Montazer-Hojat, Email: somayyeh.montazer.h@gmail.com.
Clay B. Holroyd, Email: holroyd@uvic.ca.
References
- Baker T.E., Holroyd C.B. Which way do I go? Neural activation in response to feedback and spatial processing in a virtual T-maze. Cereb. Cortex. 2009;19:1708–1722. doi: 10.1093/cercor/bhn223.
- Baker T.E., Holroyd C.B. Dissociated roles of the anterior cingulate cortex in reward and conflict processing as revealed by the feedback error-related negativity and N200. Biol. Psychol. 2011;87:25–34. doi: 10.1016/j.biopsycho.2011.01.010.
- Barkley R.A. Guilford Press; New York, USA: 1997. ADHD and the Nature of Self-Control.
- Berger A., Tzur G., Posner M.I. Infant brains detect arithmetic errors. Proc. Natl. Acad. Sci. U.S.A. 2006;103:12649–12653. doi: 10.1073/pnas.0605350103.
- Botvinick M.M., Niv Y., Barto A.G. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition. 2009;113:262–280. doi: 10.1016/j.cognition.2008.08.011.
- Casey B.J., Getz S., Galvan A. The adolescent brain. Dev. Rev. 2008;28:62–77. doi: 10.1016/j.dr.2007.08.003.
- Casey B.J., Tottenham N., Liston C., Durston S. Imaging the developing brain: what have we learned about cognitive development? Trends Cogn. Sci. 2005;9:104–110. doi: 10.1016/j.tics.2005.01.011.
- Coch D., Gullick M.M. Event-related potentials and development. In: Kappenman E.S., Luck S.J., editors. The Oxford Handbook of Event-Related Potential Components. Oxford University Press; New York: 2012. pp. 475–512.
- Cragg L., Fox A., Nation K., Reid C., Anderson M. Neural correlates of successful and partial inhibitions in children: an ERP study. Dev. Psychobiol. 2009;51:533–543. doi: 10.1002/dev.20391.
- Crone E.A., Jennings J., Van der Molen M.W. Developmental change in feedback processing as reflected by phasic heart rate changes. Dev. Psychol. 2004;40:1228–1238. doi: 10.1037/0012-1649.40.6.1228.
- Crone E.A., Zanolie K., Van Leijenhorst L., Westenberg P.M., Rombouts S.A.R.B. Neural mechanisms supporting flexible performance adjustment during development. Cogn. Affect. Behav. Neurosci. 2008;8:165–177. doi: 10.3758/cabn.8.2.165.
- Donchin E., Coles M.G.H. Is the P300 component a manifestation of context updating? Behav. Brain Sci. 1988;11:357–427.
- Elman J.L., Bates E.A., Johnson M.H., Karmiloff-Smith A. MIT Press; Cambridge, MA, USA: 1996. Rethinking innateness: a connectionist perspective on development.
- Eppinger B., Mock B., Kray J. Developmental differences in learning and error processing: evidence from ERPs. Psychophysiology. 2009;46:1043–1053. doi: 10.1111/j.1469-8986.2009.00838.x.
- Euston D.R., Gruber A.J., McNaughton B.L. The role of medial prefrontal cortex in memory and decision making. Neuron. 2012;76:1057–1070. doi: 10.1016/j.neuron.2012.12.002.
- Fjell A.M., Walhovd K., Brown T.T., Kuperman J.M., Chung Y., Hagler D.J. Multimodal imaging of the self-regulating developing brain. Proc. Natl. Acad. Sci. U.S.A. 2012;109:19620–19625. doi: 10.1073/pnas.1208243109.
- Floresco S.B. Prefrontal dopamine and behavioral flexibility: shifting from an “inverted-U” toward a family of functions. Front. Neurosci. 2013;7:62. doi: 10.3389/fnins.2013.00062.
- Frank M.J., Badre D. Mechanisms of hierarchical reinforcement learning in corticostriatal circuits 1: computational analysis. Cereb. Cortex. 2012;22:509–526. doi: 10.1093/cercor/bhr114.
- Fuster J.M. Frontal lobe and cognitive development. J. Neurocytol. 2002;31:373–385. doi: 10.1023/a:1024190429920.
- Geier C.F. Adolescent cognitive control and reward processing: implications for risk taking and substance use. Horm. Behav. 2013;64:333–342. doi: 10.1016/j.yhbeh.2013.02.008.
- Gratton G., Coles M.G.H., Donchin E. A new method for off-line removal of ocular artifact. Electroencephalogr. Clin. Neurophysiol. 1983;55:468–484. doi: 10.1016/0013-4694(83)90135-9.
- Hajihosseini A., Holroyd C.B. Frontal midline theta and N200 amplitude reflect complementary information about expectancy and outcome evaluation. Psychophysiology. 2013;50:550–562. doi: 10.1111/psyp.12040.
- Hämmerer D., Li S., Müller V., Lindenberger U. Life span differences in electrophysiological correlates of monitoring gains and losses during probabilistic reinforcement learning. J. Cogn. Neurosci. 2011;23:579–592. doi: 10.1162/jocn.2010.21475.
- Hämmerer D., Eppinger B. Dopaminergic and prefrontal contributions to reward-based learning and outcome monitoring during child development and aging. Dev. Psychol. 2012;48:862–874. doi: 10.1037/a0027342.
- Haycock J.W., Becker L., Ang L., Furukawa Y., Hornykiewicz O., Kish S.J. Marked disparity between age-related changes in dopamine and other presynaptic dopaminergic markers in human striatum. J. Neurochem. 2003;87:574–585. doi: 10.1046/j.1471-4159.2003.02017.x.
- Hofmann W., Friese M., Strack F. Impulse and self-control from a dual-systems perspective. Perspect. Psychol. Sci. 2009;4:162–176. doi: 10.1111/j.1745-6924.2009.01116.x.
- Holroyd C.B. Theories of anterior cingulate cortex function: opportunity cost. Behav. Brain Sci. 2013;36:693–694. doi: 10.1017/S0140525X13001052.
- Holroyd C.B., Baker T.E., Kerns K.A., Müller U. Electrophysiological evidence of atypical motivation and reward processing in children with attention-deficit hyperactivity disorder. Neuropsychologia. 2008;46:2234–2242. doi: 10.1016/j.neuropsychologia.2008.02.011.
- Holroyd C.B., Coles M.G.H. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol. Rev. 2002;109:679–709. doi: 10.1037/0033-295X.109.4.679.
- Holroyd C.B., McClure S.M. 2014. Hierarchical control over effortful behavior by anterior cingulate cortex. (submitted for publication)
- Holroyd C.B., Pakzad-Vaezi K.L., Krigolson O.E. The feedback correct-related positivity: sensitivity of the event-related brain potential to unexpected positive feedback. Psychophysiology. 2008;45:688–697. doi: 10.1111/j.1469-8986.2008.00668.x.
- Holroyd C.B., Yeung N. Motivation of extended behaviors by anterior cingulate cortex. Trends Cogn. Sci. 2012;16:122–128. doi: 10.1016/j.tics.2011.12.008.
- Huizinga M., Dolan C.V., van der Molen M.W. Age-related change in executive function: developmental trends and a latent variable analysis. Neuropsychologia. 2006;44:2017–2036. doi: 10.1016/j.neuropsychologia.2006.01.010.
- Jasper H.H. The ten twenty electrode system of the international federation. Electroencephalogr. Clin. Neurophysiol. 1958;10:371–375.
- Johnstone S.J., Pleffer C.B., Barry R.J., Clarke A.R., Smith J.L. Development of inhibitory processing during the go/nogo task: a behavioral and event-related potential study of children and adults. J. Psychophysiol. 2005;19:11–23.
- Kelly A.M.C., Di Martino A., Uddin L.Q., Shehzad Z., Gee D.G., Reiss P.T., Milham M.P. Development of anterior cingulate functional connectivity from late childhood to early adulthood. Cereb. Cortex. 2009;19:640–657. doi: 10.1093/cercor/bhn117.
- Kuhn C., Johnson M., Thomae A., Luo B., Simon S.A., Zhou G., Walker Q.D. The emergence of gonadal hormone influences on dopaminergic function during puberty. Horm. Behav. 2010;58:122–137. doi: 10.1016/j.yhbeh.2009.10.015.
- Lamm C., Zelazo P.D., Lewis M.D. Neural correlates of cognitive control in childhood and adolescence: disentangling the contributions of age and executive function. Neuropsychologia. 2006;44:2139–2148. doi: 10.1016/j.neuropsychologia.2005.10.013.
- Luciana M., Wahlstrom D., Porter J.N., Collins P.F. Dopaminergic modulation of incentive motivation in adolescence: age-related changes in signaling, individual differences, and implications for the development of self-regulation. Dev. Psychol. 2012;48:844–861. doi: 10.1037/a0027432.
- Luck S.J. MIT Press; Cambridge, MA: 2005. An Introduction to the Event-Related Potential Technique.
- Mai X., Tardif T., Doan S., Liu C., Gehring W., Luo Y. Brain activity elicited by positive and negative feedback in preschool-aged children. PLoS ONE. 2011;6:e18774. doi: 10.1371/journal.pone.0018774.
- Mars R.B., Sallet J., Rushworth M.F.S., Yeung N. MIT Press; Cambridge, MA: 2011. Neural Basis of Motivational and Cognitive Control.
- Miller E.K., Cohen J.D. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 2001;24:167–202. doi: 10.1146/annurev.neuro.24.1.167.
- Miltner W.H.R., Braun C.H., Coles M.G.H. Event-related brain potentials following incorrect feedback in a time-estimation task: evidence for a ‘generic’ neural system for error detection. J. Cogn. Neurosci. 1997;9:788–798. doi: 10.1162/jocn.1997.9.6.788.
- Nieuwenhuis S., Holroyd C.B., Mol N., Coles M.G.H. Reinforcement-related brain potentials from medial frontal cortex: origins and functional significance. Neurosci. Biobehav. Rev. 2004;28:441–448. doi: 10.1016/j.neubiorev.2004.05.003.
- Niv Y. Cost, benefit, tonic, phasic: what do response rates tell us about dopamine and motivation? Ann. N.Y. Acad. Sci. 2007;1104:357–376. doi: 10.1196/annals.1390.018.
- Ordaz S.J., Foran W., Velanova K., Luna B. Longitudinal growth curves of brain function underlying inhibitory control through adolescence. J. Neurosci. 2013;33:18109–18124. doi: 10.1523/JNEUROSCI.1741-13.2013.
- Quartz S.R. Learning and brain development: a neural constructivist perspective. In: Quinlan P.T., editor. Connectionist Models of Development. Psychology Press; New York: 2003. pp. 279–310.
- Rubia K. Functional brain imaging across development. Eur. Child Adolesc. Psychiatry. 2012. doi: 10.1007/s00787-012-0291-8.
- San Martin R. Event-related potential studies of outcome processing and feedback-guided learning. Front. Hum. Neurosci. 2012;6:304–321. doi: 10.3389/fnhum.2012.00304.
- Santesso D.L., Dzyundzyak A., Segalowitz S.J. Age, sex and individual differences in punishment sensitivity: factors influencing the feedback-related negativity. Psychophysiology. 2011;48:1481–1489. doi: 10.1111/j.1469-8986.2011.01229.x.
- Schultz W., Dayan P., Montague P.R. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593.
- Schultz W. Updating dopamine reward signals. Curr. Opin. Neurobiol. 2013;23:229–238. doi: 10.1016/j.conb.2012.11.012.
- Singh S., Barto A.G., Chentanez N. Intrinsically motivated reinforcement learning. Adv. Neural Inf. Process. Syst. 2005;17:1281–1288.
- Spear L.P. Adolescent neurodevelopment. J. Adolesc. Health. 2013;52:S7–S13. doi: 10.1016/j.jadohealth.2012.05.006.
- Sutton R.S., Barto A.G. MIT Press; Cambridge, MA: 1998. Reinforcement Learning: An Introduction.
- Umemoto A., Holroyd C.B. 2014. Task-specific effects of reward on task switching. (submitted for publication)
- van den Bos W., Cohen M.X., Kahnt T., Crone E.A. Striatum-medial prefrontal cortex connectivity predicts developmental changes in reinforcement learning. Cereb. Cortex. 2012;22:1247–1255. doi: 10.1093/cercor/bhr198.
- Velanova K., Wheeler M.E., Luna B. Maturational changes in anterior cingulate and frontoparietal recruitment support the development of error processing and inhibitory control. Cereb. Cortex. 2008;18:2505–2522. doi: 10.1093/cercor/bhn012.
- Wahlstrom D., Collins P., White T., Luciana M. Developmental changes in dopamine neurotransmission in adolescence: behavioral implications and issues in assessment. Brain Cogn. 2010;72:146–159. doi: 10.1016/j.bandc.2009.10.013.
- Walsh M.M., Anderson J.R. Learning from experience: event-related potential correlates of reward processing, neural adaptation, and behavioral choice. Neurosci. Biobehav. Rev. 2012;36:1870–1884. doi: 10.1016/j.neubiorev.2012.05.008.
- Yi F., Chen H., Wang X., Shi H., Yi J., Zhu X., Yao S. Amplitude and latency of feedback-related negativity: aging and sex differences. NeuroReport. 2012;23:963–969. doi: 10.1097/WNR.0b013e328359d1c4.
- Zottoli T.M., Grose-Fifer J. The feedback-related negativity (FRN) in adolescents. Psychophysiology. 2012;49:413–420. doi: 10.1111/j.1469-8986.2011.01312.x.