Abstract
Decision-making requires balancing exploration with exploitation, yet children are highly exploratory, with exploration decreasing with development. Less is known about what drives these changes. We examined the development of decision-making in 188 3–8-year-old children (M = 64 months; 98 girls) and 26 adults (M = 19 years; 13 women). Children were recruited from ethnically diverse suburban middle-class neighborhoods of Columbus, Ohio, USA. Results indicate that mature reward-based choices emerge relatively late in development, with children tending to over-explore. Computational modeling suggests that this exploration is systematic rather than random, as children tend to avoid repeating choices made on the previous trial. This pattern of exploration (reminiscent of novelty preference) decreased with development, whereas the tendency to exploit increased.
Keywords: Exploration, decision-making, cognitive development, reward, choice
Statement of Relevance
Humans have a crucial need for information in order to make effective and rewarding decisions, and exploration is one critical way in which this information is acquired. Exploration is particularly important in childhood, with the balance between exploration and exploitation changing with development. This research focuses on the developmental trajectory of decision-making from 3 to 8 years of age and into adulthood, with the specific goal of elucidating what changes with development. Decision-making changes from almost exclusive exploration in young children to a focus on rewarding outcomes in older children and adults. Early exploration is systematic rather than random, and its pattern is reminiscent of novelty preference. Such extensive systematic exploration at young ages sets the stage for more focused, efficient information gathering in the future and more effective, rewarding decisions for a lifetime. This trajectory over the lifespan, and prolonged exploration in childhood, may be one key to human intelligence. Overall, this research provides important insight into how humans develop from knowing little of how the world works into active adults who use vast amounts of information to navigate a complex and changing world.
One of the hallmarks of human intelligence is the desire and the ability to acquire vast amounts of information. One way in which this is accomplished is through exploration. However, exploration is not cost free, as it requires an organism to give up other potentially beneficial activities. For example, should one explore promising new ways of getting food or use a tried-and-true way? This problem is known as the exploration/exploitation dilemma (see Hills et al., 2015; Mehlhorn et al., 2015 for reviews). It is a dilemma because neither choice is cost free: opting to explore means forgoing immediate gains, whereas opting to exploit means forgoing information.
The solution to the exploration/exploitation dilemma is complicated because it depends on multiple factors, including the state of the organism (if one is starving, exploring new possible ways of getting food is less valuable than getting food right away), its state of knowledge (it is often unwise to settle on an option without knowing what other options are), and time horizon (forgoing immediate gains by exploring is inefficient if there is little time left to make use of the information acquired). Because of these factors, young children should particularly benefit from exploration because: (1) they know relatively little, (2) their opportunity costs are relatively low, as their immediate needs are satisfied by someone else, and (3) they have longer time horizons (i.e., time to benefit from acquired knowledge) compared to mature individuals.
There is a rich body of literature showing that even very young children can and do actively explore (see Gopnik, 2020; Pelz & Kidd, 2020, for recent reviews). Much of this work focuses on which cues in the environment (such as ambiguous evidence or violation of expectations) cause children to explore (Bonawitz, van Schijndel, Friel, & L. Schulz, 2012; Cook, Goodman, & L. Schulz, 2011; L. Schulz & Bonawitz, 2007; Sim & Xu, 2017; Stahl & Feigenson, 2015).
There is also a growing body of evidence from active learning and decision-making tasks suggesting that children do, indeed, exhibit high levels of exploration, often at the expense of exploitation, with exploration decreasing with age (Blanco & Sloutsky, 2020a, 2020b; Dubois et al., 2022; Giron et al., 2022; Liquin & Gopnik, 2022; Nussenbaum & Hartley, 2019; Meder, Wu, E. Schulz, & Ruggeri, 2021; Pelz & Kidd, 2020; Ruggeri, Lombrozo, Griffith, & Xu, 2016; E. Schulz, Wu, Ruggeri, & Meder, 2019; Sumner et al., 2019; see also Gopnik, 2020, for review). Interestingly, similar tendencies transpire in attention allocation in situations where participants could either attend selectively to goal- or task-relevant information or broadly distribute their attention to many potential sources of information. In general, infants and children (at least until about 7 years of age) tend to broadly distribute their attention, even when their current task requires focusing on one specific source of information (Best, Yim, & Sloutsky, 2013; Blanco & Sloutsky, 2019; Coch, Sanders, & Neville, 2005; Deng & Sloutsky, 2016; Plebanek & Sloutsky, 2017, 2018).
These findings indicate that in both their attention allocation and in their choices, children tend to explore by sampling their environment broadly. By contrast, adults tend to adjust their sampling to their goals: they sample broadly when the goal is to gain information, and they sample narrowly when the goal is to use this information. What drives children’s exploration and what changes with development?
Exploration Across Development: What Changes?
Although there is overall agreement that with development, in general, choices become less exploratory and more exploitative, there is less agreement as to what drives these changes. A number of explanations have been proposed, including: (1) Decrease in intrinsic value of information, (2) Decrease in the randomness of choices, and (3) Decrease in novelty preferences.
Decrease in Intrinsic Value of Information.
There is a long tradition of considering information as having intrinsic value that is independent of rewards (see Loewenstein, 1994 for a review). According to this explanation, information is valuable not because it can be used to increase rewards in the future, but because it is rewarding in itself. For example, E. Schulz et al. (2019) conducted a study with younger children (ages 7–8 years), older children (ages 9–11 years), and adults examining their search for rewards on a spatial grid. Unbeknownst to participants, the grid was characterized by spatial correlation such that similar reward values tended to cluster together. Although children accumulated fewer rewards than adults, they tended to explore more. Subsequent computational modeling suggested that children’s exploration was directed rather than random. As directed exploration decreased with age (see also Meder et al., 2021 and Giron et al., 2022 for similar findings), the authors interpreted this finding as reflecting a developmental decrease in the intrinsic value of information.
While these findings are interesting and provocative, we see some challenges to both the findings and their interpretation. First, the findings themselves are somewhat controversial: whereas these authors found that children exhibited higher levels of directed exploration than adults, other authors (e.g., Somerville et al., 2017) found that directed exploration does not emerge until adolescence. Second, there are previous findings indicating that superficial changes to stimulus salience all but eliminate exploratory behavior in children (Blanco & Sloutsky, 2020b). These findings suggest that systematic exploration in young children is more likely to be driven by attentional mechanisms than by intrinsic value of information, because the latter should not be affected by stimulus salience. And finally, directed exploration has been traditionally considered a strategic, goal-directed process (see Somerville et al., 2017 for review and discussion). Given the immaturity of cognitive control early in development (discussed below), it seems unlikely that decision-making develops from goal-directed information maximization to goal-directed reward maximization. Based on these considerations, we focus here on two other explanations.
Decrease in the randomness of choices.
The second explanation of the developmental decrease in exploration is a developmental increase in the consistency of choices (transpiring as an increase in the inverse temperature parameter in computational models, hence sometimes referred to as “cooling off”). On this account, young children’s choices are somewhat random, and in the course of development this random exploration is eventually replaced by strategic behavior aimed at achieving goals (often, maximizing rewards). We refer to this explanation as the random-to-goal-directed-development hypothesis.
There are multiple possibilities as to why children may explore randomly early in development, exhibiting broad sampling of the problem or search space. First, there are memory-related possibilities. For example, due to age differences in encoding and retention of information, children and adults may have different beliefs about the reward associated with each choice, or children may have greater uncertainty about the reward-choice associations. Although no memory differences capable of explaining the developmental differences in decision-making have been found in previous research (Blanco & Sloutsky, 2020a, 2020b), less efficient encoding and retention early in development could result in elevated levels of random exploration. This is because the weaker the memory is, the less can be inferred from a previous trial about what should happen on the current trial. And random exploration, in turn, could be defined as an absence of trial-to-trial dependency.
Another possibility is that excessive random exploration is a consequence of overall brain immaturity, which results in perceptual, memory, and/or decisional noise, or in an inability to maintain focus or to form and execute an effective strategy. Maturation of perceptual and cognitive systems and the development of control processes result in noise reduction and, subsequently, in less random, more consistent, efficient, strategic, and goal-directed behavior. In fact, Meder et al. (2021; see also Giron et al., 2022) found that random exploration decreases with age. Interestingly, the same group of researchers (E. Schulz et al., 2019) found no evidence of developmental changes in random exploration.
To summarize, the random-to-goal-directed-development hypothesis attributes developmental differences to elevated levels of random exploration early in development rather than to differences in strategy. There is some evidence against the random-to-goal-directed-development possibility (Blanco & Sloutsky, 2020a, 2020b; Gopnik et al., 2017; E. Schulz et al., 2019). For example, in a series of studies (Blanco & Sloutsky, 2020a, 2020b), 4-year-old children repeatedly chose between four options that differed in value, with the goal of collecting points to earn prizes. Surprisingly, most 4-year-olds did not seek maximum reward output, but instead explored extensively by constantly switching among options. Critically, 4-year-old children’s patterns of choices were highly systematic in that they prioritized choosing options that had least recently been tried. In many scenarios such a “least recent” choice strategy ensures that all options are sampled frequently and is likely to minimize uncertainty about the options’ outcomes and maximize information gain. Similarly, using a different task, E. Schulz et al. (2019) found little evidence for a developmental decrease in random exploration. These findings suggest that another possibility should be considered, and we discuss this possibility next.
Decrease in novelty preferences.
It is also possible that children’s consistently elevated exploration and broad sampling across domains stems from a variant of novelty preference – the tendency to avoid repeated sampling of the same region of space or choosing the same option over time (hereafter, “novelty-preference”-to-goal-directed development). Here, exploration is also not a deliberate strategy, but is a bias against repeating recent choices: children have been shown to continue to sample broadly even when such sampling offered no further information gain (Ruggeri et al., 2016). This developmental pattern can reflect developmental differences in either (a) the value of novelty or (b) beliefs about how stable the environment is. Specifically, if one does not know whether the environment is stable, one’s uncertainty about a non-selected option would increase with the time passed since that option was last selected (this is a variant of subjective or epistemic uncertainty, and we discuss it further in the Computational Modeling section). As a result, children may avoid repeatedly choosing the current “best” option, and instead re-sample the options that generate higher epistemic uncertainty.
The primary difference of this type of exploration from random exploration is that in random exploration, there is no trial-to-trial dependency: on every subsequent trial all options have the same probability of being chosen as on every preceding trial. By contrast, in “novelty”-based exploration, there is clear trial-to-trial dependency: on a subsequent trial, the option that was chosen on the preceding trial is less likely to be chosen again. Under this construal, novelty is intertwined with uncertainty, and we discuss this issue in the General Discussion.
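One way to state this distinction formally is sketched below. The notation is ours and is introduced only for illustration: a_t denotes the option chosen on trial t, and i and j index options.

```latex
% Random exploration: the previous choice carries no information about the next one.
P(a_t = i \mid a_{t-1} = j) = P(a_t = i) \quad \text{for all } i, j

% Novelty-based exploration: repeating the previous choice is less likely than baseline.
P(a_t = i \mid a_{t-1} = i) < P(a_t = i)
```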
Importantly, both hypotheses explain early behaviors as primarily stemming from immature cognitive control, including planning, selective attention, and working memory. In addition, the novelty-preference-to-goal-directed-development hypothesis also considers knowledge of the dynamics of the environment as a contributing factor. Critically, if elevated exploration and broad information gathering stem from immature cognitive control, then the development of decision-making may exhibit developmental patterns that are similar to other aspects of cognitive control, such as executive function and meta-cognition (Bunge, Dudukovic, Thomason, Vaidya, & Gabrieli, 2002; Casey, Giedd, & Thomas, 2000; Dufresne & Kobasigawa, 1989; Lockl & Schneider, 2004; O’Leary & Sloutsky, 2017, 2019; Selmeczy & Ghetti, 2019). In this case, we should expect protracted development, with a relatively late transition from predominantly exploratory to goal-directed decision-making.
Importantly, whereas both random-to-goal-directed and novelty-preference-to-goal-directed hypotheses account for broad sampling early in development and predict protracted development of decision-making, they differ in what is posited to be changing with development. The former predicts that decisions should become more systematic with age, whereas the latter predicts that decisions should become less novelty driven.
Consider the earlier mentioned study (E. Schulz et al., 2019), in which adults and older children (ages 7–11) made choices among a large grid of options. The options’ values were spatially correlated such that the rewards tended to be similar for nearby options, and therefore searching locations near those that were already known provided less information than searching locations farther away, allowing for quantification of random compared to directed (i.e., uncertainty-based) exploration. Children explored more broadly than adults, with no developmental differences in random exploration (although see Meder et al., 2021 for a finding of developmental differences in random exploration). Although E. Schulz et al. (2019) identified this broader exploration early in development as directed (i.e., strategic and goal-directed) exploration, these findings can also be accounted for by the novelty preference mechanism, thus providing some (although not unique) support for the novelty-preference-to-goal-directed-development hypothesis.
Present Study
The present study evaluates the two developmental hypotheses (i.e., the random-to-goal-directed-development and novelty-preference-to-goal-directed-development hypotheses) by examining how decision-making changes across development, between the ages of 3 and 8 years, and further into adulthood. To achieve this goal, we conducted an experiment in which participants made choices and accumulated rewards. We analyze their choices and use computational modeling to better understand the nature of developmental change in their pattern of choices.
To foreshadow, our results support the novelty-preference-to-goal-directed hypothesis, demonstrating that the youngest children’s choices were highly exploratory, but also highly systematic. They were no more random than those of older children or adults, but instead they suggest some form of novelty preference. Critically, choices become more exploitative with development (rather than less random), with reward value playing an increasing role in older children’s choices. Importantly, decision-making undergoes protracted development, with even 7-to-8-year-olds differing substantially from adults. We posit that early childhood is a time of information gathering, where choices are directed toward broadly gathering information (cf. Gopnik, 2020). As children gain knowledge and independence and greater cognitive control, they place progressively greater emphasis on maximizing gains and achieving advantageous outcomes.
Method
Participants
Participants were 188 3- to 8-year-old children (M = 64 months; range: 36 to 96 months; 98 girls and 90 boys, as reported by parents), and 26 adults (M = 19 years; range: 18 to 21 years; 13 women and 13 men, all self-reported). Because there is no definitive understanding of the exact profile of developmental change in decision-making across this age range, we aimed for sample sizes at each age (for both children and adults) that at least matched those used in previous studies that showed reliable differences between children and adults (Blanco & Sloutsky, 2020a), or between children of the same age across different conditions (Blanco & Sloutsky, 2020b). Additionally, since adults in previous studies were relatively uniform in their behavior, and the differences between adults and children were very large, a smaller sample of adults was deemed sufficient for the current study. The final sample included 40 3-year-olds, 37 4-year-olds, 40 5-year-olds, 36 6-year-olds, and 35 7-to-8-year-olds. An additional 21 children participated but were excluded because they did not finish the experiment. Children were recruited from preschools, schools, and childcare centers located in middle-class ethnically diverse suburban neighborhoods of Columbus, Ohio, USA, on the basis of permission forms signed and returned by parents. All participants with signed permission forms were included in the sample. Because most children were tested in childcare centers, it was impossible to collect SES or demographic information, unless the parents provided this on permission forms (which was rarely the case). The locations are considered ethnically diverse according to the U.S. Census Bureau Population Estimates Program (PEP; https://www.census.gov): 58.6% White, 29.0% Black or African American, 6.2% Hispanic or Latino, and 5.8% Asian. Adults were undergraduate students from The Ohio State University, who received course credit for participation.
The protocol for this research was approved by The Ohio State University Institutional Review Board. Data were collected between June 2017 and May 2019.
Procedure
Participants completed a simple choice task, framed as a computer game in which they collected virtual candy from different creatures (Figure 1). Their goal was to earn as much candy as possible over the course of the experiment. On each of 100 trials, participants chose one out of the four available creatures and received virtual candy for their choice. The amount of candy received was displayed for 3 s, and then the next trial began. Each option gave a fixed number of candies that remained stable for the duration of the experiment. Three of the options were worth 3, 2, and 1 candy, respectively. The fourth option was substantially better than the other three, being worth 10 candies. The assignment of reward values to creatures, and the location of the creatures, was randomized for each participant but remained stable across the experiment.
Figure 1. Trial dynamics.

A) Four creatures were available for selection, and children selected one by touching it on the screen. B) This choice resulted in candy being earned, which was displayed for 3 s. C) The earned candy was added to a meter that tracked total accumulated candy. When benchmarks were reached, indicated by the white lines on the meter, a congratulatory screen was displayed indicating that the child earned a sticker.
Children made choices by touching the creature on a touchscreen, while adults clicked the creature with a mouse. A meter tracked the total amount of candy that the participant had collected up to that point, which was updated after each trial. For every 180 candies collected, a congratulatory screen was displayed, and the child earned a sticker. Benchmarks on the meter indicated these goals. No stickers were awarded to adults.
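To make the stakes of the task concrete, the following minimal sketch compares the candy totals of two idealized strategies under the reward structure described above. The strategy functions and their names are hypothetical illustrations, not part of the experimental software, and the pure exploiter is an upper bound because a real participant must first discover which creature is worth 10 candies.

```python
# Reward structure from the Procedure: four options worth 10, 3, 2, and 1 candies,
# 100 trials, and one sticker for every 180 candies collected.
REWARDS = [10, 3, 2, 1]
N_TRIALS = 100
CANDIES_PER_STICKER = 180


def total_candy(choices):
    """Sum the candies earned by a sequence of option indices (0-3)."""
    return sum(REWARDS[c] for c in choices)


def always_best():
    """Hypothetical pure exploiter: picks the 10-candy option on every trial."""
    return [0] * N_TRIALS


def round_robin():
    """Hypothetical pure switcher: cycles through all four options in turn."""
    return [t % len(REWARDS) for t in range(N_TRIALS)]


for name, strategy in [("always_best", always_best), ("round_robin", round_robin)]:
    candy = total_candy(strategy())
    print(name, candy, candy // CANDIES_PER_STICKER)
# always_best 1000 5
# round_robin 400 2
```

Under these assumptions, constant switching among all four options yields 400 candies (2 stickers), compared with a ceiling of 1000 candies (5 stickers) for exclusive exploitation.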
Following the main experiment, child participants completed a follow-up memory test. In this test, the experimenter pointed to each location in turn and asked the participant to indicate how many candies that creature gives. This served primarily as a check to evaluate the extent to which children were learning the association between creatures and rewards. Full details of the memory test and analysis of the memory test results are reported in the Supplemental Materials. In summary, the memory test analyses show that the results reported below do not stem from differences in memory accuracy. For each measure of interest, there remained a significant effect of age after accounting for memory accuracy. The entire experiment took 10–15 minutes to complete.
Results
Choice proportions
Given that our statistical analyses focused on hypothesis testing, these analyses are confirmatory in nature. Our first analysis directly examined participants’ choices (Figure 2), particularly how often they chose the highest-value option, as an indicator of exploitation. We first assessed the effect of age in children using a mixed-effects logistic regression predicting choices of the highest-value option from age in months, with participant as a random effect. There was a main effect of age, with older children being more likely to choose the best option, β = 0.71, 95% CI = [0.56, 0.87], z = 8.98, p < 0.001, odds ratio = 2.04. Adults were more likely to choose the best option compared to children, t(212) = 8.81, 95% CI = [0.34, 0.54], p < 0.001, d = 1.84.
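For readers less familiar with logistic regression, the odds ratio is the exponentiated coefficient. The sketch below illustrates the conversion; the function name is ours, and because β is reported to two decimals, the result can differ from the reported odds ratio in the last digit.

```python
import math


def odds_ratio(beta):
    """Convert a logistic-regression coefficient (in log-odds) to an odds ratio."""
    return math.exp(beta)


# Coefficient for age as reported (rounded) in the text.
print(round(odds_ratio(0.71), 2))  # 2.03
```

An odds ratio of about 2 means that the odds of choosing the highest-value option roughly double per unit of the age predictor as scaled in the regression.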
Figure 2. Choice proportions by age and block.

Three-year-olds chose all four options approximately equally often throughout the experiment, while older children showed an increasing tendency to prefer the highest-value option. Adults exploited the highest option almost exclusively after a brief initial exploration of the other options. Error bars represent 95% confidence intervals.
We then examined whether participants increased their tendency to choose the highest-value option over the course of learning, and whether this tendency differed as a function of age. Within each age group a mixed-effects logistic regression predicting choices of the highest-value option from trial number (with participant as a random effect) was performed. All age groups had a significant main effect of trial number, all p’s < 0.002, showing that all groups, even 3-year-olds, adapted their strategy by increasing the proportion of choosing the highest-value option as the experiment progressed. We then tested the interaction of trial number and age to examine if this tendency differed by age within children. A mixed-effects logistic regression with age (in months), trial number, and their interaction predicting choices of the highest-value option (with participant as a random effect) revealed a significant interaction between age and trial number, β = 0.11, 95% CI = [0.08, 0.15], z = 6.37, p < 0.001, odds ratio = 1.12. This interaction suggests that older children increased their tendency to pick the highest-value option more over time than younger children (Figure 2). Overall, these results indicate both an increasing tendency to choose options based on value as children get older and an increasing tendency to optimize their choices (over trials) with respect to value.
Response Patterns
We also examined the pattern of participants’ trial-to-trial responding. Previous work has shown that young children (4–5-year-olds) switch their responses at an extremely high rate, which is indicative of elevated exploration levels (Blanco & Sloutsky, 2020a, 2020b; Sumner et al., 2019). Figure 3 plots the distribution of switch proportions by age group and by block, showing very high rates of switching in young children, very little switching in adults, and large individual variability in older children. We assessed the effect of age in children using a mixed-effects logistic regression predicting switch responses from age in months with participant as a random effect. There was a significant effect of age, with younger children being more likely to switch responses than older children, β = −0.95, 95% CI = [−1.22, −0.67], z = −6.79, p < 0.001, odds ratio = 0.39. Children were also more likely to switch responses compared to adults, t(212) = 8.98, 95% CI = [0.44, 0.68], p < 0.001, d = 1.88.
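The switch measure can be computed from a participant’s choice sequence as sketched below. This is a minimal illustration under the assumption that a switch is any trial on which the chosen option differs from the one chosen on the immediately preceding trial; the function name is hypothetical.

```python
def switch_proportion(choices):
    """Proportion of trials (from the second onward) on which the choice
    differs from the choice made on the previous trial."""
    if len(choices) < 2:
        raise ValueError("need at least two trials to define a switch")
    switches = sum(1 for prev, cur in zip(choices, choices[1:]) if cur != prev)
    return switches / (len(choices) - 1)


# A participant who cycles through all options switches on every trial.
print(switch_proportion([0, 1, 2, 3, 0, 1, 2, 3]))  # 1.0
# A participant who settles on one option after brief sampling switches rarely.
print(switch_proportion([0, 1, 0, 0, 0, 0]))  # 0.4
```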
Figure 3. Response switching.

A) Young children (3–4-year-olds) tended to switch on almost every trial, while an increasing number of children in older age groups exhibited lower levels of switching. Adults almost never switched; instead, they continually exploited the highest-value option. B) There was an increasing tendency with age to adapt behavior over time, transitioning to lower levels of switching (i.e., exploration) throughout the experiment. Error bars represent 95% confidence intervals.
We then assessed the main effect of trial within each age group to examine whether participants adjusted their strategy over time. Mixed-effect logistic regressions (with participant as a random effect) indicated that participants in all age groups significantly decreased their tendency to switch as the experiment progressed, all p’s < 0.001. To evaluate the interaction in children, we performed a mixed-effects logistic regression that used age, trial number, and their interaction to predict switching, with participant as a random effect. This analysis revealed a significant interaction, β = −0.10, 95% CI = [−0.15, −0.05], z = −4.09, p < 0.001, odds ratio = 0.91. This interaction suggests that older participants exhibited a greater tendency to decrease exploration (by switching less) over the course of the experiment (Figure 3B). However, the distributions of switch proportions (Figure 3A) show that even older children (i.e., many individual 7-to-8-year-olds) tended to switch at high levels, unlike adults who almost universally exploited the highest option (resulting in extremely low switch proportions), suggesting that substantial developmental change occurs between age 7 and adulthood.
Computational Modeling
The previous section analyzed switching behavior as an indirect measure of level of exploration, but switch proportions alone cannot establish whether that switching was random or systematic. Importantly, previous work has shown that the majority of 4–5-year-olds’ choices were characterized by highly systematic switching, where on any given trial children are most likely to select the least recently chosen option (Blanco & Sloutsky, 2020a, 2020b). Understanding the development of this behavior across childhood is critical to understanding how decision-making and exploration develop in humans. It is important to examine to what extent other age groups adopt this strategy, as compared to more reward-based or random strategies.
We propose that such “systematic switching” may be a particularly effective and efficient way of exploring early in life, in that it allows a type of systematic exploration that often approximates uncertainty-driven exploration but requires limited prior knowledge and computational capacity. In general, the longer it has been since an option was last sampled, the greater the subjective (or epistemic) uncertainty about the option’s outcome, since the environment may be changing over time. Therefore, an agent using this type of systematic exploration will often select the option with the greatest uncertainty from a set of options, even if it is not tracking uncertainty directly. Although rewards in the current task are stable, participants cannot know that they will remain so throughout the experiment (i.e., an option’s reward could change after it was last selected), as is the case in many real-world environments; their epistemic uncertainty should therefore still follow a pattern wherein uncertainty increases with time. Such epistemic uncertainty reflects the state of the observer. As such, it stands in contrast with more objective (or aleatory) uncertainty (e.g., what the outcome of rolling a fair die will be), which reflects the state of a random system (see Ülkümen, Fox, & Malle, 2016, for related arguments).
To evaluate the relative contribution of epistemic uncertainty to participants’ choices, we applied a simple computational model developed to tease apart the relative contributions of reward value, random exploration, and systematic exploration in determining choices. Choice probabilities in the model are a function of both the expected values of the options and their choice lags, where choice lag is simply the number of trials since an option was last chosen. Choice lag serves as a proxy for epistemic uncertainty in the model, since (as discussed above) uncertainty increases as a function of time since an option was last checked. The relative utilities for the choice options in the model were determined by the following equation:
$$U_{i,t} = (1 - \phi)\,V_{i,t} + \phi\,L_{i,t} \tag{1}$$
where $V_{i,t}$ is the expected reward value of option i on trial t, and $L_{i,t}$ is the lag term encoding the number of trials since option i was last chosen. The free parameter ϕ (0 ≤ ϕ ≤ 1) mediates the relative weights of expected value (i.e., exploitation) and lag (i.e., systematic exploration) on choices. The model is initialized with all $V_i$ set to 0, but $V_i$ is updated to the actual reward value once option i is chosen and its value is experienced. Utilities were converted to probabilities of choosing each option using a Softmax choice rule via the following equation:
$$P(a_{i,t}) = \frac{\exp(\gamma\,U_{i,t})}{\sum_{j} \exp(\gamma\,U_{j,t})} \tag{2}$$
where $P(a_{i,t})$ is the probability of choosing option i on trial t, and γ is a free parameter. Together, the model’s two free parameters, ϕ and γ, determine the levels of exploitation, systematic exploration, and random exploration. Higher values of ϕ indicate greater influence of the lag on choices, and hence more systematic exploration. When ϕ is 0, the model chooses based only on expected value; when ϕ is 1, it chooses based only on the lag. γ is the inverse temperature parameter, which controls the extent to which choices are deterministic or stochastic (Sutton & Barto, 1998). Lower values of γ capture more “random” choices (i.e., random exploration) and greater values of γ capture greater consistency of choices (regardless of whether they are driven by value or by lag). This model is similar to “exploration bonus” models (Daw et al., 2006; Kakade & Dayan, 2002) and to some trial-to-trial dependency models (Cogliati Dezza, Yu, Cleeremans, & Alexander, 2017; Meder et al., 2021), but with lag serving as a proxy for epistemic uncertainty and with the learning rate set equal to 1. The model was fit to each participant’s data by finding the set of parameters that maximized the likelihood of the data given the model.
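The model and its likelihood can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the implementation used for the reported fits: the utility is assumed to be the convex combination (1 − ϕ)V + ϕL, which matches the stated roles of ϕ; all lags are assumed to start at 0 (the treatment of never-chosen options is not specified in the text); γ is assumed to be non-negative; and a coarse grid search stands in for the unspecified optimizer. All function names are hypothetical.

```python
import math


def log_choice_probs(values, lags, phi, gamma):
    """Log of the softmax choice probabilities over utilities
    U_i = (1 - phi) * V_i + phi * L_i (ASSUMED form), with gamma >= 0."""
    utilities = [(1 - phi) * v + phi * lag for v, lag in zip(values, lags)]
    top = max(utilities)  # subtract the maximum for numerical stability
    scaled = [gamma * (u - top) for u in utilities]
    log_norm = math.log(sum(math.exp(s) for s in scaled))
    return [s - log_norm for s in scaled]


def log_likelihood(choices, rewards, phi, gamma, n_options=4):
    """Log-likelihood of one participant's choice sequence under the model."""
    values = [0.0] * n_options  # all V_i start at 0, as stated in the text
    lags = [0] * n_options      # ASSUMPTION: all lags start at 0
    total = 0.0
    for choice, reward in zip(choices, rewards):
        total += log_choice_probs(values, lags, phi, gamma)[choice]
        values[choice] = reward           # learning rate of 1
        lags = [lag + 1 for lag in lags]  # every option ages by one trial...
        lags[choice] = 0                  # ...except the one just chosen
    return total


def fit_grid(choices, rewards, phis, gammas):
    """Maximum-likelihood (phi, gamma) found by exhaustive grid search."""
    grid = [(phi, gamma) for phi in phis for gamma in gammas]
    return max(grid, key=lambda p: log_likelihood(choices, rewards, p[0], p[1]))


# Usage: a simulated participant who cycles through the four options.
reward_of = [10, 3, 2, 1]
choices = [0, 1, 2, 3] * 5
rewards = [reward_of[c] for c in choices]
print(fit_grid(choices, rewards, [0.0, 0.5, 1.0], [0.1, 1.0, 5.0]))
```

Because the softmax depends only on differences among utilities, adding a constant to every lag leaves the choice probabilities unchanged, so counting the just-chosen option’s lag as 0 or as 1 matters only relative to options that have never been chosen.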
We first examined the effect of age on best-fitting parameter values (Figure 4). A logistic regression predicting the best-fitting value of ϕ from age in months in children revealed a significant effect, β = −0.63, 95% CI = [−1.90, −0.55], z = 3.79, p < 0.001, odds ratio = 0.53, with ϕ being higher for younger children than older children. Children also had a higher ϕ than adults, t(212) = 3.88, 95% CI = [0.15, 0.45], p < 0.001, d = 0.81. A linear regression predicting γ from age in children did not find a significant effect, β = 3.95, 95% CI = [−3.40, 11.30], t(186) = 1.06, p = 0.290. Children and adults did not differ in γ, t(212) = 0.516, 95% CI = [−25.75, 15.07], p = 0.607, d = 0.11. Therefore, while the relative influence of systematic exploration compared to reward value decreased with age, the level of random exploration remained stable across ages.
Figure 4. Model Parameters: systematic and random exploration.

The relative influence of systematic exploration on choices (compared to reward value) was high in young children and decreased with age, while the level of random exploration was equivalent across ages.
The stable level of randomness across the age groups contrasts with recent findings demonstrating a decrease in randomness across development (Giron et al., 2022; Meder et al., 2021). However, these researchers used a substantially more complicated task than ours (but see E. Schulz et al., 2019, who used the same complicated task and did not find an age-related decrease in randomness). Importantly, our results indicate that while adults’ and older children’s choices were primarily based on value (as evidenced by low values of ϕ), younger children showed a greater influence of systematic exploration (as evidenced by higher values of ϕ).
Examination of the best-fitting parameters of the model indicates that, while many participants were best-fit by intermediate values of ϕ, many others were best-fit by ϕ values near 0 or 1 (see Supplemental Figure 8). This suggests that these latter participants may be affected by only reward value or only choice lag in determining an option’s utility, rather than a combination of the two. To test whether these “single-process” models characterize participants’ choices, we fit simplified versions of the model that utilized only reward value (ϕ = 0) or only choice lag (ϕ = 1) to make decisions, henceforth referred to as the Value and Lag models, respectively. We compared these two models to the Full model described above and a Random model that chose all options with equal probability regardless of rewards or lags. We determined the best-fitting of these four models using the Akaike Information Criterion (AIC; Akaike, 1974). As shown in Figure 5, all of the adults and most of the older children were best-fit by the Value model, with the Full model being the second most frequent. In contrast, most of the 3- and 4-year-olds were best-fit by the Lag model, and few were best-fit by the Value model. While some children were best-fit by the Random model, these were an extreme minority. Even among 3-year-olds, fewer than 10% were best-fit by the Random model. Fisher’s exact tests (two-sided) suggest that there are three distinct groups. The model fit proportions of 3-year-olds were not different from those of 4-year-olds (p = 0.612), but those of 4-year-olds were different from those of 5-year-olds (p < 0.001). The proportions of 5-year-olds were not different from those of 6-year-olds (p = 0.426), and those of 6-year-olds were not different from those of 7-to-8-year-olds (p = 0.201), though 5-year-olds were marginally different from 7-to-8-year-olds (p = 0.044). Adults were different from all groups of children (all p’s ≤ 0.001).
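The AIC comparison among the four models can be illustrated with a short Python sketch. The parameter counts below are an assumption inferred from the model descriptions (the Full model estimates ϕ and γ, the Value and Lag models fix ϕ and estimate only γ, and the Random model has no free parameters). The log-likelihood values, the trial count, and the number of options are hypothetical numbers chosen for the example, not results from the study.

```python
import math

# Assumed number of free parameters per model (see the lead-in above).
N_PARAMS = {"Full": 2, "Value": 1, "Lag": 1, "Random": 0}

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

def random_log_likelihood(n_trials, n_options):
    """Log-likelihood of a model that picks every option with equal probability."""
    return n_trials * math.log(1.0 / n_options)

def best_model(log_likelihoods):
    """Return the name of the model with the lowest AIC and all AIC scores."""
    scores = {name: aic(ll, N_PARAMS[name]) for name, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores

# Hypothetical maximized log-likelihoods for one participant (100 trials, 4 options).
example = {
    "Full": -40.0,
    "Value": -40.5,
    "Lag": -80.0,
    "Random": random_log_likelihood(100, 4),
}
winner, scores = best_model(example)
print(winner, scores)
```

In this made-up case the Full model fits slightly better than the Value model, but not by enough to offset the penalty for its extra parameter, so the Value model is selected.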
See Supplemental Materials for model recovery analyses and comparisons of model simulations to participants’ data that show that the behavior of the models matches that of participants reasonably well.
Figure 5.

Best-fitting model by age group.
General Discussion
Young children have been shown to exhibit highly exploratory behavior, with exploration decreasing with age. However, there is less clarity about what drives these developmental changes. To address this issue, we examined decision-making in 3–8-year-old children and adults. Our results reveal a developmental trajectory across childhood, from almost exclusively exploring (in 3-year-olds) toward almost exclusive focusing on gains (in adults). Decision-making appears to undergo protracted development similar to other abilities dependent on cognitive control (e.g., selective attention, executive function, or metacognition), with even 7-to-8-year-olds being substantially different from adults. Importantly, decision making progresses from exploratory to exploitative rather than from random to systematic: even in 3-year-olds, choices were not random. In fact, the influence of random exploration was low for all ages and did not differ by age. At the same time, children of all ages reduced exploration over time, at least to some extent, while increasing exploitation, which suggests that their exploratory behavior is not driven by non-sensitivity or indifference to value.
Our results indicate that there are two crucial time points of developmental change, which may reflect changes in needs occurring across development. Across various measures, 3–4-year-olds differed from 5–8-year-olds, who, in turn, differed from adults. Three- and 4-year-olds tended to choose all options equally often, regardless of reward value, switching between options on almost every trial in a highly systematic fashion that prioritized options that were less recently selected. This lag-based switching strategy is computationally simple, as it is akin to graded novelty preference, yet it leads to broad sampling of the environment and often approximates uncertainty-based exploration. This behavior may comprise an effective information gathering approach early in life, when the value of learning outweighs the need for immediate gains. In contrast, adults were overwhelmingly exploitative, basing their decisions on reward value and rarely exploring.
Five-to-eight-year-olds were between these two extremes and were largely characterized by substantial individual variability: some 5–8-year-olds explored as frequently as younger children, whereas others explored as rarely as adults (Figure 3A). The relatively uniform behavior among adults and among the youngest children, coupled with large individual variability in 5–8-year-olds, indicates that this could be a transitional age group. Understanding why the developmental timeline differs between individual 5–8-year-olds in these important behaviors could provide key insights into the mechanisms underlying this transition. These differences between children may be the result of maturational factors, experiential factors, or a combination of the two. Future longitudinal studies coupled with neuroimaging examining brain development would help address these important questions.
We propose that this “systematic switching” observed in most 3- and 4-year-olds may be a particularly effective and efficient way of exploring early in life, in that it allows a type of systematic exploration that often approximates uncertainty-driven exploration but requires limited prior knowledge and computational capacity. In general, the longer it has been since an option was last sampled, the greater the epistemic uncertainty in the option’s outcome, since the environment may be changing over time. This situation is not dissimilar from the one in which a person puts a valuable object in a particular location. Even if the person has encoded what was hidden and where, the mere passage of time may decrease the person’s confidence that the item is still there. This is because (a) the environment could have changed, with someone either taking the item or moving it to a different location, and (b) memory may become less robust with time. Although rewards are stable in the experiment, participants cannot know this without checking each location. Therefore, their epistemic uncertainty about future reward in each given location should increase as a function of lag. An agent using the “systematic switching” strategy will often select the option with the greatest epistemic uncertainty from a set of options, even if they are not tracking objective uncertainty.
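The way a purely lag-based strategy produces broad, even sampling can be illustrated with a minimal Python simulation. The agent below is a deterministic idealization (always choosing the option with the longest lag, with ties broken by the lowest index), which is an assumption made for this example rather than a description of any participant or of the fitted model. The function name and the lag bookkeeping are likewise hypothetical.

```python
from collections import Counter

def systematic_switching(n_options, n_trials):
    """Simulate an agent that always picks the least recently chosen option."""
    lags = [0] * n_options  # assumption: all lags start at 0
    choices = []
    for _ in range(n_trials):
        # max() returns the first option with the largest lag (lowest index on ties)
        choice = max(range(n_options), key=lambda i: lags[i])
        choices.append(choice)
        lags = [lag + 1 for lag in lags]  # every option ages by one trial
        lags[choice] = 0                  # the chosen option was just checked
    return choices

sequence = systematic_switching(n_options=4, n_trials=8)
print(sequence)           # the agent cycles through the options in order
print(Counter(sequence))  # every option is sampled equally often
```

Because the option with the longest lag is also the one checked least recently, this agent never repeats its previous choice and visits all options equally often without tracking reward or uncertainty explicitly.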
The reported results present evidence against the random-to-goal-directed hypothesis. At the same time, because uncertainty reduction could be a variant of novelty preference, they seem to support the novelty-preference-to-goal-directed hypothesis. However, these results also raise an important theoretical question regarding the mechanism of early exploratory behaviors. Is early exploration driven primarily by novelty preference that manifests itself as the tendency to reduce uncertainty? Or is it driven by a tendency to reduce uncertainty that appears as novelty preference? Given that novelty preference could be biologically and computationally simpler (as it could be a result of familiarization or habituation) and given that uncertainty-based exploration is present in 3–4-year-olds and decreases with age, we believe that the former is the case. However, much additional research is needed to answer this important theoretical question.
Limitations and Future Directions
Research presented here advances our understanding of the development of decision making and balancing of exploration and exploitation. At the same time, it leaves a number of questions unanswered, thus offering avenues for future research.
One very general unanswered question is the generalizability of the reported findings outside the population represented in the present study (our sample was drawn from a Western, industrialized, educated, middle class population). As our confidence in the replicability of the results in the current population increases, replications outside of this population become increasingly informative.
There are a number of more specific unanswered questions that warrant further research. First, it could be argued that the present research may confound the lag and frequency of options, as these are not independent but are inversely related. Indeed, as the lag of a given option approaches infinity, the frequency of the option approaches zero. While this is true in principle, it is not the case in practice. This is because the lag term in children is never very large, peaking at about 4 trials (see Blanco & Sloutsky, 2020a, for the analysis of hazard rates in choices). Second, analyses of choices of individual child participants indicate that, unless participants select the highest value (in which case they will be best-fit by the Value model), individual participants tend to select all options equally frequently (see Blanco & Sloutsky, 2020a, Figure 3).
At the same time, while the frequency of all options may be equivalent for the entire experiment (thus making lag independent of frequency for the entire experiment), lag and frequency may be related within a given (relatively short) time window. Therefore, it would be important to conduct a study in which effects of frequency are estimated independently of the lag. We are currently in the process of conducting such a study.
Third, the current research hypothesizes that immature cognitive control early in development results in elevated exploratory behavior. Presently, there is only indirect support for this link. First, there is evidence in attentional, memory, and category learning tasks that, similar to decision-making tasks, young children tend to “over explore” visual stimuli instead of attending selectively to task- and goal-relevant aspects of the stimuli. Specifically, in contrast to adults who notice and remember mostly task- and goal-relevant information, young children notice and remember both relevant and irrelevant information (Deng & Sloutsky, 2016; Plebanek & Sloutsky, 2017). As a result, when tested on less relevant information, young children’s performance often exceeds that of adults. Interestingly, aging-based decline in cognitive control often results in performance similar to that observed early in development. In particular, older adults are often more likely to notice and remember task-irrelevant information than younger adults (see Healey, Campbell, & Hasher, 2008, for a review). Although aging research offers supporting evidence for the possibility of the link between cognitive control and (visual) exploration, systematic longitudinal investigations are needed to firmly establish the link between cognitive control and exploratory behaviors.
Another interesting avenue for future research is to examine individual differences in exploratory behavior, potentially linking these to the “predictability” of the home environment. This link is possible given recent findings that children’s beliefs about whether the environment is stable or not (inferred from the experimenter’s behavior) may affect their choices in the delay of gratification task (Kidd, Palmeri, & Aslin, 2013).
We also believe that the modeling approach could be taken further in future research, as the current approach has some limitations. Recall that we first evaluated the Full model and estimated values of ϕ from that model. We then used these parameter estimates in our statistical analyses. However, it could be argued that because only some, but not all, participants were best-fit by the Full model, this analysis leaves some uncertainty. This limitation is mitigated somewhat by the fact that some child participants have values of ϕ approaching 0, some have values approaching 1, and some have intermediate values (see Supplemental Figure 8). We therefore believe that the Full model, which is capable of estimating the whole range of ϕ values, is an adequate tool for estimating these values for the entire sample. Also, as shown in Supplemental Materials (Supplemental Figures 6 and 7), parameter and model recovery were adequate but not exceptional. Whereas some of these issues are expected when modeling children and with current parameter distributions (see Supplemental Figure 8), these issues should be examined systematically in future research.
Additionally, the current modeling approach mostly accounts for general patterns of choices, but not the learning curves, which are likely to differ both between individuals and between age groups. Our task was not designed for in-depth examination of incremental learning over time, and our models likewise ignore this aspect of the decision-making process. Future research may consider an experimental and modeling approach that is capable of accounting for both learning and choice patterns simultaneously.
And finally, the reported results diverge somewhat from the recently reported findings that increasing cognitive load or imposing time pressure in adults resulted in a decrease in directed exploration and an increase in either random responding or repeating the previously chosen option (Cogliati Dezza, Cleeremans, & Alexander, 2019; Wu, Schulz, Pleskac, & Speekenbrink, 2022). Although the extent to which an experimentally manipulated resource limitation approximates development is not clear, the differences in patterns between the present research and the previously reported findings are worth discussing.
The central manipulation in the Cogliati Dezza et al. (2019) paper was to introduce a concurrent task with either high working memory demands (i.e., the High Load condition) or with low working memory demands (i.e., the Low Load condition). Importantly, in contrast to the present research, in the Cogliati Dezza et al. (2019) paper memory for options was not measured (note that memory for the concurrent task items was substantially lower in the High Load condition). Therefore, it is possible that the High Load condition resulted in attenuated memory for options in the main task, leading in turn to the higher proportion of random responding. By contrast, in our work, we explicitly demonstrate that the reported developmental differences do not stem from developmental differences in memory accuracy.
Wu et al. (2022) introduced time pressure and found that this manipulation resulted in a reduction in both directed exploration and value-based responding in adults. Instead, participants repeated previous choices more often. Here it is possible that time limits eliminated uncertainty-based responding in adults because uncertainty generates slower responses, and there was simply not enough time to estimate uncertainty. Therefore, because epistemic uncertainty should generate slower responses in children (O’Leary & Sloutsky, 2017), imposing time limits may result in an increasing proportion of random responding in children. This is an interesting possibility because it can contribute to understanding whether early exploratory behavior is driven by novelty preference or by uncertainty reduction. This issue, as well as an analysis of response times in children’s choices, offers exciting directions for future research.
Conclusion
Overall, this research elucidates how children balance exploration and exploitation and how this balance changes with development. Young children exhibited predominantly exploratory behavior, with exploitation-driven decision-making emerging gradually in the course of development. At the same time, these young children exhibited sophisticated behaviors, both using systematic exploration and adapting this exploration over the course of the task, which suggests that their behavior is guided primarily by a drive to explore rather than an inability to prioritize rewarding outcomes. That the development of decision-making follows a developmental trajectory similar to that of other cognitive control-dependent processes suggests that immature control may play a role in facilitating exploration. Moreover, it offers one potential explanation for why humans exhibit such extended periods of immaturity compared to other animals. The extensive (and systematic) exploration at early ages sets the stage for more focused, planful, and efficient information gathering in the future and more effective, rewarding decisions for a lifetime. Investigating how this process unfolds provides a new window into how children transition from knowing very little of the world to effectively navigating it while acquiring experience and information that will continue to guide them throughout their lives.
Supplementary Material
Open Practices Statement.
The experiment reported in this article was not formally preregistered. The data reported in this article are available on the Open Science Framework at https://osf.io/pvjak/?view_only=79a992b41f0b4d6eb22c7d59f8bcbca9
Funding:
This research was supported by National Institutes of Health grant R01HD078545 to V.M.S.
Footnotes
The reported test statistics for all regression analyses are based on standardized predictors (i.e. the predictors were converted to z scores).
References
- Akaike H (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723. doi: 10.1109/TAC.1974.1100705
- Best CA, Yim H, & Sloutsky VM (2013). The cost of selective attention in category learning: Developmental differences between adults and infants. Journal of Experimental Child Psychology, 116, 105–119. doi: 10.1016/j.jecp.2013.05.002
- Blanco NJ, & Sloutsky VM (2019). Adaptive flexibility in category learning? Young children exhibit smaller costs of selective attention than adults. Developmental Psychology, 55(10), 2060–2076. doi: 10.1037/dev0000777
- Blanco NJ, & Sloutsky VM (2020a). Systematic exploration and uncertainty dominate young children’s choices. Developmental Science, 00:e13026. doi: 10.1111/desc.13026
- Blanco NJ, & Sloutsky VM (2020b). Attentional mechanisms drive systematic exploration in young children. Cognition, 202, 104327.
- Bonawitz EB, van Schijndel TJ, Friel D, & Schulz L (2012). Children balance theories and evidence in exploration, explanation, and learning. Cognitive Psychology, 64, 215–234. doi: 10.1016/j.cogpsych.2011.12.002
- Bunge SA, Dudukovic NM, Thomason ME, Vaidya CJ, & Gabrieli JD (2002). Immature frontal lobe contributions to cognitive control in children: Evidence from fMRI. Neuron, 33, 301–311. doi: 10.1016/S0896-6273(01)00583-9
- Casey BJ, Giedd JN, & Thomas KM (2000). Structural and functional brain development and its relation to cognitive development. Biological Psychology, 54, 241–257. doi: 10.1016/S0301-0511(00)00058-2
- Coch D, Sanders LD, & Neville HJ (2005). An event-related potential study of selective auditory attention in children and adults. Journal of Cognitive Neuroscience, 17, 605–622. doi: 10.1162/0898929053467631
- Cogliati Dezza I, Yu AJ, Cleeremans A, & Alexander W (2017). Learning the value of information and reward over time when solving exploration-exploitation problems. Scientific Reports, 7, 16919.
- Cogliati Dezza I, Cleeremans A, & Alexander W (2019). Should we control? The interplay between cognitive control and information integration in the resolution of the exploration-exploitation dilemma. Journal of Experimental Psychology: General, 148, 977–993.
- Cook C, Goodman ND, & Schulz LE (2011). Where science starts: Spontaneous experiments in preschoolers’ exploratory play. Cognition, 120, 341–349. doi: 10.1016/j.cognition.2011.03.003
- Daw ND, O’Doherty JP, Dayan P, Seymour B, & Dolan RJ (2006). Cortical substrates for exploratory decisions in humans. Nature, 441, 876–879. doi: 10.1038/nature04766
- Deng W, & Sloutsky VM (2016). Selective attention, diffused attention, and the development of categorization. Cognitive Psychology, 91, 24–62. doi: 10.1016/j.cogpsych.2016.09.002
- Dubois M, Bowler A, Moses-Payne ME, Habicht J, Moran R, Steinbeis N, & Hauser TU (2022). Exploration heuristics decrease during youth. Cognitive, Affective, & Behavioral Neuroscience, 22, 969–983.
- Dufresne A, & Kobasigawa A (1989). Children’s spontaneous allocation of study time: Differential and sufficient aspects. Journal of Experimental Child Psychology, 47, 274–296.
- Healey MK, Campbell KL, & Hasher L (2008). Cognitive aging and increased distractibility: Costs and potential benefits. Progress in Brain Research, 169, 353–363. doi: 10.1016/s0079-6123(07)00022-2
- Hills TT, Todd PM, Lazer D, Redish AD, Couzin ID, & Cognitive Search Research Group. (2015). Trends in Cognitive Sciences, 19, 46–54. doi: 10.1016/j.tics.2014.10.004
- Giron AP, Ciranka S, Schulz E, van den Bos W, Ruggeri A, Meder B, & Wu CM (2022). Developmental changes resemble stochastic optimization. PsyArXiv. https://psyarxiv.com/9f4k3/?ref=https://githubhelp.com
- Gopnik A (2020). Childhood as a solution to explore-exploit tensions. Philosophical Transactions of the Royal Society B: Biological Sciences, 375(1803), 20190502.
- Gopnik A, O’Grady S, Lucas CG, Griffiths TL, Wente A, Bridgers S, Aboody R, Fung H, & Dahl RE (2017). Changes in cognitive flexibility and hypothesis search across human life history from childhood to adolescence to adulthood. Proceedings of the National Academy of Sciences, 114, 7892–7899. doi: 10.1073/pnas.1700811114
- Kakade S, & Dayan P (2002). Dopamine: Generalization and bonuses. Neural Networks, 15, 549–559. doi: 10.1016/S0893-6080(02)00048-5
- Kidd C, Palmeri H, & Aslin RN (2013). Rational snacking: Young children’s decision-making on the marshmallow task is moderated by beliefs about environmental reliability. Cognition, 126, 109–114. doi: 10.1016/j.cognition.2012.08.004
- Loewenstein G (1994). The psychology of curiosity: A review and reinterpretation. Psychological Bulletin, 116, 75–98. doi: 10.1037/0033-2909.116.1.75
- Liquin EG, & Gopnik A (2022). Children are more exploratory and learn more than adults in an approach-avoid task. Cognition, 218, 104940.
- Lockl K, & Schneider W (2004). The effects of incentives and instructions on children’s allocation of study time. European Journal of Developmental Psychology, 1, 153–169.
- Meder B, Wu CM, Schulz E, & Ruggeri A (2021). Development of directed and random exploration in children. Developmental Science, 24, e13095. doi: 10.1111/desc.13095
- Mehlhorn K, Newell BR, Todd PM, Lee MD, Morgan K, Braithwaite VA, Hausmann D, Fiedler K, & Gonzalez C (2015). Unpacking the exploration–exploitation tradeoff: A synthesis of human and animal literatures. Decision, 2, 191–215. doi: 10.1037/dec0000033
- O’Leary AP, & Sloutsky VM (2017). Carving metacognition at its joints: Protracted development of component processes. Child Development, 88, 1015–1032. doi: 10.1111/cdev.12644
- O’Leary AP, & Sloutsky VM (2019). Components of metacognition can function independently across development. Developmental Psychology, 55, 315.
- Pelz M, & Kidd C (2020). The elaboration of exploratory play. Philosophical Transactions of the Royal Society B: Biological Sciences, 375(1803), 20190503. doi: 10.1098/rstb.2019.0503
- Plebanek DJ, & Sloutsky VM (2017). Costs of selective attention: When children notice what adults miss. Psychological Science, 28, 723–732. doi: 10.1177/0956797617693005
- Plebanek DJ, & Sloutsky VM (2018). Selective attention, filtering, and the development of working memory. Developmental Science, 164, e12727–12. doi: 10.1111/desc.12727
- Ruggeri A, Lombrozo T, Griffiths TL, & Xu F (2016). Sources of developmental change in the efficiency of information search. Developmental Psychology, 52, 2159–2173. doi: 10.1037/dev0000240
- Schulz LE, & Bonawitz EB (2007). Serious fun: Preschoolers engage in more exploratory play when evidence is confounded. Developmental Psychology, 43, 1045–1050. doi: 10.1037/0012-1649.43.4.1045
- Schulz E, Wu CM, Ruggeri A, & Meder B (2019). Searching for rewards like a child means less generalization and more directed exploration. Psychological Science, 30(11), 1561–1572.
- Selmeczy D, & Ghetti S (2019). Metacognition. The Encyclopedia of Child and Adolescent Development, 1–10.
- Sim ZL, & Xu F (2017). Infants preferentially approach and explore the unexpected. British Journal of Developmental Psychology, 35, 596–608. doi: 10.1111/bjdp.1219
- Somerville LH, Sasse SF, Garrad MC, Drysdale AT, Abi Akar N, Insel C, & Wilson RC (2017). Charting the expansion of strategic exploratory behavior during adolescence. Journal of Experimental Psychology: General, 146, 155–164. doi: 10.1037/xge0000250
- Stahl AE, & Feigenson L (2015). Observing the unexpected enhances infants’ learning and exploration. Science, 348, 91–94. doi: 10.1126/science.aaa3799
- Sumner E, Li AX, Perfors A, Hayes B, Navarro D, & Sarnecka BW (2019, September 4). The Exploration Advantage: Children’s instinct to explore allows them to find information that adults miss. PsyArXiv. Retrieved from https://psyarxiv.com/h437v. doi: 10.31234/osf.io/h437v
- Sutton RS, & Barto AG (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
- Ülkümen G, Fox CR, & Malle BF (2016). Two dimensions of subjective uncertainty: Clues from natural language. Journal of Experimental Psychology: General, 145, 1280–1297. doi: 10.1037/xge0000202
- Wu CM, Schulz E, Pleskac TJ, & Speekenbrink M (2022). Time pressure changes how people explore and respond to uncertainty. Scientific Reports, 12, 4122. doi: 10.1038/s41598-022-07901-1
