NPJ Science of Learning. 2026 Jan 21;11:13. doi: 10.1038/s41539-026-00401-1

Children consider changes in performance over time when reasoning about academic achievements

Ying Hu 1,2, Yuhang Shu 3, Xin Zhao 1
PMCID: PMC12923749  PMID: 41565694

Abstract

Changes in performance over time can provide important information about one’s ability and effort. Three preregistered studies examined how children aged 4 to 10 perceive and evaluate academic performance changes (N = 256; 131 girls, all Han ethnicity, China). When evaluating two characters with matched final performance but different performance trajectories, with age, children increasingly perceive the character with improving performance as less smart but more hardworking than the one with constant performance, and they evaluate the improving character more favorably than the constant character (Studies 1 and 2). However, they increasingly favor a constant character over one with declining performance (Study 2). When the improving character outperforms the constant character in the final performance and overall performance is matched (Study 3), even 4- to 6-year-olds favor the improving character over the constant character. These findings highlight children’s developing ability to flexibly reason about and evaluate changes in performance over time.

Subject terms: Neuroscience, Psychology

Introduction

We all aspire to achieve, yet the paths to achievement can vary greatly from person to person and from situation to situation. Some individuals may excel from the start and maintain high performance consistently, while others may start from a comparatively lower position and improve over time. Although these two individuals may ultimately attain similar levels of achievements, their differing performance trajectories may lead us to draw differing inferences and evaluations. For instance, one might perceive the former individual as demonstrating a high level of ability at the task but attribute less effort to them, whereas the latter individual may be viewed as starting with lower ability but as having exerted substantial effort to improve over time. These differing inferences of ability and effort can also lead to different evaluations of these individuals, depending on the value placed on ability versus effort. Additionally, some individuals may initially perform well but experience a decline over time. Thus, when reasoning about others’ achievements, we intuitively consider not only individuals’ current (or final) performance but also changes over time (or the lack thereof). These changes can be informative regarding both their natural abilities and their persistence in the face of challenges. In this study, we investigate how people, especially children, consider changes in performance over time when reasoning about and evaluating others’ achievements. Examining this question among children is particularly important, as it can contribute to a broader understanding of how children make sense of others’ competence and performance, and may offer practical insights into how children themselves respond to changes in their own performance.

Children’s explanations of achievement outcomes often center on two core constructs—ability and effort—and decades of research has shown that the way children interpret these causes has lasting consequences for their motivation, persistence, and reactions to setbacks1,2. Understanding how children reason about these two constructs is therefore essential for understanding how they make sense of others’ achievements and, ultimately, their own. In recent decades, significant progress has been made in understanding how children reason about achievement, ability, and effort3. Research has delved into various aspects, including whether children believe that ability can be improved through effort (i.e., growth vs. fixed mindset)1, whether children attribute success or failure to ability or effort (among other factors)2, and whether they expect that they can succeed in tasks4,5.

Particularly relevant to young children’s understanding of ability and effort, classic views have argued that young children tend to conflate ability and effort, and not until age 10 do they differentiate the two, understanding that ability and effort separately contribute to academic performance6. However, recent research has modified the methodology and challenged this view, showing that children as young as 5 years old can differentiate between ability and effort, and can understand their joint influence on performance7–9. For instance, young children can infer an individual’s ability based on their effort and performance; when presented with an individual performing as well as peers but exerting less effort, 4- to 5-year-old children infer that the individual is smarter than their peers9. Similarly, 3- to 5-year-old children infer that someone who finds a task easier is smarter than someone who finds it harder8. Furthermore, by around age 5, children can also infer the level of effort from ability and performance, viewing an individual who is less smart but achieves a similar outcome as someone else as more hardworking, and such inferences become more robust with age10.

Although recent work provides increasing evidence that young children can make sense of others’ ability, effort, and achievement, most previous work has focused on children’s reasoning about individuals’ performance at a single point in time. Nonetheless, in the real world, one’s performance does not always stay constant, but in fact frequently changes over time. Importantly, changes in performance over time (or the lack thereof) can be informative and accessible cues as to one’s ability and/or effort. Often, we may not have access to direct indicators of how competent or hardworking someone is, but we may have information on their performance over time, such as a child’s artworks over multiple classes, a student’s exam scores over a year, or an athlete’s performance in a series of matches. Examining whether children use such temporal cues is therefore theoretically meaningful: if children rely on performance trajectories to explain why others succeed or struggle, these explanations may form a core part of their intuitive causal theories of academic performance, with downstream consequences for how they interpret their own successes and failures. Can children make sense of performance over time? Can they infer ability and/or effort based on changes in performance over time (or the lack thereof)? And do they take into account performance trajectories when evaluating others’ achievements? We investigate these questions in the current study.

Children’s reasoning about performance changes may rely on their cognitive capacities to track temporal changes. This involves holding prior outcomes in mind while evaluating current ones—a process supported by executive functions such as working memory and metacognitive skills, both of which show marked development during the preschool and early elementary years11,12. Previous work shows that children can monitor their own past performance as early as preschool13. For example, 3- to 5-year-olds demonstrate metacognitive awareness by reporting feeling less confident on trials where they had erred and were more likely to seek help or skip these trials if given the opportunity14,15. With age, they increasingly become capable of drawing on their past performance to guide later decisions—for example, by ages 5 to 7, they begin allocating study time more strategically and choosing tasks in light of how they performed previously16,17. Here, we examine whether children can track changes in others’ performance as third-person observers, which may impose additional cognitive demands. In particular, children may need to infer the mental states underlying others’ performance changes, a process supported by theory-of-mind abilities that show substantial development during the preschool years and continue to refine through middle childhood18.

Although not focused on academic performance, recent research has examined how children reason about others’ temporal changes in psychological and socio-moral domains, suggesting that young children gradually become capable of tracking changes over time and making appropriate inferences19,20. For instance, by age 3, children can use prior mental states to explain current ones and anticipate others’ future mental states19,21,22. Importantly, with age, children can gradually differentiate various trajectories of event changes and make reasonable predictions20,23. For example, 4- to 10-year-olds increasingly expect an individual to show more positive responses (i.e., feel happy) towards a re-encountered individual who initially acted harmfully but later displayed friendliness, compared to an individual whose behavior shifts from friendly to harmful20. This suggests that, in early and middle childhood, children gradually develop the capacity to track changes over time, at least when reasoning about others’ mental states and social interactions. However, it remains an open question whether children are sensitive to changes in academic performance and whether they use different performance trajectories (i.e., improvement, stability, or decline) to infer others’ ability and effort.

Although, to our knowledge, no research has directly examined these questions, we have reasons to hypothesize that children may gradually differentiate different performance trajectories and infer individuals’ ability and effort with age. Specifically, we think that children may increasingly infer more effort from improvement and infer higher ability (or talent) from consistent success. One potential source that may shape such inferences is children’s own experiences with performance changes. Previous research indicates that children have an early understanding of changes in their own performance and make predictions of their future performance based on their performance trajectory24,25. For instance, children around age 5 compare their current performance with their prior performance in a drawing task and provide a higher self-appraisal when they have improved than when they have gotten worse26. Furthermore, after recalling a time when they improved, older children (8-year-olds and above) feel proud and have a sense of progress27. A recent study directly examines whether 4- to 6-year-olds decide to persist depending on their performance trajectories, and finds that children whose performance improved over time are more likely to persist in the challenging task than those whose performance stayed constant, even when their final trial of performance was matched28. This work suggests that children’s own performance trajectories influence their confidence and persistence in future tasks. While this work does not explicitly address children’s inferences about effort, the positive self-appraisal and confidence following improvement suggest that children may recognize the personal contribution required to achieve progress. However, it remains unknown whether children can track others’ performance over time, and how they make inferences about others’ performance trajectories. When reasoning about others’ performance, children may have less access to the confidence and pride of the individuals, but may draw from their own experiences of making progress and infer that one must have put in effort to improve over time.

In addition to inference, another question we aim to examine is how children evaluate individuals with different performance trajectories. For instance, how do children compare two individuals with matched endpoint performance, one who improves over time and another whose performance remains constant? Previous research suggests that 3- to 8-year-old children have positive expectations of future performance for individuals who consistently succeed or steadily improve, but that they do not have the same positive expectations for those who fail consistently24,25. However, such expectations may primarily depend on overall outcome valence (i.e., success vs. failure) instead of the process or trajectories that lead to those outcomes. Here, we are more interested in how adults and children evaluate individuals with matched final performance but differing performance trajectories (i.e., improvement vs. consistency). This can provide insights into whether adults and children evaluate others not just based on outcomes, but also based on how an outcome was reached.

Some people may value improvement in addition to outcome; as Tiger Woods put it, “No matter how good you get, you can always get better.” Conversely, others may prefer consistently good performance, valuing ability and consistency. Certainly, these evaluations may depend on how individuals value ability versus effort. Recent research has revealed developmental changes in children’s evaluations of talented individuals (i.e., “naturals”) and hardworking individuals (i.e., “strivers”). Preschoolers prefer those who are naturally smart over those who work hard29,30 and believe that a smart classmate is more likely to achieve academic success than a less smart classmate31. However, this preference shifts as children get older (around age 7); they gradually prefer peers who are hardworking10,32 and allocate more resources to hard workers33–35. This body of work suggests that if children can recognize that improvement requires more effort than maintaining constant performance, they may gradually develop more favorable evaluations of individuals who improve compared to those whose performance remains the same, as their appreciation of effort increases with age.

Here, we investigate children’s inferences and evaluations of individuals with varying performance trajectories in a Chinese cultural context. China provides a particularly valuable context for studying this question, as its cultural values place strong emphasis on effort and self-improvement36–38. Rooted in traditional philosophies such as Confucianism, the Chinese perspective views ability as something that accumulates through sustained effort and learning rather than as an innate and fixed trait39. Classic sayings such as “Genius comes from hard work and knowledge depends on accumulation” have been widely endorsed among Chinese students40. Traditional Chinese philosophy also acknowledges that individuals with lower initial ability may progress more slowly, but with sufficient effort, they are still expected to “cover the same ground”41. In educational contexts, this emphasis on effort is further reinforced through pedagogy: children are explicitly taught the value of diligence and self-improvement42, and parents are more likely to attribute children’s performance to effort rather than ability43,44. Taken together, these cultural emphases on effort and incremental improvement suggest that performance trajectories may be particularly informative to children in this context.

In the current study, we investigate whether and how 4- to 10-year-old Chinese children make inferences about the level of ability and effort based on changes in performance over time, as well as how they evaluate individuals with varying performance trajectories. In Study 1, we ask children to compare two characters with matched final performance but differing trajectories: one character’s performance improves over time while the other’s stays constant. We examine children between the ages of 4 and 10, as this is roughly the age range over which previous research has shown developmental changes in reasoning and evaluation of ability, effort, and academic performance9,10,32. This cross-sectional design enables us to examine these age-related differences across a relatively large age range. Based on prior findings that sensitivity to effort and performance cues increases across early to middle childhood10,32,34, we predict that older children will infer greater effort for the improving character and evaluate the improving character more favorably than younger children. We also test a group of adults and use their responses as a frame of reference for interpreting developmental data.

In Study 2, we replicate the findings of Study 1 with some methodological changes. More importantly, to examine whether children are also sensitive to the direction of change, we test how children compare two characters with matched final performance when one character’s performance decreases over time while the other’s stays constant. We predict that, as older children infer improvement as a sign of effort, they may also infer declining performance as a sign of a lack of effort, and evaluate the declining character less favorably.

In Study 3, we investigate whether there are cases in which the younger children (i.e., 4- to 6-year-olds) form more favorable judgments of the improving character. Specifically, we examine scenarios in which the two characters’ overall performance is matched, but the improving character outperforms the constant character by the end. We predict that younger children (ages 4 to 6) may not prefer the improving character until they see evidence of that character surpassing the constant character.

Results

Study 1

In Study 1, 4- to 10-year-olds were asked to compare two characters with matched final performance but differing trajectories: one character (i.e., the improving character) received three stickers on the first exam and four stickers on the second exam, while the other character (i.e., the constant character) received four stickers on both exams.

Preliminary analyses revealed no significant effect of testing modality (in person or online) (ps > 0.29) or participant gender (ps > 0.48) on any of the measures, and thus we did not include testing modality or participant gender in subsequent models. Following our preregistered analysis plan, we first fit full models as specified below and then removed one interaction term at a time if not significant. We will only report results from the final models. In analyses including age, we treated age as a continuous variable, but note that analyses on age groups would yield results of a similar pattern. In all questions, participants’ selections of “improving character” were scored as 1, while their selections of “constant character” were scored as 0 (see Fig. 1 for their responses).

Fig. 1. Children’s and adults’ responses in Study 1.

Error bands and error bars represent 95% bootstrapped confidence intervals.
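
As a rough illustration of how a 95% bootstrapped confidence interval for a proportion of 0/1-coded choices might be obtained, the sketch below uses a percentile bootstrap that resamples participants with replacement. The resampling scheme, the number of resamples, and the example data are assumptions made for illustration; the text does not specify how the intervals in the figures were computed.

```python
import random

def bootstrap_ci(responses, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of 0/1-coded responses."""
    rng = random.Random(seed)
    n = len(responses)
    means = []
    for _ in range(n_resamples):
        # Resample participants with replacement and store the proportion.
        sample = [responses[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lower, upper

# Hypothetical data: 1 = improving character chosen, 0 = constant character chosen.
example = [1] * 28 + [0] * 12
low, high = bootstrap_ci(example)
print(f"mean = {sum(example) / len(example):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With binary choices, an interval that excludes 0.5 would suggest that selections differ from chance, although the formal tests reported here rely on mixed effects models rather than on these intervals.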

We first examined children’s and adults’ responses to the improvement question (i.e., “Who do you think improved more during the half year?”). For children, we fit a binomial linear mixed effects model predicting responses (0 = constant character, 1 = improving character) as a function of age (mean-centered), with a random intercept for participants. We found a significant effect of age (B = 0.84, SE = 0.16, 95% CI [0.52, 1.15], p < 0.001; β = 1.67): with age, children increasingly answered that the improving character improved more than the constant character. Inspired by the Johnson–Neyman (J–N) technique45, we then estimated the precise ages at which children selected one character as improving more than the other. The J–N technique is typically a post-hoc procedure used to probe interactions involving a continuous moderator. Instead of dividing participants into arbitrary age groups, the J–N method identifies the specific values of age at which the effect of interest becomes statistically significant. Using this approach, we found that younger children selected the constant character significantly above chance up until 5.7 years of age, whereas children began selecting the improving character significantly above chance starting at 7.1 years of age.
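
One way such age boundaries could be derived is sketched below. It assumes a fixed-effects view of the fitted model, with intercept $b_0$, age slope $b_1$, and mean-centered age $a_c$; these symbols and the Wald-type criterion are illustrative assumptions, since the exact adaptation of the J–N technique is not spelled out here.

```latex
% Predicted log-odds of choosing the improving character at centered age a_c
\eta(a_c) = b_0 + b_1 a_c

% Sampling variance of the predicted log-odds
\operatorname{Var}\!\left[\eta(a_c)\right]
  = \operatorname{Var}(b_0) + 2 a_c \operatorname{Cov}(b_0, b_1) + a_c^{2} \operatorname{Var}(b_1)

% Chance responding corresponds to \eta = 0 (probability 0.5).
% The boundary ages solve a quadratic in a_c:
\left(b_0 + b_1 a_c\right)^{2}
  = z_{1-\alpha/2}^{2}\,\operatorname{Var}\!\left[\eta(a_c)\right]
```

Under these assumptions, the two roots of the quadratic mark the ages below which choices fall significantly below chance and above which they fall significantly above chance, which is the form the reported boundaries (5.7 and 7.1 years) take.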

As expected, a large majority of adults answered that the improving character improved more (M = 0.93, SD = 0.26). To test whether this was significantly above chance, we conducted an intercept-only binomial linear mixed effects model. The model revealed that the intercept was significantly above the chance level (a selection probability of 0.5), indicating that adults viewed the improving character as having improved more (B = 13.35, SE = 2.48, 95% CI [8.49, 18.21], p < 0.001).
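
To make the scale of these estimates concrete, the sketch below assumes that B is expressed in log-odds (as is standard for binomial models) and that the reported interval is a Wald interval, B ± z × SE. Both points are assumptions; the reported numbers appear consistent with them.

```python
import math
from statistics import NormalDist

def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def wald_ci(estimate, se, level=0.95):
    """Wald confidence interval: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se

# A chance probability of 0.5 corresponds to log-odds of 0.
chance_log_odds = math.log(0.5 / (1.0 - 0.5))

# Values reported for the adults' improvement question.
lower, upper = wald_ci(13.35, 2.48)
print(chance_log_odds, round(lower, 2), round(upper, 2))
```

On this reading, testing against chance amounts to testing whether the intercept differs from 0 on the log-odds scale. In a model with a participant random intercept, the intercept is a participant-conditional quantity, so its back-transformed value need not equal the raw proportion (M = 0.93).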

We then examined children’s and adults’ responses to the two inference questions (i.e., “Who is smarter?”, “Who is more hardworking?”). For children, we fit a binomial linear mixed effects model predicting responses as a function of measure, age (mean-centered), and their interaction. We found a significant interaction between age and measure (B = −0.59, SE = 0.17, 95% CI [−0.94, −0.25], p < 0.001; β = −1.18). Simple slope tests revealed that the age effect was significant for the hardworking question (B = 0.43, SE = 0.12, p < 0.001), but not for the smart question (B = −0.17, SE = 0.13, p = 0.200). Specifically, children across ages inferred the constant character as smarter (M = 0.20, SD = 0.40, intercept-only binomial linear mixed effects model, p < 0.001), while, as hypothesized, they became increasingly likely to infer that the improving character was more hardworking with age. Johnson–Neyman analysis revealed that, starting at 7.3 years of age, children significantly inferred the improving character to be more hardworking.

For adults, the binomial linear mixed effects model predicting adults’ responses as a function of measure (smart, hardworking) revealed a significant measure effect (B = −2.80, SE = 0.38, 95% CI [−3.55, −2.05], p < 0.001; β = −2.80): adults were significantly more likely to select the improving character in the hardworking question than in the smart question. Most adults inferred that the improving character was more hardworking (Hardworking: M = 0.80, SD = 0.40, intercept-only binomial linear mixed effects model, p < 0.001), while the constant character was smarter (Smart: M = 0.20, SD = 0.40, intercept-only binomial linear mixed effects model, p < 0.001).
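
Because responses are coded 0 or 1, each reported standard deviation follows from the corresponding mean. The relation below is a general property of binary variables, shown as a quick consistency check on the values above.

```latex
% Standard deviation of a 0/1-coded variable with mean M
\mathrm{SD} = \sqrt{M(1 - M)}

% Hardworking question: M = 0.80 (the smart question, M = 0.20, gives the same value)
\sqrt{0.80 \times 0.20} = \sqrt{0.16} = 0.40

% With the n - 1 correction the sample value is slightly larger
s = \sqrt{\tfrac{n}{n-1}\, M(1 - M)}
```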

We then looked at children’s responses to the three evaluation questions (i.e., “Who would you award a prize to?”, “Who do you like better?”, “Who do you think will be more successful in the future?”). We fit a binomial linear mixed effects model predicting responses (0 = constant character, 1 = improving character) as a function of measure (prize, preference, success), age (mean-centered), and their interaction. We did not find a significant interaction (ps > 0.422), so the interaction term was dropped. Consistent with our hypothesis, the final model revealed a significant effect of age (B = 1.26, SE = 0.24, 95% CI [0.78, 1.74], p < 0.001; β = 2.50), showing that, with age, children were more likely to favor the improving character on all three evaluation measures: they were more likely to award a prize to the improving character, more likely to say they liked the improving character better, and more likely to view the improving character as more likely to be successful in the future. Specifically, Johnson–Neyman analyses revealed that children selected the constant character significantly above chance up until around 6 years old (prize: 6.0 years; preference: 6.1 years; success: 6.7 years), whereas they selected the improving character significantly above chance starting at around 7 to 8 years old (prize: 7.4 years; preference: 7.7 years; success: 8.7 years). The main effect of measure was also significant (B = −1.07, SE = 0.46, 95% CI [−1.97, −0.16], p = 0.021; β = −1.07), reflecting the difference between the success question and prize question: across ages, children were more likely to select the improving character on the prize question (M = 0.58, SD = 0.50) than on the success question (M = 0.47, SD = 0.50). The other pairwise differences between measures were not significant (ps > 0.085).
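
For readers less used to log-odds coefficients, the sketch below converts the reported age effect into an odds ratio. It assumes that age was entered in years and that B is on the log-odds scale; in a mixed model the resulting ratio is conditional on the participant-level random intercept.

```python
import math

def odds_ratio(log_odds_coef):
    """Multiplicative change in odds for a one-unit increase in the predictor."""
    return math.exp(log_odds_coef)

# Reported age effect on the evaluation questions in Study 1 (B and 95% CI).
b_age, ci_low, ci_high = 1.26, 0.78, 1.74

print(f"odds ratio per unit of age: {odds_ratio(b_age):.2f}")
print(f"95% CI: [{odds_ratio(ci_low):.2f}, {odds_ratio(ci_high):.2f}]")
```

Under these assumptions, each additional year of age multiplies the odds of favoring the improving character by roughly three and a half.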

For adults, we fit a binomial linear mixed effects model predicting their responses as a function of measure and found significant effects of measure. Adults were more likely to select the improving character on the prize question than on the preference question (B = 1.38, SE = 0.44, 95% CI [0.53, 2.24], p = 0.002; β = 1.38) or the success question (B = 1.75, SE = 0.45, 95% CI [0.86, 2.63], p < 0.001; β = 1.75). There was no significant difference between adults’ responses to the preference question and the success question (p = 0.344). Intercept-only binomial linear mixed effects models revealed that adults significantly favored the improving character on the prize question (M = 0.77, SD = 0.42, B = 1.26, SE = 0.26, 95% CI [0.75, 1.77], p < 0.001), but their responses on the preference question (M = 0.59, SD = 0.50) and the success question (M = 0.52, SD = 0.50) did not differ from chance (ps > 0.132).

Study 1 revealed that when asked to compare an improving character to a constant character with matched endpoint performance, younger children considered the constant character to be both smarter and more hardworking, whereas older children (starting around age 7) gradually shifted to viewing the constant character as smarter but perceiving the improving character as more hardworking. There were also notable developmental changes in their evaluations: younger children tended to favor the constant character, while older children awarded a prize to the improving character more, preferred the improving character more, and believed the improving character would be more successful in the future compared to the constant character.

Study 2

Study 2 was designed to: (1) replicate the findings of Study 1 with several methodological modifications, and (2) investigate whether children consider the direction of performance change (i.e., improving vs. decreasing) in their inferences and evaluations. The first methodological change pertains to how we present the characters’ performance. In Study 1, performance was indicated by the number of red flower stickers awarded on each exam. While this is a common indicator of performance used in Chinese classrooms, children may have simply added up the number of stickers received across the exams, basing their inferences and evaluations simply on the total amount of resources (i.e., stickers) rather than the performance trajectories. For instance, younger children might have favored the constant character merely due to a greater total number of stickers, whereas older children’s increasing preference for the improving character could reflect empathy and resource allocation towards a relatively disadvantaged character. Thus, in Study 2, we instead used ranking on an honor roll to indicate performance. The second methodological change was to make the changing trajectories more salient. In Study 2, we presented characters’ performance on three exams, rather than two. The third and final modification pertains to the wording of the improvement question. In Study 1, we asked “Who improved more over the half year?” and children of 7 years and older correctly answered this question. We think it is possible that younger children can already track changes in performance, but simply find the term “improvement” difficult to grasp. Therefore, in Study 2, we instead asked whose performance “changed more,” which may be simpler for children to understand.

More importantly, to systematically explore children’s understanding of changes in academic performance and their corresponding inferences, it is critical to examine whether children’s inferences and evaluations are also sensitive to the directions of changes (i.e., improving or decreasing). Therefore, in Study 2, in addition to the improving condition, in which an improving character was compared with a constant character, we also included a decreasing condition, in which a decreasing character was compared with a constant character (with final performance matched).

In the following analyses, we first present results for the improving condition and then those for the decreasing condition (see Fig. 2). In these analyses, as in Study 1, we treated age as a continuous variable, and for all models with interaction terms, we dropped the interaction terms if not significant. We report results from the final models. Additionally, we included the presentation order of the two conditions (improving condition first or decreasing condition first) as a fixed factor in all models to control for potential sequence effects; because it was not a factor of primary interest, we report the effect of presentation order only when it is significant.

Fig. 2. Children’s and adults’ responses in Study 2.

Panel (a) shows responses in the improving condition, and panel (b) shows responses in the decreasing condition. Error bands and error bars represent 95% bootstrapped confidence intervals.

Improving condition

We first examined children’s and adults’ responses in the improving condition. Following our pre-registration, we first investigated children’s responses to the change question (i.e., “Whose performance do you think changed more during the semester?”). We fit a binomial linear mixed effects model predicting responses (0 = constant character, 1 = improving character) as a function of age (mean-centered) and presentation order, with a random intercept for participants. We found a significant main effect of age (B = 1.04, SE = 0.25, 95% CI [0.54, 1.54], p < 0.001; β = 1.66): with age, children increasingly answered that the improving character changed more. Using the Johnson–Neyman approach45, we found that children selected the improving character significantly above chance starting at 5.3 years of age. For adults, as expected, the overwhelming majority answered that the improving character changed more (M = 0.97, SD = 0.17, intercept-only binomial linear mixed effects model, p < 0.001).

We then examined children’s responses to the two inference questions (i.e., “Who is smarter?”, “Who is more hardworking?”). We fit a binomial linear mixed effects model predicting responses as a function of measure, age (mean-centered), presentation order, and interaction between age and measure. As in Study 1, we found a significant interaction between age and measure (B = −0.86, SE = 0.28, 95% CI [−1.41, −0.31], p = 0.002; β = −1.36). Simple slope tests revealed that the age effect was significant for the hardworking question (B = 0.98, SE = 0.26, p < 0.001), but not for the smart question (B = 0.12, SE = 0.16, p = 0.459). Children across ages inferred the constant character to be smarter (M = 0.37, SD = 0.48, intercept-only binomial linear mixed effects model, p = 0.007), while with age, children were more likely to infer that the improving character was more hardworking. Specifically, Johnson–Neyman analyses revealed that young children’s responses did not significantly differ from chance, whereas children selected the improving character as more hardworking starting around 6.2 years of age.
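
The link between the interaction term and the simple slopes can be checked with simple arithmetic. The sketch below assumes that the hardworking question is the reference level of measure, so that the interaction coefficient equals the age slope for the smart question minus the age slope for the hardworking question. That coding is an assumption, not something stated in the text, and the tolerance allows for rounding to two decimals.

```python
def implied_interaction(slope_reference, slope_other):
    """Interaction coefficient implied by two simple slopes."""
    return slope_other - slope_reference

# Reported age slopes and age-by-measure interactions.
reported = {
    "Study 1": {"hardworking": 0.43, "smart": -0.17, "interaction": -0.59},
    "Study 2, improving": {"hardworking": 0.98, "smart": 0.12, "interaction": -0.86},
}

for label, r in reported.items():
    implied = implied_interaction(r["hardworking"], r["smart"])
    gap = abs(implied - r["interaction"])
    print(f"{label}: implied = {implied:.2f}, reported = {r['interaction']:.2f}, gap = {gap:.2f}")
```

In both studies the implied and reported values agree to within rounding, which fits the interpretation that the age effect on the hardworking question is reliably steeper than the age effect on the smart question.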

For adults, a binomial linear mixed effects model predicting their responses as a function of measure revealed a significant effect of measure (B = −2.86, SE = 0.35, 95% CI [−3.54, −2.17], p < 0.001; β = −2.86). In line with the results of Study 1, adults inferred the constant character to be smarter (M = 0.19, SD = 0.39, intercept-only binomial linear mixed effects model, p < 0.001), whereas they inferred the improving character to be more hardworking (M = 0.80, SD = 0.40, intercept-only binomial linear mixed effects model, p < 0.001).

For children’s responses to the three evaluation questions (i.e., “Who would you award a prize to?”, “Who do you like better?”, “Who do you think will be more successful in the future?”), we fit a binomial linear mixed effects model predicting responses as a function of measure (prize, preference, and success), age (mean-centered), presentation order, and interaction between age and measure. We did not find any significant interaction (ps > 0.735), so the interaction term was dropped. Similar to Study 1, the final model revealed a significant main effect of age (B = 2.44, SE = 0.63, 95% CI [1.20, 3.68], p < 0.001; β = 3.85), showing a developmental trajectory along which children increasingly evaluated the improving character more favorably with age. We also found a significant main effect of measure: Children were more likely to select the improving character on the prize question (M = 0.58, SD = 0.50) than on the preference question (M = 0.51, SD = 0.50; B = 1.16, SE = 0.57, 95% CI [0.04, 2.28], p = 0.043; β = 1.16). There was no other significant difference between measures (ps > 0.186). Johnson–Neyman analyses revealed that children selected the constant character significantly above chance up until around 5 years old (prize: 5.1 years; preference: 5.9 years; success: 5.6 years), and they selected the improving character significantly above chance starting at around 7 years old (prize: 7.0 years; preference: 7.6 years; success: 7.3 years).

For adults, we found a significant main effect of measure. Again, adults were more likely to select the improving character on the prize question (M = 0.72, SD = 0.45) than on the preference question (M = 0.59, SD = 0.49; B = 1.00, SE = 0.40, 95% CI [0.22, 1.78], p = 0.012; β = 1.00) or the success question (M = 0.58, SD = 0.50; B = 1.08, SE = 0.40, 95% CI [0.30, 1.87], p = 0.007; β = 1.08). There was no significant difference between adults’ responses to the preference question and the success question (p = 0.824). Intercept-only binomial linear mixed effects models revealed that adults significantly favored the improving character on the prize question (p < 0.001), but their responses did not show significant differences from chance on the preference question (p = 0.065) and the success question (p = 0.122).
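As a reading aid, the reported coefficients are on the log-odds scale, so they can be converted to odds ratios, and the intervals can be checked against B ± 1.96 × SE. The sketch below assumes the reported intervals are Wald-type (the text does not state how they were computed) and uses the adult prize-versus-preference contrast reported above (B = 1.00, SE = 0.40). The helper name `wald_summary` is hypothetical.

```python
import math

def wald_summary(b, se, z=1.96):
    """Return (ci_low, ci_high, odds_ratio) for a log-odds coefficient."""
    return b - z * se, b + z * se, math.exp(b)

# Adult prize vs. preference contrast as reported in the text.
low, high, odds_ratio = wald_summary(1.00, 0.40)
print(f"95% CI [{low:.2f}, {high:.2f}], odds ratio {odds_ratio:.2f}")
# prints: 95% CI [0.22, 1.78], odds ratio 2.72
```

Under the Wald assumption this reproduces the reported interval [0.22, 1.78]. The odds ratio of about 2.72 means the odds of selecting the improving character were roughly 2.7 times higher on the prize question than on the preference question.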

Decreasing condition

We next examined children’s and adults’ responses in the decreasing condition. For the change question, we fit a binomial linear mixed effects model predicting responses (0 = constant character, 1 = decreasing character) as a function of age (mean-centered) and presentation order, with a random intercept for participants. We did not find a significant age effect (p = 0.786). Children across ages answered that the decreasing character changed more than the constant character (M = 0.91, SD = 0.28, intercept-only binomial linear mixed effects model, p < 0.001), showing a level comparable to that of adults (M = 0.95, SD = 0.21, intercept-only binomial linear mixed effects model, p < 0.001).
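For readers less familiar with this model class, the change-question model described above can be written out as follows. This is a sketch of the stated specification only; the trial index j and the symbol names are notational assumptions, not taken from the source.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0
  + \beta_1 \left(\mathrm{age}_i - \overline{\mathrm{age}}\right)
  + \beta_2\,\mathrm{order}_i
  + u_i,
\qquad u_i \sim \mathcal{N}\!\left(0, \sigma_u^2\right)
```

Here y_ij = 1 when participant i selects the decreasing character on trial j (0 = constant character), order_i codes the presentation order, and u_i is the participant-level random intercept. The intercept-only models used for the comparisons against chance drop the age and order terms and test whether β_0 differs from 0, which corresponds to a selection probability of 0.5.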

For children’s responses to the two inference questions in the decreasing condition, we fit a binomial linear mixed effects model predicting responses as a function of age (mean-centered), measure, their interaction, and presentation order. We found a significant interaction between age and measure (B = 0.62, SE = 0.30, 95% CI [0.04, 1.20], p = 0.036; β = 0.98). The simple slope analysis revealed that age significantly predicted selection of the constant character on both measures, though the age effect was stronger on the hardworking question (B = −1.33, SE = 0.38, p < 0.001) than on the smart question (B = −0.71, SE = 0.29, p = 0.013). Consistent with our prediction for the hardworking measure, children were less likely with age to select the decreasing character as more hardworking. Johnson–Neyman analyses revealed that, on the hardworking question, children selected the decreasing character significantly above chance up until 5.2 years of age, whereas they selected the constant character significantly above chance starting at 6.9 years old; on the smart question, children selected the decreasing character significantly above chance up until 4.2 years of age, but selected the constant character significantly above chance starting at 8.3 years old. We also found a significant effect of presentation order (B = 1.82, SE = 0.76, 95% CI [0.33, 3.32], p = 0.017; β = 0.91): Children were less likely to select the decreasing character when the decreasing condition was presented after the improving condition.

For adults, a binomial linear mixed effects model predicting responses as a function of measure revealed a significant main effect of measure (B = 3.10, SE = 0.64, 95% CI [1.84, 4.36], p < 0.001; β = 3.10); adults inferred the decreasing character to be smarter (M = 0.61, SD = 0.49, intercept-only binomial linear mixed effects model, p = 0.021), while they inferred the constant character to be more hardworking (M = 0.10, SD = 0.31, intercept-only binomial linear mixed effects model, p < 0.001).

We next analyzed children’s responses to the three evaluation questions in the decreasing condition. We first ran a binomial linear mixed effects model predicting responses as a function of age (mean-centered), measure, their interaction, and presentation order. No significant interaction was found (ps > 0.142), so the interaction term was dropped. In addition, there was no effect involving measure (ps > 0.214). As we preregistered, to simplify the model, we treated measure as a random intercept. Consistent with our hypothesis, the final model revealed a significant age effect (B = −1.76, SE = 0.48, 95% CI [−2.71, −0.81], p < 0.001; β = −2.78); with age, children were significantly more likely to favor the constant character than the decreasing character. Johnson–Neyman analyses revealed that, except for the preference question (on which younger children’s responses did not significantly differ from chance), children selected the decreasing character significantly above chance up until around 5 years old (prize: 5.5 years; success: 4.9 years), while they selected the constant character significantly above chance starting at around 7 years old (prize: 7.4 years; preference: 7.1 years; success: 7.2 years). We also found a significant effect of presentation order (B = 2.43, SE = 1.14, 95% CI [0.19, 4.68], p = 0.034; β = 1.21): Children were less likely to select the decreasing character when the decreasing condition was presented after the improving condition.

For adults, a binomial linear mixed effects model predicting responses as a function of measure did not find a significant effect of measure (ps > 0.203). Intercept-only binomial linear mixed effects models showed that adults significantly favored the constant character across all three questions (prize: M = 0.23, SD = 0.42, p < 0.001; preference: M = 0.25, SD = 0.43, p < 0.001; success: M = 0.28, SD = 0.45, p < 0.001).

Additionally, when comparing the improving character to the decreasing character, we found that most children (70%) and adults (94%) preferred the improving character.

Study 2 replicated the developmental changes found in Study 1. Specifically, when comparing the improving character versus the constant character (with matched endpoint performance), although children of all ages considered the constant character to be smarter, older children (starting around age 7) inferred the improving character to be more hardworking, and also favored the improving character. Moreover, by using a simpler phrase (“change more” rather than “improve”), we found that even young children (around age 5) could track performance changes and correctly identify whose performance changed more.

We also found that children are sensitive to the direction of changes. When comparing a decreasing character versus a constant character (with matched endpoint performance), with age, children increasingly viewed the constant character as smarter and more hardworking, and favored the constant character over the decreasing character.

Study 3

Studies 1 and 2 found that, when comparing two characters with matched final performance but different performance trajectories, children aged 7 and older infer that someone who improves over time is more hardworking than someone whose performance stays constant over time, and they also evaluate the improving character more favorably. However, younger children (4- to 6-year-olds) in these two studies instead favor the constant character. One possibility is that these younger children do not yet recognize the value of improvement whatsoever and believe that someone who constantly performs well is better in every situation. Alternatively, 4- to 6-year-olds might value improvement, but need more explicit evidence to form a positive evaluation of the improving character. In Studies 1 and 2, because final performance between the two characters was kept matched, the constant character performed better overall (second place in all three exams vs. fourth-third-second for the improving character). Younger children may have favored the constant character because of this better overall performance. In Study 3, we examined whether there are cases in which younger children might also favor the improving character. In addition to the matched condition where the final performance was matched, we designed an outperformed condition in which the two characters’ overall performance across exams was matched, such that the improving character began behind the constant character but ultimately outperformed them. By removing overall-performance differences and providing evidence that the improving character outperformed the constant character at the end, Study 3 allowed us to assess whether more salient evidence of improvement would enable younger children to recognize and favor the improving character.

Below, we present results for each measure (see Fig. 3). We included presentation order of the two conditions (outperformed vs. matched first) as a fixed factor in all models to control for sequence effects. Preliminary analyses showed no significant main effect of presentation order on any measure (ps > 0.26), so we do not report these results further.

Fig. 3. Children’s responses in Study 3. Error bars represent 95% bootstrapped confidence intervals.
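To illustrate what a bootstrapped 95% confidence interval for a proportion of 0/1-coded choices involves, here is a minimal percentile-bootstrap sketch. The figure caption does not say which bootstrap variant or resampling unit was used, so the percentile method, the function name `bootstrap_ci`, and the example data are all assumptions for illustration.

```python
import random

def bootstrap_ci(responses, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1-coded responses."""
    rng = random.Random(seed)
    n = len(responses)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(responses, k=n)) / n for _ in range(n_boot)
    )
    lo_idx = int(round((alpha / 2) * n_boot))
    hi_idx = int(round((1 - alpha / 2) * n_boot)) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical data: 13 of 20 children choose the improving character.
choices = [1] * 13 + [0] * 7
ci_low, ci_high = bootstrap_ci(choices)
print(f"M = {sum(choices) / len(choices):.2f}, "
      f"95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval whose lower bound lies above 0.5 would indicate selection of the improving character above chance, which is how error bars of this kind are usually read against a chance line.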

Most children (94% in the matched condition, 88% in the outperformed condition) correctly answered that the improving character’s rankings were different over time, whereas the constant character’s rankings were the same over time. Below, we report analyses including all the children. For children’s responses to the two inference questions (i.e., smart question, hardworking question, see Fig. 3), we first fit a binomial linear mixed effects model predicting responses as a function of measure, age (mean-centered), condition (matched and outperformed), and their interactions, with a random intercept for participants. There were no significant interaction effects (ps > 0.171), so all the interaction terms were dropped. We also did not find any significant effect of measure (ps > 0.223). As we preregistered, we treated measure as a random intercept. Consistent with our prediction, the final model revealed a significant main effect of condition (B = 0.83, SE = 0.34, 95% CI [0.17, 1.50], p = 0.015; β = 0.83): Children were more likely to select the improving character as smarter and more hardworking in the outperformed condition (smart: M = 0.55, SD = 0.50; hardworking: M = 0.63, SD = 0.49) than in the matched condition (smart: M = 0.45, SD = 0.50; hardworking: M = 0.43, SD = 0.50). Follow-up exploratory analysis revealed that the condition effect was significant only for the hardworking measure (B = 1.15, SE = 0.54, 95% CI [0.10, 2.21], p = 0.032; β = 1.15), not for the smart measure (p = 0.224). Intercept-only binomial linear mixed effects models revealed that, for the smart measure, children’s responses were not significantly different from chance in either condition (ps > 0.470); for the hardworking measure, their responses in the matched condition were not significantly different from chance (p = 0.319), but they tended to select the improving character in the outperformed condition (p = 0.067).
The overall model also revealed a significant main effect of age (B = 0.76, SE = 0.32, 95% CI [0.14, 1.38], p = 0.016; β = 0.62): With age, children were more likely to select the improving character than the constant character.

For children’s responses to the evaluation questions (i.e., prize, preference, and success), we first fit a binomial linear mixed effects model predicting responses as a function of measure, age (mean-centered), condition, and their interactions, with a random intercept for participants. None of the interaction terms were significant (ps > 0.115), so we dropped all the interaction terms. We found no significant effect of measure (ps > 0.103), so we treated measure as a random intercept. Consistent with our hypothesis, the final model revealed a significant condition effect (B = 1.56, SE = 0.34, 95% CI [0.90, 2.22], p < 0.001; β = 1.56); children evaluated the improving character more favorably in the outperformed condition (M = 0.65, SD = 0.38) compared to the matched condition (M = 0.41, SD = 0.39). There was no significant effect of age (p = 0.201). We then fit intercept-only binomial linear mixed effects models separately for each condition. Children’s evaluations in the matched condition did not significantly differ from chance (p = 0.089), while they significantly favored the improving character in the outperformed condition (B = 0.97, SE = 0.42, 95% CI [0.15, 1.79], p = 0.024).
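Because the intercept-only estimate for the outperformed condition is reported on the log-odds scale, it can help to back-transform it to a probability. The worked conversion below uses only the reported values (B = 0.97, 95% CI [0.15, 1.79]); interpreting the result as the probability for a typical participant, with random effects at zero, is an assumption about how the model is read, not a statement from the source.

```latex
\hat{p} = \frac{1}{1 + e^{-B}}
        = \frac{1}{1 + e^{-0.97}} \approx 0.73,
\qquad
\left[\frac{1}{1 + e^{-0.15}},\ \frac{1}{1 + e^{-1.79}}\right]
  \approx [0.54,\ 0.86]
```

Chance corresponds to B = 0, that is, p = 0.5, so an interval that excludes 0.5 matches the significant result. The back-transformed value need not equal the raw mean (M = 0.65), because a mixed model's intercept is conditional on the random effects rather than averaged over participants.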

Taken together, the Study 3 findings complement those from Studies 1 and 2. For ease of comparison, Table 1 summarizes the key results across all three studies.

Table 1. Summary of results on children’s inferences and evaluations across Studies 1–3

Study 1: Improving vs. Constant (matched final performance, 4- to 10-year-olds)
Smart question: No age effect. Children across ages inferred the constant character as smarter.
Hardworking question: Significant age effect, favoring the improving character after 7.3 years.
Evaluation questions (Prize, Preference, Success): Significant age effect, favoring the constant character before around 6 years (Prize: 6.0 years; Preference: 6.1 years; Success: 6.7 years), favoring the improving character after around 7–8 years (Prize: 7.4 years; Preference: 7.7 years; Success: 8.7 years).

Study 2: Improving vs. Constant (matched final performance, 4- to 9-year-olds)
Smart question: Replicated Study 1: No age effect. Children across ages inferred the constant character as smarter.
Hardworking question: Replicated Study 1: Significant age effect, favoring the improving character after 6.2 years.
Evaluation questions (Prize, Preference, Success): Replicated Study 1: Significant age effect, favoring the constant character before around 5 years (Prize: 5.1 years; Preference: 5.9 years; Success: 5.6 years), favoring the improving character after around 7–8 years (Prize: 7.0 years; Preference: 7.6 years; Success: 7.3 years).

Study 2: Decreasing vs. Constant (matched final performance, 4- to 9-year-olds)
Smart question: Significant age effect, favoring the decreasing character before 4.2 years, favoring the constant character after 8.3 years.
Hardworking question: Significant age effect, favoring the decreasing character before 5.2 years, favoring the constant character after 6.9 years.
Evaluation questions (Prize, Preference, Success): Significant age effect, favoring the decreasing character before around 5 years (Prize: 5.5 years; Success: 4.9 years), favoring the constant character after around 7 years (Prize: 7.4 years; Preference: 7.1 years; Success: 7.2 years).

Study 3: Improving vs. Constant (matched condition vs. outperformed condition, 4- to 6-year-olds)
Smart question: No condition effect: no significant preference (compared to chance) in either condition.
Hardworking question: Significant condition effect: no significant preference (compared to chance) in the matched condition; trend favoring the improving character in the outperformed condition.
Evaluation questions (Prize, Preference, Success): Significant condition effect: no preference (compared to chance) in the matched condition; favoring the improving character in the outperformed condition.

In Study 3, we found that when the two characters’ overall performance was matched and the improving character outperformed the constant character in the final performance, even 4- to 6-year-old children tended to infer that the improving character was more hardworking, and they evaluated this character significantly more favorably than the constant character. These findings suggest that younger children are not completely insensitive to the value of improvement; rather, when the evidence is salient enough (i.e., better endpoint performance of the improving character), even preschoolers can recognize and favor the improving character.

Discussion

Across three studies, we investigated whether and how children consider changes in performance over time when reasoning about and evaluating others’ achievements. We examined how children make inferences about the ability (smartness) and effort of individuals depending on whether their performance changes or stays constant over time, and how they make subsequent evaluations. Our results reveal notable developmental changes between ages 4 and 10: When comparing someone whose performance stayed constant over time versus someone whose performance improved over time (with matched final performance), children across ages infer the constant character to be smarter; however, with age, they increasingly infer the improving character to be more hardworking. They also increasingly evaluate this improving character more favorably (Studies 1 and 2).

When shown two characters with matched overall performance, where the improving character outperformed the constant character at the final time point, even the youngest children in our sample (ages 4 to 6) favored the improving character (Study 3). Furthermore, we also found that children’s reasoning varies depending on the direction of change. When asked to compare someone whose performance decreased over time versus someone whose performance stayed constant over time (with matched final performance), with age, children increasingly infer the constant character to be smarter and more hardworking, as well as evaluate the constant character more favorably (Study 2). These results together reveal children’s increasingly flexible and complex understanding of changes in academic performance.

The current work adds to existing knowledge on children’s reasoning about others’ competence and achievements. Previous research has shown that children can reason about others’ performance at a single time point, but less is known regarding whether children can reason about performance over time. Our first question was whether children can accurately track others’ performance changes. Our findings show that children as young as 5 can correctly identify the individual whose performance has changed and the individual whose performance has stayed constant, and this differentiation becomes more robust with age. Together with previous work, our research shows that children are sensitive not only to changes in their own performance (e.g., refs. 24,25,28) but also to changes in others’ performance. More broadly, this work also adds to our understanding of children’s reasoning about temporal changes in general. Research suggests that, from early childhood, children possess the ability to make sense of changes in various domains, including others’ mental states and social behaviors22,46. Our findings add to this body of work by showing that children also develop their ability to reason about changes in academic performance in early childhood, and this ability further develops with age.

Our second question was whether and how children use changes in performance or consistency in performance as cues to infer ability and effort. Our experiments revealed developmental changes in children’s inferences about ability and effort based on changes in performance over time. When comparing two characters with matched final performance but different performance trajectories, both older and younger children, as well as adults, infer the constant character to be smarter than the improving character. However, when inferring the level of effort, children from around ages 6–7 onward (but not younger children) and adults infer that the improving character is more hardworking than the constant character.

The different developmental patterns in children’s inferences of ability and effort based on performance over time suggest that the recognition of improvement as an indication of effort may particularly take time and experience to develop. This developmental change is consistent with recent work on children’s first-person experiences of improvement and children’s increasing understanding of overcoming constraints10,27,28. Specifically, with age, children are increasingly likely to feel pride and a sense of progress following improvement, and they attribute greater effort to individuals who overcome internal or external constraints to achieve success. Additionally, children may also receive increasing feedback and praise that link different traits to different performance trajectories, which may further help them understand the link between improvement and effort. Future studies can further investigate how children’s first-person experiences of working hard to improve and how teachers’ (and/or other adults’) direct socialization may help children infer effort from improvement.

This developmental pattern is also consistent with broader age-related changes in cognitive capacities that support reasoning about temporal information. As children’s working memory, metacognitive skills, and theory-of-mind abilities improve across the preschool and early school years11–13,17,18, they become better able to retain earlier outcomes, integrate them with new information, and interpret why others’ performance might change.

Notably, in Study 3, when the two characters had matched overall performance and the improving character outperformed the constant character on the last exam, even the youngest children in our sample (4- to 6-year-olds) tended to infer that the improving character was more hardworking and also evaluated this character favorably. However, these patterns did not emerge when the two characters performed equally well on the final exam. This is largely consistent with previous work showing that preschoolers may have difficulty predicting future states in the face of the current state47. Perhaps in the condition where the improving character and constant character performed the same on the last exam, it is particularly challenging for young children (but not for older children) to anticipate that the improving character would surpass the constant character on the next exam. In contrast, the outperformed condition provided a clearer cue that improvement had occurred. These findings may suggest that preschoolers are not entirely insensitive to changes in performance over time; rather, they may simply need more explicit evidence to detect those changes and make inferences and evaluations.

These findings together add to previous evidence that children as young as age 5 could employ the intuitive theory of “performance = effort + ability” to reason about performance. Specifically, by around age 5, children can infer the level of ability from effort and performance9, and can infer the level of effort from ability and performance10, and these inferences become more robust with age. This research shows that children can gradually use multiple cues to infer ability and effort, including the time one uses to perform the task, the difficulties one has overcome, and changes in one’s performance over time, among other cues.

It is important to recognize that this research captures one plausible perspective on children’s reasoning. In real-life contexts, many additional factors may shape how children infer and evaluate others’ performance. For example, temporary stressors, luck, or external disruptions might influence performance independently of an individual’s ability or effort. Similarly, individuals who demonstrate consistent performance may also be exerting considerable discipline and hard work to maintain their level of achievement. These complexities suggest that children’s social inferences likely incorporate a wider range of contextual information beyond a simple ability-effort framework.

Our third question was how children evaluate individuals with different performance trajectories. Our work is the first to reveal that, when evaluating achievers with matched final outcomes, with age, children gradually favor the improving character when compared with the constant character. This aligns well with previous research suggesting that children value effort and hard work more with age. For instance, as children grow older, they increasingly value someone who achieves success through hard work over someone who achieves success through talent10,32, and allocate more resources to hard workers34. We think that children’s developing recognition of the role of hard work in making improvements, combined with their valuing of effort (compared to ability), may lead to increasingly favorable evaluation of those who improve over time.

Notably, the 7- to 10-year-olds in the current research valued improvement to a great extent (favoring the improving character on all evaluation measures), and even more so than adults. On the preference and success measures, children aged 7 to 10 selected the improving character (who is perceived as more hardworking) more than adults did, which aligns with prior research suggesting that elementary school children value effort more than do adults10. This could be attributed to the emphasis on diligence and hard work in the education of children within this age group, especially in Chinese educational contexts42,48. By contrast, adults may have developed a more balanced perspective on the factors contributing to success, recognizing the multifaceted nature of achievements and the combined influence of both effort and ability on academic outcomes. If this hypothesis holds, an inverted U-shaped developmental pattern may emerge, with children’s evaluation of the improving character peaking at some point and declining later in development. To gain a more comprehensive understanding of the developmental changes, future research should further explore the complete developmental trajectory by examining children in late childhood and adolescence.

Another novel contribution of this research is the findings from Study 2, which show that children’s increasingly sophisticated inferences and positive evaluations of those whose performance increases over time were not due to a mere preference for the presence of change or an aversion to stagnation. Instead, they exhibit sensitivity to the direction of change. By around age 7, children perceived the constant character as smarter but less hardworking compared to the improving character and evaluated the improving character more favorably. Conversely, when compared to a character whose performance decreases (i.e., the decreasing character), these older children instead perceived the constant character as both smarter and more hardworking and evaluated the constant character more favorably. This nuanced inference, which shifts based on the comparison target, demonstrates the flexibility of children’s inferential and evaluative capacities. This finding can inform educational strategies that guide children on how to interpret sustained success, given a potential bias where some people may attribute constant success solely to ability and overlook the significant efforts involved even to maintain consistency. Highlighting the comparison between constant performance and declining performance can help children recognize that even constant performance requires effort and that one may decline if they do not put in effort.

We also found developmental changes when children were asked to compare the decreasing character and the constant character (in Study 2). Younger children seemed to consider the decreasing character as smarter and more hardworking, and to favor them, while older children showed the opposite pattern. We reason that this may be because, when comparing the decreasing character and the constant character with matched final performance, younger children may attend more to the higher initial performance or higher overall performance of the decreasing character, and thus favor them on all measures. In contrast, older children might consider not only the initial or overall performance but also the stability of performance in their inferences and evaluations. This also aligns well with recent research showing that children aged 8 to 10 feel ashamed after recalling a time when they got worse27. Notably, adults in our sample considered the decreasing character as smarter than the constant character, while older children (8- to 9-year-olds) considered the decreasing character as less smart. We speculate that older children may be particularly averse to a decline in performance (even more so than adults) and may attribute the decreasing character’s initial success to luck, though this warrants further investigation.

It is interesting to consider which information drove 4- to 6-year-olds’ changes in evaluations across conditions. In Studies 1 and 2 and in the matched condition of Study 3, the improving and constant characters always ended with the same final performance, while the constant character performed better both initially and overall. Under these conditions, younger children consistently favored the constant character, but the design cannot determine whether this reflects reliance on initial performance, overall performance, or both, because these cues pointed in the same direction. In the outperformed condition of Study 3, overall performance was matched between characters, while the improving character had lower initial performance but higher final performance than the constant character. Younger children favored the improving character in this condition. We reason that they were most likely attending to the final performance when favoring the improving character. However, it is also possible that the constant character’s overall performance in this condition (third place throughout), which was lower than that of the constant character in Studies 1 and 2 and the matched condition of Study 3 (second place throughout), made the constant character seem weaker overall. Taken together, because the outperformed and matched conditions differed on multiple factors, including comparative final performance and overall performance, the present design does not allow us to isolate which cue caused the younger children’s shift in responses. Future work that systematically decorrelates trajectory and final performance cues will be needed to determine what younger children attend to when interpreting performance change.

Finally, it is important to consider how cultural context may shape the patterns observed in the present work. Although the current study was conducted in China, prior work suggests that some aspects of children’s reasoning about ability and effort show broadly similar developmental patterns across cultures. For instance, both United States and Chinese children exhibit a shift from favoring “naturals” to favoring “strivers” with age32, and they show parallel developmental changes in related beliefs such as free-will reasoning49. At the same time, the timing and strength of these shifts may vary across cultural contexts, particularly those that differ in the extent to which they emphasize effort, perseverance, and measurable academic progress. Research shows that East Asian educational systems tend to place stronger emphasis on diligence and self-improvement than many Western systems37, whereas US children and adults often place greater weight on innate ability and genetic explanations50,51. Such differences may influence not only the value children place on improvement but also the cues they attend to when interpreting performance trajectories. We encourage future research to examine children’s interpretations of performance change across diverse cultural contexts, which would help clarify the extent to which the developmental patterns observed here reflect universal processes or culture-specific socialization goals.

There are a few limitations of the current research that suggest avenues for future study. First, in our study design, the constant character (in Studies 2 and 3) was always in second or always in third place. This was a deliberate design choice to allow room for improvement. The results might differ if the constant character were always in first place, as there would be no room for improvement (i.e., the constant character could not have performed better). Future research can examine how perceived room for improvement may influence children’s inferences and evaluations of improving vs. constant performance. Additionally, the current investigation focuses only on one-directional changes (i.e., either improving or decreasing). In real life, performance can fluctuate (e.g., improve then decrease, or decrease then improve). Future research could explore children’s perceptions of such more complex trajectories.

Second, in our study, we used a forced-choice design to provide the strongest test of whether children could distinguish between two performance trajectories. This paradigm is widely used in prior work on children’s reasoning about ability and effort and offers a more sensitive way to capture differentiation than absolute ratings9,10,32. However, an important limitation is that forced-choice responses reflect relative judgments. Selecting one character as “more hardworking” or “smarter” does not necessarily mean that children viewed the other character as not hardworking or not smart. Future work could incorporate single-character or continuous-rating measures to examine whether children make similar inferences and evaluations when judgments are not constrained by direct comparison.

Third, in the current study, the two characters were presented side-by-side, and children were asked to directly compare the two characters, which may have made the relative differences especially salient. Using ranked positions (rather than absolute performance) may have further highlighted these contrasts. It would be interesting to adapt the current paradigm to examine how children evaluate an individual’s progress when no peer is present for direct contrast. Relatedly, we focused on changes in performance outcomes (i.e., exam results indicated by little flowers or rankings) because they are highly familiar, concrete, and easy for children in China to understand. However, presenting these outcomes within an exam-based scenario may have promoted a grade-oriented mindset, directing children’s attention toward scores and relative standing and potentially overshadowing other ways of evaluating growth or progress. In everyday situations, children may also encounter improvement in the form of gaining understanding or developing skills rather than changes in test scores or rankings. Such mastery-oriented contexts may highlight personal growth rather than relative standing and could engage different reasoning processes. Future work should examine whether similar developmental patterns emerge when improvement is framed in non-comparative or mastery-oriented ways (e.g., trajectories showing increasing learning or knowledge acquisition).

Another limitation is that children were presented with performance outcomes from several time points almost simultaneously, and the use of memory check questions may have further reduced ecological validity by prompting explicit recall rather than natural observation. In real life, however, tracking performance changes over time may be more challenging, as children typically observe others’ performance over time in a more gradual, less structured way and are unlikely to receive explicit summaries of past performance. Thus, while our findings reveal what children can do under laboratory conditions, they may not fully reflect how children evaluate performance changes in everyday life. Future research could investigate children’s reasoning about performance trajectories in more naturalistic contexts. Also, our sample was relatively homogeneous in terms of race/ethnicity and socioeconomic status, which may limit the generalizability of the findings. Future research should examine whether these patterns hold in more culturally and socioeconomically diverse populations. In addition, because our study used a cross-sectional design, the developmental differences we observed should be interpreted as age-related rather than reflecting change within individuals. Although a cross-sectional approach suited our goal of identifying age differences in this initial investigation, future longitudinal work could provide a complementary perspective on how children’s use of performance cues may evolve over time.

Taken together, our findings reveal important developmental changes in children’s reasoning and evaluations about changes in others’ performance over time. Between ages 4 and 10, children increasingly infer that improvement requires more effort than constant performance and develop positive evaluations towards those whose performance improves over time. They can also flexibly consider factors such as the direction of changes and comparative final performance. Our work contributes to the understanding of how children reason about others’ performance in a dynamic, changing context.

The developmental tendencies we observed also point to meaningful practical implications for children’s own learning and motivation. Younger children’s preference for consistently high performers suggests that children may not spontaneously view improvement itself as a sign of ability or effort. Helping children recognize that progress can be informative and valuable may therefore be especially important. Parents and teachers can support this by making learning processes and improvement more visible, emphasizing effort and growth rather than only outcomes, and creating classroom environments where trajectories are acknowledged as part of achievement. At a broader level, educational policymakers should consider designing adaptive educational evaluation systems that take a dynamic, developmental perspective into account. This approach would ensure that assessments reflect children’s true growth and achievements, ultimately supporting their long-term academic success.

Methods

Study 1

Participants

Informed by developmental studies on comparable topics10, we preregistered a sample size of 96. Data collection was stopped at the conclusion of the first testing day on which this threshold was met. However, due to ethical considerations, we still included families who had signed up before we reached the target sample size in our studies. Thus, our final sample included 102 4- to 10-year-olds (4.16–10.86 years old, Mage = 7.35 years, SD = 1.99 years, 54 boys, 48 girls) recruited from social media (N = 74) or from local schools in urban Shanghai, China (N = 28). In addition, their parents (N = 87) also participated in the study. Demographic information was available for 90% of participants. In this subset, the participants predominantly came from middle- to high-SES families. Specifically, 95% of the fathers and 88% of the mothers held a bachelor’s degree or above. Approximately 83% of the families had an income of over 200,000 CNY per year. Approximately 61% of the participating children were the only child in their family. All children spoke Mandarin Chinese as their native language. The study was approved by the University Committee on Human Research Protection at East China Normal University [title: Children’s Understanding of Choice in Complex Contexts, Protocol Number HR 748-2020]. Written informed consent was obtained from the parents of child participants and from all adult participants. Children also provided verbal assent. The sample size, procedures, and analysis plan for this study were preregistered on AsPredicted (https://aspredicted.org/68C_ZV5). A post hoc power analysis using G*Power 3.1 indicated adequate power (97%) to detect a small to medium effect size (f² = 0.15) for a linear multiple regression with a sample size of 102 (α = 0.05, two tails, up to three predictors). Eleven additional children were tested but excluded due to parental interference (N = 7) or experimenter error (N = 4).
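As a rough cross-check of the reported power figure, the sketch below approximates the power of a two-tailed test on a single regression coefficient. It rests on two assumptions that are not stated in the text: that the relevant test is a two-tailed test of one coefficient with noncentrality δ = √(f² × N), and that a normal approximation is close enough to the exact noncentral t computation that G*Power performs. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(f2: float, n: int, alpha: float = 0.05) -> float:
    """Normal-approximation power for a two-tailed test of one coefficient.

    Assumption (not from the paper): noncentrality delta = sqrt(f2 * n).
    """
    z = NormalDist()
    delta = sqrt(f2 * n)
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region under the alternative.
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)


# Study 1 values reported in the text: f2 = 0.15, N = 102, alpha = 0.05.
power_study1 = approx_power(0.15, 102)
print(f"Approximate power: {power_study1:.2f}")
```

Under these assumptions the approximation lands close to the reported 97%. It is only indicative and does not replace the G*Power calculation.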

Materials and procedure

Twenty-eight children were tested in person, and 74 children were tested online, due to the outbreak of COVID-19. These online tests were conducted through VooV Meeting (an online conferencing software similar to Zoom) with a live experimenter. The materials used for online testing were identical to those used for the in-person testing.

Consistent with prior research that examined similar age groups using exam-like scenarios to investigate children’s social evaluations and reasoning9,10, we used a school setting involving exams to frame the story. Specifically, child participants were presented with two story characters (gender-matched with the participants, with accompanying pictures of side-by-side silhouettes). Children were told that both characters went to the same school, where they took two exams over one semester, one at the beginning of the semester and the other half a year later (see https://osf.io/sxpha/overview?view_only=e94c552c303e4f37a9a44f7e9ffb0ac6 for the script). Then we presented children with the two characters’ performance on the two exams, successively (see Fig. 4 for images). We used the number of red flower stickers to indicate exam performance, as this is very familiar to Chinese children and commonly used in Chinese classrooms to indicate achievement. Children across a wide age range, including preschoolers as young as 4 years old, are familiar with this system and understand that more flowers indicate better performance. In this study, the two characters differed on their performance trajectory: one character (i.e., the improving character) received three stickers on the first exam and four stickers on the second exam, while the other character (i.e., the constant character) received four stickers on both exams. The left-right position of the two characters was randomized across participants. After presenting the two characters’ performance on each exam, children were first asked an attention check question (e.g., “Who got a higher score?”) to ensure that they had paid attention and understood the story. If the participating child answered the attention check question incorrectly, the experimenter would repeat the exam performance of the two characters and ask this question again. All children answered correctly within two attempts. 
To make sure that the children remembered the characters’ exam performance, the experimenter then asked four memory check questions, one about each character’s performance on each exam (i.e., “How many stickers did Xiaoming/Xiaogang receive on the first/second exam?”). If the participating child answered a memory check question incorrectly, the experimenter would retell the story and ask the questions again. All children answered the memory check questions correctly within two attempts. We then asked participants to choose between the two characters in answering the following six forced-choice questions, grouped into three types: one improvement question, two inference questions, and three evaluation questions.

Fig. 4. An example of stimuli in Study 1 (male version).

The improving character is shown on the left and the constant character on the right, along with their performance on two exams.

Children were first asked an improvement question (i.e., “Who improved more during the half year?”). This measured whether children could infer who had improved more from the performance trajectories of the two characters. They were then asked two inference questions (in a randomized order) to assess how children make inferences about the two characters in terms of level of smartness (i.e., “Who is smarter?”) and level of effort (i.e., “Who is more hardworking?”). Finally, they were asked three evaluation questions (in a randomized order) to measure children’s comparative evaluations of the improving character and the constant character: a prize question (i.e., “Who would you award a prize to?”), a preference question (i.e., “Who do you like better?”), and a success question (i.e., “Who do you think will be more successful in the future?”).

Adult participants (i.e., parents) read identical scenarios on Qualtrics and answered identical questions, omitting the attention and memory check questions. Children and parents completed their parts of the study independently, ensuring that parents’ responses could not influence children’s responses.

Data were analyzed in R using the lme4 and lmerTest packages (Kuznetsova et al., 2017). All data and code needed to replicate the analyses and create the figures can be found at https://osf.io/sxpha/overview?view_only=e94c552c303e4f37a9a44f7e9ffb0ac6.

Study 2

Participants

We recruited 105 children aged 4 to 9 years (Mage = 6.72 years, SD = 1.99 years, 49 boys, 56 girls) from social media (N = 58) or from local schools in urban Chongqing, China (N = 47). The sample size, procedures, and analysis plan for this study were preregistered on AsPredicted (https://aspredicted.org/N54_XLD). Data from another 10 children were excluded due to missing video files (N = 3), failing to finish the task (N = 3), experimenter error (N = 2), or parental interference (N = 2). In addition, 106 Chinese adults (78 females) were recruited online.

Materials and procedure

The procedure was similar to Study 1 except for the following changes. First, instead of using stickers to indicate performance, we used rankings on an Honor Roll to indicate characters’ performance on the exams, which is also a commonly used and familiar indicator of academic performance in Chinese classrooms. Second, rather than showing exams at two time points, we showed three time points so that the changing trends became more salient. Third, to measure whether children can distinguish the direction of changes, in addition to asking children to compare an improving character versus a constant character as in Study 1 (i.e., the Improving condition), we also asked children to compare a decreasing character (i.e., someone whose performance got worse over time) versus a constant character (i.e., the Decreasing condition). Note that the constant character in the Improving condition was consistently in the second position, while the constant character in the Decreasing condition was consistently in the fourth position (see details below).

At the beginning of the experiment, to ensure that children understood the ranking, we had a warm-up where the experimenter introduced and explained the ranking with accompanying pictures and examples (see https://osf.io/sxpha/overview?view_only=e94c552c303e4f37a9a44f7e9ffb0ac6 for the exact script). Then, children were presented with two within-subject conditions (the Improving condition and the Decreasing condition). The order of presentation of the two conditions was counterbalanced across participants. Specifically, in the Improving condition, children were presented with an improving character and a constant character (gender matched with the participants). The improving character started in fourth place on the first exam, moved to third place on the second exam, and ended up in second place in the ranking for the last exam. The constant character started in second place and maintained that ranking for all three exams. In the Decreasing condition, children were introduced to a decreasing character and a constant character. The decreasing character’s performance mirrored that of the improving character: he/she started in second place on the first exam, moved down to third place on the second exam, and ultimately ended up in fourth place on the last exam. The constant character’s ranking was in fourth place for all three exams. Thus, in each condition, the endpoint performance of the two characters was matched (see Fig. 5 for images).
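The rank trajectories described above can be written down as a small data structure, which makes the two design properties (matched endpoints within each condition, and mirrored trajectories across conditions) easy to check. This is an illustrative sketch only; the names `STUDY2`, `endpoints_matched`, and `is_mirror` are hypothetical and do not come from the authors’ materials.

```python
# Honor Roll ranks across the three exams (1 = best), as described in the text.
STUDY2 = {
    "Improving": {"changing": (4, 3, 2), "constant": (2, 2, 2)},
    "Decreasing": {"changing": (2, 3, 4), "constant": (4, 4, 4)},
}


def endpoints_matched(condition: dict) -> bool:
    """True if both characters hold the same rank on the last exam."""
    return condition["changing"][-1] == condition["constant"][-1]


def is_mirror(a: tuple, b: tuple) -> bool:
    """True if trajectory b is trajectory a in reverse order."""
    return tuple(reversed(a)) == tuple(b)


for name, cond in STUDY2.items():
    print(name, "endpoints matched:", endpoints_matched(cond))

mirrored = is_mirror(
    STUDY2["Improving"]["changing"], STUDY2["Decreasing"]["changing"]
)
print("Decreasing mirrors Improving:", mirrored)
```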

Fig. 5. An example of stimuli in Study 2 (female version).

Panel (a) shows the Improving condition (improving character vs. constant character) and panel (b) shows the Decreasing condition (decreasing character vs. constant character).

After presenting each condition, to measure whether children understood performance change over time, we asked a change question (i.e., “Whose performance do you think changed more during the semester?”). We then asked the same Inference questions (smart, hardworking) and Evaluation questions (prize, preference, success) as in Study 1. Finally, we also asked children to compare the improving character with the decreasing character by asking a preference question (i.e., “Who do you like better?”). The adults received identical stimuli and questions, but read through these materials themselves on Qualtrics and did not receive attention check questions.

Study 3

Participants

We preregistered a sample size of 48. Data collection was stopped at the conclusion of the first testing day on which this threshold was met. Our final sample included 49 4- to 6-year-olds (4.05–6.69 years old, Mage = 5.51 years, SD = 0.82 years, 22 boys) recruited online (N = 3) or from local preschools in urban Shanghai, China (N = 46). Three additional children were tested but excluded from the analysis due to experimenter error (N = 2) or not providing the date of birth (N = 1). The sample size, procedures, and analysis plan for this study were preregistered on AsPredicted (https://aspredicted.org/2G6_PST). A post hoc power analysis using G*Power 3.1 indicated adequate power (92%) to detect a small to medium effect size (f² = 0.20) for a linear multiple regression with a sample size of 49 (α = 0.05, two tails, up to three predictors).

Materials and procedure

The procedure and materials were similar to the Improving condition in Study 2, except for the following changes. First, to ensure that children understood the ranking format before completing the main task, we included a brief warm-up activity. As in Study 2, children were told that in the upcoming stories, some characters would stay in the same rank across exams while others would change. Study 3 additionally provided concrete examples illustrating what “staying the same” and “changing” look like (e.g., a child who remained in fourth place each time versus a child whose rank shifted from fifth to second to fourth, see https://osf.io/sxpha/overview?view_only=e94c552c303e4f37a9a44f7e9ffb0ac6 for the exact script).

We added a within-subject manipulation of whether the improving character’s performance on the last exam matched that of the constant character (i.e., the matched condition) or exceeded it (i.e., the outperformed condition). Specifically, in the matched condition, as in the Improving condition in Study 2, the improving character improved from fourth place to third place and then to second place, while the constant character stayed in second place across all three exams. In the outperformed condition, the constant character stayed in third place across all three exams, meaning that the improving character outperformed the constant character on the last exam (see https://osf.io/sxpha/overview?view_only=e94c552c303e4f37a9a44f7e9ffb0ac6 for images). The order of the two conditions and of the two characters was counterbalanced across participants.
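To make the contrast between the two conditions concrete, the sketch below compares the characters on the final exam and on an overall index. Using the mean rank as the index of overall performance is an assumption made here for illustration, and the names `IMPROVING`, `CONSTANT`, and `compare` are hypothetical.

```python
from statistics import mean

# Ranks across the three exams (1 = best), as described for Study 3.
IMPROVING = (4, 3, 2)
CONSTANT = {"matched": (2, 2, 2), "outperformed": (3, 3, 3)}


def compare(improving: tuple, constant: tuple) -> dict:
    """Compare final-exam rank and mean rank (lower rank numbers are better)."""
    return {
        # Positive: the improving character ends ahead of the constant one.
        "final_gap": constant[-1] - improving[-1],
        # Positive: the improving character has the better mean rank.
        "mean_gap": mean(constant) - mean(improving),
    }


summary = {name: compare(IMPROVING, ranks) for name, ranks in CONSTANT.items()}
for name, gaps in summary.items():
    print(name, gaps)
```

On this index, the matched condition equates final rank while leaving the improving character behind on mean rank, whereas the outperformed condition equates mean rank while putting the improving character ahead on the final exam.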

After presenting each condition, instead of asking the change question (i.e., “Whose performance do you think changed more during the semester?”) as in Study 2, we simplified this question for young children by asking two same-or-different questions to measure whether they understood which character’s performance was different over time (i.e., “Whose rankings do you think were all the same over three times?” and “Whose rankings do you think were different over three times?”, order randomized). We then asked the same Inference questions (smart and hardworking) and Evaluation questions (prize, preference, and success) as in Studies 1 and 2. All the materials and procedures can be accessed at https://osf.io/sxpha/overview?view_only=e94c552c303e4f37a9a44f7e9ffb0ac6.

Acknowledgements

We would like to thank Xiaoman Yu and Sixian Li for their help with recruiting participants and data collection. We are also grateful to all children and parents who participated in this research. This work was funded by the Chenguang Program of Shanghai Education Development Foundation and Shanghai Municipal Education Commission (22CGA28) awarded to X.Z.

Author contributions

Y.H. and Y.S. contributed to the work equally and should be regarded as co-first authors. All authors (Y.H., Y.S., and X.Z.) designed the study. Y.H. and Y.S. conducted data collection. Y.H. conducted data analysis and drafted the first version of the manuscript. All authors worked on the manuscript revision and approved the submitted version.

Data availability

All data have been made publicly available via the Open Science Framework and can be accessed at https://osf.io/sxpha/overview?view_only=e94c552c303e4f37a9a44f7e9ffb0ac6.

Code availability

The R code used for analysis is available through the Open Science Framework (https://osf.io/sxpha/overview?view_only=e94c552c303e4f37a9a44f7e9ffb0ac6).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Dweck, C. S. Mindset: The New Psychology of Success (Random House, 2006).
  • 2. Weiner, B. An attributional theory of achievement motivation and emotion. Psychol. Rev. 92, 548–573 (1985).
  • 3. Cimpian, A. in Handbook of Competence and Motivation: Theory and Application (eds Elliot, A. J., Dweck, C. S. & Yeager, D. S.) Ch. 21 (Guilford Press, 2017).
  • 4. Wigfield, A. & Eccles, J. S. Expectancy–value theory of achievement motivation. Contemp. Educ. Psychol. 25, 68–81 (2000).
  • 5. Bandura, A. & Wessels, S. Self-Efficacy (Cambridge Univ. Press, 1997).
  • 6. Nicholls, J. G. The development of concepts of effort and ability. Child Dev. 49, 800–814 (1978).
  • 7. Heyman, G. D. & Compton, B. J. Context sensitivity in children’s reasoning about ability. Dev. Sci. 9, 616–627 (2006).
  • 8. Heyman, G. D., Gee, C. L. & Giles, J. W. Preschool children’s reasoning about ability. Child Dev. 74, 516–534 (2003).
  • 9. Muradoglu, M. & Cimpian, A. Children’s intuitive theories of academic performance. Child Dev. 91, e902–e918 (2020).
  • 10. Zhao, X. & Yang, X. Children’s consideration of constraint in academic achievement. Dev. Psychol. 59, 594–608 (2023).
  • 11. Gathercole, S. E., Pickering, S. J., Ambridge, B. & Wearing, H. The structure of working memory from 4 to 15 years of age. Dev. Psychol. 40, 177–190 (2004).
  • 12. Cowan, N. Working memory maturation: Can we get at the essence of cognitive growth? Perspect. Psychol. Sci. 11, 239–264 (2016).
  • 13. Lipowski, S. L., Merriman, W. E. & Dunlosky, J. Preschoolers can make accurate judgments of learning. Dev. Psychol. 49, 1505–1516 (2013).
  • 14. Coughlin, C., Hembacher, E., Lyons, K. E. & Ghetti, S. Introspection on uncertainty and judicious help-seeking during the preschool years. Dev. Sci. 18, 957–971 (2015).
  • 15. Lyons, K. E. & Ghetti, S. Introspection on uncertainty supports early strategic behavior. Child Dev. 84, 726–736 (2013).
  • 16. Flavell, J. H., Friedrichs, A. G. & Hoyt, J. D. Developmental changes in memorization processes. Cogn. Psychol. 1, 324–340 (1970).
  • 17. Baer, C. & Odic, D. Certainty in numerical judgments develops independently of the approximate number system. Cogn. Dev. 52, 100817 (2019).
  • 18. Wellman, H. M. Making Minds: How Theory of Mind Develops (Oxford Univ. Press, 2014).
  • 19. Lagattuta, K. H. Linking past, present, and future. Child Dev. Perspect. 8, 90–95 (2014).
  • 20. Lagattuta, K. H. & Sayfan, L. Not all past events are equal. Child Dev. 84, 2094–2111 (2013).
  • 21. Lagattuta, K. H., Wellman, H. M. & Flavell, J. H. Preschoolers’ understanding of the link between thinking and feeling. Child Dev. 68, 1081–1104 (1997).
  • 22. Lagattuta, K. H. Thinking about the future because of the past. Child Dev. 78, 1492–1509 (2007).
  • 23. Zhang, X., Carrillo, B. A., Christakis, A. & Leonard, J. A. Children predict improvement on novel skill learning tasks. Child Dev. 96, 1177–1188 (2025).
  • 24. Stipek, D. J. & Hoffman, J. M. Development of children’s performance-related judgments. Child Dev. 51, 912–914 (1980).
  • 25. Stipek, D. J., Roberts, T. A. & Sanborn, M. E. Preschool-age children’s performance expectations for themselves and another child as a function of the incentive value of success and the salience of past performance. Child Dev. 55, 1983–1989 (1984).
  • 26. Butler, R. Age trends in the use of social and temporal comparison for self-evaluation. Child Dev. 69, 1054–1073 (1998).
  • 27. Gürel, Ç., Brummelman, E., Sedikides, C. & Overbeek, G. Better than my past self. J. Exp. Psychol. Gen. 149, 1554–1566 (2020).
  • 28. Leonard, J. A., Cordrey, S. R., Liu, H. Z. & Mackey, A. P. Young children calibrate effort based on performance trajectories. Dev. Psychol. 59, 609–621 (2023).
  • 29. Lockhart, K. L., Keil, F. C. & Aw, J. A bias for the natural? Dev. Psychol. 49, 1669–1683 (2013).
  • 30. Ma, S., Tsay, C. J. & Chen, E. E. Preference for talented naturals over hard workers. Child Dev. 94, 674–690 (2023).
  • 31. Stipek, D. J. & Daniels, D. H. Children’s use of dispositional attributions in predicting the performance and behavior of classmates. J. Appl. Dev. Psychol. 11, 13–28 (1990).
  • 32. Yang, X., Zhao, X., Dunham, Y. & Bian, L. Development of a preference for strivers over naturals. Child Dev. 95, 593–608 (2024).
  • 33. Baumard, N., Mascaro, O. & Chevallier, C. Preschoolers are able to take merit into account when distributing goods. Dev. Psychol. 48, 492–498 (2012).
  • 34. Noh, J. Y., D’Esterre, A. & Killen, M. Effort or outcome? J. Exp. Child Psychol. 178, 1–14 (2019).
  • 35. Rizzo, M. T. et al. Children’s recognition of fairness and others’ welfare. Dev. Psychol. 52, 1307–1317 (2016).
  • 36. Chen, C. & Uttal, D. H. Cultural values, parents’ beliefs, and children’s achievement in the United States and China. Hum. Dev. 31, 351–358 (1988).
  • 37. Li, J. Mind or virtue. Curr. Dir. Psychol. Sci. 14, 190–194 (2005).
  • 38. Ng, F. F. Y. & Wei, J. Delving into the minds of Chinese parents. Child Dev. Perspect. 14, 61–67 (2020).
  • 39. Fwu, B.-J., Chen, S.-W., Wei, C.-F. & Wang, H.-H. I believe; therefore, I work harder. Think. Skills Creat. 30, 19–30 (2018).
  • 40. Nian, T., Ruixiang, Z. & Xinyi, Y. An investigation into the current ideology of middle school students. Chin. Educ. 17, 6–21 (1984).
  • 41. Ridley, C. P. Theories of education in the Ch’ing period. Ch’ing-shih wen-t’i 3, 34–49 (1977).
  • 42. Ng, F. F. Y. & Wei, J. Delving into the minds of Chinese parents: What beliefs motivate their learning-related practices? Child Dev. Perspect. 14, 61–67 (2020).
  • 43. Stevenson, H. W. et al. Contexts of achievement: a study of American, Chinese, and Japanese children. Monogr. Soc. Res. Child Dev. 55, 1–119 (1990).
  • 44. Stevenson, H. & Stigler, J. W. Learning Gap: Why Our Schools are Failing and What We can Learn from Japanese and Chinese Education (Simon and Schuster, 1994).
  • 45. Johnson, P. O. & Neyman, J. Tests of certain linear hypotheses and their applications to some educational problems. Stat. Res. Mem. 1, 57–93 (1936).
  • 46. Lagattuta, K. H., Elrod, N. M. & Kramer, H. J. How do thoughts, emotions, and decisions align? J. Exp. Child Psychol. 149, 116–133 (2016).
  • 47. Mahy, C. E. Young children have difficulty predicting future preferences in the presence of a conflicting physiological state. Infant Child Dev. 25, 325–338 (2016).
  • 48. Hong, Y. Y. in Student Motivation: The Culture and Context of Learning (Chiu, C.-y., Salili, F. & Hong, Y.-y.) Ch. 6 (Springer, 2001).
  • 49. Zhao, X. et al. Culture moderates the relationship between self-control ability and free will beliefs in childhood. Cognition 210, 104609 (2021).
  • 50. Holloway, S. D. Concepts of ability and effort in Japan and the United States. Rev. Educ. Res. 58, 327–345 (1988).
  • 51. Lockhart, K. L., Nakashima, N., Inagaki, K. & Keil, F. C. From ugly duckling to swan?: Japanese and American beliefs about the stability and origins of traits. Cogn. Dev. 23, 155–179 (2008).
