2025 Dec 26;19(2):e70170. doi: 10.1002/aur.70170

Children With ASD Do Not Understand Hidden Emotions Before False Belief Attribution

Morgane Burnel 1, Stéphanie Durrleman 2, Anne Reboul 3, Jean Pylouster 1, Monica Baciu 4, Marcela Perrone‐Bertolotti 4
PMCID: PMC12948739  PMID: 41451865

ABSTRACT

Some previous studies concluded that theory of mind (ToM) development is deviant in autism spectrum disorders (ASD): the ability to understand that one may hide one's emotions, which typically developing children acquire after false belief understanding, would be acquired before false belief understanding in children with ASD (e.g., Peterson and Wellman 2019). However, results are contradictory (e.g., Zhang et al. 2016). In the current work, we aimed to determine whether the order of acquisition of ToM‐related concepts differs in ASD, using several methodological improvements over previous studies. Our results support a non‐deviant developmental trajectory of ToM in individuals with ASD: the ability to understand hidden emotions is generally not mastered before false belief attribution.

Keywords: autism spectrum disorders, development, Guttman, Rasch, theory of mind

Lay Summary

Theory‐of‐mind, a prerequisite for social abilities, is known to be delayed in children with autism. Some studies have also reported a different order of acquisition of several theory‐of‐mind skills. However, these studies present several methodological limitations that could affect their conclusions. In the present study, using a more robust methodology, we found that children with autism follow the same developmental course of theory‐of‐mind as typically developing children.

1. Introduction

According to the DSM‐5 (American Psychiatric Association 2013), autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized, among other features, by deficits in social communication and social interaction. These difficulties have been related to theory of mind (ToM) impairment, that is, difficulties in attributing mental states to others (such as beliefs, desires, and intentions) to predict or explain their behavior (Premack and Woodruff 1978). ASD is also characterized by restricted, repetitive patterns of behavior, interests, or activities. In addition, subsets of individuals with ASD present an intellectual disability (Postorino et al. 2016) or a language delay (Kjelgaard and Tager‐Flusberg 2001; Marrus et al. 2018).

Many studies have shown that ToM is impaired in individuals with ASD from childhood (e.g., Happé 2015; Rasga et al. 2017) and adolescence (e.g., Livingston et al. 2019; Miller 2022) to adulthood (e.g., Brewer et al. 2017; Zalla et al. 2008) independently of intellectual abilities (for a meta‐analysis, see Yirmiya et al. 1998). Nevertheless, relatively little is known regarding ToM development beyond false belief attribution in typically developing (TD) children or in children with ASD, as research has largely focused on false belief tasks to assess ToM in children (Bloom and German 2000; Burnel et al. 2018, 2020; Smith and Wu 2016; Wellman 2014). While false belief tasks provide a robust measure of ToM (Dennett 1978), other mental states (such as desires, intentions, or emotions) are also important and can be assessed using alternative approaches (Bloom and German 2000; Wellman and Liu 2004). As summarized by Smith and Wu (2016, 51) “Although typically developing 4‐year‐olds are often characterized as having a ToM, a better question to ask about ToM development is not ‘When do kids have it?’ but instead, ‘What does a child understand about mental states now, and what still eludes him or her?’”.

Wellman and Liu (2004) addressed the longstanding question of how ToM develops beyond false belief attribution by proposing a scale, the first tool designed to evaluate several successive stages of ToM development. In particular, they showed that children understand that people act according to their desires (i.e., diverse desires, DD) before they understand that people act according to their own beliefs (i.e., diverse beliefs, DB). Later in development, children understand that someone who has not seen inside a container does not know what it contains (i.e., knowledge access, KA). Next, children can attribute false beliefs to people (i.e., content false belief, CFB). Then children understand that the emotions people feel can differ from those they show (i.e., hidden emotion, HE). This sequence was subsequently replicated in several studies of American and Australian children (O'Reilly and Peterson 2014; Peterson et al. 2005, 2012; Peterson, O'Reilly, et al. 2016; Peterson, Slaughter, et al. 2016; Peterson and Wellman 2019; Shahaeian et al. 2011, 2013; Wellman et al. 2011).

Regarding ToM development in children with ASD, Peterson et al. (2005) found that they followed a different sequence in the later stages assessed by Wellman and Liu's (2004) scale. Whereas TD children can attribute false beliefs (i.e., CFB task) before they understand that the emotions people feel can differ from those they show (i.e., HE task), children with ASD showed the reverse pattern. This result was replicated by Peterson et al. (2012) and Peterson and Wellman (2019). The HE task depicts a boy who is teased by his peers and feigns happiness despite feeling sad (see Figure 1). According to Peterson et al. (2005), children with ASD may master this task early because they experience teasing more often than their TD peers and would thus learn to mask their real emotions sooner. However, Peterson et al. (2012), in a group of children with Asperger syndrome, and Zhang et al. (2016) found the typical order, with CFB passed before HE (see Table 1), suggesting that it is not yet clear whether children with ASD follow the same ToM developmental sequence as TD children. The differences in Zhang et al. (2016) may partly reflect cultural factors, such as variations in socialization practices, norms around emotional expression, and peer interactions (e.g., Chen and Schwartz 2012). However, previous studies of TD children in Chinese samples have documented cultural differences only at early stages of ToM development, not for the HE and CFB tasks (Wellman et al. 2006, 2011; Zhang et al. 2016). Because Zhang et al. (2016) remains the only study to have used the ToM scale with Chinese children with ASD, the role of culture in these findings remains uncertain. More importantly for the present study, additional methodological limitations, discussed below, may also contribute to these discrepancies across studies.
These considerations provide the primary rationale for our investigation and underscore the need for further research to clarify whether ToM development in children with ASD follows a deviant or non‐deviant trajectory.

FIGURE 1.

Instructions for the content false belief and hidden emotion tasks in Peterson et al. (2005) compared to Burnel et al. (2018).

TABLE 1.

Overview of task performance outcomes based on previous studies involving children with ASD.

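Reference | Sample | Mean age (years) | N | DD | DB | KA | CFB | HE | CFB–HE difference | Reversed model
Peterson et al. (2005) | Australian with autism | 9.3 | 36 | 31 | 31 | 27 | 17 | 23 | 6 | Yes
Peterson et al. (2012) | Australian with Asperger | 9.5 | 41 | 40 | 38 | 36 | 26 | 11 | 15 | No
Peterson et al. (2012) | Australian with autism | 6 | 44 | 41 | 38 | 31 | 19 | 23 | 4 | Yes
Zhang et al. (2016) | Chinese with ASD | 5.1 | 34 | 28 | 16 | 10 | 8 | 5 | 3 | No
Peterson and Wellman (2019) | Australian with ASD | 8.1 | 43 | 41 | 36 | 27 | 9 | 11 | 2 | Yes
Peterson and Wellman (2019) | Australian with ASD | 9.5 | 43 | 42 | 41 | 33 | 17 | 20 | 3 | Yes
Total | | | 241 | 223 | 200 | 164 | 96 | 93 | |

Note: DD, DB, KA, CFB, and HE give the number of children passing each task. CFB–HE difference: absolute difference between the numbers of children passing CFB and HE. Reversed model: HE mastered before CFB.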

2. Methodological Limitations of Previous Studies

We will now argue that the inconsistencies in previous findings on ToM developmental trajectories in children with ASD may also arise from methodological limitations. The main goal of the current study is to assess ToM development in children with ASD while specifically addressing these limitations. The following section identifies four key methodological issues in previous research and explains how they can be addressed. First, many studies relied on Guttman analyses rather than Rasch models. Second, ToM measurement was often confounded by cognitive factors such as language, memory, or executive functions. Third, the coding schemes for control question failures may have affected the assessment of ToM abilities. Fourth, studies frequently reported only the total sample size without examining the full distribution of total scores, which can be misleading.

2.1. Guttman Versus Rasch Analyses

All the results discussed here are based on Guttman and/or Rasch analyses, both of which order tasks from easiest to hardest to characterize development. The Guttman model is deterministic: a participant is expected to pass every task easier than the hardest task they pass, any deviation from this ideal pattern counts as an error, and missing responses cannot be handled. In contrast, the Rasch model, often described as the probabilistic version of Guttman's model, expresses the probability of success on each task as a function of the participant's ability and the task's difficulty. It therefore tolerates occasional unexpected responses and missing data, and treats ability as gradual rather than all‐or‐none. These properties allow Rasch models to provide more precise measurements, to account for individual differences, and to detect subtle developmental differences that Guttman models may miss. This flexibility is particularly valuable for groups with heterogeneous developmental profiles, such as children with ASD, whose performance can be influenced not only by ToM but also by language or executive skills (Pellicano 2010). By contrast, Guttman models require complete data and only evaluate whether pass/fail patterns fit a strict ordering, potentially overlooking subtle but important developmental differences.
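To make this contrast concrete, the Python sketch below compares the standard dichotomous Rasch success probability with the deterministic step function implied by a Guttman scale. The ability and difficulty values (in logits) and the function names are hypothetical and chosen only for illustration; they are not estimates from any study discussed here.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: P(success) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def guttman_probability(theta: float, b: float) -> float:
    """Deterministic Guttman model: success if and only if ability reaches task difficulty."""
    return 1.0 if theta >= b else 0.0


# Hypothetical task difficulties (logits): an easy task and a hard task.
EASY, HARD = -1.0, 1.0

for theta in (-2.0, 0.0, 2.0):  # hypothetical child abilities
    print(
        f"theta={theta:+.1f}  "
        f"easy: Rasch={rasch_probability(theta, EASY):.2f} Guttman={guttman_probability(theta, EASY):.0f}  "
        f"hard: Rasch={rasch_probability(theta, HARD):.2f} Guttman={guttman_probability(theta, HARD):.0f}"
    )
```

Under the Rasch model, a child of average ability (theta = 0) still has about a 27% chance of passing the hard task, so an occasional unexpected success is modeled as plausible rather than counted as an error, as it would be on a Guttman scale.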

However, after the first studies, research on ToM development largely abandoned Rasch approaches in favor of Guttman analyses. Since 2007, only Burnel et al. (2018) and Zhang et al. (2016) have used Rasch analyses to study ToM development, whereas all other studies have relied only on Guttman analyses (Henning et al. 2011; Ilgaz et al. 2022; Kabha and Berger 2020; O'Reilly and Peterson 2014; Peterson et al. 2012; Peterson, O'Reilly, and Wellman 2016; Peterson, Slaughter, et al. 2016; Peterson and Wellman 2019; Shahaeian et al. 2011, 2013; Wellman 2011; Wellman et al. 2011; Yu et al. 2021). This exclusive reliance on Guttman analyses may have limited the conclusions of previous research, particularly for children with ASD and when data were incomplete, because subtle differences may have been overlooked.

In the current study, we address this limitation by applying both Guttman and Rasch analyses, placing particular emphasis on Rasch models to provide a more nuanced and reliable assessment of ToM developmental trajectories in children with ASD.

2.2. Cognitive Confounds in ToM Measurement

The ToM scale (Wellman and Liu 2004) allows ToM to be measured beyond false belief tasks. Nevertheless, just like classical false belief tasks (Bloom and German 2000), this tool also measures abilities other than ToM, because its instructions rely on linguistic abilities that can be impaired in ASD. More specifically, the instructions include past tenses (e.g., Polly has never seen inside this drawer), which can be difficult for children with ASD (Roberts et al. 2004). Instructions in Wellman and Liu's scale also include pronouns (e.g., He does not like carrots), although pronoun mastery can be challenging in ASD (Tager‐Flusberg et al. 2005). In addition, some stories in the scale feature complex (embedded) sentences and mental state verbs (e.g., Linda thinks that her cat is in the garage), which have been reported as challenging for specific subsets of children with ASD (Durrleman et al. 2012; Tager‐Flusberg 2000; Tager‐Flusberg and Sullivan 1994). Furthermore, the length of some stories (e.g., the HE task, see Figure 1) places significant demands on attention and working memory. Thus, children with ASD may fail some tasks not because of ToM deficits but because of linguistic, attentional, or executive limitations, so that the additional demands of the original ToM scale could mask good ToM competence. Because these linguistic and executive demands vary across ToM tasks, such confounds may have influenced previous results and distorted the apparent developmental sequence.

In light of these considerations, Burnel et al. (2018) proposed a low‐verbal version of Wellman and Liu's (2004) original scale, the LV‐ToM scale, adapted for children with low linguistic abilities (see Figure 1). Based on a review of the literature, they designed the instructions to use the simplest syntactic structures for each task, so that children's performance would be less influenced by language demands. They specifically avoided linguistic elements that can be difficult for children with ASD, such as complex questions, pronouns, clitics, past tenses, embedded sentences, or mental state lexicon (e.g., Durrleman et al. 2016, 2017; Peristeri et al. 2017; Wittke et al. 2017). Instructions were accompanied by illustrations, and children were always allowed to answer non‐verbally. In addition, three verbal trainings were added to the original tasks to ensure that children had all the linguistic prerequisites for understanding the instructions and to make the non‐verbal way of answering more explicit. Results from Burnel et al. (2018) on 230 TD children confirmed that five of the six tasks form a coherent scale under both Guttman and Rasch models, supporting the validity of the LV‐ToM scale. By reducing the influence of language on ToM measurement, this scale is potentially well suited for children with ASD, whose language difficulties may otherwise lead to inconsistent performance across tasks and distort the apparent developmental sequence, although it has not yet been tested in this population.

Although our goal is to examine the sequence of ToM acquisition originally established by Wellman and Liu (2004), their scale may not be fully suitable for children with ASD because its linguistic and executive demands can obscure true ToM competence. We therefore used the LV‐ToM scale (Burnel et al. 2018), which preserves the original ToM sequence while reducing language complexity, simplifying instructions, and allowing non‐verbal responses. This approach allows us to assess ToM development more accurately in children with ASD and to determine whether their ToM developmental trajectory deviates from typical patterns.

2.3. Coding Scheme of Control Question Failures

An important limitation of previous studies concerns the coding scheme used to handle control question (CQ) failures. The Failure coding scheme assigns a score of zero to a task when its CQ is failed, which can lead to an underestimation of children's ToM abilities (Sobel and Austerweil 2016). In contrast, the Exclude coding scheme removes data from children who fail CQs, yielding purer ToM measures but reducing the sample size (Burnel et al. 2018). Because most previous studies used Guttman analyses, which cannot handle missing data, they typically relied on the Failure coding scheme. On the ToM scale, CQ failures suggest that children with ASD often have difficulties with comprehension, attention, or memory: Peterson et al. (2005) reported CQ failure rates between 8% and 17%, whereas Zhang et al. (2016) found much higher rates (50%–82%). In such cases, the Failure coding scheme can make children's ToM abilities appear lower than they truly are and may conflate ToM performance with language or attentional difficulties. Earlier studies may therefore have mischaracterized ToM development in ASD, because CQ failures affect tasks unequally and can distort the apparent developmental sequence.

The current study addresses this issue by using the Exclude coding scheme to improve the reliability of ToM measurement. In addition, the use of Rasch analyses allows us to include children with missing data, thereby maximizing the usable sample size. Despite these advantages, previous studies have not yet applied this approach.

2.4. Sample Size and Distribution of Total Scores

Although Linacre (1994) suggests that 30–50 participants are generally sufficient for stable Rasch estimates of item difficulty, the number of participants who actually contribute to the model depends on their abilities and can be much smaller. Participants with a total score of 0 or the maximum score provide no information about the relative difficulty of the tasks, so reporting only the total sample size can be misleading. For example, although the total sample of Peterson et al. (2005) included 36 children, only 23 contributed usable data for developmental modeling. Differences between adjacent tasks can rest on even fewer children: in previous ASD studies, contrasts between CFB and HE were based on an average of only 5.5 children (range: 2–15; see Table 1 and Supporting Information). Such small numbers reduce the reliability of conclusions about task order or developmental sequences. Interestingly, in the sample with the most reliable contrast, the children with Asperger syndrome in Peterson et al. (2012), whose CFB–HE difference rests on 15 children (see Table 1), more children passed the CFB task than the HE task, suggesting a non‐deviant sequence of ToM development in autism. Thus, passing HE before CFB may be specific to certain samples of children with ASD and may not generalize to the broader ASD population.
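The average of 5.5 children can be recomputed from Table 1. The minimal Python sketch below takes, for each sample, the numbers of children passing CFB and HE, and computes the absolute CFB–HE difference and whether HE was passed by more children than CFB (the "Reversed model" column). It assumes, consistent with the tabulated values, that the difference in Table 1 is the absolute difference between these two counts; the shortened sample labels are illustrative.

```python
# (sample, children passing CFB, children passing HE), taken from Table 1.
TABLE_1 = [
    ("Peterson et al. 2005, autism", 17, 23),
    ("Peterson et al. 2012, Asperger", 26, 11),
    ("Peterson et al. 2012, autism", 19, 23),
    ("Zhang et al. 2016, ASD", 8, 5),
    ("Peterson and Wellman 2019, ASD (mean age 8.1)", 9, 11),
    ("Peterson and Wellman 2019, ASD (mean age 9.5)", 17, 20),
]


def cfb_he_contrasts(rows):
    """Return the absolute CFB-HE difference and a 'reversed' flag (HE > CFB) per sample."""
    gaps = [abs(cfb - he) for _, cfb, he in rows]
    reversed_order = [he > cfb for _, cfb, he in rows]
    return gaps, reversed_order


gaps, reversed_order = cfb_he_contrasts(TABLE_1)
print(sum(gaps) / len(gaps), min(gaps), max(gaps))  # 5.5 2 15
print(reversed_order)
```

Because this is a net difference, it is a lower bound on the number of children who passed one of the two tasks but not the other; in either case, the CFB–HE contrast rests on very few children in most samples.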

The current study addresses this effective sample size limitation by carefully considering the skill levels of participants when interpreting the results. First, we report the number of children who actually contribute to the developmental model (i.e., those without 0 or maximum scores). Second, for each step of the scale, we report how many children the observed difference is based on. Differences based on very few children are treated as less informative, and caution is needed before generalizing these findings to the broader population.

To summarize, previous findings on ToM developmental trajectories in children with ASD are inconsistent, which may partly reflect methodological limitations. Four key issues can be raised: (1) cognitive confounds in ToM measurement, (2) inappropriate coding schemes for CQ failures, (3) reliance on Guttman rather than Rasch analyses, and (4) over‐reliance on total sample size without examining full score distributions.

3. The Current Study

The main goal of the current study is to assess whether the ToM developmental trajectory is deviant in children with ASD, especially regarding the order of acquisition of CFB and HE, as previously reported in some studies (Peterson et al. 2005, 2012; Peterson and Wellman 2019). To this end, we addressed the methodological limitations of previous research to provide a clearer assessment of ToM development in children with ASD.

First, to minimize cognitive confounds such as language, memory, and executive demands, we used the LV‐ToM scale (Burnel et al. 2018), which is designed to assess ToM with minimal influence from these factors.

Second, to avoid underestimating children's ToM performance, we analyzed the data using two coding schemes: the Failure coding scheme, which maximizes sample size at the cost of measurement purity, and the Exclude coding scheme, which provides a more precise measure of ToM but introduces missing data.

Third, to avoid overlooking subtle developmental patterns, we applied Rasch analyses in addition to Guttman analyses. In particular, unlike Guttman analyses, Rasch analyses can include participants with missing data and are therefore compatible with the Exclude coding scheme, providing a purer measure of ToM while preserving the sample size.

Fourth, because some participants may contribute little or no information to specific analyses, reporting only total sample sizes can lead to misleading conclusions; we therefore carefully examined the distribution of total scores when interpreting the results.

Together, these methodological choices address the key gaps in previous studies and enable a more nuanced, reliable, and accurate assessment of ToM development in children with ASD.

4. Methods

4.1. Participants

The participants were 51 French‐speaking children with ASD (44 boys, six girls) aged between 3.7 and 13.3 years (mean = 8.7, SD = 2.3; see Figure 2). Two more children were assessed but excluded from the final sample because they did not provide answers during any task.

FIGURE 2.

Distribution of children's ages.

All participating children had a formal diagnosis of ASD according to DSM‐5 criteria, established by qualified clinical professionals. Although specific diagnostic scores (e.g., ADOS‐G or ADI‐R) were not available, the inclusion criteria ensured that participants met standard diagnostic guidelines for ASD. The study received a favorable opinion from the local ethics committee for non‐interventional research (No. IRB00010290‐2016‐01‐05‐01). Parents gave written consent for their child to participate in the study.

4.2. Tasks

The tasks were identical to those of Burnel et al. (2018). They included three verbal trainings (VT; i.e., thought bubbles, seeing, and knowing) and six ToM tasks (i.e., diverse desires, DD; diverse beliefs, DB; knowledge access, KA; explicit false belief, EFB; content false belief, CFB; hidden emotion, HE). The skills assessed by each VT and ToM task are briefly presented in Table 2; for the full instructions, see Burnel et al. (2018).

TABLE 2.

Abilities assessed by verbal training (VT) and theory of mind (ToM) tasks.

Type | Task | Ability assessed
VT | Thought bubbles | Understand thought bubble representations.
VT | Seeing | Differentiate between “seeing” and “not seeing” and use the corresponding pictograms.
VT | Knowing | Differentiate between “knowing” and “not knowing” and use the corresponding pictograms.
ToM | Diverse desires (DD) | Understand that people will act according to their desires.
ToM | Diverse beliefs (DB) | Understand that people will act according to their beliefs.
ToM | Knowledge access (KA) | Understand that someone who has not seen inside a box does not know what it contains.
ToM | Explicit false belief (EFB) | Link someone's search behavior to their (explicit) false belief.
ToM | Content false belief (CFB) | Attribute a false belief to someone.
ToM | Hidden emotion (HE) | Understand that emotions felt can differ from those shown.

For each task, the instructions used the simplest syntactic structures according to the literature (see Burnel et al. 2018). They were accompanied by illustrations, and children were always allowed to answer non‐verbally. All tasks entailed a reduced memory load, using short scenarios with illustrative pictures at every step. These pictures remained visible throughout the task so that children could look at them if they had forgotten an element of the narration. VT taught children how to answer non‐verbally by choosing a picture and ensured that they could understand the basic syntax and vocabulary used in the narrations. Each ToM task included a test question (TQ), and four of them (i.e., KA, EFB, CFB, HE) also included a CQ to identify children who were not fully attentive during the instructions.

4.3. Procedure

Children were tested individually in a quiet room at home or in their psychologist's office. Testing took 15 to 30 min, depending on the child's speed. Children were always congratulated after answering, whether their answer was right or wrong. The scale tasks were divided into three blocks of difficulty (Block 1: DD, DB; Block 2: KA, CFB, EFB; Block 3: HE). VT was always administered at the same points: thought‐bubble training first, before Block 1; Seeing and Knowing training immediately before Block 2; and the emotion control (i.e., the control task for the HE task) before Block 3. The three blocks were always administered in the same order, but the order of tasks within each block was counterbalanced across children. Each VT was repeated until the child answered without error (i.e., more than four consecutive correct answers) or until the experimenter was convinced that the child could not perform the task (i.e., no correct answer and no improvement after four trials).
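The stopping rule for verbal training can be read in more than one way. The Python sketch below encodes one interpretation, stated here as an assumption: training succeeds once the child gives five consecutive correct answers ("more than four"), and stops as failed if none of the first four trials is correct. This simplifies "no correct answer and no improvement after four trials", which in practice also relied on the experimenter's judgment. The function name and parameters are hypothetical.

```python
from typing import Sequence, Tuple


def verbal_training_outcome(
    responses: Sequence[bool],
    success_run: int = 5,    # assumption: "more than four" consecutive correct answers
    give_up_after: int = 4,  # assumption: stop if the first four trials are all incorrect
) -> Tuple[str, int]:
    """Return ("passed" | "failed" | "undecided", number of trials administered)."""
    run = 0
    for trial, correct in enumerate(responses, start=1):
        run = run + 1 if correct else 0  # length of the current streak of correct answers
        if run >= success_run:
            return "passed", trial
        if trial >= give_up_after and not any(responses[:trial]):
            return "failed", trial
    return "undecided", len(responses)  # more trials would be needed


print(verbal_training_outcome([False, True, True, True, True, True]))
print(verbal_training_outcome([False, False, False, False]))
```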

4.4. Scoring and Data Analyses

In the current study, we combined two coding schemes (i.e., failure and exclude) with two analytical approaches (i.e., Guttman and Rasch). The Failure coding scheme maximizes sample size but can introduce noise because it treats all task failures the same, even when children fail for different reasons, whereas the Exclude scheme provides a purer measure of ToM but results in missing data. While missing data reduces the sample size in Guttman analyses, Rasch analyses can include participants with missing responses, preserving sample size and measuring task difficulty and children's abilities more accurately.

Three datasets were constructed for the analyses (see Table 3 for a summary). Dataset A is based on the Failure coding scheme and therefore contains no missing data (N = 51). It was analyzed with Guttman and then Rasch analyses, for comparison with previous studies of children with ASD. Dataset B is based on the Exclude coding scheme, and every participant with missing data was excluded; we ran both Guttman and Rasch analyses on it for comparison with the results on TD children reported by Burnel et al. (2018). Finally, Dataset C is also based on the Exclude coding scheme, but participants with two or fewer missing values were retained. Although we used the Failure coding scheme and Guttman analyses to compare our results with previous studies, we consider the Rasch analysis using the Exclude coding scheme to be the most reliable, because it ensures a more precise assessment of children's ToM skills while maximizing the sample size.

TABLE 3.

Final score on a task depending on failure (0) or success (1) on verbal trainings, control questions, and test questions in each dataset. Dataset A was constructed using the Failure coding scheme, whereas Datasets B and C were constructed using the Exclude coding scheme. Participants with missing data were excluded from Dataset B but not from Dataset C.

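Verbal training | Control question | Test question | Dataset A (Failure, N = 51) | Dataset B (Exclude, N = 39) | Dataset C (Exclude, N = 50)
Failure | 0 | 0 | 0 | Exclude data | Exclude data
Failure | 0 | 1 | 0 | Exclude data | Exclude data
Failure | 1 | 0 | 0 | Exclude data | Exclude data
Failure | 1 | 1 | 1 | Exclude data | Exclude data
Success | 0 | 0 | 0 | Exclude participant | Missing data
Success | 0 | 1 | 0 | Exclude participant | Missing data
Success | 1 | 0 | 0 | 0 | 0
Success | 1 | 1 | 1 | 1 | 1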
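The coding rules in Table 3 can be expressed as a small Python sketch. It assumes that each child is represented by a verbal‐training outcome and, for each ToM task, a (control question passed, test question passed) pair, with tasks that have no control question (DD, DB) simply given a passed control question. The record layout, function names, and the children in the usage example are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence

Child = Dict[str, Any]  # hypothetical record: {"vt_passed": bool, "tasks": [(cq, tq), ...]}


def code_task(cq_passed: bool, tq_passed: bool, scheme: str) -> Optional[int]:
    """Score one ToM task: 1 = success, 0 = failure, None = missing (Exclude scheme only)."""
    if scheme == "failure":
        # Failure scheme: a failed control question turns the task into a failure.
        return int(cq_passed and tq_passed)
    if scheme == "exclude":
        # Exclude scheme: a failed control question makes the test answer uninterpretable.
        return int(tq_passed) if cq_passed else None
    raise ValueError(f"unknown coding scheme: {scheme!r}")


def build_dataset(children: Sequence[Child], scheme: str, max_missing: int) -> List[List[Optional[int]]]:
    """Build one dataset; children with more than `max_missing` missing values are dropped."""
    rows = []
    for child in children:
        if scheme == "exclude" and not child["vt_passed"]:
            continue  # "Exclude data": the child's data are removed entirely
        # Per Table 3, under the Failure scheme the VT outcome does not change task scores.
        scores = [code_task(cq, tq, scheme) for cq, tq in child["tasks"]]
        if sum(score is None for score in scores) > max_missing:
            continue
        rows.append(scores)
    return rows


# Hypothetical children, each with three tasks.
children = [
    {"vt_passed": True, "tasks": [(True, True), (True, False), (False, True)]},
    {"vt_passed": True, "tasks": [(True, True), (True, True), (True, True)]},
    {"vt_passed": False, "tasks": [(True, True), (True, True), (True, True)]},
]
dataset_a = build_dataset(children, "failure", max_missing=0)  # analogue of Dataset A
dataset_b = build_dataset(children, "exclude", max_missing=0)  # analogue of Dataset B
dataset_c = build_dataset(children, "exclude", max_missing=2)  # analogue of Dataset C
print(dataset_a, dataset_b, dataset_c, sep="\n")
```

The first hypothetical child illustrates the trade‐off described above: under the Failure scheme the CQ failure becomes a 0, under the Exclude scheme it becomes a missing value, which drops the child from the Dataset B analogue but not from the Dataset C analogue.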

Data are available on OSF at https://osf.io/bg65h/.

5. Results

5.1. Success and Failure on Verbal Training and Control Questions

The initial sample included 53 children, but two failed the verbal training (VT) and were excluded, leaving 51 participants for Dataset A. Twelve participants failed at least one control question (CQ) and were excluded from Dataset B, leaving 39 participants. Dataset C initially had 19 missing values out of 306 possible (6%) due to CQ failures. One child with missing data for more than two tasks was excluded, leaving 50 participants for Dataset C: eight with one missing value and three with two, totaling 14 missing values out of 300 (4.7%). Missing data were distributed across the KA, EFB, and HE tasks (two points each) and the CFB task (eight points).

5.2. Total Scores Distribution

Participants with total scores of 0 or 6 do not contribute to Guttman or Rasch models, as only one response pattern is possible for each. Depending on the dataset, 0–2 participants scored 0, and 12 scored the maximum. This resulted in 37 participants for Dataset A, 27 for Dataset B, and 37 for Dataset C (see Table 4).

TABLE 4.

Score distribution in Dataset A (failure coding scheme), Dataset B (exclude coding scheme without missing data), and Dataset C (exclude coding scheme with missing data).

Dataset | Coding scheme | Missing data | Score 0 | Score 1 | Score 2 | Score 3 | Score 4 | Score 5 | Score 6
Dataset A (N = 51) | Failure | No missing data | 2 | 4 | 9 | 8 | 7 | 9 | 12
Dataset B (N = 39) | Exclude | Participants excluded | 0 | 1 | 6 | 5 | 5 | 10 | 12
Dataset C (N = 50) | Exclude | Participants kept | 2 | 3 | 9 | 9 | 6 | 10 | 12
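The effective sample sizes reported above follow directly from Table 4: participants at floor or ceiling are subtracted from the total. The Python sketch below reproduces this arithmetic for Datasets A and B, and also counts participants with a total score of 4 or 5, the group used in the Guttman analyses to compare the CFB < HE and HE < CFB patterns. The helper name is hypothetical.

```python
# Number of participants per total score (0 to 6), from Table 4.
TABLE_4 = {
    "Dataset A": [2, 4, 9, 8, 7, 9, 12],
    "Dataset B": [0, 1, 6, 5, 5, 10, 12],
}


def informative_participants(score_counts):
    """Participants whose total score is neither the minimum (0) nor the maximum."""
    return sum(score_counts) - score_counts[0] - score_counts[-1]


for name, counts in TABLE_4.items():
    # total N, participants informative for task ordering, participants scoring 4 or 5
    print(name, sum(counts), informative_participants(counts), counts[4] + counts[5])
```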

5.3. Orders of Task Difficulty

Across the three datasets, DD, DB, and KA were generally the easiest tasks, while EFB, CFB, and HE were consistently the most difficult. However, the exact order of difficulty based on percentage of success (where X < Y means that X was passed by more children than Y) varied slightly depending on the coding scheme and dataset (see Table 5 for details). For the easier tasks, the order was DB < KA < DD in Dataset A, with differences based on two to four participants; KA < DD = DB in Dataset B, with differences based on six participants; and KA < DB < DD in Dataset C, with differences based on two to four participants. In contrast, for the more difficult tasks, the order EFB < CFB < HE was consistent across datasets, with the difference between CFB and HE based on eight to nine children, depending on the dataset.

TABLE 5.

Success in tasks depending on the coding scheme and treatment of missing data.

Dataset | Coding scheme | Missing data | DD | DB | KA | EFB | CFB | HE
Dataset A (N = 51) | Failure | No missing data | 36/51 (71%) | 40/51 (78%) | 38/51 (75%) | 30/51 (59%) | 28/51 (55%) | 19/51 (37%)
Dataset B (N = 39) | Exclude | Participants excluded | 31/39 (79%) | 31/39 (79%) | 37/39 (95%) | 27/39 (69%) | 26/39 (67%) | 18/39 (46%)
Dataset C (N = 50) | Exclude | Participants kept | 35/50 (70%) | 39/50 (78%) | 41/48 (85%) | 30/48 (67%) | 28/42 (63%) | 19/48 (40%)
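The orders of difficulty described above can be derived mechanically from Table 5 by sorting tasks on their proportion of success. A minimal Python sketch for Datasets A and B follows; the gaps between adjacent tasks, counted in children, are only meaningful when all tasks share the same denominator, as in Dataset A. Variable and function names are illustrative.

```python
# (children passing, children with a valid score) per task, from Table 5.
DATASET_A = {"DD": (36, 51), "DB": (40, 51), "KA": (38, 51),
             "EFB": (30, 51), "CFB": (28, 51), "HE": (19, 51)}
DATASET_B = {"DD": (31, 39), "DB": (31, 39), "KA": (37, 39),
             "EFB": (27, 39), "CFB": (26, 39), "HE": (18, 39)}


def order_by_success(dataset):
    """Order tasks from easiest to hardest by proportion of success.

    Python's sort is stable even with reverse=True, so tied tasks keep their input order.
    """
    rates = {task: passed / valid for task, (passed, valid) in dataset.items()}
    return sorted(rates, key=lambda task: rates[task], reverse=True)


order_a = order_by_success(DATASET_A)
gaps_a = [DATASET_A[t][0] - DATASET_A[u][0] for t, u in zip(order_a, order_a[1:])]
print(" < ".join(order_a), gaps_a)
print(" < ".join(order_by_success(DATASET_B)))  # DD and DB are tied in Dataset B
```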

5.4. Results of Guttman Analyses

Guttman analyses assess how well response patterns can be reproduced from total scores. For example, in Dataset A, a child with a score of 1 should always succeed on the easiest task (DB), a child with a score of 2 on the two easiest tasks (i.e., DB + KA), a child with a score of 3 on the three easiest tasks (i.e., DB + KA + DD), and so on. Guttman analyses were computed using software specially designed by the fourth author. In Dataset A, 63% of children followed the DB < KA < DD < EFB < CFB < HE pattern exactly (see Table 6). However, the Coefficient of Reproducibility (CR), estimated with Green's method (Green 1956), was 0.83 (values equal to or greater than 0.90 indicate scalable items), indicating that the items were not sufficiently scalable. Green's Index of Consistency (IC), which tests whether the observed reproducibility is above chance level, was 0.55 (values greater than 0.50 are significant). More precisely, among the 16 participants in Dataset A with a total score of 4 or 5, 10 followed the DB < KA < DD < EFB < CFB < HE pattern exactly, whereas only 3 followed the reversed pattern DB < KA < DD < EFB < HE < CFB.

TABLE 6.

Six‐item Guttman scale for Dataset A and Dataset B.

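Dataset A (N = 51), Failure coding scheme (CR = 0.83, IC = 0.55)
Task / pattern | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Other | N
Diverse beliefs (DB) | | + | + | + | + | + | + | |
Knowledge access (KA) | | | + | + | + | + | + | |
Diverse desires (DD) | | | | + | + | + | + | |
Explicit false belief (EFB) | | | | | + | + | + | |
Content false belief (CFB) | | | | | | + | + | |
Hidden emotion (HE) | | | | | | | + | |
Total | 2 | 2 | 2 | 4 | 2 | 8 | 12 | 19 | 51
Mean age (years) | 8.7 | 7.9 | 9.1 | 7.5 | 10.5 | 7.7 | 10.4 | |

Dataset B (N = 39), Exclude coding scheme (CR = 0.87, IC = 0.49)
Task / pattern | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Other | N
Knowledge access (KA) | | + | + | + | + | + | + | |
Diverse desires (DD) | | | + | + | + | + | + | |
Diverse beliefs (DB) | | | | + | + | + | + | |
Explicit false belief (EFB) | | | | | + | + | + | |
Content false belief (CFB) | | | | | | + | + | |
Hidden emotion (HE) | | | | | | | + | |
Total | 0 | 0 | 1 | 2 | 2 | 9 | 12 | 13 | 39
Mean age (years) | – | – | 7.9 | 8.4 | 10.5 | 7.3 | 10.4 | |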

In addition, we computed and assessed every possible Guttman model for Dataset A (see Supporting Information). The significant models were all 3‐item scales, including KA < CFB < HE (CR = 0.90, IC = 0.81) and DD < CFB < HE (CR = 0.91, IC = 0.79), all consistent with the acquisition of CFB before HE. More precisely, in Dataset A, which used the failure scoring method as in previous studies, 16 of the 21 children (76%) who passed either the CFB or the HE task, regardless of their success on the other tasks, followed the classical CFB < HE pattern.
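Testing "every possible Guttman model" amounts to an exhaustive search over subsets of tasks. The sketch below is a hypothetical illustration, not the authors' software: it assumes each candidate subset is ordered by observed pass rate (easiest first), scores it with a classical error-counting coefficient of reproducibility rather than Green's estimator, and keeps only subsets reaching the 0.90 CR threshold. The IC criterion is deliberately omitted, so a full replication would need an additional test.

```python
from itertools import combinations


def classical_cr(rows: list[list[int]]) -> float:
    """1 - (cells deviating from each child's ideal pattern) / (children * items)."""
    n_items = len(rows[0])
    errors = 0
    for row in rows:
        total = sum(row)
        ideal = [1] * total + [0] * (n_items - total)
        errors += sum(observed != expected for observed, expected in zip(row, ideal))
    return 1 - errors / (len(rows) * n_items)


def candidate_models(data: dict[str, list[int]], min_items: int = 3,
                     threshold: float = 0.90) -> list[tuple[tuple[str, ...], float]]:
    """Test every subset of at least `min_items` tasks as a Guttman model.

    `data` maps a task label to one 0/1 answer per child. Within a subset,
    tasks are ordered from most to least often passed (ties keep input order).
    Returns (task order, CR) for the subsets whose CR reaches `threshold`,
    best first.
    """
    labels = list(data)
    n_children = len(data[labels[0]])
    kept = []
    for k in range(min_items, len(labels) + 1):
        for subset in combinations(labels, k):
            order = tuple(sorted(subset, key=lambda task: -sum(data[task])))
            rows = [[data[task][i] for task in order] for i in range(n_children)]
            cr = classical_cr(rows)
            if cr >= threshold:
                kept.append((order, cr))
    return sorted(kept, key=lambda model: -model[1])


# Hypothetical answers for six children on three tasks.
example = {
    "KA": [1, 1, 1, 1, 1, 0],
    "CFB": [1, 1, 1, 0, 0, 0],
    "HE": [1, 0, 0, 0, 0, 0],
}
print(candidate_models(example))  # → [(('KA', 'CFB', 'HE'), 1.0)]
```

With six tasks this search covers 42 subsets of three or more items, which is small enough to enumerate directly.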

In Dataset B, the KA < DD < DB < EFB < CFB < HE pattern was followed exactly by 67% of children, but CR = 0.87 and IC = 0.49 indicated non‐scalable items. More precisely, among the 15 participants in Dataset B with a total score of 4 or 5, 11 followed exactly the KA < DD < DB < EFB < CFB < HE pattern, whereas only 4 followed the alternative pattern KA < DD < DB < EFB < HE < CFB. In Dataset B, which used the exclude scoring method as in Burnel et al. (2018), 13 of the 18 children (72%) who passed either the CFB or the HE task (regardless of their success on the other tasks) followed the classical CFB < HE pattern.

In Dataset B, the only significant 5‐item Guttman model was KA < DD < DB < EFB < HE (CR = 0.90, IC = 0.60). For comparison, the same model with CFB instead of EFB was non‐significant (CR = 0.90, IC = 0.46). As for Dataset A, all possible Guttman models were tested (see Supporting Information).

5.5. Results of Rasch Analyses

Data were analyzed with a Rasch model using the WINSTEPS/BIGSTEPS computer program (Linacre 2023; Linacre and Wright 1993). We first examined the mean‐square fit statistics (MNSQ). An MNSQ below 0.5 indicates a Guttman‐like (overly deterministic) scale, a value between 0.5 and 1.5 indicates a task productive for Rasch measurement, and a value above 2 indicates a task that distorts or degrades the scale. When an MNSQ exceeds 1.5, the corresponding standardized fit statistic (Z‐STD) must be examined: positive Z‐STD values greater than 2.0 indicate more unpredictable variation than expected, whereas negative values suggest that the scale is more deterministic than expected. Both Outfit and Infit were estimated for each task. Outfit is sensitive to unexpected responses from children whose ability level is far from the task's difficulty, whereas Infit is more sensitive to unexpected responses from children whose ability level is close to it.
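The difference between the two mean‐square statistics can be made concrete with the dichotomous Rasch model, in which a child of ability θ passes a task of difficulty b with probability exp(θ − b)/(1 + exp(θ − b)). The sketch below assumes that abilities and the task difficulty are already known (WINSTEPS estimates them jointly; here they are simply given), omits the Z‐STD transformation, and uses invented values. Function names are hypothetical.

```python
import math


def p_pass(theta: float, b: float) -> float:
    """Dichotomous Rasch probability that ability `theta` passes difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(answers: list[int], thetas: list[float], b: float) -> tuple[float, float]:
    """Return (outfit MNSQ, infit MNSQ) for one task.

    Outfit: unweighted mean of squared standardized residuals, so a surprising
    answer from a child far from the task's difficulty (small variance) weighs a lot.
    Infit: summed squared residuals over summed model variances (information-
    weighted), so it mainly reflects children close to the task's difficulty.
    """
    sq_std_resid, sq_resid, variances = [], 0.0, 0.0
    for x, theta in zip(answers, thetas):
        p = p_pass(theta, b)
        w = p * (1 - p)  # model variance of the 0/1 answer
        sq_std_resid.append((x - p) ** 2 / w)
        sq_resid += (x - p) ** 2
        variances += w
    outfit = sum(sq_std_resid) / len(sq_std_resid)
    infit = sq_resid / variances
    return outfit, infit


# Illustrative values only: five children, one task of difficulty 0 logits.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
answers = [1, 0, 0, 1, 1]  # the least able child unexpectedly passes
outfit, infit = item_fit(answers, thetas, b=0.0)
print(round(outfit, 2), round(infit, 2))  # → 1.85 1.39
```

A single surprising pass from the least able child inflates Outfit (1.85) much more than Infit (1.39), which is why a high Outfit with a non‐significant Z‐STD can point to a few off‐target responses rather than a misfitting task.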

Before conducting Rasch analyses, we replaced outlier answers with missing data. Outliers were defined as the most unexpected responses, with a misfit above 1.5 and a corresponding Z‐STD above 2.0. There were 6 outliers in Dataset A, 8 in Dataset B, and 9 in Dataset C. Replacing outliers with missing data led to the exclusion of one participant in Dataset C, who then had more than two missing values. See Supporting Information for details.
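The cleaning rule described above can be sketched as follows. The sketch assumes, hypothetically, that each answer already carries a misfit value and a Z‐STD (for instance, exported from the Rasch software), flags an answer only when both exceed the stated cut‐offs, and then excludes children left with more than two missing answers. The data layout and names are illustrative, not the authors' pipeline.

```python
from typing import Optional

MNSQ_CUTOFF = 1.5  # misfit threshold stated in the text
ZSTD_CUTOFF = 2.0  # Z-STD threshold stated in the text
MAX_MISSING = 2    # children with more missing answers are excluded


def clean_responses(
    answers: dict[str, list[Optional[int]]],
    fit: dict[str, list[tuple[float, float]]],
) -> dict[str, list[Optional[int]]]:
    """Replace outlier answers with None, then drop children with too many gaps.

    `answers[child]` lists 0/1 answers per task (None = already missing);
    `fit[child]` lists a hypothetical (misfit, Z-STD) pair for each answer.
    """
    cleaned = {}
    for child, row in answers.items():
        new_row = [
            None if (mnsq > MNSQ_CUTOFF and zstd > ZSTD_CUTOFF) else answer
            for answer, (mnsq, zstd) in zip(row, fit[child])
        ]
        if sum(answer is None for answer in new_row) <= MAX_MISSING:
            cleaned[child] = new_row
    return cleaned


ok = (1.0, 0.5)  # an unremarkable fit pair
raw = {"child_01": [1, 1, 0, 1, 0, 0], "child_02": [1, None, None, 1, 0, 1]}
fits = {
    "child_01": [ok, ok, (3.1, 2.4), ok, ok, ok],
    "child_02": [ok, ok, ok, ok, (2.8, 2.2), ok],
}
print(clean_responses(raw, fits))  # → {'child_01': [1, 1, None, 1, 0, 0]}
```

In this toy example, the second child already had two missing answers, so a single additional outlier is enough to exclude them, which is the situation described for one participant in Dataset C.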

As shown in Table 7, in all three datasets, the outfit MNSQ is below 1.5 for every task except two. In Dataset B, the outfit for CFB is 1.57, but its Z‐STD is below 2.0 (Z‐STD = 0.99), so it is not statistically significant. In Dataset C, the outfit of 4.60 for HE indicates a tendency for this task to add more variation than expected, with unexpected responses on scale items far from HE (i.e., DD, DB, KA). However, the corresponding Z‐STD is below 2.0 (Z‐STD = 1.88), so this misfit is not statistically significant either. In addition, all Infit values are below 1.5, with corresponding Z‐STDs below 2.0, as expected. Thus, in all three datasets, the Rasch analyses show scalable items, but with varying task orders: DB < KA < DD < EFB < CFB < HE for Dataset A, KA = DB < DD < EFB < CFB < HE for Dataset B, and KA < DB < DD < EFB < CFB < HE for Dataset C.
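The decision rule applied above can be written as a small function and checked against some of the outfit values in Table 7. The category labels paraphrase the text; treating values between 1.5 and 2.0 as requiring a Z‐STD check follows the statement that any MNSQ above 1.5 calls for inspecting the Z‐STD. The function name is hypothetical.

```python
def judge_fit(mnsq: float, zstd: float) -> str:
    """Classify a task's fit with the MNSQ and Z-STD cut-offs stated in the text."""
    if mnsq < 0.5:
        return "Guttman-like (more deterministic than expected)"
    if mnsq <= 1.5:
        return "productive for measurement"
    # Above 1.5, misfit only counts if the Z-STD is also significant (> 2.0).
    if zstd > 2.0:
        return "significant misfit (degrades the scale)" if mnsq > 2.0 else "significant misfit"
    return "high MNSQ but not significant"


# Outfit (MNSQ, Z-STD) pairs taken from Table 7.
table7_outfit = {
    ("B", "CFB"): (1.57, 0.99),
    ("C", "HE"): (4.60, 1.88),
    ("A", "DD"): (0.48, -0.88),
    ("A", "HE"): (1.17, 0.58),
}
for (dataset, task), (mnsq, zstd) in table7_outfit.items():
    print(dataset, task, judge_fit(mnsq, zstd))
```

Both outfit values above 1.5 (CFB in Dataset B and HE in Dataset C) fall into the "high MNSQ but not significant" category, matching the interpretation given in the text.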

TABLE 7.

Summary of fit statistics of 6‐item Rasch analyses by dataset.

Dataset A (N = 51)
Task Outfit MNSQ Outfit Z‐STD Infit MNSQ Infit Z‐STD
HE 1.17 0.58 1.10 0.37
CFB 0.98 0.09 1.03 0.20
EFB 0.91 −0.12 1.15 0.71
DD 0.48 −0.88 0.68 −1.81
KA 1.07 0.34 1.13 0.71
DB 0.61 −0.20 0.96 −0.12
Dataset B (N = 39)
Task Outfit MNSQ Outfit Z‐STD Infit MNSQ Infit Z‐STD
HE 0.47 −0.01 0.93 0.19
CFB 1.57 0.99 1.36 1.00
EFB 0.92 0.13 1.17 0.58
DD 0.29 −0.25 0.58 −1.72
DB 0.41 −0.07 0.82 −0.59
KA 0 0 0 0
Dataset C (N = 49)
Task Outfit MNSQ Outfit Z‐STD Infit MNSQ Infit Z‐STD
HE 4.60 1.88 1.11 0.40
CFB 0.75 −0.26 1.09 0.40
EFB 0.76 −0.30 1.03 0.22
DD 0.42 −0.69 0.66 −1.75
DB 0.55 −0.11 0.93 −0.24
KA 1.29 0.66 1.20 0.66

6. Discussion

Numerous studies have shown that TD children acquire ToM skills in a specific order (Burnel et al. 2018; Henning et al. 2011; Ilgaz et al. 2022; O'Reilly and Peterson 2014; Peterson et al. 2005, 2012; Peterson, O'Reilly, and Wellman 2016; Peterson, Slaughter, et al. 2016; Peterson and Wellman 2019; Shahaeian et al. 2011, 2013; Wellman 2011; Wellman et al. 2006, 2011; Wellman and Liu 2004; Zhang et al. 2016). Overall, these results reveal two blocks of tasks. On the one hand, the block of more manageable tasks includes the understanding that people act according to their desires (i.e., diverse desires, DD), beliefs (i.e., diverse beliefs, DB), or knowledge (i.e., knowledge access, KA). On the other hand, the block of more challenging tasks includes the understanding of false beliefs (i.e., explicit false belief, EFB; contents false belief, CFB) and the understanding that an emotion felt can differ from the emotion shown (i.e., hidden emotion, HE). Within the easier block, most previous studies report the developmental order DD < DB < KA (O'Reilly and Peterson 2014; Peterson et al. 2005, 2012; Peterson and Wellman 2019; Shahaeian et al. 2013; Wellman et al. 2011; Wellman and Liu 2004), although some report different orders (Burnel et al. 2018; Peterson, O'Reilly, and Wellman 2016; Shahaeian et al. 2011; Wellman et al. 2011). Moreover, previous studies showed that, within the block of more challenging tasks, TD children understand CFB before HE (Burnel et al. 2018; Ilgaz et al. 2022; O'Reilly and Peterson 2014; Peterson et al. 2005, 2012; Peterson, Slaughter, et al. 2016; Peterson and Wellman 2019; Shahaeian et al. 2011, 2013; Wellman et al. 2006, 2011; Wellman and Liu 2004; Zhang et al. 2016). However, studies of children with ASD have yielded different results, with success on the HE task appearing before false belief attribution (Peterson et al. 2005, 2012; Peterson and Wellman 2019).
Still, these previous studies entailed several methodological limitations: (1) cognitive confounds in ToM measurement, (2) inappropriate coding schemes for CQ failures, (3) reliance on Guttman rather than Rasch analyses, (4) over‐reliance on total sample size without examining full score distributions. These limitations may have led to an underestimation of ToM competencies, a distortion of the observed developmental trajectory, and a premature generalization of the observed results. The present study therefore aimed to examine ToM development in ASD while addressing these methodological limitations, in order to determine whether a reversed pattern for CFB and HE tasks would still appear in a sample of French children with ASD. Specifically, we (1) used the LV‐ToM scale (Burnel et al. 2018) to minimize cognitive confounds; (2) applied the Exclude coding scheme for CQ (Sobel and Austerweil 2016) to reduce measurement noise; (3) combined these with Rasch analyses, which accommodate missing data, to maximize sample size; and (4) interpreted the findings with caution, taking into account the total score distribution to prevent overgeneralization.

6.1. Statistical Power Achieved

The distribution of total scores shows that, of our initial sample of 51 participants, the models were effectively based on 27 to 37 participants, depending on the scoring method and the type of model tested. The largest useful sample size was achieved by combining the Exclude scoring method with Rasch analysis (i.e., Dataset C), illustrating the value of this combination of methods. This also emphasizes how crucial it is to report the distribution of total scores, given that the effective sample size can vary considerably. For a 6‐item Rasch model, 30 to 50 participants are usually enough to obtain stable item estimates (Linacre 1994), a requirement met in our study. Statistical power does, however, differ depending on which adjacent tasks are compared. Our sample is well powered for the comparison between the CFB and HE tasks (i.e., a difference based on 16 children), which was the main goal of the study. In contrast, comparisons involving DD, DB, and KA are based on smaller groups (i.e., 4–6 children) and should be interpreted with caution.

6.2. Tasks' Order of Difficulty

In our sample, the order of task difficulty was generally consistent, with DD, DB, and KA as the more manageable tasks and EFB, CFB, and HE as the hardest. First, we discuss the comparison between the more difficult tasks, which is well powered and stable across analyses. Second, we address the comparison between the more manageable tasks, which is underpowered and varies depending on the analyses.

Regarding the more difficult tasks, we found that EFB was easier than CFB, which was easier than HE (i.e., EFB < CFB < HE). This is the classical CFB < HE developmental pattern found in previous studies of French TD children (Burnel et al. 2018), Australian TD children (Peterson et al. 2005, 2012; Peterson and Wellman 2019), and Chinese TD children (Zhang et al. 2016). Note that EFB has usually not been administered to children because Wellman and Liu (2004) found it too similar to CFB to add helpful information on the developmental stages of ToM. However, Burnel et al. (2018) argued that, even if developmentally close, EFB and CFB assess two different abilities and can be particularly informative in populations where ToM is impaired. Indeed, in EFB, the character's (false) belief is explicitly given to the child, and only the child's ability to predict the character's behavior from that mental state (i.e., a false belief) is assessed. In the CFB task, the child must still predict someone's behavior based on a false belief but must also attribute this mental state to the character. Thus, children find it easier to predict behavior from a given mental state than to both attribute a mental state and predict behavior from it.

Results regarding the order of difficulty of the more manageable tasks (i.e., DD, DB, KA) are less clear‐cut. Indeed, the order varied depending on the coding scheme used or the handling of missing data: DB < KA < DD (Dataset A), KA < DD = DB (Dataset B), or KA < DB < DD (Dataset C). These inconsistencies are likely related to the fact that only a few children in our sample had low total scores on the scale, which again illustrates the need to report such information. The instability of results across datasets could also highlight the role of the coding scheme and of the method for handling missing data.

6.3. Developmental Models

Guttman analyses showed non‐significant 6‐item models in both Datasets A and B. Again, these results must be interpreted in light of the small number of children with low total scores. We hypothesize that this insufficient number of children with low total scores in our sample is the main reason why results are inconsistent across datasets and differ from the literature. This interpretation is reinforced when all possible Guttman models are tested. Indeed, the only significant models in Dataset A were those that excluded two of the three more manageable tasks (i.e., DD, DB, or KA), indicating that the three more manageable tasks together add too much noise to a possible scale. In Dataset B, the only significant 5‐item Guttman model was KA < DD < DB < EFB < HE, with KA misplaced compared with the classical DD < DB < KA < CFB < HE pattern, and with CFB replaced by EFB (e.g., see Wellman and Liu 2004). Thus, a consistent result across Datasets A and B is that EFB and CFB were easier than HE. In addition, in Dataset A, significant models excluded EFB or CFB (as in Wellman and Liu 2004), whereas in Dataset B, the EFB < CFB < HE pattern was followed by most children, as in Burnel et al. (2018). Guttman analyses thus reveal inconsistent results for the more manageable tasks, likely due to the small number of children who passed only a few tasks on the scale. However, they indicate that children tend to acquire false belief understanding (i.e., EFB and CFB) before understanding that an emotion can be hidden (i.e., HE). This result, consistent across Datasets A and B, argues for a classical developmental pattern for ToM in ASD.

As argued above, we consider the Rasch results from Dataset C to be the most reliable, given that this combination of scoring method (i.e., Exclude coding scheme) and handling of missing data both minimizes measurement error and maximizes sample size. Unlike the Guttman analyses, the Rasch analyses showed scalable items for all six tasks in each dataset. The more manageable tasks were always DD, DB, and KA, whereas EFB, CFB, and HE were always the hardest. As with the Guttman analyses, results were consistent across datasets for the more complex tasks (i.e., EFB < CFB < HE) but inconsistent for the more manageable tasks (i.e., DD, DB, KA). Regarding the more complex tasks, results again support a classical developmental pattern of ToM in children with ASD. Regarding the more manageable tasks, the order of difficulty was DB < KA < DD in Dataset A, but KA < DD in Datasets B and C. Given the small number of children succeeding in only a few tasks, this information should, again, always be reported before drawing conclusions about the generalizability of a developmental model. It is also worth noting that Rasch models are more likely to fit even when based on a small number of children: their probabilistic nature makes them more tolerant of measurement noise, and thus more likely to propose a model based on minimal data. Thus, even though combining the Exclude scoring method with Rasch analyses minimizes measurement error while maximizing sample size, results should always be discussed in light of the total score distribution.

6.4. Hidden Emotions Versus False Beliefs in ASD

A systematic result in our sample is that, contrary to previous studies, we cannot conclude that hidden emotion is mastered before false belief attribution (i.e., a reversed pattern). Instead, our data support the pattern usually found in typically developing children. As can be seen in Table 1, four of the six samples studied to date showed a reversed pattern (i.e., HE < CFB) in Australian children with ASD (Peterson et al. 2005, 2012; Peterson and Wellman 2019), whereas two of the six showed a classical pattern (i.e., CFB < HE) in Chinese (Zhang et al. 2016) or Australian (Peterson et al. 2012) children with ASD. Nevertheless, as highlighted in the Introduction, the most reliable study may be that of Peterson et al. (2012), whose ASD sample included 26 children with Asperger Syndrome succeeding at the CFB task against 11 succeeding at the HE task. Of all six previous studies, this is the most informative sample for studying the difference between CFB and HE, as that difference rests on 15 participants (cf. Table 1). Taken together, the HE < CFB inversion found in children with ASD could be related to measurement error in previous studies, due to confounding variables in the former version of the scale, small sample sizes, the use of the Failure scoring method, and/or sole reliance on Guttman analyses. Thus, we can conclude that the development of ToM does not follow a deviant trajectory in ASD. This interpretation is strengthened by a new study (Zhou et al. 2026), conducted after our own, which applied the LV‐ToM Scale With ExtEnded Trials (LV‐ToM SWEET) to 46 Chinese children with ASD. The LV‐ToM SWEET is a computerized Chinese version of the LV‐ToM (Burnel et al. 2018) that includes three trials per task instead of one. Zhou et al. (2026) obtained the same results as the present study, that is, no reversal between CFB and HE, based on a difference of 14 participants. However, as both the present study and that of Zhou et al. (2026) are among the first to apply the LV‐ToM or LV‐ToM SWEET, we cannot rule out that the differences from Peterson et al. (2005, 2012) and Peterson and Wellman (2019) reflect slight variations in the ToM abilities assessed by the scales.

First, previous studies have shown that people with ASD have persistent difficulties with complex syntax compared with TD individuals (e.g., Durrleman et al. 2015). However, as can be seen in Figure 1, the grammatical complexity of the instructions in the original ToM scale (Wellman and Liu 2004) varies between the CFB and HE tasks, with the instructions in HE (e.g., He did not think it was funny. But the boy did not want the others to see how he felt. […] How did the boy try to look on his face when everyone laughed at him and teased him?) being more complex than those in CFB (e.g., What do you think is in it? […] What does the boy think is in it?). In contrast, the LV‐ToM scale (Burnel et al. 2018) relies on the simplest syntactic structures for both the CFB task (i.e., In your opinion, it is what in the box?) and the HE task (i.e., For real, Julien feels how? (…) Julien looks how in his face?). This point alone might be enough to explain why the HE < CFB order was found for children with ASD with the original ToM scale but not with the LV‐ToM scale. However, this explanation deserves further investigation.

Second, it is essential to highlight that the HE task used in the current study, from Burnel et al. (2018), differs from the HE task used in previous studies and had not previously been used with children with ASD, except very recently by Zhou et al. (2026), who obtained results similar to ours in a sample of Chinese children. Its only other application was with French TD children (Burnel et al. 2018). The original HE task from Wellman and Liu (2004) told the story of a boy who, teased by some friends, pretended to laugh when he felt sad. However, given the difficulty of reducing the verbal component of this task, Burnel et al. (2018) chose a slightly different task, in which a boy pretends to be happy when he does not get the gift he wants (cf. Figure 1). A plausible explanation of our results might therefore be that the two tasks differ and do not target exactly the same skill. Autistic children could first understand hidden emotion in a harassment situation (i.e., original HE task from the ToM scale; Wellman and Liu 2004), then understand false belief attribution (i.e., EFB and CFB tasks), and finally understand hidden emotion when receiving a disappointing gift (i.e., HE task from the LV‐ToM scale; Burnel et al. 2018). The understanding that one may display an emotion different from the one felt would then be context‐dependent. Thus, we can at least conclude that a general ability to understand hidden emotions is not mastered before false belief attribution in children with ASD.

7. Conclusion

In conclusion, our study supports the hypothesis of a classical ToM developmental pattern in children with ASD. However, given the limitations of previous research, further studies are needed. Future research should (1) use tasks that minimize confounding variables, such as the LV‐ToM scale (Burnel et al. 2018) or LV‐ToM SWEET (Zhou et al. 2026), (2) provide detailed descriptions of score distributions to clarify the scope of findings, (3) apply Rasch analyses together with the Exclude coding scheme to improve measurement reliability and handle missing or unexpected data, and (4) include multiple trials per task to increase precision, particularly in clinical populations, as with the LV‐ToM SWEET developed by Zhou et al. (2026).

Funding

The authors have nothing to report.

Ethics Statement

Ethics approval was obtained from the local ethics committee of Grenoble‐Alpes University (n°IRB00010290‐2016‐01‐05‐01).

Consent

Patients were always asked for consent.

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Data S1: Supporting Information.

AUR-19-0-s001.docx (280.9KB, docx)

Acknowledgments

We sincerely thank Raphaëlle Bertolini, Coralie Fiquet, Pascale Daniel, Enola Pichaud, and Rose Spillman, as well as the schools, teachers, families, and children who participated in this research.

Data Availability Statement

The data that support the findings of this study are openly available in the OSF project "Children with ASD do not understand hiding emotions before false belief attribution" at https://osf.io/bg65h/.

References

  1. American Psychiatric Association (APA) . 2013. Diagnostic and Statistical Manual of Mental Disorders (DSM‐V). 5th ed. American Psychiatric Publishing. 10.1176/appi.books.9780890425596. [DOI] [Google Scholar]
  2. Bloom, P. , and German T. P.. 2000. “Two Reasons to Abandon the False Belief Task as a Test of Theory of Mind.” Cognition 77, no. 1: B25–B31‐B25‐B31. 10.1016/s0010-0277(00)00096-2. [DOI] [PubMed] [Google Scholar]
  3. Brewer, N. , Young R. L., and Barnett E.. 2017. “Measuring Theory of Mind in Adults With Autism Spectrum Disorder.” Journal of Autism and Developmental Disorders 0, no. 0: 1–15. 10.1007/s10803-017-3080-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Burnel, M. , Durrleman S., Reboul A., Arnaud C., Baciu M., and Perrone‐Bertolotti M.. 2020. “Theory‐Of‐Mind During Childhood: Investigating Syntactic and Executive Contributions.” Social Development 00: 1–22. 10.1111/sode.12471. [DOI] [Google Scholar]
  5. Burnel, M. , Perrone‐Bertolotti M., Reboul A., and Baciu M.. 2018. “Reducing the Language Content in ToM Tests: A Developmental Scale.” Developmental Psychology 54: 293–307. 10.1037/dev0000429. [DOI] [PubMed] [Google Scholar]
  6. Chen, P. Y. , and Schwartz I. S.. 2012. “Bullying and Victimization Experiences of Students With Autism Spectrum Disorders in Elementary Schools.” Focus on Autism and Other Developmental Disabilities 27, no. 4: 200–212. [Google Scholar]
  7. Dennett, D. C. 1978. “Beliefs About Beliefs.” Behavioral and Brain Sciences 1, no. 4: 568–570. 10.1017/S0140525X00076664. [DOI] [Google Scholar]
  8. Durrleman, S. , Delage H., Prévost P., and Tuller L.. 2017. “The Comprehension of Passives in Autism Spectrum Disorder.” Glossa: A Journal of General Linguistics 2, no. 1: 1–30. 10.5334/gjgl.205. [DOI] [Google Scholar]
  9. Durrleman, S. , Hadjikhani N., Hippolyte L., Zufferey S., and Iglesias K.. 2012. “Investigating Syntax in Autism Spectrum Disorders: A Study With Relative Clauses.” In Innovative Research in Autism. [DOI] [PubMed] [Google Scholar]
  10. Durrleman, S. , Hippolyte L., Zufferey S., Iglesias K., and Hadjikhani N.. 2015. “Complex Syntax in Autism Spectrum Disorders: A Study of Relative Clauses.” International Journal of Language & Communication Disorders 50, no. 2: 260–267. [DOI] [PubMed] [Google Scholar]
  11. Durrleman, S. , Marinis T., and Franck J.. 2016. “Syntactic Complexity in the Comprehension of Wh‐Questions and Relative Clauses in Typical Language Development and Autism.” Applied PsychoLinguistics 37, no. 6: 1501–1527. 10.1017/S0142716416000059. [DOI] [Google Scholar]
  12. Green, B. F. 1956. “A Method of Scalogram Analysis Using Summary Statistics.” Psychometrika 21, no. 1: 79–88. [Google Scholar]
  13. Happé, F. 2015. “Autism as a Neurodevelopmental Disorder of Mind‐Reading.” Journal of the British Academy 3: 197–209. 10.5871/jba/003.197. [DOI] [Google Scholar]
  14. Henning, A. , Spinath F. M., and Aschersleben G.. 2011. “The Link Between Preschoolers' Executive Function and Theory of Mind and the Role of Epistemic States.” Journal of Experimental Child Psychology 108, no. 3: 513–531. 10.1016/j.jecp.2010.10.006. [DOI] [PubMed] [Google Scholar]
  15. Ilgaz, H. , Allen J. W. P., and Haskaraca F. N.. 2022. “Is Cultural Variation the Norm ? A Closer Look at Sequencing of the Theory of Mind Scale.” Cognitive Development 63: 101216. 10.1016/j.cogdev.2022.101216. [DOI] [Google Scholar]
  16. Kabha, L. , and Berger A.. 2020. “The Sequence of Acquisition for Theory of Mind Concepts: The Combined Effect of Both Cultural and Environmental Factors.” Cognitive Development 54: 100852. 10.1016/j.cogdev.2020.100852. [DOI] [Google Scholar]
  17. Kjelgaard, M. M. , and Tager‐Flusberg H.. 2001. “An Investigation of Language Impairment in Autism: Implications for Genetic Subgroups.” Language & Cognitive Processes 16, no. 2–3: 287–308. 10.1080/01690960042000058. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Linacre, J. M. 1994. “Sample Size and Item Calibration Stability.” Rasch Measurement Transactions 7, no. 4: 328. [Google Scholar]
  19. Linacre, J. M. 2023. Winsteps® Rasch Measurement Computer Program User's Guide. Version 5.6.0, 1–340. Winsteps.com. [Google Scholar]
  20. Linacre, J. M. , and Wright B. D.. 1993. A User's Guide to BIGSTEPS: Rasch‐Model Computer Program. Mesa Press. [Google Scholar]
  21. Livingston, L. A. , Carr B., and Shah P.. 2019. “Recent Advances and New Directions in Measuring Theory of Mind in Autistic Adults.” Journal of Autism and Developmental Disorders 49, no. 4: 1738–1744. 10.1007/s10803-018-3823-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Marrus, N. , Hall L. P., Paterson S. J., et al. 2018. “Language Delay Aggregates in Toddler Siblings of Children With Autism Spectrum Disorder.” Journal of Neurodevelopmental Disorders 10, no. 1: 29. 10.1186/s11689-018-9247-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Miller, S. A. 2022. Advanced Theory of Mind. Oxford University Press. [Google Scholar]
  24. O'Reilly, J. , and Peterson C.. 2014. “Scaling Theory of Mind Development in Indigenous‐ and Anglo‐Australian Toddlers and Older Children.” Journal of Cross‐Cultural Psychology 45, no. 9: 1489–1501. 10.1177/0022022114542285. [DOI] [Google Scholar]
  25. Pellicano, E. 2010. “The Development of Core Cognitive Skills in Autism: A 3‐Year Prospective Study: Developmental Changes in Cognition in Autism.” Child Development 81, no. 5: 1400–1416. 10.1111/j.1467-8624.2010.01481.x. [DOI] [PubMed] [Google Scholar]
  26. Peristeri, E. , Andreou M., and Tsimpli I. M.. 2017. “Syntactic and Story Structure Complexity in the Narratives of High‐ and Low‐Language Ability Children With Autism Spectrum Disorder.” Frontiers in Psychology 8: 2027. 10.3389/fpsyg.2017.02027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Peterson, C. , O'Reilly K., and Wellman H. M.. 2016. “Deaf and Hearing Children's Development of Theory of Mind, Peer Popularity, and Leadership During Middle Childhood.” Journal of Experimental Child Psychology 149: 146–158. 10.1016/j.jecp.2015.11.008. [DOI] [PubMed] [Google Scholar]
  28. Peterson, C. , Slaughter V., Moore C., and Wellman H. M.. 2016. “Peer Social Skills and Theory of Mind in Children With Autism, Deafness, or Typical Development.” Developmental Psychology 52, no. 1: 46–57. 10.1037/a0039833. [DOI] [PubMed] [Google Scholar]
  29. Peterson, C. , and Wellman H. M.. 2019. “Longitudinal Theory of Mind (ToM) Development From Preschool to Adolescence With and Without ToM Delay.” Child Development 90, no. 6: 1917–1934. 10.1111/cdev.13064. [DOI] [PubMed] [Google Scholar]
  30. Peterson, C. , Wellman H. M., and Liu D.. 2005. “Steps in Theory of Mind Development for Children With Deafness or Autism.” Child Development 76, no. 2: 502–517. 10.1111/j.1467-8624.2005.00859.x. [DOI] [PubMed] [Google Scholar]
  31. Peterson, C. , Wellman H. M., and Slaughter V.. 2012. “The Mind Behind the Message: Advancing Theory‐Of‐Mind Scales for Typically Developing Children, and Those With Deafness, Autism, or Asperger Syndrome: The Mind Behind the Message.” Child Development 83, no. 2: 469–485. 10.1111/j.1467-8624.2011.01728.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Postorino, V. , Fatta L. M., Sanges V., et al. 2016. “Intellectual Disability in Autism Spectrum Disorder: Investigation of Prevalence in an Italian Sample of Children and Adolescents.” Research in Developmental Disabilities 48: 193–201. 10.1016/j.ridd.2015.10.020. [DOI] [PubMed] [Google Scholar]
  33. Premack, D. , and Woodruff G.. 1978. “Does the Chimpanzee Have a Theory of Mind?” Behavioral and Brain Sciences 1, no. 4: 515–526. 10.1017/S0140525X00076512. [DOI] [Google Scholar]
  34. Rasga, C. , Quelhas A. C., and Byrne R. M. J.. 2017. “How Children With Autism Reason About Other???S Intentions: False‐Belief and Counterfactual Inferences.” Journal of Autism and Developmental Disorders 0, no. 0: 1–12. 10.1007/s10803-017-3107-3. [DOI] [PubMed] [Google Scholar]
  35. Roberts, J. A. , Rice M. L., and Tager–Flusberg H.. 2004. “Tense Marking in Children With Autism.” Applied PsychoLinguistics 25, no. 03: 429–448‐429‐448. [Google Scholar]
  36. Shahaeian, A. , Nielsen M., Peterson C. C., and Slaughter V.. 2013. “Cultural and Family Influences on Children's Theory of Mind Development: A Comparison of Australian and Iranian School‐Age Children.” Journal of Cross‐Cultural Psychology 45, no. 4: 555–568. 10.1177/0022022113513921. [DOI] [Google Scholar]
  37. Shahaeian, A. , Peterson C., Slaughter V., and Wellman H. M.. 2011. “Culture and the Sequence of Steps in Theory of Mind Development.” Developmental Psychology 47, no. 5: 1239–1247. 10.1037/a0023899. [DOI] [PubMed] [Google Scholar]
  38. Smith, C. E. , and Wu I.. 2016. “Mother‐Child Conversations About Thoughts, Desires, and Emotions: Relations to Children's Understanding of the Mind.” In Socializing Children Through Language, edited by Davis‐kean P. E. and Tang S., 49–78. Academic Press. 10.1016/B978-0-12-803624-2.00003-5. [DOI] [Google Scholar]
  39. Sobel, D. M. , and Austerweil J. L.. 2016. “Coding Choices Affect the Analyses of a False Belief Measure.” Cognitive Development 40: 9–23. 10.1016/j.cogdev.2016.08.002. [DOI] [Google Scholar]
  40. Tager‐Flusberg, H. 2000. “Language and Understanding Minds: Connections in Autism.” In Understanding Other Minds: Perspectives From Developmental Cognitive Neuroscience, vol. 2, 124–149. [Google Scholar]
  41. Tager‐Flusberg, H. , Paul R., and Lord C.. 2005. “Language and Communication in Autism.” In Handbook of Autism and Pervasive Developmental Disorders, edited by Volkmar F. R., Paul R., Rogers S. J., and Pelphrey K. A., vol. 1, 335–364. Wiley. [Google Scholar]
  42. Tager‐Flusberg, H. , and Sullivan K.. 1994. “A Second Look at Second‐Order Belief Attribution in Autism.” Journal of Autism and Developmental Disorders 24, no. 5: 577–586. [DOI] [PubMed] [Google Scholar]
  43. Wellman, H. M. 2011. “From Fancy to Reason: Scaling Deaf and Hearing Children's Understanding of Theory of Mind and Pretence.” British Journal of Developmental Psychology 27, no. pt. 2: 297–310. 10.1348/026151008X299728. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Wellman, H. M. 2014. Making Minds: How Theory of Mind Develops. Oxford University Press. [Google Scholar]
  45. Wellman, H. M. , Fang F., Liu D., Zhu L., and Liu G.. 2006. “Scaling of Theory‐Of‐Mind Understandings in Chinese Children.” Psychological Science 17, no. 12: 1075–1081‐1075‐1081. 10.1111/j.1467-9280.2006.01830.x. [DOI] [PubMed] [Google Scholar]
  46. Wellman, H. M. , Fuxi F., and Peterson C.. 2011. “Sequential Progressions in a Theory‐Of‐Mind Scale: Longitudinal Perspectives: Longitudinal Sequences of Development.” Child Development 82, no. 3: 780–792. 10.1111/j.1467-8624.2011.01583.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Wellman, H. M. , and Liu D.. 2004. “Scaling of Theory‐Of‐Mind Tasks.” Child Development 75, no. 2: 523–541‐523‐541. [DOI] [PubMed] [Google Scholar]
  48. Wittke, K., Mastergeorge, A. M., Ozonoff, S., Rogers, S. J., and Naigles, L. R. 2017. “Grammatical Language Impairment in Autism Spectrum Disorder: Exploring Language Phenotypes Beyond Standardized Testing.” Frontiers in Psychology 8: 532. 10.3389/fpsyg.2017.00532.
  49. Yirmiya, N., Erel, O., Shaked, M., and Solomonica‐Levi, D. 1998. “Meta‐Analyses Comparing Theory of Mind Abilities of Individuals With Autism, Individuals With Mental Retardation, and Normally Developing Individuals.” Psychological Bulletin 124, no. 3: 283–307. 10.1037/0033-2909.124.3.283.
  50. Yu, C.‐L., Stanzione, C. M., Wellman, H. M., and Lederberg, A. R. 2021. “Theory‐of‐Mind Development in Young Deaf Children With Early Hearing Provisions.” Psychological Science 32, no. 1: 109–119. 10.1177/0956797620960389.
  51. Zalla, T., Stopin, A., and Leboyer, M. 2008. “Faux Pas Detection and Intentional Action in Asperger Syndrome. A Replication on a French Sample.” Journal of Autism and Developmental Disorders: 373–382. 10.1007/s10803-008-0634-y.
  52. Zhang, T., Shao, Z., and Zhang, Y. 2016. “Developmental Steps in Theory of Mind of Typical Chinese Children and Chinese Children With Autism Spectrum Disorder.” Research in Autism Spectrum Disorders 23, no. 1: 210–220. 10.1016/j.rasd.2015.10.005.
  53. Zhou, M., Su, Y. (E.), Burnel, M., Chu, X., Hou, W., and Li, L. 2026. “Theory of Mind in Chinese Autistic Children: Evidence for a Delayed and Unexpectedly Deviant Pattern.” Research in Autism 129: 202747. 10.1016/j.reia.2025.202747.

Associated Data


Supplementary Materials

Data S1: Supporting Information.


Data Availability Statement

The data that support the findings of this study are openly available in the OSF project “Children with ASD do not understand hiding emotions before false belief attribution” at https://osf.io/bg65h/.


Articles from Autism Research are provided here courtesy of Wiley
