Author manuscript; available in PMC 2021 Dec 1. Published in final edited form as: Dev Psychol. 2020 Oct 26;56(12):2236–2245. doi: 10.1037/dev0001127

Measurement Models for Studying Child Executive Functioning: Questioning the Status Quo

Marie Camerota 1, Michael T Willoughby 1, Clancy B Blair 2
PMCID: PMC8284867  NIHMSID: NIHMS1716769  PMID: 33104374

Abstract

Despite widespread interest in the construct of executive functioning (EF), we currently lack definitive evidence regarding the best measurement model for representing the construct in substantive analyses. The most common practice is to represent EF ability as a reflective latent variable, with child performance on individual EF tasks as observed indicators. The current manuscript critically evaluates the dominant use of reflective latent variable models in the child EF literature and compares them to composite models, a reasonable alternative. We review the literature suggesting that reflective latent variable models may not be the most appropriate representation of the construct of EF. Using preschool (Mage = 48.3 months) and first grade (Mage = 83.5 months) data from the Family Life Project (N = 920), we also investigate the implications of measurement model specification for the interpretation of study findings. Children in this sample varied in terms of sex (49% male), race (43% black), and socioeconomic status (76% low-income). Our findings show that the conclusions we draw from two substantive analyses differ depending on whether EF is modeled as a reflective latent variable versus a composite variable. We describe the implications of these findings for research on child EF and offer practical recommendations for producers and consumers of developmental research.

Keywords: latent variable, measurement model, executive function, childhood


Executive functioning (EF) refers to the set of higher-order cognitive abilities that enable individuals to plan and execute goal-directed behaviors. EF comprises three component abilities – working memory, inhibitory control, and cognitive flexibility – which, although conceptually distinct, are empirically undifferentiated in early childhood (e.g., Wiebe, Espy, & Charak, 2008). There has been great interest in the construct of EF across developmental psychology and related fields, in part because of the demonstrated relationships between child EF and salient outcomes, including school readiness (Blair, 2002), academic achievement (Allan, Hume, Allan, Farrington, & Lonigan, 2014; Jacob & Parkinson, 2015), and mental (Pauli-Pott & Becker, 2011; Schoemaker et al., 2013; Willcutt et al., 2005) and physical health (e.g., Reinert, Po’e, & Barkin, 2013).

Despite widespread interest in EF, what remains lacking is a critical evaluation of the best way to represent the construct in substantive analyses. Among researchers who use EF task batteries, a common strategy is to administer a number of performance-based tasks, each of which taps one or more facets of EF (e.g., Stroop-like inhibition tasks, working memory span tasks). Subsequently, one of several measurement models can be fit to EF task scores to obtain a single estimate of child EF ability. Different types of measurement models have been described in detail in the psychometric literature (Bollen & Bauldry, 2011). We provide a brief overview of the three most common measurement models here and describe how they have been used thus far in the child EF literature.

Overview of Common Measurement Models

Confirmatory factor analysis (CFA) models are by far the most common measurement model in the literature. CFA models fall into the broader category of latent variable measurement models with reflective (i.e., effect) indicators. Reflective latent variable (RLV) models assume that a latent construct gives rise to, or ‘causes,’ the observed indicators. Applying this type of model to EF, we would say that children’s performance on EF tasks is caused by their latent EF ability. In RLV models, the latent construct is defined by the variance that is shared among a set of observed indicators. As applied to EF, this means that what is common (i.e., the shared variance) across a set of EF tasks is taken as a measure of a child’s underlying EF ability. Relatedly, what is left over after accounting for this shared variance (i.e., task-specific residuals) is assumed to be measurement error.
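To make this decomposition concrete, the reflective model can be written in standard factor-analytic form (a generic formulation offered for illustration, not a parameterization unique to this study):

x_j = \lambda_j \eta + \varepsilon_j, \qquad j = 1, \ldots, p

where \eta denotes latent EF ability, x_j is the observed score on task j, \lambda_j is the factor loading, and \varepsilon_j is the task-specific residual. Only the variance transmitted through \eta (i.e., the shared variance) defines the construct; the residual variances are treated as measurement error.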

An alternative specification is the latent variable model with formative (i.e., causal) indicators. Formative latent variable (FLV) models assume that observed indicators give rise to, or ‘cause,’ the latent variable, rather than vice versa. As opposed to RLV models, which rely on shared variance, FLV models make use of the total variance across a set of indicators to define the latent construct. Because they do not define the latent construct based on shared variance, FLV models do not imply that indicators should be highly correlated (Bollen & Lennox, 1991). Therefore, FLV models can be used to model constructs that are thought to be composed of multiple facets. FLV models are latent in that the focal construct is not assumed to be perfectly determined by the set of observed indicators (i.e., there may be additional indicators of the target construct which are not measured). One challenge of FLV models is that they are not statistically identified on their own (MacCallum & Browne, 1993). In fact, FLV models cannot be estimated unless they include two reflective indicators or structural paths to two outcome variables, which are statistically equivalent in a latent modeling framework. These practical difficulties have led to criticism (e.g., Howell, Breivik, & Wilcox, 2007) regarding whether FLV models should be considered measurement models, given that they cannot be estimated in the absence of additional paths.
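By contrast, the formative specification reverses the direction of influence. Again as a generic illustration rather than a model estimated in this study, the construct is written as a weighted function of its indicators plus a disturbance:

\eta = \gamma_1 x_1 + \gamma_2 x_2 + \cdots + \gamma_p x_p + \zeta

where the \gamma_j are indicator weights and \zeta is a disturbance capturing the portion of the construct not determined by the measured indicators. The presence of \zeta, together with the need to estimate the \gamma_j, is what creates the identification requirements described above.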

A third common approach is to use principal components analysis (PCA) to create a composite (CMP) score. CMP models define a focal construct as a weighted combination of observed indicators. Whereas PCA scores are based on empirically determined weights for each item, simpler realizations (e.g., sum or mean scores) assume equal weighting across all items. Regardless of the specific realization, CMP models are similar to FLV models in that they make use of the total variance across a set of items, rather than relying solely on shared variance. Further, CMP and FLV models both imply that observed indicators give rise to or define the construct of interest. The CMP model is different from the FLV model in that it produces a composite variable that contains no disturbance term (i.e., error variance). That is, CMP models define the focal construct as being perfectly determined by its set of observed indicators, a likely untenable assumption. Despite this limitation, there are practical reasons to prefer the use of CMP models, as they are simpler to estimate than FLV models and are not plagued by the identification problems described earlier. Therefore, while not interchangeable with FLV models, CMP models are a reasonable, accessible alternative.
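As an illustration of the composite approach, the sketch below builds both an equal-weight (mean) composite and a first-principal-component composite from a set of task scores. The variable names and simulated data are hypothetical; the snippet is meant only to show that CMP scores are deterministic functions of the observed indicators, with no disturbance term.

import numpy as np
import pandas as pd

# Hypothetical data frame of z-scored EF task scores (one row per child)
rng = np.random.default_rng(0)
tasks = pd.DataFrame(rng.normal(size=(500, 6)),
                     columns=["gng", "sss", "sca", "sts", "wms", "ptp"])

# Equal-weight composite: simple mean across tasks (unit weights)
ef_mean = tasks.mean(axis=1)

# PCA-based composite: an empirically weighted sum of the standardized tasks,
# using the weights of the first principal component
z = (tasks - tasks.mean()) / tasks.std(ddof=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(z.T))
w = eigvecs[:, np.argmax(eigvals)]            # weights for the first component
ef_pca = z.to_numpy() @ w

# Either way, the composite is fully determined by the observed indicators;
# there is no residual or disturbance term, unlike a latent variable model.
print(ef_mean.head())
print(ef_pca[:5])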

Measurement Models and Child EF

Prior research has almost universally relied on CFA models to represent the construct of EF, espousing RLV models as the best way to obtain ‘pure’ estimates of an individual’s EF ability that are not contaminated by measurement error (Miyake et al., 2000). RLV models have been used to test fundamental questions about the dimensionality of EF, and whether the dimensionality changes across development (Karr et al., 2018; Miyake et al., 2000; Miyake & Friedman, 2012; Wiebe et al., 2008). The dominant use of RLV models has shaped how the construct of EF is viewed (e.g., as the variance that is common across a set of EF tasks). The dominant use of RLV models is not unique to the EF literature. As others have noted (Bollen & Bauldry, 2011; Bollen & Lennox, 1991), they appear to have become the default choice of measurement model in psychological studies, with little consideration of alternative specifications. Our goal in the current manuscript is to explain why the default RLV specification may be inappropriate for studying EF, and to empirically demonstrate the consequences of applying different measurement models in substantive applications.

The hegemony of CFA models in psychology broadly, and within studies of EF more specifically, is due in part to one key feature of RLV models. Previously, we described how RLV models decompose variance in observed indicators (i.e., scores on EF tasks) into two parts: one which represents the intended construct (i.e., EF ability), and one which represents measurement error. By decomposing variance into these two sources, RLV models are thought to result in more precise representations of underlying psychological constructs, as opposed to FLV or CMP models, which do not decompose variance in this way. The consequences of ignoring measurement error are well-established and include biased path coefficients and reduced statistical power (e.g., Cole & Preacher, 2014). However, we argue that these salient demonstrations have resulted in researchers prioritizing the treatment of measurement error as the sole criterion by which they make decisions about measurement models, rather than critically evaluating the advantages and disadvantages of all available options (e.g., RLV, FLV, CMP models).

A more reasoned approach for selecting measurement models is needed, in part due to the potential dangers of applying RLV models to data that do not conform to such a model. Just as ignoring measurement error can lead to biased results, a recent investigation showed that researchers may similarly introduce bias when they apply RLV measurement models to data that do not fit that representation (Rhemtulla et al., 2019). The authors of this study demonstrate that the direction (e.g., over- or under-estimation) and magnitude of coefficient bias that results from incorrectly applying an RLV model is variable and may change depending upon the specification of the substantive model (i.e., whether predictors and/or consequences of the focal construct are included). They also show that the degree of bias is inversely related to the magnitude of standardized factor loadings in the RLV model, indicating that model misspecification is particularly problematic when observed indicators have less shared variance (Rhemtulla et al., 2019). Finally, the authors demonstrate via empirical reanalysis of published findings how the use of RLV versus alternative (e.g., CMP) measurement models can influence the conclusions of a substantive investigation. Interestingly, the empirical reanalysis was performed using child working memory as the focal construct, highlighting the relevance of measurement model (mis)specification for research on child EF.

RLV Models and Child EF – Reasons for Doubt?

Unfortunately, there is no foolproof way to determine whether a construct is best represented by a RLV model, as opposed to alternatives (i.e., FLV or CMP models). Rather, one must rely on accumulating evidence from several sources. One might consider thought experiments, such as those outlined by Bollen (1989), regarding the hypothesized relationship among a construct and its observed indicators. For example, is a latent construct thought to give rise to its indicators, or vice versa? Vanishing tetrad tests (VTTs) are a type of empirical test that can help determine whether data are consistent with a RLV model (Bollen & Ting, 2000). Although the specifics are outside the scope of the current investigation, we used VTTs in our previous work to demonstrate that EF task data do not conform to a fully RLV model (Willoughby & Blair, 2016).

There are other reasons to doubt whether RLV models are the best representation of the construct of EF. Consider that the average correlation among individual EF tasks in a task battery tends to be quite low (r ≈ .2 to .4), a phenomenon that has been frequently documented in the literature (e.g., Brydges et al., 2014; Miyake et al., 2000; Willoughby, Holochwost, Blanton, & Blair, 2014). A potential reason for this low correlation is that EF tasks in general are characterized by low between-person variance, in part because these and other cognitive tasks were originally designed to test within- (rather than between-) person experimental effects (Dang et al., 2020; Hedge et al., 2018). Regardless of the reason, modest correlations between tasks mean that there is very little shared variance (between 4% and 16%) to define the construct of EF in RLV models. In turn, this low level of shared variance calls into question the interpretation of the resulting latent variable, which could just as easily represent some shared, non-executive ability that EF tasks draw upon (e.g., motivation, attentional capacity), rather than EF ability per se. Low between-person variance also contributes to inconsistency in the number of identified EF factors in research using RLV measurement models. A recent re-analysis using data from 46 samples found that no single RLV factor structure was universally accepted in either child or adult samples, with solutions ranging from one to five factors (Karr et al., 2018). It is worth noting that low inter-indicator correlations are not a contraindication for models that rely on total (i.e., FLV, CMP), rather than shared, variance.

Finally, weak observed correlations among individual EF tasks mean that the majority of item variance (84% to 96%) is relegated to the error term when RLV models are used. Previous research suggests that residuals in RLV models of EF may contain meaningful information, rather than consisting solely of measurement error. For example, Nguyen and colleagues (2019) found significant residual correlations between EF tasks and math performance, both in their own sample and in six out of ten previously published datasets (including earlier data collected from the current study). The presence of these residual correlations is not conclusive evidence against a RLV model of EF, but it does highlight the need to evaluate alternative models.

The Current Study

There is growing concern among some psychometricians that we have reflexively adopted RLV models as the de-facto standard without considering the potential consequences of doing so (Bollen & Diamantopoulos, 2017; Rhemtulla et al., 2019). An overarching goal of the current study is to demonstrate the impact of this issue using an applied example that is of interest to developmental psychologists. Above, we describe numerous reasons to doubt whether the RLV model is the best approximation of the construct of child EF. The goal for the remainder of the manuscript is to investigate the implications of measurement model specification for the interpretation of study findings. That is, are there any practical consequences that result from using an RLV model rather than alternatives?

To answer this question, we leverage the analytic technique employed by Rhemtulla and colleagues (2019) and compare whether the answers to two substantive research questions differ depending on the measurement model used to summarize child EF ability. First, we will examine stability in EF across the transition to school (preschool to grade 1). Second, we will test whether there is a unique effect of preschool EF on third grade academic outcomes, or whether this relationship is entirely mediated through first grade EF. We picked these questions because of their relevance to the EF intervention literature, which has attracted growing interest in recent years (Takacs & Kassai, 2019).

In the following sections, we describe and report the results from parallel sets of analyses where EF is represented using either a RLV or a CMP model. Our choice of the CMP model as an alternative to the RLV model is consistent with previous studies (Rhemtulla et al., 2019) and allows for a closer comparison between model results. This is because, unlike FLV models, CMP models do not require additional paths for identification. Additionally, because RLV models imply effect indicators, while CMP models imply formative indicators, these two models are maximally different in terms of their assumed relationships among indicators. Finally, CMP models are easily estimable and interpretable, making them a reasonable and accessible alternative to RLV models. Because different measurement models define the construct of EF differently (e.g., using shared or total variance), we hypothesize that our substantive conclusions will differ when RLV and CMP models are used to summarize child EF ability.

Methods

Participants

Data for these analyses were drawn from the Family Life Project (FLP), a longitudinal investigation of children and families residing in two regions with high rural poverty rates. Families living in target counties in North Carolina (NC) and Pennsylvania (PA) were recruited using a stratified random sampling approach. A representative sample of 1,292 families was recruited over a 1-year period spanning September 2003 through September 2004. Low-income families were oversampled in both states, and African-American families were oversampled in NC, to ensure adequate power to test study aims. Additional details about FLP sampling and recruitment can be found elsewhere (Vernon-Feagans & Cox, 2013).

Only children who completed a minimum number of EF tasks at preschool (3 tasks) and first grade (2 tasks) were included in the current analyses. This final sample (N = 920) did not differ from the full FLP sample in terms of child race, gender, research site (NC or PA), or poverty status at recruitment. All data collection activities were approved by the institutional review board at the University of North Carolina at Chapel Hill (protocol #07-0646), and informed consent and assent were obtained from study participants.

Procedures

Data were collected during a home visit conducted when children were approximately 48 months (Mage = 48.3; SD = 1.50) and school visits conducted when children were in their second (Mage = 83.5; SD = 3.68) and fourth years (Mage = 107; SD = 3.54) of formal schooling. For these school visits, the majority of children were in the first (95%) and third (88%) grade. For simplicity, we refer to these school visits as representing the first and third grade. On average, three years elapsed between the 48 month and first grade visits, and two years elapsed between the first and third grade visits.

Measures

Executive function – Preschool

At 48 months, children completed a battery of EF tasks that were presented in an open, spiral-bound flip-book format. Each opening (8 x 14 in) presented stimuli for the child on one page and scripted instructions for the research assistant on the facing page. Each task required children to pass a set of training trials to ensure comprehension of task procedures before proceeding to test items. The battery of EF tasks included three measures of inhibitory control (Silly Sounds Stroop, Spatial Conflict Arrows, and Animal Go/No-Go), two measures of working memory (Working Memory Span and Pick the Picture Game), and one measure of set shifting (Something’s the Same). Details regarding individual tasks and the psychometric properties of this task battery have been reported elsewhere (Willoughby et al., 2010; Willoughby, Blair, et al., 2012; Willoughby, Wirth, et al., 2012). To summarize, tasks in the EF battery do a relatively better job of measuring EF in children with low to average EF ability, show expected mean-level differences based on child age and poverty status, and exhibit good criterion validity. Consistent with the analytic approach used previously in this sample (Willoughby, Wirth, & Blair, 2012), item response theory (IRT) models were used to create expected a posteriori (EAP) scores for each task. These EAP scores have been shown to have greater precision for differentiating children’s EF ability than raw (i.e., percent correct) scores (Willoughby et al., 2011). Individual EAP scores for each task were used as indicators of child EF at preschool.
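For readers less familiar with IRT-based scoring, the sketch below shows how an expected a posteriori (EAP) score can be computed for a single child under a two-parameter logistic model. The item parameters and response pattern are hypothetical; the actual IRT models for these tasks were estimated and reported elsewhere (Willoughby et al., 2011).

import numpy as np

# Hypothetical 2PL item parameters for one task (discrimination a, difficulty b)
a = np.array([1.2, 0.9, 1.5, 1.1])
b = np.array([-0.5, 0.0, 0.4, 1.0])
responses = np.array([1, 1, 0, 0])       # hypothetical correct/incorrect pattern

# Grid over latent ability theta, with a standard normal prior
theta = np.linspace(-4, 4, 161)
prior = np.exp(-0.5 * theta ** 2)

# 2PL probability of a correct response to each item at each theta value
p = 1 / (1 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
likelihood = np.prod(np.where(responses[:, None] == 1, p, 1 - p), axis=0)

# EAP score = posterior mean of theta given the observed response pattern
posterior = likelihood * prior
eap = np.sum(theta * posterior) / np.sum(posterior)
print(round(eap, 3))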

Executive function – Grade 1

Children completed four EF tasks at grade 1. The Hearts and Flowers task (Davidson et al., 2006) requires children to press a button on the same side of the screen when they see an image of a heart, but to press a button on the opposite side of the screen when they see an image of a flower. Three task blocks include hearts-only, flowers-only, and mixed (hearts and flowers) trials. Accuracy on the flowers (HF_F) and mixed (HF_X) blocks served as measures of children’s inhibitory control and set shifting, respectively. Trials in which children responded faster than 200ms were excluded from analyses. Only children who completed at least 75% of task trials received task scores; a sketch of this trial-level screening rule appears below.
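The same screening rules (responses faster than 200 ms excluded; scores assigned only when at least 75% of trials were completed) apply to each computerized task described in this section. The following is a minimal sketch of that rule, assuming a hypothetical trial-level table with rt_ms and correct columns; it is not the study's actual processing code.

import pandas as pd

def score_block(trials, n_trials_expected):
    """Accuracy score for one task block, following the rules described above."""
    # Drop anticipatory responses (faster than 200 ms)
    valid = trials[trials["rt_ms"] >= 200]
    # Require at least 75% of the expected trials to remain after screening
    # (one reading of the completion rule); otherwise assign no score
    if len(valid) < 0.75 * n_trials_expected:
        return None
    return valid["correct"].mean()

# Example: hypothetical mixed-block data for one child
example = pd.DataFrame({"rt_ms": [650, 180, 720, 540, 900, 430],
                        "correct": [1, 1, 0, 1, 1, 0]})
print(score_block(example, n_trials_expected=6))    # prints 0.6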

The NIH Toolbox version of the Dimensional Change Card Sort task (DCCS; Zelazo, 2006) assessed set shifting. The Toolbox DCCS is a computer-based task in which children were first asked to sort items by one dimension (e.g., shape) and were then asked to sort the same set of items by a different dimension (e.g., color). In the mixed block, children sorted 50 items according to an audio cue played at the beginning of each trial, which indicated which dimension they should sort by (shape or color). Child accuracy during the mixed block was used as the final task score. As described previously, trials in which children responded faster than 200ms were excluded from analyses, and only children who completed at least 75% of task trials received task scores.

The multi-source interference test (MSIT; Bush & Shin, 2006) was used as a measure of inhibitory control. The MSIT is a Stroop-like task that requires participants to press a button corresponding to the number that is different from the other two numbers in a three-number sequence. For example, if the sequence 121 was presented, the correct response would be to press the number 2. On incongruent trials (e.g., 211) 2 is out of its ordinal position and as such, children must suppress the tendency to respond with the key that is in the same position as the mismatched number, and instead respond with the key that represents the same value as the mismatched number. Child accuracy across 47 incongruent trials was used as the MSIT task score. Again, trials in which children responded faster than 200ms were excluded from analyses, and only children who completed at least 75% of task trials received task scores.

Finally, the Backward Word Span task (BWS; Davis & Pratt, 1995) assessed working memory. Children were required to repeat a list of familiar, single-syllable words in reverse order. Children were allowed up to three practice trials to demonstrate understanding of task demands. The list of words to be repeated increased with each successful trial, up to a length of five words. Children’s task scores represented the highest number of words correctly recalled.

Academic outcomes – Grades 1 and 3

Teachers rated children’s academic competence at first and third grade using the Social Skills Rating System (SSRS; Gresham & Elliot, 1990), a measure that evaluates the social behaviors of children. The SSRS is appropriate for children aged 3 to 18 years and consists of three subscales. Standard scores from the Academic Competence scale at first and third grade were used as academic outcomes.

Analytic Plan

Our analyses tested whether the results from two substantive analyses differed based on whether EF was modeled as a RLV or a CMP variable. To generate RLV scores, we estimated a two-factor CFA model (Figure 1) to represent EF at preschool and first grade. For CMP models, EF scores were created by averaging together individual task scores at preschool and first grade. Therefore, four scores were created in total: RLV and CMP scores for preschool and first grade EF.
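To ground this plan, a hedged sketch of the two score types follows. The original analyses were conducted in Mplus with sampling weights; the snippet instead uses the open-source semopy package with lavaan-style model syntax and hypothetical column names, purely to illustrate the two-factor RLV specification in Figure 1 and the corresponding unit-weighted composites, not to reproduce the study's estimates.

import pandas as pd
import semopy  # assumed available; the study itself used Mplus 8.1

# Hypothetical analysis file with one column per EF task score
df = pd.read_csv("ef_tasks.csv")

# Two-factor reflective (RLV) model: preschool and grade 1 EF factors, with a
# residual covariance between the two Hearts and Flowers blocks (see Figure 1)
model_desc = """
ef_ps =~ ptp + wms + gng + sss + sca + sts
ef_g1 =~ hf_x + hf_f + dccs + msit + bws
hf_x ~~ hf_f
ef_ps ~~ ef_g1
"""
rlv = semopy.Model(model_desc)
rlv.fit(df)
print(rlv.inspect())   # loadings and the latent preschool-to-grade-1 covariance

# Composite (CMP) scores: unit-weighted means of the same task scores
df["ef_ps_cmp"] = df[["ptp", "wms", "gng", "sss", "sca", "sts"]].mean(axis=1)
df["ef_g1_cmp"] = df[["hf_x", "hf_f", "dccs", "msit", "bws"]].mean(axis=1)
print(df["ef_ps_cmp"].corr(df["ef_g1_cmp"]))    # cross-time stability (CMP)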

Figure 1. Latent measurement model for EF at preschool and first grade. GNG = Animal Go/No-Go; SSS = Silly Sounds Stroop; SCA = Spatial Conflict Arrows; STS = Something’s the Same; HF_X = Hearts and Flowers (Mixed Block); HF_F = Hearts and Flowers (Flowers Block); DCCS = Dimensional Change Card Sort Task; MSIT = Multi-Source Interference Task; BWS = Backward Word Span. *p < .05, **p < .01, ***p < .001.

Using these scores, we tested two substantive questions. First, we estimated the stability of EF across the transition to school by examining the correlations between EF scores at preschool and at first grade. We compared the magnitude of these correlations when EF was represented according to a RLV or CMP model. Next, we tested whether preschool EF was a unique contributor to third grade academic competence, beyond first grade measures. To do this, we estimated a path model that included a direct path between preschool EF and third grade academic competence, as well as indirect paths via first grade EF and academic competence. We compared the magnitude and significance of direct and indirect paths in models that represented EF as a RLV versus a CMP score.1
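Under the composite specification, the logic of this path model can be approximated with a set of ordinary regressions, with indirect effects formed as products of path coefficients. The sketch below is a simplified, hypothetical illustration (variable names assumed; inference for the indirect effects would in practice come from the SEM software or a bootstrap), not the exact model reported in Figure 2.

import pandas as pd
import statsmodels.api as sm

# Hypothetical analysis file with composite EF and academic competence scores
df = pd.read_csv("ef_and_academics.csv")
z = (df - df.mean()) / df.std(ddof=0)          # standardize to a beta metric

# Mediator models: grade 1 EF and grade 1 academic competence on preschool EF
m_ef_g1 = sm.OLS(z["ef_g1"], sm.add_constant(z[["ef_ps"]])).fit()
m_ac_g1 = sm.OLS(z["ac_g1"], sm.add_constant(z[["ef_ps"]])).fit()

# Outcome model: grade 3 academic competence on preschool EF plus both mediators
X = sm.add_constant(z[["ef_ps", "ef_g1", "ac_g1"]])
m_ac_g3 = sm.OLS(z["ac_g3"], X).fit()

direct = m_ac_g3.params["ef_ps"]
indirect_via_ef = m_ef_g1.params["ef_ps"] * m_ac_g3.params["ef_g1"]
indirect_via_ac = m_ac_g1.params["ef_ps"] * m_ac_g3.params["ac_g1"]
total = direct + indirect_via_ef + indirect_via_ac
print(direct, indirect_via_ef, indirect_via_ac, total)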

Descriptive statistics were conducted in SAS 9.4, while substantive analyses were conducted in Mplus 8.1 (Muthén & Muthén, 2017) using full-information maximum likelihood with robust standard errors. All substantive analyses accounted for the complex sampling design by incorporating appropriate population weight and stratification variables.

Results

Descriptive Statistics

Descriptive statistics and correlations among preschool and first grade EF tasks are displayed in Table 1. Individual preschool (r = .07 - .28, p < .05) and first grade (r = .16 - .55, p < .05) EF tasks were modestly correlated with one another, with the largest correlation observed between the two blocks of the Hearts and Flowers task (r = .55, p < .001). Cross-time correlations between EF tasks ranged from small to modest (r = .05 - .33, p < .14). Fitting a two-factor RLV model to these data (Figure 1) resulted in good model fit, χ2(42) = 57.9, p = .05, CFI = .98, RMSEA = .02. Individual EF tasks loaded strongly onto the EF latent factor at both 48 months (λ = .34 - .62, p < .001; R2 = .12 to .39) and first grade (λ = .37 - .61, p < .001; R2 = .14 to .45). Because we made use of scores from two different blocks of the Hearts & Flowers task, this model included a freely-estimated residual correlation between scores from the flowers and mixed blocks.

Table 1.

Descriptive statistics and correlations among EF tasks at preschool and first grade

1 2 3 4 5 6 7 8 9 10 11
1. PTP (48) --
2. WMS (48) .27*** --
3. GNG (48) .27*** .19*** --
4. SSS (48) .20*** .12*** .20*** --
5. SCA (48) .23*** .17*** .12*** .07* --
6. STS (48) .26*** .22*** .14*** .18*** .28*** --
7. HF_X (G1) .33*** .25*** .21*** .11*** .15*** .20*** --
8. HF_F (G1) .26*** .17*** .12** .12*** .09** .14*** .55*** --
9. DCCS (G1) .21*** .16*** .15*** .07 .13*** .17*** .35*** .24*** --
10. MSIT (G1) .24*** .24*** .20*** .14*** .18*** .19*** .36*** .24*** .23*** --
11. BWS (G1) .18*** .19*** .13*** .05 .12*** .20*** .23*** .16*** .19*** .25*** --
N 873 890 745 838 910 904 899 901 755 887 907
M −.29 −.11 −.16 −.10 .10 .04 .81 .87 .91 .61 2.46
SD .88 .83 .87 .87 .93 .71 .17 .20 .11 .21 .71

Note. Sample sizes for correlations ranged from 623 to 899. PTP = Pick the Picture; WMS = Working memory span; GNG = Animal Go/No-Go; SSS = Silly Sounds Stroop; SCA = Spatial Conflict Arrows; STS = Something’s the Same; HF_X = Hearts and Flowers (Mixed Block); HF_F = Hearts and Flowers (Flowers Block); DCCS = Dimensional Change Card Sort Task; MSIT = Multi-Source Interference Task; BWS = Backward Word Span; 48 = 48 months; G1 = Grade 1; M = mean; SD = standard deviation.

* p < .05. ** p < .01. *** p < .001.

Comparison of Findings Across RLV and CMP Models

We tested whether the conclusions from two substantive analyses changed depending on whether EF was represented as a RLV or CMP score. We tested this question in the context of the cross-time stability in EF, as well as the relationship between preschool EF and academic competence.

Stability in EF from preschool to first grade

We examined whether the association between children’s EF ability at preschool and first grade differed based on the measurement model that was used to represent EF. In the RLV measurement model, we found that estimates of EF ability were strongly correlated across time (r = .79, p < .001). When EF ability was approximated using CMP scores, the cross-time correlation was more modest (r = .45, p < .001). Put another way, EF ability at preschool accounted for 62% of the variance in EF ability at first grade, when EF was modeled as a RLV at both time points. When EF was modeled as a CMP in both preschool and first grade, EF ability at preschool accounted for 20% of the variance in EF ability at first grade. These results show that estimates of cross-time stability are nearly twice as large when EF is represented by a RLV measurement model, compared to a CMP model.

Contribution of preschool EF to academic competence

We next tested whether there was a unique effect of preschool EF on third grade academic competence, or whether this effect was entirely mediated through first grade measures (Figure 2). Again, we examined whether the answer to this question differed depending on whether EF was modeled using a RLV or CMP model. When EF was modeled as a RLV (Figure 2a), we found no direct effect of preschool EF on third grade academic competence (β = −.02, p = .87). Associations between preschool EF and third grade academic competence were entirely indirect, mediated via first grade EF (β = .26, p = .007) and first grade academic competence (β = .29, p < .001).

Figure 2. Direct and indirect associations between preschool EF and third grade academic competence, when EF is modeled as a (a) reflective latent variable and (b) composite score. Ac Comp = academic competence; G1 = Grade 1. *p < .05, **p < .01, ***p < .001.

In contrast, when EF was modeled as a CMP variable (Figure 2b), there was a significant direct effect of preschool EF on academic competence at third grade (β = .08, p = .008). Consistent with the RLV model, there were also significant indirect effects of preschool EF via first grade EF (β = .08, p < .001) and first grade academic competence (β = .25, p < .001). The magnitude of the indirect effect (EF preschool → EF 1st Grade → Academic Competence 3rd Grade) was three times as large when EF was modeled as a RLV (β = .26, p = .007) versus a CMP (β = .08, p < .001), likely because the estimate of cross-time stability (EF preschool → EF 1st Grade) was much larger in the RLV model (β = .79, p < .001) compared to the CMP model (β = .46, p < .001). The total effect of EF at 48 months on academic competence in third grade was also larger in the RLV model (β = .53, p < .001) as compared to the CMP model (β = .40, p < .001).

Discussion

The motivation for the current manuscript was to critically evaluate the dominant use of RLV models in the child EF literature and compare them to a reasonable alternative. Our review of the literature uncovered a number of reasons to question whether RLV models are a plausible representation for child EF ability. Our empirical findings demonstrate that there are serious interpretational differences that arise depending on the measurement model that is chosen to represent EF in substantive analyses. These results add to a growing undercurrent of resistance to the default use of RLV measurement models in the psychological literature more broadly. Although this viewpoint is by no means new (Bollen & Lennox, 1991), it has yet to gain widespread appreciation outside of the psychometric arena.

In the absence of definitive tests that tell us which measurement model to apply to our data, we must rely on the preponderance of evidence for or against certain models. Similar to previous studies (Nguyen et al., 2019; Willoughby et al., 2014), we found low correlations among individual indicators of EF (i.e., scores from different EF tasks). These low correlations suggest that EF tasks (and the latent constructs derived from them) may do a poor job of indexing individual differences, a phenomenon which has been reported elsewhere (Dang et al., 2020; Hedge et al., 2018). These low correlations also lead us to question what the shared variation extracted by RLV models really represents. This question is particularly germane given the well-known task impurity problem that pervades all areas of cognitive assessment, including the assessment of EF ability. To the extent that performance on EF tasks represents a confluence of executive and non-executive factors, it is impossible to determine what a RLV applied to task data truly represents. Besides these concerns regarding low inter-task correlations, previous research has identified a number of other reasons why RLV models may not be appropriate, including the results of vanishing tetrad tests (Willoughby & Blair, 2016). Therefore, the preponderance of evidence suggests that we should consider alternatives to the RLV model for representing child EF data. However, to date, CFA models remain the most common measurement model reported in the EF literature.

An important corollary question is why applied researchers should be concerned about the measurement model they use to represent EF in their substantive analyses. A main aim of our manuscript was to investigate whether interpretational differences arise when EF is modeled as a RLV versus a reasonable alternative (i.e., a CMP model). Our analyses clearly show that the measurement model we use for EF has implications for our study conclusions. For example, we found that estimates of cross-time stability from preschool to grade 1 were twice as large when EF was modeled as a RLV, compared to when it was modeled as a CMP. Importantly, this high stability in latent EF is conceptually hard to reconcile with the weak to modest correlations observed between individual tasks across time (in this sample, they ranged from r = .05 - .33). Moreover, our findings are consistent with previously reported findings in this sample, which investigated cross-time stability from age 3 to 5 (Willoughby & Blair, 2016). Specifically, this previous study showed that the correlation between latent EF across the preschool years was greater than 0.90, similar to what was observed in the current study. When EF was modeled as a CMP (i.e., mean) score, stability estimates were more modest, ranging from about 0.3 to 0.5. The current study confirms that this finding holds across a wider developmental window, one that spans the transition from preschool into formal schooling. Additionally, although this previous study used the same tasks to measure EF at multiple timepoints, the current investigation used different tasks at preschool and first grade. Thus, it is noteworthy that we observed similarly high correlations in latent EF across time.

Interestingly, recent findings from Helm and colleagues (2020) show that when children’s EF ability was summarized as the first component in a PCA, estimates of cross-time stability from age 4 to 6 were modest (β = .34, p < .001), similar to what we observed when relating EF composites from age 4 to 7 (β = .46, p < .001). Therefore, it is reasonable to question whether the high cross-time stability that is only observed when EF is modeled as a RLV is an artifact of the measurement model itself. These PCA findings also suggest that it is not just the difference in weighting between our CMP (unit weights) and RLV scores (empirical weights) that leads to our disparate findings. Rather, the crux of the problem appears to be defining EF ability on the basis of the small amount of shared variance among individual EF tasks. As we stated earlier, it is impossible to say with certainty what a RLV truly represents. Therefore, what is extracted and referred to as “EF” in RLV models may be another psychological construct altogether, one which exhibits strong stability across the early years of life.

We also observed that preschool EF had a unique effect on academic competence in third grade, but only when EF was represented using a CMP model. When EF was represented as a RLV, the impact of preschool EF on third grade academic competence was entirely mediated through earlier measures of these constructs. These findings are likely attributable to the greater cross-time stability in EF that is observed in RLV models, such that there was little unique variance in preschool EF to significantly predict downstream outcomes. Regardless of underlying explanation, our study demonstrates that the choice of measurement model is not a trivial one: different conclusions are likely to be drawn from analyses that use RLV versus CMP models to represent the construct of EF.

On the basis of studies that model EF as a RLV, we would conclude that a child’s EF ability at first grade is almost entirely determined by their level of EF in preschool and that the rank ordering of children is largely conserved over time, despite the complex changes that occur across the transition to formal schooling (Margetts, 2002). Taken a step further, we might conclude that EF skills are highly heritable, as others have concluded on the basis of RLV models (Friedman et al., 2008) and that EF development can be characterized as a maturational unfolding that is largely genetically determined. While it is impossible to say whether these conclusions are warranted or not, it is worth pointing out that they stand in stark contrast to a number of assertions in the developmental literature.

For example, considerable research points to the preschool period as a time of rapid growth and reorganization in executive abilities (Garon et al., 2008), guided in part by rapid development in the prefrontal cortex and associated neural networks (Fiske & Holmboe, 2019). Additionally, environmental factors such as poverty (e.g., Blair et al., 2011), parenting (e.g., Fay-Stammbach, Hawes, & Meredith, 2014), and childcare (e.g., Berry, Blair, Ursache, Willoughby, & Granger, 2014) are well-documented contributors to individual differences in EF in the preschool and early elementary years. Finally, EF has been shown to be amenable to different types of interventions during the preschool years (e.g., Takacs & Kassai, 2019). All of these findings suggest the opposite of what we would conclude from RLV models examining cross-time stability – that EF is exceedingly malleable, as opposed to primarily immutable. In contrast, the modest stability estimates that we obtained when EF was modeled as a CMP are more in line with this body of research findings.

Depending on the measurement model chosen, we would also draw different conclusions about how and when to intervene with children to promote academic success. The RLV and CMP models make different predictions about the effectiveness of interventions, conditional on what the interventions target. Because the RLV is defined by shared variance across a number of EF tasks, interventions that improve performance on a single task or a subset of tasks are unlikely to show relations with downstream academic outcomes. Using a CMP model, interventions that improve single task performance are more likely to be related to improved child outcomes. Additionally, the degree of stability in EF between preschool and first grade may have implications for the best timing of intervention. If the stability in EF between preschool and school entry is as high as is suggested by RLV models, then interventions to improve EF would necessarily need to happen during the preschool years. Contrast this with initiatives like EF+Math (efmathprogram.org) that have been designed to improve children’s math skills in elementary and middle school by improving their concurrent EF. Unless there were the possibility for EF to change after the preschool years, programs like this would not be expected to result in academic gains for children. Instead, it would be more advantageous to focus on children’s academic skills directly as a point of intervention during the elementary years. However, if the cross-time stability in EF is more modest, as suggested by CMP models, then EF interventions occurring during the elementary years might be a worthwhile investment. Thus, deciding between different measurement models is not just important for methodologists or substantive researchers alone. These choices may have important pedagogical and policy implications as well.

Thus, there is increasing reason to doubt the use of RLV models as applied to child EF data, and continued misuse is likely to result in biased conclusions. Recent papers have been similarly critical of the CFA models reified by Miyake and colleagues (2000; 2012). In a reanalysis of 46 samples, no single CFA model of EF was unequivocally accepted or selected (Karr et al., 2018), suggesting that increased attention to replicability and alternative measurement models is needed. Just as the consequences of ignoring measurement error have been acknowledged and disseminated to mainstream psychology audiences (e.g., Cole & Preacher, 2014), it is time to acknowledge that inappropriate use of RLV models is also a cause for concern. Simulation studies now show that the bias that results from applying an RLV model to data that are better represented as a CMP model can be worse than the bias that results from ignoring measurement error (Rhemtulla et al., 2019). The degree of bias depends in part on the amount of shared variance among observed indicators. When there is more unique variance (as is the case with EF), there is a greater potential for bias. Unfortunately, model fit tells us nothing about the presence or degree of bias, as well-fitting CFA models (such as those commonly observed in the EF literature) can still produce biased parameters. Moreover, in the absence of definitive evidence supporting one type of measurement model over another, Rhemtulla and colleagues (2019) argue that CMP models may be preferable, because they are more robust to alterations in the structural model. That is, regression coefficients in a structural equation model are less influenced by the inclusion of different predictors and/or outcomes when the focal construct is modeled as a CMP rather than as a RLV. Therefore, there are reasons to prefer CMP models over RLV models for representing psychological constructs, particularly when observed indicators are weakly correlated.

On the other hand, CMP models have limitations of their own that should be considered. It is unlikely that a CMP model ever perfectly represents an underlying construct, because to do so would require that all indicators of the construct be accounted for and measured without error. Another criticism is that CMP models are not true measurement models, because the construct is completely determined by the indicators and therefore has no error term (Bollen & Bauldry, 2011). Thus, although we believe that CMP models may safeguard against some of the dangers of RLV models, we acknowledge that they are not a perfect solution. Given the estimation difficulties inherent in FLV models, there is a clear need to develop alternative measurement models that can be easily estimated in an SEM framework (Rhemtulla et al., 2019). An exploration of alternative measurement models may be particularly useful in the context of EF, as there is the possibility that none of the traditional RLV, FLV, or CMP models is appropriate. Additionally, regardless of the model used to summarize performance across tasks, there exist many choices for scoring individual tasks (e.g., mean scores versus IRT-based scores). This topic was not the focus of the current manuscript, but it has received attention in its own right in the context of child EF (e.g., Camerota et al., 2019).

In the meantime, we offer two practical recommendations for those who conduct and consume developmental research. These recommendations represent incremental improvements to the status quo (i.e., reflexive use of RLV models) that may safeguard the field from an abundance of biased or inaccurate conclusions. Our first suggestion is for consumers of developmental research to pay attention to the type of measurement models that are used to test substantive hypotheses and to be aware that the conclusions drawn from an investigation may depend on those modeling decisions. When synthesizing results across multiple studies, differences in measurement models should be considered as one factor that may lead to divergent findings. This critical practice may help bring clarity to the existing EF literature, particularly given conflicting evidence regarding the stability in EF across time.

Second, we believe that producers of developmental research should do due diligence by justifying their choice of measurement model(s) in a given investigation, while also conducting sensitivity analyses to test the robustness of their findings. In the absence of absolute guidance or tests that can determine the most appropriate measurement model, it can be useful to benchmark how one’s substantive conclusions would change if the measurement model for the construct of interest were changed. In some cases, this will mean comparing results that use RLV models to results that use CMP models, as we demonstrated in the current investigation. When possible, FLV models can also serve as an appropriate comparison model. Of course, study-specific results are likely to differ from those presented in the current manuscript based on sample characteristics, covariates, focal predictors, and outcomes, and may not always show such stark differences between RLV and alternative models. However, our results indicate the possibility that major interpretive differences may arise, and if they do, researchers should interpret findings with caution until more data can be brought to bear on the question. Additional care in selecting measurement models should bring increased rigor to the developmental literature as a whole and to research on child EF specifically.

In sum, we argue that the uncritical use of RLV models in the developmental literature is a problematic phenomenon worthy of future study. Although we chose EF as an exemplar in this manuscript, we believe that our findings are just one illustration of a larger trend in the field. We hope that the low-cost solutions offered here provide initial guidance to concerned researchers who wish to actively address the issue of measurement model specification in their own work.

Supplementary Material

Supplemental Material

Acknowledgements

This study is part of the Family Life Project [https://flp.fpg.unc.edu/]. The Family Life Project Phase II Key investigators include: Lynne Vernon-Feagans, The University of North Carolina at Chapel Hill; Mark T. Greenberg, The Pennsylvania State University; Clancy B. Blair, New York University; Margaret R. Burchinal, The University of North Carolina at Chapel Hill; Martha Cox, The University of North Carolina at Chapel Hill; Patricia T. Garrett-Peters, The University of North Carolina at Chapel Hill; Jennifer L. Frank, The Pennsylvania State University; W. Roger Mills-Koonce, University of North Carolina-Greensboro; and Michael T. Willoughby, RTI International.

Data collection for this study was supported by NICHD P01 HD039667, with co-funding from the National Institute of Drug Abuse. Data analysis and writing for this study was supported by the Office of The Director, National Institutes of Health under Award Number UG3OD023332. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

1. We conducted a third set of analyses where factor scores were exported from the RLV model and used to test these same substantive questions. Findings were remarkably similar between the full RLV and factor score models (see Supplemental Materials).

References

1. Allan NP, Hume LE, Allan DM, Farrington AL, & Lonigan CJ (2014). Relations between inhibitory control and the development of academic skills in preschool and kindergarten: A meta-analysis. Developmental Psychology, 50(10), 2368–2379. 10.1037/a0037493
2. Berry D, Blair C, Ursache A, Willoughby MT, & Granger DA (2014). Early childcare, executive functioning, and the moderating role of early stress physiology. Developmental Psychology, 50(4), 1250–1261. 10.1037/a0034700
3. Blair C (2002). School Readiness: Integrating Cognition and Emotion in a Neurobiological Conceptualization of Children’s Functioning at School Entry. American Psychologist, 57(2), 111–127. 10.1037//0003-066X.57.2.111
4. Blair C, Granger DA, Willoughby MT, Mills-Koonce R, Cox M, Greenberg MT, Kivlighan KT, & Fortunato CK (2011). Salivary Cortisol Mediates Effects of Poverty and Parenting on Executive Functions in Early Childhood. Child Development, 82(6), 1970–1984. 10.1111/j.1467-8624.2011.01643.x
5. Bollen KA (1989). Structural equations with latent variables. John Wiley & Sons, Inc.
6. Bollen KA, & Bauldry S (2011). Three Cs in measurement models: Causal indicators, composite indicators, and covariates. Psychological Methods, 16(3), 265–284. 10.1037/a0024448
7. Bollen KA, & Diamantopoulos A (2017). In Defense of Causal-Formative Indicators: A Minority Report. Psychological Methods, 22(3), 581–596. 10.1037/met0000056
8. Bollen KA, & Lennox R (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110(2), 305–314. 10.1037/0033-2909.110.2.305
9. Bollen KA, & Ting K (2000). A tetrad test for causal indicators. Psychological Methods, 5(1), 3–22. 10.1037/1082-989X.5.1.3
10. Brydges CR, Fox AM, Reid CL, & Anderson M (2014). The differentiation of executive functions in middle and late childhood: A longitudinal latent-variable analysis. Intelligence, 47, 34–43. 10.1016/j.intell.2014.08.010
11. Bush G, & Shin LM (2006). The Multi-Source Interference Task: an fMRI task that reliably activates the cingulo-frontal-parietal cognitive/attention network. Nature Protocols, 1(1), 308–313. 10.1038/nprot.2006.48
12. Camerota M, Willoughby MT, & Blair CB (2019). Speed and accuracy on the Hearts and Flowers task interact to predict child outcomes. Psychological Assessment, 31(8), 995–1005. 10.1037/pas0000725
13. Cole DA, & Preacher KJ (2014). Manifest variable path analysis: Potentially serious and misleading consequences due to uncorrected measurement error. Psychological Methods, 19(2), 300–315. 10.1037/a0033805
14. Dang J, King KM, & Inzlicht M (2020). Why Are Self-Report and Behavioral Measures Weakly Correlated? Trends in Cognitive Sciences, 24(4), 267–269. 10.1016/j.tics.2020.01.007
15. Davidson MC, Amso D, Anderson LC, & Diamond A (2006). Development of cognitive control and executive functions from 4 to 13 years: Evidence from manipulations of memory, inhibition, and task switching. Neuropsychologia, 44(11), 2037–2078.
16. Davis HL, & Pratt C (1995). The development of children’s theory of mind: The working memory explanation. Australian Journal of Psychology, 47(1), 25–31. 10.1080/00049539508258765
17. Fay-Stammbach T, Hawes DJ, & Meredith P (2014). Parenting Influences on Executive Function in Early Childhood: A Review. Child Development Perspectives, 8(4). 10.1111/cdep.12095
18. Fiske A, & Holmboe K (2019). Neural substrates of early executive function development. Developmental Review, 52, 42–62. 10.1016/j.dr.2019.100866
19. Friedman NP, Miyake A, Young SE, DeFries JC, Corley RP, & Hewitt JK (2008). Individual differences in executive functions are almost entirely genetic in origin. Journal of Experimental Psychology: General, 137(2), 201–225. 10.1037/0096-3445.137.2.201
20. Garon N, Bryson SE, & Smith IM (2008). Executive function in preschoolers: A review using an integrative framework. Psychological Bulletin, 134(1), 31–60. 10.1037/0033-2909.134.1.31
21. Gresham FM, & Elliot SN (1990). Social Skills Rating System manual. American Guidance Services.
22. Hedge C, Powell G, & Sumner P (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166–1186. 10.3758/s13428-017-0935-1
23. Helm AF, McCormick SA, Deater-Deckard K, Smith CL, Calkins SD, & Bell MA (2020). Parenting and Children’s Executive Function Stability Across the Transition to School. Infant and Child Development, 29(1), 1–19. 10.1002/icd.2171
24. Howell RD, Breivik E, & Wilcox JB (2007). Is Formative Measurement Really Measurement? Reply to Bollen (2007) and Bagozzi (2007). Psychological Methods, 12(2), 238–245. 10.1037/1082-989X.12.2.238
25. Jacob R, & Parkinson J (2015). The Potential for School-Based Interventions That Target Executive Function to Improve Academic Achievement: A Review. Review of Educational Research, 85(4), 512–552. 10.3102/0034654314561338
26. Karr JE, Areshenkoff CN, Rast P, Hofer SM, Iverson GL, & Garcia-Barrera MA (2018). The unity and diversity of executive functions: A systematic review and re-analysis of latent variable studies. Psychological Bulletin, 144(11), 1147–1185. 10.1037/bul0000160
27. MacCallum RC, & Browne MW (1993). The use of causal indicators in covariance structure models: Some practical issues. Psychological Bulletin, 114(3), 533–541. 10.1037/0033-2909.114.3.533
28. Margetts K (2002). Transition to school — Complexity and diversity. European Early Childhood Education Research Journal, 10(2), 103–114. 10.1080/13502930285208981
29. Miyake A, & Friedman NP (2012). The nature and organization of individual differences in executive functions: Four general conclusions. Current Directions in Psychological Science, 21(1), 8–14. 10.1177/0963721411429458
30. Miyake A, Friedman NP, Emerson MJ, Witzki AH, Howerter A, & Wager TD (2000). The unity and diversity of executive functions and their contributions to complex “Frontal Lobe” tasks: a latent variable analysis. Cognitive Psychology, 41(1), 49–100.
31. Muthén LK, & Muthén BO (2017). Mplus User’s Guide (Eighth ed.). Muthén & Muthén.
32. Nguyen T, Duncan RJ, & Bailey DH (2019). Theoretical and methodological implications of associations between executive function and mathematics in early childhood. Contemporary Educational Psychology, 58, 276–287. 10.1016/j.cedpsych.2019.04.002
33. Pauli-Pott U, & Becker K (2011). Neuropsychological basic deficits in preschoolers at risk for ADHD: A meta-analysis. Clinical Psychology Review, 31(4), 626–637. 10.1016/j.cpr.2011.02.005
34. Reinert KRS, Po’e EK, & Barkin SL (2013). The relationship between executive function and obesity in children and adolescents: a systematic literature review. Journal of Obesity, 2013, 820956. 10.1155/2013/820956
35. Rhemtulla M, van Bork R, & Borsboom D (2019). Worse Than Measurement Error: Consequences of Inappropriate Latent Variable Measurement Models. Psychological Methods. 10.1037/met0000220
36. Schoemaker K, Mulder H, Deković M, & Matthys W (2013). Executive Functions in Preschool Children with Externalizing Behavior Problems: A Meta-Analysis. Journal of Abnormal Child Psychology, 41(3), 457–471. 10.1007/s10802-012-9684-x
37. Takacs ZK, & Kassai R (2019). The efficacy of different interventions to foster children’s executive function skills: A series of meta-analyses. Psychological Bulletin. 10.1037/bul0000195
38. Vernon-Feagans L, & Cox M (2013). Poverty, Rurality, Parenting, and Risk: An introduction. Monographs of the Society for Research in Child Development, 78(5), 1–23.
39. Wiebe SA, Espy KA, & Charak D (2008). Using confirmatory factor analysis to understand executive control in preschool children: I. Latent structure. Developmental Psychology, 44(2), 575–587. 10.1037/0012-1649.44.2.575
40. Willcutt EG, Doyle AE, Nigg JT, Faraone SV, & Pennington BF (2005). Validity of the Executive Function Theory of Attention-Deficit/Hyperactivity Disorder: A Meta-Analytic Review. Biological Psychiatry, 57, 1336–1346. 10.1016/j.biopsych.2005.02.006
41. Willoughby MT, & Blair CB (2016). Measuring executive function in early childhood: A case for formative measurement. Psychological Assessment, 28(3), 319–330. 10.1037/pas0000152
42. Willoughby MT, Blair CB, Wirth RJ, & Greenberg M (2010). The measurement of executive function at age 3 years: psychometric properties and criterion validity of a new battery of tasks. Psychological Assessment, 22(2), 306–317. 10.1037/a0018708
43. Willoughby MT, Blair CB, Wirth RJ, & Greenberg M (2012). The measurement of executive function at age 5: Psychometric properties and relationship to academic achievement. Psychological Assessment, 24(1), 226–239. 10.1037/a0025361
44. Willoughby MT, Holochwost SJ, Blanton ZE, & Blair CB (2014). Executive Functions: Formative Versus Reflective Measurement. Measurement: Interdisciplinary Research and Perspectives, 12(3), 69–95. 10.1080/15366367.2014.929453
45. Willoughby MT, Wirth RJ, & Blair CB (2011). Contributions of modern measurement theory to measuring executive function in early childhood: An empirical demonstration. Journal of Experimental Child Psychology, 108(3), 414–435. 10.1016/j.jecp.2010.04.007
46. Willoughby MT, Wirth RJ, & Blair CB (2012). Executive function in early childhood: Longitudinal measurement invariance and developmental change. Psychological Assessment, 24(2), 418–431. 10.1037/a0025779
47. Zelazo PD (2006). The Dimensional Change Card Sort (DCCS): A method of assessing executive function in children. Nature Protocols, 1, 297–301. 10.1038/nprot.2006.46
