Scientific Reports. 2024 Oct 12;14:23880. doi: 10.1038/s41598-024-74488-0

The validity of studying healthy aging with cognitive tests measuring different constructs

Oula Hatahet 1, Mohamed L Seghier 1,2
PMCID: PMC11470937  PMID: 39396067

Abstract

A clinically useful characterization of the cognitive aging process requires the development of valid and robust behavioral tests, with an emphasis on explaining and understanding typical inter-individual variability in cognition. Here, using a dataset that includes behavioral scores collected with the National Institute of Health Toolbox Cognition Battery (NIHTB-CB) and other auxiliary tests, we examined (1) the differences between young and old adults across different cognitive domains, (2) the strength of across-subject correlations in behavioral test scores, (3) the consistency of low-dimensional behavioral representations across age using factor analysis, and (4) the accuracy of behavioral scores in predicting participants’ age. Our results revealed that (1) elderly females had better verbal episodic memory scores than elderly males, (2) across-subject correlations between behavioral tests varied with age group, (3) although a three-factor model explained the behavioral data in both age groups, some tasks loaded onto different factors between the two groups, and (4) the age-performance relationship (i.e., a regression model linking age to cognitive scores) estimated in one group could not be extrapolated to predict age in the other group, indicating an inconsistency in age-performance relationships across groups. These findings suggest that executive function tests might tap into different cognitive processes in different age groups, meaning that a statistically significant between-group difference in test performance might not always reflect differences in the same underlying cognitive processes. Overall, this study calls for more caution when interpreting age-related differences and similarities between age groups with different cognitive abilities even when the same tests are used.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-024-74488-0.

Keywords: Healthy aging, Cognitive assessment, Inter-individual variability, Factor analysis, Executive function, Memory

Subject terms: Cognitive ageing, Cognitive neuroscience

Introduction

As we age, gradual and natural changes can be observed and quantified across diverse cognitive and affective dimensions. Perhaps most importantly, the rate of such changes varies considerably across people, suggesting different aging trajectories caused by either healthy (normal) or pathological (abnormal) aging processes that are mediated by a complex interplay between several mechanisms1. Accurate characterization of aging trajectories along the cognitive dimension relies on the accurate assessment of decline in cognitive abilities, including measures of working memory, processing speed, and executive function2, where age-related changes in processing speed are believed to account for changes in several cognitive domains3. Such gradual cognitive decline might occur while most language skills remain intact and semantic knowledge can even improve with age4. To offer an accurate characterization of pathological aging at the individual level, it is important to define what ‘normal’ aging is, that is, the range of performance on cognitive tests that can be considered normal or typical. In other words, it is important to fully characterize normal aging to enable clinically useful diagnostic decisions about abnormal aging.

One potential cause of cognitive aging is neural dedifferentiation5,6, which reflects brain responses becoming less precise and selective as people get older. Neural dedifferentiation has been negatively associated with cognitive performance in healthy aging, specifically on tasks that require fluid processing and memory6. It has also been reported as an age-related increase in the correlation between different cognitive abilities7–10. Thus, understanding the role of neural dedifferentiation in cognitive performance is essential to explaining aging trajectories, but dedifferentiation might reflect both age-dependent and age-independent factors (see discussion in6). Therefore, it is of paramount importance to study cognitive aging with tests that are sensitive to age-dependent factors. Here, we show how some widely used cognitive tests can tap into different processes depending on age.

Healthy versus pathological aging is commonly characterized by significant differences in performance on well-documented cognitive tests, many of them already available in existing validated neuropsychological batteries such as the National Institute of Health Toolbox Cognitive Battery (NIHTB-CB). Such cognitive tests are not independent, i.e., they are strongly correlated and can thus be assigned to a common underlying domain or construct (e.g., memory, executive function, language, etc.). Indeed, prior work has shown that cognitive scores in different tests can be reduced to a few dimensions (components or latent variables), meaning that such tests are highly dependent across individuals11,12. Consequently, individual scores across many cognitive tests can be reduced to a behavioral representation (profile) with few dimensions. Such a low-dimensional behavioral representation can explain the common variance in the observed measures across subjects, thereby offering a lower-dimensional space into which individual cognitive scores can be projected. One question of interest concerns the consistency of such low-dimensional behavioral spaces across age. Are the identified components similar or different across the lifespan? Do tests measure the same abilities in groups with different demographics? Are behavioral scores, with performance measured as accuracy or reaction time, sensitive enough to subtle age-induced variations? Do age-related differences in behavioral scores reflect changes in domain-specific or domain-general processes13?

These questions are of paramount importance to the study of aging, and for the development of reliable neuropsychological assessment tools. To study the reliability, across age, of the constructs measured by a battery, previous studies examined measurement invariance, which reflects the equivalence of a given construct across age groups. This is because measurement invariance underpins valid interpretation of group differences in cognitive scores14. Measurement invariance is typically assessed at four nested levels: (1) configural invariance estimates the consistency of the latent cognitive structure, (2) metric invariance measures the similarity of the strength of the relationship between the observed behavioral scores and the assumed cognitive domain (e.g., factor loadings), (3) scalar invariance tests whether the starting points (intercepts) are equivalent for the tested groups, and (4) at the most detailed level, strict invariance examines whether the unexplained variance is consistent across groups. For instance, Kiselica et al. (2020) examined the invariance of a hypothesized factor model (attention, visual, processing speed/executive, memory, and language factors) for the Uniform Dataset (UDS) neuropsychological battery using confirmatory factor analysis across sex, age and education on a sample aged 68.33 ± 10.26 years15. They showed that the factor model achieved at minimum a metric invariance over a 1-year follow-up assessment and across different demographic groups. In a 4-year longitudinal study with a diverse sample of participants aged over 54 years, a 2-factor model of memory and executive function showed strict invariance of the neuropsychological battery used in the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil)16. Similarly, another study reported metric invariance in healthy older adults aged 65.7 ± 8.2 years over an average period of two years17.
Using the NIHTB-CB tests, a 5-factor model (vocabulary, reading, episodic memory, working memory and executive function/processing speed) and a second-order two-factor model (fluid vs. crystallized skills) were found to consistently explain the measures recorded from different age groups (aged 20–60 years and 60–85 years)18.

In contrast, other studies showed differences across groups when behavioral data were reduced to a low-dimensional space. For instance, an examination of the latent structure of executive function across three age groups19 showed that a three-factor model of shifting, inhibition, and updating fit the behavioral data well in young adults but not in the older groups. Instead, the data of the older groups were better explained by a two-factor model of updating/inhibition and shifting19. Likewise, a recent study evaluated the factor structure of the NIHTB-CB tests in a sample of 411 cognitively healthy and impaired participants aged between 45 and 94 years20. A 2-factor model (fluid vs. crystallized skills) was a good fit for participants aged 65 and above, while younger participants’ performance was better explained by a three-factor model of language, executive functions, and memory20. These studies suggested that measurement invariance was not met over the tested groups, raising concerns about the consistency, across age, of the constructs measured by these tests as identified by the factorial representations. This calls into question the validity of comparisons of cognitive scores between young and old adults, considering that the original tests used to measure these cognitive scores might relate to different behavioral dimensions or factors across age groups.

In this context, we examined here the assumption of measurement invariance for some well-established behavioral tests administered to both healthy young and older adults. Specifically, we examined quantitative differences between young and older subjects using validated tests from the NIHTB-CB battery, the Mini-Mental State Examination (MMSE) test, in addition to validated supplementary tests. We also highlighted differences in cognitive scores that might be influenced by sex or education. Correlations, in the across-subject dimension, between the different tasks were assessed for each group, with an emphasis on their variability between and within each domain (e.g., executive function, memory). Lastly, with factor analyses, we investigated the qualitative equivalency of the constructs measured by validated tests between young and old groups. Our analysis revealed that not all tasks were functionally (psychometrically) equivalent in both age groups.

Methods

Data description

This study used the cognitive measures from the neurocognitive aging dataset21,22. The dataset was collected from 163 younger adults, aged 18–34 years old (males: 70, females: 93), and 120 older adults, aged 60–89 years (males: 54, females: 66) in two different recording sites. All participants had a minimum of 12 years of schooling. The administered battery included tasks from NIHTB-CB23: the Rey Auditory Verbal Learning, Picture Sequence Memory, Flanker Inhibitory Control and Attention, Dimensional Change Card Sort (DCCS), List Sorting Working Memory (LSWM), Picture Vocabulary, and Oral Reading Recognition tests. In addition, validated auxiliary tasks were also included: The Verbal Paired Associates from the Wechsler Memory Scale-IV24, the Associative Recall Paradigm25, Shipley-2 Vocabulary26, Trail Making Test B-A (TMT)27, the Reading Span Task28, and the Symbol Digit Modalities Test (SDMT)29. The Mini-Mental State Examination (MMSE) test was also administered to all participants. These tasks were chosen to target the episodic memory, semantic memory, executive functions and processing speed domains21; a detailed description of the tasks can be found in Table S1 of the Online Resource. Demographic data including age, gender, and education level were also collected.

Data preparation

Raw performance measures in each task were obtained21, where each task was evaluated using either accuracy, reaction time, or a composite score of accuracy and reaction time as instructed by the tests’ manuals (Table S1). For the purposes of this study, the following preprocessing steps were carried out. First, the Trail Making B-A score was given as the time difference between the two parts of the test; we therefore reversed the sign of this score such that, on every test, a higher score reflected a better performance14. Second, missing values were imputed using Multivariate Imputation by Chained Equations (MICE) for each age group separately. In the current version of the dataset, less than 5% of each variable was missing, with a total of 16.56% in the young group and 20.83% in the older group. Third, raw scores were transformed to Z-scores for all measures and used for all the following steps, except when comparing task performance between the two age groups with 2-sample t-tests. Lastly, outliers were detected and removed using the boxplot method with a threshold of 3 (resulting in 17 outliers removed from the young group and 8 from the older group). All variables followed a normal distribution as indicated by the kurtosis and skewness estimates; however, multivariate non-normality was observed in the young adults group.
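
As a minimal sketch of these steps (in Python, whereas the original analyses were run in R; the column name trails_b_a is a placeholder, and the MICE imputation step is omitted), the sign reversal, z-scoring, and boxplot-rule outlier flagging could look as follows:

```python
import numpy as np
import pandas as pd

def preprocess(df, reverse_cols=("trails_b_a",), iqr_k=3.0):
    """Sketch of the preprocessing: sign reversal of time-based scores,
    column-wise z-scoring, and boxplot-rule outlier flagging with an
    IQR multiplier of 3, as described in the text."""
    out = df.copy()
    for col in reverse_cols:          # higher score = better performance
        if col in out.columns:
            out[col] = -out[col]
    z = (out - out.mean()) / out.std(ddof=1)      # column-wise z-scores
    q1, q3 = z.quantile(0.25), z.quantile(0.75)
    iqr = q3 - q1
    outlier = ((z < q1 - iqr_k * iqr) | (z > q3 + iqr_k * iqr)).any(axis=1)
    return z[~outlier], outlier
```

An iterative (chained-equations) imputer, applied to each age group separately, would precede these steps in the actual pipeline.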

Data analysis

We implemented three analyses (noted Analysis A, B and C) to address different but complementary questions.

Analysis A involved between-group comparisons of scores for each task and the assessment of across-subject correlations between tasks. Specifically, multiple t-tests were used to assess performance differences between the two age groups and between genders at a Bonferroni-corrected significance level of p < 0.05. Pairwise Pearson r correlations, in the across-subject dimension, were calculated to identify relationships between tasks as well as correlations between tasks and other variables (age, years of education) for each age group separately. Our rationale here is that tasks that covary across participants (i.e., whose scores rise and fall together across participants) can be considered as part of the same domain (positive correlation) or opposite domains (negative correlation). This analysis yielded a 17 × 17 correlation matrix for each group. The statistical difference between the correlation matrices was assessed using Fisher’s method at a Bonferroni-corrected significance level of p < 0.05. Finally, we also tested whether all participants, regardless of their age, could be clustered into two groups on the basis of their behavioral scores alone. For that aim, K-means clustering was used to reclassify participants into groups based on their cognitive performance rather than their age. Specifically, the z-scores of the cognitive measures from all participants were submitted to K-means clustering with a pre-specified number of clusters of two. The clustering provided a means to examine whether the pattern of behavioral scores (range, variance, correlation) is specific to each age group, i.e., whether the data-driven groups obtained with K-means match the original young and older groups. A good match between the data-driven K-means groups and the original age groups would provide strong evidence that variation in behavioral scores indeed reflects age-related effects.
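
The comparison of correlations between two independent groups can be sketched with a standard Fisher r-to-z test (a minimal Python sketch; the sample sizes used below for illustration mirror those reported in Methods):

```python
import numpy as np
from scipy import stats

def fisher_r_diff(r1, n1, r2, n2):
    """Two-tailed test for the difference between two independent Pearson
    correlations via Fisher's r-to-z transform."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)          # r-to-z transform
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))    # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2 * stats.norm.sf(abs(z))                    # two-tailed p-value
    return z, p
```

In the analysis described above, the resulting p-values would then be Bonferroni-corrected across all task pairs.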

Analysis B identified the low-dimensional behavioral representations (i.e., constructs) of the included cognitive measures using factor analyses. Confirmatory factor analysis was used to fit a baseline model in each group using a robust maximum likelihood estimator to account for the violation of the multivariate normality assumption30. The cognitive ability model was based on the Cattell-Horn-Carroll (CHC) theoretical model as well as on prior work on the factor structure of the NIHTB-CB battery18,20,31,32. The hypothesized model contained three factors: (1) long-term storage and retrieval (Glr) (Verbal Paired Associates: Immediate Recall, Delayed Recall, Delayed Free Recall; Associative Recall; NIH Cognition Rey Auditory; NIH Cognition Picture Sequence Memory), (2) crystallized intelligence (Gc) (Shipley Vocabulary; NIH Cognition Picture Vocabulary; NIH Cognition Oral Reading Recognition), and (3) visual processing (Gv)/processing speed (Gs) (Trail Making Task: B-A; NIH Cognition Flanker Inhibitory Control and Attention; NIH Cognition Dimensional Change Card Sort; NIH Cognition List Sort Working Memory). This model was developed while following the rule of thumb that at least three indicators are required per factor to identify a factor model properly33. Processing speed could not be identified as a separate factor, which is a common problem when an insufficient number of tests is administered to support an independent factor34,35. We relied on the Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA), and Standardized Root Mean Square Residual (SRMR) to judge the fit of the model. CFI and TLI values larger than 0.9 and RMSEA and SRMR values smaller than 0.07 would indicate an adequate fit36,37. Next, we tested for measurement invariance using Multi-Group Confirmatory Factor Analysis (MGCFA), including configural, metric, scalar, and strict models.
These models were compared statistically using the difference in their chi-square statistics38.
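
The chi-square difference test between nested models amounts to a simple likelihood-ratio computation (a Python sketch with scipy; note that the robust estimator used in this study yields a scaled chi-square, so exact p-values differ slightly from this naive version):

```python
from scipy.stats import chi2

def chisq_diff_test(chisq_restricted, df_restricted, chisq_base, df_base):
    """Chi-square difference (likelihood-ratio) test between two nested
    CFA models: a more restricted model (e.g. metric) against its
    baseline (e.g. configural). Returns the difference statistic and
    its p-value on df_restricted - df_base degrees of freedom."""
    d_chisq = chisq_restricted - chisq_base
    d_df = df_restricted - df_base
    return d_chisq, chi2.sf(d_chisq, d_df)
```

Applied to the configural (chi-square 210.792, 144 df) and metric (246.567, 155 df) fits in Table 1, this test rejects metric invariance, consistent with the reported results.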

Since full measurement invariance could not be established across groups (see Results section below), a follow-up Exploratory Factor Analysis (EFA) was conducted to model the underlying structure of the cognitive measures for each age group separately, without imposing strict theoretical constraints39,40. The factorability of the data was evaluated using the Kaiser-Meyer-Olkin (KMO) test and Bartlett’s sphericity test41. KMO represents the percentage of variance among variables that could be considered common variance, with higher KMO values indicating the suitability of the data for factor analysis42. Bartlett’s sphericity test determines whether a correlation matrix differs considerably from the identity matrix43, with significant results indicating the suitability of the variables in the correlation matrix for factor analysis. Several extraction methods can be used for EFA, such as Maximum Likelihood (ML) and Principal Axis Factoring (PAF). Here, we used PAF as it is recommended for relatively small samples44–46, with oblique rotation (promax) for easier interpretation of the results. To determine the optimal number of factors, Parallel Analysis (PA) was implemented44. It compares the results of factor analyses conducted on the original data to those from randomly generated synthetic data with the same dimensions as the original dataset, over multiple iterations (here, 100 iterations). The number of factors to retain is the number of eigenvalues from the original data that exceed the corresponding mean eigenvalues from the synthetic data. Results from PA were also compared to the scree plot, and models with the smallest to the largest number of factors suggested by these two methods were tested. The best model was selected based on interpretability, where each factor should have a minimum of three variables loading saliently on it20,47.
Furthermore, a hierarchical clustering analysis, with correlation as the distance metric, was performed to verify the EFA results in each group. The final step involved comparing the resulting factor model across the two age groups. All factor analyses were performed in RStudio (v1.1.456) using the lavaan package.
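
The parallel-analysis and hierarchical-clustering steps can be sketched as follows (a minimal Python sketch, assuming scores are arranged as a participants-by-tasks matrix; the original analyses were run in R):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def parallel_analysis(data, n_iter=100, seed=0):
    """Horn's parallel analysis: retain factors for as long as the observed
    eigenvalues exceed the mean eigenvalues of correlation matrices
    computed from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand = np.zeros(p)
    for _ in range(n_iter):
        noise = rng.standard_normal((n, p))
        rand += np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    rand /= n_iter
    k = 0
    while k < p and obs[k] > rand[k]:  # stop at the first crossing
        k += 1
    return k, obs, rand

def cluster_tasks(scores, n_clusters):
    """Hierarchical clustering of tasks with 1 - r as the distance,
    mirroring the verification step used for the EFA solutions."""
    dist = 1.0 - np.corrcoef(scores, rowvar=False)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```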

In Analysis C, regression models were built to assess how well age can be predicted from behavioral scores. First, an Ordinary Least Squares (OLS) regression with lasso regularization and leave-one-out cross-validation was used to predict the age of participants based on their performance on all tasks (age as the dependent variable and behavioral scores as the independent variables). Second, we tested whether the relationships between the cognitive measures and age observed in one age group hold true in the other age group. This was implemented in two stages: (1) an OLS regression model was fitted using the behavioral scores as independent variables and age as the dependent variable in one age group (e.g., the young adults group), and (2) that estimated regression model was then used to predict the age of the other group (e.g., the older adults group). Given the recent interest in brain age48,49, and assuming that the behavioral tests reflect brain function, this analysis helped us (1) assess how well age is reflected in the cognitive scores measured with the neuropsychological assessment tool used here, and (2) test whether scores from older adults represent a natural progression (extrapolation) of scores from the young group. The latter would essentially verify whether the age-behavior relationship is relatively consistent across different age groups.
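
The two-stage cross-group procedure can be sketched as follows (a Python sketch with scikit-learn; for brevity the lasso penalty is chosen here by 5-fold rather than leave-one-out cross-validation):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_absolute_error

def cross_group_age_prediction(X_fit, age_fit, X_other, age_other, cv=5):
    """Stage 1: fit a lasso-regularized regression of age on cognitive
    scores in one group. Stage 2: apply the fitted model unchanged to the
    other group and report the mean absolute error of the predicted ages."""
    model = LassoCV(cv=cv, random_state=0).fit(X_fit, age_fit)
    pred = model.predict(X_other)
    return mean_absolute_error(age_other, pred), model
```

A cross-group mean absolute error much larger than the within-group error would indicate that the age-score relationship does not extrapolate across groups.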

Results

Analysis A

All task scores were significantly different between the age groups at the corrected p < 0.05 level. As expected, young adults had higher scores on the MMSE, episodic memory and executive function measures, whereas older adults had significantly better performance on the semantic knowledge tasks. This pattern of differences was also reported by the original authors and held true after controlling for gender, education and recording site21. However, when we examined gender effects within each age group, we found that in the older adult group, female participants had higher scores on the verbal episodic memory tasks (Immediate Recall (p < 0.01), Delayed Recall (p = 0.02), Delayed Free Recall (p = 0.05) and Associative Recall (p = 0.01)) compared to male participants.

Figure 1 shows the correlations between the tasks, including age and years of education, for each age group. Two semantic memory tasks were moderately correlated with education, while the symbol digit modalities test and the dimensional change card sort were moderately negatively correlated with age in the older adult group only. In addition, using Fisher’s method, stronger correlations were observed between the list sorting task and the semantic memory tasks in the young adults compared to the older participants (Fig. 2).

Figure 1.


Spearman correlation matrix of the neurocognitive measures along with age and education for each age group. vpa_imm_tot: Verbal Paired Associates Immediate Recall, vpa_delay: Verbal Paired Associates Delayed Recall, vpa_fr_delay: Verbal Paired Associates Delayed Free Recall, associative_recall: Associative Recall, nihcog_rey: NIH Cognition Rey Auditory Verbal Learning: Immediate Recall, nihcog_psm: NIH Cognition Picture Sequence Memory, shipley_vocab: Shipley Vocabulary, nihcog_orr: NIH Cognition Oral Reading Recognition, nihcog_pva: NIH Cognition Picture Vocabulary, nihcog_lswm: NIH Cognition List Sorting Working Memory, nihcog_dccs: NIH Cognition Dimensional Change Card Sort, nihcog_flanker: NIH Cognition Flanker Inhibitory Control and Attention, trails_b.a: Trail Making Task: B-A, sdmt_oral: Symbol Digit Modalities Test.

Figure 2.


Variables with significant correlation differences (p < 0.05 corrected) between the two age groups. vpa_imm_tot: Verbal Paired Associates Immediate Recall, vpa_delay: Verbal Paired Associates Delayed Recall, vpa_fr_delay: Verbal Paired Associates Delayed Free Recall, associative_recall: Associative Recall, shipley_vocab: Shipley Vocabulary, nihcog_pva: NIH Cognition Picture Vocabulary, nihcog_lswm: NIH Cognition List Sort Working Memory.

Interestingly, the two groups resulting from the data-driven K-means clustering were significantly different in age, with a 90% match to the original age groups: Group 1 comprised 13 young and 133 older adults, whereas Group 2 comprised 100 young and 10 older adults.

Analysis B

The proposed cognitive ability model, based on the Cattell-Horn-Carroll (CHC) theoretical model, was tested with CFA across the two groups. CFA results indicated a poor fit in both young and older adults. Modification indices were examined, and two error terms were allowed to correlate for both groups (nihcog_dccs-nihcog_flanker, vpa_imm_tot-vpa_delay); these scores originate from the same domain, and each pair shares the same scoring method. The resulting fit indices suggested that the modified model shows an acceptable fit across groups according to the predefined goodness-of-fit criteria.

Next, we tested the model with the modification indices using MGCFA. The configural invariance model demonstrated a good fit to the data. The test of the next higher level of invariance, metric invariance, requires that the factor loadings between items and constructs are invariant across groups. This model was rejected according to the chi-square difference test. Partial invariance was tested at this level by relaxing constraints on the variable loadings showing the largest modification index, one variable at a time. First, the loadings of the LSWM from the NIHTB-CB were freely estimated; however, the chi-square test still indicated a lack of metric invariance. After re-examining the modification indices, the Verbal Paired Associates delayed free recall loadings were freely estimated across the two groups. This resulted in an improved fit with p = 0.076. Partial strong (scalar) invariance was established after constraining the intercepts of the model from the previous step to be equal in both groups. Lastly, partial strict invariance was shown at p = 0.054; see Table 1 for the results of the chi-square difference tests.

Table 1.

Fit indices for MGCFA models and difference tests between the two age groups. nihcog_lswm: NIH Cognition List Sorting Working Memory, vpa_fr_delay: Verbal Paired Associates Delayed Free Recall. * indicates a significant difference (chi-square difference tests between the original model without assuming measurement invariance and the comparison model).

df AIC BIC Chisq Pr(> chisq)
1 Configural 144 8958.459 9292.437 210.792
2 Metric 155 8971.5 9266.4 246.567 0.000*
2a Partial metric (except nihcog_lswm) 154 8961.276 9259.724 232.182 0.021*
2b Partial metric (except nihcog_lswm, vpa_fr_delay) 153 8957.725 9259.727 226.586 0.076
3 Partial strong 164 8938.254 9201.173 229.243 0.995
4 Partial strict 178 8937.019 9150.197 253.112 0.054

Next, EFA was conducted to model the underlying structure of the cognitive measures for each group separately. Bartlett and KMO statistics for both groups’ correlation matrices fell within the acceptable ranges for factor analysis: young group (KMO = 0.72, Bartlett’s test p < 0.001), older adults (KMO = 0.73, Bartlett’s test p < 0.001). PA resulted in a 3-factor solution for both groups, while the scree plot supported a 4-factor model. Both were evaluated in each group. The 4-factor models were not clearly interpretable because of cross-loadings for some of the variables (Table S2 of the Online Resource). Hence, the 3-factor models were selected for both age groups. The observed variables’ loadings on the resulting factors are shown in Table 2 along with the eigenvalues, cumulative explained variance, and internal consistency. The factors in both groups could be interpreted as semantic memory, episodic memory, and executive function factors. However, there were differences in the executive function loadings between the two groups. In the young adults’ group, the Symbol Digit Modalities Task was associated with the episodic memory tasks, and the List Sorting Working Memory and Trail Making B-A tasks were associated with the semantic memory tasks, thus resulting in an executive function factor with only two loadings. In the older adults’ group, in contrast, both tasks were associated with the executive function factor. Also, the Trail Making B-A test had low loading scores in both groups. The resulting clusters from the hierarchical clustering analysis confirmed the EFA results (Figure S1 of the Online Resource).

Table 2.

3-factor latent cognitive structure of the two age groups as indicated by exploratory factor analysis. Columns 2 and 3 for the older adults’ group were swapped for easier comparison between the groups. vpa_imm_tot: Verbal Paired Associates Immediate Recall, vpa_delay: Verbal Paired Associates Delayed Recall, vpa_fr_delay: Verbal Paired Associates Delayed Free Recall, associative_recall: Associative Recall, nihcog_rey: NIH Cognition Rey Auditory Verbal Learning: Immediate Recall, nihcog_psm: NIH Cognition Picture Sequence Memory, shipley_vocab: Shipley Vocabulary, nihcog_orr: NIH Cognition Oral Reading Recognition, nihcog_pva: NIH Cognition Picture Vocabulary, nihcog_lswm: NIH Cognition List Sorting Working Memory, nihcog_dccs: NIH Cognition Dimensional Change Card Sort, nihcog_flanker: NIH Cognition Flanker Inhibitory Control and Attention, trails_b_a: Trail Making Task: B-A, sdmt_oral: Symbol Digit Modalities Test.

Young Adults Group Older Adults Group
Factor1 Factor2 Factor3 Factor1 Factor2 Factor3
vpa_imm_tot 0.79 -0.102 -0.107 0.998 0.006 -0.239
vpa_delay 0.741 -0.191 -0.04 0.935 -0.019 -0.206
vpa_fr_delay 0.446 -0.041 -0.079 0.771 0.056 -0.050
associative_recall 0.551 0.228 0 0.759 0.075 0.030
nihcog_rey 0.599 -0.035 0.096 0.505 -0.054 0.267
nihcog_psm 0.515 0.034 -0.009 0.362 -0.064 0.252
shipley_vocab -0.099 0.837 -0.034 0.068 0.867 -0.091
nihcog_orr -0.096 0.562 0.067 -0.018 0.622 0.178
nihcog_pva -0.103 0.863 -0.093 0.001 0.731 -0.075
nihcog_lswm 0.152 0.498 0.088 0.084 0.032 0.307
nihcog_dccs -0.008 -0.114 0.722 -0.090 -0.037 0.690
nihcog_flanker -0.091 -0.006 0.768 -0.155 -0.001 0.685
trails_b_a 0.015 0.239 -0.068 -0.023 0.049 0.296
sdmt_oral 0.363 0.181 0.185 0.008 -0.069 0.589
Eigenvalues 2.494 2.214 1.201 3.471 1.698 1.758
Proportional variance 0.178 0.158 0.086 0.248 0.121 0.126
Cumulative variances 0.178 0.336 0.422 0.248 0.369 0.495
Cronbach alpha 0.77 0.73 0.7 0.86 0.78 0.62

Analysis C

The leave-one-out cross-validation analysis revealed that the cognitive measures included in the study accounted for 39.6% of the variance in older adults’ age, whereas for the young adults they accounted for a comparatively lower 18.5%. This means that age was better predicted by the behavioral scores in the older group than in the younger group. The results from the mixed regression models (i.e., predicting participants’ age in one age group from the regression model fitted on the other age group) are shown in Fig. 3. The performance of these models was evaluated using the mean absolute error, and both models had very low predictive power, with an error higher than 45 years. This very low predictive power cannot be due to differences in intercepts only, as there was no correlation between true and predicted age (see regression lines of the mixed models in Fig. 3). This implies that the relationship between behavioral scores and age in the young group cannot simply be extrapolated to predict age in the older group, suggesting that age-behavior relationships are age-group-dependent.

Figure 3.


Different OLS regression models to predict participants’ age from their cognitive measures. (A) Model fitted on the young adults’ data to predict their age. (B) Model fitted on the elderly data to predict the young group’s age. (C) Model fitted on the young data to predict the elderly group’s age. (D) Model fitted on the elderly data to predict their age.

Discussion

Fundamentally, a between-group comparison of behavioral scores assumes that the tests used to collect these scores measure the same constructs in both groups. There is a rich literature in psychometrics about construct validity, which deals with measurement validity by ensuring that multiple observable or measurable indicators (e.g., test scores) adequately reflect the constructs of interest (e.g., cognitive abilities). Here, we showed that, despite using the same tests in both age groups, some executive function tests did not load onto the same factors in both groups. Our study calls for more caution when comparing scores between different age groups even when using well-studied tests such as the NIHTB-CB battery.

Overall, the comparisons of task performance between the two age groups were consistent with previous work2,50, including a decline in executive functions presumably due to a significant decrease in brain connectivity51,52, which also mediates53,54 a decline in episodic memory55–57. Our results also confirmed previous reports on healthy aging showing that older people exhibit relative preservation or even improvement in semantic memory, particularly in tasks that rely on stored vocabulary56,58,59. Although sex differences in cognitive performance have not been consistent across studies60,61, our results revealed better performance in females than in males on verbal episodic memory tasks in the older group. This has important clinical implications, as verbal memory scores are typically used in the diagnosis of amnestic mild cognitive impairment (aMCI) and Alzheimer’s disease (AD) dementia, often without sex adjustment. This might relate to a different underlying neuropathology in the transition from aMCI to AD dementia in women, thus introducing a sex bias in the detection of MCI62,63. This sex effect in the older adult group might be due to differences in comorbidities along the aging process, as women tend to have a lower prevalence of focal lesions (microbleeds, lacunes, and cortical infarcts) than men64.

The difference in the associations (i.e. correlations) between the cognitive scores of the two age groups (Fig. 2) might reflect age-related changes in brain networks7,65. Specifically, there was an increase in the shared variance among verbal episodic memory tasks with increasing age, a novel observation not reported in previous studies. In contrast, the association between the List Sorting Working Memory task and the two semantic memory tasks (Shipley Vocabulary and Picture Vocabulary) did not persist with advancing age. A recent study supports this observation, as the capacity of visual working memory was found to depend on semantic knowledge66. In the same vein, the regression results further highlight the complex interplay between age and cognition. Cognitive scores in late adulthood were more sensitive to chronological age than in young adulthood, suggesting that age-related differences in cognition are more pronounced in later life. These results are in line with previous research showing that cognition is not a unitary construct and that cognitive abilities follow different developmental trajectories that can vary significantly across the lifespan67,68.

Both Analyses B and C provided evidence that the constructs or domains measured by well-validated cognitive tests differed across age groups. The mixed regression models (Analysis C) suggested that age-behavior relationships might not be the same for both age groups. This can explain why the model used to predict age from behavioral scores in one age group failed to accurately predict age in the other age group. Likewise, our factor analysis (Analysis B) offered important insights into the main constructs measured by the NIHTB-CB battery and other auxiliary tests across the lifespan. In contrast to previous reports on similar tasks (e.g.18), full measurement invariance of the hypothesized CHC model could not be established, owing to the multifaceted nature of the targeted executive function processes and the limited number of tests included. Overall, the NIHTB-CB fluid tests have recently been found to have poor to adequate construct validity, with low correlations between the executive function tests in particular69,70. Although partial strict invariance was achieved here, other studies have argued that partial invariance is insufficient to ensure meaningful comparisons, given that the fit relied on data-driven modification indices with a high risk of overfitting the sample data71. Indeed, simulation studies have suggested that modification indices might limit the generalizability of the resulting model72–74. Likewise, Steinmetz showed that, while partial invariance allows meaningful comparisons of latent means, it may be unreliable for comparing observed scores75. We reckon that interpreting invariance for group comparisons is not always straightforward (e.g.76–78). Furthermore, our EFA results revealed inconsistent loadings and latent cognitive models across the two age groups.
More specifically, processing speed, executive function and working memory scores were scattered across the three factors in the young adult group, whereas in the older group they aggregated into a factor separate from the episodic and semantic memory tests. Notably, these three domains are known to account for age-related cognitive decline79. This shift in task loadings could be explained by the fact that memory impairment may manifest at a later age or at a different rate than impairment in executive functions; consequently, these two cognitive constructs may become more distinct with advancing age80. Indeed, recent work has shown that performance in executive functions is better explained by a 2-factor model in older adults compared to a 3-factor model in young adults19,81. Moreover, it has been suggested that as people get older, they rely more on general executive control processes, owing to the reduction and reallocation of cognitive resources, as opposed to younger adults19. Recent studies have suggested that typical scores in executive function tests may not accurately measure individual differences in executive functions, but rather reflect differences in information processing speed82. Other findings have shown that measures of executive functions do not reveal a distinct latent variable unless they are combined with tasks that assess other cognitive processes, such as processing speed or working memory capacity83. We note that the existing literature acknowledges the difficulty of estimating and interpreting the factor structure of executive functions. Overall, the existing literature suggests that measures of executive functions might be either unreliable or invalid in measuring individual differences in executive functions82,84.
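The core logic of the exploratory-factor-analysis step, in which tests that share variance aggregate into common factors, can be sketched as follows. This is an illustrative sketch on synthetic data, assuming a simple three-latent-ability generative model; the real analysis used the NIHTB-CB and auxiliary test scores, and the loading values here are invented.

```python
# Illustrative EFA sketch, assuming synthetic scores: nine "tests" generated
# by three latent abilities should load onto three recoverable factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 200

# Three latent abilities generate nine observed test scores (three per ability)
latents = rng.normal(size=(n, 3))
loadings = np.zeros((9, 3))
for f in range(3):
    loadings[3 * f:3 * f + 3, f] = 0.8  # illustrative loading strength
scores = latents @ loadings.T + rng.normal(0, 0.4, (n, 9))

fa = FactorAnalysis(n_components=3, rotation="varimax")
fa.fit(StandardScaler().fit_transform(scores))

# Dominant factor per test: tests driven by the same latent ability should
# load together, mirroring how tasks aggregate into constructs in an EFA.
dominant = np.abs(fa.components_).argmax(axis=0)
print(dominant.reshape(3, 3))
```

If the same generative structure held in both age groups, the dominant-factor pattern would be identical across groups; the paper's finding is precisely that, for some executive function tests, it was not.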

These findings from the three analyses have implications for how to interpret differences in cognitive scores between young and older adults. For example, the List Sorting Working Memory task was found to have variant intercepts and loadings between the two age groups, and it was also associated with different factors in the two groups. Therefore, a significant difference in performance on that task between young and older adults might presumably reflect different constructs being measured in the two groups. It is interesting to note that prior work has shown that the List Sorting Working Memory test can load onto different factors, for instance loading similarly to other working memory tasks18 despite significant correlations with executive function tasks85. We acknowledge that the NIHTB-CB battery has been tested and validated in different groups86, and there is already evidence in the current literature of its clinical usefulness for the assessment of cognitive abilities in diverse populations87–92. However, in line with previous studies, our findings pointed to the complexity of identifying and measuring individual differences in executive functions82,84, and to questions about the validity of the constructs measured by the NIHTB-CB20,70. Fundamentally, any comparison between young and old adults should reflect the same construct or domain to ensure valid and interpretable differences; there is no clear interpretation of mean differences in the absence of a common structure93.

Our findings can be explained by several considerations. For instance, our analysis did not explicitly account for the influence of different subcomponents of a given construct that may vary across age groups94,95, which could translate into the same factor being dominated by different subcomponents. Furthermore, the older group might show more pronounced heterogeneity than the younger group, due for instance to significant inter-individual differences in the rates of decline of cognitive abilities96. Although our K-means clustering showed a grouping that matched the original age groups, we cannot rule out that subgroups might exist within the older group. This possibility raises interesting questions (hypotheses) that warrant future research. For example, if the older group can be subdivided into homogeneous subgroups, one can hypothesize that a subgroup with the same low-dimensional representation (i.e. the same factors) as the young group would most likely follow a healthy aging process, whereas a subgroup with a different low-dimensional representation might indicate a high risk of pathological aging for the individuals belonging to that subgroup.
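The K-means check on group homogeneity can be sketched as follows. This is a minimal sketch on synthetic data under stated assumptions: the group mean profiles, cluster count, and agreement measure are illustrative, not the paper's exact pipeline.

```python
# Sketch of the K-means group-matching check, assuming synthetic cognitive
# profiles; the mean shifts between groups are invented for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Two groups whose mean profiles differ (e.g. slower processing speed with age)
young = rng.normal(loc=[0.5, 0.5, 0.0], scale=0.5, size=(40, 3))
old = rng.normal(loc=[-0.5, -0.5, 0.5], scale=0.5, size=(40, 3))
X = StandardScaler().fit_transform(np.vstack([young, old]))
true_group = np.array([0] * 40 + [1] * 40)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Agreement with the true grouping (cluster labels are arbitrary, so take
# the better of the two possible label assignments)
agreement = max(np.mean(labels == true_group), np.mean(labels != true_group))
print(f"agreement with age groups: {agreement:.2f}")
```

A high agreement, as reported in the paper, indicates that unsupervised clustering recovers the age grouping; probing finer subgroup structure would amount to re-running the clustering within the older group with a larger number of clusters.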

There are some conceptual and methodological issues that need to be acknowledged. First, the cross-sectional design of this study limits inferences about any causal associations between age differences and cognitive ability, in addition to the poor interpretability and validity of individual differences due to confounding effects from within-subject associations97. Second, executive function involves a heterogeneous set of processes that entertain complex relationships with processing speed and working memory98,99. This might complicate the interpretation of factor analysis results, making executive function tests not well suited for factor analyses (see discussion in7). Third, the dataset14 used here has missing values, which we imputed for each age group separately. Fourth, the performance indicators used could also account for the non-invariance between the models of the two age groups: the common time component in the Flanker, DCCS and TMT scores (see scoring criteria in Table S1) may have contributed to the age differences in the latent factor structure, as reaction speed differs significantly between the two age groups (a similar pattern has been reported previously20). Fifth, the similarity between tests was computed as a correlation across subjects. While this covariance-based approach is well suited to capturing variability across subjects (e.g.100), it is also notoriously sensitive to outliers101; we therefore down-weighted their influence before computing the correlations between tests. Sixth, measurement invariance is necessary for an uncomplicated interpretation of validity correlations and of differences in test scores across groups102. However, observed correlations (cf. Fig. 2) may also vary across age groups because of (1) real inherent differences in the correlations between the same invariant constructs, or (2) existing differences in the reliability of the test scores.
Put another way, different patterns of correlations between the same set of cognitive scores can be observed in different age groups even under measurement invariance, meaning that variability in correlations does not necessarily reflect differences in cognitive constructs. Last but not least, the generalizability of the 3-factor structure, identified here in a data-driven way with exploratory factor analysis, needs future investigation. Our main assumptions during the identification of the 3-factor structure were that (1) test scores based on reaction times and accuracy represented accurate measurements of task performance in both age groups, (2) correlated tests represented a common factor, (3) weakly correlated tests represented distinct factors, and (4) each identified factor could be equated with one broad construct. Discrepancies between studies in the exact factorial structure underlying the NIHTB-CB battery might be explained by differences in how these assumptions were satisfied.
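The effect of down-weighting outliers before computing between-test correlations (the fifth point above) can be sketched as follows. This is a hedged illustration on synthetic scores: winsorizing (clipping at the 10th/90th percentiles) is used here as one simple stand-in for the down-weighting step, not necessarily the exact method used in the paper.

```python
# Sketch of outlier down-weighting before across-subject correlations,
# assuming synthetic data; winsorizing is one illustrative choice of method.
import numpy as np

rng = np.random.default_rng(3)
n = 60

# Two correlated test scores, contaminated by a few outlying subjects
a = rng.normal(size=n)
b = 0.7 * a + rng.normal(0, 0.5, n)
a[:3] += 8  # three outlying subjects distort the raw correlation
b[:3] -= 8

def winsorize(x, p=10):
    """Clip values beyond the p-th and (100-p)-th percentiles."""
    lo, hi = np.percentile(x, [p, 100 - p])
    return np.clip(x, lo, hi)

raw_r = np.corrcoef(a, b)[0, 1]
robust_r = np.corrcoef(winsorize(a), winsorize(b))[0, 1]
print(f"raw r = {raw_r:.2f}, down-weighted r = {robust_r:.2f}")
```

A handful of extreme subjects can flip the sign of a raw Pearson correlation; after their influence is bounded, the correlation again reflects the shared variance of the bulk of the sample, which is why such a step matters before interpreting between-group differences in correlation matrices.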

In conclusion, our study shows that age affects the factorial structure of cognitive assessments in neurotypical individuals. It raises questions about the validity of significant differences between age groups when the tests used measure different constructs. It also highlights the challenge of comprehensively measuring age-dependent changes in cognitive abilities with a set of tests that span multiple cognitive dimensions. For instance, some tasks, such as the List Sorting Working Memory task, can behave differently across age groups, presumably because their successful execution places variable demands on different processes in the two age groups. In this context, it is important that users of psychometrically validated cognitive assessment batteries are aware of the tests that are not factorable, as well as of the impact of demographics on task performance (here, age and gender were shown to affect some tests). Overall, existing test batteries have been extremely useful in the clinical setting for both diagnostic and prognostic purposes. Future work needs to investigate how the constructs (factors) measured by these test batteries generalize across populations with diverse demographics, spoken languages, affective skills, socioeconomic backgrounds and comorbidities.

Supplementary Information

Supplementary Material 1 (2.6MB, docx)

Author contributions

O.H. was involved in the data analysis, figures preparation, interpretation of the results, and manuscript writing. M.L.S was involved in the conception of the study, interpretation of the results, manuscript writing and funding.

Funding

This work was funded by Khalifa University [grant numbers FSU-2022-006 and RC2-2018-022].

Data availability

The neurocognitive aging data used in this study are available to the public within the Open Science Framework project “Goal-Directed Cognition in Older and Younger Adults”: https://doi.org/10.17605/OSF.IO/YHZXE.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Stern, Y., Barnes, C. A., Grady, C., Jones, R. N. & Raz, N. Brain reserve, cognitive reserve, compensation, and maintenance: operationalization, validity, and mechanisms of cognitive resilience. Neurobiol Aging83, 124–129. 10.1016/j.neurobiolaging.2019.03.022 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Murman, D. L. The Impact of Age on Cognition. Semin Hear.36(3), 111. 10.1055/S-0035-1555115 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Salthouse, T. A., Atkinson, T. M. & Berish, D. E. Executive functioning as a potential mediator of age-related cognitive decline in normal adults. J Exp Psychol Gen132(4), 566–594. 10.1037/0096-3445.132.4.566 (2003). [DOI] [PubMed] [Google Scholar]
  • 4.Haitas, N., Amiri, M., Wilson, M., Joanette, Y. & Steffener, J. Age-preserved semantic memory and the CRUNCH effect manifested as differential semantic control networks: An fMRI study. PLoS One16(6), e0249948. 10.1371/JOURNAL.PONE.0249948 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Logan, J. M., Sanders, A. L., Snyder, A. Z., Morris, J. C. & Buckner, R. L. Under-recruitment and nonselective recruitment: dissociable neural mechanisms associated with aging. Neuron33(5), 827–840. 10.1016/S0896-6273(02)00612-8 (2002). [DOI] [PubMed] [Google Scholar]
  • 6.Koen, J. D., Srokova, S. & Rugg, M. D. Age-related neural dedifferentiation and cognition. Curr. Opin. Behav. Sci.32, 7–14. 10.1016/j.cobeha.2020.01.006 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Bock, O., Haeger, M. & Voelcker-Rehage, C. Structure of executive functions in young and in older persons. PLoS One. 14 (5). 10.1371/journal.pone.0216149 (2019). [DOI] [PMC free article] [PubMed]
  • 8.Baltes, P. B. & Lindenberger, U. Emergence of a powerful connection between sensory and cognitive functions across the adult life span: a new window to the study of cognitive aging? Psychol. Aging. 12 (1), 12–21. 10.1037/0882-7974.12.1.12 (1997). [DOI] [PubMed] [Google Scholar]
  • 9.de Frias, C. M., Lövdén, M., Lindenberger, U. & Nilsson, L. G. Revisiting the dedifferentiation hypothesis with longitudinal multi-cohort data. Intelligence35(4), 381–392. 10.1016/J.INTELL.2006.07.011 (2007). [Google Scholar]
  • 10.Anstey, K. J., Hofer, S. M. & Luszcz, M. A. Cross-sectional and longitudinal patterns of dedifferentiation in late-life cognitive and sensory function: the effects of age, ability, attrition, and occasion of measurement. J Exp Psychol Gen132(3), 470–487. 10.1037/0096-3445.132.3.470 (2003). [DOI] [PubMed] [Google Scholar]
  • 11.Bisogno, A. L. et al. A low-dimensional structure of neurological impairment in stroke. Brain Commun.3 (2). 10.1093/braincomms/fcab119 (2021). [DOI] [PMC free article] [PubMed]
  • 12.Halai, A. D., Woollams, A. M. & Lambon Ralph, M. A. Using principal component analysis to capture individual differences within a unified neuropsychological model of chronic post-stroke aphasia: revealing the unique neural correlates of speech fluency, phonology and semantics. (2016). 10.1016/j.cortex.2016.04.016 [DOI] [PMC free article] [PubMed]
  • 13.Baciu, M. et al. Strategies and cognitive reserve to preserve lexical production in aging. Geroscience43(4), 1725–1765. 10.1007/S11357-021-00367-5 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Avila, J. F. et al. Measurement Invariance of Neuropsychological Measures of Cognitive Aging Across Race/Ethnicity by Sex/Gender Groups. Neuropsychology34(1), 3. 10.1037/NEU0000584 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Kiselica, A. M., Webber, T. A. & Benge, J. F. The uniform dataset 3.0 neuropsychological battery: Factor structure, invariance testing, and demographically adjusted factor score calculation. J Int Neuropsychol Soc26(6), 576–586. 10.1017/S135561772000003X (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Bertola, L. et al. Longitudinal measurement invariance of neuro-psychological tests in a diverse sample from the ELSA-Brasil study. Brazilian J. Psychiatry. 43 (3), 254–261. 10.1590/1516-4446-2020-0978 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Moreira, P. S. et al. Longitudinal measurement invariance of memory performance and executive functioning in healthy aging. PLoS One. 13 (9). 10.1371/journal.pone.0204012 (2018). [DOI] [PMC free article] [PubMed]
  • 18.Mungas, D. et al. Factor Structure, Convergent Validity, and Discriminant Validity of the NIH Toolbox Cognitive Health Battery (NIHTB-CHB) in Adults. J. Int. Neuropsychol. Soc.20(6), 579–587. 10.1017/S1355617714000307 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Glisky, E. L. et al. Differences between young and older adults in unity and diversity of executive functions. Aging Neuropsychol Cogn28(6), 829–854. 10.1080/13825585.2020.1830936 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Ma, Y. et al. Latent Factor Structure and Measurement Invariance of the NIH Toolbox Cognition Battery in an Alzheimer’s Disease Research Sample. J. Int. Neuropsychol. Soc.27(5), 412–425. 10.1017/S1355617720000922 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Spreng, R. N. et al. Neurocognitive aging data release with behavioral, structural and multi-echo functional MRI measures. Sci. Data9(1), 1–11. 10.1038/s41597-022-01231-7 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Spreng, R. N. Goal-Directed Cognition in Older and Younger Adults. [Online]. Available: https://osf.io/yhzxe/
  • 23.Gershon, R. C. et al. NIH toolbox for assessment of neurological and behavioral function. Neurology80(11 Supplement 3), S2–S6. 10.1212/WNL.0B013E3182872E5F (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Wechsler Adult Intelligence Scale, 4th Edition (WAIS-IV) | Pearson. Accessed: 03 Oct 2023. [Online]. Available: https://www.pearsonassessments.com/store/usassessments/en/Store/Professional-Assessments/Cognition-%26-Neuro/Wechsler-Adult-Intelligence-Scale-%7C-Fourth-Edition/p/100000392.html
  • 25.Brainerd, C. J. et al. Dual-retrieval models and neurocognitive impairment. J Exp Psychol Learn Mem Cogn40(1), 41–65. 10.1037/A0034057 (2014). [DOI] [PubMed] [Google Scholar]
  • 26.Kaya, F., Delen, E. & Bulut, O. Test Review: Shipley-2 Manual. J Psychoeduc Assess30(6), 593–597. 10.1177/0734282912440852 (2012). [Google Scholar]
  • 27.Reitan, R. M. Validity of the trail making test as an indicator of organic brain damage. Percept Mot Skills8(3), 271–276. 10.2466/PMS.1958.8.3.271 (1958). [Google Scholar]
  • 28.Daneman, M. & Carpenter, P. A. Individual differences in working memory and reading. J Verbal Learning Verbal Behav19(4), 450–466. 10.1016/S0022-5371(80)90312-6 (1980). [Google Scholar]
  • 29.Symbol Digit Modalities Test. Accessed: 12 Nov 2023. [Online]. Available: https://psycnet.apa.org/doiLanding?doi=10.1037%2Ft27513-000
  • 30.Li, C. H. Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood and diagonally weighted least squares. Behav. Res. Methods48(3), 936–949. 10.3758/S13428-015-0619-7/TABLES/6 (2016). [DOI] [PubMed] [Google Scholar]
  • 31.Flanagan, D. P. & Dixon, S. G. The Cattell-Horn‐Carroll Theory of cognitive abilities, in Encyclopedia of Special Education, Wiley, 10.1002/9781118660584.ese0431. (2014).
  • 32.Caemmerer, J. M., Keith, T. Z. & Reynolds, M. R. Beyond individual intelligence tests: application of Cattell-Horn-Carroll Theory. Intelligence. 7910.1016/j.intell.2020.101433 (2020).
  • 33.Floyd, F. J. & Widaman, K. F. Factor analysis in the development and refinement of clinical assessment instruments. Psychol. Assess.7(3), 286–299. 10.1037/1040-3590.7.3.286 (1995). [Google Scholar]
  • 34.Gross, A. L., Khobragade, P. Y., Meijer, E. & Saxton, J. A. Measurement and Structure of Cognition in the Longitudinal Aging Study in India–Diagnostic Assessment of Dementia. J. Am. Geriatr. Soc.68, S11. 10.1111/JGS.16738 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Park, L. Q. et al. Confirmatory factor analysis of the ADNI neuropsychological battery. Brain Imaging Behav6(4), 528–539. 10.1007/s11682-012-9190-3 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Hu, L. T. & Bentler, P. M. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct. Equ Model.6 (1), 1–55. 10.1080/10705519909540118 (1999). [Google Scholar]
  • 37.Browne, M. W. & Cudeck, R. Alternative ways of assessing Model Fit. Sociol. Methods Res.21 (2), 230–258. 10.1177/0049124192021002005 (1992). [Google Scholar]
  • 38.Bollen, K. A. Structural equations with latent variables. Struct. Equations Latent Variables. 1–514. 10.1002/9781118619179 (1989).
  • 39.Flora, D. B. & Flake, J. K. The purpose and practice of exploratory and confirmatory factor analysis in psychological research: Decisions for scale development and validation. Can J Behaviour Sci49(2), 78–88. 10.1037/CBS0000069 (2017). [Google Scholar]
  • 40.Schmitt, T. A. Current methodological considerations in exploratory and confirmatory factor analysis. J. Psychoeduc Assess.29 (4), 304–321. 10.1177/0734282911406653 (2011). [Google Scholar]
  • 41.KMO and Bartlett’s Test - IBM Documentation. Accessed: 20 May 2023. [Online]. Available: https://www.ibm.com/docs/en/spss-statistics/28.0.0?topic=detection-kmo-bartletts-test
  • 42.Kaiser, H. F. An index of factorial simplicity. Psychometrika39(1), 31–36. 10.1007/BF02291575/METRICS (1974). [Google Scholar]
  • 43.Bartlett, M. S. A Note on the Multiplying Factors for Various χ2 Approximations. J. R Stat. Soc. Ser. B Stat. Methodol.16(2), 296–298. 10.1111/J.2517-6161.1954.TB00174.X (1954). [Google Scholar]
  • 44.Watkins, M. W. Exploratory factor analysis: a guide to best practice. J. Black Psychol.44 (3), 219–246. 10.1177/0095798418771807 (2018). [Google Scholar]
  • 45.Santos, N. C. et al. Exploring the factor structure of neurocognitive measures in older individuals. PLoS One10(4), e0124229. 10.1371/journal.pone.0124229 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Asbee, J., Kelly, K., McMahan, T. & Parsons, T. D. Factor analysis of the virtual reality Stroop task. J Clin Exp Neuropsychol44(8), 604–617. 10.1080/13803395.2022.2150749 (2022). [DOI] [PubMed] [Google Scholar]
  • 47.Velicer, W. F. & Fava, J. L. Effects of variable and subject sampling on factor pattern recovery. Psychol. Methods. 3 (2), 231–251. 10.1037/1082-989X.3.2.231 (1998). [Google Scholar]
  • 48.Lu, H., Li, J., Fung, A. W. T. & Lam, L. C. W. Diversity in verbal fluency performance and its associations with MRI-informed brain age matrices in normal ageing and neurocognitive disorders. CNS Neurosci. Ther.29 (7). 10.1111/CNS.14144 (2023). [DOI] [PMC free article] [PubMed]
  • 49.Korbmacher, M. et al. Bio-psycho-social factors’ associations with brain age: a large-scale UK Biobank diffusion study of 35,749 participants. Front. Psychol.1410.3389/FPSYG.2023.1117732 (2023). [DOI] [PMC free article] [PubMed]
  • 50.Salthouse, T. A. Selective review of cognitive aging. J. Int. Neuropsychol. Soc.16 (5), 754. 10.1017/S1355617710000706 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Fjell, A. M., Sneve, M. H., Grydeland, H., Storsve, A. B. & Walhovd, K. B. The disconnected brain and executive function decline in aging. Cerebral Cortex27(3), 2303–2317. 10.1093/CERCOR/BHW082 (2017). [DOI] [PubMed] [Google Scholar]
  • 52.Idowu, M. I. & Szameitat, A. J. Executive function abilities in cognitively healthy young and older adults—A cross-sectional study. Front. Aging Neurosci.15, 976915. 10.3389/FNAGI.2023.976915/BIBTEX (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Isingrini, M. & Taconnat, L. Episodic memory, frontal functioning, and aging. Rev Neurol (Paris)164(Suppl 3), S91–S95. 10.1016/S0035-3787(08)73297-1 (2008). [DOI] [PubMed] [Google Scholar]
  • 54.Guillaume, C. et al. Early age-related changes in episodic memory retrieval as revealed by event-related potentials. Neuroreport20(2), 191–196. 10.1097/WNR.0B013E32831B44CA (2009). [DOI] [PubMed] [Google Scholar]
  • 55.Tromp, D., Dufour, A., Lithfous, S., Pebayle, T. & Després, O. Episodic memory in normal aging and Alzheimer disease: Insights from imaging and behavioral studies. Ageing Res Rev24(Pt B), 232–262. 10.1016/J.ARR.2015.08.006 (2015). [DOI] [PubMed] [Google Scholar]
  • 56.Luo, L. & Craik, F. I. M. Aging and memory: a cognitive approach. Can. J. Psychiatry. 53, 346–353. 10.1177/070674370805300603 (2008). [DOI] [PubMed] [Google Scholar]
  • 57.Liampas, I. et al. Longitudinal episodic memory trajectories in older adults with normal cognition. Clin. Neuropsychol.37 (2), 304–321. 10.1080/13854046.2022.2059011 (2023). [DOI] [PubMed] [Google Scholar]
  • 58.Verhaeghen, P. Aging and vocabulary scores: a meta-analysis. Psychol Aging18(2), 332–339. 10.1037/0882-7974.18.2.332 (2003). [DOI] [PubMed] [Google Scholar]
  • 59.Hoffman, P. An individual differences approach to semantic cognition: divergent effects of age on representation, retrieval and selection. Sci. Rep.8 (1). 10.1038/s41598-018-26569-0 (2018). [DOI] [PMC free article] [PubMed]
  • 60.Pauls, F., Petermann, F. & Lepach, A. C. Gender differences in episodic memory and visual working memory including the effects of age. Memory21(7), 857–874. 10.1080/09658211.2013.765892 (2013). [DOI] [PubMed] [Google Scholar]
  • 61.Golchert, J. et al. Women outperform men in verbal episodic memory even in oldest-old age: 13-year longitudinal results of the AgeCoDe/AgeQualiDe study. J Alzheimer’s Dis69(3), 857–869. 10.3233/JAD-180949 (2019). [DOI] [PubMed] [Google Scholar]
  • 62.Sundermann, E. E. et al. Improving Detection of Amnestic Mild Cognitive Impairment with sex-specific cognitive norms. J. Alzheimers Dis.84 (4), 1763–1770. 10.3233/JAD-215260 (2021). [DOI] [PubMed] [Google Scholar]
  • 63.Sundermann, E. E. et al. Female advantage in verbal memory. Neurology87(18), 1916–1924. 10.1212/WNL.0000000000003288 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Vinke, E. J. et al. Trajectories of imaging markers in brain aging: the Rotterdam Study. Neurobiol. Aging71, 32–40. 10.1016/J.NEUROBIOLAGING.2018.07.001 (2018). [DOI] [PubMed] [Google Scholar]
  • 65.Bagarinao, E. et al. Reorganization of brain networks and its association with general cognitive performance over the adult lifespan. Sci. Rep.9(1), 11352. 10.1038/s41598-019-47922-x (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Starr, A., Srinivasan, M. & Bunge, S. A. Semantic knowledge influences visual working memory in adults and children. PLoS One15(11), e0241110. 10.1371/journal.pone.0241110 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Hartshorne, J. K. & Germine, L. T. When does cognitive functioning peak? The asynchronous rise and fall of different cognitive abilities across the life span. Psychol Sci26(4), 433–443. 10.1177/0956797614567339 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Nichols, E. S., Wild, C. J., Owen, A. M. & Soddu, A. Cognition across the lifespan: investigating age, sex, and other sociodemographic influences. Behav. Sci.11(4), 51. 10.3390/bs11040051 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Scott, E. P., Sorrell, A. & Benitez, A. Psychometric Properties of the NIH Toolbox Cognition Battery in healthy older adults: reliability, validity, and Agreement with Standard Neuropsychological tests HHS Public Access. J. Int. Neuropsychol. Soc.25 (8), 857–867. 10.1017/S1355617719000614 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Ott, L. R. et al. Construct validity of the NIH Toolbox Cognitive Domains: a comparison with conventional neuropsychological assessments. Neuropsychology. 36 (5), 468–481. 10.1037/neu0000813 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.MacCallum, R. C., Roznowski, M., Necowitz, L. B. Model modifications in covariance structure analysis: the problem of capitalization on chance. Psychol Bull, (1992). Accessed: 15 Aug 2024.  https://psycnet.apa.org/fulltext/1992-25917-001.html [DOI] [PubMed]
  • 72.MacCallum, R. M. P. Specification searches in covariance structure modeling. Psychol Bull, (1986). Accessed: 15 Aug 2024. Available: https://psycnet.apa.org/fulltext/1986-26275-001.html
  • 73.French, B. F. & Finch, W. H. Multigroup confirmatory factor analysis: locating the invariant referent sets. Struct Equ Modeling15(1), 96–113. 10.1080/10705510701758349 (2008). [Google Scholar]
  • 74.Jorgensen, T. D. Applying permutation tests and multivariate modification indices to configurally invariant models that need respecification. Front Psychol8, 250494 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Steinmetz, H. Analyzing observed composite differences across groups: is partial measurement invariance enough? Methodology. 9 (1), 1–12. 10.1027/1614-2241/A000049 (2013). [Google Scholar]
  • 76.Tse, W. W. Y., Lai, M. H. C. & Zhang, Y. Does strict invariance matter? Valid group mean comparisons with ordered-categorical items. Behav Res Methods56(4), 3117–3139. 10.3758/S13428-023-02247-6 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Shi, D., Song, H. & Lewis, M. D. The impact of partial factorial invariance on cross-group comparisons. 26(7), 1217–1233. 10.1177/1073191117711020 (2017). [DOI] [PubMed]
  • 78.Robitzsch, A. & Lüdtke, O. Why full, partial, or approximate measurement Invariance are not a prerequisite for meaningful and valid Group comparisons. Struct. Equ Model.30 (6), 859–870. 10.1080/10705511.2023.2191292 (2023). [Google Scholar]
  • 79.Rozas, A. X. P., Juncos-Rabadán, O. & González, M. S. R. Processing speed, inhibitory control, and working memory: three important factors to account for age-related cognitive decline. 66(2), 115–130. 10.2190/AG.66.2.B (2008). [DOI] [PubMed]
  • 80.Ferguson, H. J., Brunsdon, V. E. A. & Bradford, E. E. F. The developmental trajectories of executive function from adolescence to old age. Sci. Rep.11, 1382. 10.1038/s41598-020-80866-1 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Karr, J. E. et al. The unity and diversity of executive functions: a systematic review and re-analysis of latent variable studies. Psychol. Bull, 144 (11) 1147. 10.1037/BUL0000160 (2018). [DOI] [PMC free article] [PubMed]
  • 82.Löffler, C., Frischkorn, G. T., Hagemann, D., Sadus, K. & Schubert, A. L. ·The common factor of executive functions measures nothing but speed of information uptake. Psychol Res, (2024). 10.1007/s00426-023-01924-7 [DOI] [PMC free article] [PubMed]
  • 83.Schubert, A. L. & Rey-Mermet, A. Does process overlap theory replace the issues of general intelligence with the issues of attentional control?. J Appl Res Mem Cogn8(3), 277–283. 10.1016/j.jarmac.2019.06.004 (2019). [Google Scholar]
  • 84.Hedge, C., Powell, G. & Sumner, P. The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behav. Res. Methods50(3), 1166–1186. 10.3758/S13428-017-0935-1/TABLES/5 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Tulsky, D. S. et al. NIH Toolbox Cognition Battery (NIHTB-CB): list sorting test to measure working memory. J. Int. Neuropsychol. Soc.20 (6), 599–610. 10.1017/S135561771400040X (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Fox, R. S. et al. Uses of the NIH toolbox® in clinical samples: a scoping review. Neurol Clin Pract12(4), 307–319. 10.1212/CPJ.0000000000200060 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Shields, R. H. et al. Validation of the NIH Toolbox Cognitive Battery in intellectual disability. Neurology94(12), e1229–e1240. 10.1212/WNL.0000000000009131 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Zelazo, P. D. et al. NIH toolbox cognition battery (CB): validation of executive function measures in adults. J. Int. Neuropsychol. Soc.20 (6), 620–629. 10.1017/S1355617714000472 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Heaton, R. K. et al. Reliability and validity of composite scores from the NIH Toolbox Cognition Battery in adults. J. Int. Neuropsychol. Soc.20 (6), 588–598. 10.1017/S1355617714000241 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Hackett, K. et al. Utility of the NIH Toolbox for assessment of prodromal Alzheimer’s disease and dementia. Alzheimers Dement. (Amst)10, 764–772. 10.1016/J.DADM.2018.10.002 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Halter, C. M. et al. Construct validation of NIH Toolbox Cognition Battery premorbid cognitive functioning scores in Black and White older americans with and without mild cognitive impairment. J. Int. Neuropsychol. Soc.10.1017/S1355617723000425 (2023). [DOI] [PMC free article] [PubMed]
  • 92.Carlozzi, N. E. et al. Construct validity of the NIH Toolbox Cognition Battery in individuals with stroke. Rehabil Psychol62(4), 443–454. 10.1037/REP0000195 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Fischer, R. & Karl, J. A. A primer to (cross-cultural) multi-group invariance testing Possibilities in R. Front. Psychol, 10, (2019). 10.3389/FPSYG.2019.01507 [DOI] [PMC free article] [PubMed]
  • 94.McAlister, C. & Schmitter-Edgecombe, M. Executive function subcomponents and their relations to everyday functioning in healthy older adults. J Clin Exp Neuropsychol38(8), 925–940. 10.1080/13803395.2016.1177490 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Maldonado, T., Orr, J. M., Goen, J. R. M. & Bernard, J. A. Age differences in the subcomponents of executive functioning. J Gerontol B Psychol Sci Soc Sci75(6), e31–e55. 10.1093/GERONB/GBAA005 (2020). [DOI] [PubMed] [Google Scholar]
  • 96.Idowu, M. I. & Szameitat, A. J. Executive function abilities in cognitively healthy young and older adults-A cross-sectional study. Front. Aging Neurosci.1510.3389/FNAGI.2023.976915 (2023). [DOI] [PMC free article] [PubMed]
  • 97.Hsu, S., Poldrack, R., Ram, N. & Wagner, A. D. Observed correlations from cross-sectional individual differences research reflect both between-person and within-person correlations. PsyArXiv10.31234/osf.io/zq37h (2022). [Google Scholar]
  • 98.McCabe, D. P., Roediger, H. L., McDaniel, M. A., Balota, D. A. & Hambrick, D. Z. The relationship between working memory capacity and executive functioning evidence for a common executive attention construct. Neuropsychology. 24, (2), 222, (2010). 10.1037/A0017619 [DOI] [PMC free article] [PubMed]
  • 99.Albinet, C. T., Boucard, G., Bouquet, C. A. & Audiffren, M. Processing speed and executive functions in cognitive aging: how to disentangle their mutual relationship?. Brain Cogn.79(1), 1–11. 10.1016/J.BANDC.2012.02.001 (2012). [DOI] [PubMed] [Google Scholar]
  • 100.Seghier, M. L. & Price, C. J. Dissociating functional brain networks by decoding the between-subject variability. Neuroimage45(2), 349–359. 10.1016/J.NEUROIMAGE.2008.12.017 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Devlin, S. J., Gnanadesikan, R. & Kettenring, J. R. Robust estimation and outlier detection with correlation coefficients. Biometrika62(3), 531–545. 10.1093/BIOMET/62.3.531 (1975). [Google Scholar]
  • 102.Bowden, S. C. et al. Measurement invariance of core cognitive abilities in heterogeneous neurological and community samples. Intelligence32(4), 363–389. 10.1016/J.INTELL.2004.05.002 (2004). [Google Scholar]

Associated Data


Supplementary Materials

Supplementary Material 1 (2.6MB, docx)

Data Availability Statement

The neurocognitive aging data used in this study are publicly available within the Open Science Framework project "Goal-Directed Cognition in Older and Younger Adults": https://doi.org/10.17605/OSF.IO/YHZXE.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group