Abstract
INTRODUCTION
The present study examined the dimensional structure of the neuropsychological test batteries from the National Alzheimer's Coordinating Center (NACC) Uniform Data Set (UDS) versions 2.0 and 3.0 and measurement equivalence across UDS versions and race/ethnicity groups.
METHODS
There were 49,895 participants included in the present study. The best‐fitting model was developed and tested in separate samples. Multiple group confirmatory factor analysis (CFA) evaluated measurement equivalence across UDS versions and race/ethnicity groups.
RESULTS
Results identified a best‐fitting four‐factor model with residual structure. Multiple group CFA supported partial scalar invariance by UDS version and race/ethnicity group. Regarding race/ethnicity groups, the Language and Attention domains had more non‐invariant intercepts, which most affected the White group.
DISCUSSION
A four‐factor model effectively summarizes the UDS neuropsychological test batteries across UDS versions and race/ethnicity groups. Crucial differences in measurement parameters must be accounted for in studies using these neuropsychological tests as outcomes.
Highlights
A four‐factor model summarizes cognition across Uniform Data Set (UDS) versions and race/ethnicity groups.
Partial scalar measurement invariance exists across race/ethnicity groups.
Model fit differs between cognitively impaired and unimpaired samples.
Accounting for differences in measurement parameters across groups is essential.
Tailored normative data are crucial for certain UDS tests, including category fluency.
Keywords: cognition, measurement equivalence, multiple group confirmatory factor analysis, National Alzheimer's Coordinating Center, race/ethnicity, Uniform Data Set
1. BACKGROUND
Investigating methods to better understand the impact of biological, sociocultural, and environmental factors on cognitive aging across individuals from diverse racial and ethnic backgrounds is crucial for addressing health disparities and improving the quality of life for older adults in the United States. 1 Measurement of cognition is critical to this endeavor, and tests that measure cognitive abilities in the same way across individuals from diverse racial and ethnic backgrounds are a prerequisite. Most neuropsychological assessment tools used to assess cognitive aging were developed and validated using relatively homogenous samples (i.e., US‐born, highly educated, monolingual English‐speaking, non‐Hispanic White individuals). 2 It is unclear whether assumptions about these neuropsychological measures will hold for individuals not represented by their original validation cohorts. This may lead to biased estimates of group differences in cognition and the relationship of cognition to biopsychosocial factors thought to influence cognitive aging. 3 , 4 , 5
Measurement bias is present when there are systematic differences in expected test scores of individuals who have the same underlying ability level in different groups. 6 This means that scores of individuals from different groups cannot be directly compared. Measurement invariance, or measurement equivalence, is present when test scores of individuals from different groups are measured in the same way and thus, are directly comparable. 7 , 8 Measurement invariance is essential for valid cross‐group comparison, including evaluation of the relationship between possible contributors to cognitive aging (e.g., hypertension, Alzheimer's disease [AD] genetic risk status) and cognitive performance across race/ethnicity groups. 7 , 8 Unfortunately, measurement invariance is often assumed rather than tested. 9
Previous research examining measurement invariance across race and ethnicity groups has been mixed. 9 , 10 , 11 , 12 , 13 For example, Barnes et al. examined the latent factor structure of a common neuropsychological battery maintained by three community‐based cohorts (Minority Aging Research Study [MARS], Rush Memory and Aging Project [MAP], and the Religious Orders Study [ROS]) and found that a five‐factor model of cognition showed scalar invariance across Black and White older adults, 9 meaning that neuropsychological scores of individuals from these race groups are measured in the same way and can be directly compared. In contrast, using data from another community‐based cohort study (Washington/Hamilton Heights Inwood Columbia Aging Project [WHICAP]) and a different test battery, Avila et al. found partial scalar invariance for a three‐factor model of cognition across non‐Hispanic White, Black, and Hispanic participant groups due to non‐invariant intercepts of neuropsychological measures in the Language domain (e.g., naming). 12 Partial scalar invariance means that some factor intercepts differed across groups, but there were also invariant intercepts. The invariant intercepts can be used as linking items to establish a common metric and make it possible to compare scores from different groups. 14
The purpose of the present study is to examine the dimensional structure and factor equivalence of the National Alzheimer's Coordinating Center (NACC) Uniform Data Set (UDS) neuropsychological test batteries across different race/ethnicity groups. 15 Despite evidence that the neuropsychological test batteries of UDS versions 2.0 and 3.0 are related and have the same underlying structure, 16 , 17 , 18 previous studies have not used confirmatory factor analysis (CFA) to examine the factor structure of the UDS neuropsychological test batteries. Moreover, studies exploring measurement invariance across groups have not combined data across UDS versions. 19 , 20 Thus, to increase the size and diversity of our sample and to extend the application of our model, data from participants who completed both versions of the NACC UDS neuropsychological test battery were included in our model through the use of the UDS crosswalk sample. 16 , 17 The resulting study sample (N = 53,382) comprised 2.8% Asian, 14.9% Black, 9.4% LatinX, and 72.9% White participants. We hypothesized that, consistent with previous studies of measurement invariance across race/ethnicity groups in older adults, the same cognitive factor model could be used to summarize neuropsychological test performance across groups when adjusting for the few non‐invariant intercepts. Given the similar makeup of the neuropsychological test batteries across UDS versions, we also hypothesized that the same factor structure could be used to summarize both neuropsychological test batteries, enabling large‐scale studies of factors associated with differences in cognitive aging across a larger portion of the NACC dataset.
2. METHODS
2.1. Participants
All participants were selected from the March 2025 NACC UDS data freeze. NACC participants are recruited on a referral basis (e.g., clinician, family, or self‐referral) and through active recruitment of community organizations and of volunteers who wish to participate in research. Informed written consent is obtained from all participants. The NACC database is exempt from institutional review board review and approval because it provides deidentified data for secondary analyses.
This study used data from UDS versions 2.0 and 3.0 collected at baseline visits from all 32 Alzheimer's Disease Research Centers (ADRCs) between 2005 and 2025 and included participants who were cognitively normal or who had a UDS consensus clinical diagnosis of mild cognitive impairment or dementia at baseline. Only participants who reported English as their primary language were included in primary study analyses.
2.2. Race/ethnicity
Race/ethnicity categories include: White or Caucasian, Black or African American, American Indian or Alaska Native, Native Hawaiian or other Pacific Islander, Asian, Other (write‐in field), and unknown, along with a separate self‐report of Hispanic or LatinX ethnicity. For this study, we coded a new variable to include originally collected responses of White or Caucasian, Black or African American, Hispanic or LatinX, and Asian. For brevity, the groups will be referred to as White for “White or Caucasian,” Black for “Black or African American,” and LatinX for “Hispanic or LatinX.”
Participants reporting more than one race or ethnicity identity were coded into a single category using a predetermined prioritization schema. Specifically, any participant self‐identifying as LatinX in addition to another race/ethnicity group was coded as LatinX. If the participant did not self‐identify as LatinX but did self‐identify as Black, they were coded as Black. The same logic was applied for individuals self‐identifying as Native American, then as Asian. Given the small sample size of the Native American group, this group of participants was not included in the present study.
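The prioritization schema just described reduces to a fixed ordering of checks. A minimal sketch (the boolean inputs and function name are illustrative, not actual NACC variable names):

```python
def assign_race_ethnicity(latinx, black, native_american, asian, white):
    """Collapse multiple self-reported identities into a single analytic
    category using the paper's prioritization order: LatinX first, then
    Black, then Native American, then Asian. Inputs are hypothetical
    booleans, not NACC field names."""
    if latinx:
        return "LatinX"
    if black:
        return "Black"
    if native_american:
        # This group was excluded from analyses due to small sample size.
        return "Native American"
    if asian:
        return "Asian"
    if white:
        return "White"
    return "Other/Unknown"
```

Under this schema, a participant self‐identifying as both LatinX and Black is coded LatinX, and the Native American category is set aside before analysis because of its small sample size.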
RESEARCH IN CONTEXT
Systematic review: The authors searched PubMed for studies related to measurement and structural invariance in the National Alzheimer's Coordinating Center (NACC) Uniform Data Set (UDS). Results revealed that previous studies have examined measurement and/or structural invariance by sex and race/ethnicity in subsamples of NACC and within single UDS versions, as well as by sex, race/ethnicity, and/or language in regional datasets.
Interpretation: We extend this work by applying advanced psychometric methods to baseline neuropsychological data from all NACC participants classified as cognitively normal, mild cognitive impairment, and dementia across versions of the UDS (2.0 and 3.0) and race/ethnicity groups.
Future directions: We found partial scalar invariance by UDS version and race/ethnicity group of our four‐factor model with important implications for clinical and research settings. Longitudinal measurement and structural invariance of NACC UDS neuropsychological test batteries is needed to further determine the utility of UDS neuropsychological tests as outcome measures in racially/ethnically diverse samples.
2.3. NACC UDS measures
In 2008, it was recommended that non‐proprietary versions replace proprietary neuropsychological tests in the UDS neuropsychological test battery (see Table S1 in supporting information). 21 , 22 A crosswalk study was conducted from December 2013 to April 2014 in which Alzheimer's Disease Centers were asked to administer both the previous and new tests to new and returning participants in randomized order. 16 Monsell et al. assessed the relationship of neuropsychological test measures from UDS version 2.0 with their replacement measure in UDS version 3.0 and developed a conversion table to compare test performance across UDS versions. 16 They found reasonably high correlations between proprietary and non‐proprietary test versions. 16 In a more recent study, Culhane et al. completed multiple factor analysis (an extension of principal component analysis) across UDS versions and found that both UDS versions have the same underlying structure. 17
To include the largest and most representative dataset possible in the current study, factors are linked by common individuals (participants who completed UDS version 2.0 and 3.0 neuropsychological test batteries at the same assessment as part of the crosswalk study, n = 935) and/or common items (items included in UDS versions 2.0 and 3.0). Measures from each neuropsychological test included in the present study are listed in Table S2 in supporting information.
2.4. Data analysis
2.4.1. Measures and data processing
UDS version 2.0 and 3.0 measures were included as observed indicators in further model testing. Initially, expert raters (D.M., L.G., F.L.) considered each measure based on existing theory and assigned each to one or several cognitive domains. D.M. completed quality control, including confirming the direction of each variable and recoding as needed so that higher scores indicated better performance. Blom transformation was applied to the full sample to normalize variables and establish a common, standardized scale. 23 The Blom transformation replaces raw scores with the normal equivalent deviate of the raw score percentile rank. The transformed variables were used in subsequent analyses; therefore, scores across measures are on a common, standardized metric.
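The Blom transformation has a simple closed form: each raw score's rank r (out of n) is mapped to the normal deviate of its approximate percentile rank, Φ⁻¹((r − 3/8)/(n + 1/4)). A minimal sketch using only the Python standard library (tie and missing‐data handling, which a real pipeline would need, are omitted):

```python
from statistics import NormalDist

def blom_transform(scores):
    """Rank-based normalization (Blom): replace each raw score with the
    normal equivalent deviate of its percentile rank, using the standard
    approximation (rank - 3/8) / (n + 1/4). Illustrative sketch; ties
    and missing values are not handled."""
    n = len(scores)
    nd = NormalDist()
    # Rank each score (1 = smallest).
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    # Map approximate percentile ranks through the inverse normal CDF.
    return [nd.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]
```

The output places every measure on a common standardized metric centered near zero, which is what allows heterogeneous test scores to serve as comparable indicators in the factor models.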
2.4.2. Model testing
CFA was used to compare fit across multiple alternative models of the factor structure for the UDS neuropsychological test batteries. The alternative models were guided by theory and previous literature. 7 Models are listed in Table 1 in order of increasing complexity. Importantly, we included a priori residual correlations among observed test score indicators when a shared method would create dependence among the indicators beyond what the common factor would explain (i.e., would violate the assumption of local independence). For example, Logical Memory IIA (Delayed) tests recall of the same story recalled in Logical Memory IA (Immediate). Similarly, Trail Making Test Part A and Trail Making Test Part B share a common sequential drawing component and are both timed. Additionally, if expert raters determined that a cognitive measure relates to more than one cognitive domain, alternative or dual‐loading structures were assessed in subsequent modeling.
TABLE 1.
Hypothesized configural models, their constituent factors, and indicators.
| Model | Common factors |
|---|---|
| 1a | Global cognition, without residual structure |
| 1b | Global cognition, with residual structure |
| 2 | Episodic Memory and Non‐Memory |
| 3 | Episodic Memory, Executive Function, Language (letter/category fluency) |
| 4a | Episodic Memory, Executive Function, Language (letter/category fluency), Visuospatial Abilities |
| 4b | Episodic Memory, Executive Function, Language (letter/category fluency), Attention |
| 4c | Episodic Memory, Executive Function (letter/category fluency), Language (letter/category fluency), Attention |
| 4d | Episodic Memory, Executive Function (letter fluency), Language (letter fluency/category fluency), Attention |
| 4e | Episodic Memory, Executive Function (letter fluency), Language (category fluency), Attention |
| 4f | Episodic Memory, Executive Function (letter/category fluency), Language, Attention |
| 5a | Episodic Memory, Executive Function (letter/category fluency), Language (letter/category fluency), Visuospatial Abilities, Attention |
| 5b | Episodic Memory, Executive Function (letter fluency), Language (letter/category fluency), Visuospatial Abilities, Attention |
Fit statistics were used to determine the best‐fitting model. Fit statistics included (1) the comparative fit index (CFI), 24 (2) the Tucker–Lewis index (TLI), 25 and (3) the root mean squared error of approximation (RMSEA). 26 Missingness was handled using the full information maximum likelihood method. 27 Merging UDS version 2.0 and 3.0 data resulted in substantial missing data; tests unique to UDS version 3.0 had missing values in UDS version 2.0 assessments and vice versa. The 954 individuals who participated in the crosswalk study completed all UDS tests across versions, and 4 of the 22 tests were common to UDS versions 2.0 and 3.0. These two design features provide links for estimating factors that include UDS version 2.0 and 3.0 variables. The pattern of missing UDS version 2.0 and 3.0 data corresponds to missingness by design, and full information maximum likelihood estimation provides unbiased estimates under this condition. 27 , 28
We randomly divided the overall sample into equally sized learning and validation subsamples. Model development was performed in the learning sample, and that model was subsequently applied in the validation sample. The strategy was chosen because, to some extent, elements of the measurement invariance evaluation process were data driven, especially the iterative partial scalar invariance process. This method allowed us to determine whether the model fit decreased in a different sample.
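The random half‐split described above can be sketched as follows (the seed is illustrative; the paper does not report one):

```python
import random

def split_sample(ids, seed=2025):
    """Randomly divide participant IDs into equally sized learning and
    validation subsamples. The seed is a hypothetical choice for
    reproducibility, not a value from the paper."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```

Developing the model in one half and refitting it in the other guards against overfitting from the data‐driven steps, especially the iterative partial scalar invariance search.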
2.4.3. Measurement equivalence
Model estimation was performed with R/R‐Studio 29 , 30 using the lavaan package. 31 We first used the full sample that included all race/ethnicity groups to compare alternative factor models and identify the best‐fitting factor structure. We then proceeded to test the measurement invariance of this best‐fitting model using multiple group CFA. 7 , 8 , 10 , 14 A factor model specifies how observed variables (e.g., Logical Memory IA) relate to latent variables (factors; e.g., episodic memory). Model parameters include factor loadings, intercepts, and residual variances for each observed variable as well as means and variances for each factor. Using multiple group CFA, specific factor parameters can be constrained to be equal across groups, and model fit can be compared to a less restrictive model. If model fit is not worse in the more constrained model, it suggests that the parameters for which constraints were applied are invariant across groups. In contrast, a worse fit indicates non‐invariance of parameters across groups. Concretely, in the context of this study, non‐invariant parameters suggest that the neuropsychological measures relate to cognitive factors in different ways across groups.
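The measurement model described above can be written compactly in standard CFA notation (a textbook formulation consistent with the text, not a quote from the article). For observed variable $j$, person $i$, and group $g$:

```latex
% Multiple group linear factor model: intercept, loading, factor score,
% and residual are all allowed to be group-specific until constrained.
y_{ij}^{(g)} = \nu_j^{(g)} + \lambda_j^{(g)} \eta_i^{(g)} + \varepsilon_{ij}^{(g)}
```

Metric invariance constrains the loadings $\lambda_j^{(g)}$ to equality across groups; scalar invariance additionally equates the intercepts $\nu_j^{(g)}$. Non‐invariance of either parameter means the same observed score implies different underlying ability in different groups.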
First, configural invariance was evaluated by applying the best‐fitting dimensional structure identified in earlier analyses of the full sample. The loading and intercept of one indicator per factor were constrained to equality across groups for model identification purposes. Factor means were fixed at 0.0, and factor variances were fixed at 1.0 in the reference group and were freely estimated in the other groups. The choice of reference group is arbitrary, and the LatinX group was chosen as the reference group in this study. All other factor loadings, intercepts, and residual variances were freely estimated. Configural invariance is demonstrated by good model fit, and if present, it can be assumed that the same latent cognitive factors exist across groups and that the same observed variables define these latent cognitive factors across groups.
Second, once configural invariance was established, we then evaluated weak, or metric invariance: essentially, to determine whether observed variables relate to (i.e., correlate with) factors in the same way. This analysis modified the configural model by additionally constraining all loadings to equality across race/ethnicity groups. To determine whether weak invariance was present, model fit statistics were compared to the baseline, configural invariance model. If weak invariance is established, then factors can be assumed to have the same meaning across different groups, and cross‐group comparison of cognitive factors with outside variables (e.g., genetic status, medical diagnoses) is permissible.
Third, scalar or strong invariance was examined by modifying the metric invariance model to additionally constrain all intercepts to equality across groups. If scalar invariance is established, factor means and variances, as well as means of observed variables, can be compared across groups. 32 Scalar invariance is generally the maximum level of invariance investigated, given that more restrictive levels (e.g., equal residual variances, i.e., strict invariance) are infrequently obtained and are needed only to justify comparison of individual test scores, a practice that introduces additional sources of bias into analyses. 9
We used change in CFI and RMSEA values from a less constrained model to a more constrained, nested model, to determine whether model fit significantly differed across invariance models. Change in CFI values ≥ −0.01 and change in RMSEA values ≤ 0.015 indicated a lack of significant change in model fit with more model constraints. 33
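The change‐in‐fit decision rule can be expressed directly. A sketch (function name is illustrative; cutoffs are those stated in the text):

```python
def invariance_supported(cfi_less, cfi_more, rmsea_less, rmsea_more,
                         d_cfi_cut=-0.01, d_rmsea_cut=0.015):
    """Apply the change-in-fit criteria used in the text: the more
    constrained model is considered not significantly worse when CFI
    drops by no more than 0.01 and RMSEA rises by no more than 0.015."""
    d_cfi = cfi_more - cfi_less        # negative when fit worsens
    d_rmsea = rmsea_more - rmsea_less  # positive when fit worsens
    return d_cfi >= d_cfi_cut and d_rmsea <= d_rmsea_cut
```

For example, with the learning‐sample values reported in Table 4, comparing the scalar model (CFI = 0.968, RMSEA = 0.038) to the metric model (CFI = 0.980, RMSEA = 0.031) gives ∆CFI = −0.012, which fails the CFI criterion even though ∆RMSEA = 0.007 passes the RMSEA criterion.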
In the absence of scalar invariance, partial scalar invariance was evaluated. Partial scalar invariance allows some intercepts to differ across groups as long as enough invariant intercepts remain to provide linkage across groups. Partial invariance establishes a common metric that allows factor scores from different groups to be compared and identifies individual variables that can be directly compared across groups. 14 We used an iterative, multi‐step process to identify invariant and non‐invariant intercepts. In the first step, Forward Step 1, beginning with the metric model, one additional intercept at a time was constrained to equality across the LatinX reference group and one of the other groups, and model fit was compared to the fit of the metric model. When model fit was not significantly worse, the intercept was identified as invariant. A p value of 0.001 was selected for these comparisons to balance type I and type II error rates. This was repeated for the three group comparisons (LatinX with Asian, Black, and White) for each test variable and for all test variables that had freely estimated intercepts in the metric model. An intercept could be invariant for one group comparison (e.g., LatinX–Black) but not for a different comparison (e.g., LatinX–Asian). The invariant intercepts identified in this manner, along with the constrained intercepts in the metric model, were passed to the next step as potential invariant intercepts. In the next step, Backward Step 1, all the potential invariant intercepts identified in the previous step were constrained to equality in the metric model to form a base comparison model. Constraints were then removed one at a time (a constraint for a specific group comparison for a specific test variable), and the fit of the less constrained model was compared to that of the more constrained base model to identify non‐invariant intercepts.
Next, in Forward Step 2, the base model from Backward Step 1 was modified by freeing all the non‐invariant intercepts identified in that step. Forward iteration, like in Forward Step 1, proceeded from this new base model. Last, in Backward Step 2, the base model from Forward Step 2 was modified by constraining intercepts identified as invariant in that analysis step. Backward iteration was performed to identify a new set of non‐invariant intercepts. Alternation between forward and backward iteration proceeded until the lists of invariant and non‐invariant intercepts did not change across successive backward iteration steps.
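The alternating forward/backward logic can be sketched schematically. This is an illustration of the search structure only, not the authors' code: `is_worse` stands in for refitting the multiple group CFA and testing, at p < 0.001, whether a given intercept constraint significantly worsens fit.

```python
def partial_invariance_search(candidates, is_worse):
    """Schematic forward/backward search over intercept constraints.

    `candidates` lists (test variable, group comparison) pairs whose
    intercepts are free in the metric model. `is_worse(constrained, item)`
    is a hypothetical stand-in for refitting the model with `item`
    constrained on top of `constrained` and checking for significantly
    worse fit. Intercepts never successfully constrained are treated as
    non-invariant."""
    invariant, rejected = set(), set()
    while True:
        # Forward step: constrain one still-free intercept at a time,
        # keeping constraints that do not significantly worsen fit.
        for item in candidates:
            if item not in invariant and item not in rejected:
                if not is_worse(frozenset(invariant), item):
                    invariant.add(item)
        # Backward step: release constraints one at a time; an intercept
        # whose release significantly improves fit is non-invariant.
        released = {i for i in set(invariant)
                    if is_worse(frozenset(invariant - {i}), i)}
        if not released:  # lists are stable: the search has converged
            return invariant
        invariant -= released
        rejected |= released
```

Because a constraint applies to one test variable for one group comparison, the same intercept can end up invariant for LatinX–Black but non‐invariant for LatinX–Asian, exactly as described in the text.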
2.4.4. Version/form invariance
We performed an additional analysis to evaluate whether UDS version 2.0 and 3.0 analogs had the same measurement properties and could be considered equivalent in analyses combining both versions. Analogs in the two versions were: Logical Memory (immediate and delayed recall) with Craft Stories (verbatim and paraphrase scoring, immediate and delayed recall); Wechsler Adult Intelligence Scale–Revised Digit Span (forward and backward) and Number Span (forward and backward), and Boston Naming Test with Multilingual Naming Test. We used the final partial scalar invariance model as the base model, further constrained the loadings and intercepts for the UDS version 2.0 and 3.0 analogs to be equal, and compared the difference in fit of these two models.
2.4.5. Sensitivity analyses
Measurement equivalence testing was repeated in a sample including only cognitively normal participants. We did this in two ways. First, we estimated the partial scalar model developed in the full sample in a sample restricted to cognitively normal individuals, and compared model fit across the two samples. Second, we duplicated the measurement invariance evaluation process in the cognitively normal subsample. This repeated the full analytic process, including iterative evaluation of partial scalar invariance. Finally, we performed an additional sensitivity analysis to evaluate whether results would differ if individuals tested in languages other than English were included. We applied the final partial scalar model for the full sample to an expanded sample that included individuals who were not tested in English and compared results from this analysis to those from the English administration sample. 27
3. RESULTS
3.1. Sample characteristics
The total sample included 53,382 participants (Table 2; 57.3% female; Asian = 2.8%, Black = 14.9%, LatinX = 9.4%, White = 72.9%) with an average age of 71.1 ± 10.4 years and an average of 15.8 ± 7.7 years of education. Race/ethnicity groups had a similar makeup by clinical diagnosis and diagnostic severity (cognitively normal = 45.6%; mild cognitive impairment = 22.4%; dementia = 31.9%).
TABLE 2.
Total sample characteristics.
| Variable | Asian | Black | LatinX | White | Total |
|---|---|---|---|---|---|
| N | 1494 (2.8%) | 7956 (14.9%) | 5010 (9.4%) | 38,922 (72.9%) | 53,382 (100.0%) |
| Sex—female | 887 (59.4%) | 5726 (72.0%) | 3321 (66.3%) | 20,644 (53.0%) | 30,578 (57.3%) |
| Language—English | 1151 (77.0%) | 7842 (98.6%) | 2451 (48.9%) | 38,451 (98.8%) | 49,895 (93.5%) |
| Language—other | 320 (21.4%) | 43 (0.5%) | 12 (0.2%) | 49 (0.1%) | 424 (0.8%) |
| Language—Spanish | 0 (0.0%) | 4 (0.1%) | 2,436 (48.6%) | 23 (0.1%) | 2,463 (4.6%) |
| Language—missing | 23 (1.5%) | 67 (0.8%) | 111 (2.2%) | 399 (1.0%) | 600 (1.1%) |
| Age (years)—mean (SD) | 69.7 (± 10.2) | 71.2 (± 9.0) | 70.0 (± 10.1) | 71.3 (± 10.7) | 71.1 (± 10.4) |
| Education (years)—mean (SD) | 17.2 (± 9.4) | 14.7 (± 6.8) | 13.0 (± 10.6) | 16.4 (± 7.2) | 15.8 (± 7.7) |
| Clinical diagnosis—cognitively normal | 755 (50.5%) | 4262 (53.6%) | 2249 (44.9%) | 17,089 (43.9%) | 24,355 (45.6%) |
| Clinical diagnosis—mild cognitive impairment | 383 (25.6%) | 1956 (24.6%) | 1230 (24.6%) | 8408 (21.6%) | 11,977 (22.4%) |
| Clinical diagnosis—dementia | 356 (23.8%) | 1738 (21.8%) | 1531 (30.6%) | 13,425 (34.5%) | 17,050 (31.9%) |
Abbreviation: SD, standard deviation.
There were 3487 participants who reported a primary language other than English or did not report a primary language (other: N = 424; Spanish: N = 2463; Missing: N = 600). These participants were removed from the sample used for primary analyses. The final sample included 49,895 participants. Prior to model testing, the final sample was randomly divided into a learning sample for model development and a validation sample for model validation.
3.2. Best‐fitting model/configural model
We tested multiple alternative models in the full learning sample to determine the best‐fitting model to represent cognitive domains (Table 3). The first model, a single‐factor model without residual covariances, was the worst‐fitting model (1a). Incorporating a priori residual covariances into the single‐factor model significantly improved model fit (1b). Model fit subsequently improved across two‐, three‐, and four‐factor models. Two four‐factor models were evaluated: a model with Episodic Memory, Executive Function, Language, and Visuospatial factors (4a) and a model with Episodic Memory, Executive Function, Language, and Attention factors (4b). Model fit statistics favored the latter structure. In that model, Benson Complex Figure Copy (Immediate) was included in the Executive Function factor, and Benson Complex Figure Copy (Delayed) was included in the Episodic Memory factor. Similar to Kiselica et al., 20 five‐factor models including both Attention and Visuospatial factors were tested (5a, 5b); however, these models did not substantially improve model fit, and the Visuospatial factor was judged to lack adequate support because it included loadings only from the Benson Complex Figure Copy task. Therefore, given its superior fit compared to models with fewer factors and its theoretical support, the four‐factor model (Episodic Memory, Executive Function, Language, Attention) was chosen as the best‐fitting model and used in all further analyses.
TABLE 3.
Model fit statistics by sample and by configural model number.
| Model | Observations | χ² | DF | p | CFI | TLI | AIC | RMSEA | RMSEA p | SRMR |
|---|---|---|---|---|---|---|---|---|---|---|
| Models tested in the learning sample | ||||||||||
| 1a | 20769 | 46667 | 189 | <0.001 | 0.72 | 0.69 | 466714 | 0.11 | <0.001 | 0.14 |
| 1b | 20769 | 10496 | 178 | <0.001 | 0.94 | 0.93 | 430565 | 0.05 | <0.001 | 0.08 |
| 2 | 20769 | 5974 | 177 | <0.001 | 0.97 | 0.96 | 426046 | 0.04 | 1 | 0.07 |
| 3 | 20769 | 3558 | 175 | <0.001 | 0.98 | 0.98 | 423633 | 0.03 | 1 | 0.08 |
| 4a | 20769 | 3623 | 173 | <0.001 | 0.98 | 0.98 | 423702 | 0.03 | 1 | 0.08 |
| 4b | 20769 | 3028 | 172 | <0.001 | 0.98 | 0.98 | 423110 | 0.03 | 1 | 0.05 |
| 4c | 20769 | 2941 | 168 | <0.001 | 0.98 | 0.98 | 423030 | 0.03 | 1 | 0.05 |
| 4d | 20769 | 2951 | 170 | <0.001 | 0.98 | 0.98 | 423037 | 0.03 | 1 | 0.05 |
| 4e | 20769 | 3301 | 172 | <0.001 | 0.98 | 0.98 | 423383 | 0.03 | 1 | 0.05 |
| 4f | 20769 | 5548 | 173 | <0.001 | 0.97 | 0.96 | 425628 | 0.04 | 1 | 0.06 |
| 5a | 20769 | 2981 | 165 | <0.001 | 0.98 | 0.98 | 423077 | 0.03 | 1 | 0.06 |
| 5b | 20769 | 2992 | 167 | <0.001 | 0.98 | 0.98 | 423084 | 0.03 | 1 | 0.06 |
| Model tested in the validation sample | ||||||||||
| 4d | 20708 | 2772 | 170 | <0.001 | 0.98 | 0.98 | 420975 | 0.03 | 1 | 0.05 |
| Model tested in the full sample | ||||||||||
| 4d | 138247 | 19862 | 170 | <0.001 | 0.98 | 0.98 | 3007303 | 0.03 | 1 | 0.06 |
Abbreviations: AIC, Akaike information criterion; CFI, comparative fit index; DF, degrees of freedom; RMSEA, root mean square error of approximation; SRMR, standardized root mean square residual; TLI, Tucker–Lewis index.
We next evaluated potential cross‐loadings of fluency measures across Language and Executive Function factors. Including category and letter fluency measures on the Executive Function factor and not the Language factor resulted in worse model fit (4f). While including cross‐loadings of both types of fluency measures did improve model fit (4c), category fluency measures loaded only minimally onto the Executive Function factor (< 0.10). Including a cross‐loading of letter fluency measures on the Language and Executive Function factors (4d) improved model fit compared to the four‐factor model without cross‐loadings. Letter fluency measures loaded reasonably onto both Language and Executive Function factors (Letter Fluency on Language = 0.51 and 0.50; Letter Fluency on Executive Function = 0.21 and 0.24). This model had a similar fit in the learning and validation samples. In summary, a four‐factor model with cross‐loadings of letter fluency measures on Language and Executive Function factors and a priori residual structure was selected. Factors were strongly intercorrelated, ranging from 0.60 for Episodic Memory and Attention to 0.85 for Language and Executive Function in the full sample. Final model fit statistics in the full sample indicated good model fit. The best‐fitting four‐factor model is depicted in Figure 1.
FIGURE 1.

Best‐fitting configural model for the UDS neuropsychological test battery cognitive factors. Ovals represent latent (i.e., unobserved) factors. Rectangles represent observed variables; black outlines indicate that the measure was included in UDS version 2.0, red outlines indicate that the measure was included in UDS version 3.0, blue outlines indicate that measures were included in both UDS versions 2.0 and 3.0. An arrow represents a causal path. Double‐headed arrows connecting observed variables indicate covariance between observed variables. UDS, Uniform Data Set; WAIS‐R, Wechsler Adult Intelligence Scale–Revised.
3.3. Measurement invariance by race/ethnicity group
Table 4 presents the results of measurement invariance testing in the learning and validation samples. First, the best‐fitting four‐factor model fit the entire sample, undifferentiated by group, well and at about the same level as the configural model in the learning and validation samples. Second, the fit of the metric invariance model (all factor loadings constrained to equality across all groups while intercepts are freely estimated, except for identification constraints) was essentially the same as that of the configural model in both the learning and validation samples (|∆CFI| ≤ 0.001, ∆RMSEA = 0), supporting metric invariance. In contrast, the fit of the scalar invariance model (all loadings and intercepts constrained to equality across groups) was poorer than that of the metric model for ∆CFI (−0.012), which exceeded the cutoff (−0.01) for significantly worse fit. RMSEA was also worse (∆RMSEA = 0.007), but this difference did not exceed the cutoff (0.015). Because there was evidence of non‐invariant intercepts, we used iterative partial scalar models that freed or constrained one intercept parameter at a time to identify non‐invariant intercepts. Fit of the final partial scalar invariance model did not significantly differ from that of the metric model (∆CFI = −0.001, ∆RMSEA = 0), indicating partial scalar invariance.
TABLE 4.
Measurement invariance testing in the learning and validation samples of the final four-factor model.
| Model | 𝜒2 | DF | p | RMSEA | CFI | TLI |
|---|---|---|---|---|---|---|
| Models tested in the learning sample | ||||||
| Combined | 7988 | 170 | <0.001 | 0.031 | 0.982 | 0.977 |
| Configural | 4766 | 680 | <0.001 | 0.032 | 0.980 | 0.976 |
| Metric | 5021 | 737 | <0.001 | 0.031 | 0.980 | 0.977 |
| Partial scalar | 5094 | 766 | <0.001 | 0.031 | 0.979 | 0.977 |
| Scalar | 7492 | 788 | <0.001 | 0.038 | 0.968 | 0.966 |
| Models tested in the validation sample | ||||||
| Combined | 7988 | 170 | <0.001 | 0.031 | 0.982 | 0.977 |
| Configural | 4565 | 680 | <0.001 | 0.031 | 0.981 | 0.977 |
| Metric | 4854 | 737 | <0.001 | 0.031 | 0.980 | 0.977 |
| Partial scalar | 4985 | 766 | <0.001 | 0.030 | 0.979 | 0.977 |
| Scalar | 7278 | 788 | <0.001 | 0.037 | 0.968 | 0.966 |
Abbreviations: CFI, comparative fit index; DF, degrees of freedom; RMSEA, root mean square error of approximation; TLI, Tucker–Lewis index.
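The change-in-fit decision rule applied above can be sketched as a small check against the learning-sample values in Table 4. This is an illustrative sketch only, not the study's analysis pipeline; the cutoff values follow Chen (2007) as described in the text.

```python
# Minimal sketch of the change-in-fit decision rule used in the text (Chen,
# 2007): a more constrained model is flagged as fitting significantly worse
# when CFI drops by more than 0.01, with an RMSEA increase above 0.015 as a
# supporting criterion.
def fit_worsened(cfi_prev, cfi_new, rmsea_prev, rmsea_new,
                 cfi_cut=-0.01, rmsea_cut=0.015):
    delta_cfi = cfi_new - cfi_prev
    delta_rmsea = rmsea_new - rmsea_prev
    return delta_cfi < cfi_cut or delta_rmsea > rmsea_cut

# Learning-sample values from Table 4:
# scalar vs. metric -> worse fit (delta CFI = -0.012 exceeds the cutoff)
print(fit_worsened(0.980, 0.968, 0.031, 0.038))  # True
# partial scalar vs. metric -> comparable fit (delta CFI = -0.001)
print(fit_worsened(0.980, 0.979, 0.031, 0.031))  # False
```

The same comparisons reproduce the conclusions in the text: the full scalar model is rejected, while the partial scalar model is retained.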
The partial scalar model evaluation process revealed that all four cognitive factors had invariant and non‐invariant intercepts across race/ethnicity groups (Table 5). Four of the seven (4/7) observed variables contributing to the Episodic Memory factor, two of the four (2/4) observed variables for the Attention factor, four of the six (4/6) observed variables for the Executive Function factor, and five of the six (5/6) observed variables for the Language factor had non‐invariant intercepts. All four factors had at least one test variable with invariant intercepts across all four race/ethnicity groups, and many of the test variables with non‐invariant intercepts had invariant intercepts for the LatinX group compared to one or two other groups. This is important for model identification and for establishing a common metric that allows factor scores to be comparable across groups.
TABLE 5.
Freely estimated invariant and non-invariant (bolded) intercept parameters in the learning sample for the final four-factor multiple group model.
| Variable | White | Black | Asian | LatinX |
|---|---|---|---|---|
| Logical Memory IA (Immediate) | −0.153 (0.024) | −0.153 (0.024) | −0.153 (0.024) | −0.153 (0.024) |
| Logical Memory IIA (Delayed) | −0.169 (0.022) | −0.169 (0.022) | −0.169 (0.022) | −0.169 (0.022) |
| Craft Story 21 Recall (Immediate, Verbatim) | −0.351 (0.027) | −0.330 (0.029) | −0.405 (0.029) | −0.405 (0.029) |
| Craft Story 21 Recall (Immediate, Paraphrase) | −0.358 (0.028) | −0.378 (0.029) | −0.416 (0.029) | −0.416 (0.029) |
| Craft Story 21 Recall (Delayed, Verbatim) | −0.368 (0.026) | −0.368 (0.026) | −0.368 (0.026) | −0.368 (0.026) |
| Craft Story 21 Recall (Delayed, Paraphrase) | −0.381 (0.026) | −0.419 (0.027) | −0.361 (0.026) | −0.361 (0.026) |
| Benson Complex Figure Copy (Delayed) | −0.281 (0.020) | −0.154 (0.023) | −0.154 (0.023) | −0.154 (0.023) |
| WAIS‐R Digit Span (Backward) | −0.241 (0.026) | −0.241 (0.026) | −0.241 (0.026) | −0.241 (0.026) |
| Number Span Test (Backward) | −0.515 (0.029) | −0.515 (0.029) | −0.515 (0.029) | −0.515 (0.029) |
| WAIS‐R Digit Span (Forward) | −0.137 (0.021) | −0.129 (0.025) | −0.393 (0.032) | −0.393 (0.032) |
| Number Span Test (Forward) | −0.315 (0.024) | −0.263 (0.029) | −0.615 (0.031) | −0.615 (0.031) |
| Trail Making Test, Part B | −0.452 (0.032) | −0.452 (0.032) | −0.452 (0.032) | −0.452 (0.032) |
| WAIS‐R Digit Symbol Coding | −0.181 (0.029) | −0.088 (0.029) | 0.130 (0.050) | −0.088 (0.029) |
| Trail Making Test, Part A | −0.261 (0.026) | −0.261 (0.026) | −0.261 (0.026) | −0.261 (0.026) |
| Benson Complex Figure Copy (Immediate) | −0.217 (0.018) | −0.147 (0.021) | −0.147 (0.021) | −0.147 (0.021) |
| Letter Fluency (F‐words) | −0.758 (0.029) | −0.443 (0.025) | −0.443 (0.025) | −0.443 (0.025) |
| Letter Fluency (L‐words) | −0.753 (0.031) | −0.481 (0.025) | −0.481 (0.025) | −0.481 (0.025) |
| Boston Naming Test | −0.595 (0.028) | −0.587 (0.026) | −0.335 (0.027) | −0.335 (0.027) |
| Multilingual Naming Test | −0.740 (0.023) | −0.740 (0.023) | −0.740 (0.023) | −0.740 (0.023) |
| Category Fluency (Animals) | −0.865 (0.035) | −0.419 (0.030) | −0.123 (0.027) | −0.123 (0.027) |
| Category Fluency (Vegetables) | −0.852 (0.033) | −0.180 (0.027) | −0.061 (0.039) | −0.180 (0.027) |
Abbreviation: WAIS‐R, Wechsler Adult Intelligence Scale–Revised.
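To see how the group-specific intercepts in Table 5 translate into factor-score bias, the following sketch works through the linear measurement model y = τ_g + λη, under which the factor score implied by an observed score is η = (y − τ_g)/λ. The vegetable fluency intercepts are taken from Table 5, but the common loading and the pooled single-group intercept are hypothetical illustration values, not estimates from this study.

```python
# Illustrative sketch (not the authors' estimation code). Under a linear
# measurement model y = tau_g + loading * eta, the factor score implied by an
# observed score y is eta = (y - tau_g) / loading. Group intercepts for
# vegetable fluency are taken from Table 5; LOADING and TAU_POOLED are
# HYPOTHETICAL values chosen only to illustrate the direction of bias when a
# single pooled intercept is applied to every group.
LOADING = 0.70      # hypothetical common factor loading
TAU_POOLED = -0.12  # hypothetical single-group (combined) intercept

tau_group = {"White": -0.852, "Black": -0.180, "Asian": -0.061, "LatinX": -0.180}

def implied_eta(y, tau, loading=LOADING):
    """Language factor score implied by observed score y under intercept tau."""
    return (y - tau) / loading

y_obs = -0.5  # an arbitrary standardized vegetable fluency score
for group, tau in tau_group.items():
    bias = implied_eta(y_obs, TAU_POOLED) - implied_eta(y_obs, tau)
    direction = "overestimates" if bias > 0 else "underestimates"
    print(f"{group}: pooled intercept {direction} ability by {abs(bias):.2f} SD")
```

With these illustration values the sketch reproduces the qualitative pattern reported for the Language factor: substantial underestimation for the White group, slight underestimation for the Black group, and slight overestimation for the Asian group.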
Distribution plots (Figure 2A–D) illustrate how failure to account for non-invariant intercepts influenced cognitive factor scores across race/ethnicity groups. The multigroup partial scalar model accounts for measurement non-invariance and served as the comparison standard for the single-group (combined) model, which assumes measurement invariance. Factor scores from the two models were highly similar across all race/ethnicity groups for the Episodic Memory (Figure 2A) and Executive Function (Figure 2C) factors, but differed for the Attention (Figure 2B) and Language (Figure 2D) factors. For the Attention factor, failure to account for measurement non-invariance artificially widened the score range for the Asian group; stated differently, measurement bias inherent in the single-group model caused the range of true Attention abilities to be overestimated in the Asian group. For the Language factor, failure to account for measurement non-invariance resulted in lower Language factor scores for White and Black participants and slightly higher scores for Asian participants; stated differently, language ability was substantially underestimated in White participants, slightly underestimated in Black participants, and slightly overestimated in Asian participants.
FIGURE 2.

Distribution plots of cognitive factor scores for partial scalar invariance and single group (combined) models by race/ethnicity. Distribution plots for (A) Episodic Memory, (B) Attention, (C) Executive Function, and (D) Language display the range of true ability on each cognitive factor estimated by the partial scalar invariance model (top) and the single group (combined) model (bottom). Line color represents race/ethnicity groups.
Vegetable fluency showed especially strong measurement non‐invariance for the Language factor, and Figure 3 shows differential associations of the observed vegetable fluency score with the Language factor score across groups. In the White and Black groups, vegetable fluency in the single group model underestimated language ability compared to the partial scalar model standard. That is, for any vegetable fluency value, the corresponding Language factor score was lower for the single group model. A different way of characterizing this finding is that the difficulty of the vegetable fluency measure was underestimated for White and Black participants in the single group model; any given Language factor score value was associated with a higher observed vegetable fluency score. In contrast, vegetable fluency overestimated language ability in the single group model in Asian participants, and correspondingly, the difficulty of this item was overestimated in the single group model.
FIGURE 3.

Differential association of vegetable fluency score with Language factor score for partial scalar invariance and single‐group (combined) models by race/ethnicity. Each graph shows the association of the vegetable fluency score with the single group (combined; orange) and partial scalar invariance (teal) factor scores by race/ethnicity group. Item difficulty is represented by the value of the Language factor score (x axis) when the graphed curve intersects with the 0 value for vegetable fluency. A higher Language factor score indicates greater item difficulty. Dots represent the distribution of actual vegetable fluency scores and Language factor scores by model.
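The item-difficulty definition in the caption (the Language factor score at which the expected vegetable fluency score crosses zero) follows directly from the measurement model: η* = −τ_g/λ. The sketch below applies this to the Table 5 intercepts; the common loading and pooled intercept are hypothetical illustration values, not study estimates.

```python
# Sketch of the item-difficulty idea from the Figure 3 caption (not the
# authors' code). Under y = tau_g + loading * eta, an item's difficulty is the
# factor score at which the expected observed score crosses zero:
# eta_star = -tau_g / loading. Vegetable fluency intercepts are from Table 5;
# LOADING and TAU_POOLED are HYPOTHETICAL illustration values.
LOADING = 0.70      # hypothetical common loading
TAU_POOLED = -0.12  # hypothetical single-group (combined) intercept

tau_group = {"White": -0.852, "Black": -0.180, "Asian": -0.061, "LatinX": -0.180}

def difficulty(tau, loading=LOADING):
    """Factor score at which the expected observed score equals zero."""
    return -tau / loading

pooled = difficulty(TAU_POOLED)
for group, tau in tau_group.items():
    rel = "underestimates" if pooled < difficulty(tau) else "overestimates"
    print(f"{group}: group-specific difficulty {difficulty(tau):.2f}; "
          f"pooled model {rel} it at {pooled:.2f}")
```

Under these illustration values the pooled model underestimates the item's difficulty for the White group (whose group-specific difficulty is highest) and overestimates it for the Asian group, consistent with the pattern described for Figure 3.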
3.4. Measurement invariance by UDS version
We tested measurement invariance by UDS version (2.0 and 3.0) by constraining loadings and intercepts of version 2.0 and 3.0 analogs to equality in the final partial scalar model and comparing model fit to the final partial scalar model in which these parameters were freely estimated across versions. In the learning sample, fit of the partial scalar invariance model was adequate compared to the metric model (∆CFI = −0.01, ∆RMSEA = 0.005). Model fit for the partial scalar invariance model was similar across learning and validation samples (learning sample: 𝜒2 = 7094, df = 790, p < 0.001, RMSEA = 0.036, CFI = 0.970, TLI = 0.968; validation sample: 𝜒2 = 6808, df = 790, p < 0.001, RMSEA = 0.036, CFI = 0.971, TLI = 0.969).
3.5. Sensitivity analyses
We ran additional analyses assessing measurement equivalence in cognitively normal participants only. First, we applied the model developed in the all-diagnoses learning sample to the cognitively normal learning subsample. Table S3 in supporting information summarizes model fit in these samples. The configural, metric, and partial scalar models did not significantly differ across the two samples by the ∆CFI and ∆RMSEA metrics, but fit of the scalar model was significantly poorer by the ∆CFI criterion.
We also performed the iterative partial scalar invariance testing process in the cognitively normal sample. As in the model including all participants, when invariance testing by race/ethnicity was completed in cognitively normal participants only, fit of the partial scalar invariance model was adequate compared to the metric model (∆CFI = −0.001, ∆RMSEA = 0), suggesting that partial scalar invariance was present (Table S4 in supporting information). As in the main analysis model, automated model selection revealed that all four cognitive factors had invariant and non-invariant intercepts across race/ethnicity groups (Table S5 in supporting information), and the same number of non-invariant intercepts was present per factor. In contrast to the main analysis model with all diagnoses, letter fluency rather than category fluency measures displayed especially strong measurement non-invariance, particularly affecting estimation for the White group.
Finally, we estimated the final all‐diagnoses partial scalar model in a sample that included non‐English (primarily Spanish) language of test administration. Model fit was the same as in the English‐only sample (RMSEA = 0.031, CFI = 0.979, TLI = 0.977).
4. DISCUSSION
Measurement bias can lead to erroneous conclusions in research on individuals from diverse backgrounds and can also contribute to clinical errors. The present study examined the dimensional structure and factor equivalence of the NACC UDS neuropsychological test battery across different race/ethnicity groups and UDS versions 2.0 and 3.0. Our study revealed that a four-factor model (Episodic Memory, Attention, Executive Function, Language) best explained the correlations among neuropsychological tests from the UDS version 2.0 and 3.0 test batteries. This four-factor model was shown to apply to the four race/ethnicity groups included in this study and satisfied requirements for partial scalar measurement invariance. This means that the factors are measured on the same scale or metric across groups, making it possible to directly compare factor scores. Our results also showed that UDS version 2.0 and 3.0 analogs have invariant measurement properties with respect to this four-factor model. Stated differently, these variables related to the four underlying factors in the same way, such that the UDS version 2.0 and 3.0 tests are interchangeable in terms of measuring these cognitive factors. Thus, the same factor model can be used to summarize neuropsychological test performance across UDS versions when adjusting for very few non-invariant intercepts. This modeling technique can enable more accurate and larger scale studies of cognitive aging using a common metric across UDS test versions.
Past literature is mixed regarding the dimensional structure of the UDS and similar neuropsychological test batteries. 19 , 20 For example, studies examining the dimensional structure of the UDS neuropsychological test battery have variably included a visuospatial domain. 18 , 19 , 20 There are only two measures within the UDS version 2.0 and 3.0 neuropsychological test batteries that we considered including in the visuospatial domain: Benson Complex Figure Copy Immediate and Delayed Recall. Similar to Hayden et al., who evaluated the dimensional structure of the UDS version 2.0 battery, we included Benson Complex Figure Copy (Delayed) in our Episodic Memory domain, given that this measure has both visual and episodic memory components. 19 , 34 , 35 , 36 , 37 Benson Complex Figure Copy (Immediate), which also involves executive skills such as planning and organization, 38 , 39 loaded reasonably onto the Executive Function factor. With only one neuropsychological test contributing loadings to the Visuospatial factor and little change to model fit statistics when it was included, there was not adequate support for inclusion of a Visuospatial factor. Regarding fluency, researchers have implemented many different loading structures to best represent category and letter fluency measures. 12 , 13 , 18 , 20 In the present study, although cross‐loadings of both fluency measures improved model fit, category fluency measures loaded only minimally onto our Executive Function factor. In comparison, letter fluency loaded reasonably onto both Language and Executive Function factors. Thus, our best‐fitting model included cross‐loadings of letter fluency, not category fluency, onto Language and Executive Function factors.
Measurement invariance testing indicated partial scalar invariance of the UDS neuropsychological test battery by race/ethnicity in this sample of older adults across the AD and related dementias (ADRD) spectrum (cognitively normal, mild cognitive impairment, dementia). Partial scalar invariance means that some test scores measure the same latent factors across different groups and have the same relationship with the latent abilities being measured, while scores of the tests with non-invariant intercepts are not on the same scale in different groups. For example, participants in the White group in this study who had average Language factor ability had lower vegetable fluency scores than did individuals from the other groups who had average Language ability (Table 5, −0.85 standard deviations in White participants, −0.18 in Black and LatinX participants, −0.06 in Asian participants). This indicates that the absolute scores of some, but not all, observed variables can be directly compared across groups. 12 The Attention and Language factors had a greater number of non-invariant intercepts than the Episodic Memory and Executive Function factors. Vegetable fluency showed especially strong measurement non-invariance for the Language factor. Failure to account for measurement non-invariance (inherent in the combined, single-group model) resulted in the vegetable fluency score underestimating actual Language ability (optimally estimated by the partial scalar model) in the White and Black groups, and slightly overestimating Language ability in the Asian group. Sensitivity analyses in a sample of cognitively normal participants revealed nearly identical results by race/ethnicity group, with one major exception: letter fluency, rather than category fluency, displayed especially strong measurement non-invariance, a pattern not present in the all-diagnoses sample. Possible sources for this discrepancy include sample size, differences in race/ethnicity composition across cognitive status groups, other sources of bias (e.g., recruitment and enrollment differences), or some combination of factors that differently influenced derived parameter estimates. 40 , 41
The finding that several strongly non‐invariant intercepts were present for variables contributing to the Language factor is largely consistent with prior measurement invariance studies of neuropsychological batteries stratified by key sociodemographic variables (i.e., sex, education, race/ethnicity, language). In sex‐based comparisons, Kiselica et al. found evidence for metric invariance of the NACC UDS version 3.0 neuropsychological test battery, with particularly notable non‐invariance of the intercept of vegetable fluency. 20 Using the WHICAP cohort, Siedlecki et al. found several non‐invariant intercepts, including on their Language factor across English‐ and Spanish‐speaking samples (i.e., naming, letter fluency, category fluency, comprehension, and similarities), which were attenuated within education‐matched language samples. 42 Avila et al. also evaluated measurement invariance in the WHICAP cohort by race/ethnicity and sex/gender groups and found non‐invariant intercepts on their Language factor (e.g., similarities and naming) with differing intercepts across non‐Hispanic White female, non‐Hispanic White male, and Hispanic female participants. 12 Zahodne et al. examined structural and measurement invariance of the neuropsychological battery from the Detroit Area Wellness Network by race/ethnicity group (Middle Eastern/North African, Black, White) and language (Arabic, English), and found partial scalar invariance across race/ethnicity groups, suggesting a White testing advantage on a measure of animal fluency. 43
The present study adds to prior work in several ways. We leveraged the UDS crosswalk study to compile the largest study to date of older adults with different race/ethnicity backgrounds and cognitive status classifications (normal cognition, mild cognitive impairment, and dementia). The large sample size allowed us to derive models in one randomly selected sample and validate these models in a different randomly selected sample. We provide robust evidence for partial scalar invariance across UDS versions and race/ethnicity groups, suggesting the UDS can offer an unbiased assessment of domain‐specific cognitive abilities such that the factor scores from the partial scalar model are directly comparable across individuals from different race/ethnicity groups. Moreover, findings suggest that some individual test scores (those with invariant intercepts) can be directly compared across groups, but that direct comparison is problematic for other test scores (those with non‐invariant intercepts). Use of the partial scalar model developed and tested in this study allows for an unbiased assessment of domain‐specific cognitive abilities through the free estimation of these non‐invariant intercepts.
Our results showed that UDS version 2.0 and 3.0 neuropsychological test variables are interchangeable for measuring the domain‐specific cognitive abilities represented by the four‐factor model structure. Monsell et al. reported a high correlation of proprietary measures of UDS version 2.0 and their non‐proprietary UDS version 3.0 analogs. 16 Additionally, Culhane et al. investigated the common underlying dimensional structure of UDS versions 2.0 and 3.0 using dimensional analysis. 17 Mukherjee et al. have created co‐calibrated cognitive domain scores across multiple national aging cohorts, including NACC, using UDS versions 1.0, 2.0, and 3.0, to permit their direct comparison. 18 Our study contributes to this body of work by showing that cognitive factor scores are interchangeable across UDS versions 2.0 and 3.0 and can likely be used to track longitudinal change of individuals completing different UDS neuropsychological test batteries across visits. Moreover, given the limited revisions that were made to the UDS version 3.0 neuropsychological test battery in the recently released UDS version 4.0, the model developed in our study is likely to be applicable to new NACC participants with minimal amendment.
However, this study is not without limitations. We did not explicitly test for longitudinal structural and measurement invariance of our four‐factor model. Unlike other studies of UDS 2.0 and 3.0 factor structure, we did not test a higher order model of cognition including a second order, general cognitive factor, which may be viewed as a potential limitation. 20 , 44 However, the purpose of the present study was to characterize domain‐specific cognitive abilities as we expect domain specificity in the relationship of cognitive ability to biopsychosocial factors thought to influence cognitive aging (e.g., AD pathology, cardiovascular health).
In summary, we leveraged the large NACC UDS dataset to evaluate measurement invariance across the multiple race/ethnicity groups represented by ADRC cohorts across the United States. Although measurement differences were found for some of the UDS neuropsychological tests, we found that the same four domain‐specific dimensions were present and can be comparably measured across UDS versions and race/ethnicity groups, and that cross‐group comparison of associations of cognitive factors with outside variables is permissible. 13 This study also identified individual test scores for which direct comparison across groups is problematic. Clinicians and researchers should use caution when comparing these scores across individuals from different groups. As an alternative, researchers are encouraged to use factor scores that account for measurement non‐invariance or compare only individual tests that were shown to be invariant across groups.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest. Author disclosures are available in the Supporting Information.
CONSENT STATEMENT
Informed written consent is obtained from all participants. The NACC database is exempt from institutional review board review and approval because the NACC database provides deidentified data for secondary analyses.
Supporting information
Supporting Information
ACKNOWLEDGMENTS
The NACC database is funded by NIA/NIH Grant U24 AG072122. NACC data are contributed by the NIA‐funded ADRCs: P30 AG062429 (PI James Brewer, MD, PhD), P30 AG066468 (PI Oscar Lopez, MD), P30 AG062421 (PI Bradley Hyman, MD, PhD), P30 AG066509 (PI Thomas Grabowski, MD), P30 AG066514 (PI Mary Sano, PhD), P30 AG066530 (PI Helena Chui, MD), P30 AG066507 (PI Marilyn Albert, PhD), P30 AG066444 (PI David Holtzman, MD), P30 AG066518 (PI Lisa Silbert, MD, MCR), P30 AG066512 (PI Thomas Wisniewski, MD), P30 AG066462 (PI Scott Small, MD), P30 AG072979 (PI David Wolk, MD), P30 AG072972 (PI Charles DeCarli, MD), P30 AG072976 (PI Andrew Saykin, PsyD), P30 AG072975 (PI Julie A. Schneider, MD, MS), P30 AG072978 (PI Ann McKee, MD), P30 AG072977 (PI Robert Vassar, PhD), P30 AG066519 (PI Frank LaFerla, PhD), P30 AG062677 (PI Ronald Petersen, MD, PhD), P30 AG079280 (PI Jessica Langbaum, PhD), P30 AG062422 (PI Gil Rabinovici, MD), P30 AG066511 (PI Allan Levey, MD, PhD), P30 AG072946 (PI Linda Van Eldik, PhD), P30 AG062715 (PI Sanjay Asthana, MD, FRCP), P30 AG072973 (PI Russell Swerdlow, MD), P30 AG066506 (PI Glenn Smith, PhD, ABPP), P30 AG066508 (PI Stephen Strittmatter, MD, PhD), P30 AG066515 (PI Victor Henderson, MD, MS), P30 AG072947 (PI Suzanne Craft, PhD), P30 AG072931 (PI Henry Paulson, MD, PhD), P30 AG066546 (PI Sudha Seshadri, MD), P30 AG086401 (PI Erik Roberson, MD, PhD), P30 AG086404 (PI Gary Rosenberg, MD), P20 AG068082 (PI Angela Jefferson, PhD), P30 AG072958 (PI Heather Whitson, MD), P30 AG072959 (PI James Leverenz, MD). This work was supported by and occurred as part of the 2021 Advanced Psychometric Methods in Cognitive Aging Research conference funded by the National Institute on Aging (NIA; R13 AG030995, D Mungas, PI).
Gaynor LS, Lopez FV, Van Hulle CA, et al. Measurement equivalence of the UDS version 2.0 and 3.0 neuropsychological batteries. Alzheimer's Dement. 2025;21:e70720. 10.1002/alz.70720
REFERENCES
- 1. Hill CV, Perez-Stable EJ, Anderson NA, Bernard MA. The National Institute on Aging Health Disparities Research framework. Ethn Dis. 2015;25(3):245-254. doi: 10.18865/ed.25.3.245
- 2. Avila JF, Renteria MA, Jones RN, et al. Education differentially contributes to cognitive reserve across racial/ethnic groups. Alzheimers Dement. 2021;17(1):70-80. doi: 10.1002/alz.12176
- 3. Manly JJ, Jacobs DM, Sano M, et al. Effect of literacy on neuropsychological test performance in nondemented, education-matched elders. J Int Neuropsychol Soc. 1999;5(3):191-202. doi: 10.1017/s135561779953302x
- 4. Weuve J, Barnes LL, Mendes de Leon CF, et al. Cognitive aging in Black and White Americans: cognition, cognitive decline, and incidence of Alzheimer disease dementia. Epidemiology. 2018;29(1):151-159. doi: 10.1097/EDE.0000000000000747
- 5. Weuve J, Proust-Lima C, Power MC, et al. Guidelines for reporting methodological challenges and evaluating potential bias in dementia research. Alzheimers Dement. 2015;11(9):1098-1109. doi: 10.1016/j.jalz.2015.06.1885
- 6. Beller M, Gafni N, Hanani P. Constructing, adapting, and validating admissions tests in multiple languages: the Israeli case. In: Hambleton RK, Merenda PF, Spielberger CD, eds. Adapting Educational and Psychological Tests for Cross-Cultural Assessment. Lawrence Erlbaum Associates; 2005:297-319.
- 7. Horn JL, McArdle JJ. A practical and theoretical guide to measurement invariance in aging research. Exp Aging Res. 1992;18(3-4):117-144. doi: 10.1080/03610739208253916
- 8. Pedraza O, Mungas D. Measurement in cross-cultural neuropsychology. Neuropsychol Rev. 2008;18(3):184-193. doi: 10.1007/s11065-008-9067-9
- 9. Barnes LL, Yumoto F, Capuano A, Wilson RS, Bennett DA, Tractenberg RE. Examination of the factor structure of a global cognitive function battery across race and time. J Int Neuropsychol Soc. 2016;22(1):66-75. doi: 10.1017/S1355617715001113
- 10. Blankson AN, McArdle JJ. Measurement invariance of cognitive abilities across ethnicity, gender, and time among older Americans. J Gerontol B Psychol Sci Soc Sci. 2015;70(3):386-397. doi: 10.1093/geronb/gbt106
- 11. Dolan CV. Investigating Spearman's hypothesis by means of multi-group confirmatory factor analysis. Multivariate Behav Res. 2000;35(1):21-50. doi: 10.1207/S15327906MBR3501_2
- 12. Avila JF, Renteria MA, Witkiewitz K, Verney SP, Vonk JMJ, Manly JJ. Measurement invariance of neuropsychological measures of cognitive aging across race/ethnicity by sex/gender groups. Neuropsychology. 2020;34(1):3-14. doi: 10.1037/neu0000584
- 13. Mungas D, Widaman KF, Reed BR, Tomaszewski Farias S. Measurement invariance of neuropsychological tests in diverse older persons. Neuropsychology. 2011;25(2):260-269. doi: 10.1037/a0021090
- 14. Vandenberg RJ, Lance CE. A review and synthesis of the measurement invariance literature: suggestions, practices, and recommendations for organizational research. Organ Res Methods. 2000;3(1):4-70. doi: 10.1177/109442810031002
- 15. Morris JC, Weintraub S, Chui HC, et al. The Uniform Data Set (UDS): clinical and cognitive variables and descriptive data from Alzheimer Disease Centers. Alzheimer Dis Assoc Disord. 2006;20(4):210-216. doi: 10.1097/01.wad.0000213865.09806.92
- 16. Monsell SE, Dodge HH, Zhou XH, et al. Results from the NACC Uniform Data Set neuropsychological battery crosswalk study. Alzheimer Dis Assoc Disord. 2016;30(2):134-139. doi: 10.1097/WAD.0000000000000111
- 17. Culhane JE, Chan KCG, Teylan MA, et al. Factor consistency of neuropsychological test battery versions in the NACC Uniform Data Set. Alzheimer Dis Assoc Disord. 2020;34(2):175-177. doi: 10.1097/WAD.0000000000000376
- 18. Mukherjee S, Choi S-E, Lee ML, et al. Cognitive domain harmonization and cocalibration in studies of older adults. Neuropsychology. 2023;37(4):409. doi: 10.1037/neu0000835
- 19. Hayden KM, Jones RN, Zimmer C, et al. Factor structure of the National Alzheimer's Coordinating Centers uniform dataset neuropsychological battery: an evaluation of invariance between and within groups over time. Alzheimer Dis Assoc Disord. 2011;25(2):128-137. doi: 10.1097/WAD.0b013e3181ffa76d
- 20. Kiselica AM, Webber TA, Benge JF. The Uniform Dataset 3.0 neuropsychological battery: factor structure, invariance testing, and demographically adjusted factor score calculation. J Int Neuropsychol Soc. 2020;26(6):576-586. doi: 10.1017/S135561772000003X
- 21. Weintraub S, Besser L, Dodge HH, et al. Version 3 of the Alzheimer disease centers' neuropsychological test battery in the Uniform Data Set (UDS). Alzheimer Dis Assoc Disord. 2018;32(1):10-17. doi: 10.1097/WAD.0000000000000223
- 22. Weintraub S, Salmon D, Mercaldo N, et al. The Alzheimer's disease centers' Uniform Data Set (UDS): the neuropsychological test battery. Alzheimer Dis Assoc Disord. 2009;23(2):91-101. doi: 10.1097/WAD.0b013e318191c7dd
- 23. Blom G. Statistical Estimates and Transformed Beta-Variables. Wiley; 1958.
- 24. Bentler PM. Comparative fit indexes in structural models. Psychol Bull. 1990;107(2):238-246. doi: 10.1037/0033-2909.107.2.238
- 25. Tucker LR, Lewis C. A reliability coefficient for maximum likelihood factor analysis. Psychometrika. 1973;38(1):1-10. doi: 10.1007/BF02291170
- 26. Cudeck R, Browne MW. Cross-validation of covariance structures. Multivariate Behav Res. 1983;18(2):147-167. doi: 10.1207/s15327906mbr1802_2
- 27. Enders CK, Bandalos DL. The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Struct Equat Model. 2001;8(3):430-457.
- 28. Little TD, Jorgensen TD, Lang KM, Moore EWG. On the joys of missing data. J Pediatr Psychol. 2014;39(2):151-162.
- 29. RStudio: Integrated Development for R. RStudio, PBC; 2020.
- 30. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2025. https://www.R-project.org/
- 31. Rosseel Y. lavaan: an R package for structural equation modeling. J Stat Softw. 2012;48(2):1-36. doi: 10.18637/jss.v048.i02
- 32. Widaman KF, Reise SP. Exploring the measurement invariance of psychological instruments: applications in the substance use domain. In: The Science of Prevention: Methodological Advances from Alcohol and Substance Abuse Research. American Psychological Association; 1997:281-324.
- 33. Chen FF. Sensitivity of goodness of fit indexes to lack of measurement invariance. Struct Equat Model. 2007;14(3):464-504. doi: 10.1080/10705510701301834
- 34. Dodge HH, Goldstein FC, Wakim NI, et al. Differentiating among stages of cognitive impairment in aging: version 3 of the Uniform Data Set (UDS) neuropsychological test battery and MoCA index scores. Alzheimers Dement. 2020;6(1):e12103. doi: 10.1002/trc2.12103
- 35. Gross AL, Khobragade PY, Meijer E, Saxton JA. Measurement and structure of cognition in the Longitudinal Aging Study in India-Diagnostic Assessment of Dementia. J Am Geriatr Soc. 2020;68(Suppl 3):S11-S19. doi: 10.1111/jgs.16738
- 36. Larrabee GJ, Kane RL, Schuck JR. Factor analysis of the WAIS and Wechsler Memory Scale: an analysis of the construct validity of the Wechsler Memory Scale. J Clin Neuropsychol. 1983;5(2):159-168. doi: 10.1080/01688638308401162
- 37. Pedraza O, Lucas JA, Smith GE, et al. Mayo's Older African American Normative Studies: confirmatory factor analysis of a core battery. J Int Neuropsychol Soc. 2005;11(2):184-191. doi: 10.1017/s1355617705050204
- 38. Freeman RQ, Giovannetti T, Lamar M, et al. Visuoconstructional problems in dementia: contribution of executive systems functions. Neuropsychology. 2000;14(3):415-426. doi: 10.1037//0894-4105.14.3.415
- 39. Avila RT, de Paula JJ, Bicalho MA, et al. Working memory and cognitive flexibility mediates visuoconstructional abilities in older adults with heterogeneous cognitive ability. J Int Neuropsychol Soc. 2015;21(5):392-398. doi: 10.1017/S135561771500034X
- 40. Chan CK, Lane KA, Gao S, et al. Referral sources across racial and ethnic groups at Alzheimer's Disease Research Centers. J Alzheimers Dis. 2024;101(4):1167-1176. doi: 10.3233/jad-240485
- 41. Hou CE, Yaffe K, Pérez-Stable EJ, Miller BL. Frequency of dementia etiologies in four ethnic groups. Dement Geriatr Cogn Disord. 2006;22(1):42-47. doi: 10.1159/000093217
- 42. Siedlecki KL, Manly JJ, Brickman AM, Schupf N, Tang MX, Stern Y. Do neuropsychological tests have the same meaning in Spanish speakers as they do in English speakers? Neuropsychology. 2010;24(3):402-411. doi: 10.1037/a0017515
- 43. Zahodne LB, Brauer S, Tarraf W, Morris EP, Antonucci TC, Ajrouch KJ. Measurement and structural invariance of a neuropsychological battery among Middle Eastern/North African, Black, and White older adults. Neuropsychology. 2023;37(8):975-984. doi: 10.1037/neu0000902
- 44. Gavett BE, Vudy V, Jeffrey M, John SE, Gurnani AS, Adams JW. The delta latent dementia phenotype in the uniform data set: cross‐validation and extension. Neuropsychology. 2015;29(3):344‐352. doi: 10.1037/neu0000128 [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
Supplementary Materials
Supporting Information
