Author manuscript; available in PMC: 2025 Sep 23.
Published before final editing as: Alzheimers Dement Behav Socioecon Aging. 2025 Sep 17;1(3). doi: 10.1002/bsa3.70037

Cognitive data harmonization across two racially diverse cohorts in the United States

Michelle Flesaker 1,2,*, A Zarina Kraal 3, Justina F Avila-Rieger 4, M Maria Glymour 2, Jaimie L Gradus 1,2, Emily M Briceño 5, Jennifer J Manly 3, Lindsay C Kobayashi 6, Marcia Pescador Jimenez 2
PMCID: PMC12453055  NIHMSID: NIHMS2110413  PMID: 40988999

Abstract

INTRODUCTION:

Few cohorts have sufficient diversity to identify drivers of racial disparities in cognitive aging. Pooling data from different samples can increase sample size and diversity.

METHODS:

We statistically harmonized cognitive function data from two US cohorts: 2010 Health and Retirement Study (HRS; n=18,422) and 2009–2013 Reasons for Geographic and Racial Differences in Stroke waves (REGARDS; n=19,690). We used confirmatory factor analysis (CFA) to derive harmonized scores for general and domain-specific cognitive function, leveraging common cognitive test items across studies and retaining those unique to each study. We assessed validity of the cognitive scores by regressing them on age, sex/gender, and education.

RESULTS:

The combined sample had a mean age of 67.69 (SD=10.22) years. CFA models had good fit. Harmonized cognitive scores demonstrated good criterion validity.

DISCUSSION:

Pooled analyses of harmonized cognitive scores are a feasible means to increase cohort diversity for understanding drivers of racial disparities in cognitive aging.

Keywords: statistical harmonization, cognitive function, Alzheimer’s, dementia, older adults

1. Background

Black and Hispanic individuals face the highest risk of Alzheimer’s disease and related dementias (AD/ADRD) in the US, relative to individuals from other racial and ethnic groups.1–3 Impairment of cognitive function (i.e., cognitive aging) is a hallmark of AD/ADRD.4 However, the majority of cognitive aging research in the US centers on White older adults, and, as a result, causes of disparities in cognitive function have been understudied among older adults of other racial identities. This is an important gap in the literature.

Despite the growing ethnic and racial diversity of the US and the increasing proportion of older adults in the population, the lack of national data with both substantial racial diversity and assessment of cognitive function is the largest barrier to advancing understanding of racial and ethnic disparities in cognitive aging. Individual cohorts, even with oversampling of Black individuals, often include too few Black respondents to deliver precise information on drivers of declining cognitive function in Black adults or determinants of disparities. In the US, the Black population has grown by more than 10 million since 2000, marking a 32% increase over roughly two decades,5 while the Hispanic population accounted for just under 71% of the overall population growth between 2022 and 2023.6 Although the Health and Retirement Study (HRS) oversamples Black and Hispanic adults, the resulting sample proportions remain below national proportions, and there is no oversampling of other racial or ethnic minorities.

Inconsistencies across studies in associations between race and ethnicity and risk factors for cognitive aging could be due to methodological reasons such as differing sampling strategies across studies, regional variability, and use of different cognitive tests that vary in their measurement of cognitive function. The use of statistical harmonization with assessment of differential item functioning (DIF) across large-scale studies using representative samples from different geographic regions to obtain harmonized cognitive scores would be especially relevant to clarifying how risk factors may affect AD/ADRD racial and ethnic disparities. Previous research using the Health and Retirement Study (HRS) International Partner Studies has established that it is possible to combine existing cognitive data sets across HRS cohorts with adequate assessment of cognitive function across racial and ethnic groups.7–9 Differences in cognitive function assessments between cohorts pose challenges for combining studies, as bias due to incommensurate measurement by test items can distort true group differences in cognitive function.10 However, statistical harmonization methods facilitate the evaluation of questions related to racial disparities in cognitive aging and expand studies to questions that require more statistical power and increased generalizability.10 Previous work using statistical harmonization methods to combine cognitive data from different cohorts has established feasibility in certain samples, including diverse groups of participants.7–9,11–15 Producing harmonized factor scores combining HRS and REGARDS, two racially diverse cohorts in the US, provides a large and robust foundation upon which to conduct studies exploring determinants of cognitive aging in racially minoritized populations.

To address this gap in the field, we statistically harmonized general and domain-specific cross-sectional cognitive function measurements from well-established and racially diverse cohorts in the United States: HRS and the Reasons for Geographic and Racial Differences in Stroke (REGARDS) Study. In our first aim, we derived statistically harmonized factor scores for general and domain-specific (memory, language, and orientation) cognitive function. Then, we performed criterion validation of the resulting harmonized factor scores by examining their relationships with age, sex/gender, and educational attainment, as these variables are known to be associated with cognitive function. These harmonization methods and our resulting cognitive scores enhance the power of pooled cohorts to advance understanding of drivers of racial/ethnic disparities in cognitive health.

2. Methods

2.1. Data sources

HRS is a US-based longitudinal cohort study of initially non-institutionalized adults 51 years of age and older and their spouses of any age. The cohort is described in detail elsewhere.16 In brief, investigators regularly gather information on financial, social, and physical wellbeing relevant to aging outcomes via in-person and telephone surveys. This nationally representative cohort of the contiguous 48 states oversamples Florida as well as neighborhoods with a high proportion of minority residents. Core survey data are collected every 2 years, starting in 1992. For the purposes of this analysis, we used data from the HRS 2010 Core, which fielded data collection between February 2010 and November 2011.17 We chose the 2010 Core for this harmonization procedure because HRS included several new cognitive function instruments in the 2010 core, including a verbal fluency test (animal naming) that was comparable with a measure in REGARDS. Additionally, the 2010 core was closest in time to the first instance of the full battery of REGARDS baseline cognitive assessments, which began in 2009. All participants provided informed consent and IRB approval was granted through the University of Michigan.

REGARDS is a longitudinal cohort study of non-Hispanic African American and White adults living in the Southeastern U.S. which enrolled participants between 2003–2007. Details on the REGARDS design are described in existing published work.18 Participants were initially sampled randomly and recruited by mail, then administered an in-person baseline survey. REGARDS interviewers asked participants about their demographic, social, psychological, and physical health at baseline, and participants receive follow-up calls every 6 months. The battery of cognitive tests included in the present study was initially implemented in follow-up calls between 2006 and 2009 and synchronized in March 2009 so that the full battery was administered at every 18-month follow-up call. We selected the timing of the REGARDS cognitive tests for the present study to balance comparability with HRS, in which all included participants responded to cognitive tests from 2010–2011. Data included from REGARDS are from the first instance in which each participant was administered the cognitive test battery after these tests were all included and synchronized on the same follow-up schedule in 2009. Because some participants did not respond to their first scheduled instance of the synchronized cognitive test battery, the final data set included REGARDS data collected between March 2009 and December 31st, 2013. All participants of REGARDS provided informed consent and IRB approval was received at all participating institutions.

2.2. Participants

Our analytic sample from HRS included participants aged 47 and older who did not report an Alzheimer’s disease or dementia diagnosis at the time of the 2010 Core interview and completed the interview in English. To maximize comparability with the REGARDS cohort (aged 47 and older), we excluded HRS participants younger than 47 years (n = 525); this exclusion applied only to participants’ spouses, as the main cohort included only individuals 51 years and older. For the statistical harmonization process, we excluded individuals who used a proxy respondent to the HRS survey, as these proxies did not complete the same battery of cognitive tests as direct respondents. We also excluded participants who were missing all cognitive tests of interest, but included all remaining participants, regardless of whether the interview was by telephone or in person. Our final analytic sample from HRS included 18,422 participants.

Our analytic sample from REGARDS included participants free of cognitive impairment at baseline entry into the cohort (2003–2007). By design of the study, participants were 47 years of age and older, responded to the cognitive tests by telephone, did not have proxy responders, and were interviewed in English. Our final analytic sample from REGARDS included 19,690 participants.

2.3. Measures & pre-statistical harmonization

The pre-statistical harmonization methodology used in this project built upon a similar effort using data from the Harmonized Cognitive Assessment Protocol (HCAP), which facilitates cross-national comparisons of cognitive outcomes among older adults across populations with diverse cultural, educational, social, economic, and political contexts.19 We assigned cognitive tests to domains (memory, language, and orientation) based on previous literature7 in consultation with a team of neuropsychologists (authors AZK, JFAR, JJM, and EMB) specialized in cognitive aging and cultural neuropsychology. Then, for each cognitive test in each cohort, we gathered all relevant details on the test wording, scoring, coding, and administration using publicly available materials (See Supplementary Table 1). Based on our pre-statistical assessment of the comparability of measures across cohorts, we identified which cognitive tests were unique to each cohort (“non-linking”) and which were equivalent between cohorts (“linking”), assessed cross-sectionally (Figure 1).

Figure 1.


Cross-sectional cognitive test item indicators and comparability after pre-statistical harmonization between 2010 Core Health and Retirement Study (HRS; n = 18,422) and 2009–2013 data from the Reasons for Geographic and Racial Differences in Stroke (REGARDS; n = 19,690) cohorts

NINDS = National Institute of Neurological Disorders and Stroke

Cells with check marks indicate the presence of that particular test in this cohort

a HRS cognitive tests listed here were measured every 2 years beginning in 1998 or later, with the exception of the Animal Fluency Test (starred) which began data collection in 2010.

b REGARDS cognitive tests listed here were measured every 2 years beginning at month 18 of follow-up, beginning in 2008; data utilized were collected between March 2009 and December 31st, 2013

c Immediate and delayed word recall tasks vary between HRS and REGARDS: participants receive one learning trial in HRS and three in REGARDS; in both cohorts, participants recall a list of 10 words after a delay of approximately 5 minutes

We included additional variables to characterize the sample and examine the criterion validity of the harmonized factor scores. Age was self-reported in HRS and REGARDS and measured continuously in years at the time of cognitive tests. Sex/gender was self-reported by individuals and classified as female or male.20 In HRS, race was self-reported by individuals and classified as White, Black/African American, or other; HRS masks other races due to low sample size. Ethnicity was self-reported by participants and classified as Hispanic/Latino or non-Hispanic/Latino. In REGARDS, race was self-reported by individuals and classified as White or Black/African American, and all participants were non-Hispanic/Latino. Educational attainment was self-reported by participants and categorized as less than high school, high school degree or equivalent, some college, and college graduate and above.

2.4. Statistical analysis

First, we used descriptive statistics (means and standard deviations [SDs] for continuous variables, and frequency and percentage for categorical variables) to characterize the sample by age, sex/gender, race, ethnicity, educational attainment, and mode of cognitive test administration.

Next, following previously used methodology using the HCAP data,7 we implemented confirmatory factor analysis (CFA) models to derive statistically harmonized factor scores characterizing cognitive function overall and by domain (Figure 2). First, we ran a CFA model in HRS and saved the parameters (factor loadings and intercepts/thresholds) from the model in an item bank. Factor loadings represent the extent to which each item correlates with the underlying factor, where a loading above 0.30 indicates a meaningful relationship between the item and the trait. Thresholds (for categorical items) and intercepts (for continuous items) represent the latent factor score at which the likelihood of responding with that accuracy or higher is 50%.13 Second, we ran a CFA model using REGARDS data in which we constrained parameters for linking items to be equivalent between HRS and REGARDS. For models in each separate cohort, we evaluated model fit. We deemed each model to have adequate fit if it had Comparative Fit Index (CFI) values ≥ 0.90, Root Mean Square Error of Approximation (RMSEA) values ≤ 0.08, and Standardized Root Mean Square Residual (SRMR) values ≤ 0.08, in each cohort separately, based on previous work.7 We removed two items (state and city naming in REGARDS) due to low standardized factor loadings (<0.20). Because initial model fit was poor, we added residual correlations between variables theorized to be related:7,12,21 immediate word recall trials with one another (e.g., first trial with second and third) and with delayed recall, and letter fluency with animal fluency in REGARDS. After finding a model of good fit, we combined the data sets and ran a third and final CFA, fixing all parameters to be equal to those estimated in the individual cohort CFA models.
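The roles of loadings and thresholds described above can be illustrated with a toy probit item response function. This is a Python sketch for intuition only (the study's actual models were fit in Mplus), and the parameter values are hypothetical:

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def p_correct(theta: float, loading: float, threshold: float) -> float:
    """Probability of a correct response to a binary item, given latent
    cognitive function theta, under a simple probit parameterization."""
    return norm_cdf(loading * theta - threshold)

# A respondent whose latent score equals threshold/loading has exactly a
# 50% chance of a correct response; higher loadings make the item more
# discriminating around that point.
```

In this parameterization, the latent score at which the response probability crosses 50% is the threshold divided by the loading, which is the interpretation of thresholds given above.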

Figure 2.


Flow chart of pre-statistical and statistical harmonization steps

Next, we estimated domain-specific scores separately from the global score. To do this, we repeated the process above, except instead we estimated a CFA model with three factors, one factor for the memory domain, one for the orientation domain, and one for the language domain. As in the general cognitive function model, we added residual correlations between variables as necessary to improve model fit (immediate recall trials with one another and with delayed recall, but only in REGARDS).

We evaluated DIF by cohort for the four linking items. To do so, we used multiple-indicator multiple-cause (MIMIC) models adjusting for age and sex/gender, based on documented Mplus code.22 We classified an item as having DIF if the resulting odds ratio or exponentiated standardized beta coefficient for the direct path between the test item and study was outside of the range of 0.5–1.66.7 We then performed DIF-adjusted analyses, accounting for those variables with effect sizes outside of that range. DIF was considered salient if the difference between original and DIF-adjusted scores was >0.3 SD for 10% of the sample or more. We subsequently evaluated DIF by interview mode (in-person/telephone) in the same manner.
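The two DIF decision rules above can be encoded as simple predicates. This is a Python sketch of the stated cutoffs; the function names are ours, not from the study's code:

```python
def flags_dif(effect_size: float, lo: float = 0.5, hi: float = 1.66) -> bool:
    """Flag an item for DIF adjustment if its odds ratio (or exponentiated
    standardized beta) for the direct cohort path lies outside 0.5-1.66."""
    return not (lo <= effect_size <= hi)

def dif_is_salient(original, adjusted, sd: float = 1.0,
                   delta: float = 0.3, prop: float = 0.10) -> bool:
    """DIF is salient if original and DIF-adjusted factor scores differ
    by more than 0.3 SD for at least 10% of participants."""
    n_changed = sum(1 for o, a in zip(original, adjusted)
                    if abs(o - a) > delta * sd)
    return n_changed / len(original) >= prop
```

For example, the day-of-the-week item's odds ratio of 1.96 reported in the Results would be flagged by the first rule, while the second rule reproduces the 10%-of-sample salience check.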

We then evaluated marginal reliability for general and domain scores by creating plots examining the relationship between factor score (on the x-axis) and reliability (on the y-axis) by cohort, where reliability was calculated for each participant as one minus the squared standard error of the factor score.7
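The per-participant reliability calculation described above is a one-liner. A minimal Python sketch (the study's computation was done in R):

```python
def marginal_reliability(factor_score_se):
    """Reliability for each participant: one minus the squared standard
    error of that participant's factor score."""
    return [1.0 - se ** 2 for se in factor_score_se]
```

Participants with more precisely estimated factor scores (smaller standard errors) thus have reliability closer to 1.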

After analyses checking for uniform DIF and necessary adjustments, the resulting general factor scores indicate the level of overall cognitive function for each participant in a manner that is comparable across cohorts. Factor scores are scaled to have a mean of 0 and a SD of 1 in the pooled sample of HRS and REGARDS. All item parameters are presented in Supplemental Tables 2 (general score) and 3 (domain-specific scores).
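Scaling the factor scores to mean 0 and SD 1 in the pooled sample amounts to a z-transformation across the combined HRS and REGARDS data. A minimal sketch:

```python
def standardize(scores):
    """Rescale factor scores to mean 0, SD 1 across the pooled sample."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((x - mean) ** 2 for x in scores) / n) ** 0.5
    return [(x - mean) / sd for x in scores]
```

A score of, say, 0.5 on this metric is therefore half a pooled-sample standard deviation above the pooled mean, not the mean of either cohort alone.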

We assessed criterion validity of the resulting factor scores by examining the relationships between harmonized cognitive factor scores (both general cognitive scores and by domain) and variables known to be associated with cognitive function, including continuous age, sex/gender (male and female), and educational attainment, in each cohort. This method to establish criterion validity follows the work of several other key harmonization projects.7,9 We used linear regression to quantify the relationships between each of these variables and cognitive scores, adding the other demographic variables as covariates in each model.7 Finally, we repeated these criterion validity analyses in Black participants by cohort and in the pooled cohort to determine whether statistical precision was enhanced with increased sample size.
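The criterion validity models are ordinary linear regressions of each harmonized score on age, sex/gender, and education indicators, jointly adjusted. A minimal NumPy sketch with synthetic data (the study fit these models in R; variable names here are illustrative):

```python
import numpy as np

def criterion_validity_betas(score, age, female, edu_dummies):
    """Regress a harmonized cognitive score on continuous age, a 0/1
    female indicator, and education dummies, jointly adjusted (OLS)."""
    X = np.column_stack([np.ones(len(score)), age, female] + list(edu_dummies))
    beta, *_ = np.linalg.lstsq(X, np.asarray(score), rcond=None)
    return beta  # [intercept, age slope, female contrast, edu contrasts...]
```

The age coefficient returned here corresponds to the per-year associations reported in Table 3 (e.g., β = −0.03 per year for the general score).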

We conducted descriptive analyses in R version 4.3.223 and confirmatory factor analyses in Mplus. All Mplus code is available in Supplemental File 1.

3. Results

Participant characteristics are presented in Table 1. Of the 38,112 participants in the analytic sample, 48% were in HRS (n = 18,422) and 52% were in REGARDS (n = 19,690). The pooled mean age (67.7 years) and sex/gender distribution (58% female) were similar between cohorts. The majority of participants were White (68%) and non-Hispanic (97%), and REGARDS had more Black/African American participants (36%) than HRS (21%). Educational attainment differed by cohort: 50% of the HRS cohort had some college education or above, while 67% of REGARDS did. Most HRS participants in the analytic sample were interviewed face-to-face (67%), and the remaining were interviewed by telephone (33%). All REGARDS participants were interviewed by telephone.

Table 1.

Characteristics of the analytic sample (n = 38,112) of the 2010 Core Health and Retirement Study (HRS) and the 2009–2013 Reasons for Geographic and Racial Differences in Stroke cohort (REGARDS)

Pooled sample HRS REGARDS
n (%)a n (%)a n (%)a
Sample size, n 38,112 18,422 19,690
Age, years, mean (SD) 67.69 (10.22) 65.77 (11.11) 69.49 (8.94)
Sex/gender
 Female 21,923 (58%) 10,778 (59%) 11,145 (57%)
 Male 16,189 (42%) 7,644 (41%) 8,545 (43%)
Race
 White 26,079 (68%) 13,488 (73%) 12,591 (64%)
 Black/African American 10,922 (29%) 3,823 (21%) 7,099 (36%)
 Other 1,111 (3%) 1,111 (6%) 0 (0%)
Ethnicity
 Non-Hispanic/Latino 36,881 (97%) 17,191 (93%) 19,690 (100%)
 Hispanic/Latino 1,212 (3%) 1,212 (7%) 0 (0%)
Missing 19 (<1%) 19 (<1%) 0 (0%)
Educational attainment
 Less than high school education 4,836 (13%) 3,084 (17%) 1,742 (9%)
 High school education or equivalent 10,871 (29%) 6,040 (33%) 4,831 (25%)
 Some college 9,969 (26%) 4,640 (25%) 5,329 (27%)
 College graduate and above 12,350 (32%) 4,568 (25%) 7,782 (40%)
Missing 86 (<1%) 80 (<1%) 6 (<1%)
Interview mode
 Telephone 25,701 (67%) 6,011 (33%) 19,690 (100%)
 Face-to-face 12,411 (33%) 12,411 (67%) 0 (0%)
a Statistics are displayed as n (%) unless otherwise noted.

The CFA model fit statistics to estimate general cognition are presented in Table 2. After adding residual correlations, the model fit was adequate. In HRS, the model had generally good fit statistics, including CFI = 0.97, RMSEA = 0.03, and SRMR = 0.06. All values surpassed the cut points we determined in the analytic planning phase. Similarly, in REGARDS, the CFA model had adequate fit statistics, including CFI = 0.90, RMSEA = 0.07, and SRMR = 0.08. In the 3-factor domain solution, correlations between factors are as follows: memory and language (0.636), orientation and language (0.590), memory and orientation (0.603).

Table 2.

Fit statistics for confirmatory factor analysis (CFA) for general cognition within each cohort; 2010 Core Health and Retirement Study (HRS) and Reasons for Geographic and Racial Differences in Stroke (REGARDS), 2009–2013 studies (n = 38,112)

HRS (n = 18,422) REGARDS (n = 19,690)
Fit statistic Value Assessmenta Value Assessmenta
Comparative Fit Index (CFI) 0.97 Good 0.90 Adequate
Root Mean Square Error of Approximation (RMSEA) 0.03 Good 0.07 Adequate
Standardized Root Mean Square Residual (SRMR) 0.06 Adequate 0.08 Adequate

HRS = Health and Retirement Study; REGARDS = Reasons for Geographic and Racial Differences in Stroke cohort

a Fit statistics are evaluated according to the following criteria: good if CFI ≥ 0.95, RMSEA ≤ 0.05, SRMR ≤ 0.05; adequate if CFI ≥ 0.90, RMSEA ≤ 0.08, SRMR ≤ 0.08.
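The footnote's cutoffs can be written down directly. A Python sketch of the stated criteria (function name is ours):

```python
def assess_fit(cfi: float, rmsea: float, srmr: float) -> dict:
    """Rate each fit statistic against the cutoffs used in this study:
    good if CFI >= 0.95, RMSEA <= 0.05, SRMR <= 0.05;
    adequate if CFI >= 0.90, RMSEA <= 0.08, SRMR <= 0.08."""
    return {
        "CFI": "good" if cfi >= 0.95 else "adequate" if cfi >= 0.90 else "poor",
        "RMSEA": "good" if rmsea <= 0.05 else "adequate" if rmsea <= 0.08 else "poor",
        "SRMR": "good" if srmr <= 0.05 else "adequate" if srmr <= 0.08 else "poor",
    }
```

Applying this to the HRS values (CFI = 0.97, RMSEA = 0.03, SRMR = 0.06) reproduces the good/good/adequate ratings shown in Table 2.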

In MIMIC models examining DIF by study, only day of the week had DIF-adjusted estimates outside of the cutoff range (OR in general cognitive function score = 1.96; OR in domain score = 3.47; Supplemental Table 4). We then re-estimated the CFA models for general and domain-specific cognitive function, adjusting for DIF by day of the week and determined for how many participants the adjusted factor scores differed from the original by more than 0.30 SD. For general cognitive function, this happened for n = 130 individuals, representing 0.34% of the total sample. For domain-specific cognitive function, this happened for n = 646 individuals, representing 1.70% of the total sample. Because this was below the previously established 10% cutoff, we did not consider there to be salient DIF and proceeded with the unadjusted cognitive function scores.

In MIMIC models examining DIF by interview mode (Supplemental Table 5), no variables had a DIF-adjusted estimate outside of the cutoff range. We therefore did not consider there to be salient DIF according to mode of interview and proceeded with cognitive function scores unadjusted for interview mode.

Marginal reliability plots are provided in Supplemental Figures 1–4. Reliability of REGARDS factor scores was somewhat higher than for HRS, particularly for those with scores above the mean. Additionally, reliability of the orientation factor score was low, with all participants below generally accepted criteria, especially in REGARDS.24

Analyses examining criterion validity of resulting harmonized scores are detailed in Table 3. In linear regression analyses of the pooled data jointly adjusting for all relevant covariates (age, sex/gender, and educational attainment), older age was associated with lower mean general cognitive function scores (β = −0.03 per year, 95% CI = −0.03, −0.03). Older age was also associated with lower mean memory (β = −0.03, 95% CI = −0.03, −0.03), language (β = −0.03, 95% CI = −0.03, −0.03), and orientation scores (β = −0.02, 95% CI = −0.02, −0.02). Relationships between age and cognitive scores were similar among HRS and REGARDS cohorts separately. In pooled analyses, females had higher harmonized cognitive scores than males, particularly for memory (β = 0.29, 95% CI = 0.28, 0.31) and general cognition (β = 0.22, 95% CI = 0.20, 0.23). The magnitude of the association between sex/gender and cognitive score was greater among REGARDS than HRS participants. Finally, higher levels of educational attainment were associated with higher cognitive function scores, with largest associations in the pooled sample for language scores (College graduate compared to less than high school: β = 0.79, 95% CI = 0.77, 0.81). The magnitude of the association between education and cognitive score was greater among HRS compared to REGARDS participants.

Table 3.

Cross-sectional linear associations between demographic factors known to be associated with cognitive functioning and harmonized general cognitive scores in the analytic sample, 2009–2013 (n = 38,112)

Pooled HRS REGARDS
β (95% CI) β (95% CI) β (95% CI)
General
Agea −0.03 (−0.03, −0.03) −0.02 (−0.02, −0.02) −0.03 (−0.04, −0.03)
Sex/gender
 Male 0.0 (ref) 0.0 (ref) 0.0 (ref)
 Female 0.22 (0.20, 0.23) 0.14 (0.12, 0.16) 0.28 (0.26, 0.30)
Degree
 Less than HS 0.0 (ref) 0.0 (ref) 0.0 (ref)
 HS grad or eq 0.37 (0.34, 0.39) 0.39 (0.36, 0.42) 0.36 (0.32, 0.40)
 Some college 0.54 (0.51, 0.56) 0.57 (0.54, 0.60) 0.54 (0.51, 0.58)
 College grad 0.78 (0.75, 0.80) 0.86 (0.83, 0.89) 0.77 (0.73, 0.80)
Memory
Agea −0.03 (−0.03, −0.03) −0.02 (−0.03, −0.02) −0.03 (−0.04, −0.03)
Sex/gender
 Male 0.0 (ref) 0.0 (ref) 0.0 (ref)
 Female 0.29 (0.28, 0.31) 0.27 (0.25, 0.30) 0.30 (0.28, 0.32)
Degree
 Less than HS 0.0 (ref) 0.0 (ref) 0.0 (ref)
 HS grad or eq 0.38 (0.36, 0.41) 0.42 (0.39, 0.46) 0.36 (0.32, 0.40)
 Some college 0.54 (0.51, 0.57) 0.60 (0.56, 0.63) 0.54 (0.50, 0.58)
 College grad 0.77 (0.75, 0.80) 0.89 (0.85, 0.93) 0.75 (0.72, 0.79)
Orientation
Agea −0.02 (−0.02, −0.02) −0.02 (−0.02, −0.02) −0.02 (−0.02, −0.02)
Sex/gender
 Male 0.0 (ref) 0.0 (ref) 0.0 (ref)
 Female 0.16 (0.14, 0.17) 0.14 (0.13, 0.16) 0.16 (0.15, 0.18)
Degree
 Less than HS 0.0 (ref) 0.0 (ref) 0.0 (ref)
 HS grad or eq 0.29 (0.27, 0.31) 0.33 (0.30, 0.35) 0.24 (0.21, 0.27)
 Some college 0.42 (0.40, 0.44) 0.46 (0.43, 0.48) 0.39 (0.36, 0.42)
 College grad 0.61 (0.59, 0.63) 0.69 (0.66, 0.72) 0.56 (0.53, 0.59)
Language
Agea −0.03 (−0.03, −0.02) −0.02 (−0.02, −0.02) −0.03 (−0.03, −0.03)
Sex/gender
 Male 0.0 (ref) 0.0 (ref) 0.0 (ref)
 Female 0.11 (0.09, 0.12) 0.10 (0.08, 0.12) 0.10 (0.08, 0.12)
Degree
 Less than HS 0.0 (ref) 0.0 (ref) 0.0 (ref)
 HS grad or eq 0.32 (0.30, 0.35) 0.36 (0.34, 0.39) 0.28 (0.24, 0.32)
 Some college 0.51 (0.49, 0.53) 0.55 (0.52, 0.58) 0.49 (0.45, 0.53)
 College grad 0.79 (0.77, 0.81) 0.84 (0.81, 0.87) 0.77 (0.74, 0.81)

Models are jointly adjusted for all other presented demographic variables; HRS = Health and Retirement Study; REGARDS = Reasons for Geographic and Racial Differences in Stroke cohort; β = unstandardized beta coefficient from linear regression; CI = confidence interval; HS = high school

a Age is examined as a continuous variable.

Analyses examining changes in precision of the pooled sample of Black/African American participants are presented in Table 4. The precision of estimates for relationships between all examined variables (age, sex/gender, degree) and general cognitive score was greater among the pooled (n = 10,922) compared to the HRS (n = 3,823) and REGARDS (n = 7,099) samples alone. For example, the CI for the relationship between sex/gender and general cognitive score was narrower among the pooled sample (β = 0.27, 95% CI = 0.24, 0.30; SE = 0.017) compared to HRS Black/African American participants (β = 0.14, 95% CI = 0.08, 0.20; SE = 0.029) or REGARDS Black/African American participants (β = 0.32, 95% CI = 0.28, 0.36; SE = 0.021) separately.
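The gain in precision from pooling follows the usual inverse-variance logic: under an assumption of a common effect, the pooled standard error is approximately the inverse-variance combination of the cohort-specific ones. This back-of-envelope check is an illustration only, not the study's actual pooled model, which refits the regression on the combined data:

```python
def inverse_variance_pooled_se(se_list):
    """Approximate SE of an inverse-variance-weighted combination of
    cohort-specific estimates: 1 / sqrt(sum of 1/SE^2)."""
    return sum(1.0 / se ** 2 for se in se_list) ** -0.5

# Cohort-specific SEs for the sex/gender association in Table 4
# (HRS 0.029, REGARDS 0.021) combine to roughly the reported
# pooled SE of 0.017.
approx = inverse_variance_pooled_se([0.029, 0.021])
```

That the approximation lands close to the reported pooled value is consistent with the precision gain being driven largely by the increased sample size.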

Table 4.

Cross-sectional linear associations between demographic factors known to be associated with cognitive functioning and general harmonized cognitive scores among Black participants in the analytic sample, 2009–2013 (n = 10,922)

Pooled (n = 10,922) HRS (n = 3,823) REGARDS (n = 7,099)
β (95% CI) SE β (95% CI) SE β (95% CI) SE
Agea −0.035 (−0.037, −0.034) 0.001 −0.029 (−0.031, −0.026) 0.001 −0.041 (−0.044, −0.039) 0.001
Sex/gender
 Male 0.0 (ref) 0.0 (ref) 0.0 (ref)
 Female 0.27 (0.24, 0.30) 0.017 0.14 (0.08, 0.20) 0.029 0.32 (0.28, 0.36) 0.021
Degree
 Less than HS 0.0 (ref) 0.0 (ref) 0.0 (ref)
 HS grad or eq 0.39 (0.34, 0.44) 0.025 0.42 (0.34, 0.49) 0.037 0.37 (0.31, 0.44) 0.033
 Some college 0.68 (0.63, 0.73) 0.025 0.77 (0.69, 0.85) 0.039 0.62 (0.56, 0.69) 0.033
 College grad 0.92 (0.87, 0.97) 0.025 1.03 (0.94, 1.11) 0.049 0.86 (0.79, 0.92) 0.033

Models are jointly adjusted for all other presented demographic variables; HRS = Health and Retirement Study; REGARDS = Reasons for Geographic and Racial Differences in Stroke cohort; β = unstandardized beta coefficient from linear regression; CI = confidence interval; SE = standard error; HS = high school

a Age is examined as a continuous variable.

4. Discussion

We used statistical harmonization methods to generate cognitive scores for general, memory, orientation, and language domains across two US-based cohorts, HRS and REGARDS, to facilitate further investigation of cognitive function across minoritized populations. Analysis of factor models found that the models fit well after implementing residual correlations, and the resulting factor scores demonstrated criterion validity through correlations with factors known to relate to cognitive function. These findings generated a pooled sample of nearly 40,000 adults with the advantage of enhanced precision among Black/African American participants compared to each individual sample. We used two well-established NIH-funded cohorts with exceptional racial and ethnic diversity and rich cognitive assessment. Using statistical harmonization with DIF, we were able to harmonize cognitive tests across studies into representations of general and specific cognitive domains.15 The harmonized dataset will open pathways to future research on modifiable risk factors of cognitive aging across race in the pooled sample. This work is particularly critical because AD/ADRD is a prominent and growing public health concern among racial minorities, such as Black populations.

Our results are consistent with and add to a growing body of literature demonstrating the feasibility of pooling cognitive function measures across cohort studies in a methodologically sound manner with statistical harmonization. In particular, a rigorous statistical harmonization study on which many aspects of our analytic approach are modeled, Gross et al.,7 similarly reported that many of their CFAs had good fit and found criterion validity regression results of comparable direction and magnitude for their US-based cohort. However, Gross et al., like many robust statistical harmonization studies, focused on cross-national comparisons,7,9,13,14 which, while of critical general health importance, have distinct considerations and potential applications from our racially and ethnically diverse US-based sample. We expand upon two US-based harmonization efforts25,26 to include HRS, a frequently used and well-established US-based cohort. Our study therefore provides novel insight into deriving harmonized cognitive function measures across the diverse racial/ethnic population of the US for use in studying critical disparities in risk of cognitive aging.

This study has several strengths.19,27 First, this study provides a reproducible process, including publicly available R and Mplus code (see Supplemental Materials) and pre-statistical harmonization analysis for use in future harmonization studies and use of resulting harmonized scores. Second, this method generated scores with criterion validity in a large, diverse pooled US-based sample with sufficient racial diversity to study disparities in cognitive function and their determinants. This is a novel contribution to the cognitive function literature.

However, there are also limitations to consider. While we undertook a thorough pre-statistical harmonization process under the guidance of neuropsychologists, there were some remaining limitations regarding the comparability of measurements by cohort.27 First, all REGARDS interviews took place over the phone, but only 33% of HRS interviews occurred over the phone, with the remainder being face-to-face. Previous work has found small differences in cognitive test performance between those responding by phone and those responding in person, particularly in word recall assessments.28 Although we found no evidence of DIF by test mode in our linking items, there may remain small differences in test performance by mode. Second, there was only one linking item in the memory domain, immediate word recall, which may limit the validity of the memory factor if this item does not truly perform as a linking item. Furthermore, the single immediate word recall list in REGARDS differed from the four immediate word recall lists in HRS (Supplemental Table 6), although there were a few overlapping words (butter, engine, letter). This difference occurred because HRS administered variants of the CERAD immediate word recall lists to minimize learning effects.29 While the word lists differ in their exact words, all lists similarly contain 10 frequently used one- to two-syllable words (A and AA rated from Thorndike & Lorge30), and a previous study that harmonized the CERAD immediate recall test across different languages of administration did not observe DIF across variants of this item with differing words in different languages.7 Therefore, we believe this linking item and the overall memory factor are valid. Notably, we were unable to perform marginal reliability analyses for memory domain scores due to the over-identified model.
Additionally, the criterion validation analyses in this paper are limited to associations between harmonized factor scores and variables known to be related to cognitive function (age, sex/gender, and education), as is common in the literature.7,9 More work evaluating the relationships between these or similar harmonized cognitive scores and other indicators of cognitive function (e.g., biomarkers, clinical diagnosis) is needed. Furthermore, to achieve adequate model fit of our confirmatory factor analysis models, we added residual correlations between individual test items. While these structures do not affect the validity of our results, they reduce the generalizability of the models. We also note that, by Wright's rules, using a 3-factor model to estimate domain factor scores may elevate correlations between factors compared with modeling each domain separately. Additionally, users of the resulting cognitive scores should be aware of an increased risk of Type II errors relative to cognitive function measurements in individual cohorts, particularly because our harmonized scores rely on individual cognitive test items. Finally, individuals who responded to the surveys through a proxy did not complete the same battery of cognitive tests as those who responded directly; these individuals, who are likely the most cognitively impaired, were therefore not included in this analysis. However, future analyses using these harmonized factor scores to study cognitive function disparities can leverage past work that provides methods for pooling proxy and non-proxy cognitive interviews in HRS.31
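The criterion validation approach described above can be sketched in a few lines. Again, this is not the authors' code but an illustrative Python example on simulated data: a harmonized score is regressed on age, sex, and education, and criterion validity is supported when the coefficients have the expected directions (negative for age, positive for education):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(50, 90, size=n)    # ages in years
sex = rng.integers(0, 2, size=n)     # 0/1 sex/gender indicator
educ = rng.integers(8, 21, size=n)   # years of education

# Simulated harmonized factor score with the expected directions:
# older age -> lower cognition, more education -> higher cognition
score = -0.03 * (age - 70) + 0.05 * (educ - 12) + rng.normal(scale=0.5, size=n)

# Criterion validity check: regress the score on age, sex, and education
X = np.column_stack([np.ones(n), age, sex, educ])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(dict(zip(["intercept", "age", "sex", "educ"], beta.round(3))))
```

With real harmonized factor scores, the same regression (typically with survey weights and cohort indicators) recovers the expected age and education gradients reported in the Results.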

Our statistical harmonization methods presented herein provide a critical foundation for future analyses implementing cognitive harmonization and using harmonized scores to examine disparities in cognitive function by race and ethnicity. We included detailed pre-statistical considerations and robust code to facilitate replication of our analysis. Harmonizing the full cognitive data from HRS (N = 40,000+) and REGARDS (N = 30,000+) would yield an unprecedentedly large study (combined N = 70,000+) using representative samples from different geographic regions, a scale that has been lacking in previous work. The resulting harmonized scores will enable future work on cognitive health across national-level samples representing the racial and ethnic diversity of the US. While additional work including longitudinal data, incorporating cognitive function data from proxy responders, and expanding analyses to examine external validity is recommended to maximize utility in subsequent analyses, these methods represent a large step toward leveraging existing data sources to perform needed research on racial and ethnic disparities in cognitive function.

Supplementary Material

Supplemental files B
Supplemental files A

Acknowledgments:

This research project is supported by cooperative agreement U01 NS041588 co-funded by the National Institute of Neurological Disorders and Stroke (NINDS) and the National Institute on Aging (NIA), National Institutes of Health, Department of Health and Human Services. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NINDS or the NIA. Representatives of the NINDS were involved in the review of the manuscript but were not directly involved in the collection, management, analysis, or interpretation of the data. The authors thank the other investigators, the staff, and the participants of the REGARDS study for their valuable contributions. A full list of participating REGARDS investigators and institutions can be found at: https://www.uab.edu/soph/regardsstudy/. The HRS (Health and Retirement Study) is sponsored by the National Institute on Aging (grant number NIA U01AG009740) and is conducted by the University of Michigan.

Funding sources:

Marcia Pescador Jimenez is supported by NIH grants R00AG066949 and R01AG087199. Lindsay C. Kobayashi is supported by NIH grant AG070953.

Footnotes

Conflicts: The authors report there are no competing interests to declare.

Consent statement: All human subjects included in this study provided informed consent.

References

1. Babulal GM, Quiroz YT, Albensi BC, et al. Perspectives on ethnic and racial disparities in Alzheimer's disease and related dementias: Update and areas of immediate need. Alzheimer's & Dementia. 2019;15(2):292–312. doi:10.1016/j.jalz.2018.09.009
2. Mehta KM, Yeo GW. Systematic review of dementia prevalence and incidence in United States race/ethnic populations. Alzheimer's & Dementia. 2017;13(1):72–83. doi:10.1016/j.jalz.2016.06.2360
3. Matthews KA, Xu W, Gaglioti AH, et al. Racial and ethnic estimates of Alzheimer's disease and related dementias in the United States (2015–2060) in adults aged ≥65 years. Alzheimer's & Dementia. 2019;15(1):17–24. doi:10.1016/j.jalz.2018.06.3063
4. Albert MS. Changes in Cognition. Neurobiology of Aging. 2011;32(1):S58–S63. doi:10.1016/j.neurobiolaging.2011.09.010
5. Moslimani M, Tamir C, Budiman A, Noe-Bustamante L, Mora L. Facts About the U.S. Black Population. https://www.pewresearch.org/social-trends/fact-sheet/facts-about-the-us-black-population/. Updated 2024. Accessed Jan 9, 2025.
6. U.S. Census Bureau. New Estimates Highlight Differences in Growth Between the U.S. Hispanic and Non-Hispanic Populations. https://www.census.gov/newsroom/press-releases/2024/population-estimates-characteristics.html#:~:text=JUNE%2027%2C%202024%20–%20Between%202022,from%20the%20U.S.%20Census%20Bureau.. Updated 2024. Accessed Jan 9, 2025.
7. Gross AL, Li C, Briceño EM, et al. Harmonisation of later-life cognitive function across national contexts: results from the Harmonized Cognitive Assessment Protocols. The Lancet Healthy Longevity. 2023;4(10):e573–e583. doi:10.1016/S2666-7568(23)00170-8
8. Kobayashi LC, Jones RN, Briceño EM, et al. Cross-national comparisons of later-life cognitive function using data from the Harmonized Cognitive Assessment Protocol (HCAP): Considerations and recommended best practices. Alzheimer's & Dementia. 2024;20(3):2273–2281. doi:10.1002/alz.13694
9. Arce Rentería M, Briceño EM, Chen D, et al. Memory and language cognitive data harmonization across the United States and Mexico. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring. 2023;15(3):e12478. doi:10.1002/dad2.12478
10. Gavett BE, Ilango SD, Koscik R, et al. Harmonization of cognitive screening tools for dementia across diverse samples: A simulation study. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring. 2023;15(2):e12438. doi:10.1002/dad2.12438
11. Kobayashi LC, Gross AL, Gibbons LE, et al. You Say Tomato, I Say Radish: Can Brief Cognitive Assessments in the U.S. Health Retirement Study Be Harmonized With Its International Partner Studies? The Journals of Gerontology, Series B: Psychological Sciences and Social Sciences. 2021;76(9):1767–1776. doi:10.1093/geronb/gbaa205
12. Mukherjee S, Choi S, Lee ML, et al. Cognitive Domain Harmonization and Cocalibration in Studies of Older Adults. Neuropsychology. 2023;37(4):409–423. doi:10.1037/neu0000835
13. Vonk JMJ, Gross AL, Zammit AR, et al. Cross-national harmonization of cognitive measures across HRS HCAP (USA) and LASI-DAD (India). PLoS ONE. 2022;17(2):e0264166. doi:10.1371/journal.pone.0264166
14. Giorgio J, Tanna A, Malpetti M, et al. A robust harmonization approach for cognitive data from multiple aging and dementia cohorts. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring. 2023;15(3):e12453. doi:10.1002/dad2.12453
15. Gross AL, Mungas DM, Crane PK, et al. Effects of Education and Race on Cognitive Decline: An Integrative Study of Generalizability Versus Study-Specific Results. Psychology and Aging. 2015;30(4):863–880. doi:10.1037/pag0000032
16. Sonnega A, Faul JD, Ofstedal MB, Langa KM, Phillips JW, Weir DR. Cohort Profile: the Health and Retirement Study (HRS). International Journal of Epidemiology. 2014;43(2):576–585. doi:10.1093/ije/dyu067
17. Health and Retirement Study. (2010 Core) public use data set. Updated 2012.
18. Howard VJ, Cushman M, Pulley L, et al. The Reasons for Geographic and Racial Differences in Stroke Study: Objectives and Design. Neuroepidemiology. 2005;25(3):135–143. doi:10.1159/000086678
19. Briceño EM, Arce Rentería M, Gross AL, et al. A Cultural Neuropsychological Approach to Harmonization of Cognitive Data Across Culturally and Linguistically Diverse Older Adult Populations. Neuropsychology. 2023;37(3):247–257. doi:10.1037/neu0000816
20. Adkins-Jackson PB, George KM, Besser LM, et al. The structural and social determinants of Alzheimer's disease related dementias. Alzheimer's & Dementia. 2023;19(7):3171–3185. doi:10.1002/alz.13027
21. Gibbons RD, Bock RD, Hedeker D, et al. Full-Information Item Bifactor Analysis of Graded Response Data. Applied Psychological Measurement. 2007;31(1):4–19. doi:10.1177/0146621606289485
22. Brown TA. Confirmatory Factor Analysis for Applied Research. 2nd ed. The Guilford Press; 2015:278–279.
23. R Core Team. R: A Language and Environment for Statistical Computing. Version 4.3.0; 2023.
24. Nunnally JC. Psychometric Theory. 3rd ed. McGraw-Hill; 1994.
25. Levine DA, Gross AL, Briceño EM, et al. Association Between Blood Pressure and Later-Life Cognition Among Black and White Individuals. JAMA Neurology. 2020;77(7):810–819. doi:10.1001/jamaneurol.2020.0568
26. Levine DA, Chen B, Galecki AT, et al. Associations Between Vascular Risk Factor Levels and Cognitive Decline Among Stroke Survivors. JAMA Network Open. 2023;6(5):e2313879. doi:10.1001/jamanetworkopen.2023.13879
27. Briceño EM, Gross AL, Giordani BJ, et al. Pre-Statistical Considerations for Harmonization of Cognitive Instruments: Harmonization of ARIC, CARDIA, CHS, FHS, MESA, and NOMAS. Journal of Alzheimer's Disease. 2021;83(4):1803–1813. doi:10.3233/JAD-210459
28. Smith JR, Gibbons LE, Crane PK, et al. Shifting of Cognitive Assessments Between Face-to-Face and Telephone Administration: Measurement Considerations. The Journals of Gerontology, Series B: Psychological Sciences and Social Sciences. 2023;78(2):191–200. doi:10.1093/geronb/gbac135
29. Ofstedal MB, Fisher G, Herzog AR. Documentation of Cognitive Functioning Measures in the Health and Retirement Study. 2005. doi:10.7302/24800
30. Thorndike EL, Lorge I. The Teacher's Word Book of 30,000 Words. Bureau of Publications, Teachers College, Columbia University; 1944.
31. Wu Q, Tchetgen Tchetgen EJ, Osypuk TL, White K, Mujahid M, Glymour MM. Combining Direct and Proxy Assessments to Reduce Attrition Bias in a Longitudinal Study. Alzheimer Disease and Associated Disorders. 2013;27(3):207–212. doi:10.1097/WAD.0b013e31826cfe90
