Alzheimer's & Dementia. 2024 Mar 27;20(5):3305–3321. doi: 10.1002/alz.13774

Longitudinal normative standards for cognitive tests and composites using harmonized data from two Wisconsin AD‐risk‐enriched cohorts

Erin M Jonaitis 1,2, Bruce P Hermann 3, Kimberly D Mueller 4,5, Lindsay R Clark 5,6, Lianlian Du 1, Tobey J Betthauser 2,7, Karly Cody 2, Carey E Gleason 2,5,6,7, Bradley T Christian 2,8,9, Sanjay Asthana 2,7, Richard J Chappell 10,11, Nathaniel A Chin 2,5, Sterling C Johnson 1,2,6,7, Rebecca E Langhough 1,2,7
PMCID: PMC11095443  PMID: 38539269

Abstract

INTRODUCTION

Published norms are typically cross‐sectional and often are not sensitive to preclinical cognitive changes due to dementia. We developed and validated demographically adjusted cross‐sectional and longitudinal normative standards using harmonized outcomes from two Alzheimer's disease (AD) risk‐enriched cohorts.

METHODS

Data from the Wisconsin Registry for Alzheimer's Prevention and the Wisconsin Alzheimer's Disease Research Center were combined. Quantile regression was used to develop unconditional (cross‐sectional) and conditional (longitudinal) normative standards for 18 outcomes using data from cognitively unimpaired participants (N = 1390; mean follow‐up = 9.25 years). Validity analyses (N = 2456) examined relationships between percentile scores (centiles), consensus‐based cognitive statuses, and AD biomarker levels.

RESULTS

Unconditional and conditional centiles were lower in those with consensus‐based impairment or biomarker positivity. Similarly, quantitative biomarker levels were higher in those whose centiles suggested decline.

DISCUSSION

This study presents normative standards for cognitive measures sensitive to pre‐clinical changes. Future directions will investigate potential clinical applications of longitudinal normative standards.

Highlights

  • Quantile regression was used to construct longitudinal norms for cognitive tests.

  • Poorer percentile scores were related to concurrent diagnosis and Alzheimer's disease biomarkers.

  • A ShinyApp was built to display test scores and norms and flag low performance.

Keywords: neuropsychology, normative standards, preclinical cognitive change

1. BACKGROUND

Modest declines in memory and executive functioning are commonly observed even in “typical” aging. 1 , 2 , 3 Studies with several years of assessment prior to diagnoses of mild cognitive impairment (MCI) or dementia have shown that the first signs of cognitive decline may appear 4 to 8 years before an MCI diagnosis 4 and even longer before a dementia diagnosis. 5 , 6 Moreover, declines associated with neurodegenerative diseases are typically steeper than in healthy aging. In a recent systematic review of 35 studies, 7 change points associated with accelerated decline were identified 3–7 years before MCI and 1–11 years before dementia diagnoses. The ability to use model‐based estimates to distinguish typical aging from nascent cognitive decline due to neurodegenerative disease may provide an opportunity for early intervention that can prevent or delay the onset of dementia.

Diagnosis of dementia or MCI typically relies on use of published norms and clearly defined criteria. 8 , 9 Published norms, however, are often not sensitive to the earliest signs of prodromal decline. 10 , 11 In addition, when assessment protocols use neuropsychological tests from varying sources, published norms vary in ways that impact generalizability. For example, one test's norms may be age adjusted while another test's norms may be age, sex, and education adjusted. Other methods have been developed, including reliable change methods 12 , 13 and robust internal norms, 1 , 14 , 15 , 16 , 17 , 18 , 19 which use distance from estimated average performance to identify worrisome cognitive performance. While these approaches have been shown to identify persons at risk of progressing to MCI or dementia, their focus on modeling the mean may not directly capture what we most want to understand: the characteristics of impaired people, whose performance is far from the mean and/or follows a different trajectory.

An alternate modeling strategy well‐suited to this context is quantile regression, which allows researchers to predict particular percentile(s), and in this way model quantities associated with impairment directly. Recent work has demonstrated the advantage of quantile regression over linear regression 20 and deployed this modeling framework to construct internal norms or standards. 21 , 22 Briefly, unconditional (normative) standards refer to estimates of a person's performance relative to their in‐cohort peers, as defined on the basis of age, sex, educational attainment, and literacy. In turn, conditional (normative) standards reflect a person's performance relative to those predictors plus their own baseline performance and the amount of practice they have had with the battery. Unconditional standards reflect local norms that are conceptually similar to published norms, but which may be more sensitive in identifying lower‐than‐expected performance in research cohorts. Conditional standards reflect unusual change from baseline, and so are more akin to reliable change indices or other measures of performance change. Conditional norms may help identify people who have declined from baseline but whose performance remains above cross‐sectional impairment thresholds.
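
The authors fit their quantile models in R; as a language-agnostic illustration of the core principle behind quantile regression, the sketch below (Python; function names are ours, not from the study code) shows that the value minimizing the check, or pinball, loss at level tau is the tau-th sample quantile. Restricted regression quantiles extend this idea to covariate-conditional estimates, which this toy version does not attempt.

```python
def pinball_loss(tau, y, q):
    """Mean check (pinball) loss of candidate value q at quantile level tau."""
    return sum(tau * (v - q) if v >= q else (tau - 1) * (v - q) for v in y) / len(y)

def fit_quantile(y, tau):
    """The sample value minimizing the check loss is the tau-th sample quantile."""
    return min(y, key=lambda q: pinball_loss(tau, y, q))
```

With tau = 0.5 this recovers the median; with tau = 0.9 it recovers the 90th percentile, which is why modeling extreme quantiles targets the low-performing tail directly rather than the mean.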

Although internal norms may offer greater sensitivity when a cohort differs from the general population in a significant way, 23 sufficient sample sizes are needed to promote reliable estimates across the demographics on which the norms are based. Here, we considered the possibility of combining cohorts of similar middle‐aged and older participants to create updated norms relative to a consistent set of covariates that can be used across cohorts. This study leveraged data from two Wisconsin longitudinal mid‐life Alzheimer's disease (AD) risk‐enriched cohorts with similar neuropsychological test protocols to create demographically adjusted norms. We first harmonized individual tests across the two cohorts and used them to generate cognitive composite scores likely to be sensitive to preclinical AD‐related change. Using data from cognitively unimpaired (CU) participants from these cohorts, we generated unconditional (i.e., cross‐sectional) and conditional (i.e., longitudinal) normative standards for cognitive performance for harmonized tests and composite scores. We then examined criterion validity evidence for the normative standards. Finally, we created a ShinyApp 24 that depicts performance of hypothetical subjects relative to the unconditional and conditional normative standards, which we illustrate with a case example.

2. METHODS

2.1. Overview of study cohorts

2.1.1. Wisconsin Registry for Alzheimer's Prevention

The Wisconsin Registry for Alzheimer's Prevention (WRAP) is a longitudinal study of a cohort enriched for risk of dementia by virtue of a parental family history of AD. Recruitment and assessment began in 2001, enrolling cognitively healthy participants ages 40–65, and participants return approximately biennially for follow‐up visits. Special efforts are made to recruit participants from Native American and African American communities. Complete study details are presented in Johnson et al. 25

2.1.2. Wisconsin Alzheimer's Disease Research Center

The Wisconsin Alzheimer's Disease Research Center (WADRC) comprises three longitudinal cohorts at varying stages of disease progression, enrolling cognitively healthy participants ages 45–65 and adults of any age with MCI and dementia. Recruitment and assessment began in 2009, and participants return annually for follow‐up visits (biennially for those who are CU and <65 years old); as in WRAP, the WADRC cohorts are enriched for risk of dementia due to enrolling a high proportion with a parental history of AD. Special efforts are made to recruit participants from Native American and African American communities. WADRC and WRAP investigators collaborate closely, which has allowed for coordinated implementation of test protocols in many cases.

RESEARCH IN CONTEXT
  1. Systematic review: Using primarily Google Scholar and PubMed, search terms included variations of normative standards, quantile regression, validity, algorithmic/actuarial cognitive status, pre‐clinical decline, cognitive composites.

  2. Interpretation: Understanding the earliest cognitive signs of dementia‐related changes remains a critical challenge in clinical care and trial design. Published neuropsychological norms are generally only cross‐sectional and not very sensitive to subtle decline. Having co‐normed tests and composites facilitates deeper understanding of pre‐dementia cognitive changes.

  3. Future directions: Future analyses will examine additional ways of defining algorithmic cognitive decline that may be useful for understanding preclinical changes and/or increased risk of dementia. In addition, analyses will investigate whether longitudinal norms may be useful as response‐to‐treatment markers.

2.1.3. Helsinki declaration

Human subjects participation in these studies was approved by the University of Wisconsin Health Sciences Institutional Review Board, in accordance with the Helsinki declaration. All participants provided informed consent.

2.2. Inclusion/exclusion criteria

2.2.1. Normative standards development sample

Work on the standards began in Fall 2018, using a May 2018 data freeze. Building on the concept of robust norms, 17 inclusion in this sample was determined on a test‐wise basis and required at least three consecutive observations, starting from the first administration of the test, at which the participant was between the ages of 40 and 85 and cognitively unimpaired by consensus review (CU; see Section 2.3.3 for consensus review details). From this set, participants were excluded who reported multiple sclerosis, stroke, Parkinson's disease, epilepsy/seizure disorder, bipolar disorder, or schizophrenia in the first two observations. For eligible participants, all visits were included in the standards development sample until the first non‐CU cognitive status (e.g., MCI or dementia) occurred, resulting in 1390 unique individuals (1083 WRAP, 307 WADRC) eligible for inclusion in standards development for at least one test (see additional details in Section 2.3).

2.2.2. Hypothesis testing samples

Hypotheses related to validity were tested on a larger, racially representative set of participants. This sample overlapped with the standards development sample but included data through May 2022, allowing time for more participants to progress to clinical impairment. Inclusion in this sample was also test‐wise and required participants to be CU at the first assessment and to have at least three observations, so as to enable testing of hypotheses related to predictive validity (see Section 2.5 for additional sample details). Among this set, those with at least one amyloid (A) and/or tau (T) positron emission tomography (PET) scan were included in the subsets used for testing hypotheses related to AD biomarkers.

2.3. Cognitive assessments

2.3.1. Study batteries

Participants in each cohort complete a comprehensive neuropsychological test battery at each study visit. The WRAP battery has been described elsewhere. 25 Recent changes include the substitution of the Multilingual Naming Test 26 for the Boston Naming Test (BNT). WADRC uses the Uniform Data Set (UDS) recommended by the National Alzheimer's Coordinating Center. 27 WADRC testing began in 2009 with UDS2, and in March 2015 the center switched to UDS3. 28 In addition, WADRC collects some measures that are in WRAP but not in the UDS, including the Rey Auditory Verbal Learning Test (AVLT 29 ). For these analyses, we initially identified 13 individual tests common to both batteries and/or with validated alternatives for which a nonparametric mapping of equivalent scores, or crosswalk, had been developed. We also used a subset of these to calculate cognitive composites (see Table 1 for a list of tests and composites and Section 2.5.1.2 for details on the composites). Tests represented the domains of learning, memory, executive function, and language. Following the start of the severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) pandemic, some assessments were conducted over the telephone. All scores in the normative standards development sample came from in‐person assessments; among 11,559 assessments eligible for inclusion in hypothesis testing, 836 (7%) were conducted remotely.

TABLE 1.

Qualifying N and reasons for exclusion from the normative standards‐development sample, for individual tests (top half of table) and composites (bottom half)

| Outcome | Cross‐walked from | Usable N | Clin diagnosis in first 3 visits | Neuro diagnosis in first 2 visits | Psych diagnosis in first 2 visits | Outside age range | Too few visits |
|---|---|---|---|---|---|---|---|
| Single‐test outcomes | | | | | | | |
| Rey AVLT Trials 1–5 Total | | 1380 | 325 | 98 | 18 | 21 | 608 |
| Rey AVLT Trial 7 Delayed Recall | | 1379 | 325 | 98 | 18 | 21 | 604 |
| WMS‐R Logical Memory IA § | Craft Story Immediate | 1211 | 352 | 91 | 16 | 5 | 791 |
| WMS‐R Logical Memory IIA § | Craft Story Delay | 1211 | 352 | 91 | 16 | 5 | 791 |
| WAIS‐III Digit Span Forward † | WAIS‐R Digit Span; Number Span | 1380 | 336 | 98 | 18 | 21 | 610 |
| WAIS‐III Digit Span Backward † | WAIS‐R Digit Span; Number Span | 1380 | 336 | 98 | 18 | 21 | 610 |
| Letter Fluency (C,F,L) | F,L → C,F,L | 1179 | 253 | 99 | 18 | 21 | 882 |
| Animal Naming Total | | 865 | 366 | 94 | 16 | 5 | 1109 |
| WAIS‐R Digit Symbol | | 1064 | 256 | 79 | 11 | 5 | 1050 |
| Trail‐Making Test Part A | | 1379 | 332 | 98 | 18 | 21 | 602 |
| Trail‐Making Test Part B | Truncated at 300 | 1379 | 316 | 98 | 18 | 21 | 618 |
| Boston Naming Test § | Multilingual Naming | 1377 | 336 | 98 | 18 | 21 | 612 |
| Mini‐Mental State Exam † | Montreal Cognitive Assessment | 1212 | 354 | 91 | 16 | 5 | 787 |
| Composite outcomes | | | | | | | |
| PACC‐3 with TMT‐B § | | 1203 | 325 | 91 | 16 | 5 | 826 |
| PACC‐4 with TMT‐B § | | 1203 | 325 | 91 | 16 | 5 | 826 |
| PACC‐3 with CFL § | | 997 | 261 | 92 | 16 | 5 | 1097 |
| PACC‐4 with CFL and TMT‐B § | | 991 | 245 | 92 | 16 | 5 | 1119 |
| PACC‐3 with Digit Symbol (WRAP) | | 901 | 62 | 70 | 9 | 0 | 1426 |
| PACC‐4 with Digit Symbol and MMSE (WRAP) | | 901 | 62 | 70 | 9 | 0 | 1426 |
| PACC‐5 with Digit Symbol, MMSE, and Animal Naming (WRAP) | | 550 | 75 | 73 | 9 | 0 | 1762 |
| Memory Composite § | | 1208 | 342 | 91 | 16 | 5 | 804 |

Note: The original dataset included scores from 2471 unique participants on at least one test (with per‐test total Ns ranging from 1252 to 2466). From this original set, exclusions were applied in the order listed in the table (left to right). Participants who had too few qualifying visits for one test, due to age, changes to the battery, and/or development of a disqualifying condition over time, could still qualify for inclusion in normative standards for another test due to the staggered introduction of tests into the battery. For example, if a participant was first diagnosed with MCI at visit 4, they would be eligible for inclusion in standards for any tests with complete data in visits 1–3, but not for any tests introduced at visit 2 or later.

§ Indicates a crosswalked test as described by Monsell et al., 2016 (or a composite calculated with tests that include at least one cross‐walked test). Composite scores were computed as unweighted averages of z‐transformed inputs, reflected as needed such that higher z‐scores indicated better performance. All PACC composites include Rey AVLT Total Learning Trials and Logical Memory IIA (crosswalked), along with other tests as indicated in the name. The Memory composite includes Rey AVLT Total Learning and Delayed Recall, and Logical Memory IA and IIA. As a consequence, all composites other than those specific to WRAP contain at least one crosswalked test. For validated crosswalk tables, see https://naccdata.org/data‐collection/forms‐documentation/crosswalk.

† Indicates tests that were included initially but failed to meet model assumptions as described in Sections 2.5.2 and 3.2.

In addition, both batteries included one or more published reading recognition tests; reading/literacy measures such as those included in WRAP and WADRC are considered to be proxies for premorbid verbal abilities and better indicators of quality of education than years of education. 30 , 31 WRAP uses the reading recognition test from the Wide Range Achievement Test (3rd edition; WRAT3 32 ) as its measure of literacy/premorbid abilities. The WADRC has used three different literacy measures over the years: the WRAT3, the AMNART (using MOANS norms from 33 ), and the National Institutes of Health (NIH) Toolbox Oral Reading Recognition test (ORRT 34 ). Reading tests were harmonized into four quartiles as described in Section 2.5.1.1.

2.3.2. Dementia staging via informant report

Both cohorts assess clinical function using the Clinical Dementia Rating (CDR) scale. 35 Because most WRAP participants are CU, a shorter screener, the Quick Dementia Rating System (QDRS), 36 is first used with all participants and informants, and is followed by the CDR for all participants with a QDRS global rating ≥0.5, along with an equal number with a recent QDRS global rating of 0. Both scales incorporate the judgment of an informant nominated by the participant and QDRS scores are easily cross‐walked to CDR scores (for validation data on this approach in WRAP, see ref. [37]). Crosswalked CDR Sum of Boxes (CDR‐SB) was used as a secondary outcome.

2.3.3. Consensus review cognitive status determination

Each cohort assigns cognitive statuses to participants via consensus conference. The performance of all WADRC participants, including the neuropsychological battery and informant ratings, is reviewed by a multidisciplinary panel of experts and is categorized as CU, MCI, dementia, or other (non‐MCI) impairment. The process in WRAP is similar, but the CU group is divided into CU‐Stable (CU‐S) and CU‐Declining (CU‐D), the latter applied when evidence of subclinical decline is present. Because most WRAP participants remain CU, their performance is first screened algorithmically such that only those with test scores ≥1.5 standard deviations below internal robust norms are reviewed at consensus, and the remainder are assigned a status of CU‐S. Details on the WRAP consensus process and validity evidence related to CU‐D have been published elsewhere. 19 In analyses focused on clinical cognitive status, we group these together for joint use across cohorts. For both cohorts, MCI and dementia diagnoses are based on National Institute on Aging–Alzheimer's Association (NIA‐AA) criteria, without regard to biomarkers. 8 , 9

2.4. Imaging biomarker assessments

The most recent imaging data available as of October 2022 were used to examine relationships between the most recent centiles and the most recent A and T positivity values. Radiopharmaceutical production, image acquisition and reconstruction, and image processing and quantification of PET imaging data have been described previously. 38

2.4.1. A PET

[C‐11]‐PiB‐PET (PiB) was used to measure brain A burden. Total A burden was estimated with a global PiB distribution volume ratio (DVR) value derived from the full 70‐min dynamic acquisition and applying eight bilateral regions of interest (ROIs). 39 Using a previously identified threshold, 40 participants having global PiB DVR ≥1.19 were classified as amyloid positive (A+); others were classified as amyloid negative (A‐). Two methods 41 , 42 were applied to DVR values to estimate age of A onset (i.e., age at which estimated DVR = 1.19) for all A+ participants.

2.4.2. T PET

[F‐18]‐MK‐6240 (MK) was used to measure the presence of fibrillar brain T. In accordance with previous T PET studies, T was quantified as a standardized uptake value ratio (SUVR; 70–90 min, inferior cerebellar gray matter reference region) in a mesiotemporal cortical (MTC) region of interest encompassing the entorhinal cortex, amygdala, parahippocampal gyrus, fusiform gyrus, and inferior and middle temporal gyri. 43 Tau positivity (T+) was determined as MK MTC SUVR >1.30 (i.e., 2.5 standard deviations above the mean of a young, unimpaired A− reference group); others were classified as tau negative (T‐).
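
The dichotomization rules above reduce to two fixed cutoffs. The sketch below (Python; constant and function names are ours, not the study code) simply encodes the published thresholds: global PiB DVR ≥ 1.19 for A+ and MK‐6240 MTC SUVR > 1.30 for T+.

```python
PIB_DVR_THRESHOLD = 1.19   # global PiB DVR cutoff for amyloid positivity (A+)
MK_SUVR_THRESHOLD = 1.30   # MK-6240 MTC SUVR cutoff for tau positivity (T+)

def at_status(pib_dvr, mk_suvr):
    """Return (A, T) positivity labels from the thresholds in Sections 2.4.1-2.4.2."""
    a = "A+" if pib_dvr >= PIB_DVR_THRESHOLD else "A-"
    t = "T+" if mk_suvr > MK_SUVR_THRESHOLD else "T-"  # note the strict inequality for T
    return a, t
```

Note the asymmetry preserved from the text: amyloid positivity is defined at or above its threshold, whereas tau positivity requires exceeding its threshold.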

2.5. Statistical methods

2.5.1. Aim 1: Harmonization of cognitive tests and creation of cognitive composites

2.5.1.1. Harmonization

Crosswalking was performed to accommodate differences in batteries between cohorts and over time, with scores on the less common version of the test mapped into the space defined by the range of scores on the more common version. Table 1 lists the relevant test replacements for this study. Where a shorter version of a test was used, scores were imputed using an appropriate multiplicative factor (e.g., multiplying 30‐item BNT scores by 2 to obtain an estimated 60‐item BNT score; for validation of the 30‐item short form, see ref. [44]). When mapping UDS3 scores into the UDS2 range, published crosswalks were used as described by ref. [28] and provided at https://naccdata.org/data‐collection/forms‐documentation/crosswalk. We used distribution percentiles as described in the supplement to crosswalk Digit Span from the Wechsler Memory Scale–Revised (WADRC, UDS 2) and Number Span (WADRC, UDS 3) to Digit Span from the Wechsler Adult Intelligence Scale, third edition (WAIS‐III) (WRAP). A full list of mappings produced is shown in Table S1.
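
Percentile-based crosswalking of the kind described above can be sketched as mapping each source-test score to the target-test score occupying the same percentile rank. The Python below is our own simplified illustration, not the validated NACC crosswalks, and assumes complete reference samples for both tests.

```python
from bisect import bisect_right

def percentile_rank(sorted_scores, x):
    """Fraction of the reference sample scoring at or below x."""
    return bisect_right(sorted_scores, x) / len(sorted_scores)

def equipercentile_map(source_scores, target_scores):
    """Map each distinct source score to the target score at the same percentile rank."""
    src = sorted(source_scores)
    tgt = sorted(target_scores)
    mapping = {}
    for x in set(src):
        p = percentile_rank(src, x)
        idx = min(int(p * len(tgt)) - 1, len(tgt) - 1)  # index of the matching target quantile
        mapping[x] = tgt[max(idx, 0)]
    return mapping
```

In practice smoothing and validated published tables are used; the sketch conveys only the rank-matching logic.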

Since we wished to use reading recognition performance as a covariate in our models and three reading tests were used over time in the WADRC (including the WRAT3), we also used equipercentile equating to crosswalk the reading tests used by the WADRC to the WRAT3 categories used in our previous development of demographically adjusted, quantile‐regression‐based WRAP norms. 45 For a full description of reading test harmonization, please see the first section of the Supplement.

2.5.1.2. Calculation of composites

To reduce within‐person measurement variability, 46 we created composite scores averaging multiple tests, including one general learning and memory composite (“Memory composite” incorporating both immediate and delayed‐recall items of the AVLT and harmonized Logical Memory scores), and several versions of the Preclinical Alzheimer's Cognitive Composite (PACC). 47 , 48 , 49 To calculate a given composite, each of its constituent cross‐walked scores was standardized against the first available observation for CU participants and multiplied by −1 where necessary to ensure that higher scores represent better performance. These standardized raw scores were averaged to produce a composite, which was itself then rescaled to a mean of 100 and a standard deviation of 15 within this same CU group, creating test scores that are on a familiar scale for clinical users, the same as used by the Wechsler series of intelligence scales. 50 The composites and their constituent tests are included in Table 1; the composites include a Memory composite and four versions of a PACC, all using harmonized WRAP and WADRC data plus three WRAP‐specific versions of a PACC composite. Every version of the PACC composites included AVLT sum of learning trials and (crosswalked) Logical Memory II Story A. The total number of tests contributing to each PACC composite and additional tests are denoted as part of the name (e.g., the PACC score incorporating AVLT Total Learning Trials, Logical Memory IIA, and Trail‐Making Test B (TMT‐B) is labeled PACC3‐TMTB). For validity analyses, we used the harmonized PACC3‐TMTB and Memory composites (details in Section 2.5.3). We selected these composites because they have high face validity for detecting Alzheimer's‐related change and are relatively free of within‐person measurement variability.
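
The composite construction described above (z-transforming each constituent against the CU baseline, reflecting where needed, averaging, and rescaling to a mean of 100 and SD of 15) can be sketched as follows. Function and argument names are illustrative, not from the study code.

```python
import statistics

def composite(scores_by_test, baseline_stats, reverse_scored=()):
    """Average z-score across constituent tests for one participant-visit.

    scores_by_test: {test_name: raw_score}
    baseline_stats: {test_name: (mean, sd)} from the CU baseline sample
    reverse_scored: tests where higher raw scores mean worse performance (e.g., timed tests)
    """
    zs = []
    for test, raw in scores_by_test.items():
        mean, sd = baseline_stats[test]
        z = (raw - mean) / sd
        if test in reverse_scored:
            z = -z  # reflect so that higher always means better
        zs.append(z)
    return statistics.mean(zs)

def rescale(z_composite, comp_mean, comp_sd):
    """Rescale the averaged composite to the Wechsler-style mean-100, SD-15 metric."""
    return 100 + 15 * (z_composite - comp_mean) / comp_sd
```

Here comp_mean and comp_sd are the mean and SD of the averaged composite within the same CU reference group, so the final scores land on the familiar IQ-style scale.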

2.5.2. Aim 2: Development of unconditional and conditional standards

We developed unconditional and conditional standards to characterize the harmonized WRAP and WADRC sample data using quantile regression. Our earlier work with WRAP alone used unrestricted regression quantiles derived from CU WRAP participants. 22 However, a weakness of this approach is that quantile estimates may cross, leading to paradoxical situations in which the expected value of the outcome for some quantile τ1 may exceed that for a nominally larger quantile τ2 at some covariate value that is within the support of the models. One solution is the use of restricted regression quantiles, which constrain models by first estimating the median in the usual way and then estimating quantiles further from the median by means of a location‐scale transformation method. 51 , 52 Restricted regression quantiles were fit in R using the Qtools package. 53 Model fit statistics were evaluated using quantreg. 54

In our earlier work, we also performed no model selection. A potential weakness of that approach is that, if baseline performance and demographic covariates are collinear in predicting current performance, coefficient estimates may be unreliable. Consequently, for each outcome in these analyses, we fit an exhaustive set of possible models subject to a polynomial order constraint (i.e., retaining the highest significant polynomial up to a cubic plus all lower‐order related terms). We then selected the best‐fitting model on the basis of minimizing quantile crossings (all selected models had zero), minimizing evidence of misspecification from Khmaladze test statistics (which evaluate the extent of misspecification imposed by the location‐scale restrictions), and minimizing the Akaike information criterion. For some outcomes, all candidate models featured quantile crossings and/or misspecification as determined by Khmaladze test statistics; for these outcomes, standards were not fit.

Once models were fit, each participant's scores on each available test were mapped to percentile scores, or centiles. To do this, we used these models to construct personalized estimates of test scores corresponding to centiles 0.01–0.99, given the relevant person‐ and visit‐level characteristics as covariates, and selected the centile with the smallest residual as reflecting a person's performance on that test at that visit. For descriptive purposes, we plotted the last observed participant‐wise pair of unconditional versus conditional centiles for each variable and labeled with corresponding Spearman rank correlation coefficients.
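
Mapping an observed score to a centile by choosing the smallest residual reduces to an argmin over the 99 model-predicted scores for that person and visit. A minimal sketch (Python; names are ours):

```python
def assign_centile(observed, predicted_by_centile):
    """Return the centile whose model-predicted score is closest to the observed score.

    predicted_by_centile: {centile: predicted score} for centiles 1..99, produced by
    a fitted quantile-regression model given this person's covariates.
    """
    return min(predicted_by_centile, key=lambda c: abs(observed - predicted_by_centile[c]))
```

Because the predictions are personalized, two participants with identical raw scores can receive different centiles if their covariates (or baselines, for conditional standards) differ.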

2.5.3. Aim 3: Examine validity evidence for the normative standards and algorithmically‐defined cognitive statuses

Previously, we demonstrated using the WRAP cohort that a subclinical level of decline called CU‐D could be identified via a combination of internal norms and multi‐disciplinary consensus review. 19 In this paper, we use an algorithmic approach with the normative standards to characterize visit‐level cognitive statuses across cohorts. Specifically, we defined categories as follows: CU‐S = centiles [16, 100), CU‐D = centiles [4, 16), and impairment (MCI/dementia) = centiles (0, 4), separately for cross‐sectional and longitudinal centiles from the PACC3‐TMTB and Memory composites. This yielded four categorical cognitive status estimates for every observation, two making use of PACC3‐TMTB and two using Memory. The 16th and 4th centiles correspond to approximately 1 and 1.75 SD below the mean in a normal distribution and are consistent with thresholds used by our group and others to identify possible impairment. 14 , 55 , 56 Algorithmic status was coded ordinally (CU‐S < CU‐D < MCI/dementia) and was entered in models with a polynomial contrast, allowing us to test a linear component (stepwise differences with worse algorithmic status) and a quadratic one (allowing saturation or acceleration of the effects). For descriptive purposes, we estimated the proportion of consecutive observations in which a “worse” status transitions back to a “better” one.
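
The three-category algorithmic status defined above is a simple threshold rule on the centile scale. A sketch (Python; the function name is ours):

```python
def algorithmic_status(centile):
    """Map a centile (1-99) to the visit-level algorithmic status of Section 2.5.3."""
    if centile >= 16:
        return "CU-S"          # at or above the ~ -1 SD threshold
    if centile >= 4:
        return "CU-D"          # between ~ -1.75 SD and ~ -1 SD
    return "MCI/dementia"      # below the ~ -1.75 SD threshold
```

Applying the rule to both unconditional and conditional centiles, for each of the two composites, yields the four categorical status estimates per observation.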

2.5.3.1. Concurrent validity

To examine concurrent validity relative to consensus‐based cognitive status determinations, we used two approaches with each of the two selected harmonized composites (harmonized PACC3‐TMTB and Memory). In our primary analysis, we used Kruskal–Wallis tests to compare groups defined by consensus‐based cognitive statuses on concurrent unconditional and conditional centile outcomes at the most recent observation. We used Dunn's tests for pairwise post hoc comparisons, and corresponding effect sizes were described with Cliff's δ, a nonparametric effect size measure reflecting the differential probability of dominance of Yi versus Yj (i.e., P(Yi > Yj) − P(Yi < Yj)). 57 In secondary analyses, we compared groups defined by algorithmic cognitive statuses, using their concurrent, clinician‐adjudicated (consensus‐based) cognitive status as the outcome, with all observations included. To do this, we fit generalized linear mixed effects models with a binary outcome representing clinically significant impairment (0 = CU; 1 = MCI or dementia). Each model included a random intercept per participant and one or more fixed effects pertaining to algorithmic cognitive status. For each of the selected composites (harmonized PACC3‐TMTB and Memory), we examined and compared model fit of three parallel models: one using the algorithmic (cognitive) status based on unconditional centiles for that composite, one using algorithmic status based on conditional centiles, and one with both. Since clinicians review the neuropsychological test scores relative to cross‐sectional norms, and since the PACC3 and Memory constituent tests are focal in consensus discussions, we anticipated that the unconditional centile algorithmic cognitive status would be most strongly associated with consensus‐based cognitive statuses.
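
Cliff's δ, as defined above, can be estimated directly over all cross-group pairs. The sketch below (Python; a naive O(nm) pair count rather than a rank-based shortcut) makes the definition concrete.

```python
def cliffs_delta(y_i, y_j):
    """Cliff's delta: P(Y_i > Y_j) - P(Y_i < Y_j), estimated over all cross-group pairs."""
    gt = sum(1 for a in y_i for b in y_j if a > b)
    lt = sum(1 for a in y_i for b in y_j if a < b)
    return (gt - lt) / (len(y_i) * len(y_j))
```

Values near +1 or −1 indicate near-complete separation of the two groups; values near 0 indicate heavily overlapping distributions.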

2.5.3.2. Predictive validity

The primary cognitive outcome used to examine evidence of predictive validity was progression from consensus‐based “CU at the time of the first conditional centile” to consensus‐based clinical impairment at the last visit (i.e., MCI or dementia at last visit). The secondary outcome used was annualized change in CDR‐SB from the time of the first conditional centile to the last visit. For each outcome, “first PACC3 conditional centile” referred to the first visit with both harmonized PACC3‐TMTB unconditional and conditional centiles. We hypothesized that a worse algorithmic status at first PACC3 would be associated with higher risk of progression to a consensus‐based clinical cognitive status at last visit (also by consensus conference) and greater annualized change in CDR‐SB. We repeated these analyses with the Memory conditional centiles.

We modeled progression risk using logistic regression as follows. Similar to the concurrent validity analyses, for the primary definition of CU at first PACC3, we examined and compared model fit of three parallel models: one using the algorithmic status based on unconditional centiles, one using algorithmic status based on conditional centiles and one with both. We also report corresponding odds ratios for each model. We repeated the analyses replacing the PACC3 predictor with the corresponding harmonized Memory composite and algorithmically‐defined categories. Sensitivity analyses additionally included time from first composite to last composite as a covariate.

To examine whether algorithmic CU‐D at the first eligible visit would be associated with greater clinical deterioration over time, as measured by annualized change in CDR‐SB from the first to the last eligible visit (the secondary predictive validity outcome; computed as (CDR‐SB_last − CDR‐SB_first)/(age_last − age_first)), we used linear regression, covarying baseline CDR‐SB to mitigate the influence of regression to the mean. For the primary definition of “first PACC3,” we again compared models that included unconditional and conditional algorithmic cognitive status separately and together. These analyses were also repeated using the Memory composite‐based algorithmically‐defined categories. Sensitivity analyses modeled CDR‐SB directly with linear mixed effects models, including a random intercept per subject and incorporating baseline algorithmic status, time since baseline, and their interaction as fixed effects to assess whether rates of change in CDR‐SB differ by algorithmic status.
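
The annualized-change outcome defined above is a one-line computation. The sketch (Python; names are ours) encodes only the formula, without the baseline-CDR‐SB covariance adjustment performed in the regression models.

```python
def annualized_cdr_sb_change(cdr_sb_first, cdr_sb_last, age_first, age_last):
    """Annualized change in CDR Sum of Boxes, per Section 2.5.3.2:
    (CDR-SB_last - CDR-SB_first) / (age_last - age_first)."""
    return (cdr_sb_last - cdr_sb_first) / (age_last - age_first)
```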

2.5.3.3. Associations with AD biomarkers

Finally, if normative standards for cognitive tests are sensitive to AD‐related changes, unconditional and conditional cognitive centiles should be worse (lower) in those who are AD biomarker positive than in those who are negative. Relatedly, if algorithmic CU‐D is a valid indicator of disease‐related change, we expect worse biomarker levels and more biomarker positivity in this group relative to algorithmic CU‐S (yet lower levels and less positivity than in the algorithmic impaired group). To investigate these two complementary questions, we performed the following analyses with each of the two focal harmonized composites (PACC3‐TMTB and Memory), using the closest possible pairings of scan and test (for the distribution of this time difference, see Section 3.2.3.1). First, to investigate whether centiles differed between PET‐measured A− versus A+ and T− versus T+, we compared the unconditional and conditional centiles in positive versus negative biomarker groups using Wilcoxon tests, and we described effect sizes with Cliff's δ. Second, we compared PiB global DVR and MK‐6240 MTC SUVR values across algorithmically defined cognitive statuses for the closest cognitive assessment using Kruskal‐Wallis tests; significant omnibus tests were followed with Dunn's test. p‐values were adjusted for multiple comparisons using the Benjamini‐Hochberg correction. For descriptive purposes, we also depict the proportion and 95% confidence interval for A+ or T+ in each algorithmic cognitive status for each of the four variables: PACC3‐TMTB unconditional, PACC3‐TMTB conditional, Memory unconditional, and Memory conditional. Last, we calculated the sensitivity and specificity associated with A+ and T+ (separately) for each of the four variables and two centile cutoffs to understand how varying centile thresholds might affect relationships between algorithmic cognitive statuses and biomarker positivity.
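Two nonparametric ingredients above, Cliff's δ and the Benjamini‐Hochberg adjustment, can be sketched in a few lines. This is a generic textbook implementation with hypothetical centile values, not the authors' code:

```python
def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    gt = sum(1 for a in x for b in y if a > b)
    lt = sum(1 for a in x for b in y if a < b)
    return (gt - lt) / (len(x) * len(y))

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, enforcing monotonicity)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from largest p-value down to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical centiles: biomarker-negative group vs biomarker-positive group
delta = cliffs_delta([0.60, 0.55, 0.70], [0.20, 0.35, 0.10])
```

Cliff's δ ranges from −1 to 1, with 0 indicating complete overlap between groups; here the hypothetical negative group dominates the positive group, giving δ = 1.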

2.5.4. Aim 4: ShinyApp case example

To facilitate visual review of an individual's cognitive performance relative to normative unconditional and conditional standards, Dr Jonaitis developed an interactive application ("WisNorms") using Shiny. 24 WisNorms plots cognitive scores versus age, overlaying observed performance on unconditional decile lines and highlighting statistically unusual conditional centiles. Additional information, such as medical events, biomarker levels, or estimated A onset ages, 41 can be indicated via vertical lines at the ages at which they were identified. We created a synthetic dataset for external exploration of the tool using the R package synthpop. 58 We illustrate the use of WisNorms with a selected real case showing significant change relative to baseline in the presence of A and T.

3. RESULTS

Tests across all available visits were harmonized as described in the methods section prior to subsequent analyses. Sample characteristics of the normative standards development and hypothesis‐testing sets are shown in Table 2 overall (n = 1390) and by WRAP (n = 1083) and WADRC (n = 307) cohorts (left three columns of data). As noted earlier, the sample used to examine validity evidence was expanded with data collected through May 2022. Data for a total of 2454 individuals (1647 WRAP and 807 WADRC) were pulled prior to applying the analysis‐specific selection criteria described in Section 2.5.3; analysis‐specific n's are reported in the corresponding subsections of Section 3.2; characteristics of the full sample are summarized in the right three columns of Table 2 (overall and by cohort).

TABLE 2.

Sample characteristics of samples used to develop normative standards (left three columns) and to test hypotheses relating to validity (right three columns)

Standards development Hypothesis testing
Variable Overall WADRC WRAP Overall WADRC WRAP
N 1390 307 1083 2454 807 1647
Female, N (%) 952 (68%) 205 (67%) 747 (69%) 1651 (67%) 491 (61%) 1160 (70%)
Race
White, N (%) 1305 (94%) 275 (90%) 1030 (95%) 2061 (84%) 633 (78%) 1428 (87%)
Black or African–American, N (%) 56 (4%) 24 (8%) 32 (3%) 300 (12%) 138 (17%) 162 (10%)
American Indian or Alaska Native, N (%) 16 (1%) 7 (2%) 9 (1%) 48 (2%) 34 (4%) 14 (1%)
Asian, N (%) 3 (0%) 0 (0%) 3 (0%) 5 (0%) 1 (0%) 4 (0%)
Other, N (%) 9 (1%) 0 (0%) 9 (1%) 1 (0%) 0 (0%) 1 (0%)
Unknown, N (%) 1 (0%) 1 (0%) 0 (0%) 2 (0%) 1 (0%) 1 (0%)
Ethnicity
Non‐Hispanic, N (%) 1374 (99%) 300 (98%) 1074 (99%) 2392 (97%) 783 (97%) 1609 (98%)
Hispanic, N (%) 11 (1%) 2 (1%) 9 (1%) 54 (2%) 17 (2%) 37 (2%)
Unknown, N (%) 5 (0%) 5 (2%) 0 (0%) 8 (0%) 7 (1%) 1 (0%)
Primary language
English, N (%) 1369 (98%) 304 (99%) 1065 (98%) 2330 (95%) 795 (99%) 1535 (93%)
Spanish, N (%) 9 (1%) 0 (0%) 9 (1%) 34 (1%) 3 (0%) 31 (2%)
Mandarin, N (%) 0 (0%) 0 (0%) 0 (0%) 1 (0%) 1 (0%) 0 (0%)
Unknown, N (%) 1 (0%) 1 (0%) 0 (0%) 72 (3%) 1 (0%) 71 (4%)
Education
16+ years, N (%) 893 (64%) 219 (71%) 674 (62%) 1468 (60%) 523 (65%) 945 (57%)
 <16 years, N (%) 497 (36%) 88 (29%) 409 (38%) 985 (40%) 284 (35%) 701 (43%)
Baseline reading proficiency
Q1, N (%) 70 (5%) 17 (6%) 53 (5%) 246 (10%) 108 (13%) 138 (8%)
Q2, N (%) 251 (18%) 45 (15%) 206 (19%) 500 (20%) 146 (18%) 354 (21%)
Q3, N (%) 484 (35%) 81 (26%) 403 (37%) 784 (32%) 214 (27%) 570 (35%)
Q4, N (%) 585 (42%) 164 (53%) 421 (39%) 926 (38%) 339 (42%) 587 (36%)
Parental FH of AD
Parental FH of AD, N (%) 980 (71%) 209 (68%) 771 (71%) 1259 (51%) 469 (2%) 1245 (76%)
No parental FH of AD, N (%) 410 (29%) 98 (32%) 312 (29%) 702 (29%) 334 (41%) 372 (23%)
Unknown, N (%) 0 (0%) 0 (0%) 0 (0%) 495 (20%) 463 (57%) 32 (2%)
Baseline cognitive status
Unimpaired, N (%) 1390 (100%) 307 (100%) 1083 (100%) 2239 (91%) 610 (76%) 1629 (99%)
MCI, N (%) 114 (4%) 102 (13%) 12 (1%)
Dementia, N (%) 67 (3%) 67 (8%) 0 (0%)
Other non‐MCI impairment, N (%) 32 (2%) 28 (3%) 4 (0%)
Last cognitive status
Unimpaired, N (%) 1390 (100%) 307 (100%) 1083 (100%) 2129 (85%) 578 (72%) 1551 (92%)
MCI, N (%) 153 (3%) 76 (9%) 77 (4%)
Dementia, N (%) 131 (5%) 121 (15%) 10 (1%)
Other non‐MCI impairment, N (%) 39 (2%) 32 (4%) 7 (0%)
Age at baseline, mean (SD) 55.92 (7.65) 62.03 (8.61) 54.18 (6.36) 57.39 (8.81) 63.10 (9.58) 54.59 (6.86)
Age at last visit, mean (SD) 65.16 (7.06) 66.42 (8.53) 64.81 (6.55) 66.14 (9.12) 68.75 (9.74) 64.86 (8.52)
Years of follow‐up, mean (SD) 9.25 (3.51) 4.38 (1.42) 10.63 (2.57) 8.75 (5.62) 5.65 (3.07) 10.27 (5.95)
No. of visits, median (range) 5 (2–9) 5 (2–9) 5 (3–6) 5 (1–13) 5 (1–13) 5 (1–8)
Baseline CDR, median (range) 0 (0–0.5) 0 (0–0.5) 0 (0–0.5) 0 (0–2) 0 (0–2) 0 (0–1)
Last CDR, median (range) 0 (0–0.5) 0 (0–0.5) 0 (0–0.5) 0 (0–3) 0 (0–3) 0 (0–1)
Baseline CDR Sum of Boxes, median (range) 0 (0–3.5) 0 (0–0.5) 0 (0–3.5) 0 (0–5.5) 0 (0–2) 0 (0–5.5)
Last CDR Sum of Boxes, median (range) 0 (0–3) 0 (0–0.5) 0 (0–3) 0 (0–7.5) 0 (0–5.5) 0 (0–7.5)

Note: Within each group of columns, characteristics are presented for the group as a whole (Overall) and separately by cohort (WADRC, WRAP). Standards were developed and hypotheses tested on subsets of participants who were deemed eligible on a test‐wise basis (see Table 1); for this table, participants were included if they contributed scores on at least one test.

3.1. Normative standards development

Automated model selection was attempted for 21 outcomes, comprising the 13 test scores and 8 composites listed in Table 1. Among these, three variables – crosswalked MMSE, Digit Span Forward, and Digit Span Backward – did not produce any model satisfying the Khmaladze goodness‐of‐fit criterion for the conditional and/or unconditional models, perhaps due to ceiling effects (MMSE) and/or coarseness (Digit Span), and were discarded from further consideration. One additional outcome, the BNT, was subject to a strong ceiling effect that made estimation of higher quantiles difficult; however, quantiles at the 50th and below were estimable. Parameter estimates for the 18 qualifying outcomes are shown for selected centiles in Table S2 (unconditional models) and Table S3 (conditional models).
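Quantile regression estimates each centile curve by minimizing the asymmetric "check" (pinball) loss rather than squared error. As a minimal illustration of that loss with made-up scores (not the restricted regression quantile machinery actually used here), note that for an intercept-only model the empirical quantile is the minimizer:

```python
def pinball_loss(tau, y, q):
    """Check-function loss for quantile tau of constant prediction q over data y."""
    total = 0.0
    for yi in y:
        r = yi - q
        total += tau * r if r >= 0 else (tau - 1) * r
    return total

scores = [1.0, 2.0, 3.0, 4.0, 5.0]
# For tau = 0.5 the median (3.0) attains the lowest loss among these candidates
losses = {q: pinball_loss(0.5, scores, q) for q in (2.0, 3.0, 4.0)}
```

In a full quantile regression, q is replaced by a covariate-dependent linear predictor, and fitting one model per tau (with a restriction to prevent crossing) yields the family of centile curves.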

3.2. Examining validity evidence

Estimated unconditional and conditional centiles for the outcomes selected for validity analyses are shown in scatterplots in Figure S1. Across all observations where both centiles were estimable, the Spearman correlations between unconditional and conditional centiles were 0.76 for the PACC3‐TMTB composite and 0.75 for the Memory composite. However, the strength of this relationship appears to be driven by extreme scores: within the subset of observations with centile estimates between 0.04 and 0.96 inclusive, the Spearman correlations were 0.54 and 0.52, respectively, and within a narrower subset between 0.16 and 0.84 inclusive, they were 0.39 and 0.38. Rates of reversion from a more to a less severe algorithmic cognitive status between successive visits were 9%–10% for both unconditional and conditional centiles of both composites.

3.2.1. Hypotheses related to concurrent validity

At participants’ last visit, consensus‐based cognitive status was strongly associated with all the selected centile measures. Boxplots of unconditional and conditional centiles at the last visit are shown by clinician‐adjudicated cognitive status in Figure 1. Cognitive status was strongly associated with concurrent unconditional PACC3‐TMTB and Memory centile measures (p < 0.0001 for all omnibus tests). As shown in Table S4 and Figure 1, all but one of the pairwise post hoc comparisons were significant for each of the four centile variables, the lone exception being the MCI versus Dementia comparison for the Memory‐based conditional centiles. In addition, Cliff's δ effect sizes were slightly larger for the harmonized PACC3‐TMTB composite than for the Memory composite, and larger for the unconditional standards than for the conditional standards (Table S4).

FIGURE 1.

FIGURE 1

Last centiles by last consensus cognitive status. Ns listed below each boxplot. From left to right: unconditional PACC3‐TMTB centiles (first column); conditional PACC3‐TMTB centiles (second column); unconditional Memory centiles (third column); and conditional Memory centiles (fourth column). Asterisks indicate significance level (* p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; ns = p > 0.05) from post hoc comparisons. Corresponding statistical output and Cliff's delta estimates are found in Table S4. PACC, Preclinical Alzheimer's Cognitive Composite; TMTB, Trail‐Making Test B

Secondary analyses using generalized mixed‐effects models indicated that centile‐based algorithmic status predicted concurrent, clinically significant impairment at all visits, with the probability of impairment increasing stepwise with successively more worrisome algorithmic status. This was true for algorithmic statuses based on unconditional centile measures alone (PACC3‐TMTB: odds ratio (OR) = 117.91, p < 0.0001, n = 1849 participants and 6317 visits; Memory: OR = 155.15, p < 0.0001, n = 1892 participants and 7081 visits) and for those based only on conditional centile measures (PACC3‐TMTB: OR = 27.25, p < 0.0001; Memory: OR = 41.18, p < 0.0001). When both unconditional and conditional information were included in the same model, the conditionally defined algorithmic cognitive status explained variance in clinical impairment beyond that explained by the unconditional centile measures, more so when the algorithmic statuses were assigned using the Memory composite than when using PACC3‐TMTB. Full statistical results for these models are shown in Table S5.

3.2.2. Hypotheses related to predictive validity

Limiting the sample to those who were CU at the first available algorithmic status defined using PACC3‐TMTB and Memory, respectively (see Section 2.5.3.2), 1146 and 1274 participants were eligible for inclusion. Of these, 49 (4.3%) and 65 (5.1%), respectively, had progressed to a clinical status (via consensus review) by their last visit (mean (SD) follow‐up time: PACC3‐TMTB, 7.6 (2.1) years; Memory, 7.4 (2.2) years). The proportions progressing in each group are shown in Figure 2A. Logistic regression models indicated no strong and consistent relationship between baseline algorithmic cognitive status and clinical progression, though a relationship was seen for the unconditional, Memory‐based status (OR = 2.84, p = 0.006; all other p > 0.05). Results for the full models (including both unconditional and conditional cognitive status designations) are shown in Table S6. Results of sensitivity analyses including time between the first available and last visit as a covariate were substantially similar (Memory: OR = 3.08, p = 0.003; all other p > 0.05).

FIGURE 2.

FIGURE 2

(A) Probabilities of clinical progression, defined as proportion progressing to clinical impairment via clinical consensus conference, plotted by “baseline” algorithmic cognitive status (where “baseline” refers to the first visit at which both unconditional and conditional centiles were estimable). Error bars reflect Agresti–Coull binomial 95% confidence intervals. (B) Estimated marginal means and 95% confidence intervals of annualized change in CDR‐SB by baseline algorithmic cognitive status, after adjusting for baseline CDR‐SB. Ns for each row of both panels are listed to the left of Panel A confidence intervals. From top to bottom: Unconditional PACC3‐TMTB centiles; Conditional PACC3‐TMTB centiles; Unconditional Memory centiles; Conditional Memory centiles. Corresponding statistical output is found in Table S7 (rightmost column). CDR‐SB, CDR Sum of Boxes; PACC, Preclinical Alzheimer's Cognitive Composite; TMTB, Trail‐Making Test B

Secondary analyses using annualized change in CDR‐SB (ΔCDR‐SB; PACC3‐TMTB n = 1127; Memory n = 1253) indicated that algorithmic cognitive statuses of CU‐D or MCI/Dementia‐level impairment were associated with greater expected change on this measure per year. Specifically, for algorithmic cognitive statuses based on unconditional centile measures alone, the pattern suggested higher ΔCDR‐SB with each successively worse cognitive status (PACC3‐TMTB: β̂ = 0.016, p = 0.011; Memory: β̂ = 0.021, p = 0.0025). For algorithmic cognitive statuses based on conditional centile measures, the pattern was similar for PACC3‐TMTB (β̂ = 0.027, p = 0.00027), but for the Memory composite, there was also an enhanced impact of algorithmic MCI/dementia compared to the other two algorithmic statuses (linear term β̂ = 0.032, p < 0.0001; quadratic term β̂ = 0.018, p = 0.0046). When both unconditional and conditional information were included in a single model, the predictive relationship for conditional algorithmic cognitive status was very similar (PACC3‐TMTB: β̂ = 0.023, p = 0.009; Memory: linear term β̂ = 0.022, p = 0.011, quadratic term β̂ = 0.018, p = 0.004), but unconditional algorithmic cognitive status remained a significant predictor only when defined using Memory (PACC3‐TMTB: β̂ = 0.004, p = 0.53; Memory: β̂ = 0.017, p = 0.026). Model‐predicted ΔCDR‐SB values by baseline algorithmic status are shown in Figure 2B for each of the PACC3‐TMTB and Memory unconditional and conditional algorithmic cognitive status predictors; full model results are shown in Table S7. Sensitivity analyses fitting raw CDR‐SB with linear mixed‐effects models produced conceptually similar results; full model fits are shown in Table S8.

3.2.3. Hypotheses related to biomarkers

3.2.3.1. Centile differences for PET A+ versus A− and PET T+ versus T−

Among 587 participants with at least one A scan, 580 (N = 436 A−, 144 A+; mean [SD] = 0.7 [1.0] absolute years between scan and cognition [|Δt|]; no difference in either |Δt| or Δt for A+ versus A−, p > 0.05; see Figure S2) had available PACC3‐TMTB centiles at the nearest cognitive assessment, and 584 (N = 439 A−, 145 A+; |Δt| = 0.5 [0.8] years; no difference in either |Δt| or Δt for A+ vs. A−, p > 0.05) had Memory centiles. Among 499 participants with at least one T scan, the numbers with PACC3‐TMTB and Memory centiles were 493 (N = 450 T−, 43 T+; |Δt| = 0.8 [1.0] years; no difference in either |Δt| or Δt for T+ vs. T−, p > 0.05) and 497 (N = 452 T−, 45 T+; |Δt| = 0.5 [0.8] years; no difference in either |Δt| or Δt for T+ vs. T−, p > 0.05), respectively. Non‐parametric comparisons of PACC3‐TMTB and Memory unconditional and conditional centiles closest to a PET scan showed significant differences for PET A+ versus A− and PET T+ versus T− (all p < 0.0001). Compared to those who were PET A−, those who were PET A+ had lower unconditional and conditional centiles for both PACC3‐TMTB and Memory. Similarly, in comparison with the PET T− group, those who were PET T+ had lower unconditional and conditional centiles for both composites. Cliff's δ effect sizes for T+ versus T− were approximately twice as large as for A+ versus A− across all four variables. Numeric results are shown in Table S9.

3.2.3.2. Biomarker differences across algorithmically determined cognitive statuses

Boxplots of Global PiB DVR and MK‐6240 SUVR in the mesiotemporal meta‐ROI are shown across algorithmic cognitive statuses in Figure 3. Corresponding Kruskal‐Wallis tests (p < 0.0001 for all omnibus tests) and subsequent pairwise comparisons and Cliff's δ effect sizes are shown in Table S10. Briefly, the top half of Table S10 shows the following for Global PiB DVR: all post hoc pairwise comparisons of cognitive statuses based on unconditional PACC3‐TMTB centiles were significant, with Cliff's δ ranging from 0.18 to 0.50. PiB DVR patterns across cognitive statuses defined by unconditional Memory centiles were similar, except for a nonsignificant difference between algorithmic CU‐S and CU‐D (δ = 0.05, p = 0.58). PiB DVR patterns across cognitive statuses defined by conditional centiles were similar for both the PACC3‐TMTB and Memory composites, although the CU‐S versus CU‐D comparison did not reach significance in either case. The bottom half of Table S10 shows the corresponding analyses of MK‐6240 MTC composite SUVR. All pairwise comparisons of MK‐6240 SUVR values across algorithmic cognitive statuses were significant, except for the following: unconditional PACC3‐TMTB‐based CU‐D versus MCI/Dementia (Cliff's δ = 0.18) and unconditional Memory‐based CU‐S versus CU‐D (Cliff's δ = 0.16); all other Cliff's δ values ranged between 0.17 and 0.45.

FIGURE 3.

FIGURE 3

Boxplots of Global PiB DVR (top row) and MK‐6240 SUVR in the mesiotemporal meta‐ROI (bottom row) by algorithmic cognitive statuses. Ns listed below each boxplot. From left to right, algorithmic status based on: unconditional PACC3‐TMTB (first column); conditional PACC3‐TMTB (second column); unconditional Memory (third column); and conditional Memory (fourth column) centiles. Asterisks indicate significance level (* p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001; ns = p > 0.05) from post hoc comparisons. Corresponding statistical output and Cliff's delta estimates are found in Table S10. DVR, distribution volume ratio; PACC, Preclinical Alzheimer's Cognitive Composite; PiB, PiB‐PET; ROI, region of interest; SUVR, standardized uptake value ratio; TMTB, Trail‐Making Test B

Figure 4 depicts the proportion (and Agresti‐Coull 95% CI) who were A+ (upper panel) or T+ (lower panel) for each of the four algorithmic cognitive status variables derived from unconditional and conditional PACC3‐TMTB centiles and unconditional and conditional Memory centiles. With both tracers, those with worse algorithmic cognitive status showed a greater probability of biomarker positivity. The sensitivity and specificity of our algorithmic thresholds (16th centile = CU‐S vs. CU‐D, MCI/Dem; 4th centile = CU‐S, CU‐D vs. MCI/Dem) for A and T positivity are shown in Table S11.
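The sensitivity/specificity calculation at a given centile cut‐off can be sketched as follows; the centiles and A+ statuses below are hypothetical, and "flagged" plays the role of falling below the study's 16th or 4th centile thresholds:

```python
def sens_spec(centiles, biomarker_pos, cutoff):
    """Sensitivity and specificity of flagging centiles below `cutoff`,
    evaluated against biomarker positivity."""
    tp = fp = fn = tn = 0
    for c, pos in zip(centiles, biomarker_pos):
        flagged = c < cutoff
        if flagged and pos:
            tp += 1          # flagged and biomarker positive
        elif flagged:
            fp += 1          # flagged but biomarker negative
        elif pos:
            fn += 1          # missed biomarker positive
        else:
            tn += 1          # correctly unflagged
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical centiles and A+ status for six participants, 16th-centile cutoff
sens, spec = sens_spec([0.02, 0.10, 0.50, 0.90, 0.30, 0.05],
                       [True, True, False, False, True, False],
                       cutoff=0.16)
```

Lowering the cutoff (e.g., to the 4th centile) trades sensitivity for specificity, which is the trade‐off Table S11 quantifies for the actual data.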

FIGURE 4.

FIGURE 4

Probabilities of biomarker positivity via PiB (A+; top row) and MK (T+; bottom row) as a function of nearest algorithmic cognitive status. Error bars reflect Agresti–Coull binomial 95% confidence intervals. Ns listed to the left of each confidence interval. PiB, PiB‐PET

3.3. WisNorms and case example

WisNorms is available for download at https://www.github.com/emjonaitis/WisNorms. The app comes pre‐loaded with synthetic data similar to that collected in the two cohorts. To illustrate the use of this information in the context of a single (real) research participant, we have included two screenshots of a data dashboard created with Shiny (Figure 5). For illustrative purposes, we selected a participant with several biomarker measurements, whose first A scan was negative but who subsequently became A+, and who experienced concomitant cognitive decline. The first panel plots this participant's AVLT Delayed scores over seven visits and shows strong delayed recall performance over the first four visits – roughly following the 75th unconditional centile curve – followed by a steady decrease over the most recent three, with the final observation falling below the 2nd unconditional centile curve. Superimposed triangles indicate visits at which conditional centiles were unusually high or low. Although the score at the fifth visit was still at the 50th unconditional centile, judged against the conditional standards it was only at the 7th centile, a fact reflected by the superimposed downward‐facing triangle. The second panel shows a similar pattern for Digit Symbol, where high performance at earlier visits was not sustained, and scores at the fifth and seventh visits were at or below the 7th conditional centile despite remaining near the median for the participant's demographic group. In addition, the upward‐facing triangle at the third visit reflects a surprisingly high conditional centile, a feature that in other situations may help researchers identify factors associated with response to treatment or resilience, although in this case the resilience was not sustained.

FIGURE 5.

FIGURE 5

Screenshots illustrating the use of the Shiny application to visualize individual cognitive performance over time. Vertical lines reflect biomarker measurements (red = biomarker positive). The shaded region represents the estimated amyloid onset age (EAOA). 41 Top: AVLT Delayed Recall; Bottom: WAIS‐R Digit Symbol. AVLT, Auditory Verbal Learning Test.

4. DISCUSSION

In this paper, we used harmonized data spanning two longitudinal cognitive aging cohorts, with an average of more than 9 years of follow‐up, to construct internal, quantile‐regression‐based cross‐sectional and longitudinal normative standards for 18 cognitive outcomes (10 individual tests and 8 composites) in adults aged 40–85. In the context of cohort studies, whose participants tend to be healthier than the population as a whole, local normative standards like these can serve as more sensitive thresholds for identifying worrisome performance. Conditioning on baseline performance refines this further, providing a sensitive, personalized indicator of cognitive change. The comprehensive approach in this study, examining both individual tests and composites, creates a large and flexible set of harmonized test scores and corresponding normative standards.

4.1. Normative standards

This work extends our previous work focused only on the WRAP cohort 22 in the following ways: it combines two cohorts; expands the set of outcomes and composites for which standards were developed; and validates the standards against clinically relevant outcomes and AD biomarkers. As in our earlier work, the large sample allowed normative standards to be adjusted for age, sex, education level, and reading recognition (a proxy for education quality and premorbid abilities) in both the cross‐sectional and longitudinal standards, and for practice effects and baseline performance in the longitudinal standards. Non‐linear age associations were present for most of the outcomes. In contrast to our earlier work, systematic model selection was performed to allow unnecessary covariates to drop out of each model. More importantly, the restricted regression quantile approach used with this dataset guarantees that model fits are free of crossed quantile estimates. Although interest in conditional norms models is growing, and longitudinal norms for the AVLT have recently been published alongside a ShinyApp, 59 the use of quantile regression in this setting remains relatively uncommon. 60 , 61

4.2. Validity

Preliminary validity evidence suggests that when these standards are used to map a person's composite score to a centile, the resulting centiles correspond well to concurrent clinician adjudications of cognitive status. Although both unconditional and conditional centiles show this type of validity, when included in the same model the unconditional relationship is stronger. This is unsurprising, as unconditional centiles more closely resemble clinical judgment, which compares scores against published cross‐sectional norms but does not take longitudinal information into account directly. In addition, we saw strong relationships between our algorithmically defined cognitive statuses, including the earliest signals of impairment, and AD biomarkers. This finding extends our earlier work defining a pre‐MCI status, which we have called CU‐D. 19

Despite these relationships, evidence for predictive clinical validity is more mixed. We saw no consistent linkage between algorithmic statuses based on unconditional or conditional centiles at baseline and the probability of clinical conversion over the observed window. The small number of conversion events seen during the study is one factor. A related possibility is that clinical judgment may not be very sensitive to change that is far from clinical thresholds, and the generally high baseline performance of our participants means that, on average, change from baseline must be more extreme to cross such a threshold. On the other hand, we also found that low centiles at baseline were modestly associated with worse longitudinal change on the CDR‐SB. Although the effect size was small, these relationships between composite score‐based centiles and a validated functional measure suggest that the early change being detected by our algorithmic approach may be capturing a phenomenon with eventual clinical implications. For example, in a paper that harmonized data across multiple cohorts, including WRAP, age‐adjusted norms were developed and algorithmic definitions of impairment were compared based on test norms only (1 SD threshold), CDR cutoffs, or both 62 ; the definition using both test scores and the CDR was most strongly associated with risk of progression to MCI.

4.3. Visualization tool

Our Shiny application for visualizing cognitive performance over time was inspired by growth charts, and we envision it being used in a similar way to help inform researchers’ judgments about individual cognitive health. Piloting the use of the ShinyApp during consensus review has demonstrated preliminary qualitative evidence that seeing performance relative to both the unconditional and conditional normative standards helps clinicians gauge the severity of change. Although the norms we have based this app on are internal norms best suited for research purposes, the idea may extend naturally to any context in which longitudinal data are available electronically, and clinicians wish to be able to gauge whether a departure from baseline is meaningful.

4.4. Limitations and future directions

This study has several limitations. First, when harmonizing data, the availability of within‐person comparison data was limited for selected tests (e.g., tests of literacy), so for tests without published crosswalks, we created between‐cohort equipercentile maps, which may not be robust to differences between the two samples. The fact that these cohorts were recruited from a common population is a mitigating factor, but in the presence of greater between‐cohort variation, more sophisticated harmonization methods, for example using item response theory, might be required. 62 , 63 Next, our modeling strategy was unsuccessful for three tests of interest, most likely due to coarse score distributions and/or ceiling effects. However, these same properties suggest these tests may be poorly suited to identifying early change. Third, we tested only one operationalization of assigning algorithmic cognitive statuses, with cut‐points set a priori based on clinical experience. The relatively low sensitivity and high specificity of these cut‐points in flagging biomarker‐positive individuals (Table S11) suggest that they may not be optimal. Future analyses may use receiver operating characteristic (ROC) analysis to identify empirically driven thresholds for these centile measures that are specific to particular outcomes. Last, in testing our predictive validity hypothesis related to progression to clinical impairment, the low rate of progression resulted in limited statistical power to detect meaningful differences.
Despite these limitations, our analyses indicate acceptable preliminary validity evidence, and we see several potential future directions for scores like these, including incorporating conditional centiles into consensus processes for identifying early change; linking these centiles to other biomarkers and health variables, in order to better understand complementary, non‐AD‐specific causes of decline 64 ; and evaluating to what extent such longitudinal information might help to mitigate racial bias in neuropsychological assessment. In addition, future analyses may also investigate alternative definitions of algorithmic cognitive status such as ones that combine normative standards with the CDR. 62 Future directions may also include applying the longitudinal change techniques to investigate response to treatment.

4.5. Conclusions

In this paper, we have harmonized a large set of variables from the WRAP and WADRC studies, largely using published crosswalks. In addition, we have refined and extended our earlier method of using quantile regression to generate unconditional (i.e., cross‐sectional) and conditional (i.e., longitudinal) normative standards for a large set of individual tests and cognitive composites. The longitudinal normative standards make it possible to identify surprising change from baseline relative to others in this harmonized cohort and may therefore identify people at increased risk of dementia before they reach cross‐sectional thresholds for impairment. Comparing our conditional and unconditional centiles to multiple outcomes of interest, including clinical consensus cognitive status, longitudinal change on the CDR‐SB, and PET evidence of brain A and T, we find modest evidence of both concurrent and predictive validity, suggesting that this method for algorithmically identifying change has promise. In future work, population‐representative normative data could be brought to bear to turn this tool into one with direct clinical application.

CONFLICT OF INTEREST STATEMENT

S.C.J. has served as a consultant to Eisai and Roche Diagnostics, has received an equipment grant from Roche Diagnostics, and has received support (sponsoring of an observational study and provision of precursor for T imaging) from Cerveau Technologies. B.T.C. has received imaging agents and precursor from Avid Radiopharmaceuticals. T.J.B. has received speaking honoraria from Intermountain Healthcare. Authors E.M.J., B.P.H., K.D.M., L.R.C., K.C., C.E.G., L.D., S.A., R.C., N.A.C., and R.E.L. declare that they have no competing interests. Author disclosures are available in the Supporting Information.

CONSENT STATEMENT

All participants provided informed consent for all study procedures.

Supporting information

Supporting Information

ALZ-20-3305-s001.docx (887.1KB, docx)

Supporting Information

ALZ-20-3305-s002.pdf (786.3KB, pdf)

ACKNOWLEDGMENTS

We extend our deepest thanks to the WRAP and WADRC participants and staff for their invaluable contributions to the study. Work at the University of Wisconsin was supported by National Institutes of Health R01AG027161 (Johnson), National Institutes of Health R01AG021155 (Johnson), National Institutes of Health P30AG062715 (Asthana), Wisconsin Alzheimer's Disease Research Center's Pilot Funding Program (Langhough (fka Koscik)), National Institutes of Health R01AG054059 (Gleason), Alzheimer's Association Research Foundation 19614533 (Betthauser), National Institutes of Health S10 OD025245‐01 (Christian) and the National Institutes of Health UL1TR002375 (Cody; University of Wisconsin Institute for Clinical and Translational Research).

Jonaitis EM, Hermann BP, Mueller KD, et al. Longitudinal normative standards for cognitive tests and composites using harmonized data from two Wisconsin AD‐risk‐enriched cohorts. Alzheimer's Dement. 2024;20:3305–3321. 10.1002/alz.13774

Footnotes

a. The balance of White and non-White participants is similar to the 2020 census; African American and Native American participants are overrepresented by design.

b. We use mathematical bracket notation to denote intervals: square brackets indicate a closed interval in which the endpoint is included; rounded brackets indicate an open interval in which the endpoint is excluded.



Articles from Alzheimer's & Dementia are provided here courtesy of Wiley
