Abstract
INTRODUCTION
Practice effects (PEs) are a well‐known potential confound in natural history studies of longitudinal cognitive decline in aging and early‐stage Alzheimer's disease. The implications of PEs for Alzheimer's disease clinical trials are less well understood, although we have previously speculated that a “run‐in” period of repeated cognitive assessments prior to randomization may improve the efficiency of clinical trials [Jacobs et al. Alzheimer's & Dementia 2017;3(4):531‐535]. We have also described how the performance of composite outcome measures depends on parameters that may be influenced by PEs.
METHODS
To investigate this, we used the cognitive battery within the National Alzheimer's Coordinating Center (NACC) Uniform Data Set to characterize the potential impact of PEs on clinical trial design and outcome measures. The analysis was restricted to N = 1094 participants with amnestic mild cognitive impairment (aMCI) and 3 years of follow‐up data. Linear mixed effects models were used to estimate the magnitude of PEs observed in aMCI participants. Power calculations informed by the pattern of progression in the NACC sample were used to describe the net impact of PEs on trials with and without a run‐in phase. Weighting parameters of optimal composite measures constructed from the NACC battery were also compared.
RESULTS
PEs were large, often exceeding the magnitude of annual rate of change observed in later assessments. Annualized rate of change, and therefore target treatment effect sufficient to achieve a specified percentage reduction in rate of decline, was larger after run‐in. Sample size projections for the run‐in design were a fraction of those required for trials without run‐in. Weighting parameters that optimize composite outcome performance were also different for the two designs, underscoring the importance of considering design in the construction of composite outcomes.
DISCUSSION
Clinical trials randomizing after a run‐in period measure treatment efficacy relative to decline unbiased by PEs, and require smaller sample size.
Highlights
In the National Alzheimer's Coordinating Center (NACC) amnestic mild cognitive impairment (aMCI) cohort, practice effects often exceed the annualized rate of change.
Run‐in clinical trial designs can be used to extinguish practice effects.
Rate of decline after run‐in is faster and unbiased by practice effects.
Run‐in designs correctly target the most clinically relevant outcome signal.
Practice effects also impact weighting of optimal composite measures.
Keywords: clinical trial design, clinical trial efficiency, composite measures, optimal weights, practice effects
1. INTRODUCTION
Practice effects are typically defined as improvements in cognitive test performance on repeated exposure to the same instrument at two or more visits. 1 , 2 It is thought that these improvements result from both task familiarity and practice‐related item‐level familiarity. 3 When there is a short test–retest interval, practice effects are often large 4 and can be identified through a clear measurable improvement in cognitive test performance on repeated testing. For longer test–retest intervals when normative decline is expected, this improvement may be less obvious because it is confounded with the true rate of decline as time progresses. In studies of Alzheimer's disease (AD), where practice effects have been observed both for individual cognitive tests 3 , 5 , 6 , 7 and composite measures, 8 , 9 the effects of aging and dementia may diminish or mask the practice effect signal, such that even stable or declining test performances reflect bias. 7 , 10 Thus, even in the absence of measurable improvements in cognitive test performance, practice effects may nonetheless be present and can affect the observed rate of decline and treatment effect signal observed in Alzheimer clinical trials. The potential bias incurred by ignoring practice effects, with or without measurable improvements, may also have implications for other treatment trials with cognitive endpoints, including Huntington's disease, 11 multiple sclerosis, 12 , 13 human immunodeficiency virus (HIV), 14 , 15 and cognitive rehabilitation in cancer survivors. 16 , 17 Hence, the issues and proposed solutions presented in this manuscript are relevant to many applications beyond Alzheimer clinical trials.
Current approaches to mitigating the impact of practice effects in clinical trials include administering alternative test forms, 18 or restricting analyses to subgroups defined by the presence or absence of practice effects. 6 Alternatively, we have proposed that clinical trials use a single‐blind placebo run‐in period to “wash out” practice effects before randomization. 8 Our observation 8 has been that target treatment effect size, when calculated to achieve a prespecified percent reduction in rate of cognitive decline under treatment, is larger with the run‐in design. When the target treatment effect is calculated in this way, the larger effect size means that substantially smaller sample size is required to power a clinical trial. Describing effect size as percentage reduction in rate of decline (percent slowing) is an effective method of communicating treatment effects to the lay public unfamiliar with the assessment instruments used by clinical trialists, and was the method of communication of treatment effects used by each of the successful monoclonal antibody trials reported to date. 19 , 20 These trials observed net treatment effects of 20%–30% slowing of decline for a range of cognitive and functional outcomes. 21 We anticipate that treatment effects sufficient to achieve this magnitude of slowing are a reasonable pragmatic benchmark for what can be considered a useful clinical intervention response, and that Phase 3 trials may reference this effect size moving forward in early AD trials.
In this article, we use data from the National Alzheimer's Coordinating Center (NACC) cognitive battery to illustrate and estimate the potential magnitude of practice effects in participants with a baseline diagnosis of amnestic mild cognitive impairment (aMCI), and to investigate the implications of a clinical trial run‐in phase on sample size projections and on weighting parameters that optimize cognitive composite outcomes. Based on our prior observations, 8 we hypothesize that rate of decline after practice effects are extinguished will be faster and that sample size projections will be smaller for this design.
2. METHODS
The NACC Uniform Data Set (UDS) 22 , 23 provides an opportunity to investigate the potential magnitude of practice effects in the aMCI population, as will be described in Section 2.4. We can also use these data to estimate placebo arm performance and inform power calculations for clinical trials without and with a single‐blind run‐in period and describe how this affects sample size projections for a clinical trial. NACC data from baseline to follow‐up are representative of placebo arm data for a trial that does not include a run‐in period. These are the type of data conventionally used to inform the design of future clinical trials. Alternatively, we can consider the baseline visit a practice session, and the subsequent visits as representative of data that would be expected for a trial with a run‐in design. We will use power calculations informed by these two subsets of the NACC data set to estimate the potential improvement in clinical trial efficiency realized by the run‐in design. Finally, we use these data to estimate and compare optimal cognitive composite outcome measures under the respective trial designs.
RESEARCH IN CONTEXT
Systematic review: So‐called “practice effects” following from test familiarity on repeated cognitive assessments have been well characterized in the psychometric literature. Practice effects may be relevant to clinical trials using cognitive outcome measures because they confound observed decline from first assessment to end of trial. We have speculated that a single‐arm placebo “run‐in” phase to extinguish practice effects prior to randomization may improve the efficiency of clinical trials.
Interpretation: Using novel models, we found that the magnitude of practice effects is large in the amnestic mild cognitive impairment population. Rate of cognitive decline unbiased by practice effects is faster, meaning trials powered to detect a prespecified percent slowing in rate of cognitive decline require a substantially smaller sample size.
Future directions: Future investigations to estimate the magnitude of practice effects of specific outcome measures and target populations are required to inform sample size requirements for trials with a run‐in design.
2.1. Study sample
The analysis was conducted using 1094 participants from the NACC UDS (Version 2), 22 identified after restricting to participants who were over 60 years old, had a diagnosis of amnestic mild cognitive impairment at their baseline visit, and had complete cognitive testing data for their baseline visit and for at least three consecutive annual follow‐up visits after the baseline visit. NACC mild cognitive impairment criteria are (1) subjective concern regarding decline in cognitive function from a previous state, (2) documented impairment in one or more cognitive domains, and (3) largely preserved independence in functional ability. 22 A diagnosis of aMCI was made if memory was one of the affected cognitive domains. 22 , 23
2.2. Component tests
The NACC UDS (Version 2) cognitive battery consists of 11 individually administered component tests. For two of these, Animal Naming and Vegetable Naming, the mean was computed to form a single Category Fluency variable. The remaining components were taken directly from the UDS battery: the Mini‐Mental Status Exam (MMSE), the Wechsler Adult Intelligence Scale – Revised (WAIS‐R) Digit Symbol Substitution Test (DSST), Story Recall Immediate, Story Recall Delayed, Digit Span Forward number correct, Digit Span Backward number correct, Trail Making Test A time to completion, Trail Making Test B time to completion, and the Boston Naming test.
2.3. Composite measures
Many contemporary clinical trials use a composite measure consisting of multiple components as the primary outcome measure, for example, the Preclinical Alzheimer Cognitive Composite (PACC), 24 the Alzheimer's Disease Neuroimaging Initiative (ADNI) battery composite, 25 and the Alzheimer's Disease Assessment Scale Cognitive and Executive Subscales (ADAS‐Cog‐Exec). 26 To investigate the extent to which practice effects impact the calculation and performance of composite measures, we used a modification of the PACC constructed from PACC component measures available in the NACC database (the MMSE, DSST, and Story Recall Delayed) called the “PACC3”. 9 Composite outcomes are calculated as the weighted sum of component cognitive measures. For each clinical trial design, we calculated optimal composite weighting parameters that maximize the signal‐to‐noise ratio of change from first to last visit in a clinical trial. 9 , 27
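To make the construction concrete, the R sketch below computes signal‐to‐noise‐maximizing weights using the standard result that the optimizing weights are proportional to the inverse covariance matrix of the component change scores multiplied by the vector of mean component changes (consistent with the approach described in refs. 27 and 28). The mean changes are taken from the run‐in columns of Table 3; the off‐diagonal covariances are hypothetical placeholders, so the resulting weights are illustrative only.

```r
# Illustrative sketch (not the authors' code): weights maximizing the
# mean-to-standard-deviation ratio of a weighted change score are
# proportional to solve(Sigma) %*% mu, where mu is the vector of mean
# component changes and Sigma is the covariance matrix of change scores.
mu <- c(MMSE = -1.30, DSST = -2.76, StoryDelayed = -0.44)  # 24-month mean changes
                                                           # (run-in columns, Table 3)
Sigma <- matrix(c(7.1,  6.0,  2.5,   # diagonal: variances from Table 3 SDs
                  6.0, 52.1,  4.0,   # (2.67^2, 7.22^2, 3.40^2); off-diagonal
                  2.5,  4.0, 11.6),  # covariances are hypothetical placeholders
                nrow = 3, byrow = TRUE,
                dimnames = list(names(mu), names(mu)))

w <- solve(Sigma, mu)  # unnormalized optimal weights
w / sum(w)             # one common normalization: rescale so weights sum to 1
```

Because both the mean changes and the covariance of change scores differ between designs with and without a run‐in, the resulting weights differ as well (see Section 3.3).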
2.4. Statistical methods
Practice effects were estimated by fitting linear mixed effects models with random slopes and intercepts that also included an additional fixed effect indicator variable for the baseline NACC assessment. The indicator variable allows the mean of the baseline visits to deviate from the linear fixed effect trajectory of the post‐baseline observations. The coefficient for this indicator variable offset term estimates the mean deviation of the first NACC assessment from the linear fixed effect trajectory estimated from the post‐baseline visits (see Figure 1). Assuming the longitudinal trajectories extrapolated to baseline time (Figure 1) describe what would have been observed if there had been repeated measures sufficient to “wash out” practice effects, the magnitude of these offset terms estimates the potential magnitude of the shift in scores that may have been achieved if repeated measures had been applied prior to the first NACC assessment.
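As a concrete illustration of this model specification, the R sketch below fits such a model with the lmerTest package. The data frame dat, its column names, and the use of the MMSE as the outcome are hypothetical stand‐ins for illustration, not the authors' code.

```r
# Minimal sketch: random-intercept, random-slope linear mixed model with a
# fixed-effect indicator for the baseline visit (the practice-effect offset).
# Assumes a long-format data frame `dat` with columns:
#   id    - participant identifier
#   years - years since the baseline visit (0, 1, 2, 3)
#   mmse  - component test score (MMSE used here for illustration)
library(lmerTest)  # lme4::lmer interface with p-values for fixed effects

dat$baseline <- as.numeric(dat$years == 0)  # indicator for the baseline visit

# The intercept and slope describe the post-baseline trajectory (extrapolated
# back to time 0); the `baseline` coefficient estimates the offset of the
# observed baseline mean from that trajectory, as in Figure 1.
fit <- lmer(mmse ~ years + baseline + (1 + years | id), data = dat)
summary(fit)$coefficients  # the `baseline` row is the practice-effect estimate
```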
FIGURE 1.

Mean scores by visit for each component in the NACC cognitive battery. Vertical bars denote ± 1 standard error. Fixed effect trajectories from mixed effects model fits are shown in gray. Practice effect estimates are indicated by the red line segments at Visit 1 and show the discrepancy in test performance between the Visit 1 mean score and the fixed effect trajectory. NACC, National Alzheimer's Coordinating Center
Power calculations were performed for two mock clinical trials informed by the NACC data: a 24‐month trial assuming no run‐in, informed by baseline and month 24 data from NACC, and a 24‐month trial with run‐in, informed by month 12 to month 36 data from NACC. We recognize that the mock run‐in trial informed by available NACC data is not perfectly representative of what would actually be implemented in a future trial designed with a run‐in phase. In practice, the run‐in period of a pre‐planned trial would be a much shorter interval and would likely include more than one repeated assessment to extinguish practice effects. Nonetheless, the NACC data are the best available data to inform run‐in effects, and on balance, the biases introduced by using NACC data are likely conservative (see further comments in Section 4).
Power was calculated using the standard formula for a two‐sample t test:

\[
n \text{ per arm} \;=\; \frac{2\,\hat{\sigma}^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\Delta^{2}},
\]

as implemented in the R statistical programming language power.t.test function, assuming equal allocation to arms. Here, \(\hat{\sigma}^{2}\) is the within‐group variance of the change, \(\Delta\) is the treatment effect size, and \(z_{1-\alpha/2}\) and \(z_{1-\beta}\) are the quantiles of the standard normal distribution chosen to correspond to a two‐sided test with alpha error rate 0.05 and power of 0.8. We use relative efficiency to compare the performance of outcome measures under the two different designs, defined here as the ratio of the sample size required for the design with run‐in to the sample size required for a design without run‐in. In tables and text, treatment effect sizes for sample size projections are expressed in units of Cohen's d, that is, \(\Delta\) expressed in units of the standard deviation of change \(\hat{\sigma}\).
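As a worked example, the R sketch below approximately reproduces the MMSE row of Table 3 with the built‐in power.t.test function; the 24‐month change‐score means and standard deviations are the values reported in Table 3, and the target effect is 30% of the observed mean change.

```r
# Sketch reproducing the MMSE row of Table 3 with stats::power.t.test,
# assuming equal allocation, two-sided alpha = 0.05, and power = 0.80.

# Without run-in: 24-month mean change -0.78 (SD 2.37); 30% slowing target
n_no_runin <- power.t.test(delta = 0.30 * 0.78, sd = 2.37,
                           sig.level = 0.05, power = 0.80)$n
n_no_runin  # ~1600 per arm (Table 3 reports 1608)

# With run-in: 24-month mean change -1.30 (SD 2.67); 30% slowing target
n_runin <- power.t.test(delta = 0.30 * 1.30, sd = 2.67,
                        sig.level = 0.05, power = 0.80)$n
n_runin     # ~740 per arm (Table 3 reports 738)

# Relative efficiency: sample size with run-in / sample size without run-in
n_runin / n_no_runin  # ~0.46
```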
3. RESULTS
Recruitment visit demographics and cognitive scores are summarized in Table 1. Participants were between the ages of 61 and 97 years, with a mean age of 74.7 (SD = 7.2) years; 45.4% were female, and 38.8% were APOE ε4 carriers. At baseline, the mean score on the MMSE was 27.6 (SD = 2.1).
TABLE 1.
Baseline characteristics
| Parameter (n = 1094) | Baseline |
|---|---|
| Age, mean (SD) | 74.65 (7.19) |
| Education, mean (SD) | 15.57 (3.04) |
| Female (%) | 497 (45.4) |
| APOE4 positive (%) | 425 (38.8) |
| White (%) | 932 (85.1) |
| Black or African American (%) | 128 (11.7) |
| Asian (%) | 18 (1.6) |
| Other (%) | 15 (1.4) |
| NACC Neuropsychological Battery | |
| MMSE, mean (SD) | 27.55 (2.11) |
| DSST, mean (SD) | 40.36 (11.08) |
| Story Immed., mean (SD) | 9.30 (4.12) |
| Story Delayed, mean (SD) | 6.73 (4.61) |
| Digit Span—F, mean (SD) | 8.21 (2.03) |
| Digit Span—B, mean (SD) | 6.23 (2.12) |
| Trail Making A, mean (SD) | 39.78 (18.24) |
| Trail Making B, mean (SD) | 117.09 (64.11) |
| Boston Naming, mean (SD) | 25.73 (3.94) |
| Category Fluency, mean (SD) | 14.23 (3.56) |
Abbreviations: APOE4, apolipoprotein E4; DSST, Digit Symbol Substitution Test; MMSE, Mini‐Mental Status Examination; NACC, National Alzheimer's Coordinating Center.
3.1. Magnitude of practice effects
The mean and standard error at each visit for each component of the NACC battery are illustrated in Figure 1 and provided in Table S1. The pattern of change after the baseline visit is approximately linear for most component tests, while the baseline visit means do not fit this linear pattern (Figure 1). To characterize this, linear mixed effects models were fit to the longitudinal scores of each component test. Fixed effect intercept and slope terms in these models estimate the mean linear trajectory from the first to the third follow‐up and are illustrated by the fitted lines in the plots of each component measure. An additional baseline visit indicator term was included in the model. This indicator variable estimates the offset distance between the mean level predicted by the fixed effects model and the observed baseline mean score. The estimated offsets are illustrated by vertical line segments between these two points in Figure 1.
The mean annual rate of change post‐baseline, the mean offset, and their ratio are summarized in Table 2. The offset term (estimating the potential magnitude of practice effects) is greater than 50% of the annual rate of change on follow‐up visits for all of the cognitive measures. The offset term approached 100% of the annual rate of change for Trail Making Test B (86%) and Boston Naming (92%). The offset term exceeded 100% for Story Recall Immediate (148%), Story Recall Delayed (229%), and Trail Making Test A (107%). In the mixed effects models, the offset terms were highly significant (p < 0.001) for all of the cognitive measures except for Digit Span Forward and Digit Span Backward. In regard to detecting decline, p‐values for all cognitive measures were highly statistically significant (p < 0.0001), although this is not unexpected given the large sample size.
TABLE 2.
Fixed effect parameter estimates from the mixed effects model fits illustrated in Figure 1
| Instrument | Offset term coef. | Offset term p‐value | Annual rate of change coef. | Annual rate of change p‐value | Ratio* |
|---|---|---|---|---|---|
| MMSE | −0.453 | 3.4E‐08 | −0.65 | 1.2E‐61 | 0.696 |
| DSST | −0.884 | 1.1E‐04 | −1.381 | 1.7E‐38 | 0.64 |
| Story Immed. | −0.417 | 7.4E‐04 | −0.282 | 2.5E‐07 | 1.479 |
| Story Delayed | −0.508 | 2E‐05 | −0.222 | 3.2E‐05 | 2.285 |
| Digit Span—F | −0.102 | 1.1E‐01 | −0.131 | 2.6E‐07 | 0.781 |
| Digit Span—B | −0.098 | 1.4E‐01 | −0.147 | 5.6E‐08 | 0.667 |
| Trail Making A | 2.459 | 6.3E‐05 | 2.295 | 2E‐17 | 1.071 |
| Trail Making B | 11.658 | 8.8E‐09 | 13.589 | 1.6E‐49 | 0.858 |
| Boston Naming | −0.423 | 4E‐06 | −0.457 | 3.4E‐26 | 0.924 |
| Category Fluency | −0.368 | 4.5E‐04 | −0.652 | 3.4E‐46 | 0.565 |
*Ratio of the offset term to the annual rate of change.
Abbreviations: DSST, Digit Symbol Substitution Test; MMSE, Mini‐Mental Status Examination.
3.2. Implications for sample size calculations
Parameters used for power calculations, and projected sample size assuming a target 30% slowing of progression under treatment, are summarized for each design and each outcome measure in Table 3. Effect sizes are larger for the run‐in design (Table 3). Sample size projections under the run‐in design are one‐half or less of the projections under the design with no run‐in for almost all outcome measures. The MMSE was the most sensitive of the NACC component measures. Under the run‐in design the sample size required for the MMSE outcome (738 per arm) is 46% of the sample size required under the design with no run‐in (1608 per arm).
TABLE 3.
Comparing the effect of a run‐in to wash out the practice effect on the projected sample size (N/arm) needed to detect a 30% slowing of decline for a 2‐year trial
| Instrument | No run‐in: 24‐month mean change (SD) | No run‐in: effect size (30% slowing) | No run‐in: Cohen's d | No run‐in: N/arm | Run‐in: 24‐month mean change (SD) | Run‐in: effect size (30% slowing) | Run‐in: Cohen's d | Run‐in: N/arm | Relative efficiency* |
|---|---|---|---|---|---|---|---|---|---|
| MMSE | −0.78 (2.37) | −0.23 | −0.1 | 1608 | −1.30 (2.67) | −0.39 | −0.15 | 738 | 0.46 |
| DSST | −1.87 (6.80) | −0.56 | −0.08 | 2309 | −2.76 (7.22) | −0.83 | −0.11 | 1194 | 0.52 |
| Story Immed. | −0.17 (3.71) | −0.05 | −0.01 | 86001 | −0.56 (3.49) | −0.17 | −0.05 | 6682 | 0.08 |
| Story Delayed | 0.06 (3.69) | 0.02 | 0.01 | 615305 | −0.44 (3.40) | −0.13 | −0.04 | 10206 | 0.02 |
| Digit Span—F | −0.15 (1.68) | −0.04 | −0.03 | 22180 | −0.26 (1.69) | −0.08 | −0.05 | 7258 | 0.33 |
| Digit Span—B | −0.12 (1.83) | −0.04 | −0.02 | 40642 | −0.29 (1.70) | −0.09 | −0.05 | 5891 | 0.14 |
| Trail Making A | 1.81 (17.00) | 0.54 | 0.03 | 15399 | 4.59 (17.89) | 1.38 | 0.08 | 2652 | 0.17 |
| Trail Making B | 12.86 (55.13) | 3.86 | 0.07 | 3208 | 27.18 (62.69) | 8.15 | 0.13 | 929 | 0.29 |
| Boston Naming | −0.41 (2.72) | −0.12 | −0.04 | 7814 | −0.91 (2.99) | −0.27 | −0.09 | 1861 | 0.24 |
| Category Fluency | −0.98 (2.87) | −0.29 | −0.1 | 1495 | −1.30 (3.08) | −0.39 | −0.13 | 975 | 0.65 |
*Relative efficiency: sample size with run‐in divided by sample size without run‐in.
Abbreviations: DSST, Digit Symbol Substitution Test; MMSE, Mini‐Mental Status Examination.
3.3. Implications for the determination of weights that optimize performance of composite endpoints
Weighting parameters used to calculate optimal composites are summarized for each design and each composite outcome in Table 4. Weights that optimize the performance of a composite are a function of the magnitude and standard deviation of change for each component measure, as well as the covariance of change scores between the measures. 27 , 28 Each of these parameters is different for designs with and without run‐in, and the net effect of these differences can result in changes in the weights used to calculate an optimal composite. We found that weights for two of the PACC3 components only changed modestly, while both the magnitude and sign of the Story Recall Delayed component changed between the two designs (Table 4).
TABLE 4.
Weighting parameters for calculating the optimal PACC3 composite measure for trials with run‐in and trials without run‐in
| Design | MMSE weight | DSST weight | Story Delayed weight |
|---|---|---|---|
| Trial with run‐in | 0.769 | 0.187 | 0.044 |
| Trial without run‐in | 0.676 | 0.188 | −0.136 |
Abbreviation: PACC3, Preclinical Alzheimer Cognitive Composite.
4. DISCUSSION
We found that practice effects are clearly present in this aMCI sample. Moreover, the practice effects directly modified the “signal‐to‐noise” in change from baseline visit to final visit in these data. Sample size projections for clinical trials with a run‐in period to accommodate practice effects were a fraction of sample size projections for trials without a run‐in period. Because signal‐to‐noise is a critical parameter in the formula used to calculate optimal composite outcomes designed to maximize the efficiency of clinical trials, 27 , 28 the calculation of optimal composites was also different for trials with and without a run‐in period. These findings illustrate the importance of considering practice effects when planning clinical trials with cognitive outcome measures.
There are limitations to this analysis. Primarily, we did not have data with a formal run‐in period specifically designed to “wash out” practice effect signal from longitudinal data. Instead, we used available data to estimate the potential magnitude of this effect. In our mock clinical trial with run‐in, we used the initial evaluation as the practice session, after which there was a 12‐month lag before the baseline visit. We presume there is some attenuation of practice effect during this long interval. Moreover, for some cognitive outcome measures, repeated retesting is required to fully extinguish practice effects. 4 An actual clinical trial might use a run‐in period on the order of 4–6 weeks, and, depending on the outcome measure, may even use multiple practice sessions to fully extinguish practice effect signal. In this way our estimates of practice effect may underestimate the actual signal that would be observed in a clinical trial with a run‐in period designed to fully extinguish practice effects, and our estimates of relative efficiency (Table 3) may underestimate the improvement in statistical power that may be achieved by this design. Conversely, basing the analysis on NACC participants with 3‐year follow‐up data may have induced subtle biases, and the fact that NACC participants are a year older after the mock “run‐in” period may have induced a slight anti‐conservative bias in the estimated improvement in power for the run‐in design. Cumulatively, these minor conservative and anti‐conservative biases do not detract from our overall conclusion that practice effects are present in aMCI data and can have a profound effect on sample size calculations. We further emphasize that the sample size estimates presented are not intended for the design of future trials but rather to illustrate the practice effect phenomenon and to underscore the importance of practice effects when powering trials with a run‐in design. The data used in this analysis are for aMCI subjects recruited between 2005 and 2015, and are not representative of participants recruited to contemporary clinical trials, especially trials that use biomarker confirmation of Alzheimer's disease in their inclusion criteria. Future studies more representative of contemporary clinical trial participants will be required to inform the design of future trials.
The recently completed trial of solanezumab for the treatment of preclinical Alzheimer's disease (the A4 Study) provides an opportunity to illustrate the potential impact of practice effects with real data. In this trial, the covariate‐adjusted mean placebo arm PACC score increased by about 0.25 points from baseline randomization to week 72, after which the mean score steadily declined, dropping by approximately 1.5 points from its peak to the week 240 end‐of‐study visit (Fig. 2A in ref. [18]). The net change from baseline to end of study was a decline of 1.25 points, or a 0.005 points per week rate of decline. A trial informed by this baseline to week 240 data and targeting a 30% slowing of decline would power to detect a difference in mean PACC score equal to 30% of 0.005, or 0.0015 points per week slowing. On the other hand, assuming instead that randomization was performed after a run‐in period to extinguish practice effects, the estimated rate of decline under placebo is 1.5/(240 − 72), or a 0.009 points per week rate of decline. A trial with run‐in design informed by these data and targeting a 30% slowing of decline would power to detect a 0.0027 points per week slowing of decline under active treatment. The larger treatment effect in the latter scenario is easier to power when designing a future clinical trial. All other things being equal (length of time on treatment and the variance of change from the baseline randomization visit to close‐out), and assuming a mixed model repeated measures analysis, the run‐in design trial would require approximately 70% fewer participants to achieve a given level of power (see formulas derived in ref. [28]). We emphasize that the faster rate of decline observed after the extinguishing of practice effects is unbiased by practice effect signal, and is therefore the true rate of decline relevant to the United States Food and Drug Administration (FDA) and to patients seeking clinical intervention. The larger treatment effect requires a more effective intervention, but is the effect size needed to achieve a meaningful 30% slowing of decline.
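The arithmetic in the preceding paragraph can be laid out explicitly; the short R sketch below simply restates that back‐of‐the‐envelope calculation and is not a substitute for the mixed model repeated measures formulas in ref. [28].

```r
# Back-of-the-envelope restatement of the solanezumab (A4) example above.
rate_no_runin <- 1.25 / 240        # net placebo decline, baseline to week 240 (~0.005 pts/wk)
rate_runin    <- 1.5 / (240 - 72)  # decline from the week-72 peak onward (~0.009 pts/wk)

delta_no_runin <- 0.30 * rate_no_runin  # target effect for 30% slowing (~0.0015 pts/wk)
delta_runin    <- 0.30 * rate_runin     # target effect for 30% slowing (~0.0027 pts/wk)

# With equal follow-up time and equal variance of change, required sample size
# scales as 1/delta^2, so the proportional reduction in sample size for the
# run-in design is:
1 - (delta_no_runin / delta_runin)^2  # ~0.66 with these exact fractions
                                      # (~0.69 with the rounded rates quoted above),
                                      # i.e., roughly 70% fewer participants
```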
Not all outcome measures have practice effects. For example, informant‐ and clinician‐derived functional scales are less prone to the factors influencing performance on repeated cognitive assessments. Among cognitive scales, there is a substantial range in the magnitude of practice effects (see e.g., Table 2). The target population is also a critical determinant in the magnitude of practice effects. The aforementioned A4 trial targeted cognitively normal participants and observed practice effects up to 72 weeks after baseline. The A4 trial included one administration of the PACC at a screening visit no more than 90 days before baseline, 29 after which there was a large (1.2 point) improvement in mean score from screening to baseline consistent with practice effects. The A4 trial also used alternating versions of PACC component scales to address practice effects. 18 Despite the screening visit practice session and the alternating versions of forms, there was still evidence of practice effects up to week 72 in the A4 data. This speaks to the difficulty in extinguishing practice effects in cognitively normal participants. Practice effects should be an important consideration in the choice of outcome measures for clinical trials, and this choice depends on the domain of measure as well as the target population.
5. CONCLUSIONS
We have demonstrated that practice effects are present in the aMCI population, and that practice effects can have a dramatic effect on statistical power when target effect size is calculated to achieve a specified percent slowing of decline. The critical observation is that rate of decline after practice effects have been extinguished is steeper, meaning that the treatment effect size required to achieve a specified percent slowing of decline is larger, and easier to power for, when using the run‐in design. Clinically meaningful treatment effects should in fact be determined relative to the unbiased steeper rate of decline observed after practice effects have been extinguished. The net impact is that the run‐in design correctly targets the most clinically relevant outcome signal, for which smaller sample sizes are required to power clinical trials.
CONFLICT OF INTEREST STATEMENT
M.L.T. reports: consulting services for NeuroUX; H.H.F. reports: grants to UCSD from Allyx Therapeutics and Vivoryon (Probiodrug); service agreements through UCSD for consulting with LuMind Foundation, Biosplice Therapeutics, Arrowhead Pharmaceuticals, Novo Nordisk, and Axon Neurosciences; DMC and DSMB services for Roche/Genentech Pharmaceuticals, Tau Consortium, and Janssen Research & Development LLC, as well as serving on the Scientific Advisory Board for the Tau Consortium. All related funds are directed to UCSD with none personally received. He also reports philanthropic support through the Epstein Family Alzheimer's Disease Collaboration. He reports personal funds received for Detecting and Treating Dementia Serial Number 12/3‐2691 U.S. Patent No. PCT/US2007/07008, Washington DC, U.S. Patent and Trademark Office; and S.D.E. reports: DSMB services for clinical trials performed by Janssen Research & Development LLC and Suven. H.H.D., D.M.J., and J.A.D. do not report any conflicts of interest.
CONSENT STATEMENT
Written informed consent was obtained from all NACC participants and coparticipants. Participants in the A4 trial provided written informed consent before enrollment.
The NACC database is funded by NIA/NIH Grant U24 AG072122. NACC data are contributed by the NIA‐funded ADRCs: P30 AG062429 (PI James Brewer, MD, PhD), P30 AG066468 (PI Oscar Lopez, MD), P30 AG062421 (PI Bradley Hyman, MD, PhD), P30 AG066509 (PI Thomas Grabowski, MD), P30 AG066514 (PI Mary Sano, PhD), P30 AG066530 (PI Helena Chui, MD), P30 AG066507 (PI Marilyn Albert, PhD), P30 AG066444 (PI David Holtzman, MD), P30 AG066518 (PI Lisa Silbert, MD, MCR), P30 AG066512 (PI Thomas Wisniewski, MD), P30 AG066462 (PI Scott Small, MD), P30 AG072979 (PI David Wolk, MD), P30 AG072972 (PI Charles DeCarli, MD), P30 AG072976 (PI Andrew Saykin, PsyD), P30 AG072975 (PI Julie A. Schneider, MD, MS), P30 AG072978 (PI Ann McKee, MD), P30 AG072977 (PI Robert Vassar, PhD), P30 AG066519 (PI Frank LaFerla, PhD), P30 AG062677 (PI Ronald Petersen, MD, PhD), P30 AG079280 (PI Jessica Langbaum, PhD), P30 AG062422 (PI Gil Rabinovici, MD), P30 AG066511 (PI Allan Levey, MD, PhD), P30 AG072946 (PI Linda Van Eldik, PhD), P30 AG062715 (PI Sanjay Asthana, MD, FRCP), P30 AG072973 (PI Russell Swerdlow, MD), P30 AG066506 (PI Glenn Smith, PhD, ABPP), P30 AG066508 (PI Stephen Strittmatter, MD, PhD), P30 AG066515 (PI Victor Henderson, MD, MS), P30 AG072947 (PI Suzanne Craft, PhD), P30 AG072931 (PI Henry Paulson, MD, PhD), P30 AG066546 (PI Sudha Seshadri, MD), P30 AG086401 (PI Erik Roberson, MD, PhD), P30 AG086404 (PI Gary Rosenberg, MD), P20 AG068082 (PI Angela Jefferson, PhD), P30 AG072958 (PI Heather Whitson, MD), P30 AG072959 (PI James Leverenz, MD).
The A4 Study is a secondary prevention trial in preclinical Alzheimer's disease, aiming to slow cognitive decline associated with brain amyloid accumulation in clinically normal older individuals. The A4 Study is funded by a public–private‐philanthropic partnership, including funding from the National Institutes of Health‐National Institute on Aging, Eli Lilly and Company, Alzheimer's Association, Accelerating Medicines Partnership, GHR Foundation, an anonymous foundation and additional private donors, with in‐kind support from Avid and Cogstate. The companion observational Longitudinal Evaluation of Amyloid Risk and Neurodegeneration (LEARN) Study is funded by the Alzheimer's Association and GHR Foundation. The A4 and LEARN Studies are led by Dr Reisa Sperling at Brigham and Women's Hospital, Harvard Medical School and Dr Paul Aisen at the Alzheimer's Therapeutic Research Institute (ATRI), University of Southern California. The A4 and LEARN Studies are coordinated by ATRI at the University of Southern California, and the data are made available through the Laboratory for Neuro Imaging at the University of Southern California. Participants screened for the A4 Study provided permission to share their de‐identified data in order to advance the quest to find a successful treatment for Alzheimer's disease. We would like to acknowledge the dedication of all the participants, the site personnel, and all of the partnership team members who continue to make the A4 and LEARN Studies possible. The complete A4 Study Team list is available on: www.actcinfo.org/a4‐study‐team‐lists.
Supporting information
Supporting Information
ACKNOWLEDGMENTS
The authors thank the participants and study teams of the NACC and the A4 Study for making these data available for research. This work was supported by the Epstein Family Alzheimer's Research Collaboration; and the following National Institute on Aging (NIA) grants: U19 AG10483, R01 AG049810, and P30 AG062429. The funding sources had no role in the design of the analysis, data interpretation, writing of the manuscript, or the decision to submit for publication.
Duehring JA, Jacobs DM, Thomas ML, Dodge HH, Feldman HH, Edland SD. Implications of practice effects for the design of Alzheimer clinical trials. Alzheimer's Dement. 2025;11:e70154. 10.1002/trc2.70154
References
1. Calamia M, Markon K, Tranel D. Scoring higher the second time around: meta‐analyses of practice effects in neuropsychological assessment. Clin Neuropsychol. 2012;26:543‐570.
2. Heilbronner RL, Sweet JJ, Attix DK, Krull KR, Henry GK, Hart RP. Official position of the American Academy of Clinical Neuropsychology on serial neuropsychological assessments: the utility and challenges of repeat test administrations in clinical and forensic contexts. Clin Neuropsychol. 2010;24:1267‐1278.
3. Goldberg TE, Harvey PD, Wesnes KA, Snyder PJ, Schneider LS. Practice effects due to serial cognitive assessment: implications for preclinical Alzheimer's disease randomized controlled trials. Alzheimer's Dement (Amst). 2015;1:103‐111.
4. Bartels C, Wegrzyn M, Wiedl A, Ackermann V, Ehrenreich H. Practice effects in healthy adults: a longitudinal study on frequent repetitive cognitive testing. BMC Neurosci. 2010;11:118.
5. Gavett BE, Gurnani AS, Saurman JL, et al. Practice effects on story memory and list learning tests in the neuropsychological assessment of older adults. PLoS One. 2016;11:e0164492.
6. Wang G, Kennedy RE, Goldberg TE, Fowler ME, Cutter GR, Schneider LS. Using practice effects for targeted trials or sub‐group analysis in Alzheimer's disease: how practice effects predict change over time. PLoS One. 2020;15:e0228064.
7. Elman JA, Jak AJ, Panizzon MS, et al. Underdiagnosis of mild cognitive impairment: a consequence of ignoring practice effects. Alzheimer's Dement (Amst). 2018;10:372‐381.
8. Jacobs DM, Ard MC, Salmon DP, Galasko DR, Bondi MW, Edland SD. Potential implications of practice effects in Alzheimer's disease prevention trials. Alzheimer's Dement (N Y). 2017;3:531‐535.
9. Wang X, Jacobs D, Salmon DP, Feldman HH, Edland SD. Optimal weighting of Preclinical Alzheimer's Cognitive Composite (PACC) scales to improve their performance as outcome measures for Alzheimer's disease clinical trials. Int J Stat Med Res. 2023;12:90‐96.
10. Sanderson‐Cimino M, Elman JA, Tu XM, et al. Cognitive practice effects delay diagnosis of MCI: implications for clinical trials. Alzheimer's Dement (N Y). 2022;8:e12228.
11. Beglinger LJ, Adams WH, Fiedorowicz JG, et al. Practice effects and stability of neuropsychological and UHDRS tests over short retest intervals in Huntington disease. J Huntingtons Dis. 2015;4:251‐260.
12. Leavitt V, Mostert J, Comtois J, et al. Measuring cognitive change in secondary progressive MS: an analysis of the ASCEND cognition substudy. J Neurol. 2025;272:338.
13. Castrogiovanni N, Mostert J, Repovic P, et al. Longitudinal changes in cognitive test scores in patients with relapsing‐remitting multiple sclerosis: an analysis of the DECIDE dataset. Neurology. 2023;101:e1‐e11.
14. Ownby RL, Waldrop‐Valverde D, Jones DL, et al. Evaluation of practice effect on neuropsychological measures among persons with and without HIV infection in northern India. J Neurovirol. 2017;23:134‐140.
15. Sithinamsuwan P, Hutchings N, Ananworanich J, et al. Practice effect and normative data of an HIV‐specific neuropsychological testing battery among healthy Thais. J Med Assoc Thai. 2014;97 Suppl 2:S222‐S233.
16. Cherrier MM, Anderson K, David D, et al. A randomized trial of cognitive rehabilitation in cancer survivors. Life Sci. 2013;93:617‐622.
17. Andreotti C, Root JC, Schagen SB, et al. Reliable change in neuropsychological assessment of breast cancer survivors. Psychooncology. 2016;25:43‐50.
18. Sperling RA, Donohue MC, Raman R, et al. Trial of solanezumab in preclinical Alzheimer's disease. N Engl J Med. 2023;389:1096‐1107.
19. Sims JR, Zimmer JA, Evans CD, et al. Donanemab in early symptomatic Alzheimer disease: the TRAILBLAZER‐ALZ 2 randomized clinical trial. JAMA. 2023;330:512‐527.
20. van Dyck CH, Swanson CJ, Aisen P, et al. Lecanemab in early Alzheimer's disease. N Engl J Med. 2023;388(1):9‐21. doi: 10.1056/NEJMoa2212948
21. Edland SD, Llibre‐Guerra JJ. Semorinemab in mild‐to‐moderate Alzheimer disease: a glimmer of hope though cautions remain. Neurology. 2023;101:593‐594.
22. National Alzheimer's Coordinating Center Uniform Data Set. https://naccdata.org/data‐collection/forms‐documentation/uds‐2
23. Morris JC, Weintraub S, Chui HC, et al. The Uniform Data Set (UDS): clinical and cognitive variables and descriptive data from Alzheimer disease centers. Alzheimer Dis Assoc Disord. 2006;20:210‐216.
24. Donohue MC, Sperling RA, Salmon DP, et al. The preclinical Alzheimer cognitive composite: measuring amyloid‐related decline. JAMA Neurol. 2014;71:961‐970.
25. Feldman HH, Messer K, Qiu Y, et al. Varoglutamstat: inhibiting glutaminyl cyclase as a novel target of therapy in early Alzheimer's disease. J Alzheimers Dis. 2024;101:S79‐S93.
26. Jacobs DM, Thomas RG, Salmon DP, et al. Development of a novel cognitive composite outcome to assess therapeutic effects of exercise in the EXERT trial for adults with MCI: the ADAS‐Cog‐Exec. Alzheimers Dement (N Y). 2020;6:e12059.
27. Ard MC, Raghavan N, Edland SD. Optimal composite scores for longitudinal clinical trials under the linear mixed effects model. Pharm Stat. 2015;14:418‐426.
28. Edland SD, Ard MC, Sridhar J, et al. Proof of concept demonstration of optimal composite MRI endpoints for clinical trials. Alzheimers Dement (N Y). 2016;2:177‐181.
29. Sperling RA, Donohue MC, Raman R, et al. Association of factors with elevated amyloid burden in clinically normal older individuals. JAMA Neurol. 2020;77:735‐745.