Abstract
Objective
Reliable change methods can assist in determining whether observed changes in performance are meaningful. The current study sought to validate previously published 1-year standardized regression-based (SRB) equations for commonly administered neuropsychological measures, which incorporate baseline performance, demographics, and 1-week practice effects.
Method
Duff et al.’s SRB prediction equations were applied to an independent sample of 70 community-dwelling older adults with either normal cognition or mild cognitive impairment, assessed at baseline, at 1 week, and at 1 year.
Results
Minimal improvements or declines were seen between observed baseline and observed 1-year follow-up scores, and between observed 1-year and predicted 1-year scores, on most measures. Relatedly, a high degree of predictive accuracy was observed between observed 1-year and predicted 1-year scores across the cognitive measures in this repeated battery.
Conclusions
These results, which validate Duff et al.’s SRB equations, will permit clinicians and researchers to have more confidence when predicting cognitive performance on these measures over 1 year.
Keywords: Reliable change, Assessment, Neuropsychology, Practice effects, Memory
Introduction
Clinical neuropsychologists are frequently called upon to provide repeated assessments of an older patient’s cognition to evaluate cognitive change over time—often over an interval of 1 year (Barth et al., 2003; Chelune & Duff, 2012; Heilbronner et al., 2010). Performances may decline as a result of neurodegenerative conditions like mild cognitive impairment (MCI; Winblad et al., 2004) or Alzheimer’s disease (AD; McKhann et al., 2011), whereas improvements in a patient’s medical state may lead to better cognitive performance in conditions like cardiovascular disease (Garrett et al., 2004; Sloan & Pressler, 2009) or HIV/AIDS (Ragin, Storey, Cohen, Edelman, & Epstein, 2004a; Ragin, Storey, Cohen, Epstein, & Edelman, 2004b). Direct intervention via cognitive rehabilitation for conditions like stroke may additionally result in cognitive improvements over time (das Nair, Cogger, Worthington, & Lincoln, 2016).
When tracking the progression of a disease or gauging the benefit of an intervention, patients may benefit from familiarity with testing procedures and repeated exposure to testing materials, a phenomenon referred to as practice effects (Bartels, Wegrzyn, Wiedl, Ackermann, & Ehrenreich, 2010; Beglinger et al., 2005; Duff et al., 2007; Duff et al., 2010a). A host of factors may contribute to a patient’s susceptibility to practice effects, including patient factors like demographics (e.g., age, education) or diagnostic status, and testing factors like the cognitive domain being tested, the length of the retest interval, or the use of alternate forms (Calamia, Markon, & Tranel, 2012). Increases in test scores as a result of practice effects may lead to faulty interpretations about improvements in cognition if not properly accounted for (Marley et al., 2017). If considered properly, however, practice effects may assist clinical interpretation regarding a patient’s prognosis (Duff et al., 2011; Hassenstab et al., 2015; Machulda et al., 2013), treatment response (Duff et al., 2010b), or disease pathology (Duff et al., 2018; Duff, Foster, & Hoffman, 2014; Galvin et al., 2005; Mormino et al., 2014). As a result, it is important for clinical neuropsychologists to consider the impact of practice effects on their patients’ cognitive performance during serial assessment.
In an effort to assist neuropsychologists in distinguishing meaningful change from practice-effect artifact during repeated cognitive assessments, a family of statistical procedures, known as reliable change methods, has been developed (Hammers, Duff, & Chelune, 2015; Lezak, Howieson, Bigler, & Tranel, 2012). A variety of procedures have been created, falling into two general approaches: the “simple difference” method and the “predicted difference” method. Taking their lead from Matarazzo and Herman (1984), both approaches consider difference scores in terms of their frequency or base rate in a reference population, such as a standardization sample or another patient group. The reliable change index (RCI) is the primary measurement of change using the simple difference method. The earliest RCI method was advanced by Jacobson and Truax (1991). Using the standard deviation (SD) of test scores at baseline and the test–retest reliability coefficient, they computed a standard error of measurement at Time 1 (T1), doubled its squared value to account for measurement error in the retest scores, and took the square root to obtain a standard error of the difference, an estimate of the SD of the difference scores. Chelune and colleagues (1993) offered a practice-effect-adjusted RCI modification by suggesting that the mean practice effect for the reference population be incorporated in the calculation of the simple difference. Iverson (2001) further modified this practice-adjusted RCI by computing separate standard errors of measurement for the baseline and retest scores and pooling these to compute an estimate of the SD of differences. Further refinements have been made to estimate the distribution of test–retest difference scores and the mean of the difference scores (Blasi et al., 2009).
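For illustration, the simple difference computations above can be sketched in a few lines of Python (a minimal sketch with hypothetical inputs and function names of our choosing; this is not code from the cited studies):

```python
import math

def jacobson_truax_rci(t1, t2, sd_baseline, r_xx):
    """Jacobson & Truax (1991) RCI: the baseline standard error of
    measurement is SD * sqrt(1 - r_xx); its squared value is doubled to
    cover measurement error at retest, yielding the standard error of
    the difference (Sdiff)."""
    sem = sd_baseline * math.sqrt(1 - r_xx)
    s_diff = math.sqrt(2 * sem ** 2)
    return (t2 - t1) / s_diff

def practice_adjusted_rci(t1, t2, sd_baseline, r_xx, mean_pe):
    """Chelune et al. (1993) modification: the reference group's mean
    practice effect is subtracted from the simple difference."""
    sem = sd_baseline * math.sqrt(1 - r_xx)
    s_diff = math.sqrt(2 * sem ** 2)
    return ((t2 - t1) - mean_pe) / s_diff

def iverson_rci(t1, t2, sd1, sd2, r_xx, mean_pe):
    """Iverson (2001) variant: separate SEMs for baseline and retest
    scores are pooled to estimate the SD of differences."""
    sem1 = sd1 * math.sqrt(1 - r_xx)
    sem2 = sd2 * math.sqrt(1 - r_xx)
    s_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)
    return ((t2 - t1) - mean_pe) / s_diff
```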
Conversely, the predicted difference method uses linear regression to predict retest scores at Time 2 (T2) for individuals based on their baseline (T1) performance, along with other relevant information (e.g., demographics, diagnosis, retest interval, etc.). McSweeny and colleagues (1993) developed this methodology, described as the standardized regression-based (SRB) predicted difference method, which has been further advanced by Maassen and colleagues (2006). Specifically, a z score, or discrepancy change score, is calculated by comparing an individual’s predicted T2 score and his or her observed T2 score and normalizing by the standard error of the estimate (SEest) of the regression model: z = (T2 − T2’)/SEest. Discrepancy change scores (z scores) above 1.645 conventionally represent “improvement” when using reliable change methods, whereas z scores below −1.645 reflect “decline” and z scores between ±1.645 indicate “stability”. These specific z score cut-offs reflect significance at an α value of 0.10 when using 90% confidence intervals of stability, which is consistent with McSweeny and colleagues’ (1993) methods. Because of the properties of the 90% confidence interval, if the z scores were normally distributed, then one would expect 5% of participants to show “improvement,” 90% to remain “stable,” and 5% to “decline” beyond expectation (Hammers, Suhrie, Dixon, Porter, & Duff, 2020).
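A concrete sketch of the predicted difference computation (hypothetical function and variable names; the regression coefficients that generate the predicted T2 score come from a development sample):

```python
def srb_z(observed_t2, predicted_t2, se_est):
    """Discrepancy change score: z = (T2 - T2') / SEest."""
    return (observed_t2 - predicted_t2) / se_est

def classify_change(z, cutoff=1.645):
    """90% confidence interval of stability (alpha = .10): z scores
    beyond +/-1.645 denote reliable improvement or decline."""
    if z > cutoff:
        return "improvement"
    if z < -cutoff:
        return "decline"
    return "stable"
```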
Given the variety of reliable change methods available across both simple difference and predicted difference approaches, Hinton-Bayre (2016) recently examined differences in results between specific reliable change methods depending on baseline testing performance and other retest parameters (e.g., mean practice effects, differential practice effects, reliability coefficients, etc.). The author observed that when considering the possibility of decline at T2, McSweeny’s SRB method performed better when individual baseline scores were below the normative mean, irrespective of differential practice. When an individual score was greater than the normative mean, Chelune’s RCI performed better in scenarios with lower retest variance, and Maassen’s modified SRB method performed better in scenarios with greater retest variance. Of note, these results were reversed when considering the possibility of improvement at T2. Hinton-Bayre (2016) subsequently endorsed the use of the predicted difference (SRB) method, given the study’s findings and the method’s ability to incorporate multiple predictors of the retest score, from statistical parameters (e.g., mean practice, differential practice, and test–retest reliability) to participant characteristics (demographic information) and testing features (e.g., retest interval; Crawford & Garthwaite, 2007). These considerations have additionally been embraced by others in the literature, and the SRB approach has gained broad acceptance (Attix et al., 2009; Crockford et al., 2018; Duff et al., 2004; Duff et al., 2010a; Gavett, Ashendorf, & Gurnani, 2015; Hammers, Suhrie, Dixon, Porter, & Duff, 2020; Rinehardt et al., 2010; Sanchez-Benavides et al., 2016; Stein, Luppa, Brahler, Konig, & Riedel-Heller, 2010).
Duff and colleagues (2010a) previously developed complex-SRB prediction equations for several commonly administered cognitive tests, including the Hopkins Verbal Learning Test—Revised (HVLT-R; Brandt & Benedict, 2001), the Brief Visuospatial Memory Test—Revised (BVMT-R; Benedict, 1997), the Symbol Digit Modality Test (SDMT; Smith, 1973), the Trail Making Test Parts A and B (TMT-A and TMT-B; Reitan, 1992), and the Controlled Oral Word Association Test (COWAT; Benton, Hamsher, Rey, & Sivan, 1994). Each measure was assessed twice over 1 week in 127 community-dwelling older adults, of whom 74 were classified as cognitively intact and 53 were classified as having amnestic MCI. These same participants were then assessed a third time approximately 1 year later on this battery of tasks. Duff and colleagues (2010a) incorporated not only baseline scores and demographic characteristics in their SRB equations to predict test scores at 1 year, but also the test- and patient-specific short-term practice effects observed between the baseline and 1-week administrations of these same tasks. Although the inclusion of short-term practice effects may add to the prediction of future cognitive performance for these commonly used tasks, Duff et al.’s SRB equations have not yet been validated, including in the original publication. This represents an important gap in knowledge about these SRB prediction equations. Consequently, the aim of the current study was to evaluate the validity of these SRB prediction equations using a sample of community-dwelling older adults with either normal cognition or amnestic MCI, to allow for generalizability of Duff et al.’s prediction equations to populations at risk for developing AD later in life. It was hypothesized that because Duff et al.’s prediction equations incorporated diagnostic status (e.g., cognitively intact, MCI) into the complex-SRB calculations, the application of these SRBs would generalize to an independent sample of older adults with normal cognition or MCI. This was anticipated to result in the expected proportions of participants from the validation sample “declining”, remaining “stable”, and “improving” on these cognitive measures over 1 year. Such a result would provide external support for these SRB prediction equations and would increase confidence in the diagnostic and prognostic value of these equations when tests are repeated over a 1-year interval.
Method
Participants
Table 1 reflects the demographic characteristics of participants from Duff and colleagues’ (2010a) development sample and the current validation sample. For the current sample, 76 older adults were recruited from the community (e.g., senior centers and independent living facilities). Six participants withdrew over the course of the study, resulting in a final sample of 70 participants with 1-year test data. As described subsequently, 38 participants were classified as cognitively intact, and 32 participants were classified as having amnestic MCI. Because Duff and colleagues’ (2010a) development sample combined cognitively intact individuals and those with MCI—and incorporated MCI diagnosis as a predictor in their SRB calculations—the current sample has been similarly pooled across diagnoses. The pooled sample’s mean age was 74.8 (SD = 7.0, range = 65–90) years, and it averaged 15.4 (SD = 2.9, range = 8–20+) years of education. Participants were predominantly female (74.3%) and Caucasian (98.5%). Premorbid intellect at baseline was average to high average according to the Wide Range Achievement Test—fourth edition (WRAT-4; Wilkinson & Robertson, 2006) reading subtest (standard score: M = 110.1, SD = 10.4, range = 82–145). As can be observed in Table 1, on average the sample displayed abilities within expectation at baseline on the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS; Randolph, 2012) for the domains of immediate memory (M = 108.1, SD = 16.8, range = 57–152), visuospatial/constructional (M = 99.3, SD = 15.6, range = 45–126), language (M = 102.8, SD = 11.7, range = 75–127), attention (M = 102.4, SD = 14.7, range = 72–135), and delayed memory (M = 103.7, SD = 15.0, range = 44–131), as well as for the total scale score (M = 104.9, SD = 15.1, range = 70–138). Self-reported depression was generally low (M = 5.1, SD = 3.8, range = 0–18) according to the 30-item Geriatric Depression Scale (GDS; Yesavage et al., 1982).
Table 1.
Demographic characteristics of Duff et al.’s development sample and the current validation sample
| Variable | Duff et al.’s development sample, Mean (SD) | Current validation sample, Mean (SD) | Current validation sample, Range |
|---|---|---|---|
| Total n | 127 | 70 | |
| n cognitively intact | 74 | 38 | |
| n with MCI | 53 | 32 | |
| Age (years) | 78.7 (7.8) | 74.8 (7.0) | 65–90 |
| Education (years) | 15.5 (2.5) | 15.4 (2.9) | 12–20+ |
| Gender (n) | |||
| Males | 24 | 18 | |
| Females | 103 | 52 | |
| Race (n) | |||
| Non-white/non-Caucasian | 0 | 1 | |
| White, non-Hispanic | 127 | 69 | |
| One-week test interval (days) | — | 7.6 (1.8) | 6–15 |
| One-year test interval (days) | — | 353.9 (33.3) | 294–471 |
| WRAT-3/4 premorbid intellect (SS) | 108.4 (6.0) | 110.1 (10.4) | 82–145 |
| RBANS indexes (SS) | |||
| Immediate memory | — | 108.1 (16.8) | 57–152 |
| Visuospatial/constructional | — | 99.3 (15.6) | 45–126 |
| Language | — | 102.8 (11.7) | 75–127 |
| Attention | — | 102.4 (14.7) | 72–135 |
| Delayed memory | — | 103.7 (15.0) | 44–131 |
| Total scale | — | 104.9 (15.1) | 70–138 |
| Geriatric depression scale | 4.0 (3.3) | 5.1 (3.8) | 0–18 |
Note: WRAT-3/4 premorbid intellect = wide range achievement test—third (Duff et al.’s sample) or fourth (current sample) edition reading subtest, RBANS = repeatable battery for the assessment of neuropsychological status, SS = standard score, MCI = mild cognitive impairment.
The six participants who withdrew from the study had a mean age of 78.3 (SD = 9.2) years and a mean education of 16.0 (SD = 2.8) years; they were 83.3% female and 100% Caucasian. Premorbid intellect was 114.0 on the WRAT-4 (SD = 8.4), and their performances ranged from the average to the high average range on the RBANS (Immediate Memory: M = 109.2, SD = 6.0; Visuospatial/Constructional: M = 95.2, SD = 19.2; Language: M = 113.3, SD = 13.1; Attention: M = 98.5, SD = 7.8; Delayed Memory: M = 100.7, SD = 8.3; and Total Scale score: M = 104.3, SD = 9.5). Diagnostically, two participants were classified as cognitively intact and four were classified as having amnestic MCI. Self-reported depression on the 30-item GDS was low (M = 5.5, SD = 3.0). No significant differences were observed between participants who completed the study and those who withdrew on any variables pertaining to demographics (age, education, sex, and ethnicity), cognition (premorbid intellect, individual RBANS indexes, and diagnosis), or self-reported depression (ps > .01).
One-sample t-tests were used to compare continuous demographic variables (e.g., age, education) between Duff et al.’s development sample and the current validation sample, and one-sample chi-square analyses were conducted between the two samples to compare categorical demographic variables (e.g., sex and ethnicity). The current sample was younger than the development sample, t(69) = −4.65, p = .001, d = −1.12, but there were no differences in education, t(69) = −0.16, p = .87, d = −0.04, or WRAT premorbid intellect, t(69) = 1.41, p = .17, d = 0.28, between the samples. There were no differences in observed baseline performance between the development and validation samples for most measures in the repeated cognitive battery (BVMT-R Total Recall, BVMT-R Delayed Recall, SDMT, TMT-A, TMT-B, and COWAT; ps > .01), with the exception of HVLT-R Total Recall, t(69) = 5.71, p = .001, d = 1.38, and HVLT-R Delayed Recall, t(69) = 3.23, p = .002, d = 0.78. In both cases, observed baseline performances were better for the current validation sample than for Duff et al.’s development sample. There were no differences in the sex or ethnic distributions—or the proportion of participants with MCI—between the development and validation samples, χ2 (1) = 2.12, p = .15, Phi = 0.17, for sex, χ2 (1) = 0.13, p = .72, Phi = 0.04, for ethnicity, and, χ2 (1) = 0.46, p = .50, Phi = 0.08, for MCI status.
The classification procedure for participants in the current sample has been described previously (Duff et al., 2017; Duff, Dalley, Suhrie, & Hammers, 2019). Briefly, participants were classified as cognitively intact versus having amnestic MCI based on criteria by Albert and colleagues (2011) and Petersen (2004), which incorporated the report of the participant and a knowledgeable informant, as well as a previously administered baseline cognitive evaluation—including the RBANS. Cognitive impairment in a domain was defined as a performance below the 7th percentile (i.e., 1.5 SDs below) relative to premorbid intellect, meaning that a difference of 22.5 standard score (SS) points between WRAT performance and a relevant task performance was necessary for impairment. For example, if a participant’s WRAT performance was SS = 100, then he/she would have been classified as impaired on a task if the task performance was below SS = 77.5; if a participant’s WRAT performance was SS = 120, the threshold would instead be SS = 97.5. Inclusion criteria for all participants included being aged 65 years or older and functionally independent. Exclusion criteria included neurological conditions likely to affect cognition, dementia, major psychiatric conditions, current severe depression, substance abuse, use of anti-convulsant or anti-psychotic medications, and residence in a skilled nursing or living facility.
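In code form, the impairment rule reduces to a fixed 22.5-point gap on the SS metric (a minimal sketch; the function name is ours):

```python
def impaired(task_ss, wrat_ss, sd_cut=1.5, ss_per_sd=15.0):
    """Impairment: task standard score more than 1.5 SDs (22.5 SS
    points, since 1 SD = 15 SS points) below WRAT-estimated
    premorbid intellect."""
    return task_ss < wrat_ss - sd_cut * ss_per_sd

print(impaired(75, 100))   # True: 75 is below the 77.5 cut-off
print(impaired(98, 120))   # False: 98 is above the 97.5 cut-off
```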
Procedure
All procedures were approved by the local Institutional Review Board before the study commenced. All participants provided informed consent before completing any procedures. The following measures were administered at a baseline visit:
HVLT-R (Brandt & Benedict, 2001) is a verbal memory task with 12 words learned over three trials, with the correct words summed for the Total Recall score (range = 0–36). The Delayed Recall score is the number of correct words recalled after a 20–25-min delay (range = 0–12). For all HVLT-R scores, higher values indicate better performance.
BVMT-R (Benedict, 1997) is a visual memory task with six geometric designs in six locations on a card learned over three trials, with correct designs and locations summed for the Total Recall score (range = 0–36). The Delayed Recall score is the number of correct designs and locations recalled after a 20–25-min delay (range = 0–12). For all BVMT-R scores, higher values indicate better performance.
SDMT (Smith, 1973) is a divided attention and psychomotor speed task, with the number of correct responses in 90 s being the total score (range = 0–110), and higher values indicate better performance.
TMT-A and TMT-B (Reitan, 1992) are tests of visual scanning/processing speed and set shifting/complex mental flexibility, respectively. For each part, the score is the time to complete the task (range = 0–180 s for TMT-A, and range = 0–300 s for TMT-B). Higher values indicate poorer performance.
COWAT (Benton et al., 1994) is a measure of rule-based verbal fluency where participants are provided 1 min per trial to generate as many words as possible that begin with the letters C, F, and L (with each letter being a separate trial). The total score is the number of words generated across the three trials, minus the number of rule violations. Higher values indicate better performance.
WRAT-4 Reading subtest (Wilkinson & Robertson, 2006) is used as an estimate of premorbid intellect, in which an individual attempts to pronounce irregular words. The score is normalized to SSs (M = 100, SD = 15) relative to age-matched peers. Higher values indicate better performance.
RBANS (Randolph, 2012) is a neuropsychological test battery comprising 12 subtests that are used to calculate index scores for domains of immediate memory, visuospatial/constructional, attention, language, delayed memory, and global neuropsychological functioning. The index scores utilize age-corrected normative comparisons from the test manual to generate SSs (M = 100, SD = 15). Higher scores indicate better cognition.
The 30-item GDS (Yesavage et al., 1982) was used to assess self-reported depression. Higher scores indicated more self-reported depression.
After approximately 1 week (M = 7.6 days, SD = 1.8, range = 6–15), the HVLT-R, BVMT-R, SDMT, TMT-A, TMT-B, and COWAT (hereafter referred to as the “repeated cognitive battery”) were repeated to generate a 1-week practice effect value. The same form of each test was used to maximize 1-week practice effects. The repeated cognitive battery was additionally administered approximately 1 year later (M = 353.9 days, SD = 33.3, range = 294–471) in order to apply Duff and colleagues’ (2010a) SRBs, again using the same forms to maximize 1-year practice effects. The RBANS and WRAT-4 were administered only at baseline, and participants were classified as cognitively intact or MCI based on their baseline scores from all tests. The GDS was also administered at baseline. With the exception of one missing SDMT value, no variables of interest had missing data.
Analyses
One-week practice effects across the sample
Following Duff and colleagues’ (2010a) methodology, the current study calculated 1-week practice effects for each measure in the repeated cognitive battery (HVLT-R, BVMT-R, SDMT, TMT-A, TMT-B, and COWAT) by subtracting the observed baseline score from the observed 1-week score (1-week practice effect = observed 1-week − observed baseline). These 1-week practice effects were subsequently used in the application of Duff et al.’s prediction equations. One-sample t-tests were used to compare 1-week practice effects between Duff et al.’s development sample and the current validation sample.
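A minimal sketch of this calculation and the between-sample comparison (the score arrays are illustrative, made-up values; the comparison mean of 4.3 is the development sample’s HVLT-R Total Recall practice effect from Table 2):

```python
import numpy as np
from scipy import stats

# Illustrative HVLT-R Total Recall scores (values are made up)
baseline = np.array([24.0, 27.0, 30.0, 22.0, 26.0])
one_week = np.array([27.0, 29.0, 33.0, 25.0, 28.0])

# 1-week practice effect = observed 1-week - observed baseline
practice_effect = one_week - baseline

# One-sample t-test of the validation sample's practice effects
# against the development sample's mean practice effect (4.3)
t_stat, p_val = stats.ttest_1samp(practice_effect, popmean=4.3)
```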
Traditional pairwise baseline versus 1-year analyses across the sample
Paired-samples t-tests were conducted to compare observed baseline and observed 1-year scores (i.e., T1 vs. T2 scores) for each measure in the repeated cognitive battery (HVLT-R, BVMT-R, SDMT, TMT-A, TMT-B, and COWAT). These analyses approximate a traditional evaluation of change over time without controlling for practice effects or participant variables, and they establish a “basic change score” relative to baseline (as compared to change relative to the SRB prediction). Although the authors acknowledge that the 1-year administration of the repeated cognitive battery represented the third test administration for these participants (not their second), the nomenclature of T2 was maintained for the follow-up timepoint of interest for consistency with the remainder of the SRB literature.
SRB analyses across the sample
Previously published SRB prediction equations for each of the measures in the repeated cognitive battery were applied to the current sample’s observed baseline and observed 1-year scores (see Table 2 for Duff and colleagues’ (2010a) SRB equations). As described previously (Duff et al., 2010a), the SRB prediction algorithms were calculated from a development sample using stepwise multiple-regression analyses to maximize the prediction of performance for each measure in the repeated cognitive battery. Demographic variables (e.g., age, education, and sex), diagnosis (e.g., MCI), baseline test score, 1-week practice effects, and retest interval were used to predict the respective test score at 1-year follow-up.
Table 2.
Regression equations for predicting 1-year follow-up scores from Duff et al.’s (2010a) development sample
| Measure | Predicted 1-year follow-up | R² | SEest | Observed baseline, Mean (SD) | PE, Mean (SD) |
|---|---|---|---|---|---|
| Hopkins verbal learning test—revised | |||||
| Total recall | 5.08 + (T1*0.77) + (PE*0.27) | .44 | 4.60 | 23.6 (5.5) | 4.3 (4.0) |
| Delayed recall | 7.09 + (T1*0.82) + (PE*0.47) − (age*0.08) | .54 | 2.41 | 6.9 (3.4) | 2.1 (2.4) |
| Brief visuospatial memory test—revised | |||||
| Total recall | 2.76 + (T1*0.78) + (PE*0.32) − (MCI*2.88) | .66 | 4.61 | 14.8 (6.9) | 8.5 (5.4) |
| Delayed recall | 0.75 + (T1*0.83) + (PE*0.37) | .61 | 2.05 | 5.7 (3.4) | 2.4 (2.2) |
| Symbol digit modality test | 5.87 + (T1*0.81) + (PE*0.66) | .47 | 8.48 | 40.6 (12.4) | 1.7 (10.1) |
| Trail making test | |||||
| Part A | 4.41 + (T1*0.97) + (PE*0.65) | .42 | 14.65 | 43.6 (15.6) | −4.0 (9.9) |
| Part B | −12.16 + (T1*1.00) + (PE*0.51) + (sex*22.27) | .52 | 46.55 | 113.3 (52.9) | −12.6 (38.1) |
| Controlled oral word association test | 22.41 + (T1*0.85) + (PE*0.51) − (age*0.21) | .58 | 8.11 | 38.9 (10.8) | 1.0 (7.3) |
Notes: All scores are raw scores. Age is in years, MCI is coded as NO (cognitively intact) = 0 and YES (MCI) = 1, and sex is coded as male = 0 and female = 1. PE = the 1-week practice effect observed for the respective task, calculated as PE = observed 1-week score − observed baseline score. R² = proportion of variance in the 1-year score explained by the regression model; SEest = standard error of the estimate. To calculate the predicted 1-year follow-up score, use the formula in the column titled “Predicted 1-year follow-up”. To calculate the reliable change score, use (observed 1-year follow-up − predicted 1-year follow-up)/SEest.
Following the application of these SRB prediction equations to the current sample’s baseline repeated cognitive battery scores and relevant demographic and testing characteristics, a z score was calculated for each measure in the repeated cognitive battery, in order to examine how Duff et al.’s SRBs applied to the validation sample. The z scores reflect a normalized deviation of change for an individual participant and were calculated using the equation z = (T2 − T2’)/SEest, where the difference between the observed 1-year score (T2) and the predicted 1-year score (T2’) is divided by the SEest. Although some debate exists about the proper standard error estimate for use in reliable change methods (Hinton-Bayre, 2010), we have previously shown the equivalence of the two most common estimates and provided support for use of the SEest (Hammers & Duff, 2019). The individual z scores were then averaged to determine the mean deviation across the sample for each measure in the repeated cognitive battery, and these mean z scores were compared to expectation (z = 0) based on the normal distribution of z scores using one-sample t-tests. As a reminder, a negative z score indicates that the observed follow-up score was smaller than the predicted follow-up score—suggesting a reliable “decline” over 1 year—whereas a positive z score indicates that the observed follow-up score was larger than the predicted follow-up score—suggesting a reliable “improvement” over 1 year.
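As a worked example, applying the Table 2 equation for HVLT-R Total Recall to a hypothetical participant (made-up scores) illustrates the full chain from prediction to z score:

```python
SE_EST = 4.60  # SEest for HVLT-R Total Recall (Table 2)

def predict_hvltr_total_1yr(t1, pe):
    """Duff et al.'s (2010a) equation: 5.08 + (T1*0.77) + (PE*0.27)."""
    return 5.08 + t1 * 0.77 + pe * 0.27

t1, pe, observed_1yr = 26, 3, 22                 # hypothetical participant
predicted_1yr = predict_hvltr_total_1yr(t1, pe)  # ~25.9
z = (observed_1yr - predicted_1yr) / SE_EST      # ~-0.85
# |z| < 1.645, so this participant would be classified as "stable"
```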
SRB analyses incorporating distributions of individual performance
To determine whether the observed distribution of participants’ performance deviated significantly from the expected distribution based on the normal distribution of z scores, the resultant z scores were additionally trichotomized into “decline” (z score < −1.645), “stable” (z score falling between ±1.645), or “improve” (z score > 1.645) for all measures in the repeated cognitive battery, with the exception of TMT-A and TMT-B, which used reversed scoring. As indicated previously, if the z scores were normally distributed, then 5% of participants would be expected to show “decline”, 90% to show “stability”, and 5% to show “improvement” (Hammers, Suhrie, Dixon, Porter, & Duff, 2020). Following this trichotomization, individual one-sample chi-square analyses were conducted for each measure in the repeated cognitive battery to compare the observed distribution of individual performance to expectation.
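A sketch of this trichotomization and chi-square comparison (the z scores here are simulated, and the function names are ours; scipy’s chisquare compares observed counts against the 5%/90%/5% expectation):

```python
import numpy as np
from scipy import stats

def trichotomize(z_scores, reverse=False, cutoff=1.645):
    """Counts of (decline, stable, improve). For TMT-A/TMT-B, where
    higher scores are worse, set reverse=True to flip the tails."""
    z = np.asarray(z_scores) * (-1 if reverse else 1)
    decline = int(np.sum(z < -cutoff))
    improve = int(np.sum(z > cutoff))
    return np.array([decline, z.size - decline - improve, improve])

z_scores = np.random.default_rng(0).normal(size=70)  # simulated sample
observed = trichotomize(z_scores)
expected = np.array([0.05, 0.90, 0.05]) * len(z_scores)
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
```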
Additionally, to determine the predictive accuracy of Duff and colleagues’ (2010a) equations, the percentage of individuals’ predicted 1-year scores falling within ±0.5 SD, ±1.0 SD, ±1.5 SD, and ±2.0 SD of the observed 1-year scores was calculated for each measure in the repeated cognitive battery. SD values were calculated from the sample’s observed 1-year score separately for the respective measures.
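This accuracy metric can be expressed compactly (a minimal sketch under the same assumptions; the array names are ours):

```python
import numpy as np

def pct_within(observed_1yr, predicted_1yr, multiples=(0.5, 1.0, 1.5, 2.0)):
    """Percentage of predicted 1-year scores within +/-k SDs of the
    observed 1-year scores; the SD comes from the observed 1-year
    distribution of the respective measure (as in Table 5)."""
    obs = np.asarray(observed_1yr, dtype=float)
    pred = np.asarray(predicted_1yr, dtype=float)
    sd = obs.std(ddof=1)
    return {k: 100.0 * np.mean(np.abs(pred - obs) <= k * sd)
            for k in multiples}
```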
Measures of effect size were expressed throughout as Cohen’s d values for continuous data, and Phi coefficients for categorical data. Given the number of comparisons in the current study, a two-tailed alpha level was set at .01 for all statistical analyses.
Results
One-Week Practice Effects Across the Sample
As described in the Analyses section, 1-week practice effects were calculated for each measure in the repeated cognitive battery (HVLT-R, BVMT-R, SDMT, TMT-A, TMT-B, and COWAT) by subtracting the observed baseline score from the observed 1-week score. The magnitude of 1-week practice effects did not differ between the development and validation samples for most measures (ps > .01), with the exception of HVLT-R Total Recall, t(69) = −2.73, p = .008, d = −0.66. For this score, the observed 1-week practice effect was smaller for the current validation sample than for Duff et al.’s development sample.
Traditional Pairwise Baseline Versus 1-Year Analyses Across the Sample
Change over time was first assessed using a traditional method of comparing observed baseline and observed 1-year follow-up scores for each measure in the repeated cognitive battery (HVLT-R, BVMT-R, SDMT, TMT-A, TMT-B, and COWAT; see Table 3 for means and SDs) in this sample of community-dwelling older adults. No significant differences were observed for any of the measures, HVLT-R Total Recall, t(69) = −0.34, p = .74, d = −0.08, HVLT-R Delayed Recall, t(69) = −0.60, p = .55, d = −0.14, BVMT-R Total Recall, t(69) = −0.78, p = .44, d = −0.19, BVMT-R Delayed Recall, t(69) = 0.63, p = .53, d = 0.15, SDMT, t(68) = −0.10, p = .92, d = −0.02, TMT-A, t(69) = −0.86, p = .39, d = −0.21, TMT-B, t(69) = 0.71, p = .48, d = 0.17, or COWAT, t(69) = −1.21, p = .23, d = −0.29.
Table 3.
Observed baseline, observed 1-year, and predicted 1-year cognitive scores, 1-week practice effects (PE), standardized z scores, and p values for difference from expectation (z = 0) based on the normal distribution of z scores in the validation sample of cognitively intact and MCI participants
| Measure | Observed baseline | Observed 1-year | Predicted 1-year | PE value | z score | p value |
|---|---|---|---|---|---|---|
| Hopkins verbal learning test—revised | ||||||
| Total recall | 27.1 (5.2) | 27.3 (5.1) | 26.9 (3.8) | 3.4 (2.7) | 0.09 (0.7) | .32 |
| Delayed recall | 8.4 (3.9) | 8.6 (3.4) | 8.9 (2.3) | 2.0 (2.9) | −0.13 (1.1) | .31 |
| Brief visuospatial memory test—revised | ||||||
| Total recall | 15.2 (7.5) | 15.7 (7.6) | 16.1 (6.6) | 8.7 (4.1) | −0.08 (1.1) | .56 |
| Delayed recall | 6.5 (3.3) | 6.4 (3.2) | 6.9 (2.4) | 2.1 (1.9) | −0.29 (1.0) | .02 |
| Symbol digit modality test | 39.6 (9.0) | 39.7 (10.1) | 39.5 (7.9) | 2.2 (4.6) | 0.03 (0.6) | .68 |
| Trail making test | ||||||
| Part A | 43.4 (16.8) | 45.6 (23.7) | 48.7 (19.5) | 3.5 (26.5) | −0.22 (1.5) | .23 |
| Part B | 111.9 (54.7) | 108.2 (56.0) | 109.2 (49.5) | −13.9 (31.9) | −0.02 (0.8) | .83 |
| Controlled oral word association test | 36.6 (10.6) | 37.9 (13.3) | 38.9 (9.3) | 2.1 (6.6) | −0.12 (1.0) | .34 |
Notes: PE = 1-week practice effects. p value = significance of one-sample t tests examining whether z scores differed from expectation (z = 0) based on the normal distribution of z scores.
SRB Analyses Across the Sample
Duff and colleagues’ (2010a) SRB prediction equations for each of the repeated measures in the cognitive battery were applied to the current sample, resulting in the calculation of a z score for each measure in the repeated battery. When using one-sample t-tests to compare z scores for each repeated cognitive measure to expectation (z = 0) based on the normal distribution of z scores (Table 3), no significant differences were observed for any of the measures, HVLT-R Total Recall, t(69) = 0.10, p = .32, d = 0.02, HVLT-R Delayed Recall, t(69) = −1.02, p = .31, d = −0.25, BVMT-R Total Recall, t(69) = −0.59, p = .56, d = −0.14, BVMT-R Delayed Recall, t(69) = −2.43, p = .02, d = −0.59, SDMT, t(68) = 0.42, p = .68, d = 0.10, TMT-A, t(69) = −1.21, p = .23, d = −0.29, TMT-B, t(69) = −0.21, p = .83, d = −0.05, or COWAT, t(69) = −0.97, p = .34, d = −0.23.
SRB Analyses Incorporating Individual Distributions of Performance
Next, we examined the distribution of individual older adults who “declined” (z score < −1.645 for HVLT-R, BVMT-R, SDMT, and COWAT; z score > 1.645 for TMT-A and TMT-B), remained “stable” (z score falling between ±1.645), or “improved” (z score > 1.645 for HVLT-R, BVMT-R, SDMT, and COWAT; z score < −1.645 for TMT-A and TMT-B) between the baseline and 1-year administrations of the repeated cognitive battery for each z score calculated. The majority of participants remained “stable”, as expected (90.6% of participants, on average; see Table 4). A significant difference in performance distribution relative to expectation was seen for BVMT-R Delayed Recall, χ2 (1) = 13.11, p = .001, Phi = .43, with χ2 (2) as reported; specifically, a greater proportion of individuals “declined” than expected based on normal distributions for this task (14% of participants “declined”). No significant differences in distributions from expectation were observed for HVLT-R Total Recall, χ2 (2) = 3.97, p = .14, Phi = .24, HVLT-R Delayed Recall, χ2 (2) = 6.57, p = .04, Phi = .31, BVMT-R Total Recall, χ2 (2) = 0.78, p = .68, Phi = .11, SDMT, χ2 (2) = 7.67, p = .02, Phi = .33, TMT-A, χ2 (2) = 0.16, p = .92, Phi = .05, TMT-B, χ2 (2) = 2.44, p = .30, Phi = .19, or COWAT, χ2 (2) = 0.16, p = .92, Phi = .05.
Table 4.
Percentage of sample that “declined”, remained “stable”, or “improved” when applying Duff et al.’s (2010a) SRB methodology
| Measure | Decline | Stable | Improvement |
|---|---|---|---|
| Hopkins verbal learning test—revised | |||
| Total recall | 1 | 98 | 1 |
| Delayed recall | 11 | 86 | 3 |
| Brief visuospatial memory test—revised | |||
| Total recall | 7 | 87 | 6 |
| Delayed recall | 14 | 83 | 3 |
| Symbol digit modality test | 0 | 99 | 1 |
| Trail making test | |||
| Part A | 6 | 88 | 6 |
| Part B | 7 | 92 | 1 |
| Controlled oral word association | 4 | 92 | 4 |
Notes: Values are the percentage of the sample in each category. Chi-square tests compared each observed distribution to the expected distribution based on the normal curve distribution of z scores (5% display “declines”, 90% display stability, 5% display “improvements”).
Finally, Table 5 displays the percentages of individuals whose predicted 1-year scores fell within ±0.5 SD, ±1.0 SD, ±1.5 SD, and ±2.0 SD of the observed 1-year scores. As can be seen in the table, between 54.3% and 73.9% of participants’ predicted 1-year scores were within ±0.5 SD of the observed 1-year scores across all measures in the repeated cognitive battery. Between 82.6% and 100.0% of predicted 1-year scores were within ±1.0 SD, between 92.8% and 100.0% were within ±1.5 SD, and between 95.7% and 100.0% were within ±2.0 SD of the observed 1-year scores. Among the individual measures, the SDMT was consistently the most accurate (73.9%, 100.0%, 100.0%, and 100.0%, respectively), with all other measures displaying comparable relative accuracy across the four SD accuracy metrics.
Table 5.
Predictive accuracy of change formulas based on the validation sample of cognitively intact and MCI participants
| Measure | % within ±0.5 SD | % within ±1.0 SD | % within ±1.5 SD | % within ±2.0 SD |
|---|---|---|---|---|
| Hopkins verbal learning test—revised | ||||
| Total recall | 54.3 | 90.0 | 97.2 | 100.0 |
| Delayed recall | 64.3 | 84.3 | 95.7 | 98.6 |
| Brief visuospatial memory test—revised | ||||
| Total recall | 64.3 | 87.2 | 92.8 | 98.6 |
| Delayed recall | 67.1 | 82.6 | 97.1 | 100.0 |
| Symbol digit modality test | 73.9 | 100.0 | 100.0 | 100.0 |
| Trail making test | ||||
| Part A | 67.1 | 88.6 | 94.2 | 95.7 |
| Part B | 58.6 | 88.6 | 94.3 | 100.0 |
| Controlled oral word association test | 62.9 | 91.4 | 97.1 | 100.0 |
Note: SD values for the respective measures were taken from the observed 1-year column in Table 3.
Discussion
The current study sought to examine the validity of previously published 1-year SRB predicted difference equations (Duff et al., 2010a) for commonly administered cognitive measures (HVLT-R, BVMT-R, SDMT, TMT-A, TMT-B, and COWAT) in an independent sample of community-dwelling older adults with either normal cognition or MCI. Measures were assessed at baseline, at 1 week (to calculate 1-week practice effects), and at approximately 1 year. To our knowledge, this is the first study to apply these algorithms to an external sample of participants, and it permitted examination of whether Duff et al.’s equations can accurately predict 1-year performance in an independent set of individuals.
For our current validation sample, no change in performance was observed over 1 year when using the traditional method of comparing observed test scores at baseline and 1 year for any of the measures administered in the repeated cognitive battery. This finding is consistent with Calamia and colleagues’ (2012) previous work suggesting that practice effects are likely small in older adults. Specifically, in a meta-analysis of 349 studies assessing practice effects across a variety of neuropsychological tests, the authors observed that performance on a cognitive task at T2 was expected to improve, on average, by only 0.242 of an SD from baseline performance for a 40-year-old cognitively intact individual with a 1-year retest interval. After factoring in adjustments for our sample’s age (mean of 74.8 years), cognitive domain (verbal memory), and diagnosis (normal cognition and MCI), our sample was expected to improve on average by between 0.038 and 0.096 of an SD on HVLT-R Total Recall and Delayed Recall. Given that the SDs of the HVLT-R Total Recall and Delayed Recall scores in our current sample averaged 5 points (Table 3), we should have observed only very small improvements in cognition between baseline and 1 year (0.19–0.48 points). These values coincide with our observation of a nonsignificant improvement of only 0.2 points for our sample across 1 year on these measures.
Similarly, when applying Duff and colleagues’ (2010a) SRB prediction equations to baseline performance on these measures, our sample’s level of observed performance was consistently within expectation of predictions across nearly all measures administered when incorporating 1-week practice effects into the set of predictors. First, at the group level, no significant differences were observed between predicted 1-year scores and observed 1-year scores (i.e., z scores) for our validation sample (see Table 3). Second, when examining the distributions of individual participants in our sample who displayed “declines”, “stability”, or “improvements”, only BVMT-R Delayed Recall reflected a significantly greater proportion of individuals “declining” than was anticipated based on the normal distribution of z scores. As seen in Table 4, few participants showed “declines” or “improvements” beyond expectation of normal distributions, with on average 90.6% of the individual participants displaying the expected performance. This is consistent with the properties of the normal curve for a 90% CI at α = .10, which indicate that 90% of participants should remain “stable”, whereas 5% should display “declines” and another 5% should display “improvement” (Hammers, Suhrie, Dixon, Porter, & Duff, 2020). This consistency with expectation suggests that Duff et al.’s SRB equations predicted the T2 performance of the current sample accurately, as does the high concordance between individual distributions of predicted and observed 1-year scores. Specifically, we found that the majority of individuals’ predicted 1-year scores in the validation sample fell within ±0.5 SD of the observed 1-year scores across the measures in the repeated cognitive battery (54.3%–73.9%), with consistently higher percentages of accuracy when considering 1.0 SD, 1.5 SD, and 2.0 SD (see Table 5). These percentages are generally consistent with previous research (Duff et al., 2004) observing accuracy of 57.6%–88.2% for SRB equations predicting RBANS indexes within 5 SS points (approximately 0.33 SD), and accuracy of 85.7%–97.3% within 10 SS points (approximately 0.67 SD). The fact that the current sample displayed slightly lower accuracy than Duff and colleagues (2004) may be accounted for by the domains assessed, and by the current study examining accuracy across individual measures rather than composite indexes as in Duff and colleagues (2004).
The ability of Duff and colleagues’ (2010a) SRBs to predict performance at 1 year for the measures in the repeated battery occurred despite some minor differences between the development and current samples. Specifically, the current sample was younger and displayed better baseline performances on HVLT-R Total Recall and HVLT-R Delayed Recall than Duff and colleagues’ (2010a) development sample, which would have suggested the potential for the current sample to benefit from prior exposure to test materials to a greater extent, given the previously documented links between greater practice effects and younger age (Calamia et al., 2012; Salthouse, 2010), and between greater practice effects and stronger baseline performance (“the rich get richer”; Rapport et al., 1997a; Rapport, Brines, Axelrod, & Theisen, 1997b). Instead, the sample showed equivalent or smaller (for HVLT-R Total Recall) 1-week practice effects than the development sample, and otherwise the samples appeared to be generally comparable for baseline performance, education, premorbid intellect, sex, ethnicity, and diagnostic make-up (normal cognition vs. MCI). This general comparability between samples on these relevant variables likely explains the ability of Duff et al.’s SRB prediction equations to predict T2 performance in our sample with accuracy, and these results support the validity of these SRB prediction algorithms to detect the presence or absence of clinically meaningful change over 1 year.
Of note, 14% of participants “declined” on the BVMT-R Delayed Recall measure when applying the full model of Duff et al.’s SRB equations, which was beyond expectation. This result is surprising because this visual memory task has previously been shown to be especially susceptible to benefit from repeated exposure (Duff et al., 2008; Duff et al., 2011; Duff et al., 2014; Hammers et al., 2020), which coincides with Calamia and colleagues’ (2012) meta-analytic finding that the largest positive practice effects observed for any cognitive domain across tasks were for visual memory. However, as seen in Table 3, the BVMT-R Delayed Recall task still improved by roughly 2 points between baseline and 1 week (PE value = 2.1), suggesting that although this task may be susceptible to short-term practice effects, the impact of repeated exposure over a 1-year interval was minimal. Instead, BVMT-R Delayed Recall has also been shown to possess a strong relationship with hippocampal atrophy early in the development of cognitive decline (Duff et al., 2018), which would support our observation of greater declines than anticipated on this task for participants early in the AD continuum (i.e., those cognitively normal or with MCI).
The current study is not without limitations. First, these findings are specific to the cognitive measures administered in this battery over retest intervals of 1 week and 1 year; therefore, the results cannot necessarily be assumed to generalize to other measures of the same cognitive domains, different retest intervals, or the use of alternate forms (see the description of Calamia et al., 2012 earlier). Second, these results may not generalize to participants who are more heterogeneous in premorbid functioning, sex, education, and ethnicity. In particular, only one participant in the current sample was non-Caucasian; therefore, it is unknown how these SRB equations predict 1-year cognitive performance in ethnic minority populations. Future research should consider the generalizability of Duff and colleagues’ (2010a) prediction equations in samples that are not mostly female, Caucasian, and well-educated. Third, the current results do not provide information on cognitive prediction in other disease states (e.g., AD, frontotemporal dementia); therefore, future work is recommended to examine the validity of Duff et al.’s prediction equations across a variety of neurodegenerative conditions. Fourth, and relatedly, the current study did not include subanalyses for the subsets of cognitively intact and MCI participants. We felt that this was justified because (i) Duff et al.’s development sample also pooled cognitively intact and MCI participants and (ii) Duff et al.’s SRB prediction algorithms incorporated diagnostic state as a predictor. We do not believe, however, that separating the sample would have altered our conclusions. For example, although not reported in the Results, no differences were observed between cognitively intact and MCI participants for key findings in the current study, such as the proportions of individuals “declining”, remaining “stable”, and “improving” (BVMT-R Delayed Recall displayed significantly more “decliners” than expected for both diagnostic conditions). Fifth, the current study used a simple difference method to calculate 1-week practice effects (i.e., 1-week score − baseline score), despite more sophisticated methods for calculating practice effects being available (e.g., standard deviation index or RCI; Iverson, 2001). This was done for consistency with the original Duff et al. SRB equations; however, future work should consider incorporating more advanced practice effect indexes if modifying these equations. Sixth, the current study required clinicians/researchers to make a conditional “diagnosis” at baseline in order to classify individuals as cognitively intact or MCI and subsequently use the appropriate diagnostic beta weight (see Table 2) when applying Duff and colleagues’ (2010a) equations. Re-calculation of these prediction equations with a “clean” cognitively intact sample is encouraged in future work. Finally, as previously noted by Duff and colleagues (2010a), all participants in the current sample underwent repeated assessment of this neuropsychological battery both 1 week after baseline and after 1 year. As a result, these participants are likely not reflective of “typical” patients evaluated in a neuropsychological clinic who may undergo repeat cognitive testing across 1 year.
Inclusion of these participants was necessary to validate Duff et al.’s published SRB equations; however, future studies including participants not exposed to the test stimuli at 1 week would be necessary to establish the utility of these equations in typical clinical settings. Despite these limitations, the results—which validate Duff and colleagues’ (2010a) SRB equations—will permit clinicians and researchers to have more confidence when predicting cognitive performance on these measures over 1 year.
Contributor Information
Dustin B Hammers, Department of Neurology, Center for Alzheimer’s Care, Imaging, and Research, University of Utah, Salt Lake City, UT, USA; Center on Aging, University of Utah, Salt Lake City, UT, USA.
Sariah Porter, Department of Neurology, Center for Alzheimer’s Care, Imaging, and Research, University of Utah, Salt Lake City, UT, USA.
Ava Dixon, Department of Neurology, Center for Alzheimer’s Care, Imaging, and Research, University of Utah, Salt Lake City, UT, USA.
Kayla R Suhrie, Department of Neurology, Center for Alzheimer’s Care, Imaging, and Research, University of Utah, Salt Lake City, UT, USA.
Kevin Duff, Department of Neurology, Center for Alzheimer’s Care, Imaging, and Research, University of Utah, Salt Lake City, UT, USA; Center on Aging, University of Utah, Salt Lake City, UT, USA.
Funding
The project described was supported by research grants from the National Institute on Aging (K23 AG028417 and 5R01 AG055428). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Aging or the National Institutes of Health.
Conflict of Interest
None declared.
References
- Albert M. S., DeKosky S. T., Dickson D., Dubois B., Feldman H. H., Fox N. C., et al. (2011). The diagnosis of mild cognitive impairment due to Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s association workgroups on diagnostic guidelines for Alzheimer's disease. The Journal of the Alzheimer's Association, 7(3), 270–279. doi: 10.1016/j.jalz.2011.03.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Attix D. K., Story T. J., Chelune G. J., Ball J. D., Stutts M. L., Hart R. P., et al. (2009). The prediction of change: Normative neuropsychological trajectories. The Clinical Neuropsychologist, 23(1), 21–38. doi: 10.1080/13854040801945078. [DOI] [PubMed] [Google Scholar]
- Bartels C., Wegrzyn M., Wiedl A., Ackermann V., & Ehrenreich H. (2010). Practice effects in healthy adults: A longitudinal study on frequent repetitive cognitive testing. BMC Neuroscience, 11, 118 Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/20846444. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Barth J. T., Pliskin N., Axelrod B., Faust D., Fisher J., Harley J. P., et al. (2003). Introduction to the NAN 2001 definition of a clinical neuropsychologist. NAN policy and planning committee. Archives of Clinical Neuropsychology, 18(5), 551–555. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/14591449. [PubMed] [Google Scholar]
- Beglinger L. J., Gaydos B., Tangphao-Daniels O., Duff K., Kareken D. A., Crawford J., et al. (2005). Practice effects and the use of alternate forms in serial neuropsychological testing. Archives of Clinical Neuropsychology, 20(4), 517–529. doi: 10.1016/j.acn.2004.12.003. [DOI] [PubMed] [Google Scholar]
- Benedict R. (1997). Brief visuospatial memory test—revised: Professional manual. Lutz, FL: Psychological Assessment Resources, Inc. [Google Scholar]
- Benton A. L., Hamsher K., Rey G. L., & Sivan A. B. (1994). Multilingual aphasia examination (3rd ed.). Iowa City, IA: AJA Associates. [Google Scholar]
- Blasi S., Zehnder A. E., Berres M., Taylor K. I., Spiegel R., & Monsch A. U. (2009). Norms for change in episodic memory as a prerequisite for the diagnosis of mild cognitive impairment (MCI). Neuropsychology, 23(2), 189–200. doi: 10.1037/a0014079. [DOI] [PubMed] [Google Scholar]
- Brandt J., & Benedict R. (2001). Hopkins verbal learning test—revised. Odessa, FL: PAR. [DOI] [PubMed] [Google Scholar]
- Calamia M., Markon K., & Tranel D. (2012). Scoring higher the second time around: Meta-analyses of practice effects in neuropsychological assessment. The Clinical Neuropsychologist, 26(4), 543–570. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/22540222. [DOI] [PubMed] [Google Scholar]
- Chelune G., & Duff K. (2012). The assessment of change: Serial assessments in dementia evaluation. New York, NY: Springer. [Google Scholar]
- Chelune G., Naugle R., Lüders H., Sedlak J., & Awad I. (1993). Individual change following epilepsy surgery: Practice effects and base-rate information. Neuropscyhology, 1, 41–52. [Google Scholar]
- Crawford J. R., & Garthwaite P. H. (2007). Using regression equations built from summary data in the neuropsychological assessment of the individual case. Neuropsychology, 21(5), 611–620. doi: 10.1037/0894-4105.21.5.611. [DOI] [PubMed] [Google Scholar]
- Crockford C., Newton J., Lonergan K., Madden C., Mays I., O'Sullivan M., et al. (2018). Measuring reliable change in cognition using the Edinburgh cognitive and behavioural ALS screen (ECAS). Amyotroph Lateral Scler Frontotemporal Degener, 19(1–2), 65–73. doi: 10.1080/21678421.2017.1407794. [DOI] [PMC free article] [PubMed] [Google Scholar]
- das Nair R., Cogger H., Worthington E., & Lincoln N. B. (2016). Cognitive rehabilitation for memory deficits after stroke. Cochrane Database of Systematic Reviews, 9, CD002293. doi: 10.1002/14651858.CD002293.pub3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duff K., Anderson J. S., Mallik A. K., Suhrie K. R., Atkinson T. J., Dalley B. C. A., et al. (2018). Short-term repeat cognitive testing and its relationship to hippocampal volumes in older adults. Journal of Clinical Neuroscience, 57, 121–125. doi: 10.1016/j.jocn.2018.08.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duff K., Atkinson T. J., Suhrie K. R., Dalley B. C., Schaefer S. Y., & Hammers D. B. (2017). Short-term practice effects in mild cognitive impairment: Evaluating different methods of change. Journal of Clinical and Experimental Neuropsychology, 39(4), 396–407. doi: 10.1080/13803395.2016.1230596. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duff K., Beglinger L. J., Moser D. J., Paulsen J. S., Schultz S. K., & Arndt S. (2010a). Predicting cognitive change in older adults: The relative contribution of practice effects. Archives of Clinical Neuropsychology, 25(2), 81–88. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/20064816. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duff K., Beglinger L. J., Moser D. J., Schultz S. K., & Paulsen J. S. (2010b). Practice effects and outcome of cognitive training: Preliminary evidence from a memory training course. The American Journal of Geriatric Psychiatry, 18(1), 91 Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/20104658. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duff K., Beglinger L. J., Schultz S. K., Moser D. J., McCaffrey R. J., Haase R. F., et al. (2007). Practice effects in the prediction of long-term cognitive outcome in three patient samples: A novel prognostic index. Archives of Clinical Neuropsychology, 22(1), 15–24. doi: 10.1016/j.acn.2006.08.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duff K., Beglinger L. J., Van Der Heiden S., Moser D. J., Arndt S., Schultz S. K., et al. (2008). Short-term practice effects in amnestic mild cognitive impairment: Implications for diagnosis and treatment. International Psychogeriatrics, 20(5), 986–999. doi: 10.1017/S1041610208007254. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duff K., Dalley B. C. A., Suhrie K. R., & Hammers D. B. (2019). Predicting premorbid scores on the repeatable battery for the assessment of neuropsychological status and their validation in an elderly sample. Archives of Clinical Neuropsychology, 34(3), 395–402. doi: 10.1093/arclin/acy050. [DOI] [PubMed] [Google Scholar]
- Duff K., Foster N. L., & Hoffman J. M. (2014). Practice effects and amyloid deposition: Preliminary data on a method for enriching samples in clinical trials. Alzheimer Disease and Associated Disorders, 28(3), 247–252. doi: 10.1097/WAD.0000000000000021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duff K., Lyketsos C. G., Beglinger L. J., Chelune G., Moser D. J., Arndt S., et al. (2011). Practice effects predict cognitive outcome in amnestic mild cognitive impairment. The American Journal of Geriatric Psychiatry, 19(11), 932–939. doi: 10.1097/JGP.0b013e318209dd3a. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duff K., Schoenberg M. R., Patton D., Mold J., Scott J. G., & Adams R. L. (2004). Predicting change with the RBANS in a community dwelling elderly sample. Journal of the International Neuropsychological Society, 10(6), 828–834. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/15637773. [DOI] [PubMed] [Google Scholar]
- Galvin J. E., Powlishta K. K., Wilkins K., McKeel D. W. Jr., Xiong C., Grant E., et al. (2005). Predictors of preclinical Alzheimer disease and dementia: A clinicopathologic study. Archives of Neurology, 62(5), 758–765. doi: 10.1001/archneur.62.5.758.
- Garrett K. D., Browndyke J. N., Whelihan W., Paul R. H., DiCarlo M., Moser D. J., et al. (2004). The neuropsychological profile of vascular cognitive impairment–no dementia: Comparisons to patients at risk for cerebrovascular disease and vascular dementia. Archives of Clinical Neuropsychology, 19(6), 745–757. doi: 10.1016/j.acn.2003.09.008.
- Gavett B. E., Ashendorf L., & Gurnani A. S. (2015). Reliable change on neuropsychological tests in the Uniform Data Set. Journal of the International Neuropsychological Society, 21(7), 558–567. doi: 10.1017/S1355617715000582.
- Hammers D., Duff K., & Chelune G. (2015). Assessing change of cognitive trajectories over time in later life. In Pachana N. A., & Laidlaw K. (Eds.), Oxford handbook of clinical geropsychology. Oxford: Oxford University Press.
- Hammers D. B., & Duff K. (2019). Application of different standard error estimates in reliable change methods. Archives of Clinical Neuropsychology, acz054. doi: 10.1093/arclin/acz054.
- Hammers D. B., Suhrie K. R., Dixon A., Porter S., & Duff K. (2020a). Reliable change in cognition over 1 week in community-dwelling older adults: A validation and extension study. Archives of Clinical Neuropsychology, acz076. doi: 10.1093/arclin/acz076.
- Hammers D. B., Suhrie K. R., Dixon A., Porter S., & Duff K. (2020b). Validation of one-week reliable change methods in cognitively intact community-dwelling older adults. Aging, Neuropsychology, and Cognition, 1–21. doi: 10.1080/13825585.2020.1787942.
- Hassenstab J., Ruvolo D., Jasielec M., Xiong C., Grant E., & Morris J. C. (2015). Absence of practice effects in preclinical Alzheimer's disease. Neuropsychology, 29(6), 940–948. doi: 10.1037/neu0000208.
- Heilbronner R. L., Sweet J. J., Attix D. K., Krull K. R., Henry G. K., & Hart R. P. (2010). Official position of the American Academy of Clinical Neuropsychology on serial neuropsychological assessments: The utility and challenges of repeat test administrations in clinical and forensic contexts. The Clinical Neuropsychologist, 24(8), 1267–1278. doi: 10.1080/13854046.2010.526785.
- Hinton-Bayre A. D. (2010). Deriving reliable change statistics from test-retest normative data: Comparison of models and mathematical expressions. Archives of Clinical Neuropsychology, 25(3), 244–256. doi: 10.1093/arclin/acq008.
- Hinton-Bayre A. D. (2016). Clarifying discrepancies in responsiveness between reliable change indices. Archives of Clinical Neuropsychology, 31(7), 754–768. doi: 10.1093/arclin/acw064.
- Iverson G. L. (2001). Interpreting change on the WAIS-III/WMS-III in clinical samples. Archives of Clinical Neuropsychology, 16(2), 183–191. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/14590186.
- Jacobson N. S., & Truax P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12–19. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/2002127.
- Lezak M., Howieson D., Bigler E., & Tranel D. (2012). Neuropsychological assessment (5th ed.). New York: Oxford University Press.
- Maassen G. H., Bossema E. R., & Brand N. (2006). Reliable change assessment with practice effects in sport concussion research: A comment on Hinton-Bayre. British Journal of Sports Medicine, 40(10), 829–833. doi: 10.1136/bjsm.2005.023713.
- Machulda M. M., Pankratz V. S., Christianson T. J., Ivnik R. J., Mielke M. M., Roberts R. O., et al. (2013). Practice effects and longitudinal cognitive change in normal aging vs. incident mild cognitive impairment and dementia in the Mayo Clinic study of aging. The Clinical Neuropsychologist, 27(8), 1247–1264. doi: 10.1080/13854046.2013.836567.
- Marley C. J., Sinnott A., Hall J. E., Morris-Stiff G., Woodsford P. V., Lewis M. H., et al. (2017). Failure to account for practice effects leads to clinical misinterpretation of cognitive outcome following carotid endarterectomy. Physiological Reports, 5(11), e13264. doi: 10.14814/phy2.13264.
- Matarazzo J. D., & Herman D. O. (1984). Base rate data for the WAIS-R: Test-retest stability and VIQ-PIQ differences. Journal of Clinical Neuropsychology, 6(4), 351–366. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/6501578.
- McKhann G. M., Knopman D. S., Chertkow H., Hyman B. T., Jack C. R. Jr., Kawas C. H., et al. (2011). The diagnosis of dementia due to Alzheimer's disease: Recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimer's & Dementia, 7(3), 263–269. doi: 10.1016/j.jalz.2011.03.005.
- McSweeny A., Naugle R. I., Chelune G. J., & Lüders H. (1993). T-scores for change: An illustration of a regression approach to depicting change in clinical neuropsychology. The Clinical Neuropsychologist, 7, 300–312.
- Mormino E. C., Betensky R. A., Hedden T., Schultz A. P., Amariglio R. E., Rentz D. M., et al. (2014). Synergistic effect of beta-amyloid and neurodegeneration on cognitive decline in clinically normal individuals. JAMA Neurology, 71(11), 1379–1385. doi: 10.1001/jamaneurol.2014.2031.
- Petersen R. C. (2004). Mild cognitive impairment as a diagnostic entity. Journal of Internal Medicine, 256(3), 183–194. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/15324362.
- Ragin A. B., Storey P., Cohen B. A., Edelman R. R., & Epstein L. G. (2004a). Disease burden in HIV-associated cognitive impairment: A study of whole-brain imaging measures. Neurology, 63(12), 2293–2297. doi: 10.1212/01.wnl.0000147477.44791.bd.
- Ragin A. B., Storey P., Cohen B. A., Epstein L. G., & Edelman R. R. (2004b). Whole brain diffusion tensor imaging in HIV-associated cognitive impairment. AJNR. American Journal of Neuroradiology, 25(2), 195–200. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/14970017.
- Randolph C. (2012). Repeatable Battery for the Assessment of Neuropsychological Status. Bloomington, MN: The Psychological Corporation.
- Rapport L. J., Axelrod B. N., Theisen M. E., Brines D. B., Kalechstein A. D., & Ricker J. H. (1997a). Relationship of IQ to verbal learning and memory: Test and retest. Journal of Clinical and Experimental Neuropsychology, 19(5), 655–666. doi: 10.1080/01688639708403751.
- Rapport L. J., Brines D., Axelrod B., & Theisen M. E. (1997b). Full scale IQ as mediator of practice effects: The rich get richer. The Clinical Neuropsychologist, 11(4), 375–380.
- Reitan R. (1992). Trail Making Test: Manual for administration and scoring. Tucson, AZ: Reitan Neuropsychology Laboratory.
- Rinehardt E., Duff K., Schoenberg M., Mattingly M., Bharucha K., & Scott J. (2010). Cognitive change on the Repeatable Battery of Neuropsychological Status (RBANS) in Parkinson's disease with and without bilateral subthalamic nucleus deep brain stimulation surgery. The Clinical Neuropsychologist, 24(8), 1339–1354. doi: 10.1080/13854046.2010.521770.
- Salthouse T. A. (2010). Influence of age on practice effects in longitudinal neurocognitive change. Neuropsychology, 24(5), 563–572. doi: 10.1037/a0019026.
- Sánchez-Benavides G., Peña-Casanova J., Casals-Coll M., Gramunt N., Manero R. M., Puig-Pijoan A., et al. (2016). One-year reference norms of cognitive change in Spanish old adults: Data from the NEURONORMA sample. Archives of Clinical Neuropsychology, 31(4), 378–388. doi: 10.1093/arclin/acw018.
- Sloan R. S., & Pressler S. J. (2009). Cognitive deficits in heart failure: Re-cognition of vulnerability as a strange new world. The Journal of Cardiovascular Nursing, 24(3), 241–248. doi: 10.1097/JCN.0b013e3181a00284.
- Smith A. (1973). Symbol Digit Modalities Test. Los Angeles, CA: Western Psychological Services.
- Stein J., Luppa M., Brähler E., König H. H., & Riedel-Heller S. G. (2010). The assessment of changes in cognitive functioning: Reliable change indices for neuropsychological instruments in the elderly—a systematic review. Dementia and Geriatric Cognitive Disorders, 29(3), 275–286. doi: 10.1159/000289779.
- Wilkinson G. S., & Robertson G. J. (2006). WRAT 4: Wide Range Achievement Test, professional manual. Lutz, FL: Psychological Assessment Resources, Inc.
- Winblad B., Palmer K., Kivipelto M., Jelic V., Fratiglioni L., Wahlund L. O., et al. (2004). Mild cognitive impairment–beyond controversies, towards a consensus: Report of the international working group on mild cognitive impairment. Journal of Internal Medicine, 256(3), 240–246. doi: 10.1111/j.1365-2796.2004.01380.x.
- Yesavage J. A., Brink T. L., Rose T. L., Lum O., Huang V., Adey M., et al. (1982). Development and validation of a geriatric depression screening scale: A preliminary report. Journal of Psychiatric Research, 17(1), 37–49. doi: 10.1016/0022-3956(82)90033-4.
