Abstract
Objective
To examine how the duration of time delay between Wechsler Memory Scale (WMS) Logical Memory I and Logical Memory II (LM) affected participants’ recall performance.
Method
We analyzed 46,146 Logical Memory administrations given to participants diagnosed with Alzheimer's disease (AD), vascular dementia (VaD), or normal cognition in the National Alzheimer's Coordinating Center's Uniform Data Set.
Results
Only 50% of the sample was administered the standard 20–35 min of delay as specified by WMS-R and WMS-III. We found a significant effect of delay time duration on proportion of information retained for the VaD group compared to its control group, which remained after adding LMI raw score as a covariate. There was poorer retention of information with longer delay for this group. This association was not as strong for the AD and cognitively normal groups. A 24.5-min delay was optimal for differentiating AD from VaD participants (47.7% classification accuracy), an 18.5-min delay was optimal for differentiating AD from normal participants (51.7% classification accuracy), and a 22.5-min delay was optimal for differentiating VaD from normal participants (52.9% classification accuracy).
Conclusions
Considering diagnostic implications, our findings suggest that test administration should incorporate precise tracking of delay periods. We recommend a 20-min delay with an 18–25-min range. Poor classification accuracy based on LM data alone is a reminder that story memory performance is only one piece of data that contributes to complex clinical decisions. However, strict adherence to the recommended range yields optimal data for diagnostic decisions.
Keywords: Sensitivity, Specificity, Non-standard test administration, Training
Introduction
Consistency in the administration of neuropsychological tests is critical to obtaining valid and reliable findings. The results of these tests provide specific information about individuals’ strengths, weaknesses and the nature of their cognitive deficits. Any deviation from standard test administration may present problems in the reliability and validity of the test results (Lee, Reynolds, & Willson, 2008), which can present potentially significant problems given the role of these measures in contributing to differential diagnoses and treatment recommendations. There is a paucity of research evaluating the effects of non-standard administrations on patients’ performances.
Test administration errors occur in the provision of instruction, presentation of test items, and in the scoring of measures. The speed, time, and type of presentation may affect the performance on neuropsychological tests. For example, speeded or slowed presentation of stimuli on tests such as Digit Span has been found to significantly affect performance (Baddeley & Lewis, 1984; Hagen, Durham, & Shannon, 1977). Baddeley and Lewis (1984) found that individuals recalled more digits during Digit Span with rapid presentation if retrieval occurred within 1–2 s following presentation. Hagen and colleagues (1977) found better performances on Digit Span when voice inflection was dropped on the last digit. Shum, Murray and Eadie (1997) assessed the effect of speed of story memory presentation on performance by evaluating differences in participant performance on the Wechsler Memory Scale-Revised (WMS-R) Logical Memory at three speeds (slow, medium, and fast). The researchers found clinically and statistically significant differences in test scores when the story was presented at the slow speed versus the other speeds. More explicit instructions and verbalization during problem solving seem to correlate with improved performance on measures such as WAIS Digit Symbol and Wisconsin Card Sorting Test (Dillon, 1981; Joncas & Standing, 1998; Perry, Potterat, & Braff, 2001). Therefore, across various neuropsychological tests, deviations in standard administration can have a significant effect on test scores. However, some deviations do not seem to affect performance. For example, performance was not significantly affected by variations in delay periods (i.e., 15, 30, 45, and 60 min) on the Rey-Osterrieth Complex Figure (Berry & Carpenter, 1992).
The WMS is one of the most frequently utilized tests amongst neuropsychologists in the United States (Rabin, Barr, & Burton, 2005). In fact, a brief search of “Logical Memory” in Google Scholar provides a return of 15,000+ articles in the last 5 years alone. The assessment of memory functioning, often using measures such as the WMS that incorporate a delay paradigm, is an integral component of the evaluation of cognitive deficits and decline in patients. Weintraub et al. (2009) found no significant differences in the effect of delay interval on number of WMS Logical Memory units recalled in a population of cognitively normal participants; however, cognitively impaired participants were not included in the sample. Given that standardized administration is assumed when interpreting neuropsychological data, the current study aims to examine the extent to which older adults (>65 years) were administered the WMS Logical Memory Delayed subtest (LMII) within the standard 20- to 35-min delay used in this data set. Additionally, this study aims to examine the degree to which the delay duration affected proportion of information retained across the delay.
Methods
For the current study, we obtained archival data from the National Alzheimer's Coordinating Center's (NACC) Uniform Data Set (UDS; Weintraub et al., 2009). Between 2005 and 2014, each participant within the UDS underwent a standardized neurodiagnostic evaluation and received a consensus diagnosis from trained clinicians. The WMS-R and WMS-III Logical Memory Immediate (LMI) and Delayed (LMII) subtests (Harcourt Assessment, Inc., 1987) were administered to 21,376 participants with either probable Alzheimer's disease (AD), probable vascular dementia (VaD), or normal cognition at an initial evaluation, with a total of 46,146 Logical Memory administrations after accounting for follow-up evaluations. After selecting first visit cases and excluding those without a valid WMS administration (i.e., not administered during testing, data available for only one of the two LM subtests) and delay time recording, the remaining 65% of participant administrations included adults (66–95 years) with probable AD (n = 5,995; 43.2% of the remaining sample), probable VaD (n = 224; 1.6%), and normal cognition (n = 7,657; 55.2%).
Within the full sample of administrations (N = 46,146), we first examined the percentage that followed the standard administration guideline of a 20- to 35-min delay between LMI and LMII. Next, using only the first visit cases, we quantified proportion of retained information for each participant by dividing units of information recalled during LMII by units learned during LMI. The proportion of information retained at LMII had extreme scores (N = 3,590; 25.87%) related to a higher raw score at LMII compared to LMI. Such extreme scores were replaced with a proportion retained score of 1.1 to capture the essence of better LMII compared to LMI recall while limiting effects of outliers on analyses.
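The retention score described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code: the function name `proportion_retained` is hypothetical, and returning `None` when the LMI raw score is zero is an assumption, since the text does not say how such cases were handled.

```python
from typing import Optional

# Replacement value for cases where LMII recall exceeds LMI recall (from the text).
CAP = 1.1


def proportion_retained(lmi_raw: int, lmii_raw: int) -> Optional[float]:
    """Proportion of LMI units still recalled at LMII, with the 1.1 replacement rule."""
    if lmi_raw <= 0:
        # Assumption: the ratio is undefined when nothing was recalled at LMI.
        return None
    if lmii_raw > lmi_raw:
        # Better delayed than immediate recall is replaced with a fixed score.
        return CAP
    return lmii_raw / lmi_raw


print(proportion_retained(12, 9))   # delayed recall lower than immediate recall
print(proportion_retained(8, 10))   # delayed recall higher than immediate recall
```

Replacing every ratio above 1 with the same value keeps the information that delayed recall exceeded immediate recall while preventing very small LMI scores from producing large ratios.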
Due to significant differences in age, gender, and years of education across the AD, VaD, and cognitively normal groups, two separate random samples of the cognitively normal group were demographically matched to the AD and VaD groups, respectively. We then performed separate one-way ANOVAs to compare the effect of delay time duration on proportion of story information retained (PR) during LMII for both AD versus cognitively normal and VaD versus cognitively normal groups. We confirmed these findings via additional general linear model analyses controlling for each participant's performance on LMI (Table 1). Finally, receiver operating characteristic (ROC) curve analyses were utilized to determine the optimal time delay point for accurately differentiating AD from normal controls, VaD from normal controls, and AD from VaD participants.
Table 1.
Descriptive and frequency statistics for all study groups
| | Prob. AD | AD Control | Prob. VaD | VaD Control |
|---|---|---|---|---|
| N | 5,995 | 2,480 | 224 | 2,205 |
| Sex (% Female) | 54.3% | 55.0% | 50.9% | 53.2% |
| Age (SD) | 77.8 (6.4) | 77.5 (6.4) | 78.2 (6.6) | 78.2 (6.3) |
| Years of education (SD) | 14.3 (3.6) | 14.3 (2.9) | 13.8 (3.8) | 14.1 (3.1) |
| LMI raw score (SD) | 4.7 (3.9) | 12.5 (3.9) | 7.6 (4.8) | 12.5 (3.8) |
| LMII raw score (SD) | 2.4 (3.5) | 11.2 (4.2) | 5.7 (5.3) | 11.1 (4.2) |
| Delay time duration in minutes (SD) | 21.4 (6.3) | 21.3 (6.6) | 21.9 (6.4) | 21.39 (6.6) |
| Proportion of information retained | 35.3% | 87.4% | 58.3% | 87.3% |
Results
Of the 46,146 administrations, delay durations between LMI and LMII ranged from 1 to 60 min, and 49.7% of the LMII administrations fell within the standardized delay duration. Females made up approximately 50% of the participants in all study groups (53.3% AD, 55.0% AD Controls, 50.9% VaD, 53.2% VaD Controls). The average age of participants in each group was approximately 78 years, with an average of approximately 14 years of education. Study groups did not have equality of variance on logical memory measures; therefore, inferential statistics were reported with equal variances not assumed. Participants in the AD group differed significantly from normal controls (n = 2,480) in terms of LMI raw score (t(8,473) = −83.65, p < .001), LMII raw score (t(3,986) = −99.89, p < .001) and proportion of information retained on LMII (t(8,392) = −63.60, p < .001), with no between-group difference in delay duration (t(4,395) = .62, p = .533). VaD participants also differed significantly from controls (n = 2,205) with regard to LMI (t(252) = −14.29, p < .001), LMII (t(252) = −14.63, p < .001), and proportion of information retained (t(232) = −10.68, p < .001), with no between-group differences in delay duration (t(2,427) = 1.15, p = .252).
Separate plots of the results from our general linear model analyses are provided in Fig. 1. The one-way ANOVA for AD versus normal controls evidenced no significant group × delay interaction (F(36, 8,387) = 1.124, p = .240). Our one-way ANOVA for the VaD versus normal controls evidenced a significant group × delay interaction (F(30, 2,357) = 3.514, p < .001), indicating that the effect of delay duration on proportion of information retained at LMII differs between VaD participants and cognitively normal controls. After controlling for LMI performance, follow-up analyses continued to show a significant group × delay interaction for VaD versus control (F(30, 2,356) = 2.982, p < .001). To further clarify this difference in effect of delay duration on proportion of information retained, we employed separate bivariate correlations between delay duration and proportion of information retained for each study group. The VaD group (Fig. 1) evidenced the strongest negative relationship with Pearson's correlation coefficient r = −.266 (p < .001). The VaD control group's r was −.130 (p < .001). Together with the ANOVA, this pattern suggests that the VaD group's performance (i.e., proportion recalled) is more negatively affected by longer delay duration compared to healthy controls. For the AD group, the r was −.105 (p < .001) in comparison to its control group's r of −.121 (p < .001). This is consistent with a lack of interaction between group and delay, and suggests that the AD group's performance (i.e., proportion recalled) is less strongly influenced by delay duration compared to healthy controls’ performance.
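The bivariate correlations described above use the standard Pearson coefficient. A minimal Python sketch follows. The function name `pearson_r` and the delay and retention values are hypothetical and chosen only for illustration; they are not study data and will not reproduce the reported coefficients.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: delay in minutes and proportion retained for one group.
delay_min = [15, 18, 20, 22, 25, 30, 35]
prop_retained = [0.80, 0.75, 0.70, 0.72, 0.60, 0.55, 0.40]

r = pearson_r(delay_min, prop_retained)
print(f"r = {r:.3f}")  # negative: retention falls as the delay lengthens
```

Computing the coefficient separately within each group, as the text describes, allows the strength of the delay and retention relationship to be compared across groups.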
Fig. 1.
Graphical display of differences between AD, VaD, and control participants in effect of delay time duration on proportion of LMI information retained during LMII.
ROC curve analysis for discriminating AD participants from normal controls evidenced an 18.5-min delay between LMI and LMII as the optimal cut point (area under the curve, AUC = 51.7), with 65.3% sensitivity and 39.3% specificity. Further ROC curve analysis revealed a 22.5-min delay time cut point for accurately differentiating VaD participants from controls (AUC = 52.9), with 40.2% sensitivity and 69.6% specificity. A final ROC curve analysis evidenced a 24.5-min cut point as optimal for distinguishing AD from VaD participants (AUC = 47.7), with 28.7% sensitivity and 62.9% specificity. A table of likelihood ratios associated with different delay durations and a figure of ROC curves are available online (see Supplementary Materials online).
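As a worked illustration, the sensitivity and specificity values reported above can be converted into likelihood ratios using the standard definitions. The Python sketch below is not the authors' code, and its output is not taken from the supplementary table. The helper names are hypothetical, and Youden's J is included only as a common summary index; the text does not state which criterion was used to select the cut points.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Sensitivity and specificity at the cut points reported in the text.
reported = {
    "AD vs. normal (18.5 min)": (0.653, 0.393),
    "VaD vs. normal (22.5 min)": (0.402, 0.696),
    "AD vs. VaD (24.5 min)": (0.287, 0.629),
}

for label, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    j = youden_j(sens, spec)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, J = {j:.3f}")
```

A likelihood ratio close to 1 changes the post-test probability very little, which is consistent with the near-chance AUC values reported above.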
Discussion
Findings indicate that delay duration has an effect on recall performance for some patients. From the findings of both our general linear models and correlation analyses, probable VaD participants were most significantly affected by the variability in delay time duration, followed by normal controls, then AD participants. Approximately 50% of the LMII administrations included in this database did not use the standardized WMS LM delay.
There is a traditional view that AD and VaD have different learning, memory, and cognitive profiles. Specifically, the prototypical memory profile in individuals with VaD is characterized by a relatively intact initial learning curve; however, acquisition of new information is generally depressed relative to cognitively normal individuals. Individuals with VaD typically demonstrate impaired memory recall following a delay period but are aided by recognition cueing paradigms (Tierney et al., 2001). In contrast, individuals with AD typically exhibit limited acquisition of new information and rapid forgetting characterized by significant impairment recalling previously presented information following a delay, with limited assistance from recognition cueing (Cullum & Liff, 2014; Xie et al., 2010). It is expected that an individual who has a “pure” VaD will perform better on memory tests and more poorly on executive function tests than individuals with “pure” AD (Desmond, 2004; Graham, Emery, & Hodges, 2004; Reed et al., 2007; Weintraub, Wicklund, & Salmon, 2012). In contrast, individuals with AD perform poorly on episodic memory tests in all modalities (i.e., contextualized and decontextualized verbal memory, visual memory, etc.) (Graham et al., 2004; Reed et al., 2007; Weintraub et al., 2012). However, complicating this distinction is the more recent controversy in the field with regard to mixed etiology, as AD and VaD can be comorbid within the same individual (Gorelick et al., 2011).
Because of the complexity and possibility for comorbid pathophysiology, clinical neuropsychologists rely on a battery of tests across major cognitive domains in addition to learning and memory. When WMS LM is used alone, our ROC curve analysis indicated that, even at the most optimal time delay points, classification accuracy for differentiating among AD, VaD, and cognitively normal groups ranged around 50% across all comparisons (i.e., AD vs. normal, VaD vs. normal, AD vs. VaD). The clinical utility of the WMS LM recall alone to effectively differentiate AD from VaD is limited and should always be used in combination with other neuropsychological tests.
In examining memory performance across different time delays, we found that individuals with probable VaD were more penalized by longer delay periods than individuals with probable AD. This may be because the VaD group initially learned more information in the immediate memory learning trial than the AD group, and therefore also had a greater proportion of information to remember. The AD group was less affected by the longer delay periods, likely because participants in this group had learned only a limited amount of information in the initial learning trial; thus there was a floor effect on the amount of information to lose over a delay period.
Interestingly, upon further examination of data, earlier time delay points (i.e., ≤15 min) yielded higher sensitivity for accurately detecting AD and VaD participants, whereas longer delay duration (i.e., ≥35 min) resulted in more accurate identification of cognitively normal subjects. Unfortunately, when patients present for evaluation, clinicians do not know their diagnosis and must choose a delay period that maximizes the predictive power of the test regardless of the patient's diagnosis. Based on the present findings, we recommend that clinicians aim for a 20-min delay with a range of 18–25 min when using WMS-R or WMS-III, which is a slight deviation from the 20–35 min of standardized range specified by these WMS manuals. This recommendation is based on our ROC analyses and visual inspection of the figure for the data between 18 and 25 min of duration delay. In the AD group, the regression line between 18 and 25 min appeared flat, suggesting that low proportion recall does not vary significantly as a function of delay duration. In the VaD group, although the regression line between 18 and 25 min appeared to have a slight downward slope, the proportion recall does not yet dip very low. We glean from these observations that within a restricted range of 18–25 min of delay duration, the proportion recall performance of VaD participants may be least confounded by the delay duration effect.
The findings presented in this study highlight the importance of standardized test administration and precise tracking of delay periods. There may be many reasons for non-standardized administration of tests given the propensity for human error as well as the variety of presenting problems in patients undergoing testing (e.g., patient agitation requiring extended breaks or patient fatigue necessitating a truncated examination) that may necessitate adjustments in test administration. Results from non-standardized administration can still be useful as important information can be gathered through observation of how an individual approaches a task. There is some support for using results from non-standardized administration in a qualitative manner. For instance, the Boston process approach pioneered by Dr. Edith Kaplan emphasizes taking a process-oriented approach to neuropsychological assessment by considering unique patient and testing factors when interpreting neuropsychological tests. However, considering neuropsychologists’ increasingly prominent role in interdisciplinary teams and medical decision-making, it is crucial to ensure the most accurate administration possible. Although errors occur, it is important to explore strategies for reducing their frequency. Extensive training in test administration, including in vivo observation of new testing technicians and use of alarms and/or time logs to keep track of time delays, may help to reduce the frequency of administration errors. Test redundancy, or the use of multiple measures to examine each cognitive domain, would also be beneficial in mitigating the effects of administration errors and bolstering conclusions based on test findings.
One notable limitation of the applicability of the current findings is that the data came from WMS-R and WMS-III. The WMS-R LM has slight administration differences from the current version, although the stories have remained the same. Specifically, the adjusted administration of the WMS Logical Memory subtest used in this data set calls for only Story A to be read. Given that most practitioners will include Story B in their administration, the present findings should be applied to later versions of the WMS with caution.
Funding
The NACC database is funded by NIA/NIH Grant U01 AG016976. NACC data are contributed by the NIA funded ADCs: P30 AG019610 (PI Eric Reiman, MD), P30 AG013846 (PI Neil Kowall, MD), P50 AG008702 (PI Scott Small, MD), P50 AG025688 (PI Allan Levey, MD, PhD), P50 AG047266 (PI Todd Golde, MD, PhD), P30 AG010133 (PI Andrew Saykin, PsyD), P50 AG005146 (PI Marilyn Albert, PhD), P50 AG005134 (PI Bradley Hyman, MD, PhD), P50 AG016574 (PI Ronald Petersen, MD, PhD), P50 AG005138 (PI Mary Sano, PhD), P30 AG008051 (PI Steven Ferris, PhD), P30 AG013854 (PI M. Marsel Mesulam, MD), P30 AG008017 (PI Jeffrey Kaye, MD), P30 AG010161 (PI David Bennett, MD), P50 AG047366 (PI Victor Henderson, MD, MS), P30 AG010129 (PI Charles DeCarli, MD), P50 AG016573 (PI Frank LaFerla, PhD), P50 AG016570 (PI Marie-Francoise Chesselet, MD, PhD), P50 AG005131 (PI Douglas Galasko, MD), P50 AG023501 (PI Bruce Miller, MD), P30 AG035982 (PI Russell Swerdlow, MD), P30 AG028383 (PI Linda Van Eldik, PhD), P30 AG010124 (PI John Trojanowski, MD, PhD), P50 AG005133 (PI Oscar Lopez, MD), P50 AG005142 (PI Helena Chui, MD), P30 AG012300 (PI Roger Rosenberg, MD), P50 AG005136 (PI Thomas Montine, MD, PhD), P50 AG033514 (PI Sanjay Asthana, MD, FRCP), P50 AG005681 (PI John Morris, MD), and P50 AG047270 (PI Stephen Strittmatter, MD, PhD)
Supplementary Material
Supplementary material is available at Archives of Clinical Neuropsychology online.
Conflict of Interest
None declared.
References
- Baddeley A., & Lewis V. (1984). When does rapid presentation enhance digit span? Bulletin of the Psychonomic Society, 22, 403–405. doi:10.3758/BF03333858.
- Berry D. T., & Carpenter G. S. (1992). Effect of four different delay periods on recall of the Rey-Osterrieth Complex Figure by older persons. The Clinical Neuropsychologist, 6, 80–84. doi:10.1080/13854049208404119.
- Cullum C. M., & Liff C. D. (2014). Mild cognitive impairment and Alzheimer disease. In Stucky K. J., Kirkwood M. W., & Donders J. (Eds.), Neuropsychology study guide & board review (pp. 448–465). New York: Oxford University Press.
- Desmond D. W. (2004). The neuropsychology of vascular cognitive impairment: Is there a specific cognitive deficit? Journal of the Neurological Sciences, 226, 3–7.
- Dillon R. F. (1981). Analogical reasoning under different methods of test administration. Applied Psychological Measurement, 5, 341–347. doi:10.1177/014662168100500307.
- Gorelick P. B., Scuteri A., Black S. E., DeCarli C., Greenberg S. M., Iadecola C., et al. (2011). Vascular contributions to cognitive impairment and dementia: A statement for Healthcare Professionals from the American Heart Association/American Stroke Association. Stroke, 42, 2672–2713. doi:10.1161/STR.0b013e3182299496.
- Graham N. L., Emery T., & Hodges J. R. (2004). Distinctive cognitive profiles in Alzheimer's disease and subcortical vascular dementia. Journal of Neurology, Neurosurgery & Psychiatry, 75, 61–71.
- Hagen R. L., Durham T., & Shannon D. (1977). Administration of Digit Span in the Wechsler and Binet: Differences that matter. Journal of Clinical Psychology, 33, 480–482. doi:10.1300/J151v03n03_0.
- Joncas J., & Standing L. (1998). How much do accurate instructions raise scores on a timed test? Perceptual and Motor Skills, 86, 1257–1258. doi:10.2466/pms.1998.86.3c.1257.
- Lee D., Reynolds C. R., & Willson V. L. (2008). Standardized test administration: Why bother? Journal of Forensic Neuropsychology, 3, 55–81. doi:10.1300/J151v03n03_04.
- Perry W., Potterat E. G., & Braff D. L. (2001). Self-monitoring enhances Wisconsin Card Sorting Test performance in patients with schizophrenia: Performance is improved by simply asking patients to verbalize their sorting strategy. Journal of the International Neuropsychological Society, 7, 344–352.
- Rabin L. A., Barr W. B., & Burton L. A. (2005). Assessment practices of clinical neuropsychologists in the United States and Canada: A survey of INS, NAN, and APA Division 40 members. Archives of Clinical Neuropsychology, 20, 33–65. doi:10.1016/j.acn.2004.02.005.
- Reed B. R., Mungas D. M., Kramer J. H., Ellis W., Vinters H. V., Zarow C., et al. (2007). Profiles of neuropsychological impairment in autopsy-defined Alzheimer's disease and cerebrovascular disease. Brain, 130, 731–739.
- Shum D. H., Murray R. A., & Eadie K. (1997). Effect of speed of presentation on administration of the Logical Memory subtest of the Wechsler Memory Scale-Revised. The Clinical Neuropsychologist, 11, 188–191. doi:10.1080/13854049708407049.
- Tierney M. C., Black S. E., Szalai J. P., Snow W. G., Fisher R. H., Nadon G., et al. (2001). Recognition memory and verbal fluency differentiate probable Alzheimer disease from subcortical ischemic vascular dementia. Archives of Neurology, 58, 1654–1659.
- Weintraub S., Salmon D., Mercaldo N., Ferris S., Graff-Radford N. R., Chui H., et al. (2009). The Alzheimer's disease centers’ uniform data set (UDS): The neuropsychological test battery. Alzheimer Disease and Associated Disorders, 23, 91. doi:10.1097/WAD.0b013e318191c7dd.
- Weintraub S., Wicklund A. H., & Salmon D. P. (2012). The neuropsychological profile of Alzheimer disease. Cold Spring Harbor Perspectives in Medicine, 2, a006171.
- Xie S. X., Libon D. J., Wang X., Massimo L., Moore P., Vesely L., et al. (2010). Longitudinal patterns of semantic and episodic memory in frontotemporal lobar degeneration and Alzheimer's disease. Journal of the International Neuropsychological Society, 16, 278–286. doi:10.1017/S1355617709991317.

