Alzheimer's & Dementia. 2025 Sep 5;21(9):e70644. doi: 10.1002/alz.70644

Practice effects on digital cognitive assessment tools: insights from the defense automated neurobehavioral assessment battery

Matteo Bellitti 1, Meagan V Lauber 1,2, Edward Searls 3, Honghuang Lin 4, Rhoda Au 1,3,5,6,7,8, Vijaya B Kolachalama 1,5,9
PMCID: PMC12412751  PMID: 40911708

Abstract

INTRODUCTION

Digital cognitive assessments offer a promising approach to monitoring cognitive impairments, but repeated use can introduce practice effects, potentially masking changes in cognitive status. We evaluated practice effects using the Defense Automated Neurobehavioral Assessment (DANA), a digital battery designed for cognitive monitoring.

METHODS

We analyzed data from 116 participants from the Boston University Alzheimer's Disease Research Center, comparing response times across two DANA sessions, around 90 days apart, while controlling for cognitive status, sex, age, and education.

RESULTS

Modest practice effects were found, and cognitive impairment was associated with slower response times in several tasks. Classification models, including logistic regression and random forest classification, achieved accuracies of up to 71% in assessing cognitive status.

DISCUSSION

Our study establishes a framework for evaluating practice effects in digital cognitive assessment tools. Future work should expand the sample size and diversity to enhance the generalizability of findings in broader clinical contexts.

Highlights

  • We systematically evaluated practice effects using the DANA battery as a case study.

  • Modest practice effects were observed across two testing sessions, with a median inter‐session interval of 93 days.

  • Cognitive impairment was significantly associated with slower response times in key tasks (p < 0.001).

  • Our framework offers a systematic approach for evaluating practice effects in digital cognitive tools.

Keywords: digital cognitive assessments, practice effects, response time

1. BACKGROUND

Digital cognitive assessments have the potential to transform early detection and monitoring of cognitive impairments by offering frequent remote administration, standardized testing conditions, and increased sensitivity to subtle cognitive changes. 1 , 2 Recent studies have shown that some digital assessments perform as well as or better than traditional neuropsychological tests, 3 , 4 , 5 , 6 , 7 with the potential to offer earlier detection of cognitive decline, due to their administration in a participant's natural environment across multiple sessions. 8 , 9 , 10 , 11 However, the repeated and frequent administration of digital assessments can increase their susceptibility to practice effects, 12 , 13 improvements in test performance due to familiarity with the tasks rather than genuine cognitive gains.

Practice effects arise from multidimensional cognitive processes, including procedural learning and task familiarity, which occur with repeated exposure to cognitive assessments. 14 They are commonly measured by either calculating improvements in task accuracy or reductions in response time (RT) from one session to the next. 15 , 16 , 17 , 18 , 19 Some studies have shown that practice effects can obscure early cognitive decline, 14 , 20 , 21 leading to false‐negative results in at‐risk populations, where detection of subtle changes is critical. Therefore, it is important to evaluate the impact of practice effects on digital assessments and design tools that are robust in the context of monitoring cognitive status.

One such digital tool, the Defense Automated Neurobehavioral Assessment (DANA), originally developed for military applications, 22 is a flexible digital battery currently being evaluated for its utility in detecting cognitive decline. 23 , 24 DANA's mobile format and diverse cognitive tasks make it suitable for frequent and remote use, offering a promising option for cognitive monitoring in clinical and non‐clinical populations. It measures reaction time on tasks involving key cognitive domains, such as attention, memory, and visuospatial processing, through the following six tasks: Simple Reaction Time (SRT), Procedural Reaction Time (PRT), Go/No‐Go (GNG), Match‐to‐Sample (MTS), Spatial Processing (SP), and Code Substitution (CS). SRT evaluates response speed by requiring participants to tap a target as quickly as possible when it appears. PRT assesses decision‐making and executive function by measuring accuracy, RT, and impulsivity in choice‐based tasks. GNG evaluates sustained attention and impulsivity, while MTS tests short‐term memory and visuospatial discrimination. SP assesses visuospatial analytic ability. CS is a polyfactorial cognitive assessment, measuring attention, learning, visual scanning, and memory domains. Despite its advantages, the susceptibility of DANA, like other digital cognitive assessments, to practice effects in longitudinal monitoring has not yet been rigorously tested.

In this study, we used DANA as a template to evaluate the impact of practice effects. By examining repeated use of DANA, particularly in the context of monitoring cognitive status, we sought to identify how practice effects influence performance over time. Our findings offer insights into how digital cognitive assessments can be evaluated to distinguish true changes in cognitive status from improvements due to test familiarity, supporting the development of more accurate and reliable tools for monitoring cognition in at‐risk populations.

RESEARCH IN CONTEXT

  1. Systematic review: Digital cognitive assessments have the potential to enable early detection and monitoring of cognitive decline, particularly in at‐risk populations. Despite their promise, repeated administration can lead to practice effects, which may mask early cognitive impairments. Studies have shown that practice effects can complicate longitudinal cognitive monitoring by improving performance through task familiarity rather than genuine cognitive improvement. Few studies have systematically evaluated the susceptibility of digital cognitive assessments to practice effects over time.

  2. Interpretation: We evaluated the presence of practice effects in the DANA battery, a widely used cognitive assessment tool. We observed a modest improvement in response time (0% to 4.2%) across two repeated sessions, suggesting that DANA may be relatively insensitive to practice effects while remaining sensitive to cognitive impairments.

  3. Future directions: Although promising, our results are based on a limited and homogeneous participant sample. Future research should focus on expanding the participant pool to include more diverse populations and extended follow‐up periods. Additional testing is needed to determine whether DANA remains reliable over long‐term monitoring and to refine models for controlling practice effects in broader clinical applications.

2. METHODS

2.1. Data sources and participant selection

This study is ancillary to the Boston University Alzheimer's Disease Research Center (BUADRC), which referred the study participants and collected the necessary demographic and clinical data: Each participant completed an annual evaluation comprising a clinical interview, neurological examination, neuropsychological testing, and measures of functional independence. Digital data were acquired in collaboration with Linus Health, which provided the software platform to administer the digital cognitive assessments. 25 , 26

Participants were included in the study if they were 40 years of age or older, spoke English, and completed two separate sessions of the full six‐task DANA battery on their smartphone. Although familiarity with digital devices was not explicitly measured or used as an inclusion criterion, touchscreen device ownership was considered a reasonable proxy. Exclusion criteria included pregnancy, presence of implanted devices, and insufficient visual acuity. These exclusion criteria, which may seem not particularly relevant for a study involving a digital application, were chosen out of an abundance of caution to protect against any possible interference from digital technologies used by the study participants. The resulting study population comprised N = 116 participants, whose demographic characteristics are summarized in Table 1.

The six DANA tasks were presented in two separate blocks: (CS, GNG, SRT) and (MTS, PRT, SP). Study participants were asked to complete both blocks in a remote unsupervised setting and to complete them again 90 days after the initial completion of all tasks. To accommodate participants' schedules, tasks remained available for 6 weeks after each scheduled test date and did not need to be completed on the same day. Each participant received a reminder (via phone call) to complete the tasks 2 weeks before each of the two scheduled test dates and was given the opportunity to report any technical difficulty. On and after each scheduled date, participants received weekly email reminders to complete the tasks until either they did so or 6 weeks had elapsed. In this design, the target interval between the two sessions was 90 days, with an allowable range of 48 to 132 days. The observed median time between the two sessions was 93 days, with an interquartile range of 90 to 99 days.

The software platform measures "response time" as the elapsed time in milliseconds between stimulus presentation and the participant's interaction with the screen during task completion across the two sessions. The RT is recorded only after discarding a task‐specific recommended number of "warm‐up" trials (SRT: 5, CS: 4, SP: 10, PRT: 10, GNG: 5, MTS: 3). 22
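
To make the warm‐up handling concrete, the following is a minimal sketch in Python with pandas of how the task‐specific warm‐up trials listed above could be discarded before analysis. The DataFrame column names ("task", "trial_number") are illustrative assumptions, not the platform's actual export schema.

import pandas as pd

# Recommended number of "warm-up" trials to discard per task (Lathan et al. 22).
# Column names ("task", "trial_number") are illustrative, not the platform's schema.
WARMUP_TRIALS = {"SRT": 5, "CS": 4, "SP": 10, "PRT": 10, "GNG": 5, "MTS": 3}

def drop_warmup(trials: pd.DataFrame) -> pd.DataFrame:
    """Keep only trials that occur after each task's recommended warm-up period."""
    warmup = trials["task"].map(WARMUP_TRIALS)
    return trials[trials["trial_number"] > warmup]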

TABLE 1.

Study population.

                                      N      Percentage (%)
Total participants                    116
Race
  White                               100    86.2
  Black or African American           11     9.5
  Asian                               4      3.4
  Other                               1      0.9
Sex
  Female                              71     61
  Male                                45     39
Education
  Above college (16 years)            57     49
  College (12 to 16 years)            54     47
  High school or less (12 years)      5      4
Impaired
  No                                  84     72
  Yes                                 32     28
Age (years)
  Median                              70
  IQR                                 62–75

Note: The table presents the distribution of key demographic factors (sex, race, education) and cognitive status, along with median age and interquartile range (IQR). Percentages are reported for categorical variables.

Demographic and clinical data were collected following the National Alzheimer's Coordinating Center (NACC) Uniform Data Set (UDS) version 3 standard. 27 We labeled each participant's cognitive status as either "intact" or "impaired" using the more severe of their Clinical Dementia Rating (CDR) 28 score and their NACCUDSD score 27 (denoting their clinical diagnosis as normal cognition, impaired‐not‐mild cognitive impairment (MCI), MCI, or dementia). A participant was labeled as "intact" only if they were marked as such according to both scales. This ensured that participants with even subtle signs of impairment were assigned to the "impaired" group. We used the cognitive status at the chronologically closest clinical visit to label the DANA data. To determine cognitive status, we considered any available CDR or NACCUDSD score up to 1 year before the earliest DANA measurement.
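
As an illustration of this labeling rule, the sketch below (Python) labels a participant "intact" only when both scales agree, using the clinical visit closest in time to the first DANA session. The column names and the standard UDS coding (CDR = 0 for no impairment; NACCUDSD = 1 for normal cognition) are assumptions made for illustration.

import pandas as pd

def label_cognitive_status(cdr: float, naccudsd: int) -> str:
    """'Intact' only when BOTH scales indicate no impairment: CDR == 0 and
    NACCUDSD == 1 (1 = normal cognition, 2 = impaired-not-MCI, 3 = MCI,
    4 = dementia); otherwise 'impaired'."""
    return "intact" if (cdr == 0 and naccudsd == 1) else "impaired"

def label_from_closest_visit(visits: pd.DataFrame, first_dana: pd.Timestamp) -> str:
    """Use the clinical visit closest in time to the first DANA session,
    considering visits no more than 1 year before it.
    Columns ("visit_date", "cdr", "naccudsd") are illustrative."""
    window = visits[visits["visit_date"] >= first_dana - pd.DateOffset(years=1)]
    closest = window.loc[(window["visit_date"] - first_dana).abs().idxmin()]
    return label_cognitive_status(closest["cdr"], closest["naccudsd"])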

FIGURE 1.

Response time across trials for different cognitive tasks. The median response time (solid lines) and interquartile range (shaded areas) are shown for each task, stratified by cognitive status (Impaired: Yes/No). Tasks include Code Substitution (CS), Simple Reaction Time (SRT), Spatial Processing (SP), Procedural Reaction Time (PRT), Go/No‐Go (GNG), and Matching‐to‐Sample (MTS). A clear within‐session practice effect is observed in most tasks, with response times decreasing as trial numbers increase. This indicates the need to account for trial number when modeling the response times to avoid confounding effects.

2.2. Statistical model

We explored the connection between RT and cognitive impairment in two ways. We first built a linear regression model with random intercepts (Equation 1) to estimate the association of each predictor, including cognitive status, with RT. We then used machine learning classifiers as proof‐of‐concept predictors of cognitive status from DANA and basic demographics. In both analyses, we elected to keep the RT on correct answers only. All tasks except MTS had >95% correct rates, so the fraction of incorrect responses was deemed non‐informative. Furthermore, previous literature suggests that the RT may be more informative than the accuracy rate in older adults. 29 This approach excludes not only incorrect responses but also "fast" and "lapsed" responses, where a fast response occurs when a participant taps the screen before the stimulus is presented, and a lapsed response occurs when the response takes longer than the allotted maximum time for that specific task (SRT: 900 ms, GNG: 1500 ms, MTS: 4 s, CS: 4 s, PRT: 2 s, SP: 5 s; see Lathan et al. 22 for more details). In the case of GNG, we only kept stimuli where the participant was expected to act.

Each tap on the screen, referred to as a "trial," was represented as a row in our data matrix. For example, a single session of the SRT task comprised 40 trials: the stimulus was presented 40 times consecutively. We included the trial number, denoting which of the 40 individual taps was being analyzed, among the predictors, because we observed within‐session practice effects (Figure 1). Although the RT as a function of the trial number was constant in SP and MTS, we still allowed our models to use the trial number as a predictor; we did not exclude within‐session practice effects a priori. Our data (Figure 1) suggested that in this population, the number of warm‐up trials should be increased by at least five in all tasks except SP and MTS, where the recommended number already appeared sufficient to suppress within‐session practice effects but not so large as to induce noticeable cognitive fatigue. In SRT, GNG, CS, and PRT, there is a distinct pattern of sharp initial decline in RT during the first five trials (Figure 1). In SRT and CS, this is followed by a continuous decline throughout the remainder of the session, whereas in GNG and PRT, these improvements are less distinct. Furthermore, these observations suggest that DANA tasks do not induce significant cognitive fatigue in either cognitively intact or impaired participants; if they did, RT would increase with trial number.
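
The trial‐level filtering described above can be summarized in a short sketch (Python with pandas); the column names and the encoding of "fast" responses as non‐positive RTs are assumptions made for illustration, not the platform's actual data format.

import pandas as pd

# Maximum allowed response time per task, in milliseconds (see Lathan et al. 22).
MAX_RT_MS = {"SRT": 900, "GNG": 1500, "MTS": 4000, "CS": 4000, "PRT": 2000, "SP": 5000}

def keep_analyzable_trials(trials: pd.DataFrame) -> pd.DataFrame:
    """Keep only correct responses that are neither 'fast' (tap before the
    stimulus, encoded here as rt_ms <= 0) nor 'lapsed' (rt_ms above the task's
    maximum). For GNG, keep only 'Go' stimuli, where a response is expected.
    Columns ("task", "rt_ms", "correct", "is_go") are illustrative."""
    max_rt = trials["task"].map(MAX_RT_MS)
    valid = trials["correct"] & (trials["rt_ms"] > 0) & (trials["rt_ms"] <= max_rt)
    valid &= (trials["task"] != "GNG") | trials["is_go"]
    return trials[valid]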

To understand the association between RT and cognitive status, including practice effects over different visits, we modeled the RTs at each session as a linear mixed‐effects model with random intercepts (η):

log(ResponseTime) = β0 + βI·Impaired + βD·Days + βID·(Impaired × Days) + βA·Age + βS·Sex + βE·Education + βT·TrialNumber + η    (1)

where "Days" is the number of days between the first and second DANA sessions for each participant. Additionally, we included an interaction term between Days and Impaired to allow the cognitively impaired and unimpaired groups to learn at different rates between the two sessions. In our framework, the "Sex" and "Impaired" variables were binary indicators, and all other predictors were treated as continuous variables. Continuous variables were centered and scaled to unit variance before fitting the model, but we present results in original units for ease of interpretation. Given that the number of days elapsed between the first and second session (Days) is not the same for all participants (Figure 2), we treated it as a continuous variable. We controlled for age, sex, education, and trial number. We modeled the natural logarithm of the RT to help mitigate residual heteroskedasticity and to make effect sizes across tasks and predictors easy to compare in terms of percentage changes. Fit results are reported in Table 2, including the two‐sided p values for the β coefficients adjusted using the Benjamini–Yekutieli procedure for false discovery rate at the α = 0.05 level.
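
A minimal sketch of how a model of this form could be fit in Python with statsmodels is shown below; the formula mirrors Equation 1 with a random intercept per participant, and the Benjamini–Yekutieli adjustment is applied to the resulting p values. Column names are illustrative assumptions, and centering/scaling of continuous predictors is assumed to have been done beforehand; this is not necessarily the exact code used in the study.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

def fit_task_model(df: pd.DataFrame):
    """Fit the random-intercept model of Equation 1 for one DANA task.
    Expects one row per analyzable trial with (illustrative) columns:
    rt_ms, impaired, days, age, sex, education, trial_number, participant_id."""
    df = df.assign(log_rt=np.log(df["rt_ms"]))
    model = smf.mixedlm(
        "log_rt ~ impaired * days + age + sex + education + trial_number",
        data=df,
        groups=df["participant_id"],  # random intercept per participant
    )
    result = model.fit()
    # Benjamini-Yekutieli false discovery rate adjustment of the two-sided p values
    _, p_adj, _, _ = multipletests(result.pvalues, alpha=0.05, method="fdr_by")
    return result, pd.Series(p_adj, index=result.pvalues.index)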

FIGURE 2.

Interval distribution between two DANA sessions. Number of days elapsed between the first and second DANA measurements for six cognitive tasks: Code Substitution (CS), Go/No‐Go (GNG), Matching‐to‐Sample (MTS), Procedural Reaction Time (PRT), Simple Reaction Time (SRT), and Spatial Processing (SP). The data are stratified by cognitive status (Impaired: Yes/No). Participants were asked to complete the battery again after an interval of 90 days, with an 8‐week flexibility window centered on the 90‐day mark. The distribution of intervals between cognitively impaired and intact participants is not significantly different (p > 0.7 in all cases), as determined by a Kolmogorov–Smirnov test. This indicates similar adherence to testing schedules across the two groups. The slight difference in position of the outliers is explained by participants not completing all tasks of the battery on the same day, and “missing” points correspond to a participant refusing to complete a task. The “Impaired” group is defined in Section 2.1 (i.e., highest between Clinical Dementia Rating score > 0 and the National Alzheimer's Coordinating Center Uniform Data Set > 1).

TABLE 2.

Summary of random intercepts model (Equation 1).

Task / Predictor    exp(β)    exp(β) [0.025]    exp(β) [0.975]    p > |t|    padj > |t|

SRT (N = 108, ICC = 0.449)

Age 0.999 0.996 1.002 0.619 1
Education 1 0.987 1.014 0.973 1
Days/90 0.996 0.988 1.005 0.398 1
Trials 0.864 0.851 0.876 <0.001 <0.001
Sex 1.073 1.012 1.139 0.019 0.185
Days/90:Impaired 1.006 0.989 1.022 0.493 1
Impaired 0.978 0.935 1.023 0.476 1
Intercept (ms) 361.015 210.376 619.518 <0.001 <0.001

SP (N = 103, ICC = 0.170)

Age 1.004 1.001 1.007 0.01 0.107
Education 0.989 0.975 1.003 0.115 0.849
Days/90 0.971 0.948 0.994 0.015 0.153
Trials 0.987 0.942 1.034 0.581 1
Sex 1 0.941 1.061 0.989 1
Days/90:Impaired 1.036 0.986 1.088 0.159 1
Impaired 1.04 1 1.082 0.081 0.683
Intercept (ms) 2010.76 1121.621 3604.743 <0.001 <0.001

MTS (N = 108, ICC = 0.219)

Age 1.005 1.001 1.008 0.007 0.079
Education 0.991 0.975 1.008 0.315 1
Days/90 1.006 0.979 1.034 0.677 1
Trials 0.99 0.94 1.043 0.705 1
Sex 0.991 0.922 1.065 0.805 1
Days/90:Impaired 1.036 0.983 1.093 0.185 1
Impaired 0.924 0.88 0.97 0.123 0.877
Intercept (ms) 1390.644 692.015 2794.58 <0.001 <0.001

GNG (N = 109, ICC = 0.325)

Age 1.007 1.004 1.01 <0.001 <0.001
Education 0.997 0.983 1.012 0.708 1
Days/90 0.968 0.955 0.98 <0.001 <0.001
Trials 0.915 0.894 0.936 <0.001 <0.001
Sex 1.054 0.991 1.121 0.095 0.726
Days/90:Impaired 1.011 0.986 1.037 0.399 1
Impaired 1.025 0.977 1.075 0.332 1
Intercept (ms) 378.142 212.835 671.842 <0.001 <0.001

PRT (N = 100, ICC = 0.325)

Age 1.006 1.003 1.009 <0.001 <0.001
Education 0.988 0.975 1.001 0.08 0.683
Days/90 0.975 0.963 0.986 <0.001 <0.001
Trials 0.977 0.957 0.998 0.028 0.261
Sex 1.052 0.993 1.114 0.083 0.683
Days/90:Impaired 0.98 0.957 1.003 0.087 0.69
Impaired 1.09 1.045 1.138 0.007 0.079
Intercept (ms) 528.27 309.656 901.224 <0.001 <0.001

CS (N = 112, ICC = 0.319)

Age 1.01 1.007 1.013 <0.001 <0.001
Education 0.997 0.984 1.01 0.675 1
Days/90 0.978 0.967 0.989 <0.001 <0.001
Trials 0.882 0.866 0.9 <0.001 <0.001
Sex 1.013 0.956 1.073 0.663 1
Days/90:Impaired 0.96 0.939 0.981 <0.001 <0.001
Impaired 1.148 1.096 1.201 <0.001 <0.001
Intercept (ms) 1041.293 609.399 1779.279 <0.001 <0.001

Note: For each DANA task, we reported results from a linear mixed‐effects model with a random intercept for participant ID. The first column indicates the task name, effective number of participants with two valid sessions (accounting for dropout), and the intraclass correlation coefficient (ICC). To aid interpretation, we reported the exponentiated beta coefficients (exp(β)), 95% confidence intervals, two‐sided p values, and false discovery rate‐adjusted p values (Benjamini–Yekutieli procedure, α = 0.05). While the model includes Days as a continuous variable, the reported estimates correspond to exp(90βD), representing the effect of a 90‐day interval between sessions. Similarly, for Trials, we reported the effect associated with completing the entire task (i.e., full trial set).
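
As a worked illustration of this reporting convention, using the CS Days/90 entry of the table (exp(90βD) = 0.978), the conversion to a percentage change in RT is:

%ΔRT over 90 days = (exp(90βD) − 1) × 100% = (0.978 − 1) × 100% ≈ −2.2%

that is, holding the other predictors fixed, RT on CS at a session taken 90 days after the first is about 2.2% faster.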

While this regression analysis approach allowed us to disentangle the effect of cognitive impairment from normal aging and other confounders, we were also interested in evaluating the potential of DANA as a predictive tool for discerning cognitively impaired from cognitively intact participants. To demonstrate DANA's capabilities for this purpose, we trained two machine learning classifiers: a logistic regression and a random forest classifier, using age, sex, education, trial number, and RT (mean and standard deviation in each session) as predictors in each model (Table 3), and evaluated their generalization performance via internal cross‐validation. To ensure that all measures for a given participant appeared entirely in either the training set or the validation set, the data were split into five folds, label stratified, and ID‐grouped, so that each fold contained the same proportion of cognitively impaired participants, and each participant appeared in exactly one fold. Additionally, because the distribution of impaired and intact participants in our dataset was uneven (Table 1), we used class‐weighted loss during training: Misclassification errors were penalized in inverse proportion to the prevalence of the sample's class. In our population, this implies that misclassifying a sample from the cognitively impaired group penalizes the model 84/32 ≈ 2.6 times more than misclassifying one from the unimpaired group. This approach ensured that the model would produce a reasonable balance between sensitivity and specificity, rather than optimizing accuracy alone. 30 To identify which DANA tasks were most sensitive for detecting cognitive impairment and to contextualize overall model performance, we compared them with baseline naïve models trained on demographic information such as age, sex, and education level only (Table 3).
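
The cross‐validation scheme described above can be sketched with scikit‐learn as follows; this is an illustrative reconstruction under stated assumptions (the per‐session mean and standard deviation of RT plus demographics are assumed to be already assembled into a feature matrix X), not the exact pipeline used in the study. The random forest hyperparameters (32 trees, maximum depth 3) follow the note to Table 3.

from sklearn.model_selection import StratifiedGroupKFold, cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def evaluate_classifiers(X, y, participant_id):
    """Five-fold cross-validation, label-stratified and grouped by participant,
    so every participant falls into exactly one fold. Class-weighted loss
    compensates for the impaired/intact imbalance (~2.6x penalty for
    misclassifying an impaired sample in this cohort)."""
    cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)
    models = {
        # penalty=None gives an unregularized logistic regression (scikit-learn >= 1.2)
        "LR": LogisticRegression(penalty=None, class_weight="balanced", max_iter=1000),
        "RF": RandomForestClassifier(n_estimators=32, max_depth=3,
                                     class_weight="balanced", random_state=0),
    }
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, groups=participant_id, cv=cv,
                                scoring=["accuracy", "precision", "recall"])
        results[name] = {metric: (vals.mean(), vals.std())
                         for metric, vals in scores.items()}
    return results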

TABLE 3.

Classification model results across all DANA tasks.

Metric Model CS GNG PRT SP MTS SRT Demog. only
Accuracy LR 0.69 ± 0.08 0.59 ± 0.09 0.66 ± 0.12 0.58 ± 0.05 0.62 ± 0.10 0.60 ± 0.08 0.61 ± 0.09
RF 0.70 ± 0.07 0.61 ± 0.06 0.71 ± 0.07 0.67 ± 0.10 0.66 ± 0.08 0.64 ± 0.06 0.67 ± 0.09
Precision LR 0.44 ± 0.10 0.31 ± 0.10 0.44 ± 0.18 0.33 ± 0.06 0.38 ± 0.11 0.34 ± 0.09 0.36 ± 0.11
RF 0.44 ± 0.12 0.30 ± 0.07 0.47 ± 0.13 0.35 ± 0.20 0.40 ± 0.10 0.37 ± 0.06 0.40 ± 0.17
Sensitivity LR 0.66 ± 0.25 0.49 ± 0.21 0.60 ± 0.15 0.61 ± 0.26 0.56 ± 0.11 0.50 ± 0.13 0.56 ± 0.19
RF 0.51 ± 0.22 0.35 ± 0.12 0.53 ± 0.17 0.35 ± 0.20 0.37 ± 0.11 0.43 ± 0.07 0.38 ± 0.16
Specificity LR 0.70 ± 0.08 0.62 ± 0.09 0.68 ± 0.17 0.57 ± 0.12 0.64 ± 0.11 0.63 ± 0.07 0.63 ± 0.10
RF 0.77 ± 0.07 0.71 ± 0.08 0.77 ± 0.10 0.78 ± 0.08 0.76 ± 0.10 0.71 ± 0.11 0.77 ± 0.11

Note: Performance metrics across different tasks for our two classification models on the validation fold. LR: unregularized logistic regression. RF: random forest model with 32 trees and maximum depth three. Both models are trained on age, sex, years of education, mean response time per session, and variance of response time per session in the specified task. In the last column we report the performance of a baseline model that uses only age, sex, and years of education. All values are reported as mean and standard deviation (SD) over the five cross‐validation folds. The “precision” metric is also known as positive predictive value. The tasks are abbreviated as follows: Code Substitution Simultaneous (CS), Go‐No‐Go (GNG), Procedural Reaction Time (PRT), Spatial Processing (SP), Match‐to‐Sample (MTS), and Simple Reaction Time (SRT).

3. RESULTS

The observed compliance with the designed testing schedule is presented in Figure 2. The cognitively impaired and intact groups showed similar compliance: comparing the distribution of the Days variable between the two groups via a Kolmogorov–Smirnov test gave non‐significant results (p ≥ 0.7) for all tasks.

The results of the multivariate linear regression analysis with random intercepts are presented in Table 2 and visualized in Figure 3. The Days coefficient, which represents the change in RT associated with the number of days since the first DANA session, did not show a consistent positive or negative effect for MTS (+0.6% after 90 days, 95% confidence interval [CI95]: [−2.1%, 3.4%], p = 0.677, padj = 1), SP (−2.9%, CI95: [−5.2%, +0.6%], p = 0.015, padj = 0.153), or SRT (−0.4%, CI95: [−1.2%, 0.5%], p = 0.398, padj = 1): In each case, the 95% confidence interval includes 0%, and we failed to reject the null hypothesis (no change after 90 days). We observed a small but consistent decrease in RT after 90 days in CS (−2.2%, CI95: [−3.4%, −1.1%], p < 0.001, padj < 0.001), PRT (−2.5%, CI95: [−3.7%, −1.4%], p < 0.001, padj < 0.001), and GNG (−3.2%, CI95: [−4.5%, −2.0%], p < 0.001, padj < 0.001). Additionally, the results showed that cognitive impairment was significantly associated with a change in RT in CS (+14.8%, CI95: [9.6%, 20.1%], p < 0.001, padj < 0.001) and PRT (+9.0%, CI95: [4.5%, 13.8%], p = 0.007, padj = 0.079). This effect is substantially larger than the improvement in RT after 90 days, controlled for Age, Sex, Education, Impairment, and Trial Number, that can be read from the βD coefficient: a 2.2% improvement (CI95: [1.1%, 3.4%], p < 0.001, padj < 0.001) in CS and 2.5% (CI95: [1.4%, 3.7%], p < 0.001, padj < 0.001) in PRT. For the other four tasks, the estimated changes in RT associated with cognitive impairment had 95% confidence intervals that included zero, meaning the effects were statistically indistinguishable from no effect at this sample size. The Days × Impaired term, which measures the difference in learning effects between the cognitively impaired and unimpaired groups, is significantly different from zero only for the CS task: In all other tasks, the two groups exhibited essentially the same learning rate across the two sessions. In CS, the interaction term is negative, meaning that the RT gap between the impaired and unimpaired groups shrinks (βID = −4%, CI95: [−6.1%, −1.9%]) but remains large at the second session (βI = 14.8%, βI + βID = 10.2%).
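
For clarity, the way the CS main effect and the Days × Impaired interaction combine at the second session, using the exponentiated values quoted above, is:

exp(βI + 90βID) = exp(βI) × exp(90βID) ≈ 1.148 × 0.960 ≈ 1.102

that is, after 90 days the impaired group remains roughly 10.2% slower than the intact group on CS, down from 14.8% at the first session.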

FIGURE 3.

Response time change per unit increase in each predictor. Results from the linear model (Table 2), expressed as percentage changes for interpretability. Error bars represent 95% confidence intervals. Although Days is modeled as a continuous variable, panel (A) shows the estimated response time change associated with a 90‐day delay, which is the scheduled inter‐session interval. (E) The effect shown is that associated with being female. (G) The change in response time shown is that associated with completing the full task rather than a single trial. The "Impaired" group is defined in Section 2.1 (i.e., highest between Clinical Dementia Rating score > 0 and National Alzheimer's Coordinating Center Uniform Data Set > 1). Response Time (RT), Code Substitution (CS), Simple Reaction Time (SRT), Spatial Processing (SP), Procedural Reaction Time (PRT), Go/No‐Go (GNG), and Matching‐to‐Sample (MTS).

Overall, these results demonstrate that DANA is sensitive to differences in cognition in at least two tasks while remaining relatively insensitive to the influence of repeated testing, demonstrating its potential utility as a longitudinal monitoring tool of cognitive status.

The necessity of including Trial Number among the predictors (as anticipated in Section 2.2) is quantitatively confirmed by the regression analysis: in SRT, GNG, PRT, and CS, the regression coefficient βT shows that RT decreases as the task progresses when the other predictors are held fixed, with the strongest effects in SRT (−13.6% RT by the end of the task, CI95: [−14.9%, −12.4%], p < 0.001, padj < 0.001) and CS (−12.8%, CI95: [−13.4%, −10%], p < 0.001, padj < 0.001), and a smaller effect in PRT (−2.3%, CI95: [−4.3%, −0.02%], p = 0.028, padj = 0.261). Conversely, the βT coefficient was compatible with no effect in SP (−2.3%, CI95: [−5.8%, +3.4%], p = 0.58, padj = 1) and MTS (−1%, CI95: [−6%, +4.3%], p = 0.705, padj = 1), confirming the negligible within‐session practice effects observed in Figure 1.

RT slowed with age in all tasks except SRT: significantly so in GNG (+0.7% per year of age, CI95: [0.4%, 1.0%], p < 0.001, padj < 0.001), PRT (+0.6%, CI95: [0.3%, 0.9%], p < 0.001, padj < 0.001), and CS (+1.0%, CI95: [0.7%, 1.3%], p < 0.001, padj < 0.001), even after adjusting for false discovery rate via the Benjamini–Yekutieli procedure, and at nominal significance only in SP (+0.4%, CI95: [0.1%, 0.7%], p = 0.01, padj = 0.107) and MTS (+0.5%, CI95: [0.1%, 0.8%], p = 0.007, padj = 0.079). The effect of age on SRT RT was estimated to be negative but was statistically compatible with zero, suggesting the association may be negligible at this sample size (−0.1%, CI95: [−0.4%, +0.2%], p = 0.619, padj = 1). Cognitively impaired participants showed the largest slowing relative to their intact counterparts in CS (+14.8%, CI95: [9.6%, 20.1%], p < 0.001, padj < 0.001), with PRT a close second (+9%, CI95: [4.5%, 13.8%], p = 0.007, padj = 0.079). The effect of cognitive impairment on the other four tasks was compatible with zero, suggesting no meaningful association at this sample size (Table 2). Sex differences reached nominal significance only in SRT, where female participants were 7.3% (CI95: [1.2%, 13.9%], p = 0.019, padj = 0.185) slower than male participants. Education was not significantly associated with RT in any of the tasks, suggesting that DANA is not overly sensitive to education level. However, due to the limited educational diversity within our cohort, with 96% of participants having a college education or above, definitive conclusions cannot be drawn.

The results of our proof‐of‐concept classification models, which measure how well we can predict participants' cognitive status from their demographics (age, sex, years of education) and RT, as well as the naïve baseline models, are summarized in Table 3. The random forest and logistic regression models achieved accuracies as high as 71% and 69%, respectively, on the validation sets. Specifically, using RT on the CS task as input to the classifier resulted in an 8% increase in accuracy for the logistic regression model and a 3% increase for the random forest model, compared to the naïve classifier that used only demographic information. Similarly, RT on PRT increased the classification accuracy of the logistic regression model by 5% and that of the random forest model by 4% compared to the demographic‐only classifiers. The highest average validation sensitivity was achieved by the logistic regression model using CS RT and demographic variables, reaching 66 ± 25%. While this level of sensitivity falls short of what is typically required for clinical application, these preliminary findings are encouraging. They demonstrate the feasibility of adapting the DANA digital battery to a new and more clinically diverse population and support the need for further research, refinement, and validation of this approach for cognitive assessment in broader settings.

4. DISCUSSION

In this study, we present a modeling approach to evaluate the effects of repeated testing on digital cognitive assessment tools. Using the DANA battery as a case study, we assessed whether practice effects, that is, performance improvements resulting from repeated exposure to cognitive tasks, could mask actual cognitive changes. Our findings show that DANA is relatively insensitive to practice effects. Such methodologies have broader implications for evaluating the reliability of other digital cognitive tools. By analyzing these effects, the accuracy of longitudinal assessments can be enhanced, ensuring that cognitive decline is not obscured by performance gains due to repeated testing.

Our findings demonstrate the promise of DANA as a longitudinal dementia monitoring tool in two key respects. First, our multivariate linear regression with random intercepts indicates that as a repeated measure of longitudinal assessment, DANA is sensitive to changes in cognition while remaining robust to the influence of practice effects. We found significant differences in RT between cognitively intact and impaired participants for the CS and PRT tasks, and, after controlling for age, sex, education, and cognitive status, we observed that the 90‐day learning rates were minimal for all tasks. Second, as a proof of concept, our classification models confirmed our initial results by demonstrating that select DANA tasks, specifically CS and PRT, are capable of distinguishing between participants who are cognitively impaired and those who are cognitively intact, and that RT on the DANA battery as a whole generally increases the accuracy of our classification models compared to naïve models with only demographic information. Taken together, these findings suggest that DANA is a valid digital assessment tool for monitoring cognitive status, with the added advantages of home administration and weak practice effects. This makes it accessible to a broad range of older adults, as it can be administered on a device they are already familiar with, such as their own smartphone.

While the multivariate linear regression indicated that the effect of days since the first DANA session was statistically indistinguishable from zero for most tasks, CS, PRT, and GNG showed modest improvements in RT over 90 days (−2.2%, −2.5%, and −3.2%, respectively). For CS, this could potentially be explained by the specific task design, in which participants are presented with a set of symbol–digit pairs that reuses a small number of the same symbols each time, paired with different digits. Therefore, it is plausible that through procedural learning, previous exposure to and subsequent familiarity with these specific symbols and protocol could give participants a slight processing speed advantage when repeating the task. 31 , 32 This is in line with findings from Bergman et al., 33 who found that tasks involving a major speed component were susceptible to practice effects, whereas reaction time, sustained attention, verbal memory, and visuospatial tasks were not. Similarly, the repetitive and procedural design of the PRT task may inherently be more affected by procedural learning than the other tasks, resulting in the observed modest practice effects. The GNG task is designed to have a higher ratio of "Go" trials, where the correct response is reactionary, compared to the "No‐Go" trials, where the correct response is inhibitory. 34 Previous studies posited that the frequent repetition of the "Go" trials could facilitate performance improvements arising from both associative learning and motor skill acquisition effects. 35 Because we only analyzed "Go" instances, these cognitive mechanisms could plausibly underlie our observed modest practice effects for GNG.

Our finding that CS and PRT were associated with cognitive impairment, whereas the other tasks were not significantly so, could potentially be explained by their polyfactorial design, which assesses multiple areas of cognition at once. The CS task requires simultaneous orchestration of various cognitive networks. 36 As such, functional decline in any one of these systems will be captured by the task, making it a valuable tool for detecting subtle impairment across a wide variety of cognitive domains. 37 Similarly, the PRT evaluates multiple cognitive competencies at once, including decision making, speed, and executive functions shown to be impacted by cognitive impairment. 38 , 39 , 40

Our study has a few limitations. Much of the participant population was White, so future studies should aim to replicate these analyses in more diverse populations. Additionally, our sample size was limited, with longitudinal data available from only 116 participants across a 90‐day interval between sessions. In the future, we plan to replicate this approach with a larger sample over a longer follow‐up period. Since this study only includes data from two sessions, further investigation with additional sessions is needed to determine the reliability of these findings over time. Given the low sensitivity observed for select DANA tasks in detecting cognitive impairments, caution is warranted when considering their use in diagnostic settings, and the full battery should be administered. Another limitation is that assessments were administered on the participant's own smartphone. While the bring‐your‐own‐device design has distinct advantages, such as enabling a larger and more diverse participant pool to be enrolled, it also introduces potential latency discrepancies between different devices that can influence results, especially if reaction times differ at a scale of less than 150 ms. Furthermore, this study did not examine the influence of time‐of‐day effects on task performance, which have been shown to affect memory performance in older adults. 41 In addition to examining these effects, future research should also explore whether intra‐individual variability or the absence of practice effects could serve as potential markers for risk of future cognitive decline, even in cognitively normal older adults. 42 , 43 , 44 , 45

In conclusion, our findings underscore the importance of accounting for practice effects when using digital tools for longitudinal cognitive monitoring. DANA demonstrates potential in distinguishing cognitively impaired from intact individuals, while also showing relative insensitivity to performance gains unrelated to actual cognitive improvement that may arise from repeated assessments. Future work should prioritize data collection from a larger, more diverse participant group to determine whether DANA's insensitivity to practice effects persists and whether the model's predictive accuracy improves on this larger sample. Additionally, exploring extended periods of data collection will help assess the long‐term reliability of DANA and similar tools for continuous cognitive monitoring.

CONFLICT OF INTEREST STATEMENT

Vijaya B. Kolachalama is a co‐founder and equity holder of deepPath Inc. and CogniMark, Inc. (formerly CogniScreen). He also serves on the scientific advisory board of Altoida Inc. Rhoda Au is a scientific advisor to Signant Health and Novo Nordisk. The remaining authors declare no competing interests. Author disclosures are available in the supporting information.

Supporting information

Supporting Information

ALZ-21-e70644-s001.pdf (758.6KB, pdf)

ACKNOWLEDGMENTS

This project was supported by grants from the National Institute on Aging's Artificial Intelligence and Technology Collaboratories (P30‐AG073105), the American Heart Association (20SFRN35460031), and the National Institutes of Health (R01‐HL159620, R01‐AG062109, R01‐AG083735, R01‐NS142076, and P30‐AG013846). Data collection methods were approved by the Institutional Review Board of Boston University Medical Campus (IRB No. H‐22543). All participants (or their Legally Authorized Representatives) provided their written informed consent to participate in the study.

Bellitti M, Lauber MV, Searls E, Lin H, Au R, Kolachalama VB. Practice effects on digital cognitive assessment tools: insights from the defense automated neurobehavioral assessment battery. Alzheimer's Dement. 2025;21:e70644. 10.1002/alz.70644

Matteo Bellitti and Meagan V. Lauber contributed equally to this study.

REFERENCES

1. Öhman F, Hassenstab J, Berron D, et al. Current advances in digital cognitive assessment for preclinical Alzheimer's disease. Alzheimers Dement. 2021;13:e12217.
2. Staffaroni AM, Tsoy E, Taylor J, et al. Digital cognitive assessments for dementia. Pract Neurol. 2020;2020:24-45.
3. Nicosia J, Aschenbrenner AJ, Balota DA, et al. Unsupervised high-frequency smartphone-based cognitive assessments are reliable, valid, and feasible in older adults at risk for Alzheimer's disease. J Int Neuropsychol Soc. 2023;29:459-471.
4. Thompson LI, Kunicki ZJ, Emrani S, et al. Remote and in-clinic digital cognitive screening tools outperform the MoCA to distinguish cerebral amyloid status among cognitively healthy older adults. Alz & Dem Diag Ass Dis Mo. 2023;15:e12500.
5. Berron D, Olsson E, Andersson F, et al. Remote and unsupervised digital memory assessments can reliably detect cognitive impairment in Alzheimer's disease. Alzheimers Dement. 2024;20:4775-4791.
6. Hollinger KR, Woods SR, Adams-Clark A, et al. Defense automated neurobehavioral assessment accurately measures cognition in patients undergoing electroconvulsive therapy for major depressive disorder. J ECT. 2018;34:14.
7. Dion C, Kunicki ZJ, Emrani S, et al. Remote and in-clinic digital cognitive screening tools outperform the MoCA to distinguish cerebral amyloid status among cognitively healthy older adults. Alzheimers Dement. 2023;19:e075327.
8. Polk SE, Öhman F, Hassenstab J, et al. A scoping review of remote and unsupervised digital cognitive assessments in preclinical Alzheimer's disease. npj Digit Med. 2025;8(1):266.
9. Thompson LI, De Vito AN, Kunicki ZJ, et al. Psychometric and adherence considerations for high-frequency, smartphone-based cognitive screening protocols in older adults. J Int Neuropsychol Soc. 2024;30(8):785-793.
10. Hassenstab J, Aschenbrenner AJ, Balota DA, et al. Remote cognitive assessment approaches in the Dominantly Inherited Alzheimer Network (DIAN). Alzheimers Dement. 2020;16:e038144.
11. Sliwinski MJ, Mogle JA, Hyun J, et al. Reliability and validity of ambulatory cognitive assessments. Assessment. 2018;25:14-30.
12. Duff K, Dixon A, Embree L. A closer look at practice effects in mild cognitive impairment and Alzheimer's disease. Arch Clin Neuropsychol. 2024;39:1-10.
13. Zheng B, Udeh-Momoh C, Watermeyer T, et al. Practice effect of repeated cognitive tests among older adults: associations with brain amyloid pathology and other influencing factors. Front Aging Neurosci. 2022;14:909614.
14. Goldberg TE, Harvey PD, Wesnes KA, et al. Practice effects due to serial cognitive assessment: implications for preclinical Alzheimer's disease randomized controlled trials. Alz Dem Diag Ass Dis Mo. 2015;1:103-111.
15. Scharfen J, Blum D, Holling H. Response time reduction due to retesting in mental speed tests: a meta-analysis. J Intell. 2018;6:6.
16. Bartels C, Wegrzyn M, Wiedl A, et al. Practice effects in healthy adults: a longitudinal study on frequent repetitive cognitive testing. BMC Neurosci. 2010;11:118.
17. Del Rossi G, Malaguti A, Del Rossi S. Practice effects associated with repeated assessment of a clinical test of reaction time. J Athl Train. 2014;49:356-359.
18. Basner M, Hermosillo E, Nasrini J, et al. Cognition test battery: adjusting for practice and stimulus set effects for varying administration intervals in high performing individuals. J Clin Exp Neuropsychol. 2020;42:516-529.
19. Holm SP, Wolfer AM, Pointeau GHS, et al. Practice effects in performance outcome measures in patients living with neurologic disorders—a systematic review. Heliyon. 2022;8:e10259.
20. Sanderson-Cimino M, Elman JA, Tu XM, et al. Practice effects in mild cognitive impairment increase reversion rates and delay detection of new impairments. Front Aging Neurosci. 2022;14:847315.
21. Sliwinski M, Buschke H. Cross-sectional and longitudinal relationships among age, cognition, and processing speed. Psychol Aging. 1999;14:18-33.
22. Lathan C, Spira JL, Bleiberg J, et al. Defense automated neurobehavioral assessment (DANA)—psychometric properties of a new field-deployable neurocognitive assessment tool. Milit Med. 2013;178:365-371.
23. Ding H, Kim M, Searls E, et al. Digital neuropsychological measures by defense automated neurocognitive assessment: reference values and clinical correlates. Front Neurol. 2024;15:1340710. doi: 10.3389/fneur.2024.1340710
24. De Anda-Duran I, Sunderaraman P, Searls E, et al. Comparing cognitive tests and smartphone-based assessment in 2 US community-based cohorts. J Am Heart Assoc. 2024;13:e032733.
25. De Anda-Duran I, Hwang PH, Popp ZT, et al. Matching science to reality: how to deploy a participant-driven digital brain health platform. Front Dement. 2023;2:1135451. doi: 10.3389/frdem.2023.1135451
26. Linus Health. https://linushealth.com
27. Besser L, Kukull W, Knopman DS, et al. Version 3 of the National Alzheimer's Coordinating Center's Uniform Data Set. Alzheimer Dis Assoc Disord. 2018;32:351-358.
28. Morris JC. Clinical dementia rating: a reliable and valid diagnostic and staging measure for dementia of the Alzheimer type. Int Psychogeriatr. 1997;9:173-176.
29. Christ BU, Combrinck MI, Thomas KGF. Both reaction time and accuracy measures of intraindividual variability predict cognitive performance in Alzheimer's disease. Front Hum Neurosci. 2018;12:124.
30. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowledge Data Eng. 2009;21:1263-1284.
31. Koziol LF, Budding DE. Procedural learning. In: Seel NM, ed. Encyclopedia of the Sciences of Learning. Springer US; 2694-2696.
32. Hong J-Y, Gallanter E, Müller-Oehring EM, et al. Phases of procedural learning and memory: characterisation with perceptual-motor sequence tasks. J Cogn Psychol. 2019;31:543-558.
33. Bergman I, Franke Föyen L, Gustavsson A, Van den Hurk W. Test–retest reliability, practice effects and estimates of change: a study on the Mindmore digital cognitive assessment tool. Scand J Psychol. 2025;66(1):1-14.
34. Helmers KF, Young SN, Pihl RO. Assessment of measures of impulsivity in healthy male volunteers. Person Individual Diff. 1995;19:927-935.
35. Schapkin SA, Falkenstein M, Marks A, Griefahn B. Practice-related effects in a Go-Nogo task. Perceptual Mot Skills. 2007;105(3_suppl):1275-1288.
36. Jaeger J. Digit symbol substitution test: the case for sensitivity over specificity in neuropsychological testing. J Clin Psychopharmacol. 2018;38:513-519.
37. Williamson M, Maruff P, Schembri A, et al. Validation of a digit symbol substitution test for use in supervised and unsupervised assessment in mild Alzheimer's disease. J Clin Exp Neuropsychol. 2022;44:768-779.
38. Chehrehnegar N, Shati M, Esmaeili M, et al. Executive function deficits in mild cognitive impairment: evidence from saccade tasks. Aging Ment Health. 2022;26:1001-1009.
39. Reinvang I, Grambaite R, Espeseth T. Executive dysfunction in MCI: subtype or early symptom. Int J Alzheimers Dis. 2012;2012:1-8.
40. Ávila-Villanueva M, Marcos Dolado A, Gómez-Ramírez J, et al. Brain structural and functional changes in cognitive impairment due to Alzheimer's disease. Front Psychol. 2022;13:886619.
41. Wilks H, Aschenbrenner AJ, Gordon BA, et al. Sharper in the morning: cognitive time of day effects revealed with high-frequency smartphone testing. J Clin Exp Neuropsychol. 2021;43:825-837.
42. Vito AD, Kunicki Z, Britton K, et al. Intraindividual variability in processing speed on Digital Cognitive Assessments differs by amyloidosis status in cognitively normal older adults. J Int Neuropsychol Soc. 2023;29:217-218.
43. Hassenstab J, Ruvolo D, Jasielec M, et al. Absence of practice effects in preclinical Alzheimer's disease. Neuropsychology. 2015;29:940-948.
44. Jutten RJ, Grandoit E, Foldi NS, et al. Lower practice effects as a marker of cognitive performance and dementia risk: a literature review. Alzheimers Dement. 2020;12(1):e12055. doi: 10.1002/dad2.12055
45. Jutten RJ, Rentz DM, Amariglio RE, et al. Fluctuations in reaction time performance as a marker of incipient amyloid-related cognitive decline in clinically unimpaired older adults. Alzheimers Dement. 2022;18:e066578.
