Abstract
Cognition is a dynamic process and is subject to substantial variation across short and long timescales. It is becoming common to assess cognition repeatedly over short intervals to determine the correlates and consequences of such “cognitive variability”. A high-frequency cognitive assessment approach is also an ideal method for measuring how cognition operates in daily life. Nevertheless, several fundamental questions regarding the nature of cognitive variability remain unanswered. We utilize data from the COGITO study, which administered 9 separate cognitive tests to more than 200 participants for 100 days, to answer the following questions: Do different tasks exhibit similarly reliable levels of variability, and does variability cluster into distinct cognitive domains? This rich dataset was analyzed using Bayesian mixed effects location scale models, which simultaneously estimate individual means and variability. All 9 tasks exhibited significant variability across the 100 days of testing. Tasks within the domains of episodic memory or processing speed were moderately correlated with each other, suggesting some degree of domain specificity. Working memory tasks, on the other hand, did not correlate well with each other, suggesting variability in these tasks is dominated by momentary or task-specific influences. These findings not only advance our theoretical understanding of what cognitive variability is but also provide insight into which cognitive tests are most suitable for high-frequency administration and thus may be most amenable to use for studying aging and cognitive processes as they occur in daily life. Appropriate limits on the generalizability of our results are noted.
Keywords: cognitive variability, aging, daily life, high-frequency assessment
Introduction
Researchers are increasingly interested in cognitive and psychological phenomena as dynamic processes (Hamaker & Wichers, 2017; Hultsch et al., 2000; MacDonald, Li, et al., 2009; Ram et al., 2005) that are subject to meaningful, within-person fluctuations across either short (moment-to-moment) or longer (day-to-day; week-to-week) timescales. Such observations have led to an increase in high-frequency, daily measurement studies that aim to assess cognitive ability in daily life (Aschenbrenner & Jackson, 2023; Cerino et al., 2021; Schmiedek, Lovden, et al., 2010). Although there are many reasons for collecting daily measures of cognition, including increased reliability of the estimation of mean performance (Nicosia et al., 2022; Sliwinski et al., 2018), one important application is to identify instances when individuals are performing optimally or, conversely, when they are below their typical performance. These natural fluctuations, i.e., variability, in cognitive function could have major implications for daily life, as cognition permeates virtually every activity an older adult may engage in, from holding a meaningful conversation to spatially navigating their environment. Failures at these tasks may, at times, have significant consequences, e.g., failing to remember an important conversation with a doctor, or having difficulties with driving. As such, assessing daily cognitive variability provides additional information about an individual, and perhaps future outcomes, that cannot be obtained from standard, single-shot, in-lab assessments alone.
Decades of research on cognitive variability have identified multiple neurobiological and psychological mechanisms that might underlie dynamic fluctuations in cognitive performance. These mechanisms include, but are not limited to, momentary changes in the efficiency of attentional control processes (West et al., 2002), a search for optimal test-taking strategies (Allaire & Marsiske, 2005), individual variations in white matter integrity (Jackson et al., 2012), or dopamine transmission (MacDonald, Cervenka, et al., 2009), increased “neural noise” (Li et al., 2001), changes in responses to stress (Sliwinski et al., 2006), fatigue (Fuentes et al., 2001; Riegler et al., 2022), mood (Brose et al., 2014), motivation (Brose et al., 2012), or variations in personality characteristics (Aschenbrenner & Jackson, 2023). Ambient distractions (Madero et al., 2021) and environmental or social context (Cerino et al., 2021) can also influence within-person variation in cognition.
The extent to which an individual experiences such variations in cognitive performance may carry additional information about their functioning, above and beyond mean performance. Studies that have addressed questions such as these have shown that cognitive variability is a significant predictor of important outcomes such as the risk of developing Alzheimer disease (AD). For example, within-person variability across one week of testing is associated with the APOE ε4 allele, a major genetic risk factor for AD (Aschenbrenner, Hassenstab, et al., 2024). Others have demonstrated similar findings in participants diagnosed with mild cognitive impairment (Aschenbrenner & Jackson, 2023; Cerino et al., 2021) or with clinical symptoms of AD (Hultsch et al., 2000). Cognitive variability is also associated with a host of other psychological phenomena including general intelligence (Ram et al., 2005) and attention ability (Stawski et al., 2019). Together, these results clearly indicate the notion of cognitive variability is an important construct to study and can provide critical insight into how the cognitive system changes with age.
Despite the promise of within-person fluctuation in cognition as an important indicator of function, there are a number of unanswered questions that any investigator proposing to conduct a high-frequency cognitive assessment study is immediately faced with. For example, which cognitive domains best lend themselves to daily assessment? Which types of tasks or stimuli are most suited to use within that domain? How many assessments or days of testing are required to reliably observe significant within-person variation? Furthermore, high-frequency assessment studies are typically hampered by the necessity of keeping individual assessments relatively brief. That is, because participants are asked to engage with the cognitive tests multiple times over a limited interval (e.g., several weeks), most studies opt to administer a small number of relatively short tests that encompass a variety of cognitive domains (Cerino et al., 2021; Nicosia et al., 2022). The question of which domains to assess is particularly important, as it is often assumed that variability in certain constructs is more informative or predictive than others. For example, both Aschenbrenner et al. (2024) and Cerino et al. (2021) indicated a key role for processing speed variability in determining risk for AD.
While there has been a large amount of research focused on the structure and reliability of cross-sectional cognitive tests, little psychometric work has examined what measures are sound tools for studying within-person fluctuations in cognition. Currently, it is unclear whether certain cognitive tests that assess the same domain show similar fluctuations within person. Moreover, it is unclear if cognitive fluctuations are broad, such that variability can be thought of as a latent factor across all cognitive tests, or whether daily variability is domain or even task specific. Similarly, does variability in all domains predict higher order functioning, such as working memory capacity or fluid intelligence? Each of these issues could substantially constrain one’s ability to test the mechanisms of variability which will help promote a greater understanding of cognitive function in daily life.
How to measure cognitive variability?
Fluctuations in cognitive ability are often examined within a multi-level model, where a time-varying predictor (e.g., stress) is associated with fluctuations in daily cognition. Such an approach, however, does not provide an estimate of how variable a person is and thus prohibits an examination of questions about the theoretical nature of cognitive variability. A common method to characterize cognitive fluctuations is to derive a measure of intra-individual variability (IIV) across a participant’s time series with a summary statistic such as the simple standard deviation (possibly corrected for mean performance), taking the square root of the mean squared difference between successive observations (RMSSD; von Neumann et al., 1941), calculating the relative variability adjusted for the maximum variability associated with a given level of mean performance (Mestdagh et al., 2018), or computing parameter estimates from theoretically motivated probability distributions (e.g., the ex-Gaussian) or cognitive process models (e.g., the Wiener diffusion model). In addition to these established approaches, a more recent technique is to use mixed effects location scale models (MELSM), which are an extension of standard linear mixed effects models that allow for individual differences in the amount of “residual” variability in the outcome of interest (Hedeker et al., 2008; Williams et al., 2020). The resulting parameter, typically called “sigma”, serves as a direct estimate of variability and has the added benefit of additionally controlling for mean performance in the task by jointly estimating both the location (mean) and scale (variability) parameters.
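To make the summary-statistic metrics concrete, they can be sketched in a few lines of code (Python is used here purely for illustration; the function names are ours, and `relative_sd` assumes a bounded outcome scale, as in Mestdagh et al., 2018):

```python
import numpy as np

def isd(x):
    """Simple intra-individual standard deviation (iSD)."""
    return np.std(x, ddof=1)

def rmssd(x):
    """Root mean square of successive differences (von Neumann et al., 1941):
    square root of the mean squared difference between adjacent observations."""
    return np.sqrt(np.mean(np.diff(x) ** 2))

def relative_sd(x, lo, hi):
    """Relative variability: iSD divided by the maximum SD attainable for a
    bounded scale [lo, hi] given the observed mean (cf. Mestdagh et al., 2018)."""
    m = np.mean(x)
    max_sd = np.sqrt((hi - m) * (m - lo))  # max SD of a bounded variable with mean m
    return np.std(x, ddof=1) / max_sd

# Hypothetical example: one participant's daily accuracy on a 0-1 scale
x = np.array([0.70, 0.80, 0.75, 0.60, 0.85])
```

Note that the iSD and RMSSD differ in their sensitivity to the temporal ordering of observations: shuffling a time series leaves the iSD unchanged but can alter the RMSSD substantially.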
An implicit, and often untested, assumption is that different measures of variability are equally reflective of the same underlying process. Stawski et al. (2019) directly compared five different metrics of IIV in two studies that used a measurement burst design. Importantly, the results of Stawski et al. (2019) indicate that different metrics of variability are highly correlated and roughly equally predictive of cognitive outcomes (in their case, attention switching ability). The sole exception was the coefficient of variation (i.e., the standard deviation divided by the mean), which underperformed relative to other variability metrics. Given that any single study rarely includes multiple quantifications of IIV, the results of Stawski et al. are reassuring in suggesting that different IIV measures are indeed reflective of the same mechanism.
However, other studies have suggested the story is not so simple, particularly when considering between-person individual differences. For example, Rutter et al. (2020) examined several reaction time variability metrics as a function of age and found that each outcome produced slight differences in multiple analyses regarding age effects. In the affective dynamics literature, several studies have suggested that different results regarding affective stability in major depressive disorder can be obtained depending on the IIV metric under consideration (Bos et al., 2019; Jahng et al., 2008; Koval et al., 2013). Thus, the selection of an appropriate IIV metric may be critical to the conclusions of any given study, and therefore the selection of a specific quantification of IIV needs to be explicitly justified. Relatedly, these different metrics (and constructs) may provide different reliability estimates, making the study of these constructs more or less difficult.
Is cognitive variability a unitary construct?
When considering mean performance, it is standard procedure to group tasks into latent cognitive domains such as “memory” or “executive functioning”. It is thus natural to assume that variability on specific tasks will also group into those same domains, i.e., variability on a test of executive functioning is due to the same source as variability in the construct of executive functioning. However, due to the necessary constraints of high-frequency test designs, specifically that individual tests must be kept relatively brief, it is typical to employ only one or, at best, a few measures that often fall within a single domain, e.g., working memory (Stawski et al., 2019), simple or choice reaction time (Rutter et al., 2020; Schmiedek et al., 2007) or attentional control (Tse et al., 2010). Alternatively, one can administer single tests from a variety of domains (Allaire & Marsiske, 2005; Cerino et al., 2021; Nicosia et al., 2022). Regardless, it is important to establish whether variability in a specific task translates to variability on a latent construct (Feng & Hancock, 2024; Nestler, 2020). If variability in cognition is due to a common source rather than error, then variability in different tasks that tap the same construct should show high correlations with one another.
Just as importantly, it is unknown whether variability in one cognitive domain reflects variability in another via a shared latent factor or whether variability should be thought of as separate processes, unique to a specific domain. That is, are individuals equally variable on all tasks and domains, or is there task- and domain-specific variation? Analyses of trial-level response times suggest that variability can be separated into distinct factors (e.g., attentional control vs. lexical decision; Unsworth, 2015; or working memory and mathematical reasoning; Judd et al., 2024); however, intercorrelations among tasks within a domain are relatively modest (Allaire & Marsiske, 2005; Judd et al., 2024). Extending these findings to daily variability and to a larger assortment of tests is important because, as mentioned, processing speed variability has been identified as a key predictor of Alzheimer disease risk (Aschenbrenner, Hassenstab, et al., 2024; Cerino et al., 2021); however, these studies obtained only a single measure of processing speed and thus were unable to assess whether variability in cognition can be thought of as a single construct. Identifying the factor structure of within-person variability is important from a theoretical standpoint, as a multi-factor structure would indicate there are multiple sources responsible for fluctuations in cognition.
Characteristics of highly variable individuals
Theorizing about the relationship between basic cognitive processes (e.g., processing speed) and higher order constructs such as working memory capacity or reasoning ability has typically been cast in terms of the worst performance rule (Coyle, 2003). Specifically, the correlation between response time and higher order ability (e.g., intelligence) is stronger for the slowest response times than for the fastest RTs. For example, Schmiedek et al. (2007) showed that RT variability (indexed by the tau parameter of the ex-Gaussian; see also Tse et al., 2010) in choice RT tasks was strongly correlated with working memory capacity and reasoning ability. It is possible that this relationship can be accommodated by lapses in controlled attention. That is, during a task, if a participant gets momentarily distracted, they will produce a few relatively slow responses. Individuals who are high in working memory capacity or reasoning are simply less susceptible to these attentional lapses. However, it is unclear whether these findings will apply to variability in daily cognition, as lapses would need to persist over a greater time period.
Age presents another important characteristic that dictates the amount of observed cognitive variability. Interestingly, at the daily level, older adults exhibit reduced variability relative to younger adults across a number of tasks of processing speed, working memory, episodic memory (Schmiedek et al., 2013), or attentional control (Aschenbrenner & Jackson, 2023). Explanations for this pattern range from lower susceptibility to mind wandering to changes in motivation or conscientiousness. Regardless of the precise mechanism, these age differences could also translate to differences in reliability, structure, or correlates of cognitive variability. Hence, the tasks chosen to evaluate daily cognitive variability as a function of predictors of interest may very well need to be different depending on the age range of the population of interest.
The present study
This study was developed to answer multiple critical questions regarding the theoretical underpinnings of daily cognitive variability, including the following: Is cognitive variability reliable in the classic psychometric sense? Is an individual who is highly variable one week also variable the next? Do different tasks and different domains possess similar reliability (hereafter referred to as “consistency” to avoid overlap with the definition of reliability as measuring signal-to-noise ratios)? Does variability in different tasks cluster into distinct domains, as we would expect based on patterns of mean performance? By answering these questions, we can provide guidance to researchers regarding the types of tasks that are most suitable for high-frequency assessment. To this end, we will compare variability on 9 separate tasks that cluster into one of three cognitive domains. If variability is task specific, extreme care will need to be taken when selecting measures for a study of cognition in daily life. However, if tasks within a domain are relatively interchangeable, more flexibility is available regarding which tasks are used in a given study. This is important as researchers are typically quite limited in the type and number of tasks that can be administered in a high-frequency assessment paradigm. In addition to the implications for study design (e.g., which tasks and domains to select), the aforementioned questions will also provide insight into the theoretical nature of variability, i.e., what does it mean to have variability in cognitive scores?
Method
Transparency and Openness:
The data used here are part of a previously collected research project (see below). Access to the data is governed by a data request process (Brose et al., 2019) and hence we are unable to post the data to an established repository. We have posted our analytical code (using R) to the Open Science Framework (https://osf.io/s5w4y/?view_only=54819edc8663445ab6e9164922923e5a) and have also included all analysis scripts as supplementary material. Sample size, experimental manipulations, and the measures collected were determined by the investigators of the original study. The current analyses were not preregistered. The Institutional Review Board at Washington University in St. Louis determined that this study did not constitute human subjects research as the data were completely deidentified and the authors have no way of linking the data back to identifiable information.
COGITO Study:
The COGITO study was designed as a large-scale, repeated cognitive testing study to evaluate within-person structures of cognitive ability in a high-powered fashion (Schmiedek et al., 2020), although a number of additional questions have also been addressed, including near and far transfer of cognitive training (Schmiedek, Lovden, et al., 2010) and issues surrounding the variability and psychometric structure of the tasks (Brose et al., 2012, 2014; Ghisletta et al., 2018; Schmiedek et al., 2009, 2013, 2014). In COGITO, 101 younger adults and 103 older adults (with equal distribution of males and females) completed 100 daily sessions of 9 cognitive tests. A number of additional measures were given at pre- and post-test, as well as some self-reported questionnaires; however, only the daily cognitive data are examined in the present report.
Daily Tasks:
The daily tasks can be thought of as a 3×3 design in which 3 cognitive domains were assessed (working memory, episodic memory, processing speed) with 3 different types of stimuli (verbal, numerical, figural / spatial). The design of each test is discussed briefly below, and details can be found in associated COGITO publications (e.g., Schmiedek et al., 2010). This set-up allows us to determine if daily variability is similar across different domains and across types of stimuli within a single domain.
Processing Speed-Numerical (PS-N):
Participants were shown two strings of 5 digits and had to determine as quickly as possible whether the stimuli were the same or different. Two blocks of 40 trials each were administered and the primary outcome used for the variability metrics (described below) was the mean response time to correct trials.
Processing Speed- Verbal (PS-V):
This task was identical to the numerical task except participants compared two strings of 5 consonants.
Processing Speed- Figural / Spatial (PS-F):
This task was identical to the numerical task except participants compared two “fribbles”, which are three-dimensional colored objects consisting of multiple connected parts.
Episodic Memory – Numerical (EM-N):
Participants studied a list of 12 number-noun word pairs. The nouns were low frequency, and the numbers were 2 digits. After the entire list was presented, the nouns were shown again in a random order and the participant had to enter the associated numbers. Two lists were shown each day. The primary outcome was the proportion of correct responses averaged across both lists.
Episodic Memory- Verbal (EM-V):
Lists of 36 nouns were presented at an individually determined presentation rate. Words were balanced on many important characteristics, such as frequency, and ranged from 4 to 9 letters. After the entire list was shown, participants recalled the items by typing the first three letters of each word. Two lists were shown each day. The primary outcome was the proportion of correct responses weighted by the order produced, averaged across both lists.
Episodic Memory – Figural / Spatial (EM-F):
Participants were shown 12 colored pictures of real-world objects placed in a 6×6 grid. Once all items were shown, participants “placed” the objects back in the correct squares using the mouse. Two blocks were included each day, and the primary outcome was the proportion of correct responses averaged across both blocks.
Working Memory – Numerical (WM-N):
Four single digits were presented horizontally in four squares on the computer screen. The digits then disappeared and a series of 8 “updating” operations (e.g., + 3, −4) were displayed in a row below the original digits. Participants had to mentally perform each operation and update the associated digit in memory. After 8 operations were performed, the four final digits had to be re-entered on the screen. Eight blocks of trials were administered each day. The primary outcome was the proportion of correct responses across all blocks.
Working Memory – Verbal (WM-V):
A series of 10 letters were sequentially presented on the screen, each with a digit below the letter. Participants were to determine whether the digit correctly indicated the letter’s position in the alphabetized sequence of letters presented thus far. For example, K-1 indicates K is the first letter in the sequence. Then B-2 would be false, as the alphabetized sequence is B-K and B is the first letter, not the second. Eight blocks were administered each day, and the primary outcome was the proportion of correct responses across all blocks.
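The scoring rule for a single probe in this task can be expressed as a short function (an illustrative sketch only, not the original task code; the function name and interface are hypothetical):

```python
def probe_is_true(letters_so_far, current_letter, claimed_position):
    """Return True if current_letter occupies claimed_position (1-based)
    in the alphabetical ordering of all letters presented so far,
    including the current one. Assumes letters are unique within a block."""
    seq = sorted(letters_so_far + [current_letter])
    return seq.index(current_letter) + 1 == claimed_position
```

For instance, with no prior letters, the probe K-1 is true (K is the only letter, hence first alphabetically), whereas a subsequent B-2 is false because the alphabetized sequence is now B-K and B occupies the first position.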
Working Memory – Figural / Spatial (WM-F):
A sequence of dots was presented in a 4 by 4 grid. For each dot that was shown, participants had to indicate whether that dot was in the same position as the dot shown three trials previously. A total of 39 dots were presented in each block, and four blocks were given daily. The primary outcome was the proportion of correct responses across all blocks.
Additional Measures:
In addition to the daily tasks, we examined performance on the Berlin Intelligence Structure (BIS) test (Jäger et al., 1997) as a measure of fluid intelligence. Nine tasks from the BIS were selected (e.g., solving number series, verbal analogies, finding the odd one out) and scores on each test were standardized within age group and averaged to form the fluid intelligence composite. Finally, we generated a composite measure of working memory capacity formed from the equally weighted average of three complex span tasks: rotation span, counting span, and reading span (Schmiedek, Lovden, et al., 2010).
Statistical Analysis:
All data processing and analysis were conducted in R version 4.1.2, and MELSM models were fit using brms (Bürkner, 2018). MELSMs are extensions of standard multilevel models that relax the assumption of homogeneity of variance among individuals, allowing for individual differences in the magnitude of cognitive variability. MELSMs model differences in location (e.g., mu, cognitive level) and scale (e.g., sigma, variability around that level) simultaneously. For each of the location and scale models, measurements across all observations (i, Level 1) were nested within individuals (j, Level 2), and covariates were included in the scale model when appropriate for a specific research question (e.g., to assess associations with fluid intelligence). Given that standard deviations cannot be below zero, MELSMs fit the scale coefficients on the log scale and then exponentiate them to guarantee positive values.
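As an illustration of this structure, the following sketch simulates data from a simple random-intercept MELSM: each person has their own mean (location) and their own residual SD (scale), with the scale modeled on the log scale and exponentiated to guarantee positivity. All parameter values are hypothetical, chosen only to mimic an accuracy-type outcome; this is not the fitted model itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_days = 200, 100

# Location model: person-specific means around a grand mean
b0 = 0.75                           # grand mean (e.g., proportion correct)
u0 = rng.normal(0, 0.08, n_subj)    # person deviations in level

# Scale model: person-specific residual SDs, modeled on the log scale
g0 = np.log(0.10)                   # grand mean of log(sigma)
v0 = rng.normal(0, 0.30, n_subj)    # person deviations in log-variability
sigma_j = np.exp(g0 + v0)           # exponentiate -> SDs guaranteed positive

# Daily scores: each person draws from their own mean with their own SD
y = b0 + u0[:, None] + rng.normal(0, 1, (n_subj, n_days)) * sigma_j[:, None]
```

Fitting the actual model in brms amounts to specifying a distributional formula with both a mu and a sigma part, each with its own random effects; the simulation above is the data-generating process such a model assumes.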
We organize the results around three empirical questions. First, what is the test-retest consistency of the sigma parameter from the MELSM, and does it vary as a function of task or domain? To address this question, we split the COGITO dataset into four quarters (25 cognitive sessions in each quarter) and fit a multivariate MELSM with each quarter of the task included as a dependent variable. This allows us to examine the correlation between the sigma effects for each portion of the task. We focus on the correlations between the first and second quarter, the first and final quarter, and the third and final quarter. Our rationale is that comparing the first and second quarter captures the test-retest consistency across what many would consider a reasonable number of assessments (i.e., ~25). The comparison between the first and final quarter helps capture how much variability might change with practice or fatigue. Comparing the third and fourth quarters provides an assessment of test-retest consistency after learning and fatigue (presumably) have occurred. This was repeated for each task in the study. Our second question was whether variability in each task correlated with either intelligence or working memory capacity, as predicted by the extant literature. We fit univariate MELSMs, one for each cognitive task, and included a fixed effect for either the fluid intelligence composite score or the working memory span composite score. Our final question was whether variability in tasks within a domain would be correlated with one another. We fit three sets of multivariate MELSM models to address this question. The first model included all 9 cognitive tasks as dependent variables but set all the random effect correlations to zero (the “uncorrelated” model). The second model relaxed the constraints and freely estimated the correlations among the three tasks within each domain but restricted the cross-domain correlations to be zero (the “domains” model).
The final model allowed all 9 tasks to freely correlate with each other (the “fully correlated” model). Model fit was compared using leave-one-out (loo) cross-validation via the expected log predictive density (elpd), which is scaled such that a larger value indicates better fit. Differences in elpd between models can be compared to determine whether one model fits the data better than another. We consider an elpd difference to be “significant” if the interval formed from the estimate ± 1.96 × standard error excludes zero. Participants’ data were centered around their own means to remove differences in mean performance and focus solely on daily deviations. Given the aforementioned age differences in variability, all analyses were conducted separately within age group.
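The elpd decision rule just described can be stated compactly (a minimal sketch of the rule, not a function from the loo package; the function name is ours):

```python
def elpd_significant(elpd_diff, se_diff, z=1.96):
    """Flag an elpd difference as 'significant' when the interval
    elpd_diff +/- z * se_diff excludes zero."""
    lo, hi = elpd_diff - z * se_diff, elpd_diff + z * se_diff
    return lo > 0 or hi < 0
```

For example, an elpd difference of −10 with a standard error of 2 yields an interval of (−13.92, −6.08), which excludes zero and would be flagged as a significant difference in fit; a difference of −3 with the same standard error would not.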
Results
The sample consisted of 101 younger adults and 103 older adults, with equal representation of males and females. Other characteristics of the sample are provided in Table 1, and further details can be found in the original COGITO publications (Schmiedek, Lovden, et al., 2010).
Table 1:
Available demographics for the sample.
| | Younger | Older |
|---|---|---|
| N | 101 | 103 |
| Age | 25.1 (2.7) | 70.8 (4.2) |
| % Female | 51% | 50% |
| BIS | 0.00 (0.65) | 0.00 (0.60) |
| Working Memory Composite | 0.85 (0.10) | 0.70 (0.11) |
Note: The working memory composite was the average proportion correct on 3 complex span tasks. BIS = Berlin Intelligence Structure Test, a z-score composite of 9 fluid intelligence tests. Z-scores were formed separately for each group hence both means are near zero. Raw BIS scores were not available.
Variations in cognitive variability across tasks
Our first step was to describe the amount of between- and within-person variation observed in these tasks. Figure 1 plots the raw variability estimates for three of the nine cognitive tasks (the remainder are presented in the supplementary materials) for the entire cohort. These descriptive plots show that there are clearly substantial individual differences in the overall magnitude of variability across different cognitive tasks. Tables 2 and 3 provide additional descriptive statistics regarding variability in each task for the younger and older adults, respectively, including the intraclass correlations (ICC) and sigma values, which were back-transformed to be interpretable on the original task metric. It is important to note that the WM and EM tasks are on a comparable scale (proportion accuracy), but the PS tasks are in seconds and hence are not directly comparable with the other domains.
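As a reminder, the ICC reported in Tables 2 and 3 is the share of total variance residing at the between-person level. A simple moment-based approximation from a persons × occasions array can be sketched as follows (an illustration of the definition, not the model-based estimate reported in the tables):

```python
import numpy as np

def icc_from_variances(var_between, var_within):
    """ICC: proportion of total variance at the between-person level."""
    return var_between / (var_between + var_within)

def icc(scores):
    """Rough moment-based ICC from a persons x occasions array:
    variance of person means relative to pooled within-person variance.
    Ignores the upward bias in the between-person term from sampling error."""
    person_means = scores.mean(axis=1)
    var_b = np.var(person_means, ddof=1)
    var_w = np.mean(np.var(scores, axis=1, ddof=1))
    return var_b / (var_b + var_w)
```

For example, an ICC of 0.66 (as for PS-V in the younger group) implies roughly two-thirds of the total variance reflects stable between-person differences, with the remaining third available as within-person fluctuation.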
Figure 1:

Individual variability estimates for the verbal task from each domain. Participants are ordered by magnitude on the y-axis. Black dot indicates individual estimate while grey lines represent individual 95% CI.
Table 2:
Task Intraclass correlations (ICC), sigma intercepts, split-quarter test-retest consistency, and relationships with intelligence and working memory span (95% credible intervals) in the younger adult group for each task.
| Task | ICC | Sigma | Sigma / Mean | Consistency Q1vQ2 | Consistency Q1vQ4 | Consistency Q3vQ4 | BIS | Working Memory |
|---|---|---|---|---|---|---|---|---|
| PS-V | 0.66 | 0.17 (0.16:0.19) | 0.13 | 0.63 (0.47:0.76) | 0.56 (0.39:0.71) | 0.77 (0.66:0.86) | −0.28 (−0.41:−0.16) | −0.90 (−1.77:−0.06) |
| PS-N | 0.64 | 0.13 (0.12:0.15) | 0.11 | 0.70 (0.56:0.80) | 0.62 (0.47:0.74) | 0.89 (0.83:0.94) | −0.30 (−0.44:−0.15) | −1.20 (−2.19:−0.21) |
| PS-F | 0.64 | 0.22 (0.20:0.23) | 0.16 | 0.36 (0.13:0.55) | 0.34 (0.11:0.53) | 0.80 (0.70:0.89) | −0.14 (−0.24:0.04) | −0.14 (−0.80:0.53) |
| WM-V | 0.65 | 0.08 (0.07:0.08) | 0.13 | 0.07 (−0.28:0.43) | 0.05 (−0.27:0.36) | 0.78 (0.55:0.95) | 0.04 (−0.03:0.11) | 0.32 (−0.11:0.76) |
| WM-N | 0.55 | 0.10 (0.09:0.11) | 0.15 | 0.42 (0.14:0.66) | 0.27 (−0.01:0.52) | 0.81 (0.63:0.94) | −0.04 (−0.12:0.05) | 0.22 (−0.31:0.75) |
| WM-F | 0.62 | 0.06 (0.05:0.06) | 0.06 | 0.50 (0.32:0.66) | 0.30 (0.09:0.49) | 0.83 (0.74:0.90) | −0.09 (−0.24:0.07) | −1.10 (−2.09:−0.13) |
| EM-V | 0.73 | 0.12 (0.10:0.13) | 0.25 | 0.69 (0.53:0.81) | 0.35 (0.13:0.54) | 0.69 (0.55:0.81) | 0.07 (−0.05:0.19) | 0.57 (−0.17:1.32) |
| EM-N | 0.67 | 0.14 (0.13:0.14) | 0.28 | 0.72 (0.52:0.87) | 0.41 (0.19:0.61) | 0.82 (0.68:0.92) | 0.04 (−0.05:0.12) | 0.42 (−0.11:0.96) |
| EM-F | 0.59 | 0.14 (0.14:0.15) | 0.29 | 0.60 (0.35:0.81) | 0.32 (0.06:0.55) | 0.73 (0.56:0.86) | 0.07 (−0.00:0.15) | 0.41 (−0.09:0.89) |
Note: The “sigma / mean” column represents the average sigma value divided by the group mean to express the magnitude of variability as a proportion of the grand mean.
Table 3:
Task Intraclass correlations (ICC), sigma intercepts, split-quarter test-retest consistency, and relationships with intelligence and working memory span (95% credible intervals) in the older adult group for each task.
| Task | ICC | Sigma | Sigma / Mean | Consistency Q1vQ2 | Consistency Q1vQ4 | Consistency Q3vQ4 | BIS | Working Memory |
|---|---|---|---|---|---|---|---|---|
| PS-V | 0.83 | 0.13 (0.12:0.14) | 0.06 | 0.36 (0.10:0.59) | 0.39 (0.07:0.67) | 0.78 (0.52:0.96) | −0.05 (−0.13:0.03) | −0.18 (−0.61:0.23) |
| PS-N | 0.87 | 0.11 (0.10:0.12) | 0.06 | 0.73 (0.54:0.88) | 0.49 (0.26:0.69) | 0.85 (0.70:0.96) | −0.09 (−0.17:0.00) | 0.01 (−0.47:0.49) |
| PS-F | 0.72 | 0.20 (0.19:0.22) | 0.09 | 0.36 (0.09:0.59) | 0.46 (0.21:0.68) | 0.76 (0.56:0.90) | −0.00 (−0.11:0.11) | −0.26 (−0.80:0.30) |
| WM-V | 0.60 | 0.06 (0.06:0.07) | 0.12 | 0.55 (0.23:0.80) | 0.10 (−0.31:0.51) | 0.81 (0.49:0.98) | 0.12 (0.05:0.18) | 0.19 (−0.17:0.55) |
| WM-N | 0.52 | 0.10 (0.09:0.11) | 0.16 | 0.53 (0.30:0.72) | 0.15 (−0.12:0.40) | 0.78 (0.60:0.92) | −0.07 (−0.19:0.06) | −0.37 (−1.02:0.29) |
| WM-F | 0.71 | 0.06 (0.06:0.06) | 0.08 | −0.03 (−0.26:0.20) | 0.01 (−0.22:0.24) | 0.63 (0.45:0.77) | −0.08 (−0.19:0.03) | −0.33 (−0.93:0.30) |
| EM-V | 0.78 | 0.05 (0.05:0.06) | 0.24 | 0.74 (0.59:0.85) | 0.58 (0.40:0.74) | 0.82 (0.70:0.92) | 0.11 (−0.03:0.24) | 0.66 (−0.02:1.35) |
| EM-N | 0.56 | 0.09 (0.09:0.10) | 0.35 | 0.48 (0.24:0.68) | 0.34 (0.09:0.57) | 0.60 (0.38:0.79) | 0.06 (−0.03:0.16) | −0.21 (−0.73:0.30) |
| EM-F | 0.59 | 0.09 (0.08:0.09) | 0.30 | 0.85 (0.70:0.96) | 0.66 (0.44:0.83) | 0.88 (0.73:0.98) | 0.10 (0.01:0.19) | 0.35 (−0.13:0.82) |
Younger adults:
There are several important results to note. First, the ICCs indicate that the majority of the variance in each task resides at the between-person level, but a substantial amount of within-person variability is also present. Variability (i.e., the sigma parameter) across the 100 days of testing was significant for all nine tasks, as shown in the second column of the table. Second, the processing speed tasks and the working memory tasks exhibited roughly equal amounts of variability, approximately 10–15% of the mean (column 3 of Table 2). Variability on the episodic memory tasks was noticeably larger, with average fluctuations being 20–30% the size of the overall mean. Within a given domain, however, each task (i.e., type of stimuli) showed similarly sized variability, with the sole exception being spatial working memory, for which the magnitude of variability was approximately half that of the other working memory tests.
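The two descriptive quantities discussed here, the ICC and the "Sigma / Mean" ratio, can be illustrated with a small simulation. This is a sketch only: the paper's actual estimates come from Bayesian mixed effects location scale models, and every parameter value below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a COGITO-like design: 200 people x 100 daily sessions.
# All parameter values are invented for illustration.
n_persons, n_days = 200, 100
person_means = rng.normal(loc=1.0, scale=0.3, size=n_persons)   # between-person differences
scores = person_means[:, None] + rng.normal(0.0, 0.15, size=(n_persons, n_days))

# Variance decomposition: within-person (day-to-day) vs. between-person.
within_var = scores.var(axis=1, ddof=1).mean()
between_var = scores.mean(axis=1).var(ddof=1) - within_var / n_days  # correct for sampling noise

# ICC: proportion of total variance at the between-person level.
icc = between_var / (between_var + within_var)

# "Sigma / Mean": average day-to-day SD as a proportion of the grand mean,
# analogous to the third column of Tables 2 and 3.
sigma_over_mean = np.sqrt(within_var) / scores.mean()
```

With these simulated settings, most variance sits at the between-person level (ICC near 0.8), mirroring the pattern reported in the tables.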
Older adults:
Results for the older adults were similar in nature to those of the younger adults (see Table 3). Specifically, the ICCs indicate that the majority of variability resides at the between-person level. Overall variability (Column 3) showed a clear ordering of magnitude across domains. That is, variability was smallest on tests of processing speed (5–10% of the mean), followed by working memory (10–15% of the mean), then episodic memory (25–35% of the mean). While variability was similar in magnitude across stimuli within a domain, there were trends for variability to be highest on the numerical episodic memory test and lowest on the verbal episodic memory test.
Split-Quarter Test-Retest Consistency
The results from the split-quarter test-retest consistency analysis are presented in the fourth through the sixth column of Tables 2 and 3. A few points are particularly salient.
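As a concrete illustration of the split-quarter idea, one can estimate each person's day-to-day variability separately within two quarters of the time series and correlate the two sets of estimates. The sketch below uses raw per-person SDs on simulated data; the paper instead derives person-specific sigma estimates from the Bayesian models, so this is an analogy, not the actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated scores: each person has their own "true" day-to-day variability.
# All values are invented for illustration.
n_persons, n_days = 200, 100
true_sigma = rng.uniform(0.05, 0.25, size=n_persons)
scores = rng.normal(0.0, true_sigma[:, None], size=(n_persons, n_days))

# Quarters of the time series (25 sessions each).
q1 = scores[:, 0:25]
q2 = scores[:, 25:50]

# Per-person variability estimated within each quarter.
sd_q1 = q1.std(axis=1, ddof=1)
sd_q2 = q2.std(axis=1, ddof=1)

# Split-quarter test-retest consistency of variability (Q1 vs Q2).
consistency = np.corrcoef(sd_q1, sd_q2)[0, 1]
```

When person-to-person differences in true variability are large relative to the sampling noise of a 25-session SD, this correlation is high; attenuated values in real data can reflect practice effects or shifts in what drives variability over the time series.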
Younger adults:
First, when considering the first two quarters, test-retest consistency ranged from relatively poor (for the working memory tasks) to relatively modest (for the processing speed and episodic memory tasks). There were some differences across stimulus types within the domains, however. For example, spatial processing speed had surprisingly poor test-retest consistency (r = 0.36), as did verbal working memory (r = 0.07), whereas all of the episodic memory tasks exhibited similar test-retest consistency. When considering the first and final quarters, test-retest consistency for processing speed remained relatively unchanged, whereas test-retest consistency for working memory and episodic memory both dropped considerably. Finally, when considering the third and fourth quarters, test-retest consistency improved dramatically, to a level that is quite typical of test-retest consistency for cognitive measures.
Older Adults:
Considering the first two quarters, test-retest consistency estimates varied dramatically across tasks and stimulus types. Specifically, test-retest consistency for two of the processing speed tests (verbal and spatial) was rather poor, whereas for the numerical task it was quite good (r = 0.73). Working memory test-retest consistency was modest at best, with noticeably poor performance for the spatial working memory task. Finally, episodic memory test-retest consistency was quite good for the verbal and spatial tasks but modest at best for numerical memory. For the first and fourth quarters, test-retest consistency was poor for all domains, but it was uniformly good when considering only the third and fourth quarters. Again, stimulus-specific deviations were noted. For example, numerical episodic memory was poor relative to the other two tasks, and spatial working memory underperformed relative to the other two working memory tasks.
Correlations with fluid intelligence and working memory
The relationships between cognitive variability and BIS performance, and between variability and working memory span, are shown in the seventh and eighth columns of Tables 2 and 3.
Younger adults:
Variability in two of the processing speed tasks (verbal and numerical) was significantly related to fluid intelligence, in line with prior studies (Schmiedek et al., 2007). The same tasks were also associated with working memory capacity, which is perhaps not surprising given the well-known relationship between working memory and fluid intelligence (Conway & Kovacs, 2013). The only other significant relationship was between variability on the spatial working memory task and total working memory capacity.
Older adults:
The pattern of correlations in the older adults was somewhat different from that in the younger adults in that there were no consistent associations between cognitive variability and either fluid intelligence or working memory span. Although variability in verbal working memory and in spatial episodic memory was significantly associated with the BIS scale, these effects were small and uninterpretable in light of the other, non-significant comparisons.
Combined Analyses:
To ensure that any non-significant findings in these correlational analyses were not due to limited sample size, we combined the younger and older adults and re-ran the models comparing variability to fluid intelligence and working memory span. Details are provided in the supplemental materials, but the combined analyses tell a similar story to the within-group analyses. Specifically, variability across all three processing speed tests was negatively associated with fluid intelligence, and all three episodic memory tests were positively correlated with working memory capacity.
Correlations within a domain
Table 4 presents the intercorrelations among the sigma parameters of all nine of the cognitive tasks.
Table 4:
Sigma correlations across tasks within and across domains. The correlation coefficient is presented in the bottom triangle and the corresponding 95% credible interval in the top. Processing speed tasks are highlighted in yellow, working memory tasks in orange, episodic memory tasks in green.
| Younger Adults | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| | PS-V | PS-N | PS-F | WM-V | WM-N | WM-F | EM-V | EM-N | EM-F |
| PS-V | 1 | 0.88:0.96 | 0.44:0.72 | −0.10:0.31 | −0.20:0.20 | 0.06:0.44 | −0.25:0.14 | −0.07:0.33 | −0.42:−0.02 |
| PS-N | 0.92* | 1 | 0.28:0.61 | −0.12:0.29 | −0.19:0.22 | 0.12:0.49 | −0.22:0.18 | −0.14:0.26 | −0.41:−0.01 |
| PS-F | 0.59* | 0.46* | 1 | −0.00:0.41 | −0.21:0.21 | −0.07:0.33 | −0.32:0.08 | −0.19:0.22 | −0.28:0.14 |
| WM-V | 0.11 | 0.08 | 0.21 | 1 | −0.18:0.26 | −0.01:0.40 | 0.10:0.49 | 0.02:0.44 | −0.02:0.40 |
| WM-N | −0.00 | 0.01 | 0.00 | 0.04 | 1 | −0.06:0.35 | −0.30:0.12 | −0.24:0.19 | −0.10:0.33 |
| WM-F | 0.26* | 0.32* | 0.13 | 0.20 | 0.15 | 1 | −0.22:0.19 | −0.01:0.39 | −0.20:0.22 |
| EM-V | −0.06 | −0.02 | −0.12 | 0.30* | −0.09 | −0.02 | 1 | 0.08:0.32 | 0.10:0.49 |
| EM-N | 0.13 | 0.07 | 0.02 | 0.24* | −0.02 | 0.19 | 0.50* | 1 | 0.02:0.42 |
| EM-F | −0.23* | −0.21* | −0.07 | 0.19 | 0.11 | 0.01 | 0.30* | 0.23* | 1 |
| Older Adults | |||||||||
| PS-V | 1 | 0.39:0.71 | 0.28:0.63 | −0.26:0.18 | −0.08:0.34 | −0.15:0.27 | −0.16:0.27 | −0.18:0.24 | −0.20:0.23 |
| PS-N | 0.56* | 1 | 0.14:0.53 | −0.23:0.21 | −0.13:0.29 | −0.14:0.29 | −0.27:0.15 | −0.20:0.24 | −0.39:0.03 |
| PS-F | 0.47* | 0.35* | 1 | −0.22:0.23 | −0.34:0.08 | −0.19:0.23 | −0.34:0.07 | −0.20:0.23 | −0.30:0.13 |
| WM-V | −0.04 | −0.01 | 0.00 | 1 | −0.25:0.20 | −0.41:0.03 | −0.13:0.30 | −0.18:0.27 | −0.05:0.38 |
| WM-N | 0.14 | 0.09 | −0.14 | −0.03 | 1 | −0.19:0.24 | −0.19:0.23 | 0.02:0.43 | −0.08:0.34 |
| WM-F | 0.06 | 0.08 | 0.02 | −0.20 | 0.03 | 1 | −0.16:0.26 | −0.22:0.21 | −0.12:0.31 |
| EM-V | 0.05 | −0.07 | −0.14 | 0.09 | 0.02 | 0.05 | 1 | 0.17:0.55 | 0.22:0.59 |
| EM-N | 0.03 | 0.02 | 0.02 | 0.05 | 0.23* | −0.01 | 0.37* | 1 | −0.13:0.30 |
| EM-F | 0.01 | −0.19 | −0.09 | 0.17 | 0.13 | 0.10 | 0.42* | 0.09 | 1 |
Note: PS = processing speed, WM = working memory, EM = episodic memory, V = verbal, N = Numerical, F = Figural. An asterisk next to a correlation coefficient denotes statistical “significance” defined as when the 95% credible interval excluded zero.
Younger adults:
Considering first the correlations within a cognitive domain (shaded in different colors to aid in visual representation), both the processing speed and the episodic memory tasks were moderately correlated with one another suggesting that a person high in variability in one speeded task or memory task will be similarly variable in another. Surprisingly, the same pattern did not hold for the working memory tasks as there was virtually no correlation between any of the sigmas for that domain. Correlations across different domains were modest and rarely exceeded 0.2 in magnitude.
The elpd_loo of the “fully correlated” model was 55191.1 (SE = 436.6), that of the “uncorrelated” model was 55186.7 (SE = 436.0), and that of the “domains” model was 55203.1 (SE = 435.2). Although these numbers imply that the domains model best fits the data, it is more informative to examine the relative differences in fit between the models. Clearly, the uncorrelated model fits much more poorly than the domains model (difference in elpd = −16.6, SE = 7.7). As this difference exceeds 2 standard errors, we consider it significantly poorer model fit. In contrast, the fully correlated model also fit relatively well, and the difference between the fully correlated and domains models was not significant (difference in elpd = −12.2, SE = 6.2).
Older adults:
Results for the older adults were quite similar to the younger adults. Specifically, modest intercorrelations were observed for both the processing speed and episodic memory tests with virtually zero correlation among variability in the working memory tasks. Again, there were few significant correlations across different domains.
These observations were confirmed by comparison of the elpd_loo. Specifically, for the fully correlated model, elpd_loo = 72790 (SE = 345.2); for the uncorrelated model, elpd_loo = 72785.1 (SE = 345.7); and for the domains model, elpd_loo = 72794.4 (SE = 345.0). As before, the difference between the domains model and the uncorrelated model was significant (difference = −9.4, SE = 4.6); however, the difference between the domains and fully correlated models was not (difference = −4.4, SE = 3.1).
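The decision rule applied in both age groups, treating a model as fitting significantly worse when its elpd_loo deficit exceeds twice the standard error of the difference, can be sketched as follows. The numbers are the older-adult differences reported above; the function name is our own shorthand, not part of any package.

```python
# Minimal sketch of the model-comparison criterion used in the text:
# a model fits significantly worse when its elpd_loo deficit exceeds
# 2 standard errors of the difference.

def worse_by_2se(elpd_diff: float, se_diff: float) -> bool:
    """True if the elpd deficit exceeds 2 SEs of the difference."""
    return elpd_diff < -2 * se_diff

# Domains vs. uncorrelated model (older adults): difference = -9.4, SE = 4.6
uncorrelated_worse = worse_by_2se(-9.4, 4.6)   # exceeds 2 SEs -> significantly worse

# Domains vs. fully correlated model (older adults): difference = -4.4, SE = 3.1
correlated_worse = worse_by_2se(-4.4, 3.1)     # within 2 SEs -> not significantly worse
```

The same rule applied to the younger-adult differences (−16.6 vs. SE 7.7; −12.2 vs. SE 6.2) reproduces the conclusions reported for that group.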
Discussion
It is increasingly recognized that variability in cognitive performance provides additional information about an individual’s functioning compared to examination of mean performance alone. While much of the cognitive variability research has been conducted at the trial level (e.g., the standard deviation of RTs across trials within a task), it is becoming common to examine performance across multiple days (Allaire & Marsiske, 2005; Aschenbrenner & Jackson, 2023; Cerino et al., 2021; Nicosia et al., 2022; Schmiedek et al., 2013). Despite the number of studies examining daily cognitive variability, our understanding of what daily fluctuations actually represent is still in its infancy. For example, a relatively straightforward question that is integral for future research planning is whether there is domain specificity to cognitive variability. Momentary lapses in attention are one often-cited mechanism for cognitive variability (West et al., 2002). If this were the sole mechanism at play, one might expect relatively equal fluctuations across all tasks, as individuals’ momentary distractions (e.g., checking a text message) should occur with equal frequency regardless of the particular task they are engaged in. In contrast, the cognitive construct itself may be subject to subtle variations in efficiency (i.e., “working memory” today is operating below its typical level), outside the contributions of a general mechanism such as attention. In this scenario, we would expect variability to cohere into established domains, i.e., to be domain specific. As it currently stands, it is unclear whether cognitive off-days generalize to all aspects of functioning or are specific to a particular domain.
Given the constraints of a typical high-frequency assessment study, only a relatively small number of tests can be administered, which ultimately precludes a careful examination of multiple tests and domains. In the present study, we utilized a large-scale, high-frequency assessment study to examine the psychometric properties of variability in different cognitive domains. These analyses not only provide guidance to researchers on which tasks might be most appropriate for their research study but also provide insights into the underlying causes and mechanisms of daily cognitive variability.
For both younger and older adults, there was significant variability in each of the 9 cognitive tasks, which was not particularly surprising and indeed has been demonstrated in this data set before, albeit with different analytical methods (Schmiedek et al., 2013). When compared against mean performance, the episodic memory tasks clearly showed the largest variability, with average fluctuations 25–30% of the mean in magnitude. This pattern is intriguing in light of the argument that a high level of cognitive variability does not necessarily imply an impaired cognitive system. Allaire and Marsiske (2005), for example, have argued that performance variability may, in part, reflect the use of different strategies and learning how best to complete the tasks. This is particularly relevant for the episodic memory measures, which have documented benefits from specific encoding or retention strategies (Dunlosky et al., 2011), i.e., engaging in the method of loci or other mnemonic memory techniques. The ability to consistently deploy these encoding and retention strategies is arguably an individual difference characteristic that transfers across multiple tasks and remains stable over time. It may be less possible to explore different strategies in tests of processing speed or working memory, and hence the overall variability is lower for these domains.

In terms of test-retest consistency, results varied quite a bit across the time series (e.g., first and second quarters vs. the third and fourth quarters), as well as across domains, tasks and age groups. Consider first the correlations between the first and second quarters of the time series. These values reflect the test-retest consistency of variability across what many would consider near the upper limit of the number of cognitive assessments that can feasibly be collected without undue burden on participants.
For younger adults, the test-retest consistency of the working memory tasks was clearly the lowest and never exceeded 0.50, whereas variability in processing speed and episodic memory was more moderate (rs = 0.6–0.7). The older adults showed a very similar pattern, with the exception of processing speed, for which the test-retest consistency was surprisingly poor. In contrast, when considering the third and fourth quarters, test-retest consistency was very high (r ~ 0.75–0.8) for both age groups. This implies that once sufficient practice has been obtained, day-to-day fluctuations that may be due to psychological processes (e.g., momentary variations in attention or affect) can be reliably detected in all tasks. Along the same lines, it was interesting to observe that variability in the first quarter was, at best, moderately correlated with variability in the final quarter, which in turn suggests that variability early in the task may reflect learning and strategy use, whereas variability later in the time series reflects momentary changes. Thus, if empirical interest is on actual fluctuations in cognition (as we believe it most often is) as opposed to learning or strategy use, time series of 50 assessments or more may be required.
For the younger adults, there was some evidence that variability in processing speed was associated with fluid intelligence and working memory. We did not find as robust an association in older adults, nor was there consistent evidence that the memory measures were associated with either outcome. This was surprising, as Ram et al. (2005) examined variability in memory search tasks across 36 weeks and showed a negative correlation between variability and fluid intelligence. Although there are differences in the tasks and analytical methods used across the studies cited above that may be producing differential findings, we think a more likely explanation is simply power. It is becoming increasingly recognized that detecting individual differences in variability will require substantially larger sample sizes than may be typical for studies of psychology and aging (Walters et al., 2018; Wright et al., 2024). For example, in the case of small effect sizes (as can be expected in the majority of cognitive variability studies), a sample size of 100 does not provide sufficient power to detect associations between variability and a person-level predictor (Walters et al., 2018). To this end, we also ran analyses on the combined cohort (yielding a total sample size of 203). In this analysis we found significant and consistent associations between variability in the processing speed tests and differences in fluid intelligence, as well as between variability in episodic memory and total working memory capacity. The combined cohort findings are consistent with the extant literature, although the domain specificity of the effects is intriguing in light of the argument that variability may reflect a combination of adaptive (e.g., strategy use) and maladaptive (loss of attention) processes. One could argue that the positive association between episodic memory variability and working memory span reflects strategy use.
That is, individuals who are higher in working memory capacity are better able to try, evaluate, and alter their memory strategies as they proceed through the study. To the extent that processing speed tasks and working memory tasks are less amenable to strategy use, no such association would then be expected and indeed we found none. In contrast, a significant negative association between processing speed variability and fluid intelligence would reflect maladaptive processes. Individuals who have lower overall capacity may be more susceptible to minor deviations in attention that then produce a few abnormally low scores. These effects are specific to processing speed as reaction time may simply be a more sensitive indicator of momentary loss of attention compared to accuracy on a memory test.
Perhaps most informative are the analyses correlating variability across tasks and domains. Specifically, for both age groups, the sigma estimates clearly clustered into the domains of processing speed and episodic memory, although the estimates for the episodic memory tasks were surprisingly small. The working memory tasks, on the other hand, did not correlate well with each other, suggesting daily variability on these measures is largely task specific. In other words, one must talk about variability on, e.g., the n-back task specifically, as opposed to variability in working memory per se. At the same time, it is important to note that the working memory tasks exhibited substantial variability from day to day, as reflected by the significant sigma parameters for each task, and it is possible that this variability is driven more at the trial level. That is, it is well established that working memory tasks depend heavily on controlled attention (Engle, 2018; McCabe et al., 2010). Momentary lapses of attention, i.e., mind wandering, may impair performance on the task at hand (Aschenbrenner, Welhaf, et al., 2024; Unsworth & McMillan, 2013), but may not transfer to the next task, hence precluding high correlations between working memory tasks. Across-domain correlations (e.g., between processing speed and episodic memory tasks) were low, similar to prior findings (Allaire & Marsiske, 2005; Judd et al., 2024), again underscoring the domain specificity of cognitive variability.
In addition to these theoretical advances, the analyses presented here provide some practical findings for future high-frequency assessment studies. First, with regard to the ICCs, much of the variability in each task was at the between-person level, particularly for the processing speed tasks (ICCs > 0.6). While this may be useful to researchers who seek to find group differences in variability (Aschenbrenner, Hassenstab, et al., 2024; Cerino et al., 2021), it may pose problems for finding within-person associations with sleep or stress, for example. As noted, sample sizes and numbers of daily assessments may need to be quite large in order to reliably detect associations of interest. Second, within the domains of processing speed and episodic memory, the individual tasks were largely interchangeable: they were moderately to highly correlated, showed similar magnitudes of variability, and showed consistent associations, or lack thereof, with fluid intelligence and working memory capacity. In contrast, the working memory tasks were very task specific. Although there was significant variability across days in each task, this variability did not correlate highly across working memory tasks, and the tasks even showed associations with fluid intelligence in opposite directions.
Such task specificity could be accommodated by extant working memory models that posit different storage for different types of material (Baddeley & Hitch, 1974). Under this scenario, variation in the ability to store verbal material would have no influence on variation in the ability to store spatial information. Such an account, however, would not accommodate the modest (at best) split-quarter test-retest consistency for the working memory domain, at least over shorter timescales (e.g., the first 25 and second 25 sessions). It is more likely that momentary distractions or lapses of attention are particularly detrimental to working memory tasks. For example, in the alpha span task in the present study, a momentary distraction that results in the loss of even a single letter could have enormous implications for the final score, as subsequently presented letters may be scored erroneously. Regardless, these results indicate that extreme care is warranted when selecting tasks for a daily study of working memory. For example, given the low correlations, it is highly likely that if a within-person predictor of variability (e.g., sleep) were used to predict daily working memory, only one of these tasks would identify an association, assuming there is one to find. As these tasks look similar in psychometric properties, it is impossible to choose a priori which of the three working memory tasks to select.
We have focused primarily on the notion that early performance variability may reflect engagement of strategies that are aimed at optimizing performance, and the deployment of such strategies may be most relevant for tasks of episodic memory ability. Such an interpretation has support from prior studies that have examined specific tasks from COGITO. For example, Hertzog et al. (2017) examined performance on the same numerical episodic memory task and showed that not only were there significant individual differences in effective strategy use (e.g., formation of stories or creating vivid images), but the use of an effective strategy on any particular day was significantly coupled with better performance on that day. Similarly, Ghisletta et al. (2018) examined speed-accuracy trade-off functions in the figural comparison task and showed that younger adults were more willing to adjust their speed-accuracy function (another form of a strategy) than older adults, which may explain why variability correlations were slightly higher in our analyses for younger adults than for older adults. Shing et al. (2012) identified a specific adaptive strategy in the numerical working memory task that was persistently used by older adults in the COGITO sample. Finally, Noack et al. (2013) showed that older adults may focus on specific aspects of the stimuli in the figural episodic memory task over others in order to maintain accuracy in the task. Although these studies all identify different strategies that may be employed to maintain performance, none examined strategy use across the tasks in the same analysis. Nevertheless, it is fruitful to consider the extent to which similar strategies may be employed across different types of stimuli within a domain.
The most straightforward example is the speed accuracy tradeoff. Being able to adjust the tradeoff in a figural comparison task should translate well to a numerical or verbal comparison task. Such a supposition is supported by our finding that variability in the processing speed tests exhibited the highest correlations. The next highest correlations occurred in the episodic memory domain. While there are well-known strategies to enhance memory performance (e.g., forming verbal associations) these tend to be easier to apply to verbal or numerical stimuli and may be less applicable to picture stimuli, leading to only modest intercorrelations among these tasks. Finally, the strategy for working memory identified by Shing et al. was very specific to the numerical task and is unlikely to apply to the other two types of stimuli, hence producing very minimal correlations. Again, this underscores the notion that variability is not all created equal, certainly not in the working memory domain, and extreme care will need to be exercised during task selection.
This study has many strengths, including the use of a very large (N > 200, 100 repeated assessments) high-frequency assessment study. It is worth noting that COGITO collected 100 daily measurements of each cognitive task, which is important in light of recent findings indicating that potentially hundreds of assessments are needed to detect within-person effects in these studies (Wright et al., 2024). Nevertheless, some limitations on the generalizability of this study are worth mentioning. First, this study was conducted in a laboratory setting, and hence results from a truly ambulatory assessment, where distractions are likely to be more frequent, may differ. Second, the sample was highly educated (Schmiedek, Lovden, et al., 2010), and participants had sufficient flexibility in their daily schedules to commit to a study of this scope and duration. With the ubiquity of smartphones to assess participants remotely, it is now more feasible to collect a representative research sample in these designs. Third, our focus in the present study was whether daily variability was systematically related across individuals. Other studies using the COGITO dataset (e.g., Schmiedek et al., 2020) have addressed questions at a different level of analysis, specifically whether variability is systematically related across tasks within an individual. Such results have shown the within-person structure of variability to be quite heterogeneous across the younger adults in this sample. It is also important to consider the role of task design in the COGITO study. Specifically, presentation times for the episodic memory and working memory tasks were individually titrated based on performance in a pretest session (Schmiedek, Bauer, et al., 2010). Thus, the difficulty of the tasks was not equated across individuals, which could influence the magnitude of cognitive variability and may make direct comparisons across tasks and age groups difficult.
Future research can continue to address similar questions across all levels of analysis.
Conclusions
Cognition is not a stable construct at the daily level; individuals routinely have relatively better or worse days. Moreover, cognitive variability is not a unitary construct in which variability in different tasks is highly correlated; rather, it shows a large degree of domain specificity and, in the case of working memory, task specificity. These findings help advance our theoretical understanding of cognitive variability and provide practical suggestions for which domains are most amenable to study in daily life.
Supplementary Material
Public Significance Statement.
It is becoming common to assess cognitive processes in real-world interactions, as opposed to how they occur in a laboratory setting, by using a repeated assessment paradigm delivered remotely. Not all classic cognitive tests are amenable to this assessment approach. We compare 9 different cognitive tasks to evaluate which ones are most sensitive to daily influences and hence most appropriate to utilize to study cognition in daily life.
Acknowledgments
This work was supported by a grant from the National Institute on Aging (K01 AG071847) awarded to Andrew Aschenbrenner. This work has not previously been disseminated elsewhere.
References
- Allaire JC, & Marsiske M (2005). Intraindividual variability may not always indicate vulnerability in elders’ cognitive performance. Psychology and Aging, 20(3), 390–401. 10.1037/0882-7974.20.3.390 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aschenbrenner AJ, Hassenstab J, Morris JC, Cruchaga C, & Jackson JJ (2024). Relationships between hourly cognitive variability and risk of Alzheimer’s disease revealed with mixed-effects location scale models. Neuropsychology, 38(1), 69–80. 10.1037/neu0000905 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aschenbrenner AJ, & Jackson JJ (2023). High-frequency assessment of mood, personality, and cognition in healthy younger, healthy older and adults with cognitive impairment. Aging, Neuropsychology, and Cognition, 31, 914–931. 10.1080/13825585.2023.2284412 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aschenbrenner AJ, Welhaf MS, Hassenstab JJ, & Jackson JJ (2024). Antecedents of mind wandering states in healthy aging and mild cognitive impairment. Neuropsychology. 10.1037/neu0000941 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baddeley AD, & Hitch G (1974). Working memory. In Bower G(Ed.), The psychology of learning and motivation: Advances in research and theory. (Vol. 8, pp. 47–89). [Google Scholar]
- Bos EH, Jonge P, & Cox RFA (2019). Affective variability in depression: Revisiting the inertia–instability paradox. British Journal of Psychology, 110(4), 814–827. 10.1111/bjop.12372 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brose A, Driver C, Lindenberger U, Lovden M, & Schmiedek F (2019). The COGITO Study: Overview of Research Design, Past Work, and Data Access [Dataset]. https://www.mpib-berlin.mpg.de/research/research-centers/lip/formal-methods/cogito [Google Scholar]
- Brose A, Lövdén M, & Schmiedek F (2014). Daily fluctuations in positive affect positively co-vary with working memory performance. Emotion, 14(1), Article 1. 10.1037/a0035210
- Brose A, Schmiedek F, Lövdén M, & Lindenberger U (2012). Daily variability in working memory is coupled with negative affect: The role of attention and motivation. Emotion, 12(3), Article 3. 10.1037/a0024436
- Bürkner P-C (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), Article 1. 10.32614/RJ-2018-017
- Cerino ES, Katz MJ, Wang C, Qin J, Gao Q, Hyun J, Hakun JG, Roque NA, Derby CA, Lipton RB, & Sliwinski MJ (2021). Variability in cognitive performance on mobile devices is sensitive to mild cognitive impairment: Results from the Einstein Aging Study. Frontiers in Digital Health, 3, 758031. 10.3389/fdgth.2021.758031
- Conway ARA, & Kovacs K (2013). Individual differences in intelligence and working memory. In Psychology of Learning and Motivation (Vol. 58, pp. 233–270). Elsevier. 10.1016/B978-0-12-407237-4.00007-4
- Coyle TR (2003). IQ, the worst performance rule, and Spearman’s law: A reanalysis and extension. Intelligence, 31(5), 473–489. 10.1016/S0160-2896(02)00175-7
- Dunlosky J, Bailey H, & Hertzog C (2011). Memory enhancement strategies: What works best for obtaining memory goals? In Hartman-Stein PE & LaRue A (Eds.), Enhancing Cognitive Fitness in Adults (pp. 3–23). Springer, New York. 10.1007/978-1-4419-0636-6_1
- Engle RW (2018). Working memory and executive attention: A revisit. Perspectives on Psychological Science, 13(2), 190–193. 10.1177/1745691617720478
- Feng Y, & Hancock GR (2024). A structural equation modeling approach for modeling variability as a latent variable. Psychological Methods, 29(2), 262–286. 10.1037/met0000477
- Fuentes K, Hunter MA, Strauss E, & Hultsch DF (2001). Intraindividual variability in cognitive performance in persons with chronic fatigue syndrome. The Clinical Neuropsychologist, 15(2), 210–227. 10.1076/clin.15.2.210.1896
- Ghisletta P, Joly-Burra E, Aichele S, Lindenberger U, & Schmiedek F (2018). Age differences in day-to-day speed-accuracy tradeoffs: Results from the COGITO study. Multivariate Behavioral Research, 53(6), 842–852. 10.1080/00273171.2018.1463194
- Hamaker EL, & Wichers M (2017). No time like the present: Discovering the hidden dynamics in intensive longitudinal data. Current Directions in Psychological Science, 26(1), 10–15. 10.1177/0963721416666518
- Hedeker D, Mermelstein RJ, & Demirtas H (2008). An application of a mixed-effects location scale model for analysis of ecological momentary assessment (EMA) data. Biometrics, 64(2), 627–634. 10.1111/j.1541-0420.2007.00924.x
- Hertzog C, Lövdén M, Lindenberger U, & Schmiedek F (2017). Age differences in coupling of intraindividual variability in mnemonic strategies and practice-related associative recall improvements. Psychology and Aging, 32(6), 557–571. 10.1037/pag0000177
- Hultsch DF, MacDonald SW, Hunter MA, Levy-Bencheton J, & Strauss E (2000). Intraindividual variability in cognitive performance in older adults: Comparison of adults with mild dementia, adults with arthritis, and healthy adults. Neuropsychology, 14(4), Article 4. 10.1037//0894-4105.14.4.588
- Jackson JD, Balota DA, Duchek JM, & Head D (2012). White matter integrity and reaction time intraindividual variability in healthy aging and early-stage Alzheimer disease. Neuropsychologia, 50(3), Article 3. 10.1016/j.neuropsychologia.2011.11.024
- Jäger AO, Süß H-M, & Beauducel A (1997). Berliner Intelligenzstruktur-Test: BIS-Test.
- Jahng S, Wood PK, & Trull TJ (2008). Analysis of affective instability in ecological momentary assessment: Indices using successive difference and group comparison via multilevel modeling. Psychological Methods, 13(4), 354–375. 10.1037/a0014173
- Judd N, Aristodemou M, Klingberg T, & Kievit R (2024). Interindividual differences in cognitive variability are ubiquitous and distinct from mean performance in a battery of eleven tasks. Journal of Cognition, 7(1), 45. 10.5334/joc.371
- Koval P, Pe ML, Meers K, & Kuppens P (2013). Affect dynamics in relation to depressive symptoms: Variable, unstable or inert? Emotion, 13(6), 1132–1141. 10.1037/a0033579
- Li S-C, Lindenberger U, & Sikström S (2001). Aging cognition: From neuromodulation to representation. Trends in Cognitive Sciences, 5(11), 479–486. 10.1016/S1364-6613(00)01769-1
- MacDonald SWS, Cervenka S, Farde L, Nyberg L, & Bäckman L (2009). Extrastriatal dopamine D2 receptor binding modulates intraindividual variability in episodic recognition and executive functioning. Neuropsychologia, 47(11), 2299–2304. 10.1016/j.neuropsychologia.2009.01.016
- MacDonald SWS, Li S-C, & Bäckman L (2009). Neural underpinnings of within-person variability in cognitive functioning. Psychology and Aging, 24(4), 792–808. 10.1037/a0017798
- Madero EN, Anderson J, Bott NT, Hall A, Newton D, Fuseya N, Harrison JE, Myers JR, & Glenn JM (2021). Environmental distractions during unsupervised remote digital cognitive assessment. The Journal of Prevention of Alzheimer’s Disease, 1–4. 10.14283/jpad.2021.9
- McCabe DP, Roediger HL, McDaniel MA, Balota DA, & Hambrick DZ (2010). The relationship between working memory capacity and executive functioning: Evidence for a common executive attention construct. Neuropsychology, 24(2), Article 2. 10.1037/a0017619
- Mestdagh M, Pe M, Pestman W, Verdonck S, Kuppens P, & Tuerlinckx F (2018). Sidelining the mean: The relative variability index as a generic mean-corrected variability measure for bounded variables. Psychological Methods, 23(4), 690–707. 10.1037/met0000153
- Nestler S (2020). Modelling inter‐individual differences in latent within‐person variation: The confirmatory factor level variability model. British Journal of Mathematical and Statistical Psychology, 73(3), 452–473. 10.1111/bmsp.12196
- Nicosia J, Aschenbrenner AJ, Balota DA, Sliwinski MJ, Tahan M, Adams S, Stout SS, Wilks H, Gordon BA, Benzinger TLS, Fagan AM, Xiong C, Bateman RJ, Morris JC, & Hassenstab J (2022). Unsupervised high-frequency smartphone-based cognitive assessments are reliable, valid, and feasible in older adults at risk for Alzheimer’s disease. Journal of the International Neuropsychological Society: JINS, 1–13. 10.1017/S135561772200042X
- Noack H, Lövdén M, Schmiedek F, & Lindenberger U (2013). Age-related differences in temporal and spatial dimensions of episodic memory performance before and after hundred days of practice. Psychology and Aging, 28(2), 467–480. 10.1037/a0031489
- Ram N, Rabbitt P, Stollery B, & Nesselroade JR (2005). Cognitive performance inconsistency: Intraindividual change and variability. Psychology and Aging, 20(4), 623–633. 10.1037/0882-7974.20.4.623
- Riegler KE, Cadden M, Guty ET, Bruce JM, & Arnett PA (2022). Perceived fatigue impact and cognitive variability in multiple sclerosis. Journal of the International Neuropsychological Society, 28(3), 281–291. 10.1017/S1355617721000230
- Rutter LA, Vahia IV, Forester BP, Ressler KJ, & Germine L (2020). Heterogeneous indicators of cognitive performance and performance variability across the lifespan. Frontiers in Aging Neuroscience, 12, 62. 10.3389/fnagi.2020.00062
- Schmiedek F, Bauer C, Lövdén M, Brose A, & Lindenberger U (2010). Cognitive enrichment in old age: Web-based training programs. GeroPsych, 23(2), 59–67. 10.1024/1662-9647/a000013
- Schmiedek F, Lövdén M, & Lindenberger U (2009). On the relation of mean reaction time and intraindividual reaction time variability. Psychology and Aging, 24(4), Article 4. 10.1037/a0017799
- Schmiedek F, Lövdén M, & Lindenberger U (2010). Hundred days of cognitive training enhance broad cognitive abilities in adulthood: Findings from the COGITO study. Frontiers in Aging Neuroscience. 10.3389/fnagi.2010.00027
- Schmiedek F, Lövdén M, & Lindenberger U (2013). Keeping it steady: Older adults perform more consistently on cognitive tasks than younger adults. Psychological Science, 24(9), Article 9. 10.1177/0956797613479611
- Schmiedek F, Lövdén M, & Lindenberger U (2014). A task is a task is a task: Putting complex span, n-back, and other working memory indicators in psychometric context. Frontiers in Psychology, 5. 10.3389/fpsyg.2014.01475
- Schmiedek F, Lövdén M, Von Oertzen T, & Lindenberger U (2020). Within-person structures of daily cognitive performance differ from between-person structures of cognitive abilities. PeerJ, 8, e9290. 10.7717/peerj.9290
- Schmiedek F, Oberauer K, Wilhelm O, Süß H-M, & Wittmann WW (2007). Individual differences in components of reaction time distributions and their relations to working memory and intelligence. Journal of Experimental Psychology: General, 136(3), Article 3. 10.1037/0096-3445.136.3.414
- Shing YL, Schmiedek F, Lövdén M, & Lindenberger U (2012). Memory updating practice across 100 days in the COGITO study. Psychology and Aging, 27(2), 451–461. 10.1037/a0025568
- Sliwinski MJ, Mogle JA, Hyun J, Munoz E, Smyth JM, & Lipton RB (2018). Reliability and validity of ambulatory cognitive assessments. Assessment, 25(1), Article 1. 10.1177/1073191116643164
- Sliwinski MJ, Smyth JM, Hofer SM, & Stawski RS (2006). Intraindividual coupling of daily stress and cognition. Psychology and Aging, 21(3), 545–557. 10.1037/0882-7974.21.3.545
- Stawski RS, MacDonald SWS, Brewster PWH, Munoz E, Cerino ES, & Halliday DWR (2019). A comprehensive comparison of quantifications of intraindividual variability in response times: A measurement burst approach. The Journals of Gerontology: Series B, 74(3), 397–408. 10.1093/geronb/gbx115
- Tse C-S, Balota DA, Yap MJ, Duchek JM, & McCabe DP (2010). Effects of healthy aging and early stage dementia of the Alzheimer’s type on components of response time distributions in three attention tasks. Neuropsychology, 24(3), Article 3. 10.1037/a0018274
- Unsworth N (2015). Consistency of attentional control as an important cognitive trait: A latent variable analysis. Intelligence, 49, 110–128. 10.1016/j.intell.2015.01.005
- Unsworth N, & McMillan BD (2013). Mind wandering and reading comprehension: Examining the roles of working memory capacity, interest, motivation, and topic experience. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(3), 832–842. 10.1037/a0029669
- von Neumann J, Kent RH, Bellison HR, & Hart BI (1941). The mean square successive difference. The Annals of Mathematical Statistics, 12, 153–162.
- Walters RW, Hoffman L, & Templin J (2018). The power to detect and predict individual differences in intra-individual variability using the mixed-effects location-scale model. Multivariate Behavioral Research, 53(3), Article 3. 10.1080/00273171.2018.1449628
- West R, Murphy KJ, Armilio ML, Craik FIM, & Stuss DT (2002). Lapses of intention and performance variability reveal age-related increases in fluctuations of executive control. Brain and Cognition, 49(3), Article 3. 10.1006/brcg.2001.1507
- Williams DR, Martin SR, Liu S, & Rast P (2020). Bayesian multivariate mixed-effects location scale modeling of longitudinal relations among affective traits, states, and physical activity. European Journal of Psychological Assessment, 36(6), 981–997. 10.1027/1015-5759/a000624
- Wright AGC, Scharf F, & Zimmermann J (2024). Minimum sampling recommendations for applied ambulatory assessments. 10.31234/osf.io/3tme5