Abstract
With the growing prevalence of cognitive impairment, early detection has become increasingly critical. Prior studies have examined the association between neuropsychiatric symptoms (NPS) and cognitive impairment, identifying potential predictive relationships. However, they have rarely evaluated the heterogeneous relationships between serial patterns of NPS and the evolving cognitive status of patients. To address this limitation, we investigate the statistical causal relationship between NPS and cognitive impairment, as well as the dynamic changes in their predictive effects over time, with a specific focus on sex differences. Our approach accounts for the fluctuating nature of NPS and varying follow-up durations across participants by implementing a bootstrap strategy that repeatedly samples a fixed number of visits per participant in temporal order. Then, we apply causal discovery techniques and counterfactual framework-based causal inference methods to estimate the independent effects of NPS over time. Our findings highlight apathy as a key predictive symptom of cognitive impairment. Moreover, its predictive effect peaks earlier in females than in males, indicating that early-stage tracking is particularly informative in female participants. This suggests that sex-specific monitoring strategies may improve early detection of and intervention for cognitive impairment.
Keywords: cognitive impairment, neuropsychiatric symptom, causal discovery, causal inference
I. Introduction
Given the growing number of individuals living with mild cognitive impairment (MCI) and dementia, cognitive impairment is increasingly recognized as a global health priority [1–3]. Cognitive deficits typically worsen as the disease progresses [1, 4, 5], impacting not only the patients but also their families and caregivers [1]. Therefore, cognitive impairment requires careful monitoring to prevent progression into more severe dysfunction [4, 6]. Identifying the risk or early signs of cognitive impairment before its onset or during its initial stages is critical for providing timely and effective therapeutic interventions [2].
For the early detection of cognitive impairment, a large body of literature has conducted longitudinal studies to identify various biomarkers, signs, and symptoms that may play a predictive role [4]. In particular, strong evidence has shown that neuropsychiatric symptoms (NPS), such as apathy and depression, serve as significant indicators of cognitive impairment [7, 8].
Nevertheless, accurate diagnosis and prediction of cognitive impairment remain challenging, as symptoms vary dramatically between patients [4, 9]. Patient demographics and comorbidities are significant factors influencing the variation in NPS patterns across individuals [10]. In particular, patients exhibit heterogeneity in NPS patterns by sex [11]. For instance, agitation and apathy are more common in males, whereas females are more likely to experience depression and anxiety [11, 12]. These findings highlight the importance of distinguishing the heterogeneous roles of NPS between patient groups when predicting future cognitive impairment [13].
However, it is difficult to accurately assess the predictive roles of NPS and their heterogeneous effects because other factors are common causes of both cognitive impairment and NPS. For example, approximately 20% of patients with diabetes mellitus experience depressive symptoms, while diabetes itself increases the risk of dementia by about 47% [14]. Such common causes often introduce bias when assessing the pure effect of NPS on cognitive impairment. To isolate and estimate the independent effects of each factor, statistical causal analysis, such as regression with instrumental variables [15], has been increasingly applied to condition on the effects of other covariates. Nevertheless, conventional statistical models cannot easily explain patient-level heterogeneity of the independent effects, which motivates the use of counterfactual framework-based approaches to uncover how the effects of NPS vary across patients.
In addition, differences in NPS trajectories often present barriers to the accurate diagnosis of cognitive impairment, leading to delays in intervention [1]. Each NPS typically emerges at different stages of cognitive decline [13]. Wise et al. [13] found that depression and apathy commonly appear before the onset of cognitive impairment, whereas delusions and hallucinations tend to emerge after onset. In addition, NPS patterns often fluctuate over time, and both the fluctuation patterns and persistence of these fluctuations vary across patient groups [7, 16]. However, the impact of dynamic changes in NPS across time on dementia progression remains understudied. Investigating how NPS influence cognitive impairment over different follow-up periods could offer valuable insights into the optimal timing for monitoring and intervening on these symptoms.
To address these gaps, this study provides statistical causality-based explanations of the predictive role of NPS in cognitive impairment and their dynamic effects over the follow-up period, with a focus on sex differences. Specifically, we (1) explore heterogeneous patterns of NPS occurrences by sex, (2) identify the causal relationships between NPS and cognitive impairment within each patient group using a causal discovery method, and (3) compare the dynamic predictive effects of NPS across various follow-up periods using a counterfactual framework-based causal inference technique. To account for the fluctuating nature of NPS and the varying length of follow-up among participants, we implemented a sampling approach that preserves the temporal sequence of visits for causal discovery. Ultimately, this study aims to identify key NPS and clarify their roles in signaling the onset of cognitive impairment over time by sex.
II. Related work
A. Causal discovery
In clinical settings, causality-based analysis is critical for successful intervention because it uncovers actual risk factors [17]. Unlike association-based analysis, causal analysis adjusts for potential confounding effects, preventing bias that arises from common causes. Causal analysis consists of causal structure discovery, which identifies the causal relationships between variables from observational data, and causal inference, which estimates the size of causal effects [17, 18].
There are three major categories of causal discovery methods: constraint-based, score-based, and semi-parametric-based [17, 18]. Constraint-based methods, such as the Peter-Clark (PC) algorithm or Fast Causal Inference (FCI) algorithm, discover causal structures based on conditional independence using statistical tests. These methods efficiently handle sparse datasets and non-linear relationships between variables [18]. Moreover, the FCI algorithm is widely used in various application areas, as it can infer the presence of unmeasured confounders [17, 19]. Shen et al. [17] discovered causal relationships between various biological markers and the progression of MCI as well as early Alzheimer’s disease using the FCI algorithm. However, these approaches require a sufficient number of samples, such as study participants, to perform accurate conditional independence tests [18].
Score-based methods construct causal diagrams by maximizing likelihood functions or scores, such as Bayesian Information Criterion. These methods include the Greedy Equivalence Search (GES) algorithm and its faster version, Fast GES (FGES). Shen et al. [17] explored causal relationships between biomarkers and white matter hyperintensities, which increase the risk of dementia. These approaches produce asymptotically consistent results but assume the absence of hidden confounders [20].
Finally, semi-parametric methods incorporate parametric assumptions either about the functional forms of structural equations or the distributions of exogenous variables. For instance, the linear non-Gaussian acyclic model (LiNGAM) assumes specific parametric forms for the structural equations but makes no assumptions about the distributions of exogenous variables, except for non-Gaussianity [21]. Although these methods can identify the direction of causality more accurately, they require specific assumptions about data distributions and generally do not scale well in sparse settings [18].
In this study, we leveraged the FCI algorithm, which accounts for the presence of hidden confounders that can significantly impact causal effect estimation. To improve robustness in clinical scenarios with limited patient data, we proposed a bootstrap sampling method to augment the available observations. The detailed sampling approach is described in a subsequent section.
B. Causal inference
Among various methods to estimate causal effects, the counterfactual framework infers the effects by considering what the outcome would have been if a particular action had not been taken, or vice versa [22]. Based on the counterfactual theory, a causal effect is measured by a ‘what-if’ statement, that is, the difference between the expected value of the outcome under treatment and the potential outcome if untreated [23]. Mathematically, the average treatment effect (ATE) is calculated as $\mathrm{ATE} = E[Y(1) - Y(0)]$, where $Y(1)$ represents the potential outcome if treatment $T$ is set to 1, and $Y(0)$ represents the outcome if $T$ is set to 0. Specifically, the counterfactual outcomes for each observation are estimated using a predictive model that simulates the effect of a hypothetical intervention. The ATE is then computed by measuring the difference between the actual outcomes and these estimated counterfactual outcomes.
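The ATE definition above can be illustrated with a minimal sketch. It assumes a hypothetical fitted outcome model, `predict_outcome(t, x)`, which is not part of the original study, and it simplifies the procedure by predicting both potential outcomes for every observation rather than contrasting actual outcomes with estimated counterfactuals.

```python
from statistics import mean

def estimate_ate(covariates, predict_outcome):
    """Plug-in ATE: average of predicted Y(1) - Y(0) over observations.

    `predict_outcome(t, x)` is a hypothetical fitted model that returns the
    predicted outcome for covariates `x` with the treatment forced to `t`.
    """
    differences = [predict_outcome(1, x) - predict_outcome(0, x) for x in covariates]
    return mean(differences)

# Toy outcome model (an assumption for illustration only):
# the treatment adds 0.2 to the predicted risk regardless of the covariate.
def toy_model(t, x):
    return 0.1 + 0.2 * t + 0.05 * x

ate = estimate_ate([0, 1, 2, 3], toy_model)
```

In this toy setting the treatment shifts every prediction by the same amount, so the estimate equals that shift; with a real model the per-observation differences would vary.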
Recently, machine learning (ML) techniques have been widely integrated into causal inference for estimating counterfactual causal effects due to their strong predictive performance [22]. They have advantages in handling nonlinearities, interactions between variables, and high-dimensional data [23, 24]. Moreover, ML enables the estimation of how treatment effects vary across subgroups through personalized predictions [24]. Representative examples of ML models used in causal inference include causal forest and double machine learning (DML) [23, 24].
Causal forest models estimate heterogeneous treatment effects based on random forest algorithms, where the data are split to maximize differences in conditional ATE (CATE) between resulting subgroups rather than based on traditional criteria such as entropy [25]. These models operate by partitioning the data into subgroups, represented by the leaves of decision trees, such that the treatment effect is approximately homogeneous within each leaf but exhibits meaningful heterogeneity across leaves. This method has advantages in uncovering heterogeneity between groups and yielding consistent estimators for heterogeneous treatment effect. However, despite its strength in estimating group-wise CATE, the reliability of estimated CATE values can vary across different data conditions or evaluation settings [26].
By contrast, DML uses cross-fitting and flexible ML models to obtain an unbiased estimate [27]. It integrates the predictive power of ML with the estimation of causal effects [28]. ML models are used to estimate the treatment and outcome functions separately, removing the influence of confounders [23, 27]. Then, a simple linear regression is fitted between the outcome residuals and treatment residuals. This orthogonalization process ensures that the estimated effect reflects variation in the treatment that is independent of confounders, thereby isolating the true causal relationship. Accordingly, this method effectively handles confounding and reduces bias [23, 27]. However, its performance depends on the quality of the ML models used for the treatment and outcome estimations [23, 27].
In this study, we employed DML due to its asymptotically unbiased property, which enables reliable estimation of the ATE, even in the presence of complex confounding relationships. We employed an ensemble of DML models to account for variation in performance across different ML model choices and to improve robustness, which is particularly important in clinical settings.
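The residual-on-residual idea can be sketched in a few lines. This is an illustrative simplification, not the estimator used in the study: the nuisance models here are ordinary least-squares lines on a single confounder (swapped in for flexible ML learners), the data are synthetic, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def dml_effect(x, t, y, n_folds=2):
    """Cross-fitted partialling-out estimate of the effect of t on y."""
    n = len(x)
    res_t, res_y = [], []
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        # Nuisance models are fitted on the other fold(s) only (cross-fitting).
        m_t = fit_line([x[i] for i in train], [t[i] for i in train])
        m_y = fit_line([x[i] for i in train], [y[i] for i in train])
        res_t += [t[i] - m_t(x[i]) for i in test]
        res_y += [y[i] - m_y(x[i]) for i in test]
    # Final stage: regress outcome residuals on treatment residuals.
    return sum(a * b for a, b in zip(res_t, res_y)) / sum(a * a for a in res_t)

# Synthetic data (assumed): confounder x drives both t and y; the true effect is 2.
x = [i / 10 for i in range(40)]
t = [0.5 * xi + ((i * 7) % 5 - 2) / 4 for i, xi in enumerate(x)]
y = [2.0 * ti + 3.0 * xi for ti, xi in zip(t, x)]
theta = dml_effect(x, t, y)
```

Because the synthetic outcome is exactly linear in the treatment and the confounder, the sketch recovers the true coefficient; with real data and nonlinear confounding, the quality of the nuisance models determines how much bias remains.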
III. Data and methodology
A. Study population
We utilized data from the Mayo Clinic Study of Aging (MCSA), a population-based cohort study designed to investigate the progression and risk factors of MCI and dementia among aging participants [2]. Comprehensive assessments of NPS, comorbidities, and demographic information were obtained from in-person interviews every 15 months. Cognitive impairment, including MCI and dementia, was assessed at each MCSA visit. Diagnoses were assigned by consensus among the study coordinator, physician, and neuropsychologist. For participants who discontinued visits, diagnoses were determined based on medical record review.
In this study, we define each assessment as a visit. During each visit, an informant reported whether the study participant experienced any of twelve NPS based on the neuropsychiatric inventory questionnaire (NPI-Q). The NPS included delusion, hallucination, agitation, depression, anxiety, elation, apathy, disinhibition, irritability, motor behavior, nighttime behavior, and appetite change [29].
We excluded (1) participants who already had cognitive impairment at their first study visit [3], and (2) those who experienced any NPS before age 50, because young onset of NPS is likely related to other psychiatric conditions, such as depressive disorder and anxiety disorder [30]. Finally, 4,978 cognitively unimpaired participants and 981 cognitively impaired participants were identified.
B. Data preprocessing
To accurately identify the relationships among variables, we preprocessed the dataset in terms of three aspects. First, unlike chronic diseases such as diabetes and cardiovascular diseases, NPS tend to follow dynamic patterns, emerging and subsiding over time, rather than presenting as persistent and typically irreversible [7, 16]. This temporal variability might also suggest a complex interplay between NPS and chronic diseases. For example, NPS can both result from and contribute to the progression of dementia [6, 31].
To account for these complexities, we implemented a temporal framework by defining participant statuses across multiple time points. Specifically, data were divided into a baseline visit (T0) and three subsequent time periods (T1, T2, and T3) based on participants’ cognitive impairment status. The baseline visit included demographic information, while the subsequent time points tracked the status of diseases, including cognitive impairment, as well as the presence or absence of NPS. This structure allowed us to examine not only dynamic patterns of NPS and diseases but also potential bidirectional relationships among variables over time. A detailed overview of possible diagnostic statuses across time points is provided in Table I.
TABLE I.
Possible sampling strategy by cognitive status
| Participant group | T1 | T2 | T3 |
|---|---|---|---|
| Cognitively unimpaired | Normal cognition | Normal cognition | Normal cognition |
| Cognitively impaired, at least at 1 time point | Normal cognition | Normal cognition | Cognitive impairment |
| Cognitively impaired, at least at 1 time point | Normal cognition | Cognitive impairment | Cognitive impairment |
| Cognitively impaired, at least at 1 time point | Cognitive impairment | Cognitive impairment | Cognitive impairment |
Second, due to the nature of this cohort study, each participant had a varying length of follow-up [32]. To avoid bias toward participants with more frequent visits, we randomly selected a single visit per participant for each defined time period, respectively. For participants with at least one cognitively impaired visit, we extracted a random combination of three visits that could occur either within the impaired cognitive states or across normal and impaired states, allowing comparison across different progression patterns (e.g., Normal → Normal, Normal → Impaired, Impaired → Impaired). We found that no participants transitioned from impaired to normal cognition during the analysis. The selected visits were then sorted chronologically. Overall, this approach ensured comparability across participants with varying number of observations while preserving the sequence of the data.
Algorithm 1:
The sampling algorithm.
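| Input: A longitudinal dataset $D$ consisting of visit-level records for $N$ participants. For each participant $i$, a visit set is denoted as $V_i = \{v_{i,1}, \ldots, v_{i,n_i}\}$, where $n_i$ indicates the total number of visits, including: participant ID and baseline demographics at T0, and cognitive impairment status, comorbidities, and NPS indicators over time | |
| Output: Set of bootstrap samples $\mathcal{B} = \{B_1, \ldots, B_{100}\}$, with each dataset $B_b$ containing one record per participant | |
| 1 | Initialize $\mathcal{B} \leftarrow \emptyset$ |
| 2 | for $b = 1$ to 100 do |
| | Initialize $B_b \leftarrow \emptyset$ |
| | for $i = 1$ to $N$ do |
| | Let $V_i$ be the visits of participant $i$, sorted by time |
| | Partition $V_i$ into normal-cognition visits $V_i^{N}$ and cognitively impaired visits $V_i^{I}$ |
| | Let status$_i$ indicate whether $V_i^{I}$ is non-empty |
| | Extract baseline demographics from the baseline visit (T0) |
| | if $V_i^{I} \neq \emptyset$ then |
| | Random sample of visits from $V_i^{I}$ |
| | Random sample of the remaining visits from $V_i^{N}$, for three visits in total |
| | else |
| | Random sample of three visits from $V_i^{N}$ |
| | end if |
| | Let $x_i$ be the disease and NPS statuses from the sampled visits, in chronological order (T1, T2, T3) |
| | Let $r_i$ be the baseline demographics combined with $x_i$ |
| | Append $r_i$ to $B_b$ |
| | end for |
| | Append $B_b$ to $\mathcal{B}$ |
| | end for |
| 3 | Return $\mathcal{B}$ |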
Third, the total number of observations per individual was small, with an average of 4.7 visits per participant. Moreover, although selecting three visits per participant ensures comparability, it inevitably leads to the exclusion of other available records. To address these limitations, we employed a bootstrapping procedure that involved repeating the random sampling with replacement 100 times. To illustrate, if a participant had two visits classified as cognitively unimpaired and five visits with cognitive impairment, we randomly selected a combination of three visits, such as one from the unimpaired and two from the impaired periods, or vice versa, sorted chronologically to form one data point in a bootstrap iteration. In another iteration, a different combination of visits, such as three visits from the impaired period, can be extracted. This process allowed us to fully use the available data by including different combinations of visits across iterations. The detailed sampling procedure is outlined in Algorithm 1.
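The sampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's implementation: each visit is reduced to a hypothetical `(time, impaired)` tuple, every participant is assumed to have at least three visits, the three visits in one draw are distinct, and the way the number of impaired visits is chosen for impaired participants is a guess consistent with Table I (at least one impaired visit per draw).

```python
import random

def sample_visits(visits, rng, n_select=3):
    """Pick n_select visits at random and return them in chronological order.

    `visits` is a list of (time, impaired) tuples for one participant.
    """
    normal = [v for v in visits if not v[1]]
    impaired = [v for v in visits if v[1]]
    if impaired:
        # Assumed rule: draw at least one impaired visit, and no more normal
        # visits than the participant actually has.
        k_min = max(1, n_select - len(normal))
        k_max = min(n_select, len(impaired))
        k = rng.randint(k_min, k_max)
        chosen = rng.sample(impaired, k) + rng.sample(normal, n_select - k)
    else:
        chosen = rng.sample(normal, n_select)
    return sorted(chosen)  # tuples sort by time first

def bootstrap(participants, n_iter=100, seed=0):
    """Repeat the per-participant draw n_iter times."""
    rng = random.Random(seed)
    return [
        {pid: sample_visits(v, rng) for pid, v in participants.items()}
        for _ in range(n_iter)
    ]

# Hypothetical participants: p1 stays unimpaired, p2 converts at time 2.
participants = {
    "p1": [(0, False), (1, False), (2, False), (3, False)],
    "p2": [(0, False), (1, False), (2, True), (3, True), (4, True)],
}
samples = bootstrap(participants)
```

Each element of `samples` plays the role of one bootstrapped dataset; different iterations pick different visit combinations for the same participant, which is how records left out of one draw can still contribute to another.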
C. Identification of causal diagram
Our datasets included 63 variables, with each variable measured separately at multiple time points. Moreover, several unmeasurable or unobserved variables may have influenced the relationships among observed variables. To address both the sparsity of the dataset and the potential presence of hidden confounders, we generated causal graphs using the FCI algorithm. Given that our variables were binary (e.g., whether a comorbidity or NPS exists) or categorical (e.g., age groups), we used chi-square tests to measure associations between discrete variables [33, 34]. For each set of tested variables, samples containing missing values were excluded. We generated a total of 100 causal diagrams, each derived from a separate bootstrapped sample. A final causal diagram was then constructed by retaining only the relationships that appeared in at least 50 out of the 100 bootstrapped diagrams [19, 35]. The 50% threshold was chosen as a minimal consensus level under a majority voting criterion on binary values (i.e., presence or absence of links), ensuring that only relationships consistently identified in at least half of the resampled datasets were retained as stable causal links.
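The voting step can be sketched as below. The example assumes each bootstrapped diagram has already been reduced to a set of directed `(cause, effect)` pairs, which ignores the richer edge types that FCI can output; the variable names are hypothetical.

```python
from collections import Counter

def consensus_edges(diagrams, threshold=0.5):
    """Keep edges present in at least `threshold` of the bootstrapped diagrams.

    Each diagram is a set of (cause, effect) tuples.
    """
    counts = Counter(edge for diagram in diagrams for edge in set(diagram))
    min_count = threshold * len(diagrams)
    return {edge for edge, count in counts.items() if count >= min_count}

# Hypothetical bootstrapped diagrams: one edge appears in all 100 diagrams,
# one in 60, and one in only 40.
diagrams = (
    [{("apathy_T1", "CI_T3"), ("age", "CI_T3")}] * 60
    + [{("age", "CI_T3"), ("depression_T1", "CI_T3")}] * 40
)
stable = consensus_edges(diagrams)
```

Here the edge found in only 40 of the 100 diagrams is dropped, while the other two are retained.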
Then, we aggregated the identified causal relationships by incorporating temporal information to capture sequential patterns, irrespective of specific time points. An aggregated relationship was defined as the presence of a causal link between two variables at any two time points. For example, if apathy observed at the first time point had a directed edge toward a diagnosis of cognitive impairment at the last time point, this was interpreted as a causal relationship between apathy and cognitive impairment. We refer to this representation as a sequence diagram.
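A small sketch of this aggregation is given below. It assumes a hypothetical naming scheme in which time-indexed variables carry a `_T<k>` suffix (for example `apathy_T1`), while baseline variables such as age have no suffix.

```python
def to_sequence_edges(edges):
    """Collapse time-indexed edges such as ('apathy_T1', 'CI_T3') into
    time-free edges such as ('apathy', 'CI')."""
    def strip_time(name):
        base, sep, suffix = name.rpartition("_T")
        return base if sep and suffix.isdigit() else name
    return {(strip_time(cause), strip_time(effect)) for cause, effect in edges}

sequence = to_sequence_edges({
    ("apathy_T1", "CI_T3"),
    ("apathy_T2", "CI_T3"),
    ("age", "CI_T2"),
})
```

Both apathy edges collapse into a single apathy-to-CI link, which is the time-free relationship the sequence diagram reports.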
D. Estimation of causal effect
In this study, we investigated how the effects of causal variables of cognitive impairment evolve over time and how they differ by sex. Using the sequence diagram, we first estimated the effect sizes of the identified relationships over time. To investigate dynamic effects over five follow-up periods, we included only cognitively impaired participants who had at least five visits prior to diagnosis and cognitively unimpaired participants who had at least five total visits. For the cognitively impaired group, we tracked the five follow-up periods leading up to diagnosis, whereas for the unimpaired group, we focused on their most recent five visits. Then, we measured the causal effects of relevant variables at each period.
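The eligibility rule above can be sketched as follows, again representing each visit as a hypothetical `(time, impaired)` tuple and assuming that no participant reverts from impaired to normal cognition. For impaired participants the pool is the set of pre-diagnosis visits; for unimpaired participants it is simply all visits.

```python
def select_followups(visits, n_periods=5):
    """Return the visits used for effect estimation, or None if ineligible.

    `visits` is a chronologically sorted list of (time, impaired) tuples.
    Only visits with normal cognition are eligible: for impaired participants
    these are the pre-diagnosis visits, for unimpaired participants all visits.
    """
    pool = [v for v in visits if not v[1]]
    if len(pool) < n_periods:
        return None
    return pool[-n_periods:]  # the most recent n_periods eligible visits

# Hypothetical participants.
converter = [(t, t >= 6) for t in range(8)]       # diagnosed at time 6
short_history = [(0, False), (1, False), (2, True)]
selected = select_followups(converter)
excluded = select_followups(short_history)
```

The converter contributes the five visits immediately preceding diagnosis, while the participant with too few pre-diagnosis visits is excluded.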
Effect sizes were calculated using DML, which provides an asymptotically unbiased and consistent estimation of the overall ATE. To address known limitations of DML and enhance the robustness of our results [23, 27], we incorporated an ensemble of three ML models that showed promising performance across various applications [27, 36]: random forest, gradient boosting machine, and XGBoost. Missing values were handled internally by the three algorithms, either by learning optimal default directions for missing values at each split or by treating missingness as an additional split option during training. The estimated effects were averaged across participants. While cross-fitting in DML typically uses two to five folds, prior research on DML has shown that there is only a minimal difference between the performance across this range [27, 36]. Therefore, for efficiency, we estimated the effects with two folds.
To evaluate the robustness of the estimates against potential violations of underlying assumptions, we conducted two representative sensitivity analyses. First, we performed a refutation test with placebo treatment by replacing the original treatment variable (i.e., a hypothesized cause of cognitive impairment) with a randomly generated variable. The test evaluated how likely the resulting simulated estimates were under the theoretical null distribution, which is assumed to have a mean of zero, and thus whether the observed effect could result from spurious correlations. Second, we simulated the presence of an unobserved confounder that was correlated with both the treatment and the outcome and examined whether the estimated treatment effect changed. We assessed the probability that the treatment effect estimated from the original dataset was drawn from this simulated distribution. If the placebo estimates are not statistically different from zero and the estimates under the simulated confounder are not statistically different from the original estimate, the estimated effect is considered robust.
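The logic of the placebo test can be illustrated with a simplified sketch. It is not the study's procedure: a plain difference in means stands in for the DML estimator, the placebo treatment is generated by shuffling the original treatment labels, and the data are synthetic.

```python
import random
from statistics import mean

def diff_in_means(treatment, outcome):
    """Stand-in effect estimator: mean outcome of treated minus untreated."""
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    control = [y for t, y in zip(treatment, outcome) if t == 0]
    return mean(treated) - mean(control)

def placebo_effects(treatment, outcome, n_sim=200, seed=0):
    """Re-estimate the effect with the treatment replaced by a shuffled copy."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_sim):
        placebo = treatment[:]
        rng.shuffle(placebo)  # breaks any link between treatment and outcome
        effects.append(diff_in_means(placebo, outcome))
    return effects

# Synthetic data (assumed): 60% of treated and 20% of untreated have the outcome.
treatment = [1] * 50 + [0] * 50
outcome = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
original = diff_in_means(treatment, outcome)
placebo = placebo_effects(treatment, outcome)
placebo_mean = mean(placebo)
```

If the estimator is behaving sensibly, the placebo estimates scatter around zero while the original estimate stays clearly away from zero.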
IV. Experimental Results
Table II summarizes the distribution of participant demographics and comorbidities at their last recorded visit.
TABLE II.
Summary description of participants
| Variables | Cognitively normal | Cognitively impaired |
|---|---|---|
| Number of participants | 4978 | 981 |
| Average number of visits | 4.43 | 6.49 |
| Sex, n (%) | | |
| Female | 2536 (50.94%) | 486 (49.54%) |
| Male | 2442 (49.06%) | 495 (50.46%) |
| Age, n (%) | | |
| < 70 years old | 1637 (32.88%) | 29 (2.96%) |
| 70–80 years old | 1540 (30.94%) | 150 (15.29%) |
| >= 80 years old | 1796 (36.08%) | 802 (81.75%) |
| Missing | 5 (0.1%) | 0 (0%) |
| Education, n (%) | | |
| <= 12 years | 1253 (25.17%) | 431 (43.93%) |
| > 12 years | 3703 (74.39%) | 550 (56.07%) |
| Missing | 22 (0.44%) | 0 (0%) |
| Obesity, n (%) | | |
| Yes | 1665 (33.45%) | 187 (19.06%) |
| No | 3000 (60.27%) | 538 (54.84%) |
| Missing | 313 (6.29%) | 256 (26.1%) |
| Alcohol use disorder, n (%) | | |
| Yes | 225 (4.52%) | 72 (7.34%) |
| No | 4733 (95.08%) | 909 (92.66%) |
| Missing | 20 (0.4%) | 0 (0%) |
| Smoking status, n (%) | | |
| Never | 2154 (43.27%) | 465 (47.4%) |
| Former or current | 2824 (56.73%) | 516 (52.6%) |
| Missing | 0 (0%) | 0 (0%) |
| Chronic conditions, n (%) | | |
| Hypertension | 2981 (59.88%) | 813 (82.87%) |
| Dyslipidemia | 3515 (70.61%) | 839 (85.52%) |
| Diabetes | 832 (16.71%) | 264 (26.91%) |
| Atrial fibrillation | 702 (14.1%) | 279 (28.44%) |
| Congestive heart failure | 484 (9.72%) | 247 (25.18%) |
| Coronary artery disease^a | 1416 (28.45%) | 516 (52.6%) |
^a Coronary artery disease includes angina, coronary artery disease, myocardial infarction, and coronary artery bypass graft.
A. Patterns of neuropsychiatric symptoms by sex
To assess potential sex differences in NPS patterns, we examined the proportion of male and female participants who experienced each type of NPS at least once within two follow-up periods before and after the diagnosis of cognitive impairment. The rate of increase was calculated as the difference between the proportions before and after diagnosis, divided by the proportion before diagnosis.
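As a worked illustration of this rate, the following sketch computes the relative change between two proportions. The direction (after minus before) is assumed from the term "rate of increase", and the input values are hypothetical rather than taken from Fig. 1.

```python
def rate_of_increase(before, after):
    """Relative change in the proportion of participants with a symptom."""
    if before == 0:
        raise ValueError("rate of increase is undefined when the baseline proportion is 0")
    return (after - before) / before

# Hypothetical proportions: 10% before diagnosis, 25% after diagnosis.
rate = rate_of_increase(0.10, 0.25)
```

With these inputs the symptom becomes 1.5 times more frequent relative to its pre-diagnosis level, that is, a 150% increase; a negative value would indicate a decrease after diagnosis.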
Fig. 1 shows that depression and nighttime behavioral disturbances were the two most frequently observed symptoms after cognitive impairment in both sexes, and notably, they were also common prior to diagnosis. Among male participants, delusions, hallucinations, and agitation exhibited the highest rate of increase following diagnosis. In contrast, female participants showed marked increases in hallucinations, disinhibition, and motor behavior after diagnosis. Moreover, while the occurrence of appetite change slightly decreased in males after diagnosis, it increased among females with cognitive impairment. These findings suggest notable differences in the trajectories of NPS between male and female participants. However, such heterogeneity may be influenced by confounding factors that affect both NPS expression and cognitive decline, stressing the need for causality-based approaches.
Fig. 1.

Proportions of participants who experienced NPS two years before and after diagnosis of cognitive impairment.
B. Causal relationships between variables
Figs. 2 and 3 illustrate the original causal diagrams derived from bootstrapped samples and the aggregated relationships among variables based on sequential patterns, respectively.
Fig. 2.

Causal diagrams derived from bootstrapped samples by sex.
Fig. 3.

Sequence diagrams by sex.
Overall, the two groups exhibited similar sequence diagrams, with age group, education level, and the occurrence of apathy playing key predictive roles in cognitive impairment. However, the dynamic relationships between apathy and cognitive impairment showed slight differences between the sex groups (see Fig. 2). In males, apathy at one follow-up period was associated with cognitive impairment at the immediately subsequent period. In contrast, in females, apathy showed associations with cognitive impairment across two consecutive follow-ups, suggesting a more persistent temporal relationship between apathy and later cognitive decline. This pattern suggests that apathy in earlier periods may have a longer-term causal association with subsequent cognitive impairment in females than in males.
Additionally, some sex-specific differences were observed in relationships between comorbidities and cognitive impairment. Among male participants, diabetes and coronary artery disease appeared to increase the risk of cognitive impairment. Comorbidities detected at earlier visits were associated with cognitive impairment at multiple subsequent follow-ups, suggesting that these comorbidities may contribute to the onset of cognitive impairment over the long term in males (see Fig. 2).
C. Causal effects over time
To examine the effects of each variable on the diagnosis of cognitive impairment, we estimated the causal effects. The results of the refutation tests remained robust across all follow-up periods and for all causal variables identified from the diagrams, for both male and female participants. The placebo refutation test produced non-significant results (p=0.18–0.48), indicating that the estimator does not yield spurious treatment effects when no true causal relationship exists. Similarly, the unobserved confounder refutation test yielded p-values between 0.16 and 0.98, suggesting that the estimated treatment effects are robust to potential violations of the unconfoundedness assumption.
Fig. 4 illustrates the heterogeneous effects of each factor over time, along with 95% confidence intervals. Among all variables, apathy exhibited the greatest peak effect over time in both male and female participants. This suggests that individuals who experienced apathy were more likely to develop cognitive impairment in the future compared to those who did not. In particular, the positive effect of apathy peaked during the third follow-up period, indicating that the presence of apathy within the recent three periods may be a strong predictor of subsequent cognitive impairment diagnosis. Among male participants, a significant positive effect of apathy was also observed during the period immediately preceding diagnosis.
Fig. 4.

Dynamic effects of predictors by sex
Among male participants, diabetes, coronary artery disease, and age also emerged as significant predictors of cognitive impairment diagnosis. In particular, the predictive effects of diabetes and coronary artery disease reached their peaks two follow-up periods before diagnosis. This suggests that when a male participant was diagnosed with diabetes or coronary artery disease, there was an increased risk of developing cognitive impairment within the next two follow-up periods. The effect of age fluctuated over time but reached its highest point during the period immediately preceding the diagnosis.
Among female participants, unlike in males, education had a negative effect, indicating that higher educational attainment was associated with a lower risk of cognitive impairment. In contrast, age showed minimal predictive value in diagnosing cognitive impairment among females. This suggests that the effect of age may be confounded or mediated by other factors such as comorbidities or NPS, implying that age alone has limited independent causal effect. However, this effect size may vary depending on the classification of age groups or the specific type of cognitive impairment being considered [12].
Fig. 5 presents the dynamic effect of apathy on the diagnosis of cognitive impairment using a time-series fan plot. The plot illustrates the distribution of observed effects across percentiles, with different percentile ranges represented by shaded bands. Darker shades correspond to more probable outcomes, while lighter shades indicate less probable ones. The darkest bands reflect the central range (45th to 55th percentiles), where most observations are concentrated. In both male and female participants, the occurrence of apathy within three follow-up periods before diagnosis shows the strongest effect on cognitive impairment. Among male participants, the effect generally increases over time but also fluctuates, with greater variance at each period. This implies that, for a substantial proportion of males (within the 45th to 55th percentile range), the presence of apathy predicts a cognitive impairment diagnosis within two to four follow-up periods. However, for others, apathy may have little or no predictive value. Meanwhile, the highly concentrated distributions observed in the period immediately preceding diagnosis indicate a clear and consistent positive effect. This suggests that apathy commonly emerges in the period shortly preceding a cognitive impairment diagnosis.
Fig. 5.

Dynamic effects of apathy by sex.
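The shaded bands of such a fan plot can be derived from percentile cut points, as in the sketch below. Only the 45th to 55th percentile band is stated in the text; the wider bands and the input values are assumptions for illustration.

```python
from statistics import quantiles

def percentile_bands(effects, bands=((45, 55), (25, 75), (5, 95))):
    """Return {(lo, hi): (lower value, upper value)} for each percentile band."""
    # With n=100 there are 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(effects, n=100, method="inclusive")
    return {(lo, hi): (cuts[lo - 1], cuts[hi - 1]) for lo, hi in bands}

# Hypothetical effect estimates at one follow-up period: 0.00, 0.01, ..., 1.00.
effects = [i / 100 for i in range(101)]
bands = percentile_bands(effects)
central_low, central_high = bands[(45, 55)]
```

Computing these bands separately for each follow-up period and shading the narrowest band darkest yields the fan shape; narrow bands correspond to periods in which the estimated effects are tightly concentrated.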
In contrast, female participants exhibited a distinct pattern, with limited effects observed within the two periods immediately preceding diagnosis and a peak effect occurring three follow-up periods before diagnosis. This suggests that, in females, a longer interval typically exists between apathy occurrence and the eventual diagnosis of cognitive impairment. Therefore, early identification and monitoring of apathy symptoms may be particularly important for females compared to males.
V. Discussion
We examined diverse patterns of NPS and their causal impact on cognitive impairment diagnoses over time, with particular attention to sex differences. Although depression and nighttime disturbances were the most commonly observed symptoms for both sexes, apathy was the only NPS that demonstrated predictive value for cognitive impairment. This aligns with previous research that has emphasized the role of apathy in cognitive decline. Martin et al. [37] found that apathy was one of the few NPS that distinguished individuals with intact cognition from those with MCI in cohorts with Parkinson’s disease. Similarly, Van Dalen et al. [38] proposed that apathy may serve as a predictor of dementia, as its presence is associated with both a higher likelihood of conversion from MCI to dementia and increased dementia severity over time. These predictive patterns were rarely observed for other NPS, such as depression, despite their frequent occurrence before and after diagnosis.
There are several possible reasons for the limited predictive roles of other NPS. First, specific NPS may act as mediators or moderators of other NPS, rather than serving as individual predictors. Depression often mediates the effect of apathy [39, 40], while anxiety can indirectly increase the risk of cognitive impairment through depression [39]. These NPS are often associated with the severity of cognitive deficits rather than directly predicting the presence of cognitive decline. Second, some NPS are prevalent in older adults in general, regardless of their cognitive status. For example, nighttime behaviors such as sleep disturbances and irritability are common among older adults with and without cognitive decline [41, 42]. Lastly, several NPS, such as hallucinations and delusions, appear in the later stages of cognitive impairment [43] and thus can hardly be considered preceding symptoms of cognitive decline.
Additionally, the effect of apathy was more pronounced over longer follow-up periods in females, underscoring the importance of early symptom monitoring in female populations. For female patients, the emergence of apathy, even if it appears transiently several years prior to cognitive complaints, should therefore be considered a significant early warning sign warranting closer follow-up. For male patients, the appearance of apathy may signal a more imminent risk of diagnosis.
Age and education level were the two other factors that played a predictive role in cognitive impairment. This finding is consistent with prior studies that have identified age and educational attainment as strong predictors of cognitive impairment onset [44]. However, our findings suggest that, while age remains a contributing factor, its independent effect on cognitive impairment may be more limited than commonly reported in the existing literature. It is more likely that age functions as an indirect predictor, potentially through pathways such as the development of comorbid conditions.
Meanwhile, diabetes and coronary artery disease were identified as causal factors of cognitive impairment only in males. This corroborates previous studies reporting a higher risk of dementia in males with diabetes compared to females with the same condition [20, 32]. Moreover, given that males are at greater risk for cardiovascular dementia than females [12, 45], coronary artery disease may serve as a particularly important predictor of cognitive impairment in males. Considering the early signals observed in the findings, males with these comorbidities may benefit from closer long-term cognitive monitoring and early preventive interventions.
A major strength of the study is the detailed evaluation of the effects of serial NPS on cognitive impairment over time, which has been understudied. The use of MCSA, a prospective study of cognitive aging and impairment with evaluations approximately every 15 months, along with a bootstrap strategy, allowed us to assess comprehensive relationships between NPS and cognitive impairment over time. Additionally, the counterfactual framework-based method enabled the exploration of dynamic effects of NPS across follow-up periods that vary by participant, identifying the time windows in which specific symptoms are most predictive.
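The bootstrap strategy mentioned above, which repeatedly samples a fixed number of visits per participant in temporal order, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's actual procedure: the function name `sample_visits`, sampling without replacement, skipping participants with fewer than `k` visits, and the toy visit records (months since baseline, apathy flag) are all hypothetical choices for this example.

```python
import random

def sample_visits(visits_by_participant, k, n_rounds, seed=0):
    """Draw n_rounds samples, each holding k visits per participant.

    visits_by_participant maps a participant id to a list of visits that
    is assumed to be sorted chronologically. Sampling indices and then
    sorting them keeps the chosen visits in temporal order.
    """
    rng = random.Random(seed)
    rounds = []
    for _ in range(n_rounds):
        sample = {}
        for pid, visits in visits_by_participant.items():
            if len(visits) < k:
                continue  # assumption: too few visits, participant skipped
            idx = sorted(rng.sample(range(len(visits)), k))
            sample[pid] = [visits[i] for i in idx]
        rounds.append(sample)
    return rounds

# Toy records: (months since baseline, apathy present). NOT study data.
visits = {
    "p1": [(0, 0), (15, 1), (30, 0), (45, 1), (60, 1)],
    "p2": [(0, 0), (15, 0), (30, 0)],
    "p3": [(0, 1), (15, 1)],
}
rounds = sample_visits(visits, k=3, n_rounds=4)
```

Because every sampled sequence has the same length, participants with different follow-up durations contribute comparably shaped inputs to the downstream analysis, and repeating the draw exposes different combinations of visits from participants with long follow-up.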
This study also has several limitations. First, although the MCSA dataset represents the Midwest population, characteristics of this group differ from other populations throughout the country. Further studies should incorporate more diverse populations in the cohort or leverage additional nationwide datasets, such as the National Alzheimer’s Coordinating Center, to increase generalizability. Second, this study grouped MCI and dementia together under the category of cognitive impairment and did not distinguish between different types of dementia. Given that the underlying mechanisms leading to dementia vary by subtype, the trajectories and associations with NPS may also differ significantly. Third, causal conclusions drawn solely from observational data can be misleading, as the statistical “causality” may overestimate or underestimate the true effects due to unmeasured confounding variables and the complexity of real-world conditions. Incorporating additional relevant factors that are closely related to NPS, such as mental health disorders, or leveraging domain knowledge as prior information, would enhance the validity and reliability of the causal inference. Lastly, NPS were reported by informants, which may lead to underreporting of symptoms that are less readily observed. The impact of reporting bias should be addressed further. These aspects are left for future research.
VI. Conclusion
This study demonstrated the potential of statistical causal modeling beyond association-based analyses by accounting for the dynamic nature of NPS and varying follow-up durations. By incorporating a bootstrap sampling strategy, statistical causal discovery, and counterfactual framework-based analysis, our model identified potential causal relationships between NPS and cognitive impairment and revealed how the predictive effects of NPS evolve over time. While these findings offer new insights into the role of NPS in the progression of cognitive impairment, further research is warranted, particularly studies involving more diverse populations, distinct subtypes of cognitive impairment, and comorbid mental health disorders.
Acknowledgment
This study was supported by NIA R01 AG068007 and NIA RF1 AG090341. The Mayo Clinic Study of Aging was supported by NIA U01 AG006786.
Contributor Information
Eunji Jeon, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, USA.
Muskan Garg, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, USA.
Xingyi Liu, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, USA.
Maria Vassilaki, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, USA.
Jennifer St. Sauver, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, USA.
Ronald C. Petersen, Department of Neurology, Mayo Clinic, Rochester, USA.
Sunghwan Sohn, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, USA.
References
- [1].World Health Organization, “Dementia,” 2025.
- [2].Roberts RO, Geda YE, Knopman DS, Cha RH, Pankratz VS, Boeve BF, et al., “The Mayo Clinic Study of Aging: Design and Sampling, Participation, Baseline Measures and Sample Characteristics,” Neuroepidemiology, 2008, 30, (1), pp. 58–69.
- [3].Garg M, Liu X, Vassilaki M, Petersen RC, Sauver JS, and Sohn S, “Navigating Sex-Specific Disease Dynamics in Incident Dementia,” (IEEE; ).
- [4].Bastin C, and Salmon E, “Early neuropsychological detection of Alzheimer’s disease,” European Journal of Clinical Nutrition, 2014, 68, (11), pp. 1192–1199.
- [5].Knopman DS, and Petersen RC, “Mild Cognitive Impairment and Mild Dementia: A Clinical Perspective,” Mayo Clinic Proceedings, 2014, 89, (10), pp. 1452–1459.
- [6].Langa KM, and Levine DA, “The Diagnosis and Management of Mild Cognitive Impairment,” JAMA, 2014, 312, (23), pp. 2551.
- [7].Forrester SN, Gallo JJ, Smith GS, and Leoutsakos J-MS, “Patterns of Neuropsychiatric Symptoms in Mild Cognitive Impairment and Risk of Dementia,” The American Journal of Geriatric Psychiatry, 2016, 24, (2), pp. 117–125.
- [8].Palmer K, Di Iulio F, Varsi AE, Gianni W, Sancesario G, Caltagirone C, et al., “Neuropsychiatric Predictors of Progression from Amnestic-Mild Cognitive Impairment to Alzheimer’s Disease: The Role of Depression and Apathy,” Journal of Alzheimer’s Disease, 2010, 20, (1), pp. 175–183.
- [9].Garg M, Hejazi S, Fu S, Vassilaki M, Petersen RC, St. Sauver J, et al., “Characterizing the progression from mild cognitive impairment to dementia: a network analysis of longitudinal clinical visits,” BMC Medical Informatics and Decision Making, 2024, 24, (1).
- [10].Liu Z, Garg M, Fu S, Sarkar S, Vassilaki M, Petersen RC, et al., “Harnessing Transfer Learning for Dementia Prediction: Leveraging Sex-Different Mild Cognitive Impairment Prognosis,” in ‘Book Harnessing Transfer Learning for Dementia Prediction: Leveraging Sex-Different Mild Cognitive Impairment Prognosis.’
- [11].Geda YE, Roberts RO, Mielke MM, Knopman DS, Christianson TJH, Pankratz VS, et al., “Baseline Neuropsychiatric Symptoms and the Risk of Incident Mild Cognitive Impairment: A Population-Based Study,” American Journal of Psychiatry, 2014, 171, (5), pp. 572–581.
- [12].Podcasy JL, and Epperson CN, “Considering sex and gender in Alzheimer disease and other dementias,” Dialogues in Clinical Neuroscience, 2016, 18, (4), pp. 437–446.
- [13].Wise EA, Rosenberg PB, Lyketsos CG, and Leoutsakos JM, “Time course of neuropsychiatric symptoms and cognitive diagnosis in National Alzheimer’s Coordinating Centers volunteers,” Alzheimer’s & Dementia (Amsterdam, Netherlands), 2019, 11, pp. 333–339.
- [14].Katon W, Pedersen HS, Ribe AR, Fenger-Grøn M, Davydow D, Waldorff FB, et al., “Effect of Depression and Diabetes Mellitus on the Risk for Dementia,” JAMA Psychiatry, 2015, 72, (6), pp. 612.
- [15].Baranova A, Zhao Q, Cao H, Chandhoke V, and Zhang F, “Causal influences of neuropsychiatric disorders on Alzheimer’s disease,” Translational Psychiatry, 2024, 14, (1), pp. 114.
- [16].Morrow CB, Kamath V, Dickerson BC, Eldaief M, Rezaii N, Wong B, et al., “Neuropsychiatric symptoms cluster and fluctuate over time in behavioral variant frontotemporal dementia,” Psychiatry and Clinical Neurosciences, 2025, 79, (6), pp. 327–335.
- [17].Shen X, Raghavan S, Przybelski SA, Lesnick TG, Ma S, Reid RI, et al., “Causal structure discovery identifies risk factors and early brain markers related to evolution of white matter hyperintensities,” NeuroImage: Clinical, 2022, 35, pp. 103077.
- [18].Glymour C, Zhang K, and Spirtes P, “Review of Causal Discovery Methods Based on Graphical Models,” Frontiers in Genetics, 2019, 10.
- [19].Yu X, Lophatananon A, Holmes V, Muir KR, and Guo H, “Investigating causal networks of dementia using causal discovery and natural language processing models,” npj Dementia, 2025, 1, (1).
- [20].Ogarrio JM, Spirtes P, and Ramsey J, “A Hybrid Causal Search Algorithm for Latent Variable Models,” JMLR workshop and conference proceedings, 2016, 52, (1938–7288 (Print)), pp. 368–379.
- [21].Shimizu S, “Statistical causal discovery: LiNGAM approach,” (Springer, 2022).
- [22].Prosperi M, Guo Y, Sperrin M, Koopman JS, Min JS, He X, et al., “Causal inference and counterfactual prediction in machine learning for actionable healthcare,” Nature Machine Intelligence, 2020, 2, (7), pp. 369–375.
- [23].Moccia C, Moirano G, Popovic M, Pizzi C, Fariselli P, Richiardi L, et al., “Machine learning in causal inference for epidemiology,” European Journal of Epidemiology, 2024, 39, (10), pp. 1097–1108.
- [24].Leist AK, Klee M, Kim JH, Rehkopf DH, Bordas SPA, Muniz-Terrera G, et al., “Mapping of machine learning approaches for description, prediction, and causal inference in the social and health sciences,” Science Advances, 2022, 8, (42).
- [25].Bastos LSL, Wortel SA, Bakhshi-Raiez F, Abu-Hanna A, Dongelmans DA, Salluh JIF, et al., “Comparing causal random forest and linear regression to estimate the independent association of organisational factors with ICU efficiency,” International Journal of Medical Informatics, 2024, 191, pp. 105568.
- [26].Hamaya R, Hara K, Manson JE, Rimm EB, Sacks FM, Xue Q, et al., “Machine-learning approaches to predict individualized treatment effect using a randomized controlled trial,” European Journal of Epidemiology, 2025, 40, (2), pp. 151–166.
- [27].Fuhr J, Berens P, and Papies D, “Estimating Causal Effects with Double Machine Learning--A Method Evaluation,” arXiv preprint arXiv:2403.14385, 2024.
- [28].Bach P, Schacht O, Chernozhukov V, Klaassen S, and Spindler M, “Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study,” in Proc. Conference, Proceedings of Machine Learning Research, 2024, pp. 1065–1117.
- [29].Krell-Roesch J, Syrjanen JA, Machulda MM, Christianson TJ, Kremers WK, Mielke MM, et al., “Neuropsychiatric symptoms and the outcome of cognitive trajectories in older adults free of dementia: The Mayo Clinic Study of Aging,” International Journal of Geriatric Psychiatry, 2021, 36, (9), pp. 1362–1369.
- [30].Antonsdottir IM, Rosenberg P, and Ismail Z, “Dementia Insights: Mild Behavioral Impairment,” Practical Neurology, 2023, SEP-OCT.
- [31].Geda YE, Schneider LS, Gitlin LN, Miller DS, Smith GS, Bell J, et al., “Neuropsychiatric symptoms in Alzheimer’s disease: Past progress and anticipation of the future,” Alzheimer’s & Dementia, 2013, 9, (5), pp. 602–608.
- [32].Marques SCS, Doetsch J, Brødsgaard A, Cuttini M, Draper ES, Kajantie E, et al., “Improving Understanding of Participation and Attrition Phenomena in European Cohort Studies: Protocol for a Multi-Situated Qualitative Study,” JMIR Research Protocols, 2020, 9, (7), e14997.
- [33].Malinsky D, and Danks D, “Causal discovery algorithms: A practical guide,” Philosophy Compass, 2018, 13, (1), e12470.
- [34].Raghu VK, Poon A, and Benos PV, “Evaluation of Causal Structure Learning Methods on Mixed Data Types,” in Proc. Conference, Proceedings of Machine Learning Research, 2018, pp. 48–65.
- [35].Shen X, Ma S, Vemuri P, Simon G, Weiner MW, Aisen P, et al., “Challenges and Opportunities with Causal Discovery Algorithms: Application to Alzheimer’s Pathophysiology,” Scientific Reports, 2020, 10, (1).
- [36].Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, et al., “Double/debiased machine learning for treatment and structural parameters,” The Econometrics Journal, 2018, 21, (1), C1–C68.
- [37].Martin GP, McDonald KR, Allsop D, Diggle PJ, and Leroi I, “Apathy as a behavioural marker of cognitive impairment in Parkinson’s disease: a longitudinal analysis,” Journal of Neurology, 2020, 267, (1), pp. 214–227.
- [38].Van Dalen JW, Van Wanrooij LL, Moll Van Charante EP, Brayne C, Van Gool WA, and Richard E, “Association of Apathy With Risk of Incident Dementia,” JAMA Psychiatry, 2018, 75, (10), pp. 1012.
- [39].Ma L, “Depression, Anxiety, and Apathy in Mild Cognitive Impairment: Current Perspectives,” Frontiers in Aging Neuroscience, 2020, 12.
- [40].Ruthirakuhan M, Herrmann N, Vieira D, Gallagher D, and Lanctôt KL, “The Roles of Apathy and Depression in Predicting Alzheimer Disease: A Longitudinal Analysis in Older Adults With Mild Cognitive Impairment,” The American Journal of Geriatric Psychiatry, 2019, 27, (8), pp. 873–882.
- [41].Da Silva RAPC, “Sleep disturbances and mild cognitive impairment: A review,” Sleep Science, 2015, 8, (1), pp. 36–41.
- [42].Leoutsakos JMS, Wise EA, Lyketsos CG, and Smith GS, “Trajectories of neuropsychiatric symptoms over time in healthy volunteers and risk of MCI and dementia,” International Journal of Geriatric Psychiatry, 2019, 34, (12), pp. 1865–1873.
- [43].Zhang NK, Zhang SK, Zhang LI, Tao HW, and Zhang G-W, “The neural basis of neuropsychiatric symptoms in Alzheimer’s disease,” Frontiers in Aging Neuroscience, 2024, 16.
- [44].Zhong T, Li S, Liu P, Wang Y, and Chen L, “The impact of education and occupation on cognitive impairment: a cross-sectional study in China,” Frontiers in Aging Neuroscience, 2024, 16.
- [45].Gannon OJ, Robison LS, Custozzo AJ, and Zuloaga KL, “Sex differences in risk factors for vascular contributions to cognitive impairment & dementia,” Neurochemistry International, 2019, 127, pp. 38–55.
