Abstract
Background:
The interactions between behavioral disturbances, chronic diseases, and Alzheimer's disease (AD) risk are not fully understood, particularly in the context of the COVID-19 pandemic.
Objective:
This study aimed to identify key demographic, behavioral, and health-related predictors of AD using machine learning approaches.
Methods:
We conducted a cross-sectional analysis of 3257 participants from the National Health and Aging Trends Study (NHATS) and its COVID-19 supplement. Predictors included demographic, behavioral, and chronic disease variables, with self-reported physician-diagnosed AD as the outcome. LASSO and random forest (RF) models identified significant predictors, and regression tree analysis examined interactions to estimate individual AD risk profiles and subgroups.
Results:
Stroke, diabetes, osteoporosis, depression, and sleep disturbances emerged as key predictors of AD in both LASSO and RF models. Regression tree analysis identified three risk subgroups: a high-risk subgroup with a history of stroke and diabetes, showing a 68% AD risk among females; an intermediate-risk subgroup without stroke but with osteoporosis and positive COVID-19 status, showing a 30% risk; and a low-risk subgroup without stroke or osteoporosis, with the lowest risk (∼6%). Female participants with both stroke and diabetes had a significantly higher AD risk than males (68% versus 10%, p = 0.029). Among participants without stroke but with osteoporosis, COVID-19 positivity was associated with an AD risk about 20 percentage points higher (30% versus 10%, p = 0.006).
Conclusions:
Machine learning effectively delineates complex AD risk profiles, highlighting the roles of vascular and metabolic comorbidities and the modifying effects of sex, osteoporosis, and COVID-19. These insights support targeted screening and early intervention strategies to improve outcomes in older adults.
Keywords: Alzheimer's disease, behavioral disturbances, chronic disease, epidemiology, machine learning
Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disorder that significantly impacts cognitive function, daily activities, and quality of life in aging populations. AD is the most common cause of dementia, accounting for approximately 60–80% of cases worldwide. 1 The prevalence of AD-related dementia continues to increase, with an estimated 152 million individuals expected to be affected worldwide by 2050. 2 Despite advances in understanding the disease process, early detection and prevention of this disease remain important public health priorities.
Epidemiological studies have identified several demographic, clinical, and psychosocial risk factors associated with AD. Age remains the strongest predictor, with prevalence doubling every five years after age 65 years. 3 Sex differences have also been reported, with a higher prevalence observed among women, likely due to longer life expectancy and hormonal influences. 4 Furthermore, racial and ethnic disparities exist in dementia risk, with non-Whites experiencing a higher burden of the disease due to socioeconomic factors, genetics, and access to healthcare.5,6
The risk of AD-related dementia may be influenced by multiple comorbid conditions. Cardiovascular diseases, such as hypertension, heart disease, and stroke, contribute to neurovascular dysfunction, increasing dementia susceptibility. 7 Metabolic disorders, including diabetes and osteoporosis, have also been linked to cognitive decline through inflammatory and vascular mechanisms. 8 In addition to being risk factors for AD, mental health conditions such as depression and anxiety can also be early symptoms of AD, emphasizing the importance of early psychological intervention. 9 Moreover, sleep disturbances such as nightmares and poor sleep quality have been linked to cognitive decline, possibly because of disruptions in the brain's clearance mechanisms.10,11
The impact of social and psychological factors, such as loneliness and social isolation, is an emerging area of interest in dementia research. Studies have demonstrated that individuals experiencing persistent loneliness are at increased risk of developing cognitive impairment, depression, and dementia due to chronic stress and neuroinflammation.12,13 The COVID-19 pandemic has further highlighted the role of social determinants in dementia progression, as infection and pandemic-related stressors have been linked to increased cognitive impairment among older adults. 14
Several studies have examined the effects of individual risk factors on the development of AD, such as cardiovascular disease, mental illness, and loneliness. However, few have examined their combined effects. 7 A better understanding of these interactions may help improve targeted prevention strategies. Sleep disorders and mental health conditions, including anxiety and depression, are increasingly recognized as contributors to dementia, yet their underlying mechanisms remain unclear, and further research is needed to establish causal relationships. Moreover, the long-term cognitive effects of COVID-19 in older individuals require further investigation to determine whether the virus acts as a direct risk factor or accelerates existing cognitive decline. 15 In addition, although AD risk is known to differ across population groups, the specific social, environmental, and biological factors responsible for these differences remain poorly understood.
There is a critical need for evidence-based interventions targeting modifiable risk factors such as sleep quality, mental health, and cardiovascular health to reduce dementia incidence. This study aimed to analyze demographic, behavioral, and chronic disease characteristics related to AD to identify key predictors that could inform personalized interventions and public health programs designed to reduce its prevalence.
Methods
Study design and participants
This study used a cross-sectional design with data from 3257 participants in the first wave of the National Health and Aging Trends Study (NHATS) and its supplementary COVID-19 dataset. Only participants who returned a complete NHATS COVID-19 questionnaire were included. The NHATS tracker file initially included 3961 participants who were eligible for the COVID-19 sample person or proxy questionnaire.15,16 NHATS is an ongoing, nationally representative panel study of Medicare beneficiaries aged 65 years and older in the U.S., based on a stratified, three-stage sampling design.11,16 Participants were selected based on U.S. county, residential zip code, and age, excluding residents of Alaska, Hawaii, and Puerto Rico. The sample is periodically updated to help researchers track national disability trends and individual changes over time. The “last month of life” interview collects information on the quality of end-of-life care. In addition, caregivers of NHATS participants are periodically surveyed through the supplemental National Study of Caregiving (NSOC). More details are available at: https://www.nhats.org/researcher/nsoc.
For this analysis, participants from the main NHATS cohort were selected from the last follow-up and matched with their corresponding records in the COVID-19 supplemental sample. The COVID-19 dataset, collected between June and December 2020, was obtained through mailed questionnaires, either self-administered or completed by proxy, as well as through the Family Members and Friends survey. This supplemental dataset included detailed information on mental health and COVID-19-related experiences. All participants provided informed consent, and the NHATS study was approved by the Johns Hopkins University Institutional Review Board. 17
Outcome
The primary outcome of interest was self-reported AD. In the NHATS dataset, this was determined based on responses to the question: “Has a doctor ever told you that you have dementia, Alzheimer's disease, or another type of memory-related disease?” Participants were classified as having AD if they or their proxy respondents reported that a physician had specifically diagnosed them with AD, rather than another form of dementia or cognitive impairment. This binary outcome variable was coded as 1 = Yes (self- or proxy-reported diagnosis of AD) and 0 = No (no reported diagnosis of AD or dementia). Participants reporting other forms of memory loss or cognitive decline without a confirmed AD diagnosis were excluded from the outcome classification to improve specificity. 18 This focused approach ensured that the predictive modeling targeted AD risk specifically, rather than dementia in general.
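The three-way coding rule above (AD, no AD, excluded) can be made explicit in a few lines. This is an illustrative sketch only: the function and its two boolean inputs are hypothetical stand-ins, not NHATS variable names, and it assumes the survey responses have already been reduced to "reported a physician diagnosis of AD" and "reported another dementia or memory-related disease".

```python
from typing import Optional


def code_ad_outcome(reported_ad: bool, reported_other_dementia: bool) -> Optional[int]:
    """Hypothetical encoding of the outcome rule described in the text.

    reported_ad: self/proxy report of a physician diagnosis of Alzheimer's disease.
    reported_other_dementia: report of dementia or another memory-related disease, not AD.
    Returns 1 (AD), 0 (no reported AD or dementia), or None (excluded from the outcome).
    """
    if reported_ad:
        return 1
    if reported_other_dementia:
        return None  # other dementia or memory loss without an AD diagnosis: excluded
    return 0


records = [(True, False), (False, False), (False, True), (True, True)]
print([code_ad_outcome(ad, other) for ad, other in records])  # [1, 0, None, 1]
```

Returning `None` rather than 0 for the excluded group keeps those participants out of both outcome classes instead of silently counting them as non-cases.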
Predictor variables
The study used a comprehensive set of demographic, behavioral, and clinical predictor variables previously associated with AD risk among older adults. These variables were selected based on their theoretical significance and availability within the NHATS and its COVID-19 supplementary dataset. 19 Demographic variables included age, sex, and race/ethnicity. Behavioral health and psychological factors encompassed self- or proxy-reported measures of sleep quality (categorized as good, fair, or poor), depressive symptoms, anxiety, loneliness, and nightmares or disturbing dreams. These behavioral indicators were derived from responses to the COVID-19 supplemental questionnaire and were coded as binary or ordinal variables as appropriate. Clinical predictors included self-reported diagnoses of chronic conditions such as heart disease, arthritis, osteoporosis, diabetes, high blood pressure (hypertension), and stroke or other cerebrovascular disease. All predictor variables were assessed for missing data, recoded for consistency, and treated as either continuous or categorical covariates in the modeling process, depending on their distribution and clinical relevance.
Predictive models
Feature selection is crucial in machine learning applications to mitigate the challenges of high-dimensional data and improve model efficiency.20,21 Least Absolute Shrinkage and Selection Operator (LASSO) regression and Random Forest (RF) models were used to identify key predictors of AD. LASSO regression was implemented for variable selection and shrinkage; its L1 penalty shrinks some regression coefficients exactly to zero, which enhances model interpretability and helps prevent overfitting.22,23 The model was fitted using a binomial family, suitable for the binary outcome (AD: Yes/No). Ten-fold cross-validation was used to select the optimal penalty parameter (lambda), balancing predictive performance against the number of retained predictors. 23 RF is an ensemble learning method that constructs multiple decision trees from bootstrap samples of the training data and combines their predictions to improve accuracy and reduce overfitting.24,25
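The LASSO step described above can be sketched as follows. This is a minimal NumPy illustration on synthetic, standardized predictors, not the authors' code: it fits an L1-penalized logistic (binomial) model by proximal gradient descent (soft-thresholding), leaves the intercept unpenalized, and picks lambda by 10-fold cross-validated log-loss. Packages such as glmnet use coordinate descent instead, and glmnet's `cv.glmnet` also reports a sparser one-standard-error choice; which choice the study used is not stated. The grid of lambda values and all data here are assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrinks toward 0 and sets entries with |z| <= t to exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_lasso_logistic(X, y, lam, n_iter=1000):
    """Minimise mean logistic loss + lam * sum(|beta_j|) by proximal gradient descent (ISTA).
    The intercept beta0 is not penalised."""
    n, p = X.shape
    beta0, beta = 0.0, np.zeros(p)
    # Step 1/L, where L = (n + ||X||_2^2) / (4n) bounds the Lipschitz constant of the gradient
    step = 4.0 * n / (n + np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta))) - y
        beta0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return beta0, beta


def mean_log_loss(X, y, beta0, beta):
    prob = np.clip(1.0 / (1.0 + np.exp(-(beta0 + X @ beta))), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob))


def cv_choose_lambda(X, y, lambdas, k=10, seed=0):
    """Return the lambda with the lowest mean held-out log-loss over k folds."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k  # balanced random fold labels
    cv_loss = []
    for lam in lambdas:
        fold_losses = []
        for f in range(k):
            train, test = folds != f, folds == f
            b0, b = fit_lasso_logistic(X[train], y[train], lam)
            fold_losses.append(mean_log_loss(X[test], y[test], b0, b))
        cv_loss.append(float(np.mean(fold_losses)))
    return lambdas[int(np.argmin(cv_loss))], cv_loss


# Synthetic example (not NHATS data): 2 informative and 3 pure-noise predictors
rng = np.random.default_rng(42)
X = rng.standard_normal((2000, 5))
eta = -1.0 + X @ np.array([1.5, -1.0, 0.0, 0.0, 0.0])
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

lambdas = [0.1, 0.03, 0.01, 0.003]
best_lam, cv_loss = cv_choose_lambda(X, y, lambdas)
b0, b = fit_lasso_logistic(X, y, lam=0.06)
print("CV-chosen lambda:", best_lam)
print("coefficients at lambda = 0.06:", np.round(b, 3))
```

At a moderate penalty the three noise coefficients are set exactly to zero while the two informative ones survive with shrunken magnitudes, which is the selection behavior the text attributes to the L1 penalty.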
For classification tasks, each tree is built by considering a random subset of predictors at each split, with the Gini index used to determine the best split. Once a sufficient number of trees has been grown, the final prediction is made by majority vote across trees.25,26 Variable importance was assessed using the permutation importance method, which measures the increase in prediction error after a variable's values are randomly permuted. 27 Variables contributing minimally or negatively to model accuracy were excluded, and the most important predictors of the outcome were retained. To enhance interpretability, a regression tree was then fitted to the selected predictors to estimate the probability of AD, providing conditional probabilities and interpretable decision rules for identifying high-risk profiles among older adults. 28
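The two RF ingredients named above, the Gini criterion for choosing a split and permutation importance, can be illustrated on toy data. This is a minimal pure-Python sketch, not the study's RF implementation: the data are synthetic, a fixed rule stands in for a fitted forest, and importance is computed on the same rows rather than on out-of-bag samples as in Breiman's original random forest.

```python
import random


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 for a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def split_impurity(x, y):
    """Size-weighted Gini impurity of the two child nodes from splitting on binary x."""
    left = [yi for xi, yi in zip(x, y) if xi == 1]
    right = [yi for xi, yi in zip(x, y) if xi == 0]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)


def error_rate(predict, rows, y):
    return sum(predict(r) != yi for r, yi in zip(rows, y)) / len(y)


def permutation_importance(predict, rows, y, j, n_repeats=20, seed=0):
    """Mean increase in error when column j is shuffled, breaking its link with y."""
    rng = random.Random(seed)
    base = error_rate(predict, rows, y)
    total = 0.0
    for _ in range(n_repeats):
        col = [r[j] for r in rows]
        rng.shuffle(col)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, col)]
        total += error_rate(predict, shuffled, y) - base
    return total / n_repeats


# Toy data (not NHATS): y follows x0 with 10% label noise; x1 is pure noise
rng = random.Random(1)
rows = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(500)]
y = [r[0] if rng.random() > 0.1 else 1 - r[0] for r in rows]
x0, x1 = [r[0] for r in rows], [r[1] for r in rows]
print("impurity after split on x0:", round(split_impurity(x0, y), 3))
print("impurity after split on x1:", round(split_impurity(x1, y), 3))


def predict(r):
    """Stand-in for a fitted forest that relies on x0 only."""
    return r[0]


print("importance of x0:", round(permutation_importance(predict, rows, y, 0), 3))
print("importance of x1:", round(permutation_importance(predict, rows, y, 1), 3))
```

The informative predictor gives the purer split and a clearly positive importance, while shuffling a predictor the model ignores leaves the error unchanged (importance exactly 0), the pattern used to drop variables that contribute little to accuracy.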
Before fitting the machine learning models (LASSO and RF), the dataset was randomly split into training and testing sets in a 70:30 ratio for model development and internal validation. Stratified sampling was used to preserve the class distribution in both sets. Model performance was assessed in the test set using the area under the receiver operating characteristic curve (AUC-ROC) to evaluate discriminative ability and calibration plots to examine the agreement between predicted probabilities and observed outcomes.24,25
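A stratified 70:30 split amounts to sampling separately within each outcome class. The function below is a hypothetical illustration (the seed and rounding rule are assumptions, not the study's settings); the class sizes are taken from Table 1 only to show that AD prevalence stays near 8% in both subsets.

```python
import random


def stratified_split(y, train_frac=0.7, seed=0):
    """Split row indices into train/test sets, sampling within each outcome class so
    that both sets keep approximately the overall class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))  # per-class training size
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


# Class sizes from Table 1: 265 AD cases and 2992 non-cases (row order is arbitrary here)
y = [1] * 265 + [0] * 2992
train, test = stratified_split(y)
for name, idx in (("train", train), ("test", test)):
    prev = sum(y[i] for i in idx) / len(idx)
    print(f"{name}: n = {len(idx)}, AD prevalence = {prev:.3f}")
```

Without stratification, a rare outcome (about 8% here) can end up noticeably over- or under-represented in a test set of roughly 980 people, which would distort the AUC and calibration estimates.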
Results
Population characteristics
Table 1 presents population characteristics of the 3257 older adults, stratified by AD status. Of the total sample, 265 individuals (8.1%) were identified as having AD. Over half of those with AD were aged 80–89 years (52%), compared with 40% in the non-AD group, and 29% of those with AD were aged 90 years or older, more than double the proportion in the non-AD group (12%). Sex and race distributions were similar across groups, with females comprising 62% of the AD group and 58% of the non-AD group, and White individuals accounting for 74% and 76%, respectively (Table 1). Good sleep quality was more commonly reported in the AD group (12%) than in the non-AD group (6.6%), whereas poor or fair sleep was reported by 88% and 93%, respectively. Loneliness was similarly prevalent in both groups (∼69–72%), but anxiety and depression were substantially less common in the AD group (57% and 52%, respectively) than in the non-AD group (80% and 72%). Chronic diseases such as heart disease (38% versus 26%), high blood pressure (82% versus 74%), arthritis (80% versus 74%), osteoporosis (47% versus 35%), and diabetes (37% versus 28%) were more prevalent among individuals with AD. Stroke was particularly elevated in the AD group (6.8%) compared with those without AD (1.6%). A higher proportion of COVID-19 positivity was observed among those with AD (5.4%) than among those without (2.2%) (Table 1).
Table 1.
Population summary with key demographic, behavioral, and health-related variables associated with Alzheimer's disease-related dementia (N = 3257).
| Predictor | Overall, N = 3257 ᵃ | No AD, N = 2992 ᵇ | AD, N = 265 ᵇ | p-value ᶜ |
|---|---|---|---|---|
| Age group | | | | <0.001 |
| 70–79 y | 1493 / 3257 (46%) | 1441 / 2992 (48%) | 52 / 265 (20%) | |
| 80–89 y | 1342 / 3257 (41%) | 1205 / 2992 (40%) | 137 / 265 (52%) | |
| 90+ y | 422 / 3257 (13%) | 346 / 2992 (12%) | 76 / 265 (29%) | |
| Sex | | | | 0.200 |
| Female | 1887 / 3257 (58%) | 1723 / 2992 (58%) | 164 / 265 (62%) | |
| Male | 1370 / 3257 (42%) | 1269 / 2992 (42%) | 101 / 265 (38%) | |
| Race | | | | 0.400 |
| Non-White | 785 / 3257 (24%) | 716 / 2992 (24%) | 69 / 265 (26%) | |
| White | 2472 / 3257 (76%) | 2276 / 2992 (76%) | 196 / 265 (74%) | |
| Sleep quality | | | | 0.002 |
| Poor | 1093 / 3177 (34%) | 1014 / 2914 (35%) | 79 / 263 (30%) | |
| Fair | 1859 / 3177 (59%) | 1707 / 2914 (59%) | 152 / 263 (58%) | |
| Good | 225 / 3177 (7.1%) | 193 / 2914 (6.6%) | 32 / 263 (12%) | |
| Loneliness | 2041 / 2962 (69%) | 1967 / 2859 (69%) | 74 / 103 (72%) | 0.500 |
| Anxiety | 2479 / 3185 (78%) | 2329 / 2922 (80%) | 150 / 263 (57%) | <0.001 |
| Depression | 2228 / 3181 (70%) | 2091 / 2919 (72%) | 137 / 262 (52%) | <0.001 |
| Nightmares | 498 / 3079 (16%) | 474 / 2825 (17%) | 24 / 254 (9.4%) | 0.002 |
| COVID-19 | 78 / 3173 (2.5%) | 64 / 2914 (2.2%) | 14 / 259 (5.4%) | 0.001 |
| Heart Disease | 884 / 3257 (27%) | 784 / 2992 (26%) | 100 / 265 (38%) | <0.001 |
| High Blood Pressure | 2444 / 3257 (75%) | 2226 / 2992 (74%) | 218 / 265 (82%) | 0.005 |
| Arthritis | 2437 / 3257 (75%) | 2225 / 2992 (74%) | 212 / 265 (80%) | 0.043 |
| Osteoporosis | 1185 / 3257 (36%) | 1060 / 2992 (35%) | 125 / 265 (47%) | <0.001 |
| Diabetes | 939 / 3257 (29%) | 840 / 2992 (28%) | 99 / 265 (37%) | 0.001 |
| Stroke | 65 / 3257 (2.0%) | 47 / 2992 (1.6%) | 18 / 265 (6.8%) | <0.001 |
ᵃ n / N (%).
ᵇ n / N (%) among participants with non-missing data.
ᶜ Pearson's chi-squared test.
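As a worked check of the p-values in Table 1, the sketch below computes the Pearson chi-squared test for the 2×2 sex-by-AD table from the counts shown above. Which variant of the test was used is not stated; by our arithmetic the uncorrected test gives p ≈ 0.17, while the Yates continuity-corrected test (the default for 2×2 tables in R's `chisq.test`) gives p ≈ 0.20, in agreement with the tabulated 0.200 after rounding.

```python
import math


def pearson_chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-squared statistic and p-value (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)  # continuity correction: |O - E| - 0.5 in every cell
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p


# Sex by AD status from Table 1: rows = female, male; columns = AD, no AD
chi2, p = pearson_chi2_2x2(164, 1723, 101, 1269)
chi2_c, p_c = pearson_chi2_2x2(164, 1723, 101, 1269, yates=True)
print(f"uncorrected:     chi2 = {chi2:.3f}, p = {p:.3f}")
print(f"Yates-corrected: chi2 = {chi2_c:.3f}, p = {p_c:.3f}")
```

The p-value uses the identity that a chi-squared variable with 1 degree of freedom is the square of a standard normal, so its upper tail is `erfc(sqrt(x / 2))` and no statistics library is needed.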
Statistical analysis
Figure 1(a) presents the variable importance ranking for predicting the risk of AD. Stroke emerged as the most important predictor, followed by age group, indicating a strong association between advanced age and AD risk. Depression, osteoporosis, and sleep quality also made notable contributions, suggesting that mental health and sleep patterns are associated with AD. Other important predictors included race, sex, anxiety, arthritis, COVID-19, diabetes, loneliness, heart disease, nightmares, and high blood pressure. These findings highlight the complex interplay of multiple health conditions in dementia risk.
Figure 1.
Key predictors of Alzheimer's disease-related dementia: variable importance and regression analysis.
Figure 1(b) is a coefficient path plot from the LASSO-regularized regression model, showing how each predictor's coefficient changes as the strength of the L1 penalty varies. As regularization strength decreases (moving from left to right), more variables enter the model, some with strong positive or negative associations. Coefficients that change steeply once they enter the model indicate variables with a substantial influence, whereas others contribute minimally. The results indicate that certain key features, including stroke, age, and depression, are more predictive of AD than others. Both plots suggest that AD is associated with multiple interacting factors, highlighting the need for a comprehensive approach to risk assessment and prevention.
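A coefficient path like the one in Figure 1(b) comes from refitting the penalized model over a decreasing sequence of lambda values. The sketch below uses synthetic data and a small proximal-gradient fitter as a stand-in for whatever software produced the figure. It starts just above lambda_max = max_j |x_j^T (y - ȳ)| / n, the smallest penalty at which every coefficient is zero when the intercept is unpenalized; the lambda grid is an arbitrary assumption.

```python
import numpy as np


def fit_lasso_logistic(X, y, lam, n_iter=1000):
    """L1-penalised logistic regression by proximal gradient descent; intercept unpenalised."""
    n, p = X.shape
    beta0, beta = 0.0, np.zeros(p)
    step = 4.0 * n / (n + np.linalg.norm(X, 2) ** 2)  # 1 / (bound on gradient Lipschitz constant)
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta))) - y
        beta0 -= step * resid.mean()
        z = beta - step * (X.T @ resid) / n
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return beta0, beta


# Synthetic data (not NHATS): predictors with decreasing true effects, the last is noise
rng = np.random.default_rng(7)
X = rng.standard_normal((2000, 4))
eta = -1.0 + X @ np.array([1.5, -0.8, 0.3, 0.0])
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

# Smallest lambda at which the intercept-only fit is optimal (all coefficients zero)
lam_max = np.max(np.abs(X.T @ (y - y.mean()))) / len(y)
path = []
for lam in lam_max * np.array([1.01, 0.5, 0.2, 0.05, 0.01]):  # strong -> weak penalty
    _, beta = fit_lasso_logistic(X, y, lam)
    path.append(beta)
    print(f"lambda = {lam:.4f}  nonzero = {int(np.sum(beta != 0))}  beta = {np.round(beta, 3)}")
```

Reading the printed rows top to bottom mirrors reading Figure 1(b) left to right: the strongest predictor enters first, weaker ones follow, and each coefficient grows toward its unpenalized value as the penalty relaxes.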
Figure 2 presents the results of an RF analysis designed to identify significant predictors of AD. The x-axis represents the variable importance measure (vimp), which quantifies the contribution of each predictor to the classification model. The y-axis lists the potential risk factors analyzed in the study. The colored dots indicate statistical significance, with red representing significant predictors and blue denoting non-significant variables.
Figure 2.
Random forest (RF) for finding predictors that are significantly associated with AD-related dementia. Red: significantly associated, Blue: Not significant.
The results showed that stroke was the most significant risk factor for AD, reinforcing its long-established role in the development of dementia. Other significant predictors included age, osteoporosis, loneliness, high blood pressure, diabetes, depression, COVID-19, and anxiety, all of which contributed significantly to the model's predictive ability (Figure 2). These findings were consistent with the LASSO regression results. Additionally, the significance of depression, loneliness, and anxiety underscores the growing recognition of psychological well-being as a dementia risk factor. By contrast, race, nightmares, heart disease, and arthritis did not show a significant association with AD in this analysis (Figure 2). Although these factors have been examined in previous studies, their lack of significance in this model suggests that their impact may be less pronounced once other risk factors are accounted for. Overall, these findings illustrate the multifaceted nature of AD risk. Results from both the LASSO regression and RF models were consistent, identifying similar predictors and directional associations with AD risk.
Figure 3 presents a regression tree model that stratified the probability of AD by chronic conditions, behavioral health factors, and demographic characteristics. Stroke was the primary splitting variable (Node 1, p < 0.001), indicating that a history of stroke was the strongest predictor of AD. Among individuals with a history of stroke, the next most important predictor was diabetes (Node 2, p = 0.048). Within this group, sex (Node 3, p = 0.029) further differentiated risk levels. Specifically, females with both stroke and diabetes (Node 4, n = 16) showed the highest probability of AD, with approximately 68% classified as having AD. In contrast, males with stroke and diabetes (Node 5, n = 10) had a much lower AD probability, around 10%. Individuals with stroke but without diabetes (Node 6, n = 39) had a moderate risk of AD, with a predicted probability of about 18% (Figure 3).
Figure 3.
Regression tree for Alzheimer's disease by behavioral disturbances and chronic diseases among older adults in the United States.
In the absence of stroke, osteoporosis emerged as the most significant predictor (Node 7, p = 0.007). Individuals with neither stroke nor osteoporosis (Node 8, n = 2039) had the lowest AD probability, estimated at ∼6% (Figure 3). Among those with osteoporosis, COVID-19 status (Node 9, p = 0.006) further stratified risk. Those who tested positive for COVID-19 (Node 10, n = 28) had a predicted AD probability of approximately 32%, whereas those who tested negative (Node 11, n = 1125) had a lower, yet still elevated, risk of ∼9% (Figure 3).
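The decision rules reported for Figure 3 can be transcribed into a small function that routes a person to a terminal node. This is a hand transcription of the text above, not the fitted tree object: the probabilities are the approximate values reported, and the node sizes serve only as a consistency check (they sum to 3257, and the stroke branch sums to 65, the stroke count in Table 1).

```python
# Terminal nodes reported for Figure 3: node id -> (approximate AD probability, n)
LEAVES = {
    4: (0.68, 16),     # stroke, diabetes, female
    5: (0.10, 10),     # stroke, diabetes, male
    6: (0.18, 39),     # stroke, no diabetes
    8: (0.06, 2039),   # no stroke, no osteoporosis
    10: (0.32, 28),    # no stroke, osteoporosis, COVID-19 positive
    11: (0.09, 1125),  # no stroke, osteoporosis, COVID-19 negative
}


def tree_leaf(stroke, diabetes, female, osteoporosis, covid_positive):
    """Route one person through the reported splits and return the terminal node id."""
    if stroke:                                 # Node 1: stroke
        if diabetes:                           # Node 2: diabetes
            return 4 if female else 5          # Node 3: sex
        return 6
    if osteoporosis:                           # Node 7: osteoporosis
        return 10 if covid_positive else 11    # Node 9: COVID-19 status
    return 8


leaf = tree_leaf(stroke=True, diabetes=True, female=True, osteoporosis=False, covid_positive=False)
print(leaf, LEAVES[leaf][0])                   # 4 0.68
print(sum(n for _, n in LEAVES.values()))      # 3257 participants in total
print(sum(LEAVES[k][1] for k in (4, 5, 6)))    # 65, the stroke count in Table 1
```

Each person falls in exactly one leaf, so the leaf probability is that person's estimated AD risk; variables not on the path (for example, osteoporosis for stroke survivors) do not affect the estimate.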
The left panel of Figure 4 shows the receiver operating characteristic (ROC) curve assessing the model's ability to distinguish between older adults with and without AD. The ROC curve was constructed by plotting sensitivity (true positive rate) against 1 − specificity (false positive rate) across varying classification thresholds. The red curve represents the model's performance, while the black diagonal line corresponds to a non-discriminatory classifier (AUC = 0.50). The model achieved an area under the curve (AUC) of 0.82, reflecting strong discriminative capacity: for a randomly selected pair consisting of one AD case and one non-case, the model assigns a higher predicted risk to the case approximately 82% of the time (Figure 4).
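The pairwise interpretation of the AUC above follows from its equivalence with the Mann-Whitney statistic: the AUC is the fraction of (case, non-case) pairs in which the case receives the higher predicted risk, with ties counted as one half. The scores below are hypothetical, chosen so the result can be checked by hand.

```python
def auc_mann_whitney(scores_cases, scores_noncases):
    """AUC as the probability that a randomly chosen case scores higher than a randomly
    chosen non-case, counting ties as 1/2 (the Mann-Whitney formulation)."""
    wins = 0.0
    for s_case in scores_cases:
        for s_non in scores_noncases:
            if s_case > s_non:
                wins += 1.0
            elif s_case == s_non:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_noncases))


cases = [0.9, 0.8, 0.4]          # hypothetical predicted risks for AD cases
noncases = [0.7, 0.3, 0.2, 0.1]  # hypothetical predicted risks for non-cases
print(auc_mann_whitney(cases, noncases))  # 11 of 12 pairs correctly ordered -> 0.9166...
```

The double loop is quadratic, which is fine at the scale of a test set of about a thousand people; rank-based formulas give the same value in O(n log n).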
Figure 4.
Model performance for predicting Alzheimer's disease: ROC curve and calibration plot.
Similarly, the right panel of Figure 4 shows the calibration plot, which evaluates the agreement between predicted and observed probabilities of AD. The x-axis displays the observed AD event rate (obsRate), and the y-axis represents the predicted event rate (predRate). The black reference line indicates perfect calibration, where predicted risk equals observed risk. The red line illustrates the model's calibration fit, and the blue dots represent grouped observed rates from the dataset. The close alignment of the red calibration curve with the black reference line indicates that the model's predicted probabilities closely matched actual AD rates across the range of risk levels (Figure 4). Minor deviations at higher observed rates suggest slight underestimation of AD risk in these subgroups. Overall, the model demonstrates good discrimination and calibration in the internal test set, supporting its potential as a tool for identifying high-risk profiles for AD among older adults.
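A calibration plot of this kind is typically built by grouping individuals by predicted risk and comparing the mean predicted risk with the observed event rate in each group. The sketch below assumes equal-size bins (deciles of predicted risk) and simulated, perfectly calibrated predictions; the grouping actually used for Figure 4 is not stated. Note that Figure 4 places the observed rate on the x-axis and the predicted rate on the y-axis.

```python
import random


def calibration_table(pred, obs, n_bins=10):
    """Group individuals into n_bins near-equal-size bins of predicted risk and return
    (mean predicted risk, observed event rate) for each bin."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n = len(order)
    table = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if idx:
            table.append((sum(pred[i] for i in idx) / len(idx),
                          sum(obs[i] for i in idx) / len(idx)))
    return table


# Simulated, perfectly calibrated predictions (not NHATS data)
rng = random.Random(3)
pred = [rng.uniform(0.0, 0.5) for _ in range(20000)]
obs = [1 if rng.random() < p else 0 for p in pred]
for mean_pred, obs_rate in calibration_table(pred, obs):
    print(f"predicted {mean_pred:.3f}  observed {obs_rate:.3f}")
```

With well-calibrated predictions the two columns agree up to sampling noise; in a real test set the highest-risk bins contain few cases, so their observed rates are noisier, which is one reason deviations tend to appear at the upper end.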
Discussion
The study findings provide valuable insight into the risk factors associated with AD, highlighting the complex interplay among demographic, behavioral, and chronic disease factors. In an analysis of 3257 older adults in the U.S. during the COVID-19 pandemic, we found significant associations between several predictors and AD. Stroke, osteoporosis, diabetes, and COVID-19 were strongly associated with AD among the study participants, while other predictors showed associations of varying strength. Additionally, the machine learning models identified low-, intermediate-, and high-risk subgroups for AD within the study population. However, because of the complex data structure and intercorrelation among predictors, the independent strength of each association cannot be precisely estimated.
Consistent with previous research, age emerged as a significant predictor of AD.2,29 The proportion of participants with AD increased significantly with age, particularly in the 80–89 and 90+ year groups of older adults during the COVID-19 pandemic in the United States. This aligns with well-established evidence that age is the most significant risk factor for dementia.30,31 Additionally, the higher prevalence of comorbid conditions such as heart disease, high blood pressure, arthritis, osteoporosis, diabetes, and stroke in the AD group suggests that these conditions may exacerbate or contribute to the risk of dementia. These findings are consistent with studies indicating that cardiovascular and metabolic diseases are significant risk factors for cognitive decline. 32 The higher prevalence of AD among women has been widely documented, with women accounting for approximately two-thirds of AD cases. 33 The decline in estrogen levels after menopause has been linked to neurodegenerative processes, increasing women's susceptibility to AD. 4
Additionally, women tend to exhibit faster disease progression, suggesting sex-specific neurobiological mechanisms that warrant further investigation. 34 Our study also found that participants who reported COVID-19 had a higher prevalence of AD-related dementia than those who did not. In addition, the risk of developing AD-related dementia is nearly twice as high for African Americans and 1.5 times as high for Hispanics as for White older adults. 35 High rates of cardiovascular disease, diabetes, hypertension, and obesity all contribute to these disparities. 36 Minorities often experience delayed diagnoses and low participation in clinical trials, which limits the generalizability of research findings.34,36 However, in our study, race was not significantly associated with AD among older adults. This result may be due to a relatively small sample size or the predominance of White participants in the study population. Consequently, the findings may not fully capture potential racial disparities. Further research with more diverse and representative samples is needed to better understand the role of race in disease risk.
Interestingly, anxiety and depression were less prevalent in the AD group than in the non-AD group. This could be due to underdiagnosis or altered symptom presentation in dementia patients, as cognitive impairment may mask traditional symptoms of these psychological conditions.5,37 It may also reflect survival bias, whereby individuals with severe anxiety or depression are less likely to reach the advanced age at which AD is diagnosed. Further research is needed to understand these relationships and their implications for diagnosis and treatment. Moreover, the association between sleep quality and AD-related dementia is complex. Although patients with AD were more likely to report good sleep quality, this finding may be influenced by recall bias or the subjective nature of sleep assessments in cognitively impaired individuals. Previous studies have shown that sleep disturbances are common in dementia and other psychological disorders and may contribute to disease progression.11,38 In addition, the lower prevalence of nightmares in the AD group could suggest changes in dream recall or altered sleep architecture in dementia patients. 38
There is strong evidence linking chronic diseases to cognitive decline and AD. Prior stroke is a significant risk factor for vascular dementia, and it also increases the likelihood of developing mixed dementia (AD plus vascular dementia). 39 Poor cardiovascular health reduces cerebral perfusion, leading to chronic brain hypoxia and neurodegeneration.7,39 Midlife hypertension is a well-established risk factor for late-life cognitive decline and AD.7,39 In this study, the findings for these chronic diseases were mixed, possibly because of correlations among the covariates and their structural relationships with AD. Further research is needed to understand these relationships and their implications for the diagnosis and prevention of AD.
Osteoporosis has been increasingly recognized as a potential risk factor for AD-related dementia, with shared biological mechanisms such as chronic inflammation, oxidative stress, and hormonal dysregulation. 40 Bone loss is associated with reduced estrogen levels in aging women, which may accelerate both osteoporosis and neurodegeneration. 40 Furthermore, individuals with osteoporosis often experience falls and head trauma, which are additional contributors to cognitive decline. Diabetes is a well-documented risk factor for both vascular and neurodegenerative dementia. 41 Additionally, diabetes exacerbates vascular dysfunction, increasing the likelihood of microvascular brain damage. 41 Consistent with previous research, our study reinforces that osteoporosis and a history of diabetes are significant predictors of AD risk among older adults. These conditions may contribute to cognitive decline through shared metabolic or vascular pathways. Given their potential impact, we recommend further investigation to clarify the mechanisms linking these health factors to AD.
The higher prevalence of COVID-19 in the AD group highlights the vulnerability of this population to infections. This finding is consistent with reports indicating that individuals with dementia are at increased risk of severe outcomes from COVID-19 because of factors such as age, comorbidities, and potential difficulty adhering to preventive measures. 42 This underscores the need for targeted interventions to protect people with dementia during pandemics. Emerging evidence suggests that COVID-19 infection significantly increases the risk of cognitive decline, depression, and dementia, particularly in older adults.11,43 The neuroinflammatory response triggered by SARS-CoV-2 may accelerate pathological processes associated with AD, such as amyloid-beta aggregation and tau hyperphosphorylation. 44 Additionally, post-COVID cognitive impairment (i.e., “brain fog”) has been reported, raising concerns about long-term neurodegenerative consequences.45,46
The regression tree analysis offers a novel, interpretable, data-driven approach to identifying high-risk profiles for AD among older adults. The tree structure revealed distinct combinations of demographic and clinical variables that significantly stratify AD risk. Most notably, stroke emerged as the primary and strongest predictor, reinforcing its well-established role in the pathogenesis of AD and other forms of dementia. Among individuals with a history of stroke, the presence of diabetes further stratified AD risk, and sex played a modifying role: females with both stroke and diabetes exhibited the highest probability of AD (approximately 68%), suggesting a compounded vulnerability within this subgroup. For stroke survivors without diabetes, the risk of AD remained elevated but was comparatively lower (approximately 18%), while males with stroke and diabetes had an even lower AD probability (∼10%), underscoring the importance of sex-specific patterns in disease susceptibility. These findings emphasize the interaction between vascular, metabolic, and biological sex factors, highlighting the need for targeted cognitive screening among older women with multiple comorbidities. 47
Among individuals without a stroke history, the presence of osteoporosis significantly differentiated risk. While those with neither stroke nor osteoporosis showed the lowest AD probability (∼6%), the risk increased among those with osteoporosis, particularly those who also tested positive for COVID-19. In this subgroup, the probability of AD reached approximately 32%, suggesting a potential compounding effect of COVID-19 in already vulnerable individuals with chronic conditions such as osteoporosis. This is consistent with emerging research suggesting that COVID-19 may have long-term neurological and cognitive consequences, especially in older adults.40,41
The predictive model demonstrated strong discriminatory ability for identifying older adults at risk of AD, with an AUC of 0.82, indicating good accuracy in distinguishing cases from non-cases. Previous studies of other outcomes have likewise found that machine learning models can provide accurate and reliable predictions.13,24,25 Calibration analysis showed close agreement between predicted and observed AD rates, supporting the reliability of predicted probabilities across risk levels. Minor underestimation at higher observed rates is consistent with prior dementia prediction research, likely reflecting smaller subgroup sizes and greater heterogeneity.21,25
These results suggest that the model could support early identification of high-risk individuals, enabling timely interventions targeting modifiable risk factors such as cardiovascular health, diabetes management, and lifestyle changes, which may help reduce dementia incidence.24,25 The combination of high discrimination and good calibration underscores the potential for integrating such machine learning models into research and clinical decision-making. 48 External validation in diverse populations is needed to confirm generalizability, and the inclusion of additional biomarkers could further enhance predictive performance.
From a public health perspective, these results reinforce the need for early intervention and management of chronic diseases as part of dementia prevention strategies. Targeted education and health promotion initiatives for older adults – especially those with a history of stroke, diabetes, or osteoporosis – could help mitigate modifiable risk factors. Additionally, given the interpretability of regression tree models, these results can inform the development of risk stratification tools that help clinicians identify individuals most in need of preventive services and cognitive monitoring. This study supports sustainable development by identifying key risk factors for AD, which can guide prevention and management in older adults. It also highlights potential disparities and the need for equitable healthcare. The use of machine learning illustrates an innovative approach to early detection and personalized care. Overall, the findings can inform strategies to promote healthy aging and reduce disease burden in older adult populations.
While this study provides important insights, it has several limitations. First, the cross-sectional design limits our ability to infer causality; longitudinal studies are needed to explore the temporal relationships between risk factors and AD. Second, NHATS participants are U.S. Medicare beneficiaries aged 65 and older, so the findings may not generalize to younger populations or to populations outside the United States. Third, reliance on self-reported data for some variables may introduce bias. Finally, the dataset did not include some potentially relevant health conditions and covariates. Future research should incorporate objective measures and consider the potential impact of genetic and environmental factors on AD.
Conclusion
The machine learning models provide a robust framework for understanding the complex interplay between demographic, behavioral, and chronic disease predictors and the risk of AD. By considering multiple variables simultaneously, these models offer a comprehensive view of the factors that may influence AD risk. The regression tree revealed that a history of stroke, diabetes, and osteoporosis, particularly in combination, markedly increased AD risk. Women with both stroke and diabetes and individuals with osteoporosis and a history of COVID-19 infection were identified as high-risk subgroups, with women with both stroke and diabetes at the highest risk. In contrast, those without stroke or osteoporosis had the lowest risk. These findings demonstrate the utility of interpretable machine learning models for identifying complex, high-risk profiles and support the potential integration of such tools into clinical practice to enable earlier detection and targeted prevention efforts in vulnerable populations.
Acknowledgements
The authors thank NHATS for providing access to the data. During the preparation of this work, we used ChatGPT to revise the language and improve readability. After using this tool, we reviewed and edited the content as needed and take full responsibility for the content of the publication.
Footnotes
ORCID iD: Md Roungu Ahmmad https://orcid.org/0000-0002-3886-5777
Ethical considerations: Not applicable.
Consent to participate: Not applicable.
Consent for publication: Not applicable.
Author contribution(s): Md Roungu Ahmmad: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Resources; Software; Supervision; Validation; Visualization; Writing – original draft; Writing – review & editing.
Emran Hossain: Conceptualization; Investigation; Writing – original draft; Writing – review & editing.
Md Tareq Ferdous Khan: Conceptualization; Formal analysis; Investigation; Methodology; Validation; Writing – original draft; Writing – review & editing.
Sumitra Paudel: Conceptualization; Validation; Writing – original draft; Writing – review & editing.
Funding: The authors received no financial support for the research, authorship, and/or publication of this article.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The data analyzed in this study are publicly available through the National Health and Aging Trends Study (NHATS) (https://www.nhats.org/researcher/data-access).
References
1. Blank RH. Alzheimer’s disease and other dementias: an introduction. In: Blank RH (ed.) Social & public policy of Alzheimer’s disease in the United States. Singapore: Springer, 2019, pp.1–26.
2. Nichols E, Steinmetz JD, Vollset SE, et al. Estimation of the global prevalence of dementia in 2019 and forecasted prevalence in 2050: an analysis for the Global Burden of Disease Study 2019. Lancet Public Health 2022; 7: e105–e125.
3. Livingston G, Huntley J, Sommerlad A, et al. Dementia prevention, intervention, and care: 2020 report of the Lancet Commission. Lancet 2020; 396: 413–446.
4. Ferretti MT, Iulita MF, Cavedo E, et al. Sex differences in Alzheimer disease – the gateway to precision medicine. Nat Rev Neurol 2018; 14: 457–469.
5. Mayeda ER, Glymour MM, Quesenberry CP, et al. Inequalities in dementia incidence between six racial and ethnic groups over 14 years. Alzheimers Dement 2016; 12: 216–224.
6. Arya PK, Sur K, Dhote S, et al. Integrating multi-source satellite imagery and socio-economic household data for wealth-based poverty assessment of India: a GIS and machine learning based approach. Soc Indic Res 2025; 179: 653–676.
7. Gottesman RF, Albert MS, Alonso A, et al. Associations between midlife vascular risk factors and 25-year incident dementia in the Atherosclerosis Risk in Communities (ARIC) cohort. JAMA Neurol 2017; 74: 1246–1254.
8. Biessels GJ, Despa F. Cognitive decline and dementia in diabetes mellitus: mechanisms and clinical implications. Nat Rev Endocrinol 2018; 14: 591–604.
9. Diniz BS, Butters MA, Albert SM, et al. Late-life depression and risk of vascular dementia and Alzheimer’s disease: systematic review and meta-analysis of community-based cohort studies. Br J Psychiatry 2013; 202: 329–335.
10. Bubu OM, Brannick M, Mortimer J, et al. Sleep, cognitive impairment, and Alzheimer’s disease: a systematic review and meta-analysis. Sleep 2017; 40: zsw032.
11. Ahmmad R, Burns PA, Alam A, et al. Understanding the impact of social engagement activities, health protocol maintenance, and social interaction on depression during Covid-19 pandemic among older Americans, https://www.hilarispublisher.com/open-access/understanding-the-impact-of-social-engagement-activities-health-protocol-maintenance-and-social-interaction-on-depression-during-c-96657.html (accessed 23 August 2025).
12. Cacioppo JT, Cacioppo S. The growing problem of loneliness. Lancet 2018; 391: 426.
13. You J, Zhang Y-R, Wang H-F, et al. Development of a novel dementia risk prediction model in the general population: a large, longitudinal, population-based machine-learning study. eClinicalMedicine 2022; 53: 101665.
14. Alemanno F, Houdayer E, Parma A, et al. COVID-19 cognitive deficits after respiratory assistance in the subacute phase: a COVID-rehabilitation unit experience. PLoS One 2021; 16: e0246590.
15. Iodice F, Cassano V, Rossini PM. Direct and indirect neurological, cognitive, and behavioral effects of COVID-19 on the healthy elderly, mild-cognitive-impairment, and Alzheimer’s disease populations. Neurol Sci 2021; 42: 455–465.
16. Davydow DS, Hough CL, Levine DA, et al. Functional disability, cognitive impairment, and depression after hospitalization for pneumonia. Am J Med 2013; 126: 615–624.e5.
17. Murray CJ, Ezzati M, Flaxman AD, et al. GBD 2010: a multi-investigator collaboration for global comparative descriptive epidemiology. Lancet 2012; 380: 2055–2058.
18. Patel N, Stagg BC, Swenor BK, et al. Association of co-occurring dementia and self-reported visual impairment with activity limitations in older adults. JAMA Ophthalmol 2020; 138: 756–763.
19. Bae S, Malcolm MP, Nam S, et al. Association between COVID-19 and activities of daily living in older adults. OTJR (Thorofare N J) 2023; 43: 202–210.
20. Kohavi R, John GH. Wrappers for feature subset selection. Artif Intell 1997; 97: 273–324.
21. Pini N, Lucchini M, Esposito G, et al. A machine learning approach to monitor the emergence of late intrauterine growth restriction. Front Artif Intell 2021; 4: 622616.
22. Liu J, Tu J, Xu L, et al. MRI-based radiomics signatures for preoperative prediction of Ki-67 index in primary central nervous system lymphoma. Eur J Radiol 2024; 178: 111603.
23. Friedman JH, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010; 33: 1–22.
24. Roozbeh N, Montazeri F, Farashah MV, et al. Proposing a machine learning-based model for predicting nonreassuring fetal heart. Sci Rep 2025; 15: 7812.
25. Ranjbar A, Taeidi E, Mehrnoush V, et al. Machine learning models for predicting pre-eclampsia: a systematic review protocol. BMJ Open 2023; 13: e074705.
26. Breiman L. Random forests. Mach Learn 2001; 45: 5–32.
27. Liaw A, Wiener M. Classification and regression by randomForest. R News 2002; 2: 18–22.
28. Hosmer DW Jr, Lemeshow S, Sturdivant RX. Applied logistic regression. Hoboken, NJ: John Wiley & Sons, 2013.
29. Brookmeyer R, Johnson E, Ziegler-Graham K, et al. Forecasting the global burden of Alzheimer’s disease. Alzheimers Dement 2007; 3: 186–191.
30. van der Flier WM, Scheltens P. Epidemiology and risk factors of dementia. J Neurol Neurosurg Psychiatry 2005; 76: v2–v7.
31. Stallard PJE, Ukraintseva SV, Doraiswamy PM. Changing story of the dementia epidemic. JAMA 2025; 333: 1579–1580.
32. Kivipelto M, Ngandu T, Fratiglioni L, et al. Obesity and vascular risk factors at midlife and the risk of dementia and Alzheimer disease. Arch Neurol 2005; 62: 1556–1560.
33. Martinkova J, Quevenco F-C, Karcher H, et al. Proportion of women and reporting of outcomes by sex in clinical trials for Alzheimer disease: a systematic review and meta-analysis. JAMA Netw Open 2021; 4: e2124124.
34. Viña J, Lloret A. Why women have more Alzheimer’s disease than men: gender and mitochondrial toxicity of amyloid-β peptide. J Alzheimers Dis 2010; 20: S527–S533.
35. Barnes LL, Bennett DA. Alzheimer’s disease in African Americans: risk factors and challenges for the future. Health Aff 2014; 33: 580–586.
36. Weuve J, Barnes LL, Mendes de Leon CF, et al. Cognitive aging in black and white Americans: cognition, cognitive decline, and incidence of Alzheimer disease dementia. Epidemiology 2018; 29: 151.
37. Enache D, Winblad B, Aarsland D. Depression in dementia: epidemiology, mechanisms, and treatment. Curr Opin Psychiatry 2011; 24: 461.
38. Ju Y-ES, Lucey BP, Holtzman DM. Sleep and Alzheimer disease pathology – a bidirectional relationship. Nat Rev Neurol 2014; 10: 115–119.
39. Pendlebury ST, Rothwell PM. Prevalence, incidence, and factors associated with pre-stroke and post-stroke dementia: a systematic review and meta-analysis. Lancet Neurol 2009; 8: 1006–1018.
40. Nasme F, Behera J, Tyagi P, et al. The potential link between the development of Alzheimer’s disease and osteoporosis. Biogerontology 2025; 26: 43.
41. Chatterjee S, Peters SAE, Woodward M, et al. Type 2 diabetes as a risk factor for dementia in women compared with men: a pooled analysis of 2.3 million people comprising more than 100,000 cases of dementia. Diabetes Care 2015; 39: 300–307.
42. Wang H, Li T, Barbarino P, et al. Dementia care during COVID-19. Lancet 2020; 395: 1190–1191.
43. Taquet M, Geddes JR, Husain M, et al. 6-month neurological and psychiatric outcomes in 236379 survivors of COVID-19: a retrospective cohort study using electronic health records. Lancet Psychiatry 2021; 8: 416–427.
44. Heneka MT, Golenbock D, Latz E, et al. Immediate and long-term consequences of COVID-19 infections for the development of neurological disease. Alzheimers Res Ther 2020; 12: 69.
45. Davis HE, Assaf GS, McCorkell L, et al. Characterizing long COVID in an international cohort: 7 months of symptoms and their impact. eClinicalMedicine 2021; 38: 101019.
46. Perrottelli A, Sansone N, Giordano GM, et al. Cognitive impairment after post-acute COVID-19 infection: a systematic review of the literature. J Pers Med 2022; 12: 2070.
47. Volgman AS, Bairey Merz CN, Aggarwal NT, et al. Sex differences in cardiovascular disease and cognitive impairment: another health disparity for women? J Am Heart Assoc 2019; 8: e013154.
48. Adlung L, Cohen Y, Mor U, et al. Machine learning in clinical decision making. Med 2021; 2: 642–665.




