Abstract
Objectives
Alzheimer’s disease (AD) poses a significant challenge for individuals aged 65 and older, being the most prevalent form of dementia. Although existing AD risk prediction tools demonstrate high accuracy, their complexity and limited accessibility restrict practical application. This study aimed to develop a convenient, efficient prediction model for AD risk using machine learning techniques.
Design and setting
We conducted a cross-sectional study with participants aged 60 and older from the National Alzheimer’s Coordinating Center. We selected personal characteristics, clinical data and psychosocial factors as baseline predictors for AD (March 2015 to December 2021). The study utilised Random Forest and Extreme Gradient Boosting (XGBoost) algorithms alongside traditional logistic regression for modelling. An oversampling method was applied to balance the data set.
Interventions
This study has no interventions.
Participants
The study included 2379 participants, of whom 507 were diagnosed with AD.
Primary and secondary outcome measures
Model performance was assessed using accuracy, precision, recall, F1 score and related metrics.
Results
Eleven variables were critical in the training phase: educational level, depression, insomnia, age, Body Mass Index (BMI), medication count, gender, stenting, systolic blood pressure (sbp), neurosis and rapid eye movement. The XGBoost model exhibited superior performance compared with other models, achieving an area under the curve of 0.915, sensitivity of 76.2% and specificity of 92.9%. The most influential predictors were educational level, total medication count, age, sbp and BMI.
Conclusions
The proposed classifier can help guide preclinical screening of AD in the elderly population.
Keywords: Dementia, Machine Learning, Prognosis
STRENGTHS AND LIMITATIONS OF THIS STUDY.
Unlike previous studies, this research employs the Least Absolute Shrinkage and Selection Operator and Recursive Feature Elimination with Random Forest techniques to screen the features most relevant to Alzheimer’s disease.
The model interpretability tool (ie, the Shapley Additive exPlanation value) was used to explain the direction and magnitude of the best-performing algorithm, so as to provide more targeted suggestions for preventing Alzheimer’s disease in the elderly.
The model leverages straightforward, relevant and easily identifiable risk variables.
The exclusion of several features potentially limits the model’s comprehensiveness.
The model’s generalisability and applicability across diverse populations and settings require further investigation.
Introduction
Alzheimer’s disease (AD) is a serious neurodegenerative disorder that profoundly affects the central nervous system. Its characteristic features include progressive decline in memory and cognitive functions, along with the emergence of behavioural and psychiatric symptoms. The disease was first documented by the German physician Alois Alzheimer.1 With the acceleration of global population ageing, the incidence of AD has risen sharply. It is predicted that 139 million people will be afflicted by this disease by 2050.2 This upward trend not only brings great pain to patients and their families but also imposes a heavy burden on the healthcare system. In 2023 alone, the total payments for healthcare, long-term care and hospice services for people aged 65 and above with dementia were estimated to reach as high as US$345 billion.3 In addition, research shows that most cases of AD are diagnosed in the mid-to-late stages.4 Therefore, a preclinical screening model for the early diagnosis of AD is necessary to improve early detection rates, enhance timely treatment and thereby prevent disease progression. This approach aims to improve overall quality of life, reduce the burden on the healthcare system and mitigate the societal impact of the disease.
The onset of AD at an early stage occurs several years prior to the manifestation of dementia and is not accompanied by notable cognitive symptoms,5 offering a critical window for preclinical therapies.6 The objective of machine learning (ML) is to predict events or circumstances that the computer system has not previously encountered. Predictive modelling in large-scale data settings has unveiled novel connections between disease causes and manifestations.7 In AD research, ML is instrumental in predicting risk factors by handling complex data sets and uncovering patterns beyond traditional methods. These predictive models forecast disease progression, identify high-risk individuals and enable early intervention. Many researchers have identified neurobiological, genetic and neuroimaging biomarkers associated with cognitive decline in AD,8 9 showcasing the potential for AI analysis to integrate diverse data types for predictive purposes. For instance, a meta-analysis highlighted that radiomics holds significant potential for identifying patterns associated with AD progression through the analysis of medical images, thereby playing a crucial role in the early detection and diagnosis of the disease.10 Additionally, studies have shown that certain cerebrospinal fluid (CSF) biomarkers, such as β-amyloid protein (Aβ42) or the Aβ42/40 ratio, phosphorylated τ, total τ levels and/or changes in PET biomarkers, are closely associated with disease progression.11 However, the high costs associated with neuroimaging and CSF biomarkers limit their use in routine self-assessment and preclinical settings, hindering the identification of early signs of AD in older adults. Furthermore, CSF biomarkers can only be assessed through lumbar puncture, which may impose practical constraints on their clinical application.10 In contrast, scant attention has been given to acquiring cost-effective data for AD screening.
A previous study employed multivariate logistic regression (LR) to construct a risk prediction model, incorporating factors like gender, age, socioeconomic position, health status, lifestyle and genetic risk.12 However, the absence of segmented features restricts its dependability and accuracy in actual contexts. Another scholarly inquiry applied unsupervised ML to identify subgroups susceptible to dementia using demographic survey data.13 However, to the best of our knowledge, there is currently a lack of a simple and practical risk assessment tool that integrates these factors, which is essential for identifying individuals at high risk for AD in a preclinical stage. Hence, our study focuses on constructing ML models using readily available and cost-effective data. This approach streamlines prediction processes, enhancing applicability in clinical decision-making and comprehension among families. Our data set includes risk variables related to AD development, including social characteristics, behavioural habits, medical history, psychological condition and family genetic history. Through a comparative analysis of classification techniques—LR, random forest and gradient boosting—we aim to identify cost-effective and efficient approaches to predict the onset of AD in elderly individuals.
Methods
Data source
This study strictly followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD Statement.14 We retrieved data from the 2005–2022 uniform data set (UDS), which is publicly accessible within the National Alzheimer’s Coordinating Center (NACC) database.15 It compiles data from 37 Alzheimer’s Disease Research Centers (ADRCs) and contains records for over 45 000 patients. The cohort ranges from elderly individuals without dementia to those with varying degrees of dementia. The NACC-UDS is a longitudinal data set that includes clinical information, behavioural survey responses, neuropsychological test results and other diagnostic details for each subject, totalling 725 attributes.16 Annual follow-up continued until participants discontinued or died, with researchers securing signed informed consent from all participants and coparticipants.17
Study participants
The participants were included based on the following criteria: (1) according to the National Institute on Aging-Alzheimer’s Association (NIA-AA),6 the diagnostic criteria for AD were as follows: clinical identification of dementia through standardised assessments such as the Mini-Mental State Examination (MMSE), Blessed Dementia Rating Scale or equivalent tests; deficits in two or more cognitive domains, confirmed by neuropsychological evaluations; a progressive pattern of memory decline and impairment in other cognitive functions; absence of any disturbance in the level of consciousness; age of onset between 40 and 90 years, with the majority of cases occurring after the age of 65; exclusion of other systemic diseases or brain conditions that could account for the observed cognitive deficits; and (2) they were 60 years of age or older. The exclusion criteria were as follows: (1) presence of suspected or confirmed symptoms of dementia other than Alzheimer’s, such as vascular dementia; and (2) a history of other severe systemic illnesses, including but not limited to malignant tumours, serious infections and significant hepatic or renal dysfunction. Study participants were classified into two groups: those who fulfilled the AD criteria and those who served as controls with normal cognitive function, without a history of neurodegenerative or significant neurological disorders (eg, Parkinson’s disease or stroke). It is important to note that chronic conditions such as diabetes and hypertension did not preclude individuals from participating in the study.
Patient and public involvement
Patients and/or the public were not directly involved in this study.
Candidate predictors
The candidate predictors related to AD were assessed based on predictors identified in previously published studies.18–21 We therefore selected 33 factors including sociodemographic (age, gender, marital status and education level), physical function and lifestyle factors (hearing and years of smoking); physiological indicators (systolic blood pressure (sbp), diastolic blood pressure and Body Mass Index (BMI)); medical history (angina pectoris (AP), atrial fibrillation (AF), stenting, heart valve replacement, congestive heart failure); mental health (anxiety and depression); sleep patterns (sleep apnoea, rapid eye movement (REM) sleep abnormalities and insomnia); chronic conditions (high blood pressure, diabetes and hypercholesterolemia); medication use (sedative-hypnotics, anticholinergics and total medication count) and genetic characteristics (cognitive impairment in at least one first-degree relative, maternal and paternal cognitive impairment). Online supplemental table S1 lists the definitions of candidate variables used for predicting AD risk in our study.
Data analysis strategy
Data set division
Data splitting was performed using the ‘train_test_split’ function in the sklearn module, dividing the data set into three subsets: 70% of the data set was used for training (including 20% for validation) and 30% for testing. The outcome classes were imbalanced, with AD participants forming the minority group. Data imbalance is a critical issue in many ML applications. Over-sampling and undersampling are two standard resampling techniques: over-sampling generates additional minority class samples, whereas undersampling removes majority class samples. In this study, we employed the adaptive synthetic (ADASYN) sampling technique22 to balance the sample sizes across classes. Its basic operating principle is to assign weights to different minority class samples in order to generate different amounts of synthetic data for each sample.23 In addition, compared with traditional over-sampling techniques, the new data in ADASYN are synthetically generated rather than simply duplicated from the original data, which adds more diversity to the data set and appears more realistic.24 If the control group were undersampled to the same number as the cases, a large amount of valuable original data might be discarded. Conversely, ADASYN can generate more samples near the boundary between the two classes, which helps to reduce the learning bias introduced by the unbalanced data distribution. Moreover, moderately increasing the number of AD case samples can offer the model more learning opportunities to capture AD-related features. As demonstrated in previous research,25 ADASYN has proven effective in balancing AD data sets and significantly enhancing model performance. Therefore, in the proposed models, we employed the ADASYN oversampling technique to address the problem of an imbalanced data set.
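The weighting principle behind ADASYN can be illustrated with a minimal, standard-library sketch. It is not the implementation used in this study: the function names, the toy coordinates, the neighbourhood size k and the balance parameter beta are all assumptions chosen for illustration, and the interpolation step is simplified. Each minority sample receives a weight proportional to the share of majority class points among its k nearest neighbours, and the number of synthetic samples requested for it follows that weight.

```python
import math
import random


def adasyn_allocation(minority, majority, k=3, beta=1.0):
    """Return how many synthetic points to request for each minority sample."""
    samples = [(p, "minority") for p in minority] + [(p, "majority") for p in majority]
    ratios = []
    for i, point in enumerate(minority):
        # k nearest neighbours among all samples, excluding the point itself
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(point, s[0]))[:k]
        majority_count = sum(1 for _, label in neighbours if label == "majority")
        ratios.append(majority_count / k)
    total = sum(ratios)
    if total == 0:
        return [0] * len(minority)
    # total number of synthetic samples needed to reach the desired balance
    n_needed = (len(majority) - len(minority)) * beta
    return [round(r / total * n_needed) for r in ratios]


def interpolate(point, neighbour, rng):
    """Create one synthetic sample on the segment between two minority points."""
    gap = rng.random()
    return tuple(a + gap * (b - a) for a, b in zip(point, neighbour))


# Hypothetical two-feature toy data: the third minority point sits inside the majority cloud.
minority = [(0.0, 0.0), (0.2, 0.1), (2.0, 2.0)]
majority = [(2.1, 2.0), (1.9, 2.2), (2.2, 1.8), (3.0, 3.0), (3.1, 2.9), (5.0, 5.0)]
allocation = adasyn_allocation(minority, majority)
synthetic = interpolate(minority[0], minority[1], random.Random(0))
```

In this toy example the minority point surrounded by majority points receives the largest allocation, which is the behaviour described above: more synthetic data are generated near the class boundary. Because of rounding, the allocations may not sum exactly to the requested total.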
Data preprocessing
Handling missing values is an essential part of data preprocessing. After data preparation, we first excluded variables with more than 40% missing data. We then assumed that the data were missing at random (MAR) and employed MICE (Multivariate Imputation by Chained Equations) to generate five imputed data sets to address variability in missing data, which were subsequently combined into a comprehensive data set.26 Feature scaling is a method for standardising the independent features present in the data within a specific range. We employed standardisation, which converts different features into the range 0–1.27 Additionally, for non-normally distributed data, we explored robust scaling, which uses the median and the IQR for scaling.28 One-hot encoding was used to transform categorical variables into numerical variables by assigning binary numbers (0 and 1) to input parameters.
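The scaling and encoding steps can be sketched with the Python standard library. This is an illustration under assumptions rather than the study code: rescaling to the range 0–1 is shown as min-max scaling (the exact scaler used is not specified here), the quartile convention is the "inclusive" method of the statistics module, and the function names and example values are hypothetical.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]


def robust_scale(values):
    """Centre on the median and divide by the IQR, which limits the pull of outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


def one_hot(values):
    """Encode each category as a 0/1 indicator vector (categories in sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical systolic blood pressure readings with one extreme value
sbp = [110, 120, 130, 140, 229]
scaled = min_max_scale(sbp)
robust = robust_scale(sbp)
encoded = one_hot(["F", "M", "F"])
```

With min-max scaling the single extreme reading compresses the remaining values towards 0, whereas robust scaling keeps the bulk of the data within roughly one IQR of the median, which is the motivation for considering it with non-normally distributed variables.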
Feature selection
Univariate analyses, including t-tests, χ2 tests and Mann-Whitney U tests, were used to select the features with significant differences (p<0.05). Next, the Lasso (Least Absolute Shrinkage and Selection Operator) method was applied to further reduce the feature dimensionality. During Lasso regression, the optimal regularisation parameter λ was determined via 10-fold cross-validation, with the value of λ selected based on the minimum mean squared error. We set the regression coefficient threshold at 0.05 and retained only the variables with positive coefficients greater than this threshold. Lasso combines a penalty function with linear regression to enforce sparsity, shrinking the coefficients of some features to zero, thereby performing feature selection.29 Due to its efficacy in high-dimensional data, Lasso has become widely used in both ML and statistical applications.30 However, the Lasso method may have limitations when dealing with highly correlated features, such as difficulty in completely removing redundant features. To further optimise the feature selection process, the features selected through L1 regularisation were input into the Recursive Feature Elimination Random Forest (RFE-RF) classifier with 10-fold cross-validation. RFE is a wrapper-based method using a greedy algorithm, which can effectively remove redundant variables and select the optimal feature subset by deleting unimportant features in each iteration.31 In this study, the RF was used as the base classifier for RFE.
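How an L1 penalty drives coefficients to exactly zero can be shown with the soft-thresholding operator. The sketch below assumes the special case of orthonormal predictors, in which the Lasso solution is the soft-thresholded least squares estimate; the study data do not necessarily satisfy this, and the coefficient values, feature labels and penalty are hypothetical. The final step applies the retention rule as it is worded above (positive coefficients greater than 0.05).

```python
def soft_threshold(value, lam):
    """Shrink a coefficient towards zero by lam; values within [-lam, lam] become zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# Hypothetical least squares coefficients for four standardised predictors
least_squares = {"age": 0.30, "bmi": -0.12, "hearing": 0.04, "sbp": 0.09}
penalty = 0.05  # hypothetical regularisation strength

shrunk = {name: soft_threshold(beta, penalty) for name, beta in least_squares.items()}

# Retention rule as stated in the text: keep positive coefficients above the threshold
coefficient_threshold = 0.05
retained = [name for name, beta in shrunk.items() if beta > coefficient_threshold]
```

The coefficient for the hypothetical "hearing" predictor falls inside the penalty band and is set to zero, which is the sparsity effect that makes Lasso usable for feature selection.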
Model development and validation
Recent research has increasingly demonstrated the superior accuracy and efficiency of machine-learning algorithms, such as RF and Extreme Gradient Boosting (XGBoost), over traditional methods in medical classification problems.32 33 For this study, we selected RF and XGBoost as representative machine-learning models, along with the commonly used LR, to enable a comparison. XGBoost is an advanced implementation of gradient boosting that utilises an ensemble of classification and regression trees (CART).34 Each new CART is trained on the residuals of the previously trained ensemble, aiming to enhance predictive accuracy by minimising the loss function. This process, known as boosting, relies on the aggregation of multiple weak classifiers to form a strong predictive model. Each CART assigns a real score to each leaf node, and these scores are aggregated to compute the final prediction. The predictive power of XGBoost is further enhanced through the use of additive functions that assess the combined scores.34 RF is an ensemble-based classification algorithm that, like XGBoost, utilises multiple decision trees. Unlike XGBoost, which builds a sequence of trees to minimise a loss function, RF computes predictive scores by averaging the predictions of each individual tree in the ensemble.35 In RF, each tree is trained on a random subset of the data, a process known as bagging, which introduces diversity and reduces the risk of overfitting, making it suitable for complex, non-linear data analysis.36 LR employs the logistic function to model the probability of the result, presuming a linear relationship between the predictor variables and the log odds of the outcome, and it is widely used in diverse fields.37 Given its standing as a quintessential linear model for binary outcomes, it is incorporated as a benchmark.
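The idea of training each new tree on the residuals of the current ensemble can be sketched with one-split trees (stumps) and a squared-error loss. This is a simplified illustration of gradient boosting in general, not of XGBoost itself, which additionally uses second-order gradient information and regularisation; the function names, toy ages, labels, number of rounds and learning rate are all hypothetical.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep at least one point on each side
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def boost(xs, ys, n_rounds=10, learning_rate=0.5):
    """Fit stumps sequentially, each one on the residuals left by the ensemble so far."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, stumps, preds


# Hypothetical toy data: age as the single predictor, 1 = AD, 0 = control
ages = [62, 65, 70, 74, 80, 86]
labels = [0, 0, 1, 0, 1, 1]
base, stumps, fitted = boost(ages, labels)
initial_mse = mean_squared_error(labels, [base] * len(labels))
final_mse = mean_squared_error(labels, fitted)
```

Each round reduces the training error a little further, which is the sense in which many weak learners are aggregated into a stronger model. An RF, by contrast, would fit its trees independently on resampled data and average them.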
The performance of these three predictive models was compared using area under the curve (AUC), specificity, sensitivity, accuracy, precision and F1 score. The DeLong test was used to compare the AUCs of different models, and p<0.05 was considered significant. Additionally, the Brier score quantifies the average squared discrepancy between the observed and predicted outcomes, ranging from 0 to 1, where a score of 0 indicates the best possible calibration, while a score >0.3 denotes poor calibration.38 Finally, clinical utility was assessed using decision curve analysis (DCA), which shows graphically the net benefit obtained by applying the strategy of treating an individual if and only if the predicted risk is larger than a threshold, as a function of the threshold probability.39 DCA evaluates the value of the information provided by the prediction model for informing clinical decisions without requiring additional data beyond those used for model development.40
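The evaluation measures named above have simple closed forms, sketched here with the standard library. The function names and the eight toy predictions are hypothetical; the net benefit follows the usual decision curve definition, true positives per person minus false positives per person weighted by the odds of the threshold probability.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision, accuracy and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and observed outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk exceeds the threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p > threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p > threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical outcomes (1 = AD) and predicted risks for eight individuals
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

metrics = classification_metrics(tp=3, fp=1, tn=3, fn=1)
brier = brier_score(y_true, y_prob)
benefit = net_benefit(y_true, y_prob, threshold=0.5)
```

Evaluating net_benefit over a grid of thresholds and plotting the result against the threshold gives the decision curve; the "treat all" and "treat none" reference strategies are obtained by treating every individual or no individual regardless of predicted risk.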
Tuning of parameters
Hyperparameter tuning is crucial in ML model development to identify the best combination of hyperparameters that optimise model performance. We used the GridSearchCV method from the scikit-learn library for hyperparameter tuning. The GridSearchCV technique tunes hyperparameters in ML by exhaustively searching a defined parameter grid to determine the best hyperparameter combination that optimises model performance. Once the best hyperparameters are identified, the model is retrained using the entire training data set.41 The parameters used for each model are shown in online supplemental table S2. Additionally, to assess the model’s optimisation and improve its generalisation ability, we employed the k-fold cross-validation technique. This method systematically evaluates the model’s performance across all data samples by segmenting the data set into ‘k’ distinct parts. Each part serves once as a test set, while the others are used to train the model, ensuring rigorous and robust testing.42 In our study, we selected ‘k’ to be 5, meaning the data set was divided into five parts. For each cycle of cross-validation, four folds are used to train the model, and the fifth fold serves as the test set. This process is repeated five times to allow for a thorough performance evaluation of the model.
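The two mechanisms described above, exhaustive enumeration of a parameter grid and fivefold partitioning of the data, can be sketched without any ML library. This is an illustration only: the hyperparameter names and values are hypothetical and are not those listed in online supplemental table S2, and the folds are contiguous rather than shuffled or stratified.

```python
from itertools import product


def parameter_grid(grid):
    """Yield every combination of hyperparameter values in the grid."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def k_fold_indices(n_samples, k=5):
    """Yield (training_indices, held_out_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # distribute any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


# Hypothetical grid: 2 x 3 = 6 candidate settings
grid = {"max_depth": [3, 5], "learning_rate": [0.05, 0.1, 0.3]}
candidates = list(parameter_grid(grid))

# Each of 12 samples is held out exactly once across the five folds
splits = list(k_fold_indices(12, k=5))
```

A grid search evaluates every candidate setting on every split and keeps the setting with the best average held-out score, so the cost grows with the product of the grid size and k.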
Model interpretability
To gain better insight into the decision-making of our ML models, we utilised the Shapley Additive Explanations (SHAP) method to generate feature importance ranking plots. These plots visually represent each predictor’s contribution to the target variable. The magnitude of the SHAP value indicates the degree to which a feature influences the likelihood of an undesirable event.43 The ML models were implemented in Python (V.3.9) using scikit-learn (V.1.0.2). The overview of the proposed ML algorithms is shown in figure 1.
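The quantity that SHAP attributes to each feature is a Shapley value, which can be computed exactly for a very small model by averaging marginal contributions over all feature orderings. The sketch below is a conceptual illustration, not the SHAP library: it replaces absent features with a single baseline value (an assumption; SHAP averages over background data and uses tree-specific algorithms), and the toy model and inputs are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    n_features = len(x)
    contributions = [0.0] * n_features
    orders = list(permutations(range(n_features)))
    for order in orders:
        current = list(baseline)
        previous_output = predict(current)
        for feature in order:
            # switch this feature from its baseline value to its observed value
            current[feature] = x[feature]
            output = predict(current)
            contributions[feature] += output - previous_output
            previous_output = output
    return [c / len(orders) for c in contributions]


def toy_model(features):
    """Hypothetical risk score with an interaction between two features."""
    a, b = features
    return a + 2 * b + a * b


phi = shapley_values(toy_model, x=[1, 1], baseline=[0, 0])
total_effect = toy_model([1, 1]) - toy_model([0, 0])
```

The contributions sum to the difference between the prediction for the individual and the baseline prediction, and the interaction term is shared equally between the two features. The sign of each value gives the direction of the effect and its size gives the magnitude.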
Figure 1.
Flowchart describing the overall research framework. NACC, National Alzheimer’s Coordinating Center; LASSO, Least Absolute Shrinkage and Selection Operator; RFE-RF, recursive feature elimination random forest; XGBoost, extreme gradient boosting.
Statistical analysis
Baseline categorical variables were reported as numbers and proportions and compared using a χ2 test or Fisher exact test. Continuous variables were evaluated for normality using the Kolmogorov-Smirnov (K-S) test, which compares the sample data against a normal distribution with the same mean and variance, providing a p value as the result. If p>0.05, the data were deemed normally distributed and reported as means with SD, with comparisons performed using the independent-samples t-test. Conversely, if p≤0.05, the data were considered non-normally distributed, presented as medians with IQRs, and compared using the Mann-Whitney U test. In this stage, the data set without preprocessing was used.
Results
Population characteristics
The original data set encompassed 2379 individuals, with 893 (37.5%) identified as male and 1486 (62.5%) as female. The average age of participants was 70.6 years, spanning from 60 to 98 years; 507 had AD and 1872 were cognitive normal (CN) subjects. The sample size was estimated based on the established formula for cross-sectional studies, ie, $n = Z_{1-\alpha/2}^{2}\,p(1-p)/d^{2}$. Here, n represents the sample size, $Z_{1-\alpha/2}$ is the two-sided test statistic corresponding to the significance level α, p is the expected proportion and d is the margin of error. In this study, α=0.05 ($Z_{1-\alpha/2}=1.96$), with d=2%. According to the report of Alzheimer’s Disease Association,44 AD affects 10.9% (p) of people aged 65 and over. Based on these parameters, the sample size was calculated as 933. Considering a 10% non-response rate, the expected sample size was 1037. Therefore, the minimum sample size requirement was met in this study. Personal information was rigorously excluded from the data set to ensure privacy. According to the K-S test, the data for all continuous variables were non-normally distributed. The basic characteristics of all participants are shown in table 1. All participants were randomly assigned to a training set (n=1665) and a test set (n=714). Online supplemental table S3 provides descriptive statistics and analysis of variance (ANOVA) for the two data sets. ADASYN produced 1384 cases in the set to balance AD and CN patients. Online supplemental figure S1 shows the number of observations in each of the categories of the AD outcome variable in the imbalanced and balanced data sets.
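The reported sample size figures can be reproduced with a short calculation, assuming the standard single-proportion formula n = Z²p(1−p)/d² with Z=1.96 for a two-sided α of 0.05 and rounding up to whole participants. The function name is hypothetical; the inputs (p=10.9%, d=2%, 10% non-response) are the values stated in the text.

```python
import math


def required_sample_size(p, d, z=1.96):
    """Sample size for estimating a proportion p within a margin of error d."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


n_required = required_sample_size(p=0.109, d=0.02)

# Inflate for an anticipated 10% non-response rate
non_response = 0.10
n_adjusted = math.ceil(n_required / (1 - non_response))
```

Under these assumptions the calculation gives 933 participants before and 1037 after the non-response adjustment, matching the figures reported above; the 2379 participants included therefore exceed the requirement.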
Table 1.
Descriptive statistics and statistical comparisons for each variable in AD and CN patients in the unprocessed data set
| Variables | Code or range | Category | AD (n=507) | CN (n=1872) | P value |
| Demographic characteristic | | | | | |
| Age | 60–98 | – | 73.0 (12.0) | 70.0 (8.0) | <0.0001 |
| Sex | 1 | Male | 232 (45.76%) | 661 (35.31%) | <0.0001 |
| | 2 | Female | 275 (54.24%) | 1211 (64.69%) | |
| Marital status | 1 | Married | 379 (74.75%) | 1238 (66.13%) | 0.0003 |
| | 2 | Other | 128 (25.25%) | 634 (33.87%) | |
| Educational level | 1–25 | – | 16.0 (6.0) | 16.0 (2.0) | <0.0001 |
| Family history of cognitive impairment | | | | | |
| At least one family member at level 1* has a cognitive impairment | 0 | No | 209 (41.22%) | 745 (39.80%) | 0.5961 |
| | 1 | Yes | 298 (58.78%) | 1127 (60.20%) | |
| Mother has cognitive impairment | 0 | No | 333 (65.68%) | 1094 (58.44%) | 0.0037 |
| | 1 | Yes | 174 (34.32%) | 778 (41.56%) | |
| Father has cognitive impairment | 0 | No | 413 (81.46%) | 1475 (78.79%) | 0.2097 |
| | 1 | Yes | 94 (18.54%) | 397 (21.21%) | |
| Psychiatric disorder and sleep disorder | | | | | |
| Depression | 0 | No | 395 (77.91%) | 1765 (94.28%) | <0.0001 |
| | 1 | Yes | 112 (22.09%) | 107 (5.72%) | |
| Anxiety | 0 | No | 466 (91.91%) | 1832 (97.38%) | <0.0001 |
| | 1 | Yes | 41 (8.09%) | 49 (2.62%) | |
| Sleep apnoea | 0 | No | 429 (84.62%) | 1584 (84.62%) | 1.0000 |
| | 1 | Yes | 78 (15.38%) | 288 (15.38%) | |
| REM | 0 | No | 464 (91.52%) | 1839 (98.24%) | <0.0001 |
| | 1 | Yes | 43 (8.48%) | 33 (1.76%) | |
| Insomnia | 0 | No | 464 (91.52%) | 1616 (86.32%) | 0.0023 |
| | 1 | Yes | 43 (8.48%) | 256 (13.68%) | |
| Health history | | | | | |
| Angina pectoris | 0 | No | 491 (96.84%) | 1831 (97.81%) | 0.2724 |
| | 1 | Yes | 16 (3.16%) | 41 (2.19%) | |
| Atrial fibrillation | 0 | No | 485 (95.66%) | 1782 (95.19%) | 0.7463 |
| | 1 | Yes | 22 (4.34%) | 90 (4.81%) | |
| Stenting | 0 | No | 468 (92.31%) | 1807 (96.53%) | 0.0001 |
| | 1 | Yes | 39 (7.69%) | 65 (3.47%) | |
| Congestive heart failure | 0 | No | 499 (98.42%) | 1861 (99.41%) | 0.0523 |
| | 1 | Yes | 8 (1.58%) | 11 (0.59%) | |
| Heart valve replacement | 0 | No | 501 (98.82%) | 1852 (98.93%) | 1.0000 |
| | 1 | Yes | 6 (1.18%) | 20 (1.07%) | |
| Hearing | 0 | No | 405 (79.88%) | 1511 (80.72%) | 0.7206 |
| | 1 | Yes | 102 (20.12%) | 361 (19.28%) | |
| Lifestyle and health measures | | | | | |
| sbp/mm Hg | 87–229 | – | 138.0 (24.0) | 132.0 (23.0) | <0.0001 |
| dbp/mm Hg | 36–120 | – | 78.0 (14.0) | 77.0 (13.0) | 0.1694 |
| Years of smoking | 0–66 | – | 0.0 (8.0) | 0.0 (10.0) | 0.0635 |
| BMI | 16.3–56.1 | – | 25.8 (6.3) | 26.6 (6.4) | 0.0016 |
| Chronic disease and medication history | | | | | |
| High blood pressure | 0 | No | 261 (51.48%) | 1090 (58.23%) | 0.0076 |
| | 1 | Yes | 246 (48.52%) | 782 (41.77%) | |
| Hypercholesterolemia | 0 | No | 235 (46.35%) | 947 (50.59%) | 0.1005 |
| | 1 | Yes | 272 (53.65%) | 925 (49.41%) | |
| Diabetes | 0 | No | 441 (86.98%) | 1699 (90.76%) | 0.0144 |
| | 1 | Type 1 | 5 (0.99%) | 6 (0.32%) | |
| | 2 | Type 2 | 61 (12.03%) | 167 (8.92%) | |
| Sedative-hypnotics | 0 | No | 417 (82.25%) | 1606 (85.79%) | 0.0557 |
| | 1 | Yes | 90 (17.75%) | 266 (14.21%) | |
| Anticholinergics | 0 | No | 490 (96.65%) | 1827 (97.60%) | 0.3016 |
| | 1 | Yes | 17 (3.35%) | 45 (2.40%) | |
| Drugs taken | 0–28 | – | 7.0 (6.0) | 6.0 (6.0) | <0.0001 |
*Level 1 refers to first-degree relatives, including parents, siblings and children.
AD, Alzheimer’s disease; BMI, Body Mass Index; CN, cognitive normal; dbp, diastolic blood pressure; REM, rapid eye movement; sbp, systolic blood pressure.
Feature selection
In the feature selection stage, univariate analysis detailed in table 1 identified 15 variables significantly correlated with AD risk (p<0.05), which were then earmarked for further selection. LASSO regression was then applied, with an optimal λ value of 4.3, as shown in figure 2. The model retained 13 features, with their importance ranked in online supplemental figure S2. Finally, the RFE-RF method was used to refine the feature set. As shown in online supplemental figure 3A, the model achieved the highest accuracy when 11 features were retained. These 11 features—educational level, depression, insomnia, age, BMI, medication count, gender, stenting, sbp, neurosis and REM—were ultimately selected for ML assessment. The results of the feature importance analysis are presented in online supplemental figure 3B.
Figure 2.
Variable selection based on Lasso regression. (A) Coefficients-lambda graph: each coloured curve represents the trend of one feature’s coefficient value against log(λ). As λ increases, the penalty for unimportant features in the model intensifies, causing the corresponding coefficients to tend towards zero. (B) MSE-lambda graph: the tuning parameter (λ) in the LASSO model was selected using 10-fold cross-validation via the minimum criterion; the graph presents the performance of the model at various levels of regularisation strength. LASSO, Least Absolute Shrinkage and Selection Operator; MSE, mean squared error.
Model evaluation
Receiver operating characteristic (ROC) curves for the test sets are shown in figure 3. The ROC curve indicates that the XGBoost and RF models exhibit comparable performance in terms of AUC values, both of which are 0.91. Overall, the XGBoost model slightly outperforms the RF model on several metrics, including accuracy (0.843), specificity (0.929), precision (0.919) and F1 score (0.833) on the test set, whereas the RF model shows better sensitivity (0.797). LR shows the lowest performance (table 2). The DeLong test showed that the AUC of the XGBoost model was significantly superior to that of the LR model (DeLong test, p<0.001), while the XGBoost model did not show a statistical advantage over the RF model (DeLong test, Z statistic=0.39998, p=0.689). Additionally, the Brier score of the XGBoost model is 0.1103, indicating fairly good accuracy in predicting the probability of specific events occurring. Online supplemental figure S4 shows the net benefit of different models across a range of threshold probabilities. The DCA indicates that between thresholds of 0.1 and 0.4, the net benefits of the RF model are similar to those of the XGBoost model. For thresholds above 0.4, the XGBoost model consistently shows higher net benefits than other models.
Figure 3.
ROC curves of the developed machine learning models on the test data set. AUC, area under the curve; ROC, receiver operating characteristic; XGBoost, extreme gradient boosting.
Table 2.
Evaluation of the performance of the three algorithms
| Algorithm | Data set | AUC | Specificity | Sensitivity | Accuracy | Precision | F1 score |
| LR | Validation | 0.737 (0.691, 0.778) | 0.709 (0.649, 0.762) | 0.652 (0.592, 0.707) | 0.679 (0.638, 0.717) | 0.706 (0.645, 0.760) | 0.678 (0.629, 0.724) |
| LR | Test | 0.737 (0.704, 0.766) | 0.698 (0.657, 0.734) | 0.636 (0.595, 0.676) | 0.666 (0.637, 0.694) | 0.690 (0.645, 0.760) | 0.662 (0.627, 0.692) |
| RF | Validation | 0.933 (0.913, 0.952) | 0.846 (0.797, 0.891) | 0.835 (0.792, 0.877) | 0.841 (0.808, 0.871) | 0.854 (0.810, 0.896) | 0.844 (0.810, 0.876) |
| RF | Test | 0.912 (0.895, 0.928) | 0.863 (0.835, 0.891) | 0.797 (0.760, 0.829) | 0.829 (0.805, 0.850) | 0.860 (0.829, 0.888) | 0.827 (0.801, 0.850) |
| XGBoost | Validation | 0.928 (0.906, 0.950) | 0.795 (0.748, 0.842) | 0.909 (0.870, 0.945) | 0.850 (0.818, 0.880) | 0.904 (0.866, 0.941) | 0.846 (0.808, 0.880) |
| XGBoost | Test | 0.915 (0.897, 0.931) | 0.929 (0.909, 0.950) | 0.762 (0.727, 0.794) | 0.843 (0.823, 0.864) | 0.919 (0.895, 0.943) | 0.833 (0.809, 0.856) |
AUC, area under the curve; LR, logistic regression; RF, random forest; XGBoost, extreme gradient boosting.
Interpretability analysis of models
Given the overall best performance of the XGBoost prediction model, we selected the XGBoost model for interpretability analysis to explore the most important predictive factors. Using the SHAP method, the shap.force_plot() function was employed to generate a plot for the 11 significant features in the XGBoost model, as shown in online supplemental figure S5. The x-axis displays SHAP values, while the y-axis lists the features, with each point representing a sample. The colour gradient from blue to red indicates low to high feature values, respectively. The top five features by importance are educational level, total medication count, age, sbp and BMI.
Comparison with existing methods
Gomar et al45 employed ANOVA to identify significant characteristics among biomarkers and cognitive indicators, subsequently using these features to train LR models. Their models achieved a commendable accuracy rate of 72%. Similarly, Zhang et al46 utilised baseline MRI, FDG-PET and CSF data, applying a Multi-Modal Multi-Task (M3T) learning approach to differentiate between mild cognitive impairment due to Alzheimer’s disease (MCI-C) and non-Alzheimer’s mild cognitive impairment (MCI-NC). Their method achieved an accuracy of 73.9% and an AUC of 0.797 on a data set with 38 MCI-C and 42 MCI-NC cases. The criteria for MCI were consistent with those defined by Petersen.47 In contrast to studies that differentiate between cognitively mild cognitive impairment (cMCI) and non-cognitive mild cognitive impairment (ncMCI), our investigation presents a unique challenge by focusing on the early identification of AD in a population of elderly individuals with typical cognitive abilities. This task is particularly challenging because the subtle cognitive changes that may be indicative of early AD can be easily overlooked.48 Monai et al49 developed an ML model incorporating socio-demographic data, self-reported health information and blood biomarkers, achieving an impressive AUC value of 0.801 for the XGB model. Burge et al50 utilised a dynamic Bayesian network on functional MRI (fMRI) data, achieving a classification accuracy of 73% in distinguishing between older adults with dementia and healthy individuals. Table 3 provides a comprehensive overview of the performance metrics for each classification technique.
Table 3.
Comparison of classification methods based on different features
| Author | Population | Methods | Modalities | Result | Data source | Sample size (mean age) |
| Wang et al12 | AD vs HC | LR | Sociodemographic data and self-health info | AUC=80.1%, SEN=73.2%, SPE=73.6% | China hospital | 1099 (66.85±4.07) |
| Gomar et al45 | AD vs MCI | LR | Blood biomarkers | AUC=80%, SEN=72.4%, SPE=78.8%, ACC=72.4% | ADNI | 517 (75.56±6.24) |
| Zhang et al46 | MCI-C vs MCI-NC | M3T | MRI, FDG-PET and CSF | AUC=79.7%, SEN=68.6%, SPE=73.6%, ACC=73.9% | ADNI | 186 (75.34±6.74) |
| Jia et al49 | AD vs HC | XGB | Sociodemographic data, self-health info and blood biomarkers | AUC=80.1%, SEN=72.4%, SPE=78.8%, ACC=72.4%, PRE=71.6% | ADNI | 482 (74.29±6.48) |
| This study | AD vs HC | XGB | Sociodemographic data and self-health info | AUC=91.5%, SEN=76.2%, SPE=92.9%, ACC=84.3%, PRE=91.9% | NACC | 2379 (70.64±8.60) |
ACC, accuracy; AD, Alzheimer’s disease; ADNI, Alzheimer’s Disease Neuroimaging Initiative; AUC, area under the curve; CSF, cerebrospinal fluid; FDG-PET, fluorodeoxyglucose-positron emission tomography; HC, healthy controls; LR, logistic regression; MCI, mild cognitive impairment; MCI-C, mild cognitive impairment due to Alzheimer’s disease; MCI-NC, non-Alzheimer’s mild cognitive impairment; M3T, Multi-Modal Multi-Task; NACC, National Alzheimer’s Coordinating Center; PRE, precision; SEN, sensitivity; SPE, specificity; XGB, extreme gradient boosting.
Discussion
The escalating global prevalence of dementia, driven by demographic shifts towards an ageing population, accentuates the imperative for accurate early identification of cognitive illnesses.51 Detecting these conditions before overt symptoms manifest is crucial, given that irreversible and profound brain impairment may already be underway by the time symptoms become apparent.7 Timely identification is vital to empower at-risk individuals for proactive interventions, potentially slowing disease progression and reducing life-threatening consequences. ML emerges as a valuable tool, enhancing precision and predictive capabilities through extensive data sets and computational efficiency.52 Presently, AD identification relies heavily on clinical assessments,6 aided by technologies like MRI, PET and biomarker analysis in CSF, with genetic markers providing additional diagnostic utility. Our study centres on constructing ML models using data that are readily available, which may streamline prediction processes and potentially serve as an efficient alternative for preliminary AD screening.
The combination of LASSO and SVM-RFE techniques has demonstrated promising diagnostic performance in AD.53 However, few studies have combined LASSO and RF-RFE to select AD-related features. In this study, we first selected statistically significant variables from univariate analysis and sequentially input these variables into the LASSO and RF-RFE algorithms to remove redundant or irrelevant features, ultimately identifying the most informative subset of variables. Using this hybrid feature selection method, we extracted a feature subset consisting of 11 variables. The XGBoost model built on this subset achieved the best classification performance (accuracy: 0.843, precision: 0.919, specificity: 0.929, recall: 0.762, F1 score: 0.833, AUC: 0.915). The DeLong test showed that the AUC of the XGBoost model was significantly higher than that of the LR model (p<0.001), whereas the XGBoost model showed no statistically significant advantage over the RF model (z score=0.39998, p=0.689). Several factors may contribute to the performance differences between our model and those of previous studies. To begin with, variations in data sources may introduce biases. For example, the participants in our study were selected from the NACC database, which primarily includes individuals from Alzheimer’s Disease Research Centers (ADRCs) at major medical institutions across the USA. In contrast, three studies45 46 49 utilised data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). According to the most recent ADNI publication, the ADNI sample consists of 79.3% White, 11.5% Black, 5.6% Latinx, 2.7% Asian, 0.8% Native American and 0.5% individuals from other racial groups.54 This variation in genetic backgrounds and demographic representation could lead to differences in model prediction performance. Additionally, since age is a critical risk factor for AD, differences in the average age of participants across studies might influence the model outcomes.
Moreover, variations in study population inclusion criteria during study design may result in differing baseline cognitive levels, as measured by the Mini-Mental State Examination (MMSE). Additionally, the selection of different features for model construction can significantly impact model performance. Finally, different classifiers have distinct advantages and disadvantages, leading to diverse outcomes in testing. In this study, the ensemble learning algorithms RF and XGBoost outperformed conventional LR, aligning with previous research highlighting the efficacy of ensemble methods in AD classification.55 56 Of the three models, XGBoost achieved the best overall performance.
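The performance figures reported for the XGBoost model are linked by standard formulas, so their internal consistency can be checked directly. The sketch below recomputes the F1 score from the reported precision and recall and converts the reported DeLong z score into a two-sided p value, assuming the usual standard normal reference distribution for that statistic. The function names are ours and are not taken from the study's code.

```python
from math import erfc, sqrt


def f1_from_precision_recall(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal test statistic.

    P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    """
    return erfc(abs(z) / sqrt(2))


# Values as reported for the XGBoost model in this study
f1 = f1_from_precision_recall(precision=0.919, recall=0.762)
p_xgb_vs_rf = two_sided_p_from_z(0.39998)

print(f"F1 from precision and recall: {f1:.3f}")
print(f"Two-sided p for z=0.39998:   {p_xgb_vs_rf:.3f}")
```

Both recomputed values agree with the reported F1 score of 0.833 and p value of 0.689 to three decimal places.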
To improve the interpretability of the model, we also applied the SHAP method, which helped us better understand the contribution of each feature to the model’s decision-making process. This approach provides more targeted and practical recommendations for clinical strategies aimed at the early prevention of AD in elderly populations. Our SHAP analysis underscores the significance of individual characteristics, particularly the strong influence of education level on AD prediction. Higher education correlates with preserved white matter microstructure, a potential protective factor against cognitive decline.57 58 Furthermore, elevated levels of education show a strong correlation with cognitive resilience to the apolipoprotein E ε4 (APOE ε4) allele.59 Educational attainment may enhance adaptation to APOE ε4 and clusterin (CLU) genetic risk variants by bolstering cognitive reserve.60 Moreover, higher education can mitigate the adverse effects of the APOE ε4 allele and prevent memory decline.61 Medical history appears to have a smaller impact, emphasising the influence of demographic variables, mental health and lifestyle factors on prediction accuracy. Variables such as age, gender and marital status emerge as significant predictors. Research findings indicate that advanced dementia is influenced more by age than by other contributing variables.62 Regarding gender, studies have consistently demonstrated that women not only display an increased vulnerability to AD but also experience a more accelerated decline in cognitive function compared with men at equivalent stages of the illness.
This observed gender disparity is potentially linked to the endocrine activity of female hormones.63 64 Moreover, numerous studies have demonstrated a more pronounced effect of APOE ε4 in women than in men.65 66 However, it is important to highlight that this difference between sexes may diminish in more advanced age cohorts, particularly among women who have reached the postmenopausal stage.67 Nevertheless, despite potential reductions in the disparity, a higher incidence of AD persists among women compared with men. Anxiety, an understudied aspect, surfaces as a noteworthy predictor of AD, echoing previous findings.68 Marital status also plays a role, with lifelong singles and the bereaved exhibiting a higher dementia risk.69 In summary, this research posits that education level remains a crucial determinant, particularly among women with lower educational attainment. Hence, emphasising educational enrichment is a beneficial strategy for averting dementia and cognitive decline. Additionally, preserving sleep quality and mental well-being is another valuable avenue for reducing the likelihood of developing dementia.
ML can improve clinical decision-making in various ways, including providing early warnings, aiding diagnosis, conducting broad screenings, personalising treatment and assessing patient responses to treatment.70 Integrating predictive models into clinical practice necessitates presenting predictions in a comprehensive, user-centred manner.71 An example is the clinical decision support system developed by Yang et al, based on a distance-based novelty detection model. This system includes a graphical user interface tailored for non-technical stakeholders to assist in diagnosing MCI and AD, thereby simplifying the decision-making process.72 This study’s findings indicate that the XGBoost model not only demonstrates potential in predicting AD using rich data but also exhibits stable and high clinical value, as evidenced by DCA. For instance, identifying high-risk individuals for AD enables early intervention, while identifying low-risk individuals helps avoid unnecessary testing and reduce healthcare costs. However, it is important to note that the net benefit values derived from DCA are not inherent properties of the model, as they depend on the distribution and characteristics of the training data set.73 Additionally, while ML models provide predictive capabilities, their decisions may not always align with clinical reasoning. Ensuring that the predictions generated by these models are clinically relevant and actionable remains an ongoing challenge in the field of predictive analytics.74
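The dependence of net benefit on the data can be seen from its definition in decision curve analysis: at a threshold probability p_t, net benefit is TP/n − (FP/n) × p_t/(1 − p_t), and the 'treat all' reference strategy depends only on prevalence. The following minimal sketch implements both quantities. The labels and predicted probabilities are hypothetical toy values, not data from this study, and the function names are ours.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at a given threshold probability.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that flags every individual."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = AD, 0 = healthy control
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.90, 0.70, 0.40, 0.60, 0.30, 0.20, 0.20, 0.10, 0.10, 0.05]

threshold = 0.5
nb_model = net_benefit(y_true, y_prob, threshold)
nb_all = net_benefit_treat_all(y_true, threshold)

print(f"Model net benefit at p_t={threshold}: {nb_model:.2f}")
print(f"Treat-all net benefit at p_t={threshold}: {nb_all:.2f}")
```

Because both expressions contain counts relative to the sample size, changing the prevalence of AD in the evaluated sample changes the net benefit even when the model itself is unchanged.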
Most previous work on predicting AD progression ignores the issue of missing data.75 76 However, missing data are commonly encountered in real-world data sets. Current studies note that imputation methods are generally preferable to simple deletion of missing values in the data set, as they help to avoid bias in subsequent analyses.77 78 Additionally, in medical and health data, where physiological records may encompass a wide range of critical factors, discarding incomplete records entirely could result in data bias and wasted information.79 Therefore, this study employed MICE to handle missing data, aiming to mitigate the potential impacts of missingness. However, it should be noted that the application of MICE may introduce bias if the missing data do not adhere to the missing at random (MAR) assumption.
In conclusion, our study underscores the robust performance of the XGBoost model in predicting AD risk, leveraging straightforward, relevant and easily identifiable risk variables. The internal validation results affirm the model’s efficacy. However, it is crucial to acknowledge the limitations of this research. First, the NACC recruits mainly from memory clinics and volunteers, and most of the subjects were well educated, so the sample may not represent the general population. Expanding the research cohort by increasing participant diversity and including various geographical areas is imperative for enhancing the precision, applicability and validation of future models. Second, although the ADASYN method adopted in this study reduces the impact of class imbalance on various metrics to some extent, there is still a risk of overfitting during model construction. Further research on independent data sets, when available, will help improve the quality and reliability of these findings. Third, due to time and resource constraints, this study did not further evaluate the quality of the data imputed by MICE, which may affect the accuracy and reliability of the research results. Future research needs to consider various factors to determine the appropriate data processing strategy. Fourth, this study lacks external validation, which we plan to address in our next research project. Finally, the parameters considered, especially the ‘clinical’ ones, refer to the time of assessment, when the participant had already been classified as having AD or not. In view of this, the focus of ‘early identification’ in this study should not be overly placed on parameters that remain constant over time (such as educational level, age, gender and family history), but rather on parameters that reflect changing situations or current conditions, such as depression, insomnia, BMI, the number of medications taken, sbp and cognitive impairment.
One possible future research direction could be to develop more accurate and efficient early diagnostic models or tools through long-term tracking and monitoring of these dynamic parameters, combined with advanced data modelling and analysis techniques. Addressing these limitations in subsequent research will refine the risk prediction model and advance our understanding of AD risk factors.
Conclusion
The use of models based on easily accessible, non-invasive indicators, such as BMI, may provide a practical and convenient approach to mass screening and early identification of AD. Our model streamlines risk assessment, aiding early problem identification and interventions for families. Healthcare professionals can efficiently refer high-risk patients to specialised care, enhancing older adults’ access to dementia care and potentially improving their quality of life and slowing disease progression. Additionally, it can serve as an educational tool for early identification of high-risk older individuals, benefiting families, who play a pivotal role in identifying those susceptible to AD.
Acknowledgments
The authors would like to express sincere gratitude to all men and women who participated in this study, and for the data support from the NACC database.
Footnotes
Contributors: BW drafted the manuscript. RX, XZ, CD, WQ, YS, JY and XL acquired and analysed the data. DH, YC, BW and SC critically revised the manuscript. All authors read, edited and approved the final version of the manuscript. BW accepts full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish.
Competing interests: None declared.
Patient and public involvement: Patients and/or the public were involved in the design, or conduct, or reporting or dissemination plans of this research. Refer to the Methods section for further details.
Provenance and peer review: Not commissioned; externally peer reviewed.
Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
Data availability statement
Data are available in a public, open access repository. Detailed descriptions of the data used, as well as information on the process for obtaining it, are available at https://naccdata.org/.
Ethics statements
Patient consent for publication
Not applicable.
Ethics approval
This study involves human participants. Hangzhou Normal University has granted approval for research utilising the National Alzheimer’s Coordinating Center database. Informed consent was duly acquired from all participants at the individual US Alzheimer’s Disease Centers. To uphold confidentiality, the data from the National Alzheimer’s Coordinating Center were deidentified. All procedures were conducted in strict compliance with pertinent guidelines and regulations. Participants gave informed consent to participate in the study before taking part.
References
- 1. Son JH, Shim JH, Kim KH, et al. Neuronal autophagy and neurodegenerative diseases. Exp Mol Med 2012;44:89. doi:10.3858/emm.2012.44.2.031
- 2. WHO. The world has not solved the problem of dementia. Available: https://www.who.int/zh/news/item/02-09-2021-world-failing-to-address-dementia-challenge
- 3. 2023 Alzheimer’s disease facts and figures. Alzheimers Dement 2023;19:1598–695. doi:10.1002/alz.13016
- 4. Comas-Herrera A, Guerchet M, Karagiannidou M, et al. World Alzheimer Report 2016: Improving healthcare for people living with dementia: Coverage, quality and costs now and in the future, 2016. Available: https://www.alzint.org/resource/world-alzheimer-report-2016/
- 5. Frisoni GB, Winblad B, O’Brien JT. Revised NIA-AA criteria for the diagnosis of Alzheimer’s disease: a step forward but not yet ready for widespread clinical use. Int Psychogeriatr 2011;23.
- 6. McKhann GM, Knopman DS, Chertkow H, et al. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement 2011;7:263–9. doi:10.1016/j.jalz.2011.03.005
- 7. Bratić B, Kurbalija V, Ivanović M, et al. Machine Learning for Predicting Cognitive Diseases: Methods, Data Sources and Risk Factors. J Med Syst 2018;42:30368611. doi:10.1007/s10916-018-1071-x
- 8. Winblad B, Amouyel P, Andrieu S, et al. Defeating Alzheimer’s disease and other dementias: a priority for European science and society. Lancet Neurol 2016;15:455–532. doi:10.1016/S1474-4422(16)00062-4
- 9. Frisoni GB, Boccardi M, Barkhof F, et al. Strategic roadmap for an early diagnosis of Alzheimer’s disease based on biomarkers. Lancet Neurol 2017;16:661–76. doi:10.1016/S1474-4422(17)30159-X
- 10. Bevilacqua R, Barbarossa F, Fantechi L, et al. Radiomics and Artificial Intelligence for the Diagnosis and Monitoring of Alzheimer’s Disease: A Systematic Review of Studies in the Field. J Clin Med 2023;12:5432. doi:10.3390/jcm12165432
- 11. Jack CR, Bennett DA, Blennow K, et al. NIA-AA Research Framework: Toward a biological definition of Alzheimer’s disease. Alzheimers Dement 2018;14:535–62. doi:10.1016/j.jalz.2018.02.018
- 12. Wang L, Li P, Hou M, et al. Construction of a risk prediction model for Alzheimer’s disease in the elderly population. BMC Neurol 2021;21:271. doi:10.1186/s12883-021-02276-8
- 13. Cleret de Langavant L, Bayen E, Yaffe K. Unsupervised Machine Learning to Identify High Likelihood of Dementia in Population-Based Surveys: Development and Validation Study. J Med Internet Res 2018;20:e10493. doi:10.2196/10493
- 14. Collins GS, Reitsma JB, Altman DG, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55–63. doi:10.7326/M14-0697
- 15. Beekly DL, Ramos EM, Lee WW, et al. The National Alzheimer’s Coordinating Center (NACC) database: the Uniform Data Set. Alzheimer Dis Assoc Disord 2007;21:249–58. doi:10.1097/WAD.0b013e318142774e
- 16. Morris JC, Weintraub S, Chui HC, et al. The Uniform Data Set (UDS): clinical and cognitive variables and descriptive data from Alzheimer Disease Centers. Alzheimer Dis Assoc Disord 2006;20:210–6. doi:10.1097/01.wad.0000213865.09806.92
- 17. Monin JK, McAvay G, Zang E, et al. Associations between dementia staging, neuropsychiatric behavioral symptoms, and divorce or separation in late life: A case control study. PLoS ONE 2023;18:e0289311. doi:10.1371/journal.pone.0289311
- 18. Richardson K, Fox C, Maidment I, et al. Anticholinergic drugs and risk of dementia: case-control study. BMJ 2018;361:k1315. doi:10.1136/bmj.k1315
- 19. Lin and others. Correction to: Atrial Fibrillation and Dementia: A Report From the AF-SCREEN International Collaboration. Circulation 2022;145. doi:10.1161/CIR.0000000000001067
- 20. Irwin MR, Vitiello MV. Implications of sleep disturbance and inflammation for Alzheimer’s disease dementia. Lancet Neurol 2019;18:296–306. doi:10.1016/S1474-4422(18)30450-2
- 21. van Dalen JW, Brayne C, Crane PK, et al. Association of Systolic Blood Pressure With Dementia Risk and the Role of Age, U-Shaped Associations, and Mortality. JAMA Intern Med 2022;182:142. doi:10.1001/jamainternmed.2021.7009
- 22. He H, Bai Y, Garcia EA. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. IEEE world congress on computational intelligence. 2008. doi:10.1109/IJCNN.2008.4633969
- 23. Gosain A, Sardana Saanchi. Handling class imbalance problem using oversampling techniques: a review. 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI); Udupi, 2017:79–85. doi:10.1109/ICACCI.2017.8125820
- 24. Teji JS, Jain S, Gupta SK, et al. NeoAI 1.0: Machine learning-based paradigm for prediction of neonatal and infant risk of death. Comput Biol Med 2022;147:105639. doi:10.1016/j.compbiomed.2022.105639
- 25. Ahmed G, Er MJ, Fareed MMS, et al. DAD-Net: Classification of Alzheimer’s Disease Using ADASYN Oversampling Technique and Optimized Neural Network. Molecules 2022;27:7085. doi:10.3390/molecules27207085
- 26. Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 2009;338:b2393. doi:10.1136/bmj.b2393
- 27. Ahsan M, Mahmud M, Saha P, et al. Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance. Technologies (Basel) 2021;9:52. doi:10.3390/technologies9030052
- 28. Cao XH, Stojkovic I, Obradovic Zoran. A robust data scaling algorithm to improve classification accuracies in biomedical data. BMC Bioinformatics 2016;17:359. doi:10.1186/s12859-016-1236-x
- 29. Tibshirani RJ. The lasso problem and uniqueness. Electron J Statist 2013;7:1456–90. doi:10.1214/13-EJS815
- 30. Liu C-X, Zhang P-A, Lu Y. Estimating stellar atmospheric parameters based on Lasso features. Res Astron Astrophys 2014;14:423–32. doi:10.1088/1674-4527/14/4/005
- 31. Liu W, Wang Jianyu. Recursive elimination–election algorithms for wrapper feature selection. Appl Soft Comput 2021;113:107956. doi:10.1016/j.asoc.2021.107956
- 32. Park JH, Cho HE, Kim JH, et al. Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data. NPJ Digit Med 2020;3:46. doi:10.1038/s41746-020-0256-0
- 33. Shimoda A, Li Y, Hayashi H, et al. Dementia risks identified by vocal features via telephone conversations: A novel machine learning prediction model. PLoS One 2021;16:e0253988. doi:10.1371/journal.pone.0253988
- 34. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016:785–94. doi:10.1145/2939672.2939785
- 35. Breiman L. Random Forests. Mach Learn 2001;45:5–32. doi:10.1023/A:1010933404324
- 36. Rong G, Alu S, Li K, et al. Rainfall Induced Landslide Susceptibility Mapping Based on Bayesian Optimized Random Forest and Gradient Boosting Decision Tree Models—A Case Study of Shuicheng County, China. Water (Basel) 2020;12:3066. doi:10.3390/w12113066
- 37. David HW, Stanley L, Rodney SX. Applied logistic regression. John Wiley & Sons, 2000.
- 38. Angraal S, Mortazavi BJ, Gupta A, et al. Machine Learning Prediction of Mortality and Hospitalization in Heart Failure With Preserved Ejection Fraction. JACC Heart Fail 2020;8:12–21. doi:10.1016/j.jchf.2019.06.013
- 39. Hochberg Y. A Sharper Bonferroni Procedure for Multiple Tests of Significance. Biometrika 1988;75:800. doi:10.2307/2336325
- 40. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006;26:565–74. doi:10.1177/0272989X06295361
- 41. Tariq Z, Murtaza M, Mahmoud M, et al. Machine learning approach to predict the dynamic linear swelling of shales treated with different waterbased drilling fluids. Fuel (Lond) 2022;315. doi:10.1016/j.fuel.2022.123282
- 42. Suleman MT, Khan YD. m1A-pred: Prediction of Modified 1-methyladenosine Sites in RNA Sequences through Artificial Intelligence. Comb Chem High Throughput Screen 2022;25:2473–84. doi:10.2174/1386207325666220617152743
- 43. Rodríguez-Pérez R, Bajorath J. Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions. J Comput Aided Mol Des 2020;34:1013–26. doi:10.1007/s10822-020-00314-0
- 44. Better MA. 2023 Alzheimer’s disease facts and figures. Alzheimer's & Dementia 2023;19:1598–695. doi:10.1002/alz.13016
- 45. Gomar JJ. Utility of Combinations of Biomarkers, Cognitive Markers, and Risk Factors to Predict Conversion From Mild Cognitive Impairment to Alzheimer Disease in Patients in the Alzheimer’s Disease Neuroimaging Initiative. Arch Gen Psychiatry 2011;68:961. doi:10.1001/archgenpsychiatry.2011.96
- 46. Zhang D, Shen D, Alzheimer’s Disease Neuroimaging Initiative. Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease. Neuroimage 2012;59:895–907. doi:10.1016/j.neuroimage.2011.09.069
- 47. Petersen RC, Aisen PS, Beckett LA, et al. Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology (ECronicon) 2010;74:201–9. doi:10.1212/WNL.0b013e3181cb3e25
- 48. Porsteinsson AP, Isaacson RS, Knox S, et al. Diagnosis of Early Alzheimer’s Disease: Clinical Practice in 2021. J Prev Alzheimers Dis 2021;8:371–86. doi:10.14283/jpad.2021.23
- 49. Jia M, Wu Y, Xiang C, et al. Predicting Alzheimer’s Disease with Interpretable Machine Learning. Dement Geriatr Cogn Disord 2023;52:249–57. doi:10.1159/000531819
- 50. Burge J, Clark V, Link HE, et al. Bayesian Classification of FMRI Data: Evidence for Altered Neural Networks in Dementia, 2004. Available: https://www.semanticscholar
- 51. Alzheimer’s Disease International. World Alzheimer Report 2023: Reducing Dementia Risk: Never too early, never too late, 2023. Available: https://www.alzint.org/resource/world-alzheimer-report-2023/
- 52. Haug CJ, Drazen JM. Artificial Intelligence and Machine Learning in Clinical Medicine, 2023. N Engl J Med 2023;388:1201–8. doi:10.1056/NEJMra2302038
- 53. Li C-L, Wang Q, Wu L, et al. The PANoptosis-related hippocampal molecular subtypes and key biomarkers in Alzheimer’s disease patients. Sci Rep 2024;14:23851. doi:10.1038/s41598-024-75377-2
- 54. Weiner MW, Veitch DP, Miller MJ, et al. Increasing participant diversity in AD research: Plans for digital screening, blood testing, and a community-engaged approach in the Alzheimer’s Disease Neuroimaging Initiative 4. Alzheimers Dement 2023;19:307–17. doi:10.1002/alz.12797
- 55. Kleiman MJ, Barenholtz E, Galvin JE, et al. Screening for Early-Stage Alzheimer’s Disease Using Optimized Feature Sets and Machine Learning. JAD 2021;81:355–66. doi:10.3233/JAD-201377
- 56. El-Sappagh S, Alonso JM, Islam SMR, et al. A multilayer multimodal detection and prediction model based on explainable artificial intelligence for Alzheimer’s disease. Sci Rep 2021;11:2660. doi:10.1038/s41598-021-82098-3
- 57. Teipel SJ, Meindl T, Wagner M, et al. White Matter Microstructure in Relation to Education in Aging and Alzheimer’s Disease. JAD 2009;17:571–83. doi:10.3233/JAD-2009-1077
- 58. Seyedsalehi A, Warrier V, Bethlehem RAI, et al. Educational attainment, structural brain reserve and Alzheimer’s disease: a Mendelian randomization analysis. Brain (Bacau) 2023;146:2059–74. doi:10.1093/brain/awac392
- 59. Kaup AR, Nettiksimmons J, Harris TB, et al. Cognitive resilience to apolipoprotein E ε4: contributing factors in black and white older adults. JAMA Neurol 2015;72:340–8. doi:10.1001/jamaneurol.2014.3978
- 60. Schneeweis N, Skirbekk V, Winter-Ebmer Rudolf. Does education improve cognitive performance four decades after school completion? Demography 2014;51:619–43. doi:10.1007/s13524-014-0281-1
- 61. Arenaza-Urquijo EM, Gonneaud J, Fouquet M, et al. Interaction between years of education and APOE ε4 status on frontal and temporal metabolism. Neurology (ECronicon) 2015;85:1392–9. doi:10.1212/WNL.0000000000002034
- 62. Masters CL. Major risk factors for Alzheimer’s disease: age and genetics. Lancet Neurol 2020;19:475–6. doi:10.1016/S1474-4422(20)30155-1
- 63. Jiang J, Young K, Pike CJ. Second to fourth digit ratio (2D:4D) is associated with dementia in women. Early Hum Dev 2020;149:105152. doi:10.1016/j.earlhumdev.2020.105152
- 64. Beheshti I, Nugent S, Potvin O, et al. Disappearing metabolic youthfulness in the cognitively impaired female brain. Neurobiol Aging 2021;101:224–9. doi:10.1016/j.neurobiolaging.2021.01.026
- 65. Payami H, Montee KR, Kaye JA, et al. Alzheimer’s disease, apolipoprotein E4, and gender. JAMA 1994;271:1316–7.
- 66. Damoiseaux JS, Seeley WW, Zhou J, et al. Gender modulates the APOE ε4 effect in healthy older adults: convergent evidence from functional brain connectivity and spinal fluid tau levels. J Neurosci 2012;32:8254–62. doi:10.1523/JNEUROSCI.0305-12.2012
- 67. Fratiglioni L, Launer LJ, Andersen K, et al. Incidence of dementia and major subtypes in Europe: A collaborative study of population-based cohorts. Neurologic Diseases in the Elderly Research Group. Neurology (ECronicon) 2000;54:S10–5.
- 68. Burke SL, O’Driscoll J, Alcide A, et al. Moderating risk of Alzheimer’s disease through the use of anxiolytic agents. Int J Geriatr Psychiatry 2017;32:1312–21. doi:10.1002/gps.4614
- 69. Chen C, Mok VCT. Marriage and risk of dementia: systematic review and meta-analysis of observational studies. J Neurol Neurosurg Psychiatry 2018;89:227. doi:10.1136/jnnp-2017-317178
- 70. Adlung L, Cohen Y, Mor U, et al. Machine learning in clinical decision making. Med 2021;2:642–65. doi:10.1016/j.medj.2021.04.006
- 71. Anderson NS, Norman DA, Draper SW. User Centered System Design: New Perspectives on Human-Computer Interaction. Am J Psychol 1988;101:148. doi:10.2307/1422802
- 72. Yang H, Mao J, Ye Q, et al. Distance-based novelty detection model for identifying individuals at risk of developing Alzheimer’s disease. Front Aging Neurosci 2024;16:1285905. doi:10.3389/fnagi.2024.1285905
- 73. Kerr KF, Brown MD, Zhu K, et al. Assessing the Clinical Impact of Risk Prediction Models With Decision Curves: Guidance for Correct Interpretation and Appropriate Use. J Clin Oncol 2016;34:2534–40. doi:10.1200/JCO.2015.65.5654
- 74. Abujaber AA, Imam Y, Albalkhi I, et al. Utilizing machine learning to facilitate the early diagnosis of posterior circulation stroke. BMC Neurol 2024;24:156. doi:10.1186/s12883-024-03638-8
- 75. Liu M, Zhang J, Adeli E, et al. Joint Classification and Regression via Deep Multi-Task Multi-Channel Learning for Alzheimer’s Disease Diagnosis. IEEE Trans Biomed Eng 2019;66:1195–206. doi:10.1109/TBME.2018.2869989
- 76. Lei B, Yang P, Wang T, et al. Relational-Regularized Discriminative Sparse Learning for Alzheimer’s Disease Diagnosis. IEEE Trans Cybern 2017;47:1102–13. doi:10.1109/TCYB.2016.2644718
- 77. Alam S, Ayub MS, Arora S, et al. An investigation of the imputation techniques for missing values in ordinal data enhancing clustering and classification analysis validity. Decision Analytics Journal 2023;9:100341. doi:10.1016/j.dajour.2023.100341
- 78. Mera-Gaona M, Neumann U, Vargas-Canas R, et al. Evaluating the impact of multivariate imputation by MICE in feature selection. PLoS ONE 2021;16:e0254720. doi:10.1371/journal.pone.0254720
- 79. Khan A, Zubair S. Usage of random forest ensemble classifier based imputation and its potential in the diagnosis of Alzheimer’s disease. Int J Sci Technol Res 2019;8:271–5.
Associated Data
Supplementary Materials
bmjopen-15-2-s001.pdf (500.6KB, pdf)
Data Availability Statement
Data are available in a public, open access repository. Detailed descriptions of the data used, and of the process for obtaining them, are available at https://naccdata.org/.



