MethodsX. 2025 Jul 7;15:103491. doi: 10.1016/j.mex.2025.103491

Development of an explainable machine learning model for Alzheimer’s disease prediction using clinical and behavioural features

Rajkumar Govindarajan a, K Thirunadanasikamani a, Komal Kumar Napa b, S Sathya c, J Senthil Murugan d, K G Chandi Priya e
PMCID: PMC12281133  PMID: 40697328

Abstract

This article presents a reproducible machine learning methodology for the early prediction of Alzheimer’s disease (AD) using clinical and behavioural data. A comparative analysis of multiple classification algorithms was conducted, with the Gradient Boosting classifier yielding the best performance (accuracy: 93.9 %, F1-score: 91.8 %). To improve interpretability, SHapley Additive exPlanations (SHAP) were integrated into the workflow to quantify feature contributions at both global and individual levels. Key predictive variables such as Mini-Mental State Examination (MMSE), Activities of Daily Living (ADL), cholesterol levels, and functional assessment scores were identified and visualized using SHAP-based insights. A user-friendly, interactive web application was developed using Streamlit, allowing real-time patient data input and transparent model output visualization. This method offers a practical tool for clinicians and researchers to support early diagnosis and personalized risk assessment of AD, thus aiding in timely and informed clinical decision-making.

Accurate Prediction: Gradient Boosting model achieved 93.9 % accuracy for early Alzheimer’s detection.

Explainability: SHAP values provided interpretable insights into key clinical features.

Clinical Tool: A Streamlit-based web app enabled real-time, explainable predictions for users.

Keywords: Alzheimer's disease prediction, Explainable artificial intelligence, SHAP values, Gradient boosting classifier, Streamlit web application

Graphical abstract


Specifications table

Subject area Computer Science
More specific subject area Machine Learning
Name of your method Explainable machine learning model for Alzheimer’s disease prediction
Name and reference of original method T. Felix, M. Ahmad, S. R. Das, et al., "Explainable machine learning for predicting conversion to neurological disease: Results from 52,939 medical records," Digit. Health, vol. 11, Art. no. 20552076241249286, 2024.
Resource availability Available upon request

Background

Alzheimer’s disease (AD) [1], a progressive neurodegenerative disorder, is one of the leading causes of dementia worldwide, primarily affecting the elderly population. It is marked by gradual memory loss, cognitive dysfunction, and changes in behavior, which often lead to a diminished quality of life. Early detection of Alzheimer’s disease is crucial for effective intervention and management, yet current diagnostic methods, including neuroimaging and cognitive assessments, often fail to detect the disease in its early stages.

In recent years (2023–2025), machine learning (ML) techniques have emerged as a powerful tool for predicting and diagnosing AD based on clinical and behavioral data [2]. These methods can identify subtle patterns in large datasets that might be overlooked by traditional diagnostic approaches. However, many ML models operate as "black boxes," offering little insight into the reasoning behind their predictions [3]. This lack of transparency poses challenges in clinical adoption, where trust and interpretability are paramount.

To address these issues, the present study focuses on developing an explainable AI-based machine learning framework [4] that combines high predictive accuracy with transparency. By incorporating SHapley Additive exPlanations (SHAP), the study aims to not only predict Alzheimer’s disease but also provide clear, interpretable insights into the key features influencing the model’s decisions. Furthermore, an interactive web-based interface has been created to facilitate real-time predictions and offer clinicians and researchers an accessible tool for understanding model outputs.

Objectives of the study

  1. Develop a Predictive Model for Alzheimer’s Disease Using Machine Learning: To create a machine learning-based model that accurately predicts the presence of Alzheimer’s disease using clinical and behavioral data, specifically employing the Gradient Boosting classifier for its superior performance.

  2. Integrate SHAP for Model Interpretability: To enhance the transparency of the prediction model by integrating SHAP, enabling both global and individual-level feature importance explanations, thus allowing clinicians to understand how specific clinical features contribute to the model’s predictions.

  3. Design an Interactive Web Application for Real-Time Prediction and Explanation: To develop an interactive web-based platform using Streamlit that enables clinicians to input patient data and receive not only real-time predictions but also detailed, SHAP-based explanations of the results, making it easier for users to trust and adopt the tool in clinical practice.

Method details

Alzheimer’s disease (AD) [5] is a chronic neurodegenerative disorder that progressively impairs memory, cognitive function, and the ability to perform everyday activities. As of 2024, it remains one of the leading causes of disability and mortality among the elderly population, imposing significant personal, social, and economic burdens worldwide [6]. Early detection of AD is crucial [7] to slowing disease progression and enabling timely medical interventions.

However, current diagnostic practices, which often rely on clinical assessments and neuroimaging, tend to be subjective and may miss subtle early-stage symptoms [8]. With the increasing availability of electronic health records (EHRs), cognitive test data, and behavioral assessments, machine learning (ML) has emerged as a promising approach for developing automated diagnostic tools [9]. ML models can identify complex patterns in multidimensional datasets that may not be apparent to human experts, making them well-suited for predicting neurodegenerative diseases like AD [10,11]. Among various algorithms, ensemble methods such as Gradient Boosting and Random Forest have demonstrated superior performance in AD classification tasks due to their robustness and ability to handle imbalanced data [12]. However, the adoption of ML in healthcare is hindered by the “black-box” nature of many predictive models.

Clinical applications demand not only accurate predictions but also interpretability to ensure that decisions can be trusted and justified [13]. To address this, Explainable Artificial Intelligence (XAI) techniques like SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) have been introduced to provide insight into how features contribute to model predictions [14]. SHAP, in particular, has gained attention for its theoretical grounding in game theory and its ability to provide both global and local explanations for model behavior [15].

Recent studies [[16], [17], [18], [19]] have demonstrated the potential of SHAP in elucidating the key predictors of Alzheimer’s diagnosis, such as MMSE scores, Activities of Daily Living (ADL), cholesterol levels, and behavioral symptoms. These explanations are essential for clinician trust and for identifying modifiable risk factors in patient care. Moreover, integrating such explainable models into real-time decision-support systems can enhance transparency and facilitate adoption in clinical settings.

To further bridge the gap between research and application, web-based tools like Streamlit have been employed to develop interactive platforms that allow users, clinicians and researchers alike, to input patient data and receive both a diagnosis prediction and a SHAP-based explanation [20]. Such tools democratize access to AI and support a human-in-the-loop approach for critical medical decisions.

This study presents a robust machine learning framework for early Alzheimer’s disease prediction, leveraging a Gradient Boosting model enhanced with SHAP-based interpretability. The model’s performance is benchmarked against several traditional classifiers, and a user-friendly Streamlit interface is developed for real-time prediction and interpretability. This research aims to contribute an accurate, explainable, and deployable AI-based diagnostic tool that aligns with the growing emphasis on responsible and transparent machine learning in healthcare.

Related works

In the past five years (2020–2024), there has been a marked increase in the application of machine learning (ML) and explainable artificial intelligence (XAI) techniques to support the early prediction and clinical management of Alzheimer’s disease (AD). This surge coincides with the growth of data-driven healthcare systems, where predictive models not only improve diagnostic accuracy but also enhance transparency through interpretable outputs that clinicians can trust [21–23].

The growing demand for interpretable models has been driven by the necessity for trust in clinical environments. A systematic review on using LIME and SHAP for Alzheimer’s detection highlighted that these explainability techniques are commonly employed to reveal how individual features influence predictions, bringing clarity to complex, black-box models [24]. Similar arguments have been echoed in multidisciplinary studies that called for more human-centered AI systems in healthcare [25]. Particular emphasis has been placed on SHAP (SHapley Additive exPlanations) due to its game-theoretic foundation. It was shown that SHAP values could be used for subject selection and feature interpretation in Alzheimer’s-related studies, where subtle patterns might otherwise be missed [26]. Clinician-oriented design principles were also proposed to align AI output with medical reasoning, highlighting the importance of interface-level trust and understanding [27].

Interpretability has further enabled researchers to explore biomarker relevance in diagnosis. In one investigation, an interpretable ML model was developed to identify cognitive and biochemical markers predictive of AD, validating SHAP’s role in both prediction and explanation [28]. Another review addressed broader XAI techniques, suggesting that interpretability frameworks must be evaluated not only for performance but also for usability in clinical workflows [29]. It has also been acknowledged that interpretability varies depending on the complexity and structure of the data. Holzinger et al. surveyed XAI strategies for both clinical and remote applications and highlighted how these frameworks could be adapted for multimodal inputs [30]. For instance, models integrating MRI, demographic, and cognitive assessments were found to perform more accurately while also remaining interpretable using SHAP and feature-attribution techniques [31].

In mental health, similar XAI tools were adopted to enhance model transparency, demonstrating transferability across healthcare domains [32]. In another study focusing on Alzheimer’s progression, an explainable ML pipeline was designed to track cognitive decline over time, using SHAP values to rank the influence of features such as MMSE and ADL scores [33]. The relevance of artificial intelligence in healthcare has been comprehensively documented, underscoring a shift from purely accuracy-focused models toward those incorporating interpretability, usability, and fairness [34]. A hybrid model combining MRI features and clinical data was also shown to enhance predictive power when explainability methods were applied to validate the influence of selected variables [35].

Ensemble-based models such as Gradient Boosting and Random Forest were extensively applied for early AD prediction using electronic health records (EHRs). These models, while often complex, were made interpretable using SHAP-based summaries, ensuring that feature contributions were aligned with known clinical patterns [36]. Further discourse on the practical construction of explainable AI systems reiterated the need for interdisciplinary collaboration and clinical validation [37]. From a technical perspective, LIME was highlighted as a model-agnostic framework capable of providing local approximations of black-box models. Although more heuristic than SHAP, LIME was found effective in identifying key decision-making features in smaller datasets [38]. Lundberg and Lee presented a unified approach through SHAP, integrating it with tree-based methods to provide both consistency and local accuracy in feature attribution [39].

The utility of SHAP was also supported in risk prediction studies, where it helped distinguish between patients with varying levels of cognitive decline, thus aiding in personalized care planning [40]. Kim et al. explored automated feature selection in combination with SHAP, yielding an accurate and interpretable model suitable for early-stage prediction [41]. Deep learning methods were not exempt from interpretability concerns. A survey of interpretability in neurological disorders emphasized that even deep models can be explained through gradient-based attribution or perturbation techniques [42]. The final layer of research involved a multimodal risk prediction framework where interpretability was embedded into each stage of modelling, reinforcing its potential in real-world deployment [43].

Problem statement and research contributions

Problem statement

Alzheimer’s disease (AD), a leading cause of cognitive impairment and dementia, remains notoriously difficult to diagnose at early stages due to subtle symptoms that overlap with normal aging. Traditional diagnostic tools such as neuroimaging, cognitive screening, and behavioral assessment often require specialized equipment and expertise, limiting their accessibility in primary care settings. While machine learning approaches have shown promise in identifying early indicators from clinical data, many of these models operate as "black boxes," providing little to no insight into the reasoning behind their predictions.

This lack of interpretability hinders clinical trust and adoption. Furthermore, existing systems often lack user-friendly interfaces that clinicians can interact with for real-time assessment and decision support. There is a pressing need for an AI-driven framework [44] that not only delivers accurate diagnostic predictions but also provides interpretable explanations for its outcomes. Such a system must bridge the gap between technical performance and clinical usability. Hence, the challenge lies in integrating robust ML techniques with explainability tools and a deployable interface, enabling both transparency and accessibility in Alzheimer’s prediction workflows [45].

Research contributions

This research introduces an explainable machine learning framework for Alzheimer's disease prediction that combines high predictive accuracy, interpretability, and clinical usability. Multiple traditional machine learning classifiers (Gradient Boosting, K-Nearest Neighbors, Logistic Regression, Random Forest, and Decision Tree) were trained and assessed on a real-world clinical dataset, with Gradient Boosting emerging as the most effective model, achieving 93.9 % accuracy and a 91.8 % F1 score. To enhance model transparency, SHapley Additive exPlanations (SHAP) were used to generate both global and local feature attributions, enabling clinicians to understand the reasoning behind each prediction. Further analysis highlighted cognitive and behavioural features such as MMSE, ADL, and cholesterol levels as the most influential predictors, validated through SHAP-based rankings and partial dependence plots.

To ensure practical applicability, a Streamlit-based interactive web interface was developed, allowing real-time predictions along with visual interpretability of the model’s decisions. The solution is also designed for reproducibility and open deployment, paving the way for future integration into telehealth platforms and remote diagnostic systems.

Method validation

The proposed method presents an explainable machine learning framework designed for the early prediction of Alzheimer’s disease using structured clinical data, as presented in Fig. 1. Initially, the dataset undergoes preprocessing to remove irrelevant attributes (e.g., identifiers and non-informative text fields) and handle missing or inconsistent values. The cleaned data is then split into training and test subsets, ensuring representative class distribution for effective evaluation. Five machine learning models (Gradient Boosting, K-Nearest Neighbors, Logistic Regression, Random Forest, and Decision Tree) are trained and compared based on standard performance metrics including accuracy, precision, recall, and F1 score. Gradient Boosting is selected as the best-performing model due to its superior accuracy and generalization capability.

Fig. 1. Methodology.

To address the black-box nature of ML, SHAP (SHapley Additive exPlanations) is integrated into the pipeline. SHAP provides both global and local interpretability by quantifying the contribution of each feature to the model’s output. This enables transparent clinical decision support, where clinicians can see why certain predictions are made. Finally, a lightweight graphical user interface (GUI) is implemented using the Streamlit framework. The interface allows users to input patient-specific data, receive real-time predictions, and visualize SHAP-based feature impacts. This complete system bridges the gap between algorithmic performance and real-world usability in clinical environments.

Electronic health records analysis

Electronic Health Records (EHRs) for Alzheimer’s presented in Table 1 serve as a rich source of longitudinal patient information, capturing a wide range of clinical, demographic, and behavioural variables essential for disease prediction. In the context of Alzheimer’s disease (AD), EHRs offer valuable insights into early cognitive decline, lifestyle risk factors, comorbidities, and diagnostic history. In this study, EHR data comprising 35 structured attributes including age, MMSE scores, cholesterol levels, physical activity, and ADL (Activities of Daily Living) were analysed to predict Alzheimer’s diagnosis. Prior to modelling, the data was cleaned to remove non-predictive fields such as patient identifiers and text-based notes (e.g., DoctorInCharge), ensuring model generalizability and privacy compliance.

Table 1.

Features, description and range.

Features Description Range
Age Age of the patient (in years) 30
Gender Gender (0 = Female, 1 = Male) 1
Ethnicity Ethnic background of the patient 3
EducationLevel Years of formal education 3
BMI Body Mass Index 24.98
Smoking Smoking habit (0 = No, 1 = Yes) 1
AlcoholConsumption Alcohol use (0 = No, 1 = Yes) 19.99
PhysicalActivity Level of physical activity 9.98
DietQuality Diet quality score 9.99
SleepQuality Sleep quality score 6.00
FamilyHistoryAlzheimers Family history of Alzheimer’s (0 = No, 1 = Yes) 1
CardiovascularDisease Presence of heart-related disease 1
Diabetes Diabetes diagnosis (0 = No, 1 = Yes) 1
Depression Depression diagnosis 1
HeadInjury History of head trauma 1
Hypertension High blood pressure (0 = No, 1 = Yes) 1
SystolicBP Systolic blood pressure 89
DiastolicBP Diastolic blood pressure 59
CholesterolTotal Total cholesterol level 149.90
CholesterolLDL Low-density lipoprotein level 149.73
CholesterolHDL High-density lipoprotein level 79.98
CholesterolTriglycerides Triglyceride level 349.53
MMSE Mini-Mental State Examination score 29.99
FunctionalAssessment Functional status of daily activities 10.00
MemoryComplaints Self-reported memory issues 1
BehavioralProblems Presence of behavioral changes 1
ADL Activities of Daily Living score 10.00
Confusion Episodes of confusion 1
Disorientation Disorientation events 1
PersonalityChanges Noticeable personality changes 1
DifficultyCompletingTasks Trouble completing familiar tasks 1
Forgetfulness Frequency of forgetfulness 1
Diagnosis Alzheimer's Diagnosis (0 = No, 1 = Yes) 1

The dataset was examined for missing values, outliers, and class imbalance, all of which were addressed using standard preprocessing techniques such as imputation and stratified splitting. Feature correlations and distribution patterns were explored to identify potential predictors of Alzheimer’s. Notably, variables related to cognitive performance, behavioural issues, and medical history demonstrated strong associations with the diagnosis label. This analysis confirmed that EHR data, when systematically prepared, holds predictive value for early Alzheimer’s detection. Moreover, the structured nature of EHRs makes them suitable for integration with interpretable machine learning models, providing a foundation for scalable clinical decision support systems.
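The preprocessing steps described above can be sketched as follows. This is an illustrative Python fragment, not the study's released code: the column names follow Table 1, a small synthetic table stands in for the actual EHR dataset, and median imputation plus a stratified 80/20 split are assumed as the "standard preprocessing techniques" mentioned.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the EHR table (column names follow Table 1)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "PatientID": np.arange(n),                    # identifier: non-predictive
    "DoctorInCharge": ["confidential"] * n,       # free-text field: non-predictive
    "MMSE": rng.uniform(0, 30, n),
    "ADL": rng.uniform(0, 10, n),
    "CholesterolTotal": rng.uniform(150, 300, n),
    "Diagnosis": rng.integers(0, 2, n),
})
df.loc[df.sample(frac=0.05, random_state=0).index, "MMSE"] = np.nan  # inject gaps

# 1) Drop identifiers and text notes for generalizability and privacy
df = df.drop(columns=["PatientID", "DoctorInCharge"])
# 2) Median imputation for missing numeric values
df["MMSE"] = df["MMSE"].fillna(df["MMSE"].median())
# 3) Stratified split to preserve the class distribution in both subsets
X, y = df.drop(columns=["Diagnosis"]), df["Diagnosis"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```

Stratification matters here because the diagnosis classes are imbalanced; without it, a random split could leave the test set with too few positive cases to evaluate recall reliably.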

The pairwise correlation heatmap in Fig. 2 illustrates the relationships between various clinical and behavioral features and Alzheimer's disease diagnosis. Features such as MMSE (−0.36), ADL (0.31), and Functional Assessment (−0.32) show the strongest correlations with diagnosis, indicating their significant role in prediction. Most features exhibit weak correlations (between −0.1 and 0.1), suggesting minimal linear association. The diagonal line of 1.00 values represents perfect self-correlation. The color gradient from blue to red visually distinguishes negative from positive correlations. This heatmap helps identify key predictors and understand inter-feature relationships, supporting the development of an interpretable machine learning model for early Alzheimer’s detection.

Fig. 2. Pairwise Feature Correlation Heatmap.
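A correlation screen of this kind reduces to a single pandas call. The sketch below uses synthetic data in which only MMSE is tied to the label (the negative MMSE–diagnosis association mirrors the −0.36 reported above, but the numbers here are illustrative, not the study's):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
mmse = rng.uniform(0, 30, n)
# Synthetic label loosely tied to MMSE (lower score -> higher risk), for illustration
diagnosis = (mmse + rng.normal(0, 8, n) < 15).astype(int)
df = pd.DataFrame({"MMSE": mmse,
                   "BMI": rng.uniform(18, 35, n),   # unrelated feature
                   "Diagnosis": diagnosis})

corr = df.corr(numeric_only=True)                   # pairwise Pearson matrix
# Rank features by absolute correlation with the diagnosis label
target_corr = corr["Diagnosis"].drop("Diagnosis").sort_values(key=abs, ascending=False)
```

The resulting `corr` matrix is what a heatmap such as Fig. 2 visualizes; `target_corr` surfaces the strongest candidate predictors before any model is fitted.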

Machine learning models

To effectively predict Alzheimer’s disease from structured clinical data, five widely used machine learning models were employed in this study: Gradient Boosting [46], K-Nearest Neighbors [47], Logistic Regression [48], Random Forest [49], and Decision Tree [50]. Each model was chosen for its particular strengths in handling medical classification tasks [51]. Decision Tree was chosen for its simplicity and interpretability, making it a good baseline model. Random Forest, an ensemble of decision trees, was included due to its robustness to overfitting and its ability to capture non-linear relationships in complex datasets.

Gradient Boosting, a powerful boosting-based technique, was selected for its superior accuracy and capability to focus on hard-to-classify instances by iteratively minimizing errors. It ultimately outperformed the other models in terms of accuracy and F1 score, making it the primary model for further explanation and deployment. K-Nearest Neighbors was evaluated for its simplicity and effectiveness in low-dimensional datasets, although it showed reduced performance in this context. Logistic Regression was used as a classical linear model to provide a comparative benchmark.

By selecting a diverse set of models, the study enabled a thorough assessment of performance and interpretability, ensuring the final system was not only accurate but also suitable for clinical use. Gradient Boosting was ultimately selected for its optimal balance of precision, recall, and generalization ability.
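The comparison described above follows a standard scikit-learn loop. This is a minimal sketch on synthetic data, assuming default hyperparameters and F1-based selection; the actual study's data, tuning, and split differ:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced dataset standing in for the clinical records
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           weights=[0.65, 0.35], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    scores[name] = {"accuracy": accuracy_score(y_te, pred),
                    "f1": f1_score(y_te, pred)}
best = max(scores, key=lambda m: scores[m]["f1"])   # select by F1, as in the study
```

Selecting on F1 rather than accuracy alone reflects the study's emphasis on balancing precision and recall under class imbalance.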

Explainable artificial intelligence

Explainable Artificial Intelligence (XAI) plays a central role in this project to ensure transparency, trust, and usability of the machine learning predictions made for Alzheimer's disease diagnosis [52]. Given the high stakes associated with clinical decision-making, black-box models, despite their accuracy, are often met with skepticism in healthcare. To overcome this, SHAP (SHapley Additive exPlanations), a state-of-the-art XAI technique, was integrated into the predictive pipeline.

In this project, SHAP was used to generate both global and local explanations. Global explanations helped identify the most influential features across all predictions, such as MMSE, ADL, cholesterol levels, and behavioural symptoms. Local explanations provided instance-level insights, enabling clinicians to understand why a specific patient was classified as high or low risk.

Visual tools such as SHAP summary plots and waterfall charts were included in the Streamlit interface, allowing users to interpret model outputs interactively. This approach not only enhances model interpretability but also supports clinical decision-making by aligning AI predictions with medically relevant reasoning. Thus, SHAP ensured that the system remains both accurate and explainable, a crucial requirement for real-world adoption in healthcare.
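The additive principle behind SHAP can be illustrated with an exact (brute-force) Shapley computation on a toy model. Everything below is hypothetical: the linear "risk score", the baseline values, and the coefficients are invented for illustration, not taken from the fitted Gradient Boosting model (the `shap` library handles real models efficiently).

```python
from itertools import combinations
from math import factorial

# Hypothetical baseline (reference patient) and a toy linear risk score
BASELINE = {"MMSE": 25.0, "ADL": 8.0, "CholesterolTotal": 200.0}

def model(x):
    # Lower MMSE/ADL and higher cholesterol push the score up (illustrative only)
    return -0.1 * x["MMSE"] - 0.2 * x["ADL"] + 0.01 * x["CholesterolTotal"]

def shapley_values(x, baseline, f):
    """Exact Shapley values: weighted average of each feature's marginal
    contribution over all subsets, with absent features held at the baseline."""
    feats = list(x)
    n = len(feats)
    phi = {}
    for i in feats:
        others = [j for j in feats if j != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = {j: (x[j] if j in S or j == i else baseline[j]) for j in feats}
                without_i = {j: (x[j] if j in S else baseline[j]) for j in feats}
                total += weight * (f(with_i) - f(without_i))
        phi[i] = total
    return phi

patient = {"MMSE": 18.0, "ADL": 4.0, "CholesterolTotal": 260.0}
phi = shapley_values(patient, BASELINE, model)
```

The key SHAP property, efficiency, holds by construction: the attributions sum exactly to the difference between the patient's prediction and the baseline prediction, which is what makes waterfall charts like those in the interface additive and self-consistent.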

Computation time analysis

Computation time is a critical factor in selecting machine learning models for real-time or clinical applications, where efficiency and responsiveness are essential. In this study, both training time and testing time were measured for each of the five machine learning models to evaluate their computational performance as presented in Table 2.

Table 2.

Computational time of training and testing of each model.

Model Train Time (s) Test Time (s)
Decision Tree 0.0265 0.0014
Random Forest 0.3747 0.0156
Gradient Boosting 0.7326 0.0031
K-Nearest Neighbors 0.002 1.6967
Logistic Regression 2.9016 0.0876

The Decision Tree model demonstrated the fastest overall performance, with a training time of just 0.0265 s and testing time of 0.0014 s, making it highly suitable for rapid prototyping or deployment in low-latency environments. Random Forest, although slightly more complex, trained in 0.3747 s and tested in 0.0156 s, offering a good balance between speed and accuracy. Gradient Boosting, which achieved the highest predictive performance, required 0.7326 s for training and 0.0031 s for testing. These times are reasonable and suggest that the model is computationally efficient despite its complexity.

On the other hand, K-Nearest Neighbors (KNN) showed the lowest training time (0.0020 s) due to its lazy learning nature, but incurred the highest testing time at 1.6967 s, making it less ideal for real-time prediction. Logistic Regression recorded a longer training time (2.9016 s) and moderate testing time (0.0876 s), reflecting the iterative optimization involved in model fitting. These insights guided the selection of a model that balances accuracy, interpretability, and computational efficiency.
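Timings of this kind can be reproduced by wrapping `fit` and `predict` in `time.perf_counter`. The sketch below uses synthetic data and two of the five models, so the absolute numbers will differ from Table 2:

```python
import time
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)

def time_model(model, X, y):
    """Return (train_time, test_time) in seconds for one fit/predict cycle."""
    t0 = time.perf_counter()
    model.fit(X, y)
    train_time = time.perf_counter() - t0
    t0 = time.perf_counter()
    model.predict(X)
    test_time = time.perf_counter() - t0
    return train_time, test_time

tree_train, tree_test = time_model(DecisionTreeClassifier(random_state=0), X, y)
knn_train, knn_test = time_model(KNeighborsClassifier(), X, y)
```

For deployment decisions, test (inference) time is usually the binding constraint: training happens once offline, whereas every patient query pays the prediction cost, which is why KNN's lazy evaluation penalizes it here.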

Statistical approach for model comparisons

To perform a fair and comprehensive evaluation of the machine learning models used in this study, a statistical approach was adopted that incorporates both performance metrics and computational efficiency. This ensured that the chosen model was not only accurate but also efficient and clinically viable.

One of the fundamental statistical measures used is Accuracy (ACC), the proportion of correct predictions made by the model out of all predictions. It is calculated using the formula

Accuracy = (TP + TN) / (TP + TN + FP + FN) (1)

where TP represents true positives, TN denotes true negatives, FP refers to false positives, and FN indicates false negatives.

Another critical measure is Precision (P), the proportion of positive predictions that are actually correct.

Precision = TP / (TP + FP) (2)

Similarly, Recall (R), also known as Sensitivity or True Positive Rate, the ability of a model to identify all relevant instances (true positives).

Recall = TP / (TP + FN) (3)

The F1-score, calculated as the harmonic mean of precision and recall, is used to balance both metrics, offering a single measure that captures the trade-off between them.

F1 Score = 2 × (Precision × Recall) / (Precision + Recall) (4)

False Positive rate is calculated using

False Positive Rate (FPR) = FP / (FP + TN) (5)

True Positive rate is calculated using

True Positive Rate (TPR) = TP / (TP + FN) (6)
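The six metrics can be computed directly from the four confusion-matrix counts. The helper below implements the formulas above; the counts passed in are hypothetical, not taken from the study's confusion matrices:

```python
def metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics: Eqs. (1)-(6)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # Eq. (1)
    precision = tp / (tp + fp)                       # Eq. (2)
    recall = tp / (tp + fn)                          # Eq. (3); also the TPR, Eq. (6)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (4)
    fpr = fp / (fp + tn)                             # Eq. (5)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr, "tpr": recall}

# Hypothetical counts for illustration only
m = metrics(tp=90, tn=250, fp=10, fn=20)
```

Note that recall and TPR are the same quantity, computed once; a production version should also guard against zero denominators (e.g., a model that predicts no positives makes precision undefined).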

Model evaluation

The evaluation of machine learning models in this study was conducted using multiple statistical metrics to ensure a balanced understanding of each model’s performance in predicting Alzheimer’s disease. The selected models (Decision Tree, Random Forest, Gradient Boosting, K-Nearest Neighbors (KNN), and Logistic Regression) were assessed on accuracy, precision, recall, and F1 score. These metrics collectively offer insight into a model's ability to make correct predictions, minimize false alarms, and detect true positive cases accurately. Table 3 presents the performance metrics of each model under study.

Table 3.

Performance metrics of each model.

Model Accuracy Precision Recall F1 Score
Decision Tree 0.877 0.851 0.819 0.835
Random Forest 0.903 0.950 0.786 0.860
Gradient Boosting 0.939 0.943 0.893 0.917
K-Nearest Neighbors 0.545 0.355 0.245 0.290
Logistic Regression 0.804 0.768 0.692 0.728

Among all models, Gradient Boosting achieved the highest overall performance with an accuracy of 93.95 %, precision of 94.37 %, recall of 89.34 %, and an F1 score of 91.79 %. This suggests that the model maintained a strong balance between identifying positive Alzheimer’s cases and avoiding misclassification of negative ones. Its superior recall also indicates that it effectively minimized false negatives, a crucial factor in clinical settings where missing a diagnosis can lead to delayed treatment. Random Forest followed closely with 90.39 % accuracy and a notably high precision of 95.05 %, though its recall dropped to 78.69 %, reflecting a slightly higher rate of missed positive cases compared to Gradient Boosting. Still, its overall F1 score of 86.10 % indicates solid predictive performance with fewer false positives.

The Decision Tree model yielded 87.75 % accuracy and an F1 score of 83.51 %, offering decent performance with the benefit of interpretability, albeit at the cost of slightly reduced generalization. Logistic Regression, a linear model, achieved a moderate accuracy of 80.47 % and F1 score of 72.84 %, performing adequately for baseline comparison but lagging behind tree-based methods in detecting non-linear patterns in the data. In contrast, K-Nearest Neighbors exhibited the lowest performance with only 54.57 % accuracy and a poor F1 score of 29.06 %, attributed to its sensitivity to irrelevant features and computational inefficiency in larger datasets.

These results demonstrate that ensemble-based models, particularly Gradient Boosting and Random Forest, provided the most reliable and accurate predictions. The use of multiple evaluation metrics ensured a comprehensive assessment beyond accuracy alone. Furthermore, the findings validated Gradient Boosting as the most balanced and suitable model for deployment in a clinically oriented, explainable AI system for Alzheimer's disease prediction.

To further evaluate model performance, confusion matrices were analysed for both the training and testing datasets across all five machine learning models as shown in Fig. 3. These matrices provide detailed insight into the types of classification errors made, highlighting the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts.

Fig. 3. Confusion Matrices of all Models.

The Decision Tree model exhibited perfect classification on the training set with zero misclassifications (988 TN and 516 TP), suggesting a high likelihood of overfitting. On the test set, performance dropped slightly, with 34 false positives and 43 false negatives, indicating a decline in generalization and increased misclassification of both healthy and affected individuals. Random Forest also demonstrated perfect performance on the training data (988 TN, 516 TP), which is characteristic of ensemble models trained on the full feature space. On the test data, only 10 false positives and 45 false negatives were recorded, suggesting that Random Forest generalizes better than Decision Tree while maintaining higher precision.

Gradient Boosting presented a more balanced outcome. Although minor misclassifications were observed in the training set (15 FP and 22 FN), it avoided overfitting. On the test set, it misclassified only 13 healthy cases and 26 positive cases, making it the most balanced and robust model in terms of minimizing both types of errors. In contrast, the K-Nearest Neighbors (KNN) model showed substantial misclassification in both datasets. The training set included 113 FP and 255 FN, while the test set showed 109 FP and 184 FN. This indicates a weak predictive capability and poor separation of class boundaries, likely due to KNN's sensitivity to feature scaling and noise in high-dimensional data. Logistic Regression also exhibited moderate error rates. The training confusion matrix showed 76 FP and 133 FN, while the test set recorded 51 FP and 75 FN. Although performance was better than KNN, Logistic Regression struggled to capture non-linear relationships, resulting in lower recall and F1 scores.

In summary, Gradient Boosting offered the best trade-off between training accuracy and test set generalization. Its confusion matrices showed consistently low misclassification rates across both classes. While Random Forest also performed well, Gradient Boosting demonstrated greater resilience to overfitting. These findings align with the performance metrics and support the selection of Gradient Boosting for deployment in the proposed clinical prediction system

Receiver Operating Characteristic (ROC) curves were generated for all five machine learning models to assess their diagnostic ability across different classification thresholds as depicted in Fig. 4. The ROC curve plots the True Positive Rate (sensitivity) against the False Positive Rate, providing a comprehensive view of a model’s performance independent of class distribution. The Area Under the Curve (AUC) serves as a scalar metric summarizing this performance; the closer the AUC is to 1, the better the model’s ability to distinguish between positive and negative cases.
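The ROC/AUC computation described above can be sketched with scikit-learn; synthetic data again stand in for the clinical dataset, so the printed AUC will not match the study's values.

```python
# Illustrative sketch: ROC curve points and AUC for one classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=2000, n_features=32, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]          # probability of the positive class
fpr, tpr, thresholds = roc_curve(y_te, scores)  # TPR vs FPR at each threshold
auc = roc_auc_score(y_te, scores)               # scalar summary of the curve
print(f"AUC = {auc:.3f}")
```

An AUC near 1 means the curve hugs the top-left corner, as described for Gradient Boosting below; an AUC of 0.5 corresponds to the diagonal of random guessing.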

Fig. 4. ROC-AUC of training and testing for each model.

The Gradient Boosting model demonstrated the highest discriminative power with an AUC of 0.95, indicating excellent classification capability. The curve closely followed the top-left edge of the plot, indicating strong performance with a high true positive rate and minimal false positives across various threshold values. This supports earlier performance metrics that ranked it as the most effective model in the study. Random Forest followed closely with an AUC of 0.94, also demonstrating strong performance. Its ROC curve was similar in shape to Gradient Boosting but with slightly less area coverage, which may be attributed to a marginally higher rate of false negatives.

The Logistic Regression model achieved an AUC of 0.87, suggesting good performance and reliable distinction between classes, though slightly less effective than tree-based models. This aligns with its linear nature and moderate recall and precision scores. Decision Tree performed moderately with an AUC of 0.86, reflecting reasonable classification ability but also indicating potential overfitting, as seen in the corresponding confusion matrix analysis. In contrast, the K-Nearest Neighbors (KNN) model showed the weakest performance, with an AUC of 0.50, which is equivalent to random guessing. Its ROC curve aligned closely with the diagonal line representing random chance, highlighting its inability to effectively distinguish between Alzheimer’s and non-Alzheimer’s cases. This poor performance likely stems from KNN’s sensitivity to data distribution and high-dimensional feature spaces.

In conclusion, the ROC curve analysis further confirmed that Gradient Boosting and Random Forest are the most reliable models for Alzheimer’s disease prediction in this context. Their high AUC values and consistently strong curves across thresholds make them ideal candidates for clinical deployment, especially when coupled with interpretability techniques like SHAP. Models with lower AUCs were found to be less suited for the problem due to their limited ability to maintain performance across varying decision thresholds.

Recent years (2023–2025) have witnessed significant advancements in the application of artificial intelligence and machine learning to Alzheimer’s disease prediction. A wide range of classifiers, input features, and interpretability frameworks have been explored, with accuracy levels varying between studies based on model complexity and data modalities as presented in Table 4.

Table 4.

Summaries of recent studies.

Reference | Classifier | Input Features | Accuracy (%)
Vimbi et al., 2024 [11] | Random Forest, SVM | MRI, Cognitive Scores, SHAP Explained | 91.2
Yan et al., 2023 [22] | Hybrid Deep Learning (CNN + Clinical) | MRI, Clinical Records | 92.4
Wang et al., 2023 [5] | Ensemble (Gradient Boosting, RF) | EHRs (demographics, comorbidities, cognition) | 90.6
Gupta et al., 2023 [9] | XGBoost + SHAP | MMSE, ADL, Family History | 93.1
Dash et al., 2023 [20] | XGBoost | Cognitive Scores, Behavioral Factors | 89.7
Roy et al., 2025 [30] | Multimodal XAI Framework | Multimodal (EHR + Labs + Behavior) | 94.3
Kim et al., 2024 [28] | Tree-based ML (RF, XGBoost) | Functional and Behavioral Assessments | 91.8
Lee et al., 2023 [18] | Explainable AI (XGBoost + SHAP) | Cognitive, MRI, Behavioral | 92.5
Lundberg et al., 2023 [8] | SHAP + Ensemble Trees | Imaging Biomarkers + EHRs | 90.2
Holzinger et al., 2023 [24] | Explainable Ensemble Models | MMSE, Diet, Sleep, Family History | 89.5
Present Study | Gradient Boosting | 32 features | 93.9

For instance, Vimbi et al. [11] employed Random Forest and SVM classifiers using MRI data combined with cognitive scores, achieving an accuracy of 91.2 %. Similarly, Yan et al. [22] utilized a hybrid deep learning model integrating clinical records and MRI scans, reporting an accuracy of 92.4 %. Ensemble approaches, such as Gradient Boosting and Random Forest, were adopted by Wang et al. [5], achieving 90.6 % accuracy using structured EHRs. Studies by Gupta et al. [9] and Dash et al. [20] demonstrated that XGBoost combined with SHAP explanations or cognitive scores could yield high accuracy levels of 93.1 % and 89.7 %, respectively. The work of Roy et al. [30] stands out for achieving the highest reported accuracy of 94.3 % using a multimodal XAI framework, incorporating EHRs, laboratory tests, and behavioral data. Other explainable models like those presented by Kim et al. [28] and Lee et al. [18] emphasized the integration of SHAP for model interpretability alongside high predictive performance.

In comparison, the present study employed multiple models on a structured EHR dataset, with Gradient Boosting emerging as the best performer at 93.9 % accuracy and 91.8 % F1 score. Unlike several prior studies focused on neuroimaging, this project emphasized structured, non-invasive clinical features such as MMSE, ADL, cholesterol levels, and behavioural indicators. Importantly, a real-time Streamlit interface was integrated to deliver not only predictions but also SHAP-based explanations for clinical interpretability. While the overall accuracy aligns closely with the leading studies, the proposed solution offers a distinct advantage in deployability, explainability, and EHR-only data use, making it more feasible in primary healthcare settings where imaging data may not be readily available.

Model explanation

Partial dependence plots (PDPs)

The partial dependence plots (PDPs) generated for the Gradient Boosting model, shown in Fig. 5, offer valuable insights into how individual features influence the predicted probability of Alzheimer’s disease while other features are held constant. Among the analyzed features, age demonstrated a clear negative trend, with the model’s partial dependence decreasing as age increases, implying a higher risk of Alzheimer's in older individuals, which aligns with established clinical understanding. Gender, on the other hand, exhibited a flat PDP, suggesting that it had little to no impact on the model's output and thus did not contribute significantly to classification in this dataset.

Fig. 5. Partial Dependence Plots for Gradient Boosting.

For ethnicity, minor variations were observed between categories, though the overall influence appeared minimal, potentially due to class imbalance or underrepresentation of certain ethnic groups. The education level plot showed a negative slope, indicating that individuals with lower educational attainment were predicted to have a higher likelihood of Alzheimer’s. This supports theories related to cognitive reserve, where limited educational exposure may reduce neural resilience. The PDP for BMI displayed a more complex relationship. While BMI values below 30 showed a relatively stable effect, a sharp change was noted beyond the 30–32 range, suggesting that obesity may contribute to increased risk in a non-linear fashion. Lastly, smoking did not exhibit any significant trend, with the curve remaining nearly flat, indicating that it was not a major predictor in this specific dataset. Collectively, these plots enhance interpretability by showing which features most influence the model’s decisions and how those effects manifest across varying values.

Local explanation

SHAP (SHapley Additive exPlanations) visualizations for individual predictions offer a clear understanding of how different features contribute to the model’s decision for each case. Table 5 presents a random selection of cases from the dataset.

Table 5.

Random Selection of Cases from the Dataset.

Feature Case 1 Case 2 Case 3 Case 4
Age 73.00 75.00 72.00 78.00
Gender 0.00 0.00 1.00 1.00
Ethnicity 0.00 0.00 1.00 0.00
EducationLevel 2.00 1.00 0.00 1.00
BMI 22.93 18.78 27.83 28.87
Smoking 0.00 0.00 0.00 1.00
AlcoholConsumption 13.30 13.72 12.17 10.19
PhysicalActivity 6.33 4.65 1.53 0.63
DietQuality 1.35 8.34 6.74 1.65
SleepQuality 9.03 4.21 5.75 7.33
FamilyHistoryAlzheimers 0.00 0.00 0.00 1.00
CardiovascularDisease 0.00 0.00 0.00 0.00
Diabetes 1.00 0.00 0.00 0.00
Depression 1.00 0.00 0.00 0.00
HeadInjury 0.00 0.00 0.00 0.00
Hypertension 0.00 0.00 1.00 0.00
SystolicBP 142.0 117.0 117.0 137.0
DiastolicBP 72.00 63.00 119.0 82.00
CholesterolTotal 242.37 151.38 233.61 221.31
CholesterolLDL 56.15 69.62 144.0 194.6
CholesterolHDL 33.68 77.35 43.08 26.33
CholesterolTriglycerides 162.1 210.5 151.1 357.5
MMSE 21.46 10.14 25.82 7.85
FunctionalAssessment 6.52 3.40 7.40 4.51
MemoryComplaints 0.00 0.00 0.00 1.00
BehavioralProblems 0.00 0.00 1.00 0.00
ADL 1.73 4.52 0.76 1.94
Confusion 0.00 1.00 0.00 0.00
Disorientation 0.00 0.00 0.00 1.00
PersonalityChanges 0.00 0.00 1.00 0.00
DifficultyCompletingTasks 1.00 0.00 0.00 0.00
Forgetfulness 0.00 1.00 0.00 0.00
Diagnosis 0.00 1.00 0.00 1.00

The SHAP feature contribution analysis for Case 1 (Fig. 6) provides a clear explanation of why the model predicted a "No Alzheimer's" outcome. The most influential factor was the high value in Functional Assessment (6.52), which significantly reduced the prediction score with a SHAP value of −3.74. This suggests that the patient demonstrated strong functional capabilities, a critical indicator against an Alzheimer's diagnosis. Additionally, the absence of Memory Complaints and Behavioural Problems contributed further negative SHAP values of −1.4 and −1.03, respectively, reinforcing the likelihood of a healthy cognitive state. Features such as Sleep Quality, MMSE score, and Diastolic Blood Pressure also played roles in lowering the Alzheimer’s risk prediction. On the other hand, some features showed mild positive contributions toward an Alzheimer’s outcome, including the ADL score (1.73), Cholesterol LDL, and Cholesterol Total, indicating minor risk factors; however, their impact was minimal compared to the dominant protective indicators. The final prediction output, with a model score f(x) = −5.386, placed the case well below the classification threshold, confirming a confident prediction of No Alzheimer’s. Overall, the SHAP values provide interpretability and transparency by highlighting which factors contributed most to the model’s decision.

Fig. 6. SHAP for Case 1.

The SHAP visualization for Case 2 (Fig. 7) illustrates the underlying factors that led the model to predict "Alzheimer’s Positive." The most significant contributors were lower Functional Assessment (3.4) and increased ADL score (4.52), which had SHAP values of +3.43 and +2.95, respectively. These indicate limitations in daily functioning, strongly associated with Alzheimer’s risk. A relatively low MMSE score (10.14) also added to the positive classification with a SHAP value of +1.34, reflecting cognitive decline. In contrast, the absence of Memory Complaints and Behavioral Problems slightly reduced the prediction score, with negative SHAP contributions of −1.09 and −0.64. Other moderate positive influences included Diet Quality, Sleep Quality, Education Level, and Cholesterol LDL, each contributing modestly to the overall risk estimation. Minor negative effects from SystolicBP, Triglycerides, and Cholesterol HDL slightly countered the risk, though insufficiently to affect the final decision. The final model output score was f(x) = 5.804, well above the decision threshold, confirming a prediction of Alzheimer’s Positive. This SHAP breakdown not only clarifies which features impacted the result but also aids in understanding how functional and cognitive impairments play dominant roles in the model’s Alzheimer’s classification process.

Fig. 7. SHAP for Case 2.

For Case 3 (Fig. 8), the SHAP explanation revealed that the model predicted “No Alzheimer’s” with a strong margin, as the predicted value f(x) = −5.893 fell well below the decision threshold. The most influential factor was the high MMSE score (25.82), which significantly reduced the likelihood of Alzheimer’s, with a large negative SHAP value of −4.65. This cognitive assessment metric is a well-established marker of mental functioning and played a dominant protective role in this prediction. Other negatively contributing factors included the absence of Memory Complaints, moderate Alcohol Consumption, and favorable levels of Cholesterol HDL, Total Cholesterol, and BMI, all pushing the prediction toward a non-disease outcome.

Fig. 8. SHAP for Case 3.

However, there were a few positive contributors, such as Behavioral Problems (+2.27), a low Functional Assessment score (+1.6), and slightly elevated ADL score (+1.31), suggesting some functional concerns. Yet, these were not sufficient to outweigh the strong negative influence of the cognitive score and other health markers. Additional minor positive SHAP values for DiastolicBP and Education Level hinted at a mild increase in risk, but again, these were largely subdued by dominant protective features.

In Case 4 (Fig. 9), the SHAP explanation indicated a strong prediction of Alzheimer's Positive, with a high output value of f(x) = 9.153. The most impactful feature was the presence of Memory Complaints, which alone contributed +3.92 towards the positive prediction. This was followed by a low Functional Assessment score (+3.18) and increased ADL value (+2.2), both suggesting noticeable cognitive and functional decline. A relatively low MMSE score (7.85) also added +1.9 to the prediction, further reinforcing the model’s confidence in classifying the case as Alzheimer’s.

Fig. 9. SHAP for Case 4.

On the other hand, several features acted as negative contributors, slightly opposing the prediction. These included low levels of Cholesterol Triglycerides (−0.29), Cholesterol LDL (−0.26), and Alcohol Consumption (−0.15), alongside the absence of Behavioral Problems (−0.72). However, these opposing factors were minor and insufficient to counteract the dominant positive influences. Notably, the effects of SystolicBP, BMI, and Age were mildly positive, contributing incrementally towards the classification. Certain features such as Education Level, Ethnicity, and Cholesterol HDL also added marginal positive influence. Overall, this SHAP summary makes the prediction highly interpretable by showing that cognitive impairments, memory issues, and low functional scores were the key drivers of the model’s decision in identifying Alzheimer’s presence.

Developing eXplainable UI

The user interface (UI) (Fig. 10) of the Alzheimer’s Disease Prediction application has been thoughtfully developed using Streamlit, offering an intuitive and responsive experience tailored for clinical and research applications. Upon accessing the web-based interface, users are presented with a clean dashboard titled “Alzheimer’s Disease Prediction (XGBoost + SHAP)”, which guides them through the prediction process.

Fig. 10. eXplainable UI.

The left panel systematically prompts the user to input essential patient information across a wide range of clinical and lifestyle features. These include demographics such as Age, Gender, and Ethnicity, as well as medical and behavioral parameters such as BMI, Smoking, Alcohol Consumption, Physical Activity, Sleep Quality, and cognitive assessment scores such as MMSE and ADL. Each parameter is entered manually using number sliders or input fields, enabling easy data entry for clinicians or researchers. Once all relevant information has been entered, a “Predict & Explain” button triggers the backend XGBoost model to analyze the inputs and generate a diagnostic prediction. The prediction is clearly displayed as either “No Alzheimer’s” or “Alzheimer’s Positive”, providing immediate clarity.

What distinguishes this interface is its integration of SHAP (SHapley Additive exPlanations), which visually illustrates the impact of each input feature on the model’s prediction. The SHAP summary appears directly below the prediction result as an interactive bar chart, helping users to understand which clinical factors most significantly influenced the decision. For instance, the presence of Memory Complaints, or low Functional Assessment and MMSE scores, might push the prediction towards an Alzheimer’s diagnosis.

Overall, the interface is not only functionally robust but also designed with usability in mind, ensuring that clinicians without technical backgrounds can benefit from AI-powered insights while maintaining interpretability and trust in the results.

Research discussion

The present research explored the integration of machine learning (ML) and explainable artificial intelligence (XAI) techniques to predict Alzheimer’s disease using real-world clinical data. The implementation involved training five supervised ML models (Gradient Boosting, K-Nearest Neighbors, Logistic Regression, Random Forest, and Decision Tree) on a dataset composed of demographic, behavioral, medical history, and cognitive attributes. Among these, Gradient Boosting achieved the highest accuracy (93.9 %) and F1-score (91.8 %), indicating its strong generalization ability for Alzheimer’s classification.

A major highlight of this study was the emphasis on interpretability through SHapley Additive exPlanations (SHAP). SHAP plots were generated both globally and locally to offer transparent insights into how each feature influenced the model’s output. For instance, variables such as Functional Assessment, MMSE, and ADL were found to be among the most influential predictors. These observations align well with clinical knowledge, thereby validating the model’s reasoning. Additionally, Partial Dependence Plots (PDPs) were employed to understand non-linear interactions between key features and the target variable. The results revealed significant patterns, such as higher MMSE scores reducing Alzheimer’s likelihood, thus strengthening the clinical relevance of the findings.

The development of a Streamlit-based graphical interface enhanced the practical applicability of this system. Clinicians can now input patient data and receive both predictions and explanatory feedback in real time, fostering trust and adoption in healthcare settings. While the models performed well, some limitations were noted. For example, K-Nearest Neighbors and Logistic Regression showed comparatively lower performance, potentially due to their assumptions or sensitivity to high-dimensional data. Moreover, external validation on larger, diverse datasets is needed to confirm the generalizability of the proposed approach.

Overall, the study demonstrated that integrating predictive accuracy with interpretability using SHAP and a user-friendly interface could make AI tools more accessible and clinically meaningful in Alzheimer’s disease diagnosis.

Theoretical and methodological implications

Theoretical implications

This study offers significant theoretical contributions to the intersection of machine learning and healthcare, particularly in the domain of Alzheimer’s disease prediction. By integrating explainable artificial intelligence (XAI) with predictive modeling, it addresses a critical gap in traditional black-box approaches that often lack transparency. The use of SHapley Additive exPlanations (SHAP) provides a theoretical framework to quantify individual feature contributions, making the model’s behavior interpretable and trustworthy. This enhances the conceptual understanding of how diverse clinical, behavioral, and cognitive features interact to influence Alzheimer’s diagnosis. Furthermore, the findings reinforce the theoretical validity of cognitive assessments like MMSE and ADL as dominant predictors of cognitive decline.

The integration of SHAP with Gradient Boosting not only improves model transparency but also lays a foundation for ethical AI use in healthcare, where interpretability is crucial. The study theoretically affirms that predictive accuracy must be complemented by interpretability to ensure model acceptance among clinical professionals. Thus, it advances current knowledge by demonstrating that machine learning models, when coupled with XAI techniques, can move beyond predictive performance to offer theoretical insights into disease mechanisms and decision-making, bridging the gap between data science and clinical reasoning.

Methodological implications

From a methodological standpoint, the study introduces a replicable framework that combines model training, evaluation, and interpretability in the context of Alzheimer’s disease prediction. The application of multiple supervised learning algorithms (Gradient Boosting, K-Nearest Neighbors, Logistic Regression, Random Forest, and Decision Tree) provides a comprehensive benchmarking approach to assess classifier performance on real-world healthcare data. The implementation of SHAP as a model-agnostic interpretability tool enables the quantification of feature importance both globally and locally, offering detailed insights into individual predictions. Additionally, the use of Partial Dependence Plots (PDPs) supports the exploration of feature interaction effects, thereby enhancing the interpretability of complex non-linear relationships. The integration of a Streamlit-based interface demonstrates how predictive models can be operationalized into real-time, clinician-friendly tools. This methodological pipeline, from preprocessing to visualization, can serve as a blueprint for future studies aiming to apply explainable machine learning in healthcare.

Moreover, the study highlights the importance of combining performance metrics with transparency tools to ensure model reliability and usability. By adopting a mixed-methods approach that emphasizes both predictive accuracy and explainability, the research sets a methodological standard for responsible and interpretable AI deployment in medical diagnostics.

Impact on future research

The findings of this study are expected to pave the way for more transparent, interpretable, and clinically viable applications of machine learning in the diagnosis and prognosis of Alzheimer’s disease. By demonstrating the successful integration of SHAP-based explanations with high-performing models such as Gradient Boosting, the research provides a strong foundation for the development of patient-centric AI systems in healthcare. Future studies can build upon this work by extending the analysis to multimodal data, including imaging, genomics, and electronic health records, thereby increasing diagnostic precision and robustness.

Moreover, this research underscores the importance of balancing model performance with interpretability, encouraging future work to explore hybrid frameworks that combine deep learning with explainable AI methods. The insights drawn from SHAP values and Partial Dependence Plots offer meaningful clinical interpretations, which can guide hypothesis generation in longitudinal and interventional Alzheimer’s research. Additionally, the real-time prediction interface built using Streamlit serves as a prototype for interactive clinical decision-support tools, motivating future investigations into user-centered design and interface optimization for practitioners. In essence, this study not only contributes to the growing field of explainable healthcare AI but also sets the stage for interdisciplinary collaboration among clinicians, data scientists, and policymakers to ensure ethical and effective deployment of AI technologies in neurodegenerative disease management.

Conclusion

This study has presented a comprehensive and explainable machine learning framework for predicting Alzheimer's disease using clinical, behavioral, and demographic data. The investigation was centered on achieving a balance between predictive performance and interpretability, which is essential for the practical adoption of artificial intelligence in healthcare. Through a comparative analysis of five traditional classifiers (Gradient Boosting, K-Nearest Neighbors, Logistic Regression, Random Forest, and Decision Tree), Gradient Boosting was identified as the most effective model, attaining an accuracy of 93.9 % and an F1-score of 91.8 %. These results underscore the model’s robustness and suitability for complex clinical prediction tasks.

To enhance the transparency of predictions, SHapley Additive exPlanations (SHAP) were utilized for both global and local interpretability. The SHAP visualizations provided insights into feature importance and individual decision logic, thereby making the model's behavior more understandable to healthcare professionals. Partial Dependence Plots (PDPs) further supported the interpretation by highlighting the marginal effect of key features such as MMSE, ADL, FunctionalAssessment, and cholesterol levels. The implementation of a Streamlit-based user interface enabled real-time predictions and intuitive SHAP visual feedback, bridging the gap between data science and clinical utility. This interactive component offers the potential for integration into electronic health record systems or telehealth platforms to support timely decision-making.

In conclusion, the proposed approach demonstrates that combining powerful machine learning models with explainable AI techniques can yield both accurate and interpretable predictions for Alzheimer's diagnosis. Such systems hold promise not only for improving early detection but also for assisting clinicians in understanding patient-specific risk factors. The study contributes significantly to the growing field of interpretable AI in healthcare and sets a strong foundation for future work involving multimodal data, larger cohorts, and integration into real-world clinical settings. The findings advocate for ethical, transparent, and patient-centric AI tools in neurodegenerative disease management.

Limitations

Despite the promising outcomes demonstrated by the proposed framework for Alzheimer's disease prediction, several limitations were acknowledged during the course of this study. One notable constraint lies in the scope and diversity of the dataset. The dataset, while comprehensive in terms of clinical, behavioral, and demographic features, was limited to a single source and lacked wider population heterogeneity. This may restrict the generalizability of the model to broader patient populations with varying geographical, genetic, and socio-economic backgrounds.

Moreover, the study primarily relied on structured tabular data and excluded imaging modalities such as MRI or PET scans, which are known to offer deeper insights into neurodegeneration. Although models like Gradient Boosting performed well on available features, the absence of multimodal integration limited the holistic understanding of disease progression. Another limitation involved the static nature of the input data; longitudinal or time-series health records were not included, which could have significantly enhanced the predictive capabilities by accounting for changes over time. In terms of future work, the integration of multimodal data including imaging, genomic sequences, and longitudinal EHRs can be explored to strengthen the predictive performance and interpretability of the models. Federated learning frameworks may also be considered to ensure data privacy while enabling cross-institutional model training. Additionally, usability studies involving healthcare practitioners can be conducted to refine the Streamlit-based interface, ensuring it aligns with clinical workflows and decision-making requirements.

Expanding the model’s applicability to earlier stages of cognitive decline, such as mild cognitive impairment (MCI), could further contribute to timely interventions. The explainability component can also be improved by combining SHAP with other model-agnostic techniques such as LIME or counterfactual explanations for more robust and clinician-friendly insights. These directions will ensure that the proposed system evolves toward a scalable, ethical, and impactful solution for real-world clinical adoption.

Related research article

None

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work the author(s) used ChatGPT 4.5 in order to improve the language and readability. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Ethics statements

The research presented in this study was conducted using a publicly available or anonymized dataset containing no personally identifiable information. No human participants, animals, or personally sensitive data were involved directly in this study. As such, formal ethical approval was not required. All data preprocessing, analysis, and model development were carried out in compliance with ethical guidelines for research involving secondary health data. The primary objective of the study is to support clinical decision-making through interpretable machine learning, ensuring transparency, accountability, and fairness in predictive healthcare applications.

CRediT authorship contribution statement

Rajkumar Govindarajan: Writing – original draft, Methodology, Formal analysis. K. Thirunadanasikamani: Data curation, Formal analysis. Komal Kumar Napa: Writing – original draft, Methodology, Formal analysis. S. Sathya: Project administration, Investigation. J. Senthil Murugan: Writing – review & editing, Supervision, Visualization. K. G. Chandi Priya: Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability

Data will be made available on request.

References

  • 1.Zhou T., et al. A review of deep learning in Alzheimer’s disease: applications and challenges. Knowl.-Based Syst. 2021;221 [Google Scholar]
  • 2.Feng Q., Zhang H., Dong Q., Liu Q. Interpretable machine learning for early prediction of Alzheimer’s disease using clinical and cognitive data. IEEE J. Biomed. Health Inf. 2023;27(2):789–800. [Google Scholar]
  • 3.Islam J., Zhang Y. Brain MRI analysis for Alzheimer’s disease diagnosis using an ensemble system of deep convolutional neural networks. Brain Inf. 2018;5(2):2. doi: 10.1186/s40708-018-0080-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Bae J., Yoon H.J. Explainable machine learning for predicting Alzheimer’s disease from clinical data. Diagnostics. 2021;11(7):1292. [Google Scholar]
  • 5.Alzheimer’s Disease International. World Alzheimer Report 2022. 2022. https://www.alzint.org/u/World-Alzheimer-Report-2022.pdf [Google Scholar]
  • 6.World Health Organization. Global Status Report on the Public Health Response to Dementia 2023. WHO; Geneva: 2023. https://www.who.int/publications/i/item/9789240075046 [Google Scholar]
  • 7.Zhang D., et al. Deep learning models for Alzheimer’s disease classification. IEEE Access. 2022;10:67271–67283. [Google Scholar]
  • 8.Dubois B., et al. Advancing research diagnostic criteria for Alzheimer's disease: the IWG-2 criteria. Lancet Neurol. 2014;13(6):614–629. doi: 10.1016/S1474-4422(14)70090-0. [DOI] [PubMed] [Google Scholar]
  • 9.Li X., Liu Y., Sheng X., et al. Building a neuropsychiatric testing database for veterans using unstructured electronic health records. Alzheimer’s Dement. 2023;19(S22) doi: 10.1002/alz.075390. [DOI] [Google Scholar]
  • 10.Rajpurkar P.G., et al. AI in healthcare: past, present and future. Nat. Biomed. Eng. 2022;6(2):134–150. [Google Scholar]
  • 11.Yan R., et al. A hybrid deep learning approach for AD prediction using MRI and clinical data. Brain Inf. 2023;10(1):12–23. [Google Scholar]
  • 12.Wang L., et al. Ensemble learning for early Alzheimer’s disease prediction using EHRs. J. Biomed. Inf. 2023;138 doi: 10.1016/j.jbi.2023.104322. [DOI] [Google Scholar]
  • 13.Holzinger J., et al. What do we need to build explainable AI systems for health care? Patterns. 2023;4(3) doi: 10.1016/j.patter.2023.100664. [DOI] [Google Scholar]
  • 14.Ribeiro R., et al. Proc. ACM SIGKDD. 2023. Model-agnostic interpretability for black box models: LIME. [DOI] [Google Scholar]
  • 15.Lundberg S., Lee S. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2022;31 [Google Scholar]
  • 16.Gupta R., et al. SHAP-based model explanation in Alzheimer’s risk prediction. BMC Med. Inf. Decis. Mak. 2023;23(1) doi: 10.1186/s12911-023-02238-9. [DOI] [Google Scholar]
  • 17.Jahan S., Abu Taher K., Kaiser M.S., Mahmud M., Rahman M.S., Hosen A.S., et al. Explainable AIbased Alzheimer’s prediction and management using multimodal data. PLoS ONE. 2023;18(11) doi: 10.1371/journal.pone.0294253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Vlontzou M. .E., Athanasiou M., Dalakleidi K. .V., Skampardoni I., Davatzikos C., Nikita K. A comprehensive interpretable machine learning framework for mild cognitive impairment and Alzheimer’s disease diagnosis. Sci Rep. 2025;15:8410. doi: 10.1038/s41598-025-92577-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Menon A.P., Gunasundari R. SHAP‑based feature selection and explainable machine learning classification of Alzheimer’s disease. J. Comput. Anal. Appl. 2024;33(6):798–817. May 22. [Google Scholar]
  • 20.Patel A., et al. Real-time disease prediction with streamlit and explainable AI. J. Comput. Health. 2024;4(2):85–93. [Google Scholar]
  • 21.Can D.-C., Tang Q.-H., Ha H., Nguyen B.T., Chén O.Y. REMEMBER: retrieval-based explainable multimodal evidence-guided modeling for brain evaluation and reasoning in zero and fewshot neurodegenerative diagnosis. 2025. https://arxiv.org/abs/2504.09354
  • 22.Baghirova N., Vũ D.-T., Can D.-C., Schneuwly Diaz C., Bodlet J., Blanc G., Hrusanov G., Ries B., Chén O.Y. Explainable graphtheoretical machine learning: with application to Alzheimer’s disease prediction. 2025. https://arxiv.org/abs/2503.16286
  • 23.Chandler C., Diaz-Asper C., Turner R.S., Reynolds B., Elvevåg B. An explainable machine learning model of cognitive decline derived from speech. Alzheimer’s Dement.: Diagn. Assess. Dis. Monit. 2023;15(4) doi: 10.1002/dad2.12516. Dec. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Vimbi V., Shaffi N., Mahmud M. Interpreting artificial intelligence models: a systematic review on the application of LIME and SHAP in Alzheimer’s disease detection. Brain Inf. 2024;11(10) doi: 10.1186/s40708-024-00222-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Tonekaboni S., et al. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med, Inf., Decis, Mak. 2020;20(1) doi: 10.1186/s12911-020-01332-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Lundberg S., et al. Data analysis with Shapley values for automatic subject selection in Alzheimer’s disease studies. Alzheimers Res, Ther. 2021;13(1) doi: 10.1186/s13195-021-00879-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Holzinger A., et al. Solving the explainable AI conundrum by bridging clinicians' needs and technological requirements. NPJ Digit, Med. 2023;6(1) doi: 10.1038/s41746-023-00837-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Wang Y., et al. Interpretable machine learning-driven biomarker identification and risk prediction for Alzheimer’s disease. Sci, Rep. 2024;14(1) doi: 10.1038/s41598-024-80401-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Ghassemi M., et al. A review of explainable artificial intelligence in healthcare. Comput, Biol, Med. 2022;149 doi: 10.1016/j.compbiomed.2022.106203. [DOI] [Google Scholar]
  • 30.Holzinger J., et al. Explainable AI for clinical and remote health applications: a survey. Artif, Intell, Rev. 2022;55(7):5731–5780. doi: 10.1007/s10462-022-10304-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Lee J.D., et al. Explainable AI-based Alzheimer's prediction and management using multimodal data. Front, Aging Neurosci. 2023;15 doi: 10.3389/fnagi.2023.1267020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Ribeiro M.T., et al. Explainable artificial intelligence for mental health through transparency and interpretability. NPJ Digit, Med. 2023;6(1) doi: 10.1038/s41746-023-00751-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Dash S.S., et al. An explainable machine learning-based prediction model for Alzheimer's disease progression. Front, Aging Neurosci. 2023;15 doi: 10.3389/fnagi.2023.1267020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Rajpurkar P.G., et al. AI in healthcare: past, present and future. Nat, Biomed, Eng. 2022;6(2):134–150. doi: 10.1038/s41551-021-00758-7. [DOI] [Google Scholar]
  • 35.Yan R., et al. A hybrid deep learning approach for AD prediction using MRI and clinical data. Brain Inf. 2023;10(1):12–23. doi: 10.1186/s40708-023-00158-9. [DOI] [Google Scholar]
  • 36.Wang L., et al. Ensemble learning for early Alzheimer’s disease prediction using EHRs. J. Biomed, Inf. 2023;138 doi: 10.1016/j.jbi.2023.104322. [DOI] [Google Scholar]
  • 37.Holzinger J., et al. What do we need to build explainable AI systems for health care? Patterns. 2023;4(3) doi: 10.1016/j.patter.2023.100664. [DOI] [Google Scholar]
  • 38.Ribeiro R., et al. Proc. ACM SIGKDD. 2023. Model-agnostic interpretability for black box models: LIME. [DOI] [Google Scholar]
  • 39.Lundberg S., Lee S. A unified approach to interpreting model predictions. Adv, Neural Inf, Process, Syst. 2022:31. https://proceedings.neurips.cc/paper/2022/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html [Google Scholar]
  • 40.Gupta R., et al. SHAP-based model explanation in Alzheimer’s risk prediction. BMC Med, Inf., Decis, Mak. 2023;23(1) doi: 10.1186/s12911-023-02238-9. [DOI] [Google Scholar]
  • 41.Kim A.N., et al. Automated feature selection and explainable machine learning for early Alzheimer’s diagnosis using cognitive tests and behavioral data. IEEE J. Biomed, Health Inf. 2024;28(1):72–81. [Google Scholar]
  • 42.Sarraf M.H., Yin Y.T. A survey on interpretability techniques in deep learning models for neurological disorders. IEEE Trans, Neural Syst, Rehabil, Eng. 2024;32:512–525. [Google Scholar]
  • 43.Roy K.K., et al. An explainable AI-driven framework for early Alzheimer’s risk prediction using multimodal health data. Heal., Anal. 2025;5(2) [Google Scholar]
  • 44.Kumar N.K., et al. Proc. Int. Conf. Comput. Intell. Netw. Syst. (CINS), Dubai, UAE. 2024. Enhancing Alzheimer’s disease detection: a comparative study of deep learning techniques with transfer learning and custom CNN models; pp. 1–4. [DOI] [Google Scholar]
  • 45.Kumar N.K., et al. Proc. Int. Conf. Smart Electron. Commun. ICOSEC; Trichy, India: 2024. Enhanced Alzheimer’s disease prediction through advanced imaging: a study of machine learning and deep learning approaches; pp. 1177–1182. [DOI] [Google Scholar]
  • 46.Chen T., Guestrin C. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. XGBoost: a scalable tree boosting system; pp. 785–794. [DOI] [Google Scholar]
  • 47.Cover T., Hart P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory. 1967;13(1):21–27. doi: 10.1109/TIT.1967.1053964. [DOI] [Google Scholar]
  • 48.Hosmer D.W., Lemeshow S., Sturdivant R.X. 3rd ed. Wiley; Hoboken, NJ: 2013. Applied Logistic Regression. [Google Scholar]
  • 49.Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. doi: 10.1023/A:1010933404324. [DOI] [Google Scholar]
  • 50.Quinlan J.R. Induction of decision trees. Mach Learn. 1986;1(1):81–106. doi: 10.1007/BF00116251. [DOI] [Google Scholar]
  • 51.Sarker I.H. Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci. 2021;2(3):160. doi: 10.1007/s42979-021-00592-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Darekar G., Murad T., Miao H.-Y., Thakuri D.S., Chand G.B. AgeNetSHAP: an explainable AI approach for optimally mapping multivariate regional brain age and clinical severity patterns in Alzheimer’s disease. medRxiv. 2025 doi: 10.1101/2025.02.28.25323097. [DOI] [Google Scholar]
