Abstract
Introduction
Early Alzheimer's disease (AD) risk assessment requires accessible alternatives to invasive biomarkers. We developed a multi‐modal machine learning framework using questionnaire metadata from participants with concurrent microbiome sequencing data.
Methods
We analyzed 9832 participants with 120 metadata features across five categories (demographic, dietary, lifestyle, nutritional, medical). Features were selected via Pearson correlation and chi‐squared tests. Four algorithms were trained using 10‐fold cross‐validation with synthetic minority oversampling technique (SMOTE), validated on 1967 samples. The 16S rRNA sequencing data from the same cohort with 2000 samples enabled microbiome composition analysis.
Results
Medical history (area under the curve [AUC] = 0.871) and dietary patterns (AUC = 0.874) achieved best performance, outperforming demographic (0.795), lifestyle (0.660), and nutritional (0.569) domains (p < 0.001). Microbiome analysis revealed dysbiosis markers (Prevotella/Bacteroides ratio: 1.921) linking dietary factors to potential neuroinflammatory pathways.
Discussion
These findings support non‐invasive, multi‐modal screening combining medical and dietary evaluation for AD risk stratification, with preliminary microbiome evidence suggesting gut–brain axis dysbiosis as a mechanistic pathway warranting validation in larger cohorts.
Keywords: Alzheimer's disease, gut–brain axis, machine learning, microbiome dysbiosis, multi‐modal prediction, risk stratification, SHAP analysis, XGBoost
Highlights
Modifiable factors in the data were selected via statistical filtering (chi‐squared and Pearson correlation) before machine learning modeling, improving predictive performance and interpretability.
Artificial intelligence (AI) ‐powered models accurately distinguished Alzheimer's disease (AD) cases from controls using questionnaire‐based modifiable risk factors.
Gut microbial diversity measures and genus‐level profiling identified microbial signatures associated with AD risk.
Shapley additive explanation (SHAP) analysis identified dietary patterns, lifestyle behaviors, and medical history as influential exposomic predictors of Alzheimer's risk.
The study demonstrates the value of combining explainable AI with exploratory gut microbiome analysis to investigate potential mechanistic links for early AD risk assessment.
1. INTRODUCTION
1.1. Global burden of Alzheimer's disease
Alzheimer's disease (AD) affects over 55 million people worldwide, with projections reaching 152 million by 2050. 1 As the leading cause of dementia (60%–70% of cases), AD imposes devastating consequences through progressive cognitive decline and functional impairment. 2 The global economic burden exceeds $1.3 trillion annually, projected to double by 2030 without effective interventions. 3 , 4 This burden necessitates accessible, population‐specific risk assessment strategies beyond traditional biomarker approaches. 5
1.2. Current challenges in early detection and diagnosis
Early AD detection remains challenging. Current paradigms rely on invasive cerebrospinal fluid (CSF) analysis or expensive neuroimaging (amyloid/tau positron emission tomography [PET]: $3000–5000), which identify disease at advanced stages when interventions show minimal efficacy. 6 , 7 , 8 Blood‐based biomarkers (phosphorylated tau [p‐tau] 181, p‐tau217, neurofilament light chain [NfL], glial fibrillary acidic protein [GFAP]) offer improved accessibility but remain expensive (> $500/test), require specialized infrastructure, and face standardization challenges. 9 , 10 , 11 Moreover, biomarker positivity without cognitive symptoms complicates risk stratification. 12 These limitations underscore the need for complementary approaches leveraging accessible, non‐invasive data for population‐level screening.
1.3. The gut–brain axis: A paradigm shift in Alzheimer's pathophysiology
The gut–brain axis has emerged as a critical modulator of neurodegeneration. 13 , 14 The gut microbiome (> 100 trillion microorganisms) influences brain function through: (1) neuroactive metabolite production (short‐chain fatty acids [SCFAs], neurotransmitter precursors) modulating neuroinflammation; 15 , 16 (2) intestinal barrier regulation, with dysbiosis‐induced permeability enabling lipopolysaccharide (LPS) translocation and immune activation; (3) vagal nerve signaling affecting cognition; 17 , 18 and (4) synthesis of neuroprotective vitamins and metabolites. 19
AD patients consistently exhibit microbiome dysbiosis: reduced alpha diversity, altered beta diversity, depletion of short‐chain fatty acid (SCFA) producers (Faecalibacterium, Roseburia, Coprococcus), and enrichment of pro‐inflammatory taxa. 20 , 21 , 22 , 23 These signatures correlate with AD severity and CSF biomarkers, suggesting dose‐dependent relationships. 24 , 25
1.4. Dietary and lifestyle modulation of microbiome–Alzheimer's associations
Diet potently modulates microbiome composition within days to weeks. 26 , 27 , 28 Mediterranean, DASH, and MIND dietary patterns associate with reduced AD incidence and slower cognitive decline, 29 promoting beneficial microbiomes through fiber‐rich foods fueling SCFA producers, 30 , 31 omega‐3 fatty acids reducing pro‐inflammatory bacteria, 32 , 33 and polyphenols increasing Akkermansia muciniphila. 34 Conversely, Western diets high in saturated fat and processed foods induce dysbiosis, increasing intestinal permeability, systemic inflammation, and AD risk. 27 , 35 , 36 These bidirectional diet–microbiome interactions suggest modifiable pathways for AD prevention. 37 , 38 , 39
1.5. Machine learning for Alzheimer's prediction: Current landscape and limitations
Machine learning (ML) algorithms (support vector machine [SVM], gradient boosting, neural networks) demonstrate capacity to integrate high‐dimensional data for AD prediction, achieving >80% accuracy with neuroimaging and electronic health records. 40 , 41 , 42 , 43 However, critical limitations constrain translation: (1) single‐modality focus ignoring multifactorial etiology; 44 , 45 (2) “black box” models lacking interpretability; (3) small microbiome cohorts without integration of diet/lifestyle factors; and (4) inadequate handling of class imbalance. 46
RESEARCH IN CONTEXT
Systematic review: Although emerging evidence implicates gut dysbiosis in Alzheimer's pathogenesis and suggests microbiome‐targeted interventions may hold therapeutic promise, fundamental questions remain unanswered. First, the relative contributions to Alzheimer's disease (AD) risk of microbiome composition versus the environmental factors (diet, lifestyle, medical comorbidities) that shape the microbiome are unknown. Are microbial signatures independent predictors, or do they merely reflect the cumulative effects of health behaviors? Second, which data modalities provide the greatest discriminative capacity for AD prediction: demographic characteristics, dietary patterns, nutritional intake, lifestyle behaviors, or medical history? Systematic, head‐to‐head comparisons within unified analytical frameworks are lacking. Third, how do dietary patterns influence microbiome composition, and does this microbial mediation explain diet–AD associations, or do diets exert direct effects on brain health independent of microbiome changes? Fourth, can machine learning (ML) models leveraging non‐invasive, questionnaire‐based data achieve clinically meaningful predictive performance (area under the curve [AUC] > 0.80) comparable to expensive biomarker or imaging approaches?
Interpretation: This study addresses these critical gaps through a comprehensive, multi‐modal ML framework systematically comparing five distinct factor categories, demographic characteristics, dietary patterns, nutritional intake, lifestyle behaviors, and medical history, for AD prediction. By integrating rigorous statistical feature selection, multiple ML algorithms with proper handling of class imbalance, interpretable artificial intelligence (AI) (Shapley additive explanation [SHAP] analysis) to identify key predictive features, and preliminary microbiome profiling to explore mechanistic pathways, we aim to: (1) establish the relative predictive capacity of each factor category; (2) identify the most influential features within top‐performing modalities; (3) reveal whether dietary patterns or nutritional quantities better predict AD risk; (4) provide mechanistic context linking dietary factors to neurodegeneration through gut–brain axis dysbiosis; and (5) inform the development of accessible, non‐invasive screening tools and modifiable intervention targets for Alzheimer's prevention.
Future directions: This framework offers a scalable and interpretable approach for early AD risk assessment and supports targeted preventive strategies. Correlative microbiome findings provide hypotheses for future longitudinal studies to investigate causality, potentially informing precision interventions and public health strategies.
1.6. Knowledge gaps and study rationale
Although emerging evidence implicates gut dysbiosis in Alzheimer's pathogenesis and suggests microbiome‐targeted interventions may hold therapeutic promise, fundamental questions remain unanswered. First, the relative contributions to AD risk of microbiome composition versus the environmental factors (diet, lifestyle, medical comorbidities) that shape the microbiome are unknown. Are microbial signatures independent predictors, or do they merely reflect the cumulative effects of health behaviors? Second, which data modalities provide the greatest discriminative capacity for AD prediction: demographic characteristics, dietary patterns, nutritional intake, lifestyle behaviors, or medical history? Systematic, head‐to‐head comparisons within unified analytical frameworks are lacking. Third, how do dietary patterns influence microbiome composition, and does this microbial mediation explain diet–AD associations, or do diets exert direct effects on brain health independent of microbiome changes? Fourth, can ML models leveraging non‐invasive, questionnaire‐based data achieve clinically meaningful predictive performance (area under the curve [AUC] > 0.80) comparable to expensive biomarker or imaging approaches?
We address these gaps through a comprehensive multi‐modal ML framework that systematically compares five factor categories (demographic characteristics, dietary patterns, nutritional intake, lifestyle behaviors, and medical history) for AD prediction in 9,832 participants. By integrating rigorous feature selection, multiple ML algorithms with proper class imbalance handling, Shapley additive explanation (SHAP) interpretability analysis, and exploratory microbiome profiling, we aim to: (1) establish the relative predictive capacity of each category; (2) identify key features within top‐performing modalities; (3) compare dietary patterns versus nutritional quantities; (4) provide mechanistic context through gut–brain axis dysbiosis; and (5) inform the development of accessible screening tools and modifiable intervention targets for AD prevention.
2. METHODOLOGY
2.1. Study design and data source
This study utilized data from the National Center for Biotechnology Information (NCBI) bio‐sample database (PRJEB11419), comprising 33,374 biosamples with 16S rRNA amplicon sequencing and comprehensive metadata including demographics, diet, lifestyle, and medical history. After quality control (excluding samples with >20% missing values, duplicates, and records lacking AD diagnosis), 9832 high‐quality samples with 120 predictor features across five categories remained for analysis (Appendix A, Figure A1). Chi‐squared and correlation analyses were used for initial dimensionality reduction, while non‐linear relationships were captured through model‐based feature importance and non‐linear ML models.
Data retrieval was performed using the git clone command to access the metadata repository, followed by structured extraction using the biosampletable.py script. All computational analyses were conducted in Python 3.9 within Jupyter Notebook environments deployed on a high‐performance computing (HPC) cluster.
2.2. Data preprocessing and quality control
Chi‐squared tests identified categorical features with significant associations with AD status, while Pearson correlation assessed linear relationships among continuous variables. Only features with positive and significant correlations were retained. The 120 features were organized into five categories: Demographic (age, gender, body mass index [BMI], weight change), Dietary (food group frequencies, Mediterranean/Western diet patterns), Nutritional Intake (macronutrients, micronutrients, vitamins), Lifestyle (physical activity, sleep, smoking, alcohol), and Medical History (cardiovascular disease, diabetes, depression, antibiotic use, surgical history). The target column was binary, with healthy individuals coded as 0 and diseased individuals as 1; this aligns with the clinical diagnosis, and no additional cognitive scoring or International Classification of Diseases (ICD) ‐based coding was applied. Participants had undergone diagnostic evaluation at the time of data collection; however, the exact timing of microbiome sampling, questionnaire completion, and Alzheimer's diagnosis was unavailable. As the data are cross‐sectional and reflect post‐screening status, causal or pre‐diagnostic inference is not possible. Consequently, the models identify disease‐associated patterns rather than predict disease onset.
The dataset was partitioned 80/20 into training (n = 7865, 30.5% AD prevalence) and test sets (n = 1967) using stratified sampling. Model performance was evaluated both with and without the synthetic minority oversampling technique (SMOTE) to quantify its effect. SMOTE was applied only to training data within cross‐validation folds, addressing class imbalance while preventing data leakage.
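As an illustration of what a stratified 80/20 partition does, the following minimal sketch splits a toy label vector so that both parts keep the same class ratio. It is not the study's code: the function name `stratified_split`, the toy labels, and the seed are assumptions made for this example only.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within the class only
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical): 30% positive, 70% negative.
labels = [1] * 300 + [0] * 700
train_idx, test_idx = stratified_split(labels)

train_prev = sum(labels[i] for i in train_idx) / len(train_idx)
test_prev = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_prev, test_prev)
```

Because the split is performed class by class, the prevalence in the held‐out part matches the prevalence in the training part, which is the property the test set relies on.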
The dataset's observations were grouped into distinct, non‐overlapping categories. When the null hypothesis is assumed to be true, the resulting test statistic follows a chi‐squared (χ2) distribution. The primary objective of this test, as represented in Equation (1), is to evaluate how likely the observed frequencies (O) are in comparison to the expected frequencies (E), assuming no significant difference exists between categories under the null hypothesis. 47
$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} \tag{1}$$
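To make Equation (1) concrete, the sketch below computes the chi‐squared statistic for a small contingency table, deriving the expected frequencies from the row and column totals. The 2 × 2 counts are hypothetical and the function name `chi_squared_statistic` is an assumption for illustration; it is not the study's implementation.

```python
def chi_squared_statistic(observed):
    """Pearson chi-squared statistic and degrees of freedom for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence of rows and columns.
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical table: rows = feature absent/present, columns = control/AD.
table = [[30, 10],
         [20, 40]]
stat, dof = chi_squared_statistic(table)
print(round(stat, 3), dof)
```

With one degree of freedom, the 0.05 critical value of the chi‐squared distribution is 3.841, so a statistic above that value would lead to rejecting the null hypothesis of no association for this toy table.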
Alongside the chi‐squared analysis (Appendix A, Figure A2), the Pearson correlation test was conducted on numerical data to assess linear relationships among continuous variables. Pearson correlation is a statistical method used to determine the strength and direction of a linear relationship between two variables (Appendix A, Figure A3). The resulting coefficient ranges from −1 to +1, where values close to +1 indicate a strong positive correlation, values close to −1 indicate a strong negative correlation, and values near 0 indicate little or no linear correlation. 48
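$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}} \tag{2}$$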
In Equation (2), x and y represent the variables under analysis, while x̄ and ȳ are their respective means. These averages serve as baselines to measure deviations of individual data points from the central tendency of each dataset. A coefficient of 1.00 indicates a perfect positive relationship, whereas values approaching −1 reflect a strong inverse relationship and values near 0 a weak one. For this study, only features with positive and significant correlation values were retained for further evaluation due to their stronger predictive potential.
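The behavior of the coefficient at its two extremes can be checked with a short worked example. The sketch below implements the Pearson formula directly; the function name `pearson_r` and the toy vectors are assumptions chosen for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-deviation from the means, scaled by both spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(spread_x * spread_y)


x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y rises linearly with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y falls linearly with x
print(r_pos, r_neg)
```

A perfectly increasing linear relationship gives +1 and a perfectly decreasing one gives −1, so a negative coefficient signals an inverse relationship rather than a weak one.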
To mitigate class imbalance and improve model sensitivity, SMOTE was applied exclusively to the training set. SMOTE generates synthetic minority class samples by interpolating between existing minority instances and their k‐nearest neighbors in feature space. This approach increases minority class representation without simply duplicating existing samples, which can lead to overfitting. SMOTE was applied after train–test splitting and within each cross‐validation fold during model training to prevent data leakage. The test set remained untouched by SMOTE, ensuring that model performance was evaluated on the original, realistic class distribution. Post‐SMOTE, the training set achieved perfect class balance.
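The interpolation step at the core of SMOTE can be sketched in a few lines. The code below is a simplified stand‐in written for illustration, not the imbalanced‐learn implementation used in practice; the function name `smote_like_oversample`, the value of k, and the toy points are all assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    point and one of its k nearest minority neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like_oversample(minority, n_new=6)
for point in new_points:
    print([round(v, 3) for v in point])
```

Every synthetic point lies between two real minority samples, which is why the method adds variety rather than exact copies while staying inside the region the minority class already occupies.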
2.3. ML model development
Models were selected following LazyPredict, which identified Logistic Regression, SVM, XGBoost, and LightGBM as top performers for the structured tabular data used in this study. Hyperparameter tuning used five‐fold stratified cross‐validation with grid search, optimizing for receiver operating characteristics (ROC) ‐AUC. Nested 10‐fold cross‐validation ensured valid performance estimates. Models were evaluated using accuracy, precision, recall, F1‐score, specificity, confusion matrices, and ROC‐AUC as the primary metric. SHAP analysis identified influential features and their directional effects on predictions, providing interpretable insights into model decision‐making.
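For reference, the threshold‐based metrics listed above all derive from the four cells of a binary confusion matrix. The sketch below shows those definitions; the counts passed in are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)       # positive predictive value
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)             # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts: 100 true cases and 200 true controls.
metrics = classification_metrics(tp=70, fp=30, tn=170, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

ROC‐AUC is not included because it depends on the ranking of predicted probabilities across all thresholds rather than on a single confusion matrix.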
Hyperparameter tuning was conducted using five‐fold stratified cross‐validation with grid search (GridSearchCV) on the training set. Optimal hyperparameters were selected based on maximizing ROC‐AUC score, prioritizing discriminative capacity over raw accuracy, given the clinical importance of sensitivity‐specificity tradeoffs.
To ensure valid performance estimates and prevent data leakage, a nested cross‐validation framework was employed using imbalanced‐learn's Pipeline: Outer Loop (10‐fold stratified cross‐validation): Training data were split into 10 folds, maintaining class distribution. Inner Loop (per‐fold processing): Within each fold, SMOTE was applied to the fold's training portion (9/10 of training data), the model was trained on SMOTE‐resampled data, and predictions were made on the fold's validation portion (1/10 of training data, original distribution). This approach ensures that synthetic samples never influence validation or test set evaluation, providing unbiased estimates of generalization performance. Cross‐validation metrics (accuracy, F1‐score, ROC‐AUC) were averaged across 10 folds, with standard deviations reported to assess stability.
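The ordering that prevents leakage (split first, resample only the training portion, validate on untouched data) can be sketched without any ML library. In the example below, `stratified_kfold` and `balance_by_duplication` are hypothetical helpers written for this illustration; simple duplication stands in for SMOTE, and no model is fitted.

```python
from collections import defaultdict


def stratified_kfold(labels, k):
    """Yield (train_idx, val_idx); each class is dealt round-robin into k folds (no shuffling)."""
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for i in range(k):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, folds[i]


def balance_by_duplication(train_idx, labels):
    """Stand-in for SMOTE: repeat minority indices until both classes are equal in size."""
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [minority[n % len(minority)] for n in range(len(majority) - len(minority))]
    return train_idx + extra


labels = [1] * 30 + [0] * 70  # toy data with 30% prevalence
fold_report = []
for train_idx, val_idx in stratified_kfold(labels, k=10):
    resampled = balance_by_duplication(train_idx, labels)  # training portion only
    train_prev = sum(labels[i] for i in resampled) / len(resampled)
    val_prev = sum(labels[i] for i in val_idx) / len(val_idx)
    leaked = set(resampled) & set(val_idx)
    fold_report.append((train_prev, val_prev, len(leaked)))

print(fold_report[0])
```

In every fold the resampled training portion is balanced, the validation portion keeps the original prevalence, and no validation sample appears in the training indices.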
To elucidate which features most strongly influenced model predictions and provide clinically actionable insights, SHAP analysis was applied to the best‐performing models from each factor category. SHAP values decompose each prediction into contributions from individual features, grounded in cooperative game theory. In the summary plots, each point represents a test sample, colored by feature value (red = high, blue = low) and positioned horizontally by its SHAP value. Features are ranked by mean absolute SHAP value. This visualization reveals both feature importance and directional effects. Dependence plots show feature values (x‐axis) against SHAP values (y‐axis) for the top six features, revealing non‐linear relationships and interaction effects. Trend lines (linear regression) summarize average directional effects.
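The game‐theoretic decomposition behind SHAP can be illustrated by computing exact Shapley values for a tiny model. This sketch enumerates all feature coalitions against a single baseline, which is only feasible for a handful of features and is not the algorithm the SHAP library applies to tree models; `toy_model`, its coefficients, and the feature names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values: features outside a coalition are set to the baseline."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical risk score with one interaction term."""
    weight_change, diet_score, reflux = features
    return 2.0 * weight_change - 1.0 * diet_score + 0.5 * weight_change * reflux


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The contributions sum to the difference between the prediction and the baseline prediction, and the interaction term is shared equally between the two features involved; the sign of each contribution gives the directional effect discussed above.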
2.4. Microbiome composition analysis
To investigate potential mechanistic pathways linking dietary and medical factors to AD risk through gut–brain axis modulation, available 16S rRNA gene sequencing data were subjected to comprehensive taxonomic profiling, diversity analysis, and dysbiosis characterization. This exploratory analysis aimed to identify microbial signatures that could mechanistically connect modifiable risk factors identified through ML to neurodegenerative processes. Available 16S rRNA sequencing data underwent taxonomic profiling at genus level. Alpha diversity (Shannon index) and beta diversity were calculated. Differential abundance analysis (LEfSe/ALDEx2) identified discriminant taxa (|log2FC| > 1, FDR < 0.05). A composite dysbiosis score integrated Shannon diversity, SCFA producer abundance, and Prevotella/Bacteroides ratio, classified as low, moderate, or high dysbiosis. Spearman correlations linked dysbiosis markers with SHAP values from top dietary/medical features.
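Two of the community metrics named above have simple closed forms. The sketch below computes the Shannon index (here with the natural logarithm, an assumption since the base is not stated) and a Prevotella/Bacteroides ratio from genus‐level counts; the counts are hypothetical, and the composite dysbiosis score is not reproduced because its weighting is not specified in the text.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    counts = list(counts)
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical genus-level read counts for one sample.
genus_counts = {
    "Bacteroides": 400,
    "Prevotella": 200,
    "Faecalibacterium": 200,
    "Roseburia": 200,
}

shannon = shannon_index(genus_counts.values())
pb_ratio = genus_counts["Prevotella"] / genus_counts["Bacteroides"]
even_community = shannon_index([1, 1, 1, 1])  # upper bound for four taxa: ln(4)
print(round(shannon, 4), pb_ratio, round(even_community, 4))
```

The index is largest when all taxa are equally abundant and falls as a few genera dominate, which is why lower values are read as reduced diversity.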
3. RESULTS
3.1. Dataset characteristics and study population
The study cohort comprised 9,832 participants from diverse geographic and demographic backgrounds. The dataset was partitioned into training (n = 7865, 80%) and independent test (n = 1967, 20%) sets using stratified random sampling to preserve class distribution. The prevalence of AD was 30.5% (n = 2401/7865) in the training set, reflecting substantial class imbalance that was addressed through SMOTE applied exclusively to training data to prevent information leakage. Due to the high dimensionality of structured metadata, features were grouped into five domains (demographic, dietary, nutritional, lifestyle, and medical) to enable domain‐specific analysis of Alzheimer's risk. ML models were trained within individual domains to reveal domain‐level patterns and on the combined feature set to capture cross‐domain interactions. ML prediction and microbiome pathway analysis were conducted on cohort‐aligned data from the same project accession using metadata and sequence data without feature‐level integration.
3.1.1. Demographic factors
Weight emerged as the strongest demographic predictor, showing a strong negative correlation with AD risk: lower weight elevated risk (positive SHAP values), while higher weight was protective (negative SHAP), suggesting weight loss serves as a powerful risk indicator for prodromal AD (Appendix A, Figure A4). Weight change demonstrated dose‐dependent effects, with greater weight loss amplifying risk more than gradual changes, supporting its role as a dynamic neurodegeneration biomarker. Dominant hand exhibited unexpected patterns: right‐handedness increased risk (SHAP +1), left‐handedness was protective (SHAP −1), and ambidexterity showed extreme risk elevation (SHAP +2), potentially reflecting altered brain lateralization and cognitive reserve. BMI showed a weaker positive correlation with risk. Deodorant usage was unexpectedly associated with elevated risk (SHAP +0.2), potentially through aluminum‐containing antiperspirants promoting amyloid aggregation (Figure 1).
FIGURE 1.

Top demographic dependence features.
3.1.2. Dietary factors
SHAP dependence plots reveal the six most influential dietary predictors (Appendix A, Figure A5). Lactose emerged as the strongest predictor (mean impact: 1.5206); Figure 2 demonstrates a strong negative correlation with AD risk, where higher intake was protective (SHAP −1.5) while lower intake elevated risk (SHAP +1.0), suggesting lactose insufficiency serves as a powerful risk indicator. Water Source showed a moderate negative trend (impact: 0.2156) with dispersed distributions indicating context‐dependent effects, potentially reflecting contamination patterns. Infant Feeding demonstrated a subtle positive trend (impact: 0.1739), with suboptimal feeding practices associated with elevated risk. Antibiotic History exhibited a negative relationship (impact: 0.1495), where consistent antibiotic use was paradoxically associated with lower AD risk despite microbiome alterations. Sugary Sweets (impact: 0.1443) and Animal Products (impact: 0.1394) showed positive associations with risk, potentially through glycemic dysregulation, trimethylamine N‐oxide (TMAO) production, and microbiome shifts.
FIGURE 2.

Top dietary SHAP dependence features. SHAP, Shapley additive explanation.
3.1.3. Lifestyle factors
SHAP dependence plots for the six most influential lifestyle predictors are presented (Appendix A, Figure A6). Alcohol Frequency demonstrated the strongest influence (mean impact: 0.3306); Figure 3 shows dose‐dependent risk elevation as consumption increases from 0 to 10+ occasions weekly (SHAP: −0.5 to +2.5), potentially reflecting neurotoxicity, thiamine deficiency, and neuroinflammation. Cat Ownership exhibited unexpected protective effects (impact: 0.2353), with cat owners showing SHAP values of −1.5 to −2.0, potentially via Toxoplasma gondii neuroprotection through preconditioning mechanisms. Sleep Duration showed a moderate negative trend (impact: 0.1669), where longer sleep was protective, potentially reflecting glymphatic clearance and metabolic restoration. Smoking Frequency displayed a paradoxical strong negative relationship (impact: 0.1665), with increasing frequency driving SHAP to −3.0 (protective). Alcohol Consumption (impact: 0.1474) showed modest effects, distinguishing consumption intensity from frequency. Cosmetics Frequency exhibited a positive association (impact: 0.1239), with increased use elevating risk (SHAP: −0.5 to +1.0), potentially through endocrine disruption and aluminum exposure.
FIGURE 3.

Top lifestyle dependence features.
3.1.4. Nutritional intake
SHAP dependence plots for the most influential nutritional features are presented (Appendix A, Figure A7). In Figure 4, Plant Protein (mean impact: 0.1958) demonstrated a strong negative trend across 0–8 servings, with higher consumption driving SHAP to −2.5 (protective), suggesting dose‐dependent neuroprotection. Multivitamin Use (impact: 0.1686) unexpectedly showed protective effects with daily users clustering near SHAP −1.0, potentially reflecting healthy user bias or micronutrient deficiency prevention. Water Intake (impact: 0.1545) displayed weak positive trends, supporting glymphatic clearance of amyloid‐β and tau. Supplements (impact: 0.1488) showed slight negative associations with frequent users reaching SHAP −2.0, likely reflecting omega‐3, vitamin D, or probiotic supplementation, though benefits appear conditional on baseline deficiency status. Frozen Dessert Consumption (impact: 0.1327) exhibited a positive trend, potentially elevating risk through insulin resistance, advanced glycation end products (AGEs), dysbiosis, and correlation with Western dietary patterns.
FIGURE 4.

Top nutritional dependence features.
3.1.5. Medical history factors
Key medical history predictors include cerebrovascular disease, hypertension, diabetes, depression, and cardiovascular events, linked to vascular pathology and metabolic dysfunction in AD progression (Appendix A, Figure A8). In Figure 5, Appendix Removal (mean impact: 0.8983) showed the strongest influence, with appendectomy associated with strongly positive SHAP values reaching +2.5, indicating powerful risk elevation. Acid Reflux (impact: 0.4880) displayed a clear negative trend, with reflux showing protective SHAP values reaching −2.0, serving as a strong binary discriminator. Tonsils Removed (impact: 0.2662) showed a positive relationship, with tonsillectomy associated with SHAP +0.5 to +0.75, indicating consistent but moderate predictive power. Chickenpox (impact: 0.0944) displayed a negative relationship, with chickenpox history showing protective effects and modest predictive value with interaction effects. Diabetes and Migraine showed positive trends, with diagnosis associated with SHAP values up to +0.75, representing consistent positive predictors. The combined feature importance across all domains is shown in Appendix A, Figure A9.
FIGURE 5.

Top medical dependence features.
3.2. Comparative performance analysis across modalities
SMOTE was applied to the training data to address class imbalance, with performance evaluated both with and without oversampling. For XGBoost, accuracy improved from 0.70 to 0.80 and F1‐score from 0.60 to 0.71.
Demographics provided moderate predictive capacity: Gradient Boosting achieved the best performance (ROC‐AUC = 0.796, Accuracy = 67.0%), with cross‐validation ROC‐AUC = 0.744 ± 0.024, showing greater variability than medical or dietary modalities (Appendix A, Figures A10 and A11).
Dietary Patterns demonstrated near‐equivalent performance to medical history. XGBoost achieved optimal results (ROC‐AUC = 0.874, 95% CI: 0.860‐0.888; Accuracy = 80.4%; Recall = 75.0%; F1 = 0.700), the highest discriminative capacity across all modalities (Appendix A, Figures A12 and A13). Cross‐validation yielded mean accuracy 83.4 ± 0.9% with ROC‐AUC consistently > 0.85. High recall (75.0%) indicates suitability for population‐level screening. SHAP analysis revealed omega‐3‐rich foods, antioxidants, and whole grains were protective, while processed meats, refined carbohydrates, and saturated fats increased risk. Dietary patterns outperformed individual nutrient quantities, suggesting synergistic interactions.
Lifestyle behaviors showed weak discrimination: XGBoost (ROC‐AUC = 0.695, Accuracy = 60.2%) only marginally exceeded chance (Appendix A, Figures A14 and A15). Cross‐validation revealed high variability (accuracy = 58.2 ± 2.3%) and weak univariate associations, suggesting that longer follow‐up periods may be needed or that measurement error limits these features. Nutritional intake performed poorest: SVM (ROC‐AUC = 0.600) was close to chance level, with accuracy ranging from 52.4% to 64.8% and low precision (0.34–0.42) (Appendix A, Figures A16 and A17). Medical History emerged as the strongest predictor: Gradient Boosting achieved ROC‐AUC = 0.862 (95% CI: 0.847–0.877), Accuracy = 82.0%, Precision = 70.7%, Recall = 70.2%, and F1 = 0.705. Cross‐validation showed robust generalization (ROC‐AUC = 0.871 ± 0.013). The confusion matrix revealed specificity = 87.1% and NPV = 87.2% (Appendix A, Figures A18 and A19).
3.3. Microbiome composition and mechanistic context
To investigate potential biological mechanisms underlying the strong predictive capacity of dietary and medical factors, we examined microbiome composition in the available 16S rRNA sequencing data. Although this preliminary analysis is underpowered for statistical inference, it provides a mechanistic context for understanding diet–disease associations through the gut–brain axis (Appendix A, Figure A20). Figure 6 presents a comprehensive analysis linking gut microbiome dysbiosis to AD risk through multiple analytical approaches, supporting the hypothesis that dietary and medical history factors influence AD susceptibility via microbial mechanisms. It quantifies dysbiosis severity using key microbial community metrics on a sample size of 2000, with a dysbiosis threshold of 1.0 (red dashed line) indicating pathological imbalance. Shannon Diversity (5.29) shows the most severe dysbiosis, exceeding the threshold by over five‐fold, indicating significantly reduced microbial diversity, a hallmark of gut dysbiosis associated with neurological disorders. The Prevotella/Bacteroides ratio (1.92) is nearly two‐fold above threshold, suggesting altered fermentation patterns and shifted metabolic capacity. Coprococcus (2.13) and Lactobacillus (1.66) both exceed the threshold, indicating depletion of these neurologically beneficial genera that produce neuromodulatory metabolites. Faecalibacterium (1.30) shows moderate dysbiosis, while Roseburia SCFA (0.48) falls below threshold, indicating severe depletion of this critical SCFA producer, which is particularly concerning given the role of SCFAs in maintaining gut barrier integrity and reducing neuroinflammation.
FIGURE 6.

Microbiome dysbiosis mechanistic link to Alzheimer's.
The metagenomic prediction analysis identifies bacteria that are significantly enriched or depleted between the comparison groups (Appendix A, Figure A21). Significantly enriched taxa include Christensenellaceae (∼4.5‐fold), Coprococcus (∼4‐fold), Blautia (∼3‐fold), and Barnesiellaceae (∼2.5‐fold), all beneficial taxa associated with lean body mass, metabolic health, and SCFA production. Ruminococcus, Ruminococcaceae, Corynebacterium, and Peptoniphilus show 2–2.5‐fold enrichment. Moderately enriched taxa include Lachnospiraceae, Clostridiaceae, Oscillospira, and Faecalibacterium, which are key butyrate producers. Conversely, depleted taxa include Bacteroides (the most severely reduced), Desulfovibrio (a sulfate‐reducing bacterium linked to inflammation), Acinetobacter, Pseudomonas, and Eubacterium, which are generally opportunistic or dysbiosis‐associated bacteria. Notably, Prevotella appears minimally depleted, while S24‐7 and Treponema show slight reductions. This differential abundance pattern suggests Group 1 (likely traditional/healthier population) is enriched in beneficial SCFA producers and leanness‐associated taxa, while Group 2 (likely industrialized/dysbiotic population) harbors more opportunistic bacteria. These findings directly support the dysbiosis signature described in Panel A and provide taxonomic resolution for the mechanistic pathways linking diet, microbiome composition, and AD risk (Appendix A, Figure A22).
4. DISCUSSION
This study shows that AD risk can be predicted with high accuracy (AUC 0.86–0.87) using only non‐invasive medical and dietary questionnaires, offering performance comparable to biomarker‐based approaches. Among all domains, medical history and dietary patterns were the strongest predictors, indicating that vascular conditions, depression, and eating behaviors provide meaningful early risk stratification.
Key medical predictors, including cerebrovascular disease, diabetes, hypertension, and depression, align with the vascular and neuropsychiatric mechanisms implicated in Alzheimer's, consistent with the two‐hit vascular hypothesis. 10 Dietary patterns also contributed strongly; processed and refined foods were associated with higher risk, while plant‐based and omega‐3‐rich diets appeared protective. 27 Notably, qualitative dietary behaviors predicted risk better than quantified nutrient intake, highlighting the combined influence of lifestyle and metabolic factors.
Exploratory microbiome findings suggested reduced SCFA‐producing and increased pro‐inflammatory taxa among higher‐risk individuals, consistent with prior reports of gut microbial dysbiosis in Alzheimer's. 7 Although feature‐level integration was not performed, cohort‐aligned analyses of structured metadata and microbiome‐derived features provide complementary insights and reveal cross‐domain patterns beyond prior unimodal analyses.
Overall, the results support a low‐cost, questionnaire‐based screening approach, which may precede confirmatory biomarker testing. Future studies should validate these findings longitudinally and examine targeted interventions focused on vascular health, diet, and microbiome modulation.
5. CONCLUSION
This study demonstrates that non‐invasive questionnaire‐based medical and dietary data can predict AD with high accuracy (AUC 0.86–0.87), comparable to biomarker methods but at far lower cost and burden. Dietary patterns proved stronger predictors than quantified nutrient intake, underscoring the importance of whole‐diet behaviors in risk assessment. Exploratory microbiome analysis suggests that diet‐linked dysbiosis and inflammation may contribute to Alzheimer's risk, providing biological context for the identified modifiable factors. Although preliminary, these findings warrant investigation in larger longitudinal cohorts. Together, these results support early, accessible risk stratification using modifiable factors, guided by interpretable ML insights to inform practical prevention strategies.
ETHICS STATEMENT
This article used open‐source data. No additional ethical approval was required for this secondary analysis of anonymized data, according to institutional guidelines.
CONSENT STATEMENT
This study used anonymized, publicly available data; therefore, informed consent was not required.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest. There are no personal, professional, or institutional relationships that could be perceived as influencing the results presented in this manuscript. Author disclosures are available in the supporting information.
Supporting information
6. ACKNOWLEDGMENTS
Open access publishing facilitated by University of Technology Sydney, as part of the Wiley ‐ University of Technology Sydney agreement via the Council of Australasian University Librarians. Tallat Jabeen acknowledges the UTS President Scholarship and International Research Scholarship for her PhD study.
DATA AVAILABILITY STATEMENT
REFERENCES
- 1. Xie Z, Li L, Hou W, et al. Critical role of Oas1g and STAT1 pathways in neuroinflammation: insights for Alzheimer's disease therapeutics. J Transl Med. 2025;23(1):182. doi: 10.1186/s12967-025-06112-2
- 2. Peddinti V, Avaghade MM, Suthar SU, et al. Gut instincts: unveiling the connection between gut microbiota and Alzheimer's disease. Clin Nutr ESPEN. Elsevier Ltd. 2024;60:266‐280. doi: 10.1016/j.clnesp.2024.02.019
- 3. Ahmed SK, Mohammed RA. Obesity: prevalence, causes, consequences, management, preventive strategies and future research directions. Metabol Open. 2025;27:100375. doi: 10.1016/j.metop.2025.100375
- 4. Alzheimer's Association, Alzheimer's Association Report: 2025 Alzheimer's disease facts and figures. Alzheimer's & Dementia. 2025;21(4):1‐119. doi: 10.1002/alz.70235
- 5. Singh H, Chopra C, Singh H, et al. Gut‐brain axis and Alzheimer's disease: therapeutic interventions and strategies. J Funct Foods. Elsevier Ltd. 2024;112. doi: 10.1016/j.jff.2023.105915
- 6. Mofrad R Babapour. The Use of Biofluid Biomarkers in Dementia: Implementation in Clinical Practice and Breaking New Grounds. PhD Thesis. 2021. https://research.vu.nl/en/publications/the‐use‐of‐biofluid‐biomarkers‐in‐dementia‐implementation‐in‐clin/
- 7. Anoop A, Singh PK, Jacob RS, Maji SK. CSF Biomarkers for Alzheimer's Disease Diagnosis. Int J Alzheimers Dis. 2010;2010:606802. doi: 10.4061/2010/606802
- 8. Eratne D, Collins S, Nestor PJ, et al. Using cerebrospinal fluid biomarkers to diagnose Alzheimer's disease: an Australian perspective. Front Psychiatry. 2024;15. doi: 10.3389/fpsyt.2024.1488494
- 9. Patel N, Agrawal N, Mishra R, et al. Emerging blood biomarkers in Alzheimer's disease: a proteomic perspective. Clinica Chimica Acta. Elsevier B.V. 2025;576. doi: 10.1016/j.cca.2025.120397
- 10. Abukuri DN. Novel Biomarkers for Alzheimer's Disease: Plasma Neurofilament Light and Cerebrospinal Fluid. International Journal of Alzheimer's Disease. 2024;15:6668159. doi: 10.1155/2024/6668159
- 11. Kang H, Woo SY, Shin D, et al. Reproducibility of plasma biomarker measurements across laboratories: insights into ptau217, GFAP, and NfL. Dement Neurocogn Disord. 2025;24(2):91. doi: 10.12779/dnd.2025.24.2.91
- 12. Schöll M, Verberk IMW, del Campo M, et al. Challenges in the practical implementation of blood biomarkers for Alzheimer's disease. Lancet Healthy Longev. Elsevier Ltd. 2024;5(10):1‐12. doi: 10.1016/j.lanhl.2024.07.013
- 13. Kowalski K, Mulak A. Brain‐gut‐microbiota axis in Alzheimer's disease. J Neurogastroenterol Motil. 2019;25(1):48‐60. doi: 10.5056/jnm18087
- 14. Varesi A, Pierella E, Romeo M, et al. The potential role of gut microbiota in Alzheimer's Disease: from diagnosis to treatment. Nutrients. MDPI. 2022;14(3):668. doi: 10.3390/nu14030668
- 15. Tarawneh R, Penhos E. The gut microbiome and Alzheimer's disease: complex and bidirectional interactions. Neurosci Biobehav Rev. Elsevier Ltd. 2022;141:22. doi: 10.1016/j.neubiorev.2022.104814
- 16. Silva YP, Bernardi A, Frozza RL. The role of short‐chain fatty acids from gut microbiota in gut‐brain communication. Front Endocrinol (Lausanne). Frontiers Media S.A. 2020;11. doi: 10.3389/fendo.2020.00025
- 17. Van Hul M, Neyrinck AM, Everard A, et al. Role of the intestinal microbiota in contributing to weight disorders and associated comorbidities. Clin Microbiol Rev. American Society for Microbiology. 2024;37(3). doi: 10.1128/cmr.00045-23
- 18. Zhang ZX, Peng J, Ding WW. Lipocalin‐2 and intestinal diseases. World J Gastroenterol. Baishideng Publishing Group Inc. 2024;30(46):4864‐4879. doi: 10.3748/wjg.v30.i46.4864
- 19. Pekdemir B, Raposo A, Saraiva A, et al. Mechanisms and potential benefits of neuroprotective agents in neurological health. Nutrients. Multidisciplinary Digital Publishing Institute (MDPI). 2024;16(24). doi: 10.3390/nu16244368
- 20. Kang JW, Khatib LA, Heston MB, et al. Gut microbiome compositional and functional features associate with Alzheimer's disease pathology. Alzheimer's Dementia. 2025;21(7). doi: 10.1002/alz.70417
- 21. Fan KC, Lin CC, Chiu YL, Koh SH, Liu YC, Chuang YF. Compositional and functional gut microbiota alterations in mild cognitive impairment: links to Alzheimer's disease pathology. Alzheimer's Research and Therapy. 2025;17(1). doi: 10.1186/s13195-025-01769-9
- 22. Sepúlveda‐Rivera V, Olivieri‐Henry G, Morales‐González H, et al. Gut microbiota distinguishes aging hispanics with Alzheimer's disease: associations with cognitive impairment and severity. Sci Rep. 2025;15(1). doi: 10.1038/s41598-025-13262-2
- 23. Rouskas K, Mamalaki E, Ntanasi E, et al. Gut microbiome alterations in mild cognitive impairment: findings from the ALBION greek cohort. Microorganisms. 2025;13(9). doi: 10.3390/microorganisms13092112
- 24. Mateo D, Carrión N, Cabrera C, et al. Gut microbiota alterations in Alzheimer's disease: relation with cognitive impairment and mediterranean lifestyle. Microorganisms. 2024;12(10). doi: 10.3390/microorganisms12102046
- 25. Lewis N, Villani A, Lagopoulos J. Gut dysbiosis as a driver of neuroinflammation in attention‐deficit/hyperactivity disorder: a review of current evidence. Neuroscience. 2025;569:298‐321. doi: 10.1016/j.neuroscience.2025.01.031
- 26. Berendsen AM, Kang JH, Feskens EJM, de Groot CPGM, Grodstein F, van de Rest O. Association of long‐term adherence to the mind diet with cognitive function and cognitive decline in American women. J Nutr Health Aging. 2018;22(2):222‐229. doi: 10.1007/s12603-017-0909-0
- 27. Dissanayaka DMS, Jayasena V, Rainey‐Smith SR, Martins RN, Fernando WMADB. The role of diet and gut microbiota in Alzheimer's disease. Nutrients. Multidisciplinary Digital Publishing Institute (MDPI). 2024;16(3). doi: 10.3390/nu16030412
- 28. Bartsch M, Hahn A, Berkemeyer S. Bridging the gap from enterotypes to personalized dietary recommendations: a metabolomics perspective on microbiome research. Metabolites. Multidisciplinary Digital Publishing Institute (MDPI). 2023;13(12). doi: 10.3390/metabo13121182
- 29. Dai L, Lin X, Wang S, Gao Y, He F. The mediterranean‐dietary approaches to stop hypertension diet intervention for neurodegenerative delay (MIND) diet: a bibliometric analysis. Front Nutr. 2024;11. doi: 10.3389/fnut.2024.1348808
- 30. Meiners F, Ortega‐Matienzo A, Fuellen G, Barrantes I. Gut microbiome‐mediated health effects of fiber and polyphenol‐rich dietary interventions. Front Nutr. Frontiers Media SA. 2025;12. doi: 10.3389/fnut.2025.1647740
- 31. Fu J, Zheng Y, Gao Y, Xu W. Dietary fiber intake and gut microbiota in human health. Microorganisms. MDPI. 2022;10(12). doi: 10.3390/microorganisms10122507
- 32. Bolte LA, Vich Vila A, Imhann F, et al. Long‐term dietary patterns are associated with pro‐inflammatory and anti‐inflammatory features of the gut microbiome. Gut. 2021;70(7):1287‐1298. doi: 10.1136/gutjnl-2020-322670
- 33. Holscher HD. Dietary fiber and prebiotics and the gastrointestinal microbiota. Gut Microbes. Taylor and Francis Inc. 2017;8(2):172‐184. doi: 10.1080/19490976.2017.1290756
- 34. Grace‐Farfaglia P, Frazier H, Iversen MD. Essential factors for a healthy microbiome: a scoping review. Int J Environ Res Public Health. MDPI. 2022;19(14). doi: 10.3390/ijerph19148361
- 35. Zhang F, Fan D, lin Huang J, Zuo T. The gut microbiome: linking dietary fiber to inflammatory diseases. Medicine in Microecology. Elsevier B.V. 2022;14. doi: 10.1016/j.medmic.2022.100070
- 36. Agarwal P, Ford CN, Leurgans SE, et al. Dietary sugar intake associated with a higher risk of dementia in community‐dwelling older adults. Journal of Alzheimer's Disease. 2023;95(4):1417‐1425. doi: 10.3233/JAD-230013
- 37. Rob M, Yousef M, Lakshmanan AP, Mahboob A, Terranegra A, Chaari A. Microbial signatures and therapeutic strategies in neurodegenerative diseases. Biomedicine and Pharmacotherapy. Elsevier Masson s.r.l. 2025;184. doi: 10.1016/j.biopha.2025.117905
- 38. Durazzo TC, Mattsson N, Weiner MW. Smoking and increased Alzheimer's disease risk: a review of potential mechanisms. Alzheimer's & Dementia. 2014;10:S122–S145. doi: 10.1016/j.jalz.2014.04.009
- 39. Suresh S, Singh S A, Rushendran R, Vellapandian C, Prajapati B. Alzheimer's disease: the role of extrinsic factors in its development, an investigation of the environmental enigma. Front Neurol. Frontiers Media SA. 2023;14. doi: 10.3389/fneur.2023.1303111
- 40. Ivarsson Orrelid C, Rosberg O, Weiner S, et al. Applying machine learning to high‐dimensional proteomics datasets for the identification of Alzheimer's disease biomarkers. Fluids Barriers CNS. 2025;22(1). doi: 10.1186/s12987-025-00634-z
- 41. Basanta‐Torres S, Rivas‐Fernández MÁ, Galdo‐Alvarez S. Artificial Intelligence for Alzheimer's disease diagnosis through T1‐weighted MRI: A systematic review. Comput Biol Med. Elsevier Ltd. 2025;197. doi: 10.1016/j.compbiomed.2025.111028
- 42. Taiyeb Khosroshahi M, Morsali S, Gharakhanlou S, et al. Explainable artificial intelligence in neuroimaging of Alzheimer's disease. Diagnostics. Multidisciplinary Digital Publishing Institute (MDPI). 2025;15(5). doi: 10.3390/diagnostics15050612
- 43. Li Y, Chang X, Wu J, Liu Y, Wang H, Zhang Y. Machine learning in early diagnosis of neurological diseases: advancing accuracy and overcoming challenges. Brain Network Disorders. 2025;1(3):132‐139. doi: 10.1016/j.bnd.2025.04.001
- 44. Ahmadzadeh M, Christie GJ, Cosco TD, et al. Neuroimaging and machine learning for studying the pathways from mild cognitive impairment to alzheimer's disease: a systematic review. BMC Neurol. 2023;23(1). doi: 10.1186/s12883-023-03323-2
- 45. Mirabnahrazam G, Ma D, Lee S, et al. Machine learning based multimodal neuroimaging genomics dementia score for predicting future conversion to Alzheimer's disease. Journal of Alzheimer's Disease. 2022;87(3):1345‐1365. doi: 10.3233/JAD-220021
- 46. Huang W, Shu N. AI‐powered integration of multimodal imaging in precision medicine for neuropsychiatric disorders. Cell Rep Med. Cell Press. 2025;6(5). doi: 10.1016/j.xcrm.2025.102132
- 47. Alzakari SA, Allinjawi A, Aldrees A, et al. Early detection of autism spectrum disorder using explainable AI and optimized teaching strategies. J Neurosci Methods. 2025;413. doi: 10.1016/j.jneumeth.2024.110315
- 48. Peck FC, Gabard‐Durnam LJ, Wilkinson CL, Bosl W, Tager‐Flusberg H, Nelson CA. Prediction of autism spectrum disorder diagnosis using nonlinear measures of language‐related EEG at 6 and 12 months. J Neurodev Disord. 2021;13(1):1–13. doi: 10.1186/s11689-021-09405-x
