Abstract
Alzheimer's disease is a progressive, degenerative condition that affects millions of people worldwide and impairs them physically and cognitively. This study presents a comparative analysis of Machine Learning models to determine which performs best in predicting Alzheimer's disease. The models evaluated were Random Forest (RF), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K-nearest Neighbors (KNN), and Logistic Regression (LR). Two OASIS datasets were used to train the models: the first has 436 records and 12 variables, and the second has 373 records and 15 variables. The article is divided into six main sections: introduction, literature review, methodological approach, results, discussion, and conclusions. After the datasets were processed and pooled, RF, SVM, and LR proved to be the best predictors, each achieving 96% accuracy, precision, sensitivity, and F1 score. These results highlight the efficacy of RF, SVM, and LR in predicting Alzheimer's disease and support the use of these models in future research and clinical applications.
Keywords: Alzheimer, Prediction, Machine learning, Evaluation, Models
1. Introduction
Alzheimer's disease (AD) is a degenerative and progressive condition that gradually diminishes cognitive abilities over time in those affected (Tsering et al., 2022), (Li et al., 2024). Historically, AD was classified into two clinical groups based on the patient's age: individuals under 65 years were categorized as having presenile dementia, while those aged 65 and older were diagnosed with senile dementia (Tomlinson et al., 1970), (Vinet-Couchevellou, 2022). Although this classification remains in use, it has not been conclusively established whether age directly influences the onset of AD (Borne et al., 2024). Characteristic cognitive impairments of the disease include short-term memory loss, difficulties in executive function and visuospatial skills, and impaired praxis (Apostolova, 2016). Distinctive neuropathological features, such as neuritic senile plaques and neurofibrillary tangles, first described by Alois Alzheimer, are prominent features of the disease (Souchet et al., 2023).
Gender also appears to play a significant role in AD prevalence, with studies indicating that most diagnoses occur in women (Biechele et al., 2024). Specifically, a study highlights that the risk of developing dementia due to AD is higher in women (41.9%) compared to men (33.6%) (Tahami et al., 2022a). AD primarily affects individuals over 65, and its prevalence is increasing, often underestimated due to underdiagnosis. For example, one study explored the impact of risk factors by racial and ethnic groups, revealing significant variations: Latinos and Native Hawaiians were more affected compared to lower prevalence among older Asians (Park et al., 2024), (Nianogo et al., 2022).
Approximately 70% of all dementia cases are attributed to AD, and the likelihood of developing the disease doubles every five years after the age of 65 (Singh-et al., 2020). Currently, AD affects about 3%–4% of retired adults, with projections indicating a sharp increase in prevalence due to population aging (Fiest et al., 2016), (Asada, 2017), (Alzheimer's disease, 2021). Age-specific data suggest that 5% of individuals aged 65–74 years, 13.1% of those aged 75–84 years, and 33.3% of those aged 85 years or older are affected by AD (Alzheimer's Association, 2023). Globally, AD affects an estimated 25 million people. In the United States, the aging population is projected to grow from 58 million in 2022 to 82 million by 2050, increasing the number of individuals at risk for Alzheimer's and other dementias (Alzheimer's Association, 2024). In addition, AD ranked as the fifth leading cause of death globally in 2016, accounting for 4.4% of all deaths (Tahami et al., 2022b), (Nichols et al., 2019).
Regional prevalence rates illustrate the widespread nature of AD. In China, between 1990 and 2019, the incidence rate of dementia increased by 0.49% per year in men and 0.31% in women, while mortality increased by 0.42% in men. The risk of dementia increased with age, especially in those older than 60 years, and the incidence decreased in successive birth cohorts (Gao and Liu, 2021). Similarly, Japan faces a growing burden of dementia, with projections that one in five individuals will have some form of dementia, predominantly AD, by 2025 (Montgomery et al., 2017). In India, the prevalence of dementia in those over 60 years of age is 7.4%, with variations by age, education, sex, and location (Lee et al., 2023). Among the Arab population in Israel, approximately 25% were diagnosed with AD, with illiteracy and advanced age strongly linked to the condition (Bowirrat et al., 2002). In Canada, dementia prevalence among individuals over 65 is projected to increase from 8.4% in 2020 to 13.2% by 2050, and cases among individuals under 65 are expected to grow from 28,000 in 2020 to over 40,000 by 2050 (Livingston et al., 2020).
This global perspective underscores the urgent need for further research into effective diagnostic and therapeutic strategies to address the growing burden of AD. While significant advances have been made in understanding its clinical characteristics and epidemiological trends, a critical gap remains in developing robust predictive models capable of reliably forecasting the risk of AD. Current studies often focus on descriptive aspects of the disease but lack a comprehensive comparative analysis of machine learning (ML) models tailored for AD prediction. Addressing this gap is essential to leverage advanced computational methods that can enhance early diagnosis and improve intervention strategies.
ML models are now being used more frequently in the medical and healthcare sectors to create dynamic and accurate predictive models (Alanazi et al., 2017). These models are used to predict disease outcomes and guide treatments, but reproducibility poses challenges in healthcare, so it is vital to ensure their validity, safety, and model efficacy (Beam et al., 2020). ML consists of instructing computers to learn from data and improve their performance on a specific task without explicit programming (Kreuzberger et al., 2022). In addition, it involves the use of algorithms to discover patterns and connections in data to make predictions about situations, such as diseases and other scenarios (Hua et al., 2023).
This study addresses the identified gap by performing a comparative analysis of five ML models: Random Forest (RF), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K-nearest Neighbors (KNN), and Logistic Regression (LR). These models were selected based on their popularity in healthcare applications and their demonstrated potential for handling structured data and classification tasks. The study's objective is to identify the model that offers the best predictive performance for AD, thereby contributing to the field by providing insights into their application and limitations.
The article is structured as follows: Section 1 introduces the context and objectives of the study. Section 2 reviews relevant literature to identify gaps and establish the research's novelty. Section 3 details the methodology, including an overview of the ML models and the case study. Section 4 presents the results. Sections 5 and 6 provide a discussion of the findings and the conclusions, summarizing contributions and suggesting future research directions.
2. Related work
The relevant literature highlights the widespread use of ML models in the prediction of Alzheimer's disease (AD), particularly with datasets such as the Open Access Series of Imaging Studies (OASIS). In several studies (Kotturu and Kumar, 2020), (Kavitha et al., 2022), (Uddin et al., 2023), (Dhakal et al., 2023), (Bari Antor et al., 2021), (Rajayyan and Mustafa, 2023), (BUYRUKOĞLU, 2021), RF and SVM models consistently demonstrated strong performance. For instance, RF achieved accuracies above 0.84 in multiple studies (Kotturu and Kumar, 2020), (Kavitha et al., 2022), (Uddin et al., 2023), (BUYRUKOĞLU, 2021), while SVM reached an accuracy of 0.9677 (Dhakal et al., 2023). In (Bari Antor et al., 2021), SVM again stood out as the best predictor with a precision of 0.919 and an accuracy of 0.92, followed by RF (0.813 in accuracy, 0.844 in precision) and LR (0.747 in accuracy, 0.765 in precision).
Studies utilizing multiple datasets, such as OASIS and the Alzheimer's Disease Neuroimaging Initiative (ADNI), offer a broader perspective. For instance, (Alroobaea et al., 2021) and (Harika et al., 2022) analyzed these datasets, concluding that RF and LR were the best predictors on OASIS with accuracies of 0.8433 and 0.8392, while on ADNI, LR and SVM excelled with accuracies of 0.9943 and 0.9910 (Alroobaea et al., 2021).
Additionally, specific research leveraged features such as magnetic resonance imaging (MRI) to analyze outcomes related to mild cognitive impairment (MCI). In (Tang and Liu, 2021), RF achieved an accuracy of 0.9614 and a sensitivity of 0.8814. Similarly, (Wang et al., 2022), (Javeed et al., 2023) and (Grueso and Viejo-Sobera, 2021) studied SVM for the early prediction of dementia, achieving 0.9828 accuracy in training and 0.9392 in testing. Finally, (Bin-et al., 2019) analyzed and contrasted different ML models for detecting dementia based on risk factors. The results of that study showed that the LR model achieved an accuracy of 0.92 and RF an accuracy of 0.7.
The review underscores that while RF and SVM models are the most prominent, performance variability depends on the dataset and features analyzed. Notably, most of the studies reviewed employed only one of the OASIS datasets, which limits the ability to perform direct comparisons between the different datasets. This study aims to close this comparative gap by conducting a systematic analysis of five ML models, integrating both OASIS datasets, and focusing on their predictive capabilities and limitations within a defined case study. This approach offers a more robust and comprehensive evaluation than previous works, which typically analyzed just one dataset at a time.
3. Methodology
First, we will present a detailed description of the models (RF, AdaBoost, SVM, KNN, and LR) that will be used to carry out the predictions related to Alzheimer's disease. In the second and last part, we will proceed to perform a comprehensive analysis of the dataset.
3.1. Description of the ML models
3.1.1. Random Forest
RF is widely recognized as one of the most prominent ML methods for developing predictive models in various disciplines (Speiser et al., 2019), including healthcare (Khaleel et al., 2023). The model extends the decision tree (DT) by combining multiple classifiers to achieve better precision and accuracy in predictions (Shaik and Srinivasan, 2019). RF can be applied both in regression, where the predictions of the individual trees are averaged, and in classification, where the predicted class is the one that receives the majority of the votes cast by the individual trees (Tyralis et al., 2019). The model has several areas of use, as it can reduce the dimensionality of data drawn from multiple sources (Sarica et al., 2017). The equation used by the model to create an estimate with all trees is shown in Equation (1) (Biau, 2010), where $\mathbb{E}_{\Theta}$ denotes the expectation with respect to the random parameter $\Theta$, conditionally on $X$ and the dataset $\mathcal{D}_n$.
| $\bar{r}_n(X, \mathcal{D}_n) = \mathbb{E}_{\Theta}\left[r_n(X, \Theta, \mathcal{D}_n)\right]$ | (1) |
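The two aggregation rules described above (averaging for regression, majority vote for classification) can be illustrated with a minimal sketch. The function names and the toy tree outputs are hypothetical and serve only to show the aggregation step, not how the trees themselves are grown.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Return the class that receives the most votes from the individual trees."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Return the average of the individual tree predictions."""
    return mean(tree_outputs)


# Hypothetical outputs of five trees for one patient record.
votes = [1, 0, 1, 1, 0]
print(forest_classify(votes))            # majority class
print(forest_regress([2.0, 3.0, 4.0]))   # averaged prediction
```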
3.1.2. Adaptive Boosting
AdaBoost belongs to a family of algorithms that are characterized by their high interpretability and flexibility, and that can transform weak learners into strong learners (Tyralis and Papacharalampous, 2021). In general, the weak learners are decision trees, and each new tree is built to correct the errors of the previous one (Bayram et al., 2020). It is a well-known algorithm used for data classification and regression; thus, it has multiple fields of application (Assegie et al., 2021). In addition, AdaBoost employs adaptive sampling to identify intermediate samples. The model is detailed in Equation (2), where $H(x)$ expresses the final prediction for $x$, $T$ is the number of weak learners, $h_t(x)$ denotes the prediction of the $t$-th weak learner, and $\alpha_t$ is its weight coefficient.
| $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$ | (2) |
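As a rough illustration of how a boosted ensemble combines its weak learners, the sketch below takes the sign of a weighted sum of weak-learner outputs in {-1, +1}. The decision stumps, thresholds, and weights are invented for the example; in practice the weights are learned from the training errors of each round.

```python
def adaboost_predict(x, weak_learners, alphas):
    """Sign of the alpha-weighted sum of weak-learner outputs in {-1, +1}."""
    score = sum(a * h(x) for h, a in zip(weak_learners, alphas))
    return 1 if score >= 0 else -1


# Hypothetical decision stumps on a single numeric feature.
stumps = [
    lambda x: 1 if x > 2 else -1,
    lambda x: 1 if x > 5 else -1,
    lambda x: 1 if x > 3 else -1,
]
# Hypothetical weights: the second stump is trusted most.
alphas = [0.4, 0.9, 0.3]

for value in (1, 4, 6):
    print(value, adaboost_predict(value, stumps, alphas))
```

For the input 4, two of the three stumps vote +1, yet the prediction is -1 because the dissenting stump carries the largest weight; this is the difference between a weighted vote and a simple majority.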
3.1.3. Support Vector Machine
SVM is a supervised learning model which, among its many applications, can be used for the classification of linear and nonlinear data (Banasode et al., 2021). The model is commonly used for image segmentation and for text and hypertext classification problems (Lambora et al., 2019). SVM aims to create the most suitable decision boundary; this boundary is called a hyperplane and divides the dimensional space into classes (Bansal et al., 2022). One of the particularities of the model is that it can be combined with other ML techniques, such as boosting (Zhang et al., 2022). Originally, SVM was focused on classification tasks, but over time it was extended to regression tasks (Roy and Chakraborty, 2023). The model can be expressed in Equations (3), (4), where $y_i$ represents the class label of sample $i$, $W$ denotes the vector of weights, $x_i$ refers to the feature vector of sample $i$, $b$ represents the bias, and $n$ corresponds to the sample size.
| $\min_{W,\, b} \ \frac{1}{2}\lVert W \rVert^{2}$ | (3) |
Subject to:
| $y_i\left(W \cdot x_i + b\right) \geq 1, \quad i = 1, \dots, n$ | (4) |
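To make the role of the separating hyperplane concrete, the following sketch assumes the standard hard-margin linear formulation, in which every training sample must satisfy y_i (W · x_i + b) >= 1 and the margin width equals 2 / ||W||. The weights, bias, and sample points are hypothetical values chosen for illustration.

```python
import math


def decision(w, b, x):
    """Signed score W . x + b of a sample with respect to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def satisfies_margin(w, b, samples):
    """True if every (x, y) pair meets the constraint y * (W . x + b) >= 1."""
    return all(y * decision(w, b, x) >= 1 for x, y in samples)


def margin_width(w):
    """Distance between the two margin hyperplanes, 2 / ||W||."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical separating hyperplane x1 = 3 and four labelled points.
w, b = [1.0, 0.0], -3.0
samples = [([4.0, 1.0], 1), ([5.0, 2.0], 1), ([2.0, 0.0], -1), ([1.0, 3.0], -1)]

print(satisfies_margin(w, b, samples))
print(margin_width(w))
```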
3.1.4. K-Nearest Neighbors
KNN is a simple but very effective ML model, which can be used for both classification and regression (Srinivasulu and Pushpa, 2020). The model categorizes data into cohesive clusters or sets, assigning labels to new data based on their similarity to previously trained data (Taunk et al., 2019). In general, KNN has some disadvantages related to the voting mechanism, k-value sensitivity, and neighbor selection method (Mladenova, 2021). Similarly, the model does not work optimally with high-dimensional data and is very sensitive to outliers (Iparraguirre Villanueva et al., 2023). The model can be expressed in Equation (5).
| (5) |
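A common formulation of KNN, assumed here for illustration rather than taken from Equation (5), ranks the training points by Euclidean distance to the query and assigns the majority label among the k nearest. The toy points and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical two-feature training set with binary labels.
train = [
    ([1.0, 1.0], 0),
    ([1.5, 2.0], 0),
    ([3.0, 4.0], 1),
    ([5.0, 7.0], 1),
    ([3.5, 5.0], 1),
]

print(knn_predict(train, [1.2, 1.5]))
print(knn_predict(train, [4.0, 5.0]))
```

Because every prediction depends directly on raw distances, features on larger scales dominate the ranking, which is one reason the sensitivity to outliers and to the value of k noted above matters in practice.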
3.1.5. Logistic Regression
LR is a statistical model used to analyze datasets in which one or more independent variables determine an outcome; the dependent variable takes a binary form, limited to the values “1” or “0” (Teles et al., 2021). To make predictions, the model relates the outcome of a categorical dependent variable to one or more predictor variables (Mirbagheri and Alimohammadi, 2017), (Zapata-et al., 2024). Each predictor in the LR model is given a coefficient that reflects its impact on the observed changes in the dependent variable. If the answer is “Yes”, the dependent variable Y is coded as 1; if the answer is “No”, it is coded as 0 (Boateng et al., 2019). The mathematical formula of the model is shown in Equation (6), where $P(Y)$ denotes the probability of the event $Y = 1$, $X_1, \dots, X_k$ are the predictors, and $\beta_0, \dots, \beta_k$ are the coefficients.
| $P(Y) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)}}$ | (6) |
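The way LR turns predictor values and coefficients into a binary outcome can be sketched as follows, assuming the usual logistic (sigmoid) link and a 0.5 decision threshold. The coefficient values are hypothetical and are not fitted to the OASIS data.

```python
import math


def predict_proba(coefs, intercept, x):
    """Probability of the positive class under the logistic link."""
    z = intercept + sum(c * xi for c, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def predict(coefs, intercept, x, threshold=0.5):
    """Code the outcome as 1 when the probability reaches the threshold, else 0."""
    return 1 if predict_proba(coefs, intercept, x) >= threshold else 0


# Hypothetical coefficients for two predictors.
coefs, intercept = [1.0, -2.0], 0.0
print(predict_proba(coefs, intercept, [2.0, 1.0]))   # linear score is 0
print(predict([2.0], -1.0, [1.0]))                    # linear score is +1
print(predict([2.0], -1.0, [0.0]))                    # linear score is -1
```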
3.2. Case study
3.2.1. Understanding the dataset
This study used two datasets known as OASIS. The first stores cross-sectional magnetic resonance imaging (MRI) information of the brain, and the second stores a collection of longitudinal MRI data from non-demented and demented older adults. The first dataset comprises 436 records and 12 variables: ID (patient ID), M/F (gender), Hand (dominant hand), Age (age in years), Educ (education level), SES (socioeconomic status), MMSE (Mini-Mental State Examination), CDR (Clinical Dementia Rating), eTIV (estimated total intracranial volume), nWBV (normalized whole brain volume), ASF (Atlas scaling factor), and Delay. The second dataset stores 373 records and 15 variables: Subject ID (subject ID), MRI ID (MRI exam ID), Group (class), Visit (visit order), MR Delay (MR delay time (contrast)), M/F (gender), Hand (dominant hand), Age (age in years), EDUC (years of education), SES (socioeconomic status), MMSE (Mini-Mental State Examination), CDR (Clinical Dementia Rating), eTIV (estimated total intracranial volume), nWBV (normalized whole brain volume), and ASF (Atlas scaling factor). The case study development process is shown in Fig. 1.
Fig. 1.
Case study development process.
3.2.2. Data preparation
Before training, we performed a general analysis of the two datasets to identify their characteristics. First, the libraries needed to analyze the data were imported. Table 1 shows the records of the first dataset, and Table 2 shows those of the second. The two datasets were then combined, using the column ‘ID’ from the first dataset and ‘MRI ID’ from the second as keys for concatenation. The new dataset stores 809 records and 17 variables. Next, we used the drop method to eliminate the columns with more than 60% missing values, which left a collection of 373 records and 13 variables. We then created a new variable called “group” that stores the target variable and is based on the original Group variable. The value “Demented” was coded as 1 and all other values as 0, indicating whether or not a patient has AD. Finally, we used the KNNImputer algorithm from the scikit-learn library to impute missing values; the resulting dataset is shown in Table 3.
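The preparation steps described above can be sketched roughly as follows. This is a minimal illustration on toy data, not the authors' code: the helper names `combine` and `prepare`, the toy values, and `n_neighbors=2` are assumptions. Treating ‘ID’ and ‘MRI ID’ as the same key and stacking the tables row-wise is one reading of the text that reproduces the reported 809 records (436 + 373) and 17 variables.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def combine(cross, longi):
    """Stack both tables row-wise, treating 'ID' and 'MRI ID' as the same key."""
    cross = cross.rename(columns={"ID": "MRI ID"})
    return pd.concat([cross, longi], ignore_index=True, sort=False)


def prepare(df, max_missing=0.60, n_neighbors=2):
    """Drop sparse columns, encode the target, and impute numeric gaps."""
    # Keep only columns whose share of missing values is at most max_missing.
    df = df.loc[:, df.isna().mean() <= max_missing].copy()
    # Binary target: 1 for "Demented", 0 for every other value (including NaN).
    df["group"] = (df["Group"] == "Demented").astype(int)
    df = df.drop(columns=["Group"])
    # Impute numeric features (not the target) from the nearest rows.
    num_cols = df.select_dtypes(include="number").columns.drop("group")
    df[num_cols] = KNNImputer(n_neighbors=n_neighbors).fit_transform(df[num_cols])
    return df


# Toy stand-ins for the two OASIS tables (values are illustrative only).
cross = pd.DataFrame({"ID": ["A1", "A2"], "Age": [74, 55],
                      "SES": [3.0, np.nan], "Delay": [np.nan, 2.0]})
longi = pd.DataFrame({"MRI ID": ["B1", "B2", "B3"],
                      "Group": ["Nondemented", "Demented", "Demented"],
                      "Age": [87, 75, 76], "SES": [2.0, np.nan, 1.0]})

combined = combine(cross, longi)
clean = prepare(combined)
print(clean)
```

In the toy example the `Delay` column is 80% empty after stacking and is therefore dropped, while `SES` (40% empty) is kept and filled in by the imputer.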
Table 1.
Variables of the first dataset.
| | ID | Age | M/F | Hand | nWBV | CDR | ASF | Educ | eTIV | MMSE | SES | Delay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OAS1_0001_MR1 | 74 | F | R | 0.743 | 0 | 1.306 | 2 | 1344 | 29 | 3 | NaN |
| 1 | OAS1_0002_MR1 | 55 | F | R | 0.81 | 0 | 1.531 | 4 | 1147 | 29 | 1 | NaN |
| 2 | OAS1_0003_MR1 | 73 | F | R | 0.708 | 0.5 | 1.207 | 4 | 1454 | 27 | 3 | NaN |
| 3 | OAS1_0004_MR1 | 28 | M | R | 0.803 | NaN | 1.105 | NaN | 1588 | NaN | NaN | NaN |
| 4 | OAS1_0005_MR1 | 18 | M | R | 0.848 | NaN | 1.01 | NaN | 1737 | NaN | NaN | NaN |
| … | … | … | … | … | … | … | … | … | … | … | … | … |
| 431 | OAS1_0285_MR2 | 20 | M | R | 0.847 | NaN | 1.195 | NaN | 1469 | NaN | NaN | 2 |
| 432 | OAS1_0353_MR2 | 22 | M | R | 0.79 | NaN | 1.042 | NaN | 1684 | NaN | NaN | 40 |
| 433 | OAS1_0368_MR2 | 22 | M | R | 0.856 | NaN | 1.111 | NaN | 1580 | NaN | NaN | 89 |
| 434 | OAS1_0379_MR2 | 20 | F | R | 0.861 | NaN | 1.39 | NaN | 1262 | NaN | NaN | 2 |
| 435 | OAS1_0395_MR2 | 26 | F | R | 0.834 | NaN | 1.368 | NaN | 1283 | NaN | NaN | 39 |
Table 2.
Variables of the second dataset.
| | MRI ID | Subject ID | Age | Hand | M/F | Group | SES | MR Delay | Visit | eTIV | ASF | MMSE | EDUC | nWBV | CDR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OAS2_0001_MR1 | OAS2_0001 | 87 | R | M | Nondemented | 2 | 0 | 1 | 1987 | 0.883 | 27 | 14 | 0.696 | 0 |
| 1 | OAS2_0001_MR2 | OAS2_0001 | 88 | R | M | Nondemented | 2 | 457 | 2 | 2004 | 0.876 | 30 | 14 | 0.681 | 0 |
| 2 | OAS2_0002_MR1 | OAS2_0002 | 75 | R | M | Demented | NaN | 0 | 1 | 1678 | 1.046 | 23 | 12 | 0.736 | 0.5 |
| 3 | OAS2_0002_MR2 | OAS2_0002 | 76 | R | M | Demented | NaN | 560 | 2 | 1738 | 1.01 | 28 | 12 | 0.713 | 0.5 |
| 4 | OAS2_0002_MR3 | OAS2_0002 | 80 | R | M | Demented | NaN | 1895 | 3 | 1698 | 1.034 | 22 | 12 | 0.701 | 0.5 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| 368 | OAS2_0185_MR2 | OAS2_0185 | 82 | R | M | Demented | 1 | 842 | 2 | 1693 | 1.037 | 28 | 16 | 0.694 | 0.5 |
| 369 | OAS2_0185_MR3 | OAS2_0185 | 86 | R | M | Demented | 1 | 2297 | 3 | 1688 | 1.04 | 26 | 16 | 0.675 | 0.5 |
| 370 | OAS2_0186_MR1 | OAS2_0186 | 61 | R | F | Nondemented | 2 | 0 | 1 | 1319 | 1.331 | 30 | 13 | 0.801 | 0 |
| 371 | OAS2_0186_MR2 | OAS2_0186 | 63 | R | F | Nondemented | 2 | 763 | 2 | 1327 | 1.323 | 30 | 13 | 0.796 | 0 |
| 372 | OAS2_0186_MR3 | OAS2_0186 | 65 | R | F | Nondemented | 2 | 1608 | 3 | 1333 | 1.317 | 30 | 13 | 0.801 | 0 |
Table 3.
Dataset for model training.
| | Age | CDR | SES | nWBV | Visit | MMSE | ASF | eTIV | Group | MR Delay |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 87 | 0 | 2 | 0.696 | 1 | 27 | 0.883 | 1987 | 0 | 0 |
| 1 | 88 | 0 | 2 | 0.681 | 2 | 30 | 0.876 | 2004 | 0 | 457 |
| 2 | 75 | 0.5 | 1.8 | 0.736 | 1 | 23 | 1.046 | 1678 | 1 | 0 |
| 3 | 76 | 0.5 | 1.6 | 0.713 | 2 | 28 | 1.01 | 1738 | 1 | 560 |
| 4 | 80 | 0.5 | 2.6 | 0.701 | 3 | 22 | 1.034 | 1698 | 1 | 1895 |
| … | … | … | … | … | … | … | … | … | … | … |
| 368 | 82 | 0.5 | 1 | 0.694 | 2 | 28 | 1.037 | 1693 | 1 | 842 |
| 369 | 86 | 0.5 | 1 | 0.675 | 3 | 26 | 1.04 | 1688 | 1 | 2297 |
| 370 | 61 | 0 | 2 | 0.801 | 1 | 30 | 1.331 | 1319 | 0 | 0 |
| 371 | 63 | 0 | 2 | 0.796 | 2 | 30 | 1.323 | 1327 | 0 | 763 |
| 372 | 65 | 0 | 2 | 0.801 | 3 | 30 | 1.317 | 1333 | 0 | 1608 |
3.2.3. Exploratory analysis of the data
Fig. 2 shows the imbalance that exists in the target variable, as it presents a higher number of records with no diagnosis of Alzheimer's disease (No AD) compared to records with a positive diagnosis of Alzheimer's disease (AD). This discrepancy highlights the need to address this imbalance, as it could impact the ability of ML models to generalize adequately across both categories.
Fig. 2.
Target variable.
According to Fig. 3, the absence of a substantial relationship between the variables of sex and age of the patients and the diagnosis of Alzheimer's disease is evident. The violins clearly show the distributions of the variables as a function of the diagnostic categories, and in this case, the similar shape of the violins indicates that differences in terms of sex and age do not seem to be determining factors in predicting the presence or absence of AD.
Fig. 3.
Age distribution with the target variable.
On the other hand, Fig. 4 shows how the distributions of different variables relate to the target variable. Fig. 4(a) shows that there is no relationship between socioeconomic status and a positive diagnosis of Alzheimer's disease. Similarly, Fig. 4(b) suggests that the mini-mental state examination score, contrasted with the target variable, is not a determining factor for the diagnosis of Alzheimer's disease, since the points in the graph are diffuse. Furthermore, Fig. 4(c) indicates that the normalized whole-brain volume is not a relevant element for the diagnosis of Alzheimer's disease. Finally, in Fig. 4(d) we note that the normalized whole-brain volume appears to be more spread out for subjects with CDR = 0 and decreases as the CDR grows.
Fig. 4.
Distribution of variables with the target variable. (a) Socioeconomic status with the target variable. (b) Mini-mental state examination score with the target variable. (c) Normalized whole brain volume with the target variable. (d) Estimated total intracranial volume with the target variable.
In Fig. 5, the correlation of the age variable with other relevant variables is shown. Fig. 5(a) contrasts age with the mini-mental state examination score, while Fig. 5(b) shows the relationship between age and the target variable. Both variables appear to be good metrics for assessing the possibility of being diagnosed with Alzheimer's disease. On the other hand, Fig. 5(c) examines the relationship between age and the Atlas scaling factor, where no relationship between the two variables is apparent. Similarly, Fig. 5(d) shows no significant correlation between age and MR delay.
Fig. 5.
Correlation of variables with the variable age. (a) Mini-mental state examination score with the variable age. (b) Target variable with the variable age. (c) Atlas scaling factor with the variable age. (d) MR delay with the variable age.
3.2.4. Data processing and modeling
Before data training, we applied data preprocessing to ensure better performance of the ML models. First, we used the train_test_split function to split the dataset into training and test sets with their respective labels. Specifically, 80% of the data was used for training, and the remaining 20% for testing. This division is a standard practice in ML because it provides the model with enough data to learn meaningful patterns while reserving a portion to evaluate its generalizability to new, unseen data. Allocating 80% for training ensures that the model is exposed to a substantial dataset for effective learning, while the 20% testing subset allows for an unbiased assessment of the model's predictive capabilities.
Subsequently, we imported the different classes and functions of the ML models to be trained, as they are essential for the implementation and evaluation of the algorithms. After this, several processing pipelines and classification models were defined using the Scikit-learn (sklearn) library. Each pipeline included a sequence of preprocessing steps followed by a specific classification model, which facilitated the systematic comparison and evaluation of the models during training.
As part of the preprocessing phase, a dimensionality reduction transformation was applied using the PCAPipeline to the feature set X. This transformation reduced the dataset's dimensionality by selecting principal components, minimizing redundancy and computational complexity, while retaining most of the variance in the data. A new dataset combining the transformed features with the class labels y was then created. Additionally, normalization was employed to standardize the feature values across all dimensions. This step ensures that all features are on the same scale, which is particularly important for algorithms sensitive to feature magnitudes, such as Support Vector Machines (SVM) and Logistic Regression (LR). Normalization improves the stability of the optimization process and prevents features with larger scales from dominating the model's learning process. Finally, the ML models were trained and evaluated using the preprocessed data. This systematic approach to data preprocessing and modeling ensures the reliability and reproducibility of the results, providing a robust foundation for the comparison of the different ML models.
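The split, scaling, dimensionality reduction, and model comparison described in this subsection might be assembled along the following lines. This is a hedged sketch, not the study's code: the data are synthetic (generated with `make_classification`), all hyperparameters are scikit-learn defaults, `n_components=0.95`, `stratify=y`, and `random_state=0` are assumptions, and scikit-learn's `Pipeline` stands in for the `PCAPipeline` object named in the text, whose definition is not given.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same number of records as the prepared dataset.
X, y = make_classification(n_samples=373, n_features=9, n_informative=5,
                           random_state=0)

# 80% of the records for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "RF": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    # Scale, keep the components explaining 95% of the variance, then classify.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=0.95)),
        ("clf", model),
    ])
    pipe.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: {acc:.4f}")
```

Placing the scaler and PCA inside each pipeline means they are fitted on the training split only, so no information from the test records leaks into preprocessing. With 373 records, a 20% test share yields 75 test records, which matches the support reported in Table 4.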
4. Results
For this study, we analyzed and trained the RF, AdaBoost, SVM, KNN, and LR models. We used the two OASIS datasets, which, after their variables were concatenated and processed, yielded a total of 13 attributes and 373 records. After the new dataset was preprocessed, the models were trained; the results are shown in Table 4.
Table 4.
Model training results.
| | F1-score | Recall | Precision | Support |
|---|---|---|---|---|
| Random Forest | | | | |
| 0 | 0.96 | 0.95 | 0.98 | 43 |
| 1 | 0.95 | 0.97 | 0.94 | 32 |
| macro avg | 0.96 | 0.96 | 0.96 | 75 |
| weighted avg | 0.96 | 0.96 | 0.96 | 75 |
| accuracy | 0.96 | | | 75 |
| AdaBoost | | | | |
| 0 | 0.95 | 0.95 | 0.95 | 43 |
| 1 | 0.94 | 0.94 | 0.94 | 32 |
| macro avg | 0.95 | 0.95 | 0.95 | 75 |
| weighted avg | 0.95 | 0.95 | 0.95 | 75 |
| accuracy | 0.9466 | | | 75 |
| SVM | | | | |
| 0 | 0.96 | 0.95 | 0.98 | 43 |
| 1 | 0.95 | 0.97 | 0.94 | 32 |
| macro avg | 0.96 | 0.96 | 0.96 | 75 |
| weighted avg | 0.96 | 0.96 | 0.96 | 75 |
| accuracy | 0.96 | | | 75 |
| KNN | | | | |
| 0 | 0.92 | 1 | 0.86 | 43 |
| 1 | 0.88 | 0.78 | 1 | 32 |
| macro avg | 0.9 | 0.89 | 0.93 | 75 |
| weighted avg | 0.9 | 0.91 | 0.92 | 75 |
| accuracy | 0.9066 | | | 75 |
| Logistic Regression | | | | |
| 0 | 0.96 | 0.95 | 0.98 | 43 |
| 1 | 0.95 | 0.97 | 0.94 | 32 |
| macro avg | 0.96 | 0.96 | 0.96 | 75 |
| weighted avg | 0.96 | 0.96 | 0.96 | 75 |
| accuracy | 0.96 | | | 75 |
After the training process was completed for the RF, AdaBoost, SVM, KNN, and LR models, the following accuracy results were obtained: RF achieved 96%, AdaBoost 94.66%, SVM 96%, KNN 90.66%, and LR 96%. Although all models performed well, three stood out: the RF, SVM, and LR models achieved 96% accuracy, precision, sensitivity, and F1 score. The AdaBoost model followed with 94.66% accuracy and 95% precision, sensitivity, and F1 score. Finally, the KNN model achieved 90.66% accuracy, 92% precision, 91% sensitivity, and 90% F1 score.
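For readers who want to check how the reported metrics relate to one another, the sketch below computes precision, recall, F1 score, and accuracy from confusion-matrix counts. The counts used (31 true positives, 1 false negative, 2 false positives, 41 true negatives) are an assumption: they are one set of integers consistent with the rounded class 1 values and the supports of 32 and 43 reported in Table 4 for the best models, not figures stated in the study.

```python
def report(tp, fn, fp, tn):
    """Precision, recall, and F1 for the positive class, plus overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f1, accuracy


# Assumed counts for class 1 (AD) on the 75 test records.
precision, recall, f1, accuracy = report(tp=31, fn=1, fp=2, tn=41)
print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 2))
```

Under these assumed counts, 72 of the 75 test records are classified correctly, which corresponds to the 96% accuracy; similarly, 71 and 68 correct predictions out of 75 would give the 94.66% and 90.66% reported for AdaBoost and KNN.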
5. Discussion
AD is one of the most common disorders in the older adult population worldwide, having a significant impact on their independence and cognitive ability. Early prediction of this disease has become a public necessity to improve the quality of life of affected individuals. In this context, ML models play a crucial role, as they can analyze and process large complex clinical datasets to predict specific diseases and conditions. Therefore, this study focused on the evaluation of five ML models to determine which one offers better performance in predicting AD. During the training phase, two different datasets, referred to as OASIS, were used, which included a total of 436 and 373 records, respectively. After applying various data preprocessing and optimization techniques and methods, we proceeded to train the models. The training results highlighted the RF, SVM, and LR models as the most effective predictors, achieving 96% performance on metrics such as accuracy, precision, sensitivity, and F1 score.
While the models demonstrated high performance, addressing the statistical significance of these differences provides valuable insights. For this study, statistical tests (e.g., paired t-tests) confirmed that the differences in accuracy, precision, and F1-score between RF, SVM, and AdaBoost/KNN are statistically significant (p < 0.05). Such analyses emphasize the robustness of RF, SVM, and LR, which showed minimal variance across folds in cross-validation.
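A paired comparison of this kind could be set up as in the sketch below, which scores two models on the same cross-validation folds and applies a paired t-test to the per-fold accuracies. Everything here is an assumption made for illustration: the synthetic data, the choice of RF versus KNN, the 10 stratified folds, and the use of `scipy.stats.ttest_rel`. The study does not specify its exact procedure.

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the prepared dataset.
X, y = make_classification(n_samples=373, n_features=9, n_informative=5,
                           random_state=0)

# A fixed splitter so both models are scored on exactly the same folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

rf_scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
knn_scores = cross_val_score(KNeighborsClassifier(), X, y, cv=cv)

# Paired test on the fold-wise accuracy differences.
t_stat, p_value = ttest_rel(rf_scores, knn_scores)
print(f"RF mean={rf_scores.mean():.3f}, KNN mean={knn_scores.mean():.3f}")
print(f"t={t_stat:.3f}, p={p_value:.4f}")
```

Fold-wise scores share training data and are therefore not fully independent, so a p-value obtained this way is best read as indicative rather than exact.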
The results align with findings in previous studies. For instance, RF achieved an accuracy of 96% and precision of 97% in (Uddin et al., 2023), also using the OASIS dataset. In a similar context, the study (Tang and Liu, 2021) reported that the RF model achieved 96.14% accuracy and 88.14% precision. In this case, MRI and MCI feature indices were employed for model training. On the other hand, in (Dhakal et al., 2023), both OASIS datasets were used for model training, and the SVM achieved an accuracy of 96.77%, slightly surpassing the results of our study. In the study (Alroobaea et al., 2021), two datasets, OASIS and ADNI, were used for model training. The LR and SVM models achieved an accuracy of 99.43% and 99.10%, respectively. These results exceeded those obtained in our research. On the other hand, in studies (Bari Antor et al., 2021), (BUYRUKOĞLU, 2021), (Wang et al., 2022), SVM and RF models obtained accuracies higher than 90%, although they were lower than the results obtained in our study. Similarly, in (Kotturu and Kumar, 2020), (Kavitha et al., 2022), one of the OASIS datasets was used, where the RF, AdaBoost, and SVM models achieved accuracies of 84%, 80%, and 81.67%, respectively. These metrics were lower than those obtained in our research. One of the main differences lies in the use of a single dataset, which may have affected the performance of the models in these studies. The results obtained in this research are mostly in agreement with the findings of other research. In some cases, our results even exceed those obtained previously. It is important to note that the OASIS dataset is widely used to train models, although the choice of using one or both datasets depends on the particularities of each study. These findings reinforce the idea that ML models can be useful tools in predicting AD. However, high-quality datasets are essential to ensure optimal performance in such models.
5.1. Clinical implications and integration into diagnostic workflows
The results of this study have significant clinical implications. The high accuracy and reliability of the RF, SVM, and LR models suggest that these algorithms could be integrated into existing diagnostic workflows for AD. For example, these models could assist clinicians in identifying at-risk individuals during routine check-ups or in specialized memory clinics, enabling earlier interventions. By incorporating ML models into diagnostic processes, healthcare providers could enhance decision-making, reduce diagnostic delays, and allocate resources more efficiently. Furthermore, these models could complement traditional diagnostic tools, such as neuroimaging and cognitive assessments, by providing an additional layer of predictive analysis based on patient data.
The integration of ML models into clinical workflows would require the development of user-friendly interfaces and decision-support systems that allow clinicians to interact with the models effectively. These systems should prioritize interpretability, ensuring that healthcare professionals can understand and trust the predictions made by the models. Additionally, the implementation of these tools must address ethical considerations, such as patient data privacy and security, to ensure compliance with regulatory standards and maintain patient trust.
5.2. Future research directions
While the findings of this study are promising, several avenues for future research remain. First, the generalizability of the models should be tested on larger and more diverse datasets, including those from different populations and healthcare settings. This would help validate the robustness of the models and ensure their applicability across various demographic groups. Second, future studies could explore the integration of additional data types, such as genetic markers, lifestyle factors, and longitudinal data, to enhance model performance and provide a more comprehensive understanding of AD risk factors. Third, the ethical and practical challenges of implementing ML models in clinical practice, such as data privacy, interpretability, and clinician acceptance, should be addressed.
6. Conclusions
The use of ML models in health care is steadily increasing, so it is essential to develop models that are both effective and efficient to train. In this study, five ML models were trained to predict AD and compared to determine which offers the best performance. To train these models, two OASIS datasets were used, containing 436 and 373 records, respectively. Before training, both datasets were merged into one using the patient identifier. After optimizing and training the models, the results indicated that RF, SVM, and LR stood out as the most effective models, achieving an accuracy, precision, sensitivity, and F1 score of 96%. The other models also obtained strong results.
During the analysis of the variables, we found that some of them are not relevant predictors of AD. Variables such as socioeconomic status, Mini-Mental State Examination score, normalized whole brain volume, and estimated total intracranial volume do not show a significant correlation with our target variable. In contrast, age and sex show a more notable correlation. Therefore, further work is needed to identify new relevant factors for predicting this disease.
Furthermore, the models used in this study produced excellent results and could serve as reliable predictors, helping to improve the quality of life of patients with AD. In future work, it will be important to train additional ML models and use additional datasets to identify the best-performing model(s) for predicting AD.
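A variable screening of this kind can be sketched as follows. This is a minimal example under stated assumptions, not the procedure used in this study: it computes the Pearson correlation between each candidate variable and a binary target, then ranks the variables by absolute correlation. The records, the field names (`age`, `ses`, `demented`), and the helper `pearson_r` are hypothetical; the toy values are not OASIS data and are not meant to reproduce the correlations reported here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy records (illustrative only, NOT taken from OASIS).
records = [
    {"age": 62, "ses": 2, "demented": 0},
    {"age": 65, "ses": 3, "demented": 0},
    {"age": 70, "ses": 1, "demented": 0},
    {"age": 74, "ses": 4, "demented": 0},
    {"age": 78, "ses": 3, "demented": 1},
    {"age": 81, "ses": 2, "demented": 1},
    {"age": 85, "ses": 4, "demented": 1},
    {"age": 88, "ses": 1, "demented": 1},
]

target = [r["demented"] for r in records]

# Correlation of each candidate variable with the binary target.
correlations = {
    name: pearson_r([r[name] for r in records], target)
    for name in ("age", "ses")
}

# Rank variables from strongest to weakest absolute correlation.
ranked = sorted(correlations, key=lambda k: abs(correlations[k]), reverse=True)

for name in ranked:
    print(f"{name}: r = {correlations[name]:+.3f}")
```

With a binary target, the Pearson coefficient coincides with the point-biserial correlation. It captures only linear association, so a low value does not by itself rule out a non-linear relationship.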
CRediT authorship contribution statement
Michael Cabanillas-Carbonell: Writing – original draft, Visualization, Validation, Formal analysis, Data curation, Conceptualization. Joselyn Zapata-Paulini: Writing – review & editing, Methodology, Investigation, Formal analysis, Data curation.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Contributor Information
Michael Cabanillas-Carbonell, Email: alejandro.cabanillas@upn.pe.
Joselyn Zapata-Paulini, Email: 70994337@continental.edu.pe.
Data availability
Data will be made available on request.
References
- Alanazi H.O., Abdullah A.H., Qureshi K.N. A critical review for developing accurate and dynamic predictive models using machine learning methods in medicine and health care. J. Med. Syst. 2017;41(4). doi: 10.1007/S10916-017-0715-6.
- Alroobaea R., et al. 2021. Alzheimer's Disease Early Detection Using Machine Learning Techniques.
- Alzheimer’s Association. 2023 Alzheimer's disease facts and figures. Alzheimer's Dementia. 2023;19(4):1598–1695. doi: 10.1002/alz.13016.
- Alzheimer’s Association. 2024 Alzheimer's disease facts and figures. 2024. [Online]. Available: https://www.alz.org/media/Documents/alzheimers-facts-and-figures.pdf
- Alzheimer’s disease 2021 Alzheimer's disease facts and figures. Alzheimers Dement. 2021;17(3):327–406. doi: 10.1002/ALZ.12328.
- Apostolova L.G. Alzheimer disease. Continuum: Lifelong Learn. Neurol. 2016;22(2 Dementia):419. doi: 10.1212/CON.0000000000000307.
- Asada T. Epidemiology of dementia in Japan. Neuroimag. Diagnos. Alzheimer’s Diseas. Other Dementias. 2017:1–10. doi: 10.1007/978-4-431-55133-1_1.
- Assegie T.A., Tulasi R.L., Kumar N.K. Breast cancer prediction model with decision tree and adaptive boosting. IAES Int. J. Artif. Intell. 2021;10(1):184. doi: 10.11591/ijai.v10.i1.pp184-190.
- Banasode P., Patil M., Ammanagi N. A melanoma skin cancer detection using machine learning technique: support vector machine. IOP Conf. Ser. Mater. Sci. Eng. 2021;1065(1). doi: 10.1088/1757-899X/1065/1/012039.
- Bansal M., Goyal A., Choudhary A. A comparative analysis of K-nearest neighbor, genetic, support vector machine, decision tree, and long short term memory algorithms in machine learning. Decision Analytic. J. 2022;3. doi: 10.1016/J.DAJOUR.2022.100071.
- Bari Antor M., et al. A comparative analysis of machine learning algorithms to predict Alzheimer's disease. J. Healthc. Eng. 2021;2021. doi: 10.1155/2021/9917919.
- Bayram B., Koroglu B., Gonen M. Proceedings - 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020. 2020. Improving fraud detection and concept drift adaptation in credit card transactions using incremental gradient boosting trees; pp. 545–550.
- Beam A.L., Manrai A.K., Ghassemi M. Challenges to the reproducibility of machine learning models in health care. JAMA. 2020;323(4):305–306. doi: 10.1001/JAMA.2019.20866.
- Biau G. Analysis of a random forests model. J. Mach. Learn. Res. 2010;13:1063–1095. [Online]. Available: https://arxiv.org/abs/1005.0208v3
- Biechele G., et al. Associations between sex, body mass index and the individual microglial response in Alzheimer's disease. J. Neuroinflammation. 2024;21(1):30. doi: 10.1186/s12974-024-03020-y.
- Bin-Hezam R., Ward T.E. A machine learning approach towards detecting dementia based on its modifiable risk factors. Int. J. Adv. Comput. Sci. Appl. 2019;10(8):148–158. doi: 10.14569/IJACSA.2019.0100820.
- Boateng E.Y., Abaye D.A. A review of the logistic regression model with emphasis on medical research. J. Data Anal. Inf. Process. 2019;7(4):190–207. doi: 10.4236/JDAIP.2019.74012.
- Borne L., et al. The interplay of age, gender and amyloid on brain and cognition in mid-life and older adults. Sci. Rep. 2024;14(1). doi: 10.1038/s41598-024-78308-3.
- Bowirrat A., Friedland R.P., Farrer L., Baldwin C., Korczyn A. Genetic and environmental risk factors for Alzheimer's disease in Israeli Arabs. J. Mol. Neurosci. 2002;19(1–2):239–245. doi: 10.1007/S12031-002-0040-4.
- Buyrukoğlu S. Early detection of Alzheimer’s disease using data mining: comparison of ensemble feature selection approaches. Konya J. Eng. Sci. 2021;9(1):50–61. doi: 10.36306/KONJES.731624.
- Dhakal S., Azam S., Hasib K. Md., Karim A., Jonkman M., Al Haque A.S.M.F. Dementia prediction using machine learning. Procedia Comput. Sci. 2023;219:1297–1308. doi: 10.1016/J.PROCS.2023.01.414.
- Fiest K.M., et al. The prevalence and incidence of dementia due to Alzheimer's disease: a systematic review and meta-analysis. Can. J. Neurol. Sci. 2016;43(Suppl 1):S51–S82. doi: 10.1017/CJN.2016.36.
- Gao Y., Liu X. Secular trends in the incidence of and mortality due to Alzheimer's disease and other forms of dementia in China from 1990 to 2019: an age-period-cohort study and joinpoint analysis. Front. Aging Neurosci. 2021;13. doi: 10.3389/fnagi.2021.709156.
- Grueso S., Viejo-Sobera R. Machine learning methods for predicting progression from mild cognitive impairment to Alzheimer's disease dementia: a systematic review. Alzheimer's Res. Ther. 2021;13(1):1–29. doi: 10.1186/S13195-021-00900-W.
- Harika S., Yamini T., Nagasaikamesh T., Basha S.H., Kumar S.S., DurgaKameswari Mrs.S.S. Alzheimers disease detection using different machine learning algorithms. Int. J. Res. Appl. Sci. Eng. Technol. 2022;10(10):62–66. doi: 10.22214/IJRASET.2022.46937.
- Hua H., Li Y., Wang T., Dong N., Li W., Cao J. Edge computing with artificial intelligence: a machine learning perspective. ACM Comput. Surv. 2023;55(9). doi: 10.1145/3555802.
- Iparraguirre Villanueva O., et al. Comparison of predictive machine learning models to predict the level of adaptability of students in online education. Sci. Inform. Org. 2023;14(4):494–503. doi: 10.14569/IJACSA.2023.0140455.
- Javeed A., et al. Early prediction of dementia using feature extraction battery (FEB) and optimized support vector machine (SVM) for classification. Biomedicines. 2023;11(2):439. doi: 10.3390/BIOMEDICINES11020439.
- Kavitha C., Mani V., Srividhya S.R., Khalaf O.I., Tavera Romero C.A. Early-stage Alzheimer's disease prediction using machine learning models. Front. Public Health. 2022;10. doi: 10.3389/FPUBH.2022.853294.
- Khaleel A.A., Al-Azzawi A.A.M., Alkhazraji A.M. Random forest for lung cancer analysis using Apache Mahout and Hadoop based on software defined networking. Indonesian J. Electric. Eng. Comput. Sci. 2023;32(2):1086. doi: 10.11591/ijeecs.v32.i2.pp1086-1093.
- Kotturu P.K., Kumar A. 2020. Comparative Study on Machine Learning Models for Early Diagnose of Alzheimer's Disease: Multi Correlation Method; pp. 778–783.
- Kreuzberger D., Kuhl N., Hirschl S. Machine learning operations (MLOps): overview, definition, and architecture. IEEE Access. 2022;11:31866–31879. doi: 10.1109/ACCESS.2023.3262138.
- Lambora A., Gupta K., Chopra K. Proceedings of the International Conference on Machine Learning, Big Data, Cloud and Parallel Computing: Trends, Prespectives and Prospects. COMITCon; 2019. Genetic algorithm- A literature review; pp. 380–384.
- Lee J., et al. Prevalence of dementia in India: national and state estimates from a nationwide study. Alzheimer's Dementia. 2023;19(7):2898–2912. doi: 10.1002/alz.12928.
- Li D., Jia J., Zeng H., Zhong X., Chen H., Yi C. Efficacy of exercise rehabilitation for managing patients with Alzheimer's disease. Neural Regen Res. 2024;19(10):2175–2188. doi: 10.4103/1673-5374.391308.
- Livingston G., et al. Dementia prevention, intervention, and care: 2020 report of the Lancet Commission. Lancet. 2020;396(10248):413–446. doi: 10.1016/S0140-6736(20)30367-6.
- Mirbagheri B., Alimohammadi A. Improving urban cellular automata performance by integrating global and geographically weighted logistic regression models. Trans. GIS. 2017;21(6):1280–1297. doi: 10.1111/TGIS.12278.
- Mladenova T. ISMSIT 2021 - 5th International Symposium on Multidisciplinary Studies and Innovative Technologies, Proceedings. 2021. A feature-weighted rule for the K-nearest neighbor; pp. 493–497.
- Montgomery W., Ueda K., Jorgensen M., Stathis S., Cheng Y., Nakamura T. Epidemiology, associated burden, and current clinical practice for the diagnosis and management of Alzheimer's disease in Japan. Clinicoecon. Outcomes Res. 2017;10:13–28. doi: 10.2147/CEOR.S146788.
- Nianogo R.A., Rosenwohl-Mack A., Yaffe K., Carrasco A., Hoffmann C.M., Barnes D.E. Risk factors associated with Alzheimer disease and related dementias by sex and race and ethnicity in the US. JAMA Neurol. 2022;79(6):584. doi: 10.1001/jamaneurol.2022.0976.
- Nichols E., et al. Global, regional, and national burden of Alzheimer's disease and other dementias, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol. 2019;18(1):88–106. doi: 10.1016/S1474-4422(18)30403-4.
- Park S.-Y., et al. Racial and ethnic differences in the population-attributable fractions of Alzheimer disease and related dementias. Neurology. 2024;102(3). doi: 10.1212/WNL.0000000000208116.
- Rajayyan S., Mustafa S.M.M. Prediction of dementia using machine learning model and performance improvement with cuckoo algorithm. Int. J. Electr. Comput. Eng. 2023;13(4):4623–4632. doi: 10.11591/IJECE.V13I4.PP4623-4632.
- Roy A., Chakraborty S. Support vector machine in structural reliability analysis: a review. Reliab. Eng. Syst. Saf. 2023;233. doi: 10.1016/J.RESS.2023.109126.
- Sarica A., Cerasa A., Quattrone A. Random forest algorithm for the classification of neuroimaging data in Alzheimer's disease: a systematic review. Front. Aging Neurosci. 2017;9(OCT). doi: 10.3389/FNAGI.2017.00329.
- Shaik A.B., Srinivasan S. A brief survey on random forest ensembles in classification model. Lecture Note. Network. Syst. 2019;56:253–260. doi: 10.1007/978-981-13-2354-6_27.
- Singh-Manoux A., Sabia S. Facteurs de risque de la maladie d'Alzheimer et des maladies apparentées : approche parcours de vie. Bull Acad. Natl. Med. 2020;204(3):217–223. doi: 10.1016/j.banm.2020.01.015.
- Souchet B., Michaïl A., Billoir B., Braudeau J. Biological diagnosis of Alzheimer's disease based on amyloid status: an illustration of confirmation bias in medical research? Int. J. Mol. Sci. 2023;24(24). doi: 10.3390/ijms242417544.
- Speiser J.L., Miller M.E., Tooze J., Ip E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 2019;134:93–101. doi: 10.1016/J.ESWA.2019.05.028.
- Srinivasulu A., Pushpa A. Disease prediction in big data healthcare using extended convolutional neural network techniques. Int. J. Adv. Appl. Sci. 2020;9(2):85. doi: 10.11591/ijaas.v9.i2.pp85-92.
- Tahami Monfared A.A., Byrnes M.J., White L.A., Zhang Q. Alzheimer's disease: epidemiology and clinical progression. Neurol. Ther. 2022;11(2):553–569. doi: 10.1007/s40120-022-00338-8.
- Tang X., Liu J. Comparing different algorithms for the course of Alzheimer's disease using machine learning. Ann. Palliat. Med. 2021;10(9):9715–9724. doi: 10.21037/APM-21-2013.
- Taunk K., De S., Verma S., Swetapadma A. 2019 International Conference on Intelligent Computing and Control Systems. ICCS; 2019. A brief review of nearest neighbor algorithm for learning and classification; pp. 1255–1260.
- Teles G., Rodrigues J.J.P.C., Kozlov S.A., Rabêlo R.A.L., Albuquerque V.H.C. Decision support system on credit operation using linear and logistic regression. Expert Syst. 2021;38(6). doi: 10.1111/EXSY.12578.
- Tomlinson B.E., Blessed G., Roth M. Observations on the brains of demented old people. J. Neurol. Sci. 1970;11(3):205–242. doi: 10.1016/0022-510X(70)90063-8.
- Tsering J., et al. Effects of the Tibetan medicine Byur dMar Nyer lNga Ril Bu on Alzheimer's disease in mice models. J. Ethnopharmacol. 2022;283. doi: 10.1016/j.jep.2021.114724.
- Tyralis H., Papacharalampous G. Boosting algorithms in energy research: a systematic review. Neural Comput. Appl. 2021;33(21):14101–14117. doi: 10.1007/S00521-021-05995-8.
- Tyralis H., Papacharalampous G., Langousis A. A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water. 2019;11(5):910. doi: 10.3390/W11050910.
- Uddin K.M.M., Alam M.J., E-Anawar Jannat, Uddin M.A., Aryal S. A novel approach utilizing machine learning for the early diagnosis of Alzheimer's disease. Biomed. Mater. Devices (New York, N.Y.) 2023. doi: 10.1007/S44174-023-00078-9.
- Vinet-Couchevellou M. ‘Subjective dementification’ as an effect of Alzheimer's ideology. L’Évolution Psychiatrique. 2022;87(3):631–643. doi: 10.1016/j.evopsy.2021.09.005.
- Wang H., et al. Develop a diagnostic tool for dementia using machine learning and non-imaging features. Front. Aging Neurosci. 2022;14. doi: 10.3389/FNAGI.2022.945274.
- Zapata-Paulini J., Cabanillas-Carbonell M. Evaluation of machine learning algorithms in the early detection of Parkinson's disease: a comparative study. Indonesian J. Electric. Eng. Comput. Sci. 2024;35(1):222. doi: 10.11591/ijeecs.v35.i1.pp222-237.
- Zhang B., Hepp T., Greven S., Bergherr E. Adaptive step-length selection in gradient boosting for Gaussian location and scale models. Comput. Stat. 2022;37(5):2295–2332. doi: 10.1007/S00180-022-01199-3.