Supervised Algorithms

| Algorithm | Advantages | Limitations | References |
|---|---|---|---|
| Logistic Regression | High power for supervised classification with a dichotomous outcome variable | Not suitable for continuous outcome variables | Yang, 2022 [83] |
| Support Vector Machine | Applied to non-linear models and to survival prediction in cancer and demographic studies, among others; good control of overfitting and a strong classifier | Complex algorithm structure; slower training | Huang, 2022 [84] |
| Decision Trees | Simple algorithm to train; used in diagnostic protocols | Prone to overfitting, especially when branching at internal nodes increases substantially | Lai, 2020 [75]; Batra, 2022 [85], 2022 [7] |
| Random Forest | Good predictive algorithm used in medicine in various imaging studies and, more recently, in biomarker studies | May have overfitting problems | Batra, 2022 [85]; Handelman, 2018 [80] |
| Naïve Bayes | Still used for symptom characterization, complication prediction, imaging data, and demographic data | Based on probabilistic statistical models that assume attributes are independent; redundant attributes can induce classification errors | Yang, 2023 [86] |
| K-Nearest Neighbor | Used as a classification and prediction algorithm in demographic models and genomic data, among others; tolerant of noisy and missing data | Assumes all attributes are equally important, which can yield similar classifications; computationally expensive as data and attributes grow | Podolsky, 2016 [82] |
| Artificial Neural Networks | Algorithmic model capable of classifying and predicting simultaneously from a combination of parameters | May overfit when there are too many attributes, and the optimal network structure must be determined by experimentation | Lian, 2022 [87]; Civit-Masot, 2022 [88] |
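Two of the limitations listed for K-Nearest Neighbor — equal weighting of all attributes and cost that grows with data size — follow directly from how the algorithm works. The minimal sketch below illustrates this with a hypothetical two-class toy dataset (the data and function name are illustrative, not drawn from the cited studies):

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Distances are Euclidean, so every attribute contributes equally -- the
    limitation noted in the table. Each prediction scans the full training
    set, so cost grows with both the number of samples and of attributes.
    """
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical dichotomous example: two well-separated groups in 2-D.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
y = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]

print(knn_predict(X, y, (0.1, 0.1)))  # → healthy
print(knn_predict(X, y, (3.0, 3.0)))  # → disease
```

Because there is no training phase, the method tolerates noisy points (a single mislabeled neighbor is outvoted), but rescaling or weighting attributes is left entirely to the analyst.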
Unsupervised Algorithms

| Algorithm | Advantages | Limitations | References |
|---|---|---|---|
| K-Means | Widely used in biological and medical research; easy to adapt and understand; performs well on large datasets | The number of clusters k must be assigned manually; outliers can produce incorrect clusters; scales poorly as the number of dimensions grows | Huang, 2021 [89] |
| Principal Component Analysis (PCA) | Linear dimensionality-reduction algorithm that allows pattern observation and generates independent variables called principal components; widely used for exploring biological and genomic data | Cannot perform non-linear dimensionality reduction; lack of data standardization can degrade results and cause information loss | Shin, 2018 [90] |
| t-SNE | Enables visualization of high-dimensional datasets; frequently used alongside PCA in biology and the life sciences, primarily in omics analysis | Some issues when applied to non-linear parameter dimensionality reduction | Islam, 2021 [91]; Wang, 2021 [92] |
| UMAP | Next-generation algorithm that, like t-SNE, enables visualization of high-dimensional datasets; higher accuracy on non-linear structures; widely used in omics analysis | Currently limited to dimensionality reduction, owing to its relative unfamiliarity | Islam, 2021 [91]; Nascimben, 2022 [93] |
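The K-Means limitations above — a manually chosen k and sensitivity to outliers — can be seen in a minimal sketch of Lloyd's algorithm, the standard iterative procedure behind K-Means (the toy data here is hypothetical):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm. Note that k is supplied by the caller --
    the manual choice noted in the table -- and that a single distant
    outlier can drag a centroid away during the update step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from random data points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = tuple(sum(v) / len(c) for v in zip(*c))
    return centroids, clusters

# Two well-separated hypothetical groups in 2-D.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # two clusters of 3 points each
```

Because the update step averages cluster members, appending one extreme outlier to `pts` shifts a centroid toward it, which is how outliers generate the incorrect clusters noted in the table.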