2023 Jul 3;15(13):3474. doi: 10.3390/cancers15133474

Table 4.

Supervised and unsupervised machine learning algorithms used in biological and life sciences.

Supervised Algorithms

| Algorithm | Advantages | Limitations | References |
| --- | --- | --- | --- |
| Logistic Regression | High power for supervised classification with a dichotomous variable | Not suited to continuous outcome variables | Yang, 2022 [83] |
| Support Vector Machine | Applied to non-linear models and to survival prediction in cancer and demographic studies, among others. Controls overfitting well and is a good classifier | Complex algorithm structure. Slower training | Huang, 2022 [84] |
| Decision Trees | Easy to train. Used in diagnostic protocols | Prone to overfitting, especially when branching at internal nodes increases significantly | Lai, 2020 [75]; Batra, 2022 [85], 2022 [7] |
| Random Forest | Good predictive algorithm used in medicine in different imaging studies and, recently, in biomarker studies | May have overfitting problems | Batra, 2022 [85]; Handelman, 2018 [80] |
| Naïve Bayes | Still used in symptom characterization, complication prediction, imaging data, and demographic data | Because it is based on probabilistic statistical models, it assumes that attributes are independent. Redundant attributes can induce classification errors | Yang, 2023 [86] |
| K-Nearest Neighbor | Used as a classification and prediction algorithm in demographic models and genomic data, among others. Tolerant to noisy and missing data | Assumes that all data attributes are equally important and may have similar classifications. Computational cost grows with the amount of data and the number of attributes | Podolsky, 2016 [82] |
| Artificial Neural Networks | Model capable of classifying and predicting based on a combination of parameters applied simultaneously | May overfit with too many attributes; the optimal network structure is determined by experimentation | Lian, 2022 [87]; Civit-Masot, 2022 [88] |
Unsupervised Algorithms

| Algorithm | Advantages | Limitations | References |
| --- | --- | --- | --- |
| K-Means | Widely used in biological and medical research; easy to adapt and understand. Performs well on large datasets | The number of clusters (K) must be assigned manually. Outliers can generate incorrect clusters. Scales poorly with the number of dimensions | Huang, 2021 [89] |
| Principal Component Analysis (PCA) | Linear dimensionality reduction algorithm that allows pattern observation and generates independent variables called principal components. Widely used in biological and genomic data observation | Does not allow non-linear dimensionality reduction. Lack of data standardization can be detrimental to results and cause information loss | Shin, 2018 [90] |
| t-SNE | Enables visualization of high-dimensional datasets. Frequently used with PCA in biological and life sciences, primarily in omics analysis | Some issues when applied to non-linear parameter dimensionality reduction | Islam, 2021 [91]; Wang, 2021 [92] |
| UMAP | Next-generation algorithm that, similar to t-SNE, enables visualization of high-dimensional datasets. Offers higher accuracy when working with non-linear structures. Widely used in omics analysis | Currently used mainly for dimensionality reduction because it is still relatively unfamiliar | Islam, 2021 [91]; Nascimben, 2022 [93] |
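
Two of the K-Means limitations listed in the table (K must be assigned manually, and outliers can generate incorrect clusters) can be illustrated with a minimal sketch. The code below is not taken from the cited references: it is a plain one-dimensional Lloyd's iteration written for illustration, and the function name `kmeans_1d`, the toy values, and the initial centroids are all assumptions.

```python
from statistics import mean


def kmeans_1d(values, initial_centroids, max_iter=50):
    """Plain Lloyd's k-means on 1-D data (illustrative sketch only).

    K is implied by the number of initial centroids, which the caller must
    supply: the algorithm has no way to choose K on its own.
    """
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, clusters


# Hypothetical toy data: two well-separated groups of three values each.
clean = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
# The same data with a single extreme value appended.
with_outlier = clean + [100.0]

# K = 2 is chosen by hand in both runs, with identical starting centroids.
clean_centroids, clean_clusters = kmeans_1d(clean, initial_centroids=[0.0, 6.0])
outlier_centroids, outlier_clusters = kmeans_1d(with_outlier, initial_centroids=[0.0, 6.0])

print("clean:", clean_centroids, clean_clusters)
print("with outlier:", outlier_centroids, outlier_clusters)
```

On this toy data, the run without the outlier recovers the two groups, whereas with the outlier the extreme value takes a cluster of its own and the two genuine groups are merged into one. In practice, a library implementation with several random initializations, together with outlier screening beforehand, would normally be preferred over a hand-written loop like this one.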