2023 Jul 3;15(13):3474. doi: 10.3390/cancers15133474

Table 4.

Supervised and unsupervised machine learning algorithms used in biological and life sciences.

Algorithm	Advantages	Limitations	References
Supervised Algorithms
Logistic Regression	High power for supervised classification with a dichotomous outcome variable	Not suitable for continuous outcome variables	Yang, 2022 [83]
Support Vector Machine	Applied to non-linear models and to survival prediction in cancer and demographic studies, among others. Controls overfitting well and is a good classifier	Complex algorithm structure. Training is comparatively slow	Huang, 2022 [84]
Decision Trees	Easy to train. Used in diagnostic protocols	Prone to overfitting, especially as branching at internal nodes increases	Lai, 2020 [75]; Batra, 2022 [85], 2022 [7]
Random Forest	Good predictive algorithm used in medicine in different imaging studies and recently in biomarker studies	May have overfitting problems	Batra, 2022 [85]; Handelman, 2018 [80]
Naïve Bayes	Still used in symptom characterization, complication prediction, imaging data, and demographic data	Being a probabilistic model, it assumes that attributes are independent of one another given the class. Redundant or correlated attributes can therefore induce classification errors	Yang, 2023 [86]
K-Nearest Neighbor	Used as a classification and prediction algorithm in demographic models and genomic data, among others. Tolerant to noisy and missing data	Treats all data attributes as equally important, so dissimilar samples may receive similar classifications. Computationally expensive as the volume of data and the number of attributes grow	Podolsky, 2016 [82]
Artificial Neural Networks	Model capable of classifying and predicting from a combination of parameters applied simultaneously	May overfit when there are too many attributes, and the optimal network structure must be determined by experimentation	Lian, 2022 [87]; Civit-Masot, 2022 [88]
Unsupervised Algorithms
K-Means	Widely used in biological and medical research; easy to adapt and understand. Performs well on large datasets	The number of clusters (K) must be assigned manually. Outliers can generate incorrect clusters. Scales poorly as the number of dimensions grows	Huang, 2021 [89]
Principal Component Analysis (PCA)	Linear dimensionality reduction algorithm that allows pattern observation and generates uncorrelated variables called principal components. Widely used in biological and genomic data observation	Does not allow non-linear dimensionality reduction. Lack of data standardization can distort results, and reducing dimensions can cause information loss	Shin, 2018 [90]
t-SNE	Algorithm that enables visualization of high-dimensional datasets. Frequently used with PCA in biological and life sciences, primarily in omics analysis	Some issues when applied to non-linear parameter dimensionality reduction	Islam, 2021 [91]; Wang, 2021 [92]
UMAP	Next-generation algorithm that, similar to t-SNE, enables visualization of high-dimensional datasets. Offers higher accuracy when working with non-linear structures. Widely used in omics analysis	Currently used mainly for dimensionality reduction, partly because it is still relatively unfamiliar	Islam, 2021 [91]; Nascimben, 2022 [93]
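
The K-Means limitation listed in Table 4, that the number of clusters K must be assigned manually, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the cited studies: the toy data points, the function names (`farthest_first_init`, `kmeans`, `inertia`), and the deterministic farthest-first initialization, which is used here only so that the run is reproducible.

```python
import math


def farthest_first_init(points, k):
    """Pick k starting centroids deterministically (illustrative choice):
    start from the first point, then repeatedly take the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm. The caller has to supply k; the algorithm
    itself never checks whether that value suits the data."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical toy data: three visually distinct groups of three points each.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]

cents_k2, labels_k2 = kmeans(points, k=2)
cents_k3, labels_k3 = kmeans(points, k=3)

print("k=2 labels:", labels_k2, "inertia:", inertia(points, cents_k2, labels_k2))
print("k=3 labels:", labels_k3, "inertia:", inertia(points, cents_k3, labels_k3))
```

Both calls return a result without complaint, but with K = 2 two of the groups are forced into a single cluster. In practice the choice of K is usually guided by an external criterion, such as comparing the within-cluster sum of squares across several candidate values.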