2024 Sep 27;24:220. doi: 10.1186/s12874-024-02341-z

Table 2. Strengths and challenges of each algorithm

| Category | Algorithm | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Linear Models | Logistic Regression [40] | Simple to implement and interpret. Efficient to train. Good for binary classification. | Assumes a linear relationship between variables. Not suitable for complex relationships. |
| Linear Models | Support Vector Machine (SVM) [42] | Effective in high-dimensional spaces. Memory efficient. Versatile with kernel functions. | Requires careful parameter tuning. Not suitable for large datasets. |
| Linear Models | SGD Classifier [47] | Efficient for large-scale problems. Easy to implement, with many opportunities for code tuning. | Sensitive to feature scaling. Requires a number of hyperparameters. |
| Tree-Based Models | Decision Tree Classifier [43] | Easy to interpret and visualize. Handles both numerical and categorical data. | Prone to overfitting. Can become unstable with small variations in the data. |
| Tree-Based Models | Random Forest Classifier [44] | Handles overfitting well. Works well on large datasets. Provides feature importances. | Can be slow to predict. Complex and difficult to interpret. |
| Tree-Based Models | AdaBoost Classifier [45] | Improves classification accuracy. Flexible enough to combine with any learning algorithm. | Sensitive to noisy data and outliers. Can overfit on very complex datasets. |
| Tree-Based Models | Gradient Boosting [46] | Highly effective and flexible. Can optimize different loss functions. | Prone to overfitting without proper tuning. Time-consuming to train. |
| Instance-Based Models | K-Nearest Neighbors (KNN) [41] | Makes no assumptions about the data. Simple and effective. Adaptable to any type of data. | Computationally expensive. Performance depends on the number of dimensions. |
| Probabilistic Models | GaussianNB [48] | Works well with high-dimensional data. Simple and fast. | Assumes features are independent; performance can suffer when this assumption is not met. |
| Neural Network Models | MLP Classifier [49, 50] | Capable of modeling complex non-linear relationships; works well with large datasets. | Requires significant computational resources; prone to overfitting without proper regularization. |
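To make the instance-based entry concrete, the following is a minimal pure-Python sketch of K-Nearest Neighbors classification (the toy data, labels, and choice of k are invented for illustration, not taken from the study). It shows both properties the table lists: no modeling assumptions about the data, but every prediction must scan the entire training set, which is why KNN is computationally expensive.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. There is no fitting
    step: the "model" is the data itself, and each prediction computes the
    distance to every stored point.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy binary-classification data: two well-separated clusters in 2-D.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B")]

print(knn_predict(train, (0.15, 0.10)))  # → A
print(knn_predict(train, (1.05, 0.95)))  # → B
```

Because distances are recomputed for every query, prediction cost grows linearly with the training-set size, and in high dimensions distances become less informative, matching the disadvantages noted in the table.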