2024 Sep 27;24:220. doi: 10.1186/s12874-024-02341-z

Table 2. Strengths and challenges of each algorithm

| Category | Algorithm | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Linear Models | Logistic Regression [40] | Simple to implement and interpret. Efficient to train. Good for binary classification. | Assumes a linear relationship between variables. Not suitable for complex relationships. |
| Linear Models | Support Vector Machine (SVM) [42] | Effective in high-dimensional spaces. Memory efficient. Versatile with kernel functions. | Requires careful parameter tuning. Not suitable for large datasets. |
| Linear Models | SGD Classifier [47] | Efficient for large-scale problems. Easy to implement, with many opportunities for code tuning. | Sensitive to feature scaling. Requires a number of hyperparameters. |
| Tree-Based Models | Decision Tree Classifier [43] | Easy to interpret and visualize. Handles both numerical and categorical data. | Prone to overfitting. Can become unstable with small variations in the data. |
| Tree-Based Models | Random Forest Classifier [44] | Handles overfitting well. Works well on large datasets. Provides feature importances. | Can be slow to predict. Complex and difficult to interpret. |
| Tree-Based Models | AdaBoost Classifier [45] | Improves classification accuracy. Flexible enough to combine with any learning algorithm. | Sensitive to noisy data and outliers. Can overfit on very complex datasets. |
| Tree-Based Models | Gradient Boosting [46] | Highly effective and flexible. Can optimize different loss functions. | Prone to overfitting without proper tuning. Time-consuming to train. |
| Instance-Based Models | K-Nearest Neighbors (KNN) [41] | Makes no assumptions about the data. Simple and effective. Adaptable to any type of data. | Computationally expensive. Performance depends on the number of dimensions. |
| Probabilistic Models | GaussianNB [48] | Works well with high-dimensional data. Simple and fast. | Assumes features are independent; performance can suffer when this assumption is not met. |
| Neural Network Models | MLP Classifier [49, 50] | Capable of modeling complex non-linear relationships; works well with large datasets. | Requires significant computational resources; prone to overfitting without proper regularization. |
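To make the instance-based entry concrete, the following is a minimal pure-Python sketch of K-Nearest Neighbors classification (the toy data, labels, and choice of k are invented for illustration, not taken from the study). It shows both properties the table lists: no modeling assumptions about the data, but every prediction must scan the entire training set, which is why KNN is computationally expensive.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. There is no fitting
    step: the "model" is the data itself, and each prediction computes the
    distance to every stored point.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy binary-classification data: two well-separated clusters in 2-D.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B")]

print(knn_predict(train, (0.15, 0.10)))  # → A
print(knn_predict(train, (1.05, 0.95)))  # → B
```

Because distances are recomputed for every query, prediction cost grows linearly with the training-set size, and in high dimensions distances become less informative, matching the disadvantages noted in the table.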