Heliyon. 2024 Jul 26;10(15):e35127. doi: 10.1016/j.heliyon.2024.e35127

Table 2.

Key features, advantages, and disadvantages of the machine-learning classifiers evaluated: Random Forest, Gradient Boosting, CatBoost, XGBoost, LightGBM, Decision Tree, AdaBoost, Extra Trees, Logistic Regression, Ridge Classifier, K-Nearest Neighbors, Linear Discriminant Analysis, SVM with a linear kernel, Naive Bayes, and QDA.

| Model | Key Features | Advantages | Disadvantages |
|---|---|---|---|
| Random Forest | Ensemble of decision trees | Handles non-linear relationships; scales to large datasets; reduces overfitting | Slower training; harder to interpret than a single tree |
| Gradient Boosting | Ensemble of decision trees built sequentially with boosting | Handles numerical and categorical features; handles non-linear relationships; high prediction accuracy | Slow training; prone to overfitting; complex interpretation |
| CatBoost | Gradient boosting with native categorical-feature support | Handles categorical features without one-hot encoding; handles missing values; automated model tuning; fast parallel processing | Steep learning curve; slower training time |
| XGBoost | Gradient boosting for decision trees | Fast training and high prediction accuracy; handles missing values and non-linear features; parallel and GPU processing; many tuning parameters | Difficult to interpret; prone to overfitting; high memory usage |
| LightGBM | Gradient boosting with selective sampling of high-gradient instances | Fast training; good prediction accuracy; low memory usage | Less suited to very complex models; can overfit on small datasets |
| Decision Tree Classifier | Tree-like structure representing decisions and their possible consequences; each node tests a feature and its branches represent the outcomes of that feature | Easy to interpret and visualize; handles numerical and categorical data; captures non-linear relationships between features and class labels; handles missing values and irrelevant features | Prone to overfitting when the tree is too deep or has too many branches; computationally expensive for large datasets; biased towards features with many outcomes; unstable when the data is noisy or has small variations |
| AdaBoost Classifier | Combines multiple simple models (weak learners) into a strong model, re-weighting instances by how hard they are to predict correctly | Simple to implement; handles linear and non-linear problems; handles noisy and imbalanced data; improves the weak learners by combining them | Sensitive to noisy data and outliers; computationally expensive for large datasets; vulnerable to overfitting if the weak learners are too complex |
| Extra Trees Classifier | Ensemble method based on decision trees; builds multiple randomized trees and combines their predictions | Handles high-dimensional and noisy data; handles missing values and irrelevant features; usable for both classification and regression | Computationally expensive for large datasets; can overfit if the number of trees is too large; sensitive to hyperparameter choices |
| Logistic Regression | Statistical classification method that fits a linear model relating the input features to class-label probabilities | Simple to implement; efficient on large datasets; provides class-probability estimates; handles linear and, with feature transformations, non-linear relationships | Sensitive to irrelevant features; can overfit if the model is too complex; vulnerable to outliers and noisy data |
| Ridge Classifier | Linear classifier based on regression with an L2 regularization penalty | Handles multicollinearity; stabilizes the coefficient estimates | Optimal alpha value is hard to find; not well suited to many predictors or non-linear relationships |
| K-Nearest Neighbors (KNN) | Non-parametric supervised learning algorithm | Simple; efficient; sensitive to the local structure of the data | Can be computationally expensive at prediction time; may suffer from the curse of dimensionality |
| Linear Discriminant Analysis (LDA) | Finds a linear combination of features that separates the classes | Good for dimensionality reduction and classification | Assumes normality and equal covariance matrices across classes; may underperform when classes are not well separated |
| SVM with Linear Kernel | Support Vector Machine with a linear kernel | Robust and efficient classifier; can handle non-linearly separable data after feature transformation | Sensitive to the choice of kernel; may underperform when classes are not linearly separable |
| Naive Bayes | Probabilistic classifier based on Bayes' theorem with a conditional-independence assumption between features | Simple to implement; fast and efficient on large datasets; works well with high-dimensional data; handles both continuous and categorical data | Independence assumption often does not hold; sensitive to irrelevant features; may underperform when features have complex relationships |
| QDA | Statistical discriminant analysis method for binary and multi-class classification that allows each class its own covariance matrix | Captures non-linear (quadratic) decision boundaries; models different covariance structures per class; effective when the number of features is small relative to the number of samples | Computationally expensive; requires many samples to avoid overfitting; may perform poorly when the number of features is large relative to the number of samples |

Notes: Model references: Bentéjac and Martínez-Muñoz (2021) [132]; Böhning (1992) [133]; Breiman (1996, 2001) [134,135]; Chen and Guestrin (2016) [136]; Cortes and Vapnik (1995) [137]; Cover and Hart (1967) [138]; Freund and Abe (1999) [139]; Friedman (2001) [140]; Geurts et al. (2006) [141]; Hodges (1950) [142]; Ke et al. (2017) [143]; Leung (2007) [144]; Peng and Cheng (2007) [145]; Prokhorenkova et al. (2018) [146]; Rish (2001) [147]; Schapire (2003) [148]; Swain and Hauska (1977) [149]; Tharwat (2016) [150]; Vezhnevets and Vezhnevets (2005) [151].
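
As a practical illustration of how the bagged and boosted tree ensembles in Table 2 are typically instantiated, the sketch below fits each of them on a synthetic binary-classification task through their shared scikit-learn-style interfaces. This is a minimal sketch under assumed defaults: the dataset, package versions, and hyperparameters are illustrative and are not the configuration used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier        # assumes the xgboost package is installed
from lightgbm import LGBMClassifier      # assumes the lightgbm package is installed
from catboost import CatBoostClassifier  # assumes the catboost package is installed

# Synthetic stand-in data; the study's own features and labels are not reproduced here.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
    "LightGBM": LGBMClassifier(random_state=42),
    "CatBoost": CatBoostClassifier(verbose=0, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)  # all five expose the same fit/predict interface
    print(f"{name}: test accuracy = {accuracy_score(y_test, model.predict(X_test)):.3f}")
```

The shared interface is why these five are often benchmarked side by side; library-specific strengths such as CatBoost's native categorical handling only come into play once raw categorical columns are passed instead of the numeric matrix used here.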
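
The remaining classifiers in Table 2 are all available in scikit-learn. The following cross-validation sketch, again with assumed defaults rather than the study's pipeline, shows how they can be compared on a common dataset.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data, not the study's dataset.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           n_redundant=0, random_state=0)

classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Ridge Classifier": RidgeClassifier(),
    "KNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM (linear kernel)": SVC(kernel="linear"),
    "Naive Bayes": GaussianNB(),
    "QDA": QuadraticDiscriminantAnalysis(),
}

for name, clf in classifiers.items():
    # Standardize inside the pipeline so scaling is refit on each training fold.
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: mean 5-fold CV accuracy = {scores.mean():.3f}")
```

Standardization mainly benefits the distance- and margin-based models (KNN, the linear SVM, logistic regression); the tree-based models are unaffected by monotone feature scaling, which is one reason they appear in Table 2 as robust to heterogeneous inputs.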