Heliyon. 2024 Jul 26;10(15):e35127. doi: 10.1016/j.heliyon.2024.e35127

Table 2.

Key features, advantages, and disadvantages of the machine-learning classifiers evaluated: Random Forest, Gradient Boosting, CatBoost, XGBoost, LightGBM, Decision Tree, AdaBoost, Extra Trees, Logistic Regression, Ridge Classifier, K-Nearest Neighbors, Linear Discriminant Analysis, SVM with a linear kernel, Naive Bayes, and QDA.

| Model | Key Features | Advantages | Disadvantages |
|---|---|---|---|
| Random Forest | Ensemble of decision trees | Handles non-linear relationships; scales to large datasets; reduces overfitting | Slower training; harder to interpret than a single tree |
| Gradient Boosting | Ensemble of decision trees built sequentially with boosting | Handles numerical and categorical features; handles non-linear relationships; high prediction accuracy | Slow training; prone to overfitting; complex interpretation |
| CatBoost | Gradient boosting with native categorical-feature support | Handles categorical features without one-hot encoding; handles missing values; automated model tuning; fast parallel processing | Steep learning curve; slower training time |
| XGBoost | Gradient boosting for decision trees | Fast training and high prediction accuracy; handles missing values and non-linear features; parallel and GPU processing; many tuning parameters | Difficult to interpret; prone to overfitting; high memory usage |
| LightGBM | Gradient boosting with selective sampling of high-gradient instances | Fast training; good prediction accuracy; low memory usage | Less suited to very complex models; can overfit on small datasets |
| Decision Tree Classifier | Tree-like structure representing decisions and their possible consequences; each node tests a feature and its branches represent the outcomes of that feature | Easy to interpret and visualize; handles numerical and categorical data; captures non-linear relationships between features and class labels; handles missing values and irrelevant features | Prone to overfitting when the tree is too deep or has too many branches; computationally expensive for large datasets; biased towards features with many outcomes; unstable when the data is noisy or has small variations |
| AdaBoost Classifier | Combines multiple simple models (weak learners) into a strong model, re-weighting instances by how hard they are to predict correctly | Simple to implement; handles linear and non-linear problems; handles noisy and imbalanced data; improves the weak learners by combining them | Sensitive to noisy data and outliers; computationally expensive for large datasets; vulnerable to overfitting if the weak learners are too complex |
| Extra Trees Classifier | Ensemble method based on decision trees; builds multiple randomized trees and combines their predictions | Handles high-dimensional and noisy data; handles missing values and irrelevant features; usable for both classification and regression | Computationally expensive for large datasets; can overfit if the number of trees is too large; sensitive to hyperparameter choices |
| Logistic Regression | Statistical classification method that fits a linear model relating the input features to class-label probabilities | Simple to implement; efficient on large datasets; provides class-probability estimates; handles linear and, with feature transformations, non-linear relationships | Sensitive to irrelevant features; can overfit if the model is too complex; vulnerable to outliers and noisy data |
| Ridge Classifier | Linear classifier based on regression with an L2 regularization penalty | Handles multicollinearity; stabilizes the coefficient estimates | Optimal alpha value is hard to find; not well suited to many predictors or non-linear relationships |
| K-Nearest Neighbors (KNN) | Non-parametric supervised learning algorithm | Simple; efficient; sensitive to the local structure of the data | Can be computationally expensive at prediction time; may suffer from the curse of dimensionality |
| Linear Discriminant Analysis (LDA) | Finds a linear combination of features that separates the classes | Good for dimensionality reduction and classification | Assumes normality and equal covariance matrices across classes; may underperform when classes are not well separated |
| SVM with Linear Kernel | Support Vector Machine with a linear kernel | Robust and efficient classifier; can handle non-linearly separable data after feature transformation | Sensitive to the choice of kernel; may underperform when classes are not linearly separable |
| Naive Bayes | Probabilistic classifier based on Bayes' theorem with a conditional-independence assumption between features | Simple to implement; fast and efficient on large datasets; works well with high-dimensional data; handles both continuous and categorical data | Independence assumption often does not hold; sensitive to irrelevant features; may underperform when features have complex relationships |
| QDA | Statistical discriminant analysis method for binary and multi-class classification that allows each class its own covariance matrix | Captures non-linear (quadratic) decision boundaries; models different covariance structures per class; effective when the number of features is small relative to the number of samples | Computationally expensive; requires many samples to avoid overfitting; may perform poorly when the number of features is large relative to the number of samples |

Notes: Model references: Bentéjac and Martínez-Muñoz (2021) [132]; Böhning (1992) [133]; Breiman (1996, 2001) [134,135]; Chen and Guestrin (2016) [136]; Cortes and Vapnik (1995) [137]; Cover and Hart (1967) [138]; Freund and Abe (1999) [139]; Friedman (2001) [140]; Geurts et al. (2006) [141]; Hodges (1950) [142]; Ke et al. (2017) [143]; Leung (2007) [144]; Peng and Cheng (2007) [145]; Prokhorenkova et al. (2018) [146]; Rish (2001) [147]; Schapire (2003) [148]; Swain and Hauska (1977) [149]; Tharwat (2016) [150]; Vezhnevets and Vezhnevets (2005) [151].
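
As a practical illustration of how the bagged and boosted tree ensembles in Table 2 are typically instantiated, the sketch below fits each of them on a synthetic binary-classification task through their shared scikit-learn-style interfaces. This is a minimal sketch under assumed defaults: the dataset, package versions, and hyperparameters are illustrative and are not the configuration used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier        # assumes the xgboost package is installed
from lightgbm import LGBMClassifier      # assumes the lightgbm package is installed
from catboost import CatBoostClassifier  # assumes the catboost package is installed

# Synthetic stand-in data; the study's own features and labels are not reproduced here.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
    "LightGBM": LGBMClassifier(random_state=42),
    "CatBoost": CatBoostClassifier(verbose=0, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)  # all five expose the same fit/predict interface
    print(f"{name}: test accuracy = {accuracy_score(y_test, model.predict(X_test)):.3f}")
```

The shared interface is why these five are often benchmarked side by side; library-specific strengths such as CatBoost's native categorical handling only come into play once raw categorical columns are passed instead of the numeric matrix used here.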
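
The remaining classifiers in Table 2 are all available in scikit-learn. The following cross-validation sketch, again with assumed defaults rather than the study's pipeline, shows how they can be compared on a common dataset.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data, not the study's dataset.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           n_redundant=0, random_state=0)

classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Ridge Classifier": RidgeClassifier(),
    "KNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM (linear kernel)": SVC(kernel="linear"),
    "Naive Bayes": GaussianNB(),
    "QDA": QuadraticDiscriminantAnalysis(),
}

for name, clf in classifiers.items():
    # Standardize inside the pipeline so scaling is refit on each training fold.
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: mean 5-fold CV accuracy = {scores.mean():.3f}")
```

Standardization mainly benefits the distance- and margin-based models (KNN, the linear SVM, logistic regression); the tree-based models are unaffected by monotone feature scaling, which is one reason they appear in Table 2 as robust to heterogeneous inputs.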