Table 2.
Key features, advantages, and disadvantages of the machine learning models compared: Random Forest, Gradient Boosting, CatBoost, XGBoost, LightGBM, Decision Tree, AdaBoost, Extra Trees, Logistic Regression, Ridge Classifier, K-Nearest Neighbor, Linear Discriminant Analysis, SVM with linear kernel, Naive Bayes, and QDA.
| Model | Key Features | Advantages | Disadvantages |
|---|---|---|---|
| Random Forest | Ensemble of decision trees | Handles non-linear relationships, scales to large datasets, reduces overfitting | Slower training time, harder to interpret |
| Gradient Boosting | Ensemble of decision trees built sequentially by boosting | Handles numerical and categorical features, captures non-linear relationships, high prediction accuracy | Slow training time, prone to overfitting, hard to interpret |
| CatBoost | Gradient boosting with native categorical-feature support | Handles categorical features without one-hot encoding, handles missing values, automated model tuning, fast parallel processing | Steep learning curve, slower training time |
| XGBoost | Gradient boosting for decision trees | Fast training and high prediction accuracy, handles missing values and non-linear relationships, parallel and GPU processing, many tuning parameters | Difficult to interpret, prone to overfitting, high memory usage |
| LightGBM | Gradient boosting with selective sampling of high-gradient instances | Fast training, good prediction accuracy, low memory usage | Leaf-wise tree growth can overfit, particularly on small datasets |
| Decision Tree Classifier | Uses a tree-like structure to represent decisions and their possible consequences; each internal node tests a feature, and the branches represent the outcomes of that test. | Easy to interpret and visualize. Handles both numerical and categorical data. Handles non-linear relationships between features and class labels. Handles missing values and irrelevant features. | Prone to overfitting if the tree is too deep and has too many branches. Can be computationally expensive for large datasets. Biased towards features with many possible outcomes. Unstable: noise or small variations in the data can produce very different trees. |
| AdaBoost Classifier | Combines multiple simple models (weak learners) into a strong model, reweighting training instances so that later learners focus on those that are hardest to predict correctly. | Simple and easy to implement. Handles both linear and non-linear problems. Can handle imbalanced data. Improves the performance of the weak learners by combining them. | Sensitive to noisy data and outliers. Can be computationally expensive for large datasets. Vulnerable to overfitting if the weak learners are too complex. |
| Extra Trees Classifier | An ensemble method based on decision trees. Creates multiple random trees and combines their predictions. | Can handle high-dimensional data and noisy data. Can handle missing values and irrelevant features. Can be used for both classification and regression problems. | Can be computationally expensive for large datasets. Can suffer from overfitting if the number of trees is too large. Can be sensitive to the choice of hyperparameters. |
| Logistic Regression | A statistical method for classification that fits a linear model to the log-odds of the class labels given the input features. | Simple and easy to implement. Handles large datasets efficiently. Provides an estimate of the probability of the class labels. Can capture non-linear relationships when the input features are suitably transformed. | Sensitive to irrelevant features. Can overfit if the model is too complex. Vulnerable to outliers and noisy data. |
| Ridge Classifier | Linear classifier based on regression with an L2 regularization penalty term | Handles multicollinearity, provides stability to the coefficient estimates | Optimal regularization strength (alpha) must be tuned; not well suited to a large number of predictors or to non-linear relationships |
| K-Nearest Neighbor (KNN) | Non-parametric supervised learning algorithm | Simple, no explicit training phase, adapts to the local structure of the data | Prediction can be computationally expensive, may suffer from the curse of dimensionality |
| Linear Discriminant Analysis (LDA) | Method for finding a linear combination of features that separates classes | Good for dimensionality reduction and classification | Assumes normality and equal covariance matrices across classes, may not perform well if classes are not well separated |
| SVM with Linear Kernel | Support Vector Machine with linear kernel | Robust and efficient classifier, can handle non-linearly separable data with feature transformation | Sensitive to choice of kernel, may not perform well if classes are not linearly separable |
| Naive Bayes | Probabilistic classifier based on Bayes' theorem with an assumption of conditional independence between features. | Simple and easy to implement, fast and efficient for large datasets, works well with high-dimensional data, can handle both continuous and categorical data. | The independence assumption is often not true, can be sensitive to irrelevant features, may not perform well for data with complex relationships between features. |
| QDA | A statistical discriminant analysis method used for binary and multi-class classification that allows each class to have its own covariance matrix, yielding quadratic decision boundaries. | Can capture non-linear relationships between features and classes, can model the different covariance structures of the classes, effective when the number of features is small compared to the number of samples. | Can be computationally expensive, requires a large number of samples to avoid overfitting, may perform poorly when the number of features is large compared to the number of samples. |
Notes: Model evaluations are based on Bentéjac and Martínez-Muñoz (2021) [132]; Böhning (1992) [133]; Breiman (1996, 2001) [134,135]; Chen and Guestrin (2016) [136]; Cortes and Vapnik (1995) [137]; Cover and Hart (1967) [138]; Freund and Abe (1999) [139]; Friedman (2001) [140]; Geurts et al. (2006) [141]; Hodges (1950) [142]; Ke et al. (2017) [143]; Leung (2007) [144]; Peng and Cheng (2007) [145]; Prokhorenkova et al. (2018) [146]; Rish (2001) [147]; Schapire (2003) [148]; Swain and Hauska (1977) [149]; Tharwat (2016) [150]; Vezhnevets and Vezhnevets (2005) [151].
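For reference, the sketch below shows one way the classifiers listed in Table 2 could be instantiated and compared with cross-validated accuracy. It is a minimal sketch only: the library choices (scikit-learn, xgboost, lightgbm, catboost), the synthetic dataset, and the default hyperparameters are illustrative assumptions and do not reproduce the evaluation pipeline of the cited studies.

```python
# Minimal sketch: instantiating the Table 2 classifiers and comparing them with
# 5-fold cross-validated accuracy. Dataset and hyperparameters are placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier, ExtraTreesClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Synthetic binary-classification data as a stand-in for the real feature matrix.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "CatBoost": CatBoostClassifier(verbose=0, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
    "LightGBM": LGBMClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Extra Trees": ExtraTreesClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Ridge Classifier": RidgeClassifier(),
    "KNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM (linear kernel)": SVC(kernel="linear"),
    "Naive Bayes": GaussianNB(),
    "QDA": QuadraticDiscriminantAnalysis(),
}

# Report mean and standard deviation of accuracy across the folds.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:22s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```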