Table 3.
| Model Category | Model Type | R Package | Model Description |
|---|---|---|---|
| Logic-based | Classification and regression tree (CART) | rpart [27] | - Lightweight and fast decision tree structure that makes its decisions easy to inspect. - However, single trees lack the complexity of other methods and may not perform as well as ensemble algorithms. |
| Ensemble (bagging) | Random forest (RF) | randomForest [28] | - Builds an ensemble of many independent decision trees, each trained on a bootstrap sample of the training data drawn at random with replacement (known as bagging). - This large number of trees forms a consensus: the ensemble predicts the class receiving the most votes across trees. |
| Ensemble (boosting) | eXtreme gradient boosting (XGB) | xgboost [15] | - Boosting methods fit trees on modified versions of the original data. - By training multiple models additively and in sequence, these algorithms correct the errors of weaker, single decision trees. - For example, GBM differs from RF in the order in which the decision trees are built and the method by which their results are combined. |
| Ensemble (boosting) | C5.0 (C50) | C50 [30] | |
| Ensemble (boosting) | Stochastic gradient boosting (GBM) | gbm [31] | |
| Support vector machine | SVM with radial basis function kernel | e1071 [29] | - An effective tool for datasets with high dimensionality (i.e., a large number of features). |
| Neural network | Feed-forward neural network (Nnet) | nnet [32] | - Influenced by the function and structure of biological neural networks and can learn highly complex patterns. - Hidden layers create intermediary representations of the data that other models cannot reproduce. |
| Neural network | Model averaged neural network (AvNnet) | avNNet [33] | - Fits multiple Nnet models and uses the average of the predictions from each constituent model. |
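All of the packages in the table share R's common `fit`/`predict` interface, so a minimal sketch with two of them illustrates the workflow for the rest. The sketch below uses rpart (CART) and nnet (feed-forward neural network), which ship with standard R installations; the dataset (`iris`) and hyperparameters are illustrative assumptions, not settings from this study.

```r
# Illustrative sketch only: fits two of the table's models on a toy dataset.
# rpart and nnet are bundled with R; the other packages (randomForest,
# xgboost, gbm, C50, e1071) follow the same formula-based fit/predict pattern.
library(rpart)
library(nnet)

set.seed(42)  # assumed seed, for reproducibility of the random split
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# CART: a single, interpretable decision tree
cart_fit  <- rpart(Species ~ ., data = train, method = "class")
cart_pred <- predict(cart_fit, newdata = test, type = "class")
cat("CART accuracy:", mean(cart_pred == test$Species), "\n")

# Feed-forward neural network with one hidden layer of 4 units
# (size, decay, and maxit are assumed example values)
nnet_fit  <- nnet(Species ~ ., data = train, size = 4,
                  decay = 0.01, maxit = 200, trace = FALSE)
nnet_pred <- predict(nnet_fit, newdata = test, type = "class")
cat("Nnet accuracy:", mean(nnet_pred == test$Species), "\n")
```

Because every model exposes the same `predict()` interface, swapping one model type for another mostly amounts to changing the fitting call, which is what makes side-by-side comparisons like Table 3 practical.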