
Table 8. Description of machine learning classifiers used in the current study.

Classifier Description
SVM SVM is a well-known supervised machine learning algorithm that is widely used for classification and regression problems. SVM performs classification by constructing hyperplanes, also called decision planes, in a high-dimensional space. These hyperplanes separate one class of data from the others (Schölkopf, Burges & Vapnik, 1996).
LR LR can handle most classification problems. It is a statistical method that performs predictive analysis through probabilistic inference. It models the relationship between a categorical dependent variable and one or more independent variables by estimating class probabilities with the sigmoid (logistic) function (Boyd, Tolson & Copes, 1987).
GNB Naïve Bayes has many variants, and GNB is one of the most commonly used. GNB handles continuous feature values and combines the prior and posterior probabilities of the classes in the data. GNB assumes that the features follow a normal (Gaussian) distribution (Perez, Larranaga & Inza, 2006).
ETC ETC works very similarly to random forest (RF); the only difference lies in how the trees in the forest are constructed. Each tree in an ETC is grown from the entire original training sample rather than a bootstrap sample. At each node, a random subset of k features is considered, and the Gini index is used to select the best feature for splitting the data in the tree. This random sampling of features yields multiple de-correlated decision trees (Sharaff & Gupta, 2019).
GBM GBM is a popular machine learning algorithm in which many weak classifiers are combined to create a strong learning model. GBM is built on decision trees; however, it grows the trees sequentially, each one correcting the errors of the previous ones, which makes training time-consuming and computationally expensive. It strengthens weak learning algorithms through a series of adjustments, an idea rooted in probably approximately correct (PAC) learning. Owing to this, GBM works well on unprocessed data and can handle missing values efficiently (Friedman, 2001).
ADA AdaBoost is short for adaptive boosting, and it is usually used in combination with other algorithms to improve their performance. It uses the boosting approach to turn weak learners into strong learners. Each tree in AdaBoost depends on the error rate of the previously built tree (Freund, Schapire & Abe, 1999).
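The six classifiers in Table 8 all have standard scikit-learn implementations. The following is a minimal sketch of how they could be instantiated and evaluated; the synthetic data and the hyperparameters shown are illustrative assumptions, not the settings used in the study.

```python
# Illustrative sketch only: scikit-learn versions of the Table 8 classifiers,
# fit on synthetic data. Hyperparameters are defaults, not the study's values.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    AdaBoostClassifier,
)

# Synthetic feature matrix standing in for the study's data (assumption).
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

classifiers = {
    "SVM": SVC(kernel="linear"),                          # hyperplane-based separation
    "LR": LogisticRegression(max_iter=1000),              # sigmoid-based probabilities
    "GNB": GaussianNB(),                                  # Gaussian feature assumption
    "ETC": ExtraTreesClassifier(n_estimators=100),        # de-correlated extra trees
    "GBM": GradientBoostingClassifier(n_estimators=100),  # sequential boosting of trees
    "ADA": AdaBoostClassifier(n_estimators=100),          # adaptive re-weighting of learners
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: test accuracy = {clf.score(X_test, y_test):.3f}")
```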