Table 1.
| Algorithm type | Classifiers | Train-test split | Hyper-parameters | Python library |
|---|---|---|---|---|
| Supervised machine learning algorithm | Logistic regression | | • Solver = “lbfgs” • Penalty = “l2” | sklearn.linear_model |
| | Gaussian Naïve Bayes | | • Variance smoothing = 1e-09 | sklearn.naive_bayes |
| | Decision tree | | • Quality of split criterion = “gini” • max_depth varied from 1 to 11 in increments of 1 • Maximum number of features to consider = “auto” | sklearn.tree |
| | Random forest | | • Quality of split criterion = “gini” • Maximum depth of trees = 11 • Maximum number of features to consider = “auto” • Number of trees in the forest = 10 | sklearn.ensemble |
| | AdaBoost | | • Learning rate varied from 0.01 to 1.1 in increments of 0.01 • Maximum number of estimators at which boosting is terminated varied from 50 to 200 in increments of 10 • Algorithm = “SAMME.R” | sklearn.ensemble |
| | K-nearest neighbors | | • Number of neighbors set to 2 | sklearn.neighbors |
| | K-nearest neighbors | 70-30% and 10-fold cross validation | • Number of neighbors set to 5 | sklearn.neighbors |
| Unsupervised machine learning algorithm | Affinity propagation | | • Damping factor = 0.8 (weight kept on the current value; incoming values are weighted 1 - damping) • Maximum iterations = 200 • Maximum number of iterations with no change in the number of estimated clusters = 15 | sklearn.cluster |
| | BIRCH | | • Threshold that the subcluster radius must stay below = 0.5 • Number of clusters = number of unique ids in training set (default = 2) | sklearn.cluster |
| | DBSCAN | | • Maximum distance between two samples for consideration as neighbors (eps) = 0.50 • Minimum samples in neighborhood of a point to consider it as core point = 9 • Distance calculation method = “euclidean” | sklearn.cluster |
| | K-means | | • Number of neighbors required was set to 2 | sklearn.cluster |
| | Mini-batch K-means | | • Number of neighbors required was set to 2 | sklearn.cluster |
| | Mean shift | | • Number of clusters = number of unique ids in training set (default = 2) | sklearn.cluster |
| | OPTICS | | • Maximum distance between two samples for consideration as neighbors (eps) = 0.80 • Minimum samples in neighborhood of a point to consider it as core point = 10 | sklearn.cluster |
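
The settings in Table 1 could be expressed in scikit-learn roughly as follows. This is a minimal sketch, not the authors' code: the function and constant names are hypothetical, range endpoints are assumed inclusive, and several mappings from table wording to library parameters are assumptions (the "number of neighbors" for K-means and mini-batch K-means is read as `n_clusters`, and the OPTICS "eps" is read as `max_eps`). Because `max_features="auto"` and `algorithm="SAMME.R"` are not accepted by every scikit-learn release, the sketch uses `max_features="sqrt"` (what "auto" meant for classifiers) and leaves the AdaBoost algorithm at the library default.

```python
# Search ranges transcribed from Table 1 (endpoints assumed inclusive).
DECISION_TREE_DEPTHS = list(range(1, 12))                 # 1, 2, ..., 11
ADABOOST_LEARNING_RATES = [round(0.01 * i, 2) for i in range(1, 111)]  # 0.01 ... 1.10
ADABOOST_N_ESTIMATORS = list(range(50, 201, 10))          # 50, 60, ..., 200


def build_supervised():
    """Return {name: estimator} for the supervised rows with fixed settings."""
    # Imported lazily so the ranges above can be reused without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    return {
        # The L2 penalty is the scikit-learn default, so it is not passed.
        "logistic_regression": LogisticRegression(solver="lbfgs"),
        "gaussian_nb": GaussianNB(var_smoothing=1e-9),
        "random_forest": RandomForestClassifier(
            criterion="gini", max_depth=11, max_features="sqrt", n_estimators=10
        ),
        "knn_2": KNeighborsClassifier(n_neighbors=2),
        "knn_5": KNeighborsClassifier(n_neighbors=5),
    }


def decision_tree_sweep():
    """Yield one decision tree per max_depth value in the table's range."""
    from sklearn.tree import DecisionTreeClassifier

    for depth in DECISION_TREE_DEPTHS:
        yield DecisionTreeClassifier(
            criterion="gini", max_depth=depth, max_features="sqrt"
        )


def adaboost_sweep():
    """Yield one AdaBoost model per (learning rate, n_estimators) pair."""
    from sklearn.ensemble import AdaBoostClassifier

    for rate in ADABOOST_LEARNING_RATES:
        for n in ADABOOST_N_ESTIMATORS:
            yield AdaBoostClassifier(learning_rate=rate, n_estimators=n)


def build_unsupervised(n_clusters=2):
    """Return {name: estimator} for the unsupervised rows.

    n_clusters stands in for "length of unique ids in training set".
    """
    from sklearn.cluster import (
        DBSCAN, OPTICS, AffinityPropagation, Birch, KMeans,
        MeanShift, MiniBatchKMeans,
    )

    return {
        "affinity_propagation": AffinityPropagation(
            damping=0.8, max_iter=200, convergence_iter=15
        ),
        "birch": Birch(threshold=0.5, n_clusters=n_clusters),
        "dbscan": DBSCAN(eps=0.5, min_samples=9, metric="euclidean"),
        "kmeans": KMeans(n_clusters=2),                # assumption: see lead-in
        "minibatch_kmeans": MiniBatchKMeans(n_clusters=2),
        # MeanShift has no cluster-count parameter, so none is passed.
        "mean_shift": MeanShift(),
        "optics": OPTICS(max_eps=0.8, min_samples=10),  # assumption: see lead-in
    }
```

For evaluation, the 70-30% split listed in the table would correspond to `train_test_split(X, y, test_size=0.3)` and the 10-fold cross validation to `cross_val_score(model, X, y, cv=10)`, both from `sklearn.model_selection`; `X` and `y` are placeholders for the feature matrix and labels.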