Table 2. List of hyper-parameters identified for each model using random grid search (Bergstra & Bengio, 2012), their optimized values, and argument descriptions (Pedregosa et al., 2011).
Model | Parameter | Value | Argument description
---|---|---|---
LDA | n_components | 3 | Number of components for dimensionality reduction
 | solver | svd | Solver to use
LR | multi_class | multinomial | Class type; either ‘one-versus-rest’ or ‘multinomial’
 | C | 973.755518841459 | Inverse of regularization strength
 | solver | lbfgs | Algorithm to use in the optimization problem
 | fit_intercept | False | Specifies if a constant should be added to the decision function
 | class_weight | None | Weights associated with classes
NB | alpha | 0.97375551884146 | Smoothing parameter
 | fit_prior | True | Whether to learn class prior probabilities or not
 | class_prior | None | Prior probabilities of the classes
KNN | n_neighbors | 6 | Number of neighbors to use
 | weights | distance | Weight function used in prediction
 | algorithm | brute | Algorithm used to compute the nearest neighbors
 | p | 1 | Power parameter for the Minkowski metric
CDT | max_features | sqrt | Number of features to consider when looking for the best split
 | min_samples_split | 0.031313293 | Minimum number of samples required to split an internal node
 | splitter | random | Strategy used to choose the split at each node
 | criterion | entropy | Function measuring the quality of a split
 | class_weight | None | Weights associated with classes
RF | max_features | sqrt | Number of features to consider when looking for the best split
 | min_samples_split | 0.007066305 | Minimum number of samples required to split an internal node
 | class_weight | balanced_subsample | Weights associated with classes
 | criterion | entropy | Function measuring the quality of a split
 | n_estimators | 98 | Number of trees in the forest
SVM | kernel | poly | Kernel type to be used in the algorithm
 | C | 21.234911067828 | Penalty parameter C of the error term
 | gamma | 617.482509627716 | Kernel coefficient
 | degree | 1 | Degree of the polynomial kernel function
NN | hidden_layer_sizes | 200 | The n-th element representing the number of neurons in the n-th hidden layer
 | alpha | 0.017436642900 | Regularization term
 | activation | relu | Activation function for the hidden layer
 | solver | adam | Solver for weight optimization
 | batch_size | 32 | Size of minibatches for stochastic optimizers
 | learning_rate | adaptive | Learning rate schedule for weight updates
 | learning_rate_init | 0.0001 | The initial learning rate used
 | max_iter | 123 | Maximum number of iterations
Models were fitted to 10 folds for each of 50 candidates, totaling 500 fits. Acronyms denote: LDA for Linear Discriminant Analysis, LR for Logistic Regression, NB for Naïve Bayes, SVM for Support Vector Machines, KNN for K-Nearest Neighbors, CDT for Classification Decision Tree, RF for Random Forest and NN for Neural Networks.
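The 50-candidate, 10-fold search described above can be sketched with scikit-learn's `RandomizedSearchCV`, shown here for the CDT model. The dataset, search ranges, and random seed are illustrative assumptions, not the ones used in the study; the table reports only the winning values, not the spaces sampled.

```python
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the study's actual dataset is not reproduced here.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical search space mirroring the CDT rows of Table 2.
param_distributions = {
    "max_features": ["sqrt", "log2", None],
    "min_samples_split": uniform(0.001, 0.1),  # continuous range in (0, 1]
    "splitter": ["best", "random"],
    "criterion": ["gini", "entropy"],
}

# 50 sampled candidates x 10 CV folds = 500 fits, matching the footnote.
search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions,
    n_iter=50,
    cv=10,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

The same pattern applies to the other seven models by swapping in the corresponding estimator and parameter distributions.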