Table 7.
| ClassifierName | HParamName | ValuesToUse |
|---|---|---|
| ExtraTreesClassifier (ETC) / Random Forest Classifier (RFC) | bootstrap | [0, 1] |
| | class_weight | ["balanced"] |
| | criterion | ["gini", "entropy"] |
| | max_depth | [10, 20, 30] |
| | max_features | ["auto", "sqrt", "log2", 0.5] |
| | max_samples | [None, 0.6] |
| | min_impurity_decrease | [1e-5, 1e-4, 1e-3] |
| | min_samples_leaf | [2, 6, 10, 20] |
| | n_estimators | [100, 200] |
| | oob_score | [0, 1] |
| | random_state | [321] |
| GaussianProcessClassifier (GPC) | kernel | [RationalQuadratic, RBF] |
| | n_restarts_optimizer | [0, 1, 2] |
| | random_state | [321] |
| KNeighborsClassifier (KNC) | algorithm | ["ball_tree", "kd_tree"] |
| | leaf_size | [10, 20, 30, 40, 50] |
| | metric | ["euclidean", "minkowski", "mahalanobis", "chebyshev"] |
| | n_neighbors | [2, 5, 10, 15] |
| | random_state | [321] |
| MLPClassifier (MLC) | activation | ["sigmoid", "relu", "tanh"] |
| | alpha | [1e-3, 1e-4, 1e-5] |
| | early_stopping | [True] |
| | epsilon | [1e-6, 1e-8] |
| | hidden_layer_sizes | [(10,), (50,), (100,), (10, 10), (50, 50), (100, 100), (10, 10, 10), (50, 50, 50), (100, 100, 100)] |
| | learning_rate | ["adaptive"] |
| | learning_rate_init | [1e-3, 1e-2, 2e-2] |
| | n_iter_no_change | [2] |
| | random_state | [321] |
| | solver | ["adam"] |
| | validation_fraction | [0.1] |
| RidgeClassifier (RDC) / LogisticRegression (LOG) | alpha | [321] |
| | class_weight | ["balanced"] |
| | fit_intercept | [True] |
| | max_iter | [2000] |
| | random_state | [321] |
| | solver | ["lsqr" (RDC), "sparse_cg" (RDC), "sag", "saga", "lbfgs" (LOG), "liblinear" (LOG), "newton-cg" (LOG)] |
| | tol | [1e-3, 1e-4, 1e-5, 1e-6] |
| SVC | class_weight | ["balanced"] |
| | fit_intercept | [True] |
| | C | [0.1, 1, 10] |
| | degree | [1, 2, 3, 4] |
| | kernel | ["linear", "poly", "rbf", "sigmoid"] |
| | random_state | [321] |
Hyper-parameters used during model selection for Table 3. The ClassifierName column contains the name of each classifier and our acronym for it. Classifiers with similar hyper-parameters were grouped: ExtraTreesClassifier (ETC) with Random Forest Classifier (RFC), and RidgeClassifier (RDC) with LogisticRegression (LOG). The HParamName column contains the hyper-parameter names in the same format as sklearn version 0.24.2 (stable). The ValuesToUse column lists the candidate values of each hyper-parameter to be evaluated. Values that apply to only one classifier are followed by that model's acronym in parentheses (e.g. "lsqr" (RDC) and "lbfgs" (LOG)). The random_state and class_weight hyper-parameters were intended to have the same value across all models. In MLC, validation_fraction was used to enable early stopping: this created a sub-validation set within the training set, distinct from the validation sets created for cross-validation.
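As a minimal illustration of how a grid like this can be expanded, the Python sketch below (standard library only, so it does not call scikit-learn) turns a subset of the RDC/LOG rows into one candidate configuration per combination of values, keeping a model-specific solver only for the model it is tagged with. A second function shows where an early-stopping sub-validation set sits relative to the cross-validation folds. The names `SHARED_GRID`, `SOLVERS`, `grid_for` and `cv_with_subvalidation` are hypothetical. `alpha` is left out, and the split is a simplified, unstratified stand-in that may differ from the split sklearn's MLPClassifier performs internally.

```python
import math
import random
from itertools import product

# Hypothetical encoding of a subset of the RDC/LOG rows (alpha omitted).
# Values listed here apply to both models.
SHARED_GRID = {
    "class_weight": ["balanced"],
    "fit_intercept": [True],
    "max_iter": [2000],
    "random_state": [321],
    "tol": [1e-3, 1e-4, 1e-5, 1e-6],
}
# Each solver is paired with the acronym of the only model it applies to,
# or None when both RDC and LOG accept it.
SOLVERS = [
    ("lsqr", "RDC"), ("sparse_cg", "RDC"), ("sag", None), ("saga", None),
    ("lbfgs", "LOG"), ("liblinear", "LOG"), ("newton-cg", "LOG"),
]


def grid_for(model):
    """Return one dict per candidate configuration for `model` ("RDC" or "LOG")."""
    solvers = [name for name, only in SOLVERS if only in (None, model)]
    grid = dict(SHARED_GRID, solver=solvers)
    keys = sorted(grid)
    # Cartesian product over every hyper-parameter's list of candidate values.
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def cv_with_subvalidation(n_samples, n_splits=5, validation_fraction=0.1, seed=321):
    """Yield (fit, sub_val, test) index lists, one triple per CV fold.

    `test` is the cross-validation fold; `sub_val` is an extra hold-out carved
    from that fold's training indices and used only for early stopping.
    """
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    for k in range(n_splits):
        test = order[k::n_splits]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        # Round the sub-validation size up so it is never empty.
        n_val = math.ceil(validation_fraction * len(train))
        yield train[n_val:], train[:n_val], test


if __name__ == "__main__":
    print(len(grid_for("RDC")), len(grid_for("LOG")))  # → 16 20
    fit, sub_val, test = next(cv_with_subvalidation(100))
    print(len(fit), len(sub_val), len(test))  # → 72 8 20
```

With four `tol` values, RDC gets 4 solvers × 4 = 16 configurations and LOG gets 5 × 4 = 20. Because the sub-validation indices are drawn only from the training fold, they never overlap with the cross-validation fold used to score the configuration.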