Table 7.
Hyper-parameters for model selection.
| ClassifierName | HParamName | ValuesToUse |
|---|---|---|
| ExtraTreesClassifier (ETC) / RandomForestClassifier (RFC) | bootstrap | [0, 1] |
| | class_weight | ["balanced"] |
| | criterion | ["gini", "entropy"] |
| | max_depth | [10, 20, 30] |
| | max_features | ["auto", "sqrt", "log2", 0.5] |
| | max_samples | [None, 0.6] |
| | min_impurity_decrease | [1e-5, 1e-4, 1e-3] |
| | min_samples_leaf | [2, 6, 10, 20] |
| | n_estimators | [100, 200] |
| | oob_score | [0, 1] |
| | random_state | [321] |
| GaussianProcessClassifier (GPC) | kernel | [RationalQuadratic, RBF] |
| | n_restarts_optimizer | [0, 1, 2] |
| | random_state | [321] |
| KNeighborsClassifier (KNC) | algorithm | ["ball_tree", "kd_tree"] |
| | leaf_size | [10, 20, 30, 40, 50] |
| | metric | ["euclidean", "minkowski", "mahalanobis", "chebyshev"] |
| | n_neighbors | [2, 5, 10, 15] |
| | random_state | [321] |
| MLPClassifier (MLC) | activation | ["sigmoid", "relu", "tanh"] |
| | alpha | [1e-3, 1e-4, 1e-5] |
| | early_stopping | [True] |
| | epsilon | [1e-6, 1e-8] |
| | hidden_layer_sizes | [(10,), (50,), (100,), (10, 10), (50, 50), (100, 100), (10, 10, 10), (50, 50, 50), (100, 100, 100)] |
| | learning_rate | ["adaptive"] |
| | learning_rate_init | [1e-3, 1e-2, 2e-2] |
| | n_iter_no_change | [2] |
| | random_state | [321] |
| | solver | ["adam"] |
| | validation_fraction | [0.1] |
| RidgeClassifier (RDC) / LogisticRegression (LOG) | alpha | [321] |
| | class_weight | ["balanced"] |
| | fit_intercept | [True] |
| | max_iter | [2000] |
| | random_state | [321] |
| | solver | ["lsqr" (RDC), "sparse_cg" (RDC), "sag", "saga", "lbfgs" (LOG), "liblinear" (LOG), "newton-cg" (LOG)] |
| | tol | [1e-3, 1e-4, 1e-5, 1e-6] |
| SVC | class_weight | ["balanced"] |
| | fit_intercept | [True] |
| | C | [0.1, 1, 10] |
| | degree | [1, 2, 3, 4] |
| | kernel | ["linear", "poly", "rbf", "sigmoid"] |
| | random_state | [321] |
Hyper-parameters used during the model selection for Table 3. The ClassifierName column contains the name of each classifier and our acronym for it. Classifiers with similar hyper-parameters were grouped: ExtraTreesClassifier (ETC) with RandomForestClassifier (RFC), and RidgeClassifier (RDC) with LogisticRegression (LOG). The HParamName column lists the hyper-parameter names in the same format as sklearn version 0.24.2 (stable). The ValuesToUse column lists the candidate values evaluated for each hyper-parameter. Values that apply to only one classifier of a group are followed by that model's acronym in parentheses (e.g. "lsqr" (RDC) and "lbfgs" (LOG)). The random_state and class_weight hyper-parameters were intended to have the same value across all models. In MLC, validation_fraction was used to enable early stopping: this created a sub-validation set within the training set, distinct from the validation sets created for cross-validation.
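To make the size of such a search concrete, the following minimal sketch expands the ETC/RFC grid from Table 7 into individual candidates. It uses a Cartesian product over the value lists, which is how sklearn's ParameterGrid expands a dict of lists. It is written in plain Python so it runs without sklearn. The names ETC_RFC_GRID, expand_grid, grid_size and sub_validation_sizes are hypothetical and are not taken from the authors' code. The last helper illustrates the nested split described in the caption. It assumes that the early-stopping hold-out is drawn like sklearn's train_test_split with a float test_size, that is, the held-out count is rounded up.

```python
from itertools import product
from math import ceil, prod

# ETC/RFC hyper-parameter grid transcribed from Table 7 (variable name is hypothetical).
ETC_RFC_GRID = {
    "bootstrap": [0, 1],
    "class_weight": ["balanced"],
    "criterion": ["gini", "entropy"],
    "max_depth": [10, 20, 30],
    "max_features": ["auto", "sqrt", "log2", 0.5],
    "max_samples": [None, 0.6],
    "min_impurity_decrease": [1e-5, 1e-4, 1e-3],
    "min_samples_leaf": [2, 6, 10, 20],
    "n_estimators": [100, 200],
    "oob_score": [0, 1],
    "random_state": [321],
}


def expand_grid(grid):
    """Yield one dict per candidate: the Cartesian product of all value lists,
    mirroring how sklearn's ParameterGrid expands a dict of lists."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_size(grid):
    """Number of candidates = product of the number of values per hyper-parameter."""
    return prod(len(values) for values in grid.values())


def sub_validation_sizes(n_train_fold, validation_fraction=0.1):
    """Split one cross-validation training fold into (fit size, early-stopping
    validation size). Assumption: held-out count = ceil(fraction * n)."""
    n_val = ceil(validation_fraction * n_train_fold)
    return n_train_fold - n_val, n_val


print(grid_size(ETC_RFC_GRID))   # 4608
print(sub_validation_sizes(805))  # (724, 81)
```

Each of the 4608 ETC/RFC candidates is refit once per cross-validation fold, so the grid size directly multiplies the training cost. Some combinations are rejected by sklearn at fit time. For example, a forest raises an error for oob_score=1 with bootstrap=0.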