2021 Aug 3;11:15747. doi: 10.1038/s41598-021-94897-9

Table 7.

Hyper-parameters for model selection.

| ClassifierName | HParamName | ValuesToUse |
|---|---|---|
| ExtraTreesClassifier (ETC) / Random Forest Classifier (RFC) | bootstrap | [0, 1] |
| | class_weight | ["balanced"] |
| | criterion | ["gini", "entropy"] |
| | max_depth | [10, 20, 30] |
| | max_features | ["auto", "sqrt", "log2", 0.5] |
| | max_samples | [None, 0.6] |
| | min_impurity_decrease | [1e-5, 1e-4, 1e-3] |
| | min_samples_leaf | [2, 6, 10, 20] |
| | n_estimators | [100, 200] |
| | oob_score | [0, 1] |
| | random_state | [321] |
| GaussianProcessClassifier (GPC) | kernel | [RationalQuadratic, RBF] |
| | n_restarts_optimizer | [0, 1, 2] |
| | random_state | [321] |
| KNeighborsClassifier (KNC) | algorithm | ["ball_tree", "kd_tree"] |
| | leaf_size | [10, 20, 30, 40, 50] |
| | metric | ["euclidean", "minkowski", "mahalanobis", "chebyshev"] |
| | n_neighbors | [2, 5, 10, 15] |
| | random_state | [321] |
| MLPClassifier (MLC) | activation | ["sigmoid", "relu", "tanh"] |
| | alpha | [1e-3, 1e-4, 1e-5] |
| | early_stopping | [True] |
| | epsilon | [1e-6, 1e-8] |
| | hidden_layer_sizes | [(10,), (50,), (100,), (10,10,), (50,50,), (100,100,), (10,10,10,), (50,50,50,), (100,100,100,)] |
| | learning_rate | ["adaptive"] |
| | learning_rate_init | [1e-3, 1e-2, 2e-2] |
| | n_iter_no_change | [2] |
| | random_state | [321] |
| | solver | ["adam"] |
| | validation_fraction | [0.1] |
| RidgeClassifier (RDC) / LogisticRegression (LOG) | alpha | [321] |
| | class_weight | ["balanced"] |
| | fit_intercept | [True] |
| | max_iter | [2000] |
| | random_state | [321] |
| | solver | ["lsqr" (RDC), "sparse_cg" (RDC), "sag", "saga", "lbfgs" (LOG), "liblinear" (LOG), "newton-cg" (LOG)] |
| | tol | [1e-3, 1e-4, 1e-5, 1e-6] |
| SVC | class_weight | ["balanced"] |
| | fit_intercept | [True] |
| | C | [0.1, 1, 10] |
| | degree | [1, 2, 3, 4] |
| | kernel | ["linear", "poly", "rbf", "sigmoid"] |
| | random_state | [321] |

Hyper-parameters used during the model selection for Table 3. The ClassifierName column gives each classifier's name and our acronym. Classifiers with similar hyper-parameters were grouped: ExtraTreesClassifier (ETC) with Random Forest Classifier (RFC), and RidgeClassifier (RDC) with LogisticRegression (LOG). The HParamName column gives the hyper-parameter names in the format used by sklearn version 0.24.2 (stable). The ValuesToUse column lists the candidate values evaluated for each hyper-parameter. Values that apply to only one classifier are followed by that model's acronym in parentheses (e.g. "lsqr" (RDC) and "lbfgs" (LOG)). The random_state and class_weight hyper-parameters were intended to have the same value across all models. The validation_fraction hyper-parameter was set for MLC to enable early stopping: it holds out a sub-validation set from within the training set, separate from the validation sets created for cross-validation.
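As an illustration of how a grid like the one in Table 7 could be passed to sklearn, the following is a minimal sketch and not the authors' pipeline. The synthetic data set, the 3-fold stratified split, the balanced-accuracy scoring and the use of GridSearchCV are all assumptions made for the example; only a reduced subset of the tabulated values is searched so that it runs quickly. The RFC grid is split into two dictionaries because max_samples and oob_score only apply when bootstrap is enabled.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption: the study's data set is not reproduced here).
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=6, random_state=321
)

# Reduced RFC grid built from a subset of the Table 7 values.
shared = {
    "class_weight": ["balanced"],
    "criterion": ["gini", "entropy"],
    "max_depth": [10],
    "min_samples_leaf": [2, 10],
    "n_estimators": [100],
    "random_state": [321],
}
rfc_grid = [
    # Without bootstrap, max_samples and oob_score are not applicable.
    {**shared, "bootstrap": [False]},
    {**shared, "bootstrap": [True], "max_samples": [None, 0.6],
     "oob_score": [False, True]},
]

# Reduced MLC grid. With early_stopping=True, each fit holds out
# validation_fraction of its training fold as an internal validation set,
# which is separate from the cross-validation fold used for scoring.
mlc_grid = {
    "activation": ["relu", "tanh"],
    "alpha": [1e-3, 1e-4],
    "early_stopping": [True],
    "hidden_layer_sizes": [(10,), (50,)],
    "learning_rate_init": [1e-3, 1e-2],
    "n_iter_no_change": [2],
    "random_state": [321],
    "solver": ["adam"],
    "validation_fraction": [0.1],
}

# Assumed cross-validation scheme and metric, chosen only for illustration.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=321)
candidates = {
    "RFC": (RandomForestClassifier(), rfc_grid),
    "MLC": (MLPClassifier(), mlc_grid),
}

searches = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="balanced_accuracy", cv=cv)
    search.fit(X, y)
    searches[name] = search
    print(name, search.best_params_, round(search.best_score_, 3))
```

Fixing random_state to a single value in every grid, as in the table, keeps the comparison between candidates reproducible across runs.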