2021 Aug 3;11:15747. doi: 10.1038/s41598-021-94897-9

Table 7.

Hyper-parameters for model selection.

ClassifierName	HParamName	ValuesToUse
ExtraTreesClassifier (ETC) / Random Forest Classifier (RFC)	bootstrap	[0, 1]
	class_weight	["balanced"]
	criterion	["gini","entropy"]
	max_depth	[10, 20, 30]
	max_features	["auto", "sqrt", "log2", 0.5]
	max_samples	[None, 0.6]
	min_impurity_decrease	[1e-5, 1e-4, 1e-3]
	min_samples_leaf	[2, 6, 10, 20]
	n_estimators	[100, 200]
	oob_score	[0, 1]
	random_state	[321]
GaussianProcessClassifier (GPC)	kernel	[RationalQuadratic, RBF]
	n_restarts_optimizer	[0, 1, 2]
	random_state	[321]
KNeighborsClassifier (KNC)	algorithm	["ball_tree", "kd_tree"]
	leaf_size	[10, 20, 30, 40, 50]
	metric	["euclidean","minkowski","mahalanobis","chebyshev"]
	n_neighbors	[2, 5, 10, 15]
	random_state	[321]
MLPClassifier (MLC)	activation	["sigmoid","relu","tanh"]
	alpha	[1e-3, 1e-4, 1e-5]
	early_stopping	[True]
	epsilon	[1e-6, 1e-8]
	hidden_layer_sizes	[(10,),(50,),(100,),(10,10,),(50,50,),(100,100,),(10,10,10,),(50,50,50,),(100,100,100,)]
	learning_rate	["adaptive"]
	learning_rate_init	[1e-3, 1e-2, 2e-2]
	n_iter_no_change	[2]
	random_state	[321]
	solver	["adam"]
	validation_fraction	[0.1]
RidgeClassifier (RDC) / LogisticRegression (LOG)	alpha	[321]
	class_weight	["balanced"]
	fit_intercept	[True]
	max_iter	[2000]
	random_state	[321]
	solver	["lsqr" (RDC), "sparse_cg" (RDC), "sag", "saga", "lbfgs" (LOG), "liblinear" (LOG), "newton-cg" (LOG)]
	tol	[1e-3, 1e-4, 1e-5, 1e-6]
SVC	class_weight	["balanced"]
	fit_intercept	[True]
	C	[0.1, 1, 10]
	degree	[1, 2, 3, 4]
	kernel	["linear", "poly", "rbf", "sigmoid"]
	random_state	[321]

Hyper-parameters used during model selection for Table 3. The ClassifierName column gives the name of each classifier and our acronym for it. Classifiers with similar hyper-parameters were grouped: ExtraTreesClassifier (ETC) with Random Forest Classifier (RFC), and RidgeClassifier (RDC) with LogisticRegression (LOG). The HParamName column gives the hyper-parameter names in the same format as sklearn version 0.24.2 (stable). The ValuesToUse column lists the candidate values evaluated for each hyper-parameter. Values that apply to only one classifier are followed by that model's acronym in parentheses (e.g. "lsqr" (RDC) and "lbfgs" (LOG)). The random_state and class_weight hyper-parameters were intended to have the same value across all models. The validation_fraction hyper-parameter was set for MLPClassifier to enable early stopping: it holds out a sub-validation set from within the training set, distinct from the validation sets created for cross-validation.
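
To make the size of such a search space concrete, the following minimal sketch expands the ETC/RFC rows of Table 7 into individual candidate configurations using only the Python standard library. It is an illustration, not the procedure used in the study: the source does not state how the combinations were enumerated or whether incompatible ones were filtered. The helper names `expand_grid` and `is_valid_forest_config` are hypothetical, the 0/1 entries are written as booleans, and the filtering rule (out-of-bag scoring requires bootstrap sampling in sklearn's forest classifiers) is an assumption about how one might prune the grid.

```python
from itertools import product

# ETC/RFC values copied from Table 7; 0/1 entries written as booleans.
RFC_GRID = {
    "bootstrap": [False, True],
    "class_weight": ["balanced"],
    "criterion": ["gini", "entropy"],
    "max_depth": [10, 20, 30],
    "max_features": ["auto", "sqrt", "log2", 0.5],
    "max_samples": [None, 0.6],
    "min_impurity_decrease": [1e-5, 1e-4, 1e-3],
    "min_samples_leaf": [2, 6, 10, 20],
    "n_estimators": [100, 200],
    "oob_score": [False, True],
    "random_state": [321],
}


def expand_grid(grid):
    """Yield one dict per combination of the listed values (hypothetical helper)."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def is_valid_forest_config(cfg):
    """Assumed pruning rule: out-of-bag scoring needs bootstrap sampling."""
    return cfg["bootstrap"] or not cfg["oob_score"]


all_configs = list(expand_grid(RFC_GRID))
valid_configs = [cfg for cfg in all_configs if is_valid_forest_config(cfg)]

print(len(all_configs))    # 4608 raw combinations
print(len(valid_configs))  # 3456 after dropping oob_score without bootstrap
```

Each resulting dict could be passed as keyword arguments to the corresponding classifier constructor. In practice a grid-search utility would typically perform this expansion internally.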