Table 2. List of hyper-parameters identified for each model using random grid search (Bergstra & Bengio, 2012), their optimized values, and argument descriptions (Pedregosa et al., 2011).
Model | Parameter | Value | Argument description
---|---|---|---
LDA | n_components | 3 | Number of components for dimensionality reduction
 | solver | svd | Solver to use
LR | multi_class | multinomial | Class type; either ‘one-versus-rest’ or ‘multinomial’
 | C | 973.755518841459 | Inverse of regularization strength
 | solver | lbfgs | Algorithm to use in the optimization problem
 | fit_intercept | False | Specifies if a constant should be added to the decision function
 | class_weight | None | Weights associated with classes
NB | alpha | 0.97375551884146 | Smoothing parameter
 | fit_prior | True | Whether to learn class prior probabilities or not
 | class_prior | None | Prior probabilities of the classes
KNN | n_neighbors | 6 | Number of neighbors to use
 | weights | distance | Weight function used in prediction
 | algorithm | brute | Algorithm used to compute the nearest neighbors
 | p | 1 | Power parameter for the Minkowski metric
CDT | max_features | sqrt | Number of features to consider when looking for the best split
 | min_samples_split | 0.031313293 | Minimum number of samples required to split an internal node
 | splitter | random | Strategy used to choose the split at each node
 | criterion | entropy | Function measuring the quality of a split
 | class_weight | None | Weights associated with classes
RF | max_features | sqrt | Number of features to consider when looking for the best split
 | min_samples_split | 0.007066305 | Minimum number of samples required to split an internal node
 | class_weight | balanced_subsample | Weights associated with classes
 | criterion | entropy | Function measuring the quality of a split
 | n_estimators | 98 | Number of trees in the forest
SVM | kernel | poly | Kernel type to be used in the algorithm
 | C | 21.234911067828 | Penalty parameter C of the error term
 | gamma | 617.482509627716 | Kernel coefficient
 | degree | 1 | Degree of the polynomial kernel function
NN | hidden_layer_sizes | 200 | The n-th element representing the number of neurons in the n-th hidden layer
 | alpha | 0.017436642900 | Regularization term
 | activation | relu | Activation function for the hidden layer
 | solver | adam | Solver for weight optimization
 | batch_size | 32 | Size of minibatches for stochastic optimizers
 | learning_rate | adaptive | Learning rate schedule for weight updates
 | learning_rate_init | 0.0001 | The initial learning rate used
 | max_iter | 123 | Maximum number of iterations
Models were fitted to 10 folds for each of 50 candidates, totaling 500 fits. Acronyms denote: LDA for Linear Discriminant Analysis, LR for Logistic Regression, NB for Naïve Bayes, SVM for Support Vector Machines, KNN for K-Nearest Neighbors, CDT for Classification Decision Tree, RF for Random Forest and NN for Neural Networks.
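The 50-candidate, 10-fold search described above can be sketched with scikit-learn's `RandomizedSearchCV`, shown here for the CDT model. The dataset, search ranges, and random seed are illustrative assumptions, not the ones used in the study; the table reports only the winning values, not the spaces sampled.

```python
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the study's actual dataset is not reproduced here.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical search space mirroring the CDT rows of Table 2.
param_distributions = {
    "max_features": ["sqrt", "log2", None],
    "min_samples_split": uniform(0.001, 0.1),  # continuous range in (0, 1]
    "splitter": ["best", "random"],
    "criterion": ["gini", "entropy"],
}

# 50 sampled candidates x 10 CV folds = 500 fits, matching the footnote.
search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions,
    n_iter=50,
    cv=10,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

The same pattern applies to the other seven models by swapping in the corresponding estimator and parameter distributions.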