2018 Jun 21;8:228. doi: 10.3389/fonc.2018.00228

Table 1.

Three representative machine learning methods with select pre-processing tips and tuning methods for complexity control.

Method	Pre-processing	Complexity control	Reference
Support vector machine (SVM)	Encode features as binary; normalize to uniform distribution; imputation for balancing data	Recursive feature elimination for linear SVM; soft margin width (C-parameter); kernel hyperparameters	(76, 160)
Bayesian networks	Feature discretization; variable selection to reduce graph search space; imputation not necessary when using expectation maximization	Constraints to a graph search space based on prior knowledge; graph scoring functions that penalize complexity	(167–171)
Random forest	No discretization or normalization necessary; imputation required	Number of features to sample at each node split (mtry); minimum number of samples in a terminal node	(172, 173)
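
The Bayesian network row lists "graph scoring functions that penalize complexity" as a complexity control. The table does not name a specific score, so the sketch below assumes the Bayesian information criterion (BIC) as one common choice: the maximum-likelihood log-likelihood of the data under a graph, minus a penalty that grows with the number of free parameters. The toy binary variables `A` and `B`, the two candidate graphs, and the helper names (`log_likelihood`, `n_free_params`, `bic`) are all hypothetical and serve only to illustrate the idea.

```python
import math
from collections import Counter


def log_likelihood(data, parents):
    """Maximum-likelihood log-likelihood of discrete data under a DAG.

    data: list of dicts mapping variable name -> observed value.
    parents: dict mapping each variable -> tuple of its parent variables.
    """
    ll = 0.0
    for var, pa in parents.items():
        # Counts of (parent configuration, child value) and of parent configuration.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        for (pa_val, _), n in joint.items():
            ll += n * math.log(n / marginal[pa_val])
    return ll


def n_free_params(data, parents):
    """Number of free conditional-probability parameters implied by the graph."""
    card = {v: len({row[v] for row in data}) for v in parents}
    k = 0
    for var, pa in parents.items():
        n_parent_configs = math.prod(card[p] for p in pa)  # empty product is 1
        k += (card[var] - 1) * n_parent_configs
    return k


def bic(data, parents):
    """BIC score (assumed scoring function): higher is better."""
    penalty = 0.5 * n_free_params(data, parents) * math.log(len(data))
    return log_likelihood(data, parents) - penalty


def expand(counts):
    """Turn {(a, b): n} counts into a list of observation dicts."""
    return [{"A": a, "B": b} for (a, b), n in counts.items() for _ in range(n)]


# Toy data (hypothetical): B is unrelated to A in one set, tracks A in the other.
independent = expand({(0, 0): 25, (0, 1): 25, (1, 0): 25, (1, 1): 25})
dependent = expand({(0, 0): 45, (0, 1): 5, (1, 0): 5, (1, 1): 45})

# Two candidate graphs: no edge, and a single edge A -> B.
no_edge = {"A": (), "B": ()}
edge = {"A": (), "B": ("A",)}

for name, data in [("independent", independent), ("dependent", dependent)]:
    print(name, "no edge:", bic(data, no_edge), "A->B:", bic(data, edge))
```

On the independent data the extra edge does not improve the likelihood, so the penalty makes the simpler graph score higher. On the dependent data the likelihood gain outweighs the penalty and the graph with the edge is preferred.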