. 2018 Jun 21;8:228. doi: 10.3389/fonc.2018.00228

Table 1.

Three representative machine learning methods with select pre-processing tips and tuning methods for complexity control.

Method | Pre-process | Complexity control | Reference
Support vector machine (SVM)
  Pre-process:
  • Encode features as binary
  • Normalize to uniform distribution
  • Imputation for balancing data
  Complexity control:
  • Recursive feature elimination for linear SVM
  • Soft margin width (C-parameter)
  • Kernel hyperparameters
  Reference: (76, 160)

Bayesian networks
  Pre-process:
  • Feature discretization
  • Variable selection to reduce graph search space
  • Imputation not necessary when using expectation maximization
  Complexity control:
  • Constraints on the graph search space based on prior knowledge
  • Graph scoring functions that penalize complexity
  Reference: (167–171)

Random forest
  Pre-process:
  • No discretization or normalization necessary
  • Imputation required
  Complexity control:
  • Number of features to sample at each node split (mtry)
  • Minimum number of samples in a terminal node
  Reference: (172, 173)
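
As an illustration of two of the SVM pre-processing tips listed in the table, the sketch below shows one possible reading of "encode features as binary" (one indicator column per category) and "normalize to uniform distribution" (a rank-based mapping onto the interval (0, 1)). The function names `encode_binary` and `normalize_uniform` and the example values are hypothetical and chosen only for illustration; they are not taken from the article or its references, and other normalization schemes may be what the cited works intend.

```python
from collections import defaultdict


def encode_binary(values):
    """Encode a categorical column as 0/1 indicator columns (hypothetical helper)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def normalize_uniform(values):
    """Map values onto (0, 1) by rank so the output is roughly uniform.

    This is an assumed interpretation of "normalize to uniform distribution".
    """
    n = len(values)
    positions = defaultdict(list)
    for rank, v in enumerate(sorted(values), start=1):
        positions[v].append(rank)
    # Tied values share the average of their ranks.
    avg_rank = {v: sum(r) / len(r) for v, r in positions.items()}
    return [(avg_rank[v] - 0.5) / n for v in values]


# Made-up example features, for illustration only.
stages = ["T1", "T3", "T2", "T1"]    # categorical feature
doses = [60.0, 74.0, 66.0, 70.0]     # continuous feature

categories, stage_bits = encode_binary(stages)
dose_uniform = normalize_uniform(doses)
```

Because the rank transform depends only on ordering, it is insensitive to outliers and to the original units, which is one reason such a mapping may be preferred before fitting a kernel SVM whose distance computations are scale-sensitive.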