
Table 1.

Main properties of a set of selected statistical learning algorithms

Machine learning algorithm Details
General linear regression models (GLM)
  • A very simple approach based on specifying a linear combination of predictors to predict a dependent variable (Hastie et al. 2009)

  • Coefficients of the model are a measure of the strength of effect of each predictor on the outcome

  • Include linear and logistic regression models (Hosmer et al. 2013), as illustrated in the sketch below

  • Prone to overfitting and multicollinearity in high-dimensional problems
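
A minimal sketch of what this looks like in practice, assuming scikit-learn (not part of the original table; the data and variable names are invented for illustration):

```python
# Hypothetical sketch: fitting linear and logistic regression on synthetic data
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # 100 cases, 3 predictors
y_cont = X @ [1.5, -2.0, 0.5] + rng.normal(size=100)   # quantitative outcome
y_bin = (y_cont > 0).astype(int)                       # categorical outcome

linear = LinearRegression().fit(X, y_cont)
logistic = LogisticRegression().fit(X, y_bin)

# Coefficients measure each predictor's strength of effect on the outcome
print(linear.coef_)
print(logistic.coef_)
```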

Elastic net models
  • Extension of general linear regression models (Zou & Hastie, 2005)

  • Explore a large number of predictors to keep the best subset for predicting the outcome. This internal feature selection avoids overly complex models and thus helps prevent overfitting (see the sketch below)

  • Strongly correlated predictors are selected (or discarded) together (known as a ‘grouping effect’). This is especially useful in exploratory research, where every predictor on the full list may prove equally relevant and meaningful

  • Coefficients can be interpreted as in a general linear model

  • Lasso and ridge regression are particular cases (Tibshirani, 1994)
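
A rough sketch of cross-validated elastic net fitting, again assuming scikit-learn; the sparse true model is fabricated purely to show the internal feature selection at work:

```python
# Hypothetical sketch: elastic net with a cross-validated penalty
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))         # many candidate predictors
beta = np.zeros(50)
beta[:5] = 2.0                         # only 5 predictors are truly relevant
y = X @ beta + rng.normal(size=200)

# l1_ratio interpolates between ridge (0) and lasso (1),
# recovering both as particular cases
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)
kept = np.flatnonzero(enet.coef_)      # predictors with nonzero coefficients
print(f"{kept.size} of 50 predictors kept:", kept)
```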

Naive Bayes
  • Family of simple classifiers based on applying the Bayes’ Theorem (Russell & Norvig, 2010) (see Fig. 2)

  • Assumes (a) that the value of a particular feature is independent of the value of any other feature given the outcome and (b) a probability density for numeric predictors

  • Gives the probability that an unseen case takes each specific outcome value (see the sketch below)
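
A hedged illustration with Gaussian naive Bayes, the scikit-learn variant that assumes a normal density for numeric predictors; the two-cluster data are invented:

```python
# Hypothetical sketch: Gaussian naive Bayes on two synthetic clusters
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # class 0 cases
               rng.normal(3, 1, (50, 2))])  # class 1 cases
y = np.repeat([0, 1], 50)

nb = GaussianNB().fit(X, y)
# Probability of each outcome value for an unseen case
print(nb.predict_proba([[1.5, 1.5]]))
```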

Classification and Regression Trees (CART)
  • A tree is a flowchart-like structure (Breiman et al. 1984), built by repeatedly splitting the data into subsets based on a feature value test (see Fig. 2 and the sketch below). Each terminal node (‘leaf’) holds a label

  • Allows modeling complex nonlinear relationships

  • Relatively fast to construct and produce interpretable models

  • Performs internal feature selection as an integral part of the procedure
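
A brief sketch, assuming scikit-learn, that grows a small tree on the standard iris dataset and prints its flowchart-like structure of splits and leaf labels:

```python
# Hypothetical sketch: a classification tree and its printed structure
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# export_text shows the sequence of feature value tests and leaf labels
print(export_text(tree, feature_names=list(data.feature_names)))
```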

Random forest
  • Offers a rule to combine individual decision trees (Breiman, 2001b)

  • Multiple tree models are built using different randomly selected subsamples of the full dataset and different random subsets of variables. Their predictions are then aggregated and the most popular outcome value is chosen by majority vote (see the sketch below)

  • Good at controlling overfitting and improving stability and accuracy
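
A sketch of the aggregation idea, assuming scikit-learn: many trees grown on bootstrap subsamples, with a random subset of candidate variables considered at each split, evaluated by cross-validation:

```python
# Hypothetical sketch: a random forest evaluated out-of-sample
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(
    n_estimators=500,       # number of trees whose votes are aggregated
    max_features="sqrt",    # random subset of variables tried per split
    random_state=0,
)
# Cross-validated accuracy of the majority-vote ensemble
print(cross_val_score(rf, X, y, cv=5).mean())
```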

Support vector machines (SVM)
  • Classifier method that constructs hyperplanes in a multidimensional space to separate cases with different outcome values (Cortes & Vapnik, 1995; Scholkopf et al. 2003) (see Fig. 2)

  • A new case is classified depending on its position relative to the decision boundary

  • Allows modeling complex nonlinear relationships. A set of transformations called ‘kernels’ is used to map the data and make them linearly separable (see the sketch below)

  • Understanding the contribution of each predictor to outcome prediction is not straightforward and must be explored using specific methods (Altmann et al. 2010)
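
A small illustration, assuming scikit-learn: an SVM with a radial basis function kernel, one common choice for making a non-linearly separable problem separable in the transformed space:

```python
# Hypothetical sketch: a kernel SVM on a synthetic non-linear problem
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# The RBF kernel maps cases into a space where a separating
# hyperplane (the decision boundary) can be found
svm = SVC(kernel="rbf", C=1.0).fit(X, y)

# A new case is labeled by its side of the decision boundary
print(svm.predict([[0.5, 0.0]]))
```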

Artificial neural networks (ANN)
  • A computer system that simulates the essential features of neurons and their interconnections with the aim of processing information the same way as real networks of neurons do (Ripley, 1996) (see Fig. 2)

  • A neuron receives inputs from other neurons through dendrites, processes them, and delivers an outcome through the axon. Connections between neurons are weighted during training. Input nodes are features and output nodes are outcomes; between them lie hidden layers, each formed of a set of nodes (see the sketch below)

  • Allow modeling complex nonlinear relationships

  • Less often used in medical research owing to the lack of interpretability of (a) the equations that ANNs generate and (b) the transformation of the original dataset into numerical values that ANNs apply
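
A minimal sketch of a feed-forward network with one hidden layer, assuming scikit-learn's MLPClassifier; the layer size and iteration budget are arbitrary choices for the example:

```python
# Hypothetical sketch: a one-hidden-layer neural network classifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Features enter at the input nodes; connection weights between
# layers are adjusted during training
ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
ann.fit(X_tr, y_tr)

print(ann.score(X_te, y_te))   # held-out accuracy
```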

All methods listed above can be used both for classification (categorical outcome) and for regression (quantitative outcome) problems. All of them can handle multiple continuous and categorical predictors.