PLoS One. 2018 Sep 20;13(9):e0204161. doi: 10.1371/journal.pone.0204161

Table 1. Algorithms.

| Algorithm | Type | Selected characteristics | Interpretability (+ = least, ++++ = most) |
| --- | --- | --- | --- |
| Logistic regression (generalized linear model, GLM) | Classification | Most commonly used model in the medical literature. Models linear relationships; requires uncorrelated features. | ++++ |
| Logistic regression with elastic net regularization (GLMNET) | Classification | Adaptation of logistic regression to handle correlated features (as well as high-dimensional datasets). Among correlated features, some will be dropped entirely from the model, even if predictive. | +++ |
| Support Vector Machine (SVM) | Classification | Popular ML tool in biomedical research; offers competitive performance across multiple datasets but poor interpretability. | + |
| Classification and Regression Trees (CART) | Classification | Builds an intuitive decision tree for easy patient stratification. Automatically models feature interactions. | ++++ |
| Tree-Structured Boosting (MediBoost) | Classification | Same structure as CART (builds a single decision tree), but with improved accuracy by considering weighted versions of all cases at each split. | ++++ |
| Random Forest (RF) | Classification | Best out-of-the-box performance with no tuning. Variable importance suggests which features contribute to prediction after accounting for interactions, but no directionality or explicit interactions are shown. | ++ |
| Gradient Boosting Machine (GBM) | Classification | Best overall performance on structured data across real-world applications. Variable importance similar to RF. | ++ |
| Penalized Cox regression (Adaptive Elastic Net) | Survival analysis | Allows Cox survival analysis with high-dimensional, correlated data and the building of clinically interpretable nomograms. As in classification, among highly correlated features, some may be dropped from the model, even if predictive. | +++ |
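As a hedged illustration only (not the authors' pipeline), the sketch below shows how the classification algorithms in Table 1 might be compared with scikit-learn on synthetic data. The dataset, hyperparameters, and AUC scoring are assumptions for demonstration; MediBoost and penalized Cox regression are omitted because they are not available in scikit-learn.

```python
# Illustrative cross-validated comparison of the Table 1 classifiers.
# Synthetic data and default-ish hyperparameters are assumptions, not the paper's setup.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)

models = {
    "GLM (logistic regression)": LogisticRegression(max_iter=1000),
    "GLMNET (elastic net)": LogisticRegression(penalty="elasticnet",
                                               solver="saga", l1_ratio=0.5,
                                               max_iter=5000),
    "SVM": SVC(kernel="rbf"),
    "CART": DecisionTreeClassifier(max_depth=4),
    "Random Forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    # Standardize features, which matters for the penalty- and margin-based models.
    pipe = make_pipeline(StandardScaler(), model)
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    print(f"{name:28s} AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
```

A comparable survival-analysis step for the penalized Cox model would require a dedicated package (e.g., glmnet in R or scikit-survival in Python) and is left out of this sketch.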