| Model | Task | Description | Interpretability |
| --- | --- | --- | --- |
| Logistic regression (generalized linear model, GLM) | Classification | Most commonly used model in the medical literature. Models linear relationships (on the log-odds scale) and requires features that are not highly correlated. | ++++ |
| Logistic regression with elastic net regularization (GLMNET) | Classification | Adaptation of logistic regression that handles correlated features and high-dimensional datasets. Among correlated features, some may be dropped entirely from the model even if they are predictive. | +++ |
| Support Vector Machine (SVM) | Classification | Popular ML tool in biomedical research; offers competitive performance across many datasets but poor interpretability. | + |
| Classification and Regression Trees (CART) | Classification | Builds an intuitive decision tree for easy patient stratification. Automatically models feature interactions. | ++++ |
| Tree-Structured Boosting (MediBoost) | Classification | Same structure as CART (builds a single decision tree) but improves accuracy by considering weighted versions of all cases at each split. | ++++ |
| Random Forest (RF) | Classification | Best out-of-the-box performance, with no tuning required. Variable importance scores indicate which features contribute to prediction after accounting for interactions, but show neither the direction of effects nor the explicit interactions. | ++ |
| Gradient Boosting Machine (GBM) | Classification | Best overall performance on structured data across real-world applications. Provides variable importance scores similar to those of RF. | ++ |
| Penalized Cox regression (Adaptive Elastic Net) | Survival Analysis | Enables Cox survival analysis on high-dimensional, correlated data and the construction of clinically interpretable nomograms. As with GLMNET in classification, some highly correlated features may be dropped from the model even if they are predictive. | +++ |
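Why can GLMNET drop a feature from the model entirely? The L1 (lasso) part of the elastic-net penalty is applied by soft-thresholding, and that operation sets small coefficients to exactly zero. The sketch below is not glmnet's own algorithm, which uses coordinate descent. It substitutes proximal gradient descent (ISTA) on the same penalized objective, using plain rather than adaptive weights. The toy dataset and the parameter names `lam` (penalty strength) and `alpha` (mix, where 1 is lasso and 0 is ridge) are hypothetical.

```python
import math

def soft_threshold(z, t):
    """Shrink z toward zero by t; any |z| <= t becomes exactly 0.0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net_prox(z, step, lam, alpha):
    """Proximal operator of step * lam * (alpha*|w| + (1 - alpha)/2 * w**2)."""
    return soft_threshold(z, step * lam * alpha) / (1.0 + step * lam * (1.0 - alpha))

def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def fit_enet_logistic(X, y, lam, alpha, step=0.1, n_iter=2000):
    """Elastic-net logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * (alpha*||w||_1 + (1 - alpha)/2 * ||w||_2^2).
    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        b -= step * grad_b
        # Gradient step on the log-loss, then the shrink-or-zero proximal step.
        w = [elastic_net_prox(wj - step * gj, step, lam, alpha)
             for wj, gj in zip(w, grad_w)]
    return b, w

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[-1.0, 0.2], [-0.5, -0.2], [0.5, 0.2], [1.0, -0.2]]
y = [0, 0, 1, 1]
b, w = fit_enet_logistic(X, y, lam=0.2, alpha=1.0)
print(w)  # w[0] > 0, while w[1] is exactly 0.0
```

In this toy case the zeroed feature is uninformative. When two predictive features are strongly correlated, the same thresholding can leave one of them at zero while the other carries the shared signal. That is the behavior the table warns about.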
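CART grows its tree by greedily choosing, at each node, the split that most reduces impurity. Repeating the same search inside each child node makes later splits conditional on earlier ones, and this is how the tree models feature interactions automatically. Below is a minimal sketch of a single split search. It assumes Gini impurity (the usual CART criterion for classification) and a hypothetical two-feature dataset. It does not show MediBoost's case weighting.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Exhaustive search for the (feature, threshold) pair that minimizes
    the size-weighted Gini impurity of the two child nodes."""
    n = len(y)
    best = (None, None, gini(y))  # (feature index, threshold, impurity)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical patients: [age, smoker]; label 1 = outcome present.
X = [[25, 0], [40, 1], [60, 0], [70, 1]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # (0, 50.0, 0.0): "age <= 50" yields two pure nodes
```

A full tree calls `best_split` recursively on the left and right subsets until a stopping rule is met. Each root-to-leaf path then reads as a patient stratification rule.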
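A penalized Cox model minimizes the negative Cox log partial likelihood plus an elastic-net penalty. The "adaptive" variant rescales each coefficient's L1 penalty by a weight taken from an initial fit, so coefficients that looked important at first are shrunk less. The sketch below evaluates that objective only and does not fit it. It assumes Breslow handling of tied event times. The toy data and the names `lam`, `alpha` and `weights` are hypothetical.

```python
import math

def cox_log_partial_likelihood(beta, X, times, events):
    """Cox log partial likelihood with Breslow handling of ties.

    For each subject i with an observed event, adds
    eta_i - log(sum of exp(eta_j) over subjects j with times[j] >= times[i]),
    where eta = X @ beta is the linear predictor.
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        ll += eta[i] - math.log(sum(math.exp(e) for e in risk))
    return ll

def penalized_cox_objective(beta, X, times, events, lam, alpha, weights=None):
    """Negative log partial likelihood plus an (adaptive) elastic-net penalty.

    weights: optional per-coefficient L1 penalty factors (hypothetical here);
    in the adaptive elastic net they are derived from an initial estimate.
    """
    if weights is None:
        weights = [1.0] * len(beta)
    l1 = sum(w * abs(b) for w, b in zip(weights, beta))
    l2 = sum(b * b for b in beta)
    return -cox_log_partial_likelihood(beta, X, times, events) + lam * (
        alpha * l1 + (1 - alpha) / 2 * l2)

# Hypothetical data: one covariate, three subjects, the last one censored.
X = [[1.0], [0.0], [0.0]]
times = [1, 2, 3]
events = [1, 1, 0]
print(cox_log_partial_likelihood([0.0], X, times, events))  # -log(6) ≈ -1.7918
```

Minimizing this objective with a sparsity-aware solver produces exactly-zero coefficients, for the same soft-thresholding reason as in the classification case. The surviving coefficients can then be turned into a nomogram.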