| Method | Advantages | Disadvantages | Reference |
|---|---|---|---|
| Ridge | Reduces the impact of variables that are not important for the prediction | Doesn’t eliminate irrelevant variables | PMID: 26947266 |
| Lasso | Reduces overfitting by penalizing the coefficients the model overemphasizes, shrinking some of them to zero and thereby eliminating those variables | Doesn’t take multicollinearity into account and can eliminate relevant independent variables | PMID: 34048985 |
| | | | PMID: 33316739 |
| | | | PMID: 21743061 |
| Elastic net | Combines aspects of Lasso and Ridge: it eliminates some variables while reducing the impact of others | Computationally more expensive than Lasso or Ridge | PMID: 35113902 |
| | | | PMID: 29513198 |
| Decision tree | Very simple to understand and visualize | Prone to overfitting | PMID: 26649272 |
| | | Doesn’t work well with imbalanced data | |
| | | Small changes in the data can produce very different trees | |
| Random forest (RF) | Can deal with imbalanced datasets and missing data | The number of nodes in each decision tree grows exponentially with depth | PMID: 29056906 |
| | As an ensemble of decision trees, much less prone to overfitting than a single tree | The predictions of the individual trees need to be weakly correlated | PMID: 23922946 |
| Gradient boosted decision trees (GBDT) | Often more accurate than RF | Sensitive to outliers | PMID: 32604706 |
| | Doesn’t need bootstrap sampling, unlike RF | Can overfit when too many trees are added | PMID: 35051896 |
| Support vector machine (SVM) | Works well in two-, three-, or higher-dimensional feature spaces | Computationally expensive for large datasets | PMID: 29275361 |
| | | | PMID: 24417022 |
| | Outliers have less impact on the prediction because the hyperplane is determined by the support vectors (the data points closest to the hyperplane) | Works poorly when classes overlap | PMID: 34442108 |
| Artificial neural networks (ANN) | Work very well with large amounts of data | Can quickly become computationally expensive and time-consuming | PMID: 30504368 |
| | Can handle unstructured data | Strongly dependent on the training data, so overfitting can be a problem | PMID: 22349176 |
| | | | PMID: 30519653 |
| k-means | Very simple to implement | Lacks robustness in big data analysis | PMID: 34879829 |
| | | Choosing K can be difficult | PMID: 32493067 |
| | | Doesn’t work well with imbalanced data or outliers | PMID: 22255820 |
| Weighted correlation network analysis (WGCNA) | Retains the connectivity of nodes | Can lack biological precision | PMID: 32699331 |
| | | | PMID: 34225819 |
| Bayesian network | Can handle missing data and help avoid overfitting | Requires a sensitivity analysis of the outcome | PMID: 23690582 |
| | | | PMID: 32368197 |
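The difference between the Ridge, Lasso, and elastic net rows of the table above is easiest to see in the special case of an orthonormal design matrix (X^T X = I), where each penalized coefficient has a closed form in terms of the ordinary least squares (OLS) estimate. The sketch below assumes the objective ||y − Xβ||² + λ₁||β||₁ + λ₂||β||₂² (Ridge: λ₁ = 0; Lasso: λ₂ = 0); the OLS values and the function names are hypothetical and chosen only for illustration.

```python
def soft_threshold(b: float, t: float) -> float:
    """Move b toward zero by t, returning exactly 0.0 when |b| <= t."""
    if b > t:
        return b - t
    if b < -t:
        return b + t
    return 0.0


def penalized_coef(b_ols: float, lam1: float, lam2: float) -> float:
    """Minimizer of ||y - X beta||^2 + lam1 * ||beta||_1 + lam2 * ||beta||_2^2
    for one coefficient, ASSUMING an orthonormal design (X^T X = I).
    Ridge: lam1 = 0. Lasso: lam2 = 0. Elastic net: both > 0."""
    return soft_threshold(b_ols, lam1 / 2.0) / (1.0 + lam2)


# Hypothetical OLS coefficients: two strong predictors, two weak ones.
b_ols = [3.0, -2.0, 0.4, -0.1]

ridge = [penalized_coef(b, lam1=0.0, lam2=1.0) for b in b_ols]
lasso = [penalized_coef(b, lam1=1.0, lam2=0.0) for b in b_ols]
enet = [penalized_coef(b, lam1=1.0, lam2=1.0) for b in b_ols]

print("ridge:", ridge)  # every coefficient halved, none exactly zero
print("lasso:", lasso)  # weak coefficients set to zero, strong ones shifted toward zero
print("enet: ", enet)   # weak coefficients zeroed, strong ones shifted and halved
```

Ridge only rescales, so irrelevant variables keep small non-zero weights; the L1 term produces exact zeros. With correlated predictors the closed form no longer applies and an iterative solver (for example coordinate descent) is needed.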
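The Random forest entry stating that the tree predictions need to be weakly correlated follows from a standard variance identity. Assume B identically distributed tree predictions T_1, …, T_B, each with variance σ² and pairwise correlation ρ; the variance of their average is

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b\right)
= \frac{1}{B^{2}}\left[B\sigma^{2} + B(B-1)\rho\sigma^{2}\right]
= \rho\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
$$

Adding more trees drives the second term toward zero, but the first term ρσ² does not shrink with B. Only decorrelating the trees reduces it, which is why RF grows each tree on a bootstrap sample and considers a random subset of features at each split.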
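The GBDT rows (no bootstrap sampling, risk of overfitting with too many trees) can be illustrated with a minimal gradient-boosting sketch for squared loss, in which every new regression stump is fit on the full training set to the residuals of the current ensemble. The stump learner, the learning rate of 0.1, and the 1-D data are illustrative assumptions; production libraries use deeper trees and additional regularization.

```python
from statistics import mean


def fit_stump(x, r):
    """Least-squares regression stump: the single threshold on x that best fits r."""
    best = None
    for t in sorted(set(x))[:-1]:  # split between consecutive distinct x values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = mean(left), mean(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda xi: lv if xi <= t else rv


def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fit, on the full
    training set (no bootstrap), to the residuals of the current ensemble."""
    base = mean(y)
    trees = []

    def predict(xi):
        return base + learning_rate * sum(tree(xi) for tree in trees)

    for _ in range(n_trees):
        # Residuals are the negative gradient of (1/2) * (y - F)^2.
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        trees.append(fit_stump(x, residuals))
    return predict


def training_mse(model, x, y):
    return mean([(yi - model(xi)) ** 2 for xi, yi in zip(x, y)])


# Hypothetical 1-D data with a step between x = 4 and x = 5.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
for n in (1, 10, 100):
    print(n, round(training_mse(gradient_boost(x, y, n_trees=n), x, y), 4))
```

Training error keeps falling as trees are added, but that says nothing about held-out error; in practice the number of trees is usually chosen on validation data.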
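The SVM entry on outliers can be related to the soft-margin hinge loss max(0, 1 − y·f(x)) used by linear SVMs. The minimal sketch below (training loop omitted; the hyperplane and points are hypothetical) shows that a point lying beyond the margin on the correct side yields a zero subgradient, however extreme its feature values, so it does not move the hyperplane. Misclassified points still contribute, which is why the soft-margin penalty matters when classes overlap.

```python
def decision_value(w, b, x):
    """Linear decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(w, b, x, y):
    """Hinge loss max(0, 1 - y * f(x)) for one point, with y in {-1, +1}."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))


def hinge_subgradient(w, b, x, y):
    """A subgradient of the hinge loss with respect to (w, b).

    Points with margin y * f(x) >= 1 give a zero subgradient, so they do not
    pull on the hyperplane; only points inside the margin or on the wrong
    side contribute.
    """
    if y * decision_value(w, b, x) >= 1.0:
        return [0.0] * len(w), 0.0
    return [-y * xi for xi in x], float(-y)


w, b = [1.0, 0.0], 0.0          # hypothetical current hyperplane x1 = 0
far_point = ([50.0, 3.0], 1)    # extreme values, but far on the correct side
near_point = ([0.5, 0.0], 1)    # inside the margin
print(hinge_loss(w, b, *far_point), hinge_subgradient(w, b, *far_point))
print(hinge_loss(w, b, *near_point), hinge_subgradient(w, b, *near_point))
```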
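For the k-means row, a minimal sketch of Lloyd’s algorithm shows where the listed weaknesses come from: K must be supplied up front, and centroids are plain means, which outliers can pull away from the bulk of a cluster. Using the first k points as initial centroids is a simplifying assumption for determinism (k-means++ seeding is common in practice), and the data are hypothetical. Comparing the within-cluster sum of squares (inertia) across K values, the so-called elbow heuristic, is one common way to choose K.

```python
import math


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with the first k points as initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    # Inertia: within-cluster sum of squared distances (lower = tighter clusters).
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical 2-D data: two well-separated groups of three points.
points = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (0.2, 0.4), (10.3, 9.8), (9.7, 10.1)]
for k in (1, 2, 3):
    labels, _, inertia = kmeans(points, k)
    print(k, labels, round(inertia, 3))
```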
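One way to read the WGCNA entry “retains the connectivity of nodes” is through soft thresholding. Instead of cutting correlations at a hard cutoff, the unsigned adjacency a_ij = |cor(x_i, x_j)|^β keeps every edge as a continuous weight, strongly down-weighting weak correlations, and each node’s connectivity k_i = Σ_{j≠i} a_ij stays a continuous quantity. The pure-Python sketch below uses hypothetical expression profiles; β = 6 is an illustrative value (WGCNA usually picks β with a scale-free topology criterion), and the function names are not the WGCNA R package API.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def soft_threshold_network(profiles, beta=6):
    """Unsigned weighted network: adjacency a_ij = |cor(x_i, x_j)| ** beta,
    connectivity k_i = sum of a_ij over j != i."""
    n = len(profiles)
    adj = [[abs(pearson(profiles[i], profiles[j])) ** beta for j in range(n)] for i in range(n)]
    connectivity = [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]
    return adj, connectivity


# Hypothetical expression profiles of three genes across five samples.
genes = [
    [1, 2, 3, 4, 5],    # gene A
    [2, 4, 6, 8, 10],   # gene B, perfectly correlated with A
    [1, -1, 1, -1, 0],  # gene C, weakly (negatively) correlated with A and B
]
adj, k = soft_threshold_network(genes)
print(adj[0][1], round(adj[0][2], 6), [round(v, 6) for v in k])
```

Here a correlation of about 0.32 becomes an edge weight of about 0.001: small, but still counted in each gene’s connectivity.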
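The Bayesian network entry on missing data reflects the fact that unobserved variables can be summed out of the factorized joint distribution. The minimal sketch below uses the classic three-node rain / sprinkler / wet-grass structure with made-up probabilities and exact inference by enumeration, which is exponential in the number of variables and therefore only practical for tiny networks.

```python
from itertools import product

# Hypothetical conditional probability tables: Rain -> Sprinkler, (Sprinkler, Rain) -> WetGrass.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER_GIVEN_RAIN = {True: {True: 0.01, False: 0.99},
                          False: {True: 0.4, False: 0.6}}
P_WET_GIVEN = {(True, True): 0.99, (True, False): 0.9,
               (False, True): 0.8, (False, False): 0.0}  # key: (sprinkler, rain)


def joint(rain, sprinkler, wet):
    """Chain-rule factorization: P(R, S, W) = P(R) * P(S | R) * P(W | S, R)."""
    p_wet = P_WET_GIVEN[(sprinkler, rain)]
    return P_RAIN[rain] * P_SPRINKLER_GIVEN_RAIN[rain][sprinkler] * (p_wet if wet else 1.0 - p_wet)


def query(target, evidence):
    """P(target=True | evidence) by enumeration; unobserved variables are summed out."""
    names = ["rain", "sprinkler", "wet"]
    num = den = 0.0
    for values in product([True, False], repeat=3):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with what was observed
        p = joint(**assignment)
        den += p
        if assignment[target]:
            num += p
    return num / den


print(query("rain", {"wet": True}))                      # sprinkler unobserved, summed out
print(query("rain", {"wet": True, "sprinkler": False}))  # sprinkler observed
```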