2022 Sep 9;15:914830. doi: 10.3389/fnmol.2022.914830

TABLE 1.

Machine learning methods used for research on the association between miRNAs and neurodegenerative diseases.

| Methods | Positive aspects | Negative aspects | Examples of use in the context of miRNA analysis |
| --- | --- | --- | --- |
| Ridge | Reduces the impact of variables that are not important for the prediction | Doesn’t eliminate irrelevant variables | PMID: 26947266 |
| Lasso | Reduces overfitting by penalizing coefficients the model overemphasizes, and can eliminate them | Doesn’t take multicollinearity into account and could eliminate relevant independent variables | PMID: 34048985; PMID: 33316739; PMID: 21743061 |
| Elastic net | Combines aspects of both Lasso and Ridge: it eliminates some variables while reducing the impact of others | Computationally more expensive than Lasso or Ridge | PMID: 35113902; PMID: 29513198 |
| Decision tree | Very simple to understand and visualize | Subject to overfitting; doesn’t work well with imbalanced data; very different trees can be generated if a small change is made in the data | PMID: 26649272 |
| Random forest (RF) | Can deal with imbalanced datasets and missing data; being an ensemble of decision trees, overfitting is less of a problem | The number of nodes in the decision trees grows exponentially with depth; the predictions of the individual trees need to be uncorrelated | PMID: 29056906; PMID: 23922946 |
| Gradient boosted decision trees (GBDT) | Often more accurate than RF; doesn’t need bootstrap sampling like RF | Sensitivity to outliers; overfitting can be a problem when too many trees are added | PMID: 32604706; PMID: 35051896 |
| Support vector machine (SVM) | Works well with 2D, 3D, or higher dimensions; outliers have less impact on the prediction since the hyperplane is determined by the support vectors (the data points closest to the hyperplane) | Computationally more expensive for larger datasets; works poorly if the dataset has overlapping classes | PMID: 29275361; PMID: 24417022; PMID: 34442108 |
| Artificial neural networks (ANN) | Work very well with large amounts of data; can handle unstructured data | Can quickly become computationally expensive and time-consuming; strong dependence on the training data, so overfitting can be a problem | PMID: 30504368; PMID: 22349176; PMID: 30519653 |
| k-means | Very simple algorithm to implement | Lack of robustness with big data analysis; choosing K can be difficult; doesn’t work well with imbalanced data or outliers | PMID: 34879829; PMID: 32493067; PMID: 22255820 |
| Weighted correlation network analysis (WGCNA) | Retains connectivity of nodes | Can lack biological precision | PMID: 32699331; PMID: 34225819 |
| Bayesian network | Can handle missing data and avoid overfitting | Requires a sensitivity analysis applied to the outcome | PMID: 23690582; PMID: 32368197 |
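
The contrast drawn in Table 1 between Ridge, Lasso, and Elastic net can be illustrated with a minimal sketch. It assumes an orthonormal design matrix and the objective scaling 1/2·||y − Xb||² + l1·||b||₁ + (l2/2)·||b||², under which each penalized coefficient has a closed form in terms of the ordinary least-squares (OLS) coefficient. The function names and the coefficient values are hypothetical and chosen only for illustration; they do not come from the cited studies.

```python
def soft_threshold(value, threshold):
    """Move value toward zero by threshold; values within the threshold become 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ridge_shrink(ols_coefs, l2):
    """Ridge under an orthonormal design: every coefficient is scaled, none reaches zero."""
    return [b / (1.0 + l2) for b in ols_coefs]


def lasso_shrink(ols_coefs, l1):
    """Lasso under an orthonormal design: small coefficients are set exactly to zero."""
    return [soft_threshold(b, l1) for b in ols_coefs]


def elastic_net_shrink(ols_coefs, l1, l2):
    """Naive elastic net under an orthonormal design: threshold first, then scale."""
    return [soft_threshold(b, l1) / (1.0 + l2) for b in ols_coefs]


# Hypothetical OLS coefficients for four miRNA features (made-up numbers).
ols = [2.0, -1.5, 0.25, -0.125]

ridge = ridge_shrink(ols, l2=1.0)
lasso = lasso_shrink(ols, l1=0.5)
enet = elastic_net_shrink(ols, l1=0.5, l2=1.0)

print("ridge:      ", ridge)
print("lasso:      ", lasso)
print("elastic net:", enet)
```

In this toy setting, Ridge keeps all four features with reduced weights, Lasso drops the two weakest features, and Elastic net does both. With correlated predictors, which is the usual case for miRNA expression data, no such closed form applies and an iterative solver would be needed.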