2022 Sep 9;15:914830. doi: 10.3389/fnmol.2022.914830

TABLE 1.

Machine learning methods used for research on the association between miRNAs and neurodegenerative diseases.

Methods	Positive aspects	Negative aspects	Examples of use in the context of miRNA analysis
Ridge	Shrinks the coefficients of variables that are not important for the prediction, reducing their impact	Doesn’t eliminate irrelevant variables (coefficients are shrunk but never set exactly to zero)	PMID: 26947266
Lasso	Reduces overfitting by penalizing the absolute size of the coefficients, which can shrink some of them to exactly zero and so eliminate the corresponding variables	Handles multicollinearity poorly: among highly correlated variables it tends to keep only one, so it could eliminate relevant independent variables	PMID: 34048985
			PMID: 33316739
			PMID: 21743061
Elastic net	Combines aspects of both Lasso and Ridge: it eliminates some variables while reducing the impact of others	Computationally more expensive than Lasso or Ridge	PMID: 35113902
			PMID: 29513198
Decision tree	Very simple to understand and visualize	Subject to overfitting	PMID: 26649272
		Doesn’t work well with imbalanced data
		Small changes in the data can generate very different trees
Random forest (RF)	Can deal with imbalanced datasets and missing data	The number of nodes in each decision tree grows exponentially with depth	PMID: 29056906
	As an ensemble of decision trees, it is much less prone to overfitting than a single tree	Works best when the predictions of the individual trees are uncorrelated	PMID: 23922946
Gradient boosted decision trees (GBDT)	Often more accurate than RF	Sensitive to outliers	PMID: 32604706
	Doesn’t require bootstrap sampling, unlike RF	Overfitting can be a problem when too many trees are added	PMID: 35051896
Support vector machine (SVM)	Works well in two-, three-, or higher-dimensional feature spaces	Computationally expensive for large datasets	PMID: 29275361
			PMID: 24417022
	Outliers have less impact on the prediction, since the hyperplane is determined by the support vectors (the data points closest to the hyperplane)	Works poorly if the dataset has overlapping classes	PMID: 34442108
Artificial neural networks (ANN)	Work very well with large amounts of data	Can quickly become computationally expensive and time-consuming	PMID: 30504368
	Can handle unstructured data	Strongly dependent on the training data, so overfitting can be a problem	PMID: 22349176
			PMID: 30519653
k-means	Very simple algorithm to implement	Lacks robustness in big data analysis	PMID: 34879829
		Choosing the number of clusters K can be difficult	PMID: 32493067
		Doesn’t work well with imbalanced data or outliers	PMID: 22255820
Weighted correlation network analysis (WGCNA)	Retains connectivity of nodes	Can lack biological precision	PMID: 32699331
			PMID: 34225819
Bayesian network	Can handle missing data and helps avoid overfitting	Requires a sensitivity analysis of the outcome	PMID: 23690582
			PMID: 32368197
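The Ridge, Lasso and Elastic net rows of Table 1 differ in how they penalize coefficients. The sketch below makes that difference concrete. It assumes an orthonormal design (X^T X = I, i.e. uncorrelated standardized predictors), a simplification under which each penalized estimate has a closed form in terms of the ordinary least-squares (OLS) coefficient. The coefficients are hypothetical, and the objective is ||y - X beta||^2 + l1 * ||beta||_1 + l2 * ||beta||_2^2.

```python
def soft_threshold(b: float, t: float) -> float:
    """Shrink b towards zero by t; values inside [-t, t] become exactly zero."""
    if b > t:
        return b - t
    if b < -t:
        return b + t
    return 0.0


def penalized_coefficients(ols, l1=0.0, l2=0.0):
    """Closed-form penalized estimates, ASSUMING an orthonormal design (X^T X = I).

    l1 = 0 gives Ridge, l2 = 0 gives Lasso, and l1 > 0 with l2 > 0 gives Elastic net.
    """
    return [soft_threshold(b, l1 / 2) / (1 + l2) for b in ols]


# Hypothetical OLS coefficients for four miRNA features.
ols = [3.0, -1.5, 0.4, -0.2]
ridge = penalized_coefficients(ols, l2=1.0)         # every coefficient shrunk, none removed
lasso = penalized_coefficients(ols, l1=1.0)         # small coefficients set exactly to zero
enet = penalized_coefficients(ols, l1=1.0, l2=1.0)  # zeroing and shrinking combined
print(ridge, lasso, enet)
```

Ridge only rescales each coefficient, so no variable is ever eliminated. Lasso subtracts a constant and clips at zero, which removes the two weakest features. With correlated predictors there is no such closed form. Iterative solvers such as coordinate descent are then needed, and Lasso's arbitrary choice among correlated variables becomes visible.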
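A decision tree is easy to understand because each node is a single threshold test. The sketch below shows how one such test could be chosen, using the weighted Gini impurity criterion of CART-style trees. The expression values and case/control labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x, y):
    """Return (threshold, weighted impurity) of the best split 'x <= threshold' on one feature."""
    best = (None, float("inf"))
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical expression levels of one miRNA; label 1 = case, 0 = control.
expr = [1.2, 2.3, 2.9, 4.1, 5.0, 6.7]
label = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(expr, label)
print(threshold, impurity)  # 2.9 0.0
```

A full tree applies the same search recursively to each side. Because every split depends on exact sample values, small changes to the data can move thresholds and reshape the whole tree.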
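The random forest row relies on two ideas: training many trees on bootstrap samples with random feature choices, and combining them by majority vote. The sketch below shows both under simplifying assumptions. Depth-one trees (stumps) stand in for full trees, each stump sees one randomly chosen feature, and the two-miRNA data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Depth-one tree on one feature: (feature, threshold, left_class, right_class)."""
    best = (feature, float("inf"), majority(labels), majority(labels))  # constant fallback
    best_score = gini(labels)
    for t in sorted({r[feature] for r in rows})[:-1]:
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_score = score
            best = (feature, t, majority(left), majority(right))
    return best


def predict_stump(stump, row):
    feature, threshold, left_class, right_class = stump
    return left_class if row[feature] <= threshold else right_class


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap: sample rows with replacement
        feature = rng.randrange(len(rows[0]))           # random feature choice for this stump
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual stumps."""
    return majority([predict_stump(s, row) for s in forest])


# Hypothetical two-miRNA expression profiles; label 1 = case, 0 = control.
rows = [(1.0, 0.2), (2.0, 0.1), (3.0, 0.3), (7.0, 0.9), (8.0, 0.8), (9.0, 0.7)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0.5, 0.0)), predict_forest(forest, (9.5, 1.0)))
```

Individual stumps trained on unlucky bootstrap samples can be wrong. The vote averages such errors out, which is why the ensemble benefits from trees whose mistakes are uncorrelated.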
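Gradient boosting differs from a random forest in how it builds trees. It adds trees sequentially, each fitted to the residual errors of the current model, instead of training independent trees on bootstrap samples. The sketch below assumes squared loss, regression stumps, and hypothetical single-feature data. Real GBDT libraries use deeper trees and many additional options.

```python
def fit_regression_stump(x, residuals):
    """Single split on x minimising the squared error of the residuals.

    Returns (threshold, left_value, right_value); each value is the mean residual on that side.
    """
    best, best_sse = None, float("inf")
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if sse < best_sse:
            best, best_sse = (t, lm, rm), sse
    return best


def fit_gbdt(x, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # start from the mean prediction
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of 0.5 * (y - pred)^2
        t, lm, rm = fit_regression_stump(x, residuals)
        trees.append((t, lm, rm))
        pred = [p + learning_rate * (lm if xi <= t else rm) for p, xi in zip(pred, x)]
    return base, learning_rate, trees


def predict_gbdt(model, xi):
    base, learning_rate, trees = model
    return base + sum(learning_rate * (lm if xi <= t else rm) for t, lm, rm in trees)


# Hypothetical single-feature data with a step between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_gbdt(x, y)
print(round(predict_gbdt(model, 2), 3), round(predict_gbdt(model, 5), 3))
```

Each added stump keeps reducing the training error. This is also why adding too many trees can overfit noisy data, and why a single extreme value (a large residual) can pull later trees towards it.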
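The SVM row states that the hyperplane depends on the support vectors rather than on all points. The sketch below shows why for a linear soft-margin SVM. It is trained by plain full-batch subgradient descent, chosen here for brevity in place of the quadratic-programming solvers used in practice. Only points inside the margin (or misclassified) contribute to the hinge-loss gradient. The data, learning rate and regularization strength are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the soft-margin linear SVM objective
    lam / 2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))), with labels y_i in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only points inside the margin or misclassified contribute
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_svm(w, b, x):
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical two-feature data for two classes.
X = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0), (-2.0, -2.0), (-3.0, -3.0), (-3.0, -2.0)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, [predict_svm(w, b, xi) for xi in X])
```

A point far on the correct side of the margin has a hinge loss of zero and leaves w unchanged. This is why distant points have little influence. When classes overlap, many points sit inside the margin and the separation degrades.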
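An artificial neural network is built from layers of units, each computing a weighted sum of its inputs followed by a nonlinearity. The sketch below shows only that forward computation, using a classic two-layer network with step activations that computes XOR. The weights are set by hand for illustration. A real ANN learns its weights from training data, typically by backpropagation, which is where its strong dependence on that data comes from.

```python
def step(z):
    """Step activation: 1 if the weighted input is positive, else 0."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias for each unit, then the activation."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b) for ws, b in zip(weights, biases)]


def xor_network(x1, x2):
    # Hidden layer: the first unit acts as OR, the second as AND (hand-set weights).
    hidden = layer([x1, x2], weights=[[1, 1], [1, 1]], biases=[-0.5, -1.5])
    # Output unit: "OR and not AND", which is XOR.
    (out,) = layer(hidden, weights=[[1, -1]], biases=[-0.5])
    return out


print([xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```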
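The k-means row notes that the algorithm is simple to implement but requires K to be chosen in advance. The sketch below is Lloyd's algorithm: it alternates assigning points to the nearest centroid with moving each centroid to the mean of its points. K is fixed implicitly by the number of initial centroids supplied. The points and starting centroids are hypothetical, and the result in general depends on that starting choice.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm for k-means, starting from the given centroids (K = len(centroids))."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids, and hence assignments, are stable
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-dimensional expression summaries forming two groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids, [len(c) for c in clusters])
```

Because every point is assigned to some cluster and centroids are plain means, a single outlier can drag a centroid away from its group. This matches the table's note on outliers.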
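WGCNA keeps connectivity information by using continuous weights instead of a hard correlation cutoff. For an unsigned network, the adjacency between two features is the absolute correlation raised to a soft-thresholding power beta. The sketch below covers only that adjacency and connectivity step, not module detection. The miRNA profiles are hypothetical and beta = 6 is an illustrative choice.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta (diagonal left at 0 here)."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(profiles[i], profiles[j])) ** beta
    return adj


def connectivity(adj):
    """Connectivity of each node: k_i = sum over j != i of a_ij."""
    return [sum(row) for row in adj]


# Hypothetical expression of three miRNAs across five samples.
profiles = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2]]
adj = soft_threshold_adjacency(profiles)
print(connectivity(adj))
```

Raising correlations to a power suppresses weak links (|r| = 0.8 becomes about 0.26 here) while keeping every link in the network. A purely statistical weighting like this does not by itself guarantee a biologically meaningful edge.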
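The Bayesian network row credits the method with handling missing data. In exact inference this works by summing the joint probability over every value an unobserved variable could take. The sketch below uses a hypothetical three-node network (disease status D with two child indicators, A and B, for high expression of two miRNAs) and made-up conditional probabilities. Fractions keep the arithmetic exact.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical network D -> A, D -> B with made-up conditional probabilities.
P_D = {1: F(1, 10), 0: F(9, 10)}          # P(D = d)
P_A_given_D = {1: F(4, 5), 0: F(1, 5)}    # P(A = 1 | D = d)
P_B_given_D = {1: F(7, 10), 0: F(3, 10)}  # P(B = 1 | D = d)


def bernoulli(p_one, value):
    return p_one if value == 1 else 1 - p_one


def joint(d, a, b):
    """Joint probability, factorised along the network edges."""
    return P_D[d] * bernoulli(P_A_given_D[d], a) * bernoulli(P_B_given_D[d], b)


def posterior_disease(evidence):
    """P(D = 1 | evidence) by enumeration; variables missing from `evidence` are summed out."""
    weights = {}
    for d in (0, 1):
        weights[d] = sum(
            (joint(d, a, b) for a, b in product((0, 1), repeat=2)
             if evidence.get("A", a) == a and evidence.get("B", b) == b),
            F(0),
        )
    return weights[1] / (weights[0] + weights[1])


print(posterior_disease({"A": 1}))          # B missing: 4/13
print(posterior_disease({"A": 1, "B": 1}))  # 28/55
```

With B missing, the query still returns a valid posterior. How strongly that posterior depends on the assumed probabilities is exactly what the sensitivity analysis mentioned in the table is meant to check.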