Table 1.
Learning Type | Method | Description and Most Relevant Features | Main Applications | References |
---|---|---|---|---|
Supervised | LR | Linear Regression is a supervised learning method for investigating the linear relationship between a dependent variable and one or more independent variables. LR is the oldest and most widely used type of regression. To overcome the limitation of the linearity assumption, many regression techniques have been developed, varying in the type of cost function used: Non-Linear Regression, Polynomial Regression, Logistic Regression (sigmoid function), Poisson Regression and many others. | Classification. Functional causal modelling. Metabolomics. Genotype-phenotype associations. | 2001 [49]; 2005 [50]; 2006 [51]; 2008 [52]; 2012 [53]; 2014 [54] |
 | SVM | Support Vector Machines are supervised learning methods for binary classification. SVMs represent data as points in space and construct a hyperplane, or set of hyperplanes, in a high-dimensional space to separate the points and predict the category they belong to [21]. SVMs can perform linear classification and, using kernel methods (a class of algorithms for high-dimensional pattern analysis), non-linear classification. | Cancer genomics classification. Outlier detection. Discovery of new biomarkers and new drug targets. | 2003 [55]; 2007 [56]; 2008 [57]; 2011 [58]; 2013 [59]; 2014 [60] |
 | RDF | Random Decision Forests are learning methods that train many Decision Trees (DTs) and average their predictions. DTs are ML approaches in which predictions are represented by a series of decisions that predict the target value of a variable from feature observations [61]. The target variable can take continuous (Regression Trees) or discrete (Classification Trees) values. DTs are often unstable methods, but have the major advantage of being easily interpretable. | Genome-Wide Association (GWA). Epistasis detection. Pathway analysis. Visualization of decision processes. | 2003 [62]; 2004 [63]; 2006 [64]; 2009 [65]; 2012 [66]; 2015 [67] |
 | Naive Bayes | Bayes Classifiers are ML methods that use Bayes' theorem for the classification process. A strong assumption of Naive Bayes is mutual feature independence. These classifiers are very fast and, despite their simplicity, efficient in many complex tasks, even with small training data sets. | Short-sequence classification. Multi-class prediction. DNA barcoding. Biomarker selection. | 2001 [68]; 2002 [69]; 2006 [70]; 2009 [71] |
 | k-NN | k-Nearest Neighbours is an instance-based learning algorithm used for classification or regression. The algorithm assigns weights to the contribution of each neighbour, so that the nearest neighbours contribute more to the computed average than distant ones. | Cancer genomics classification. Gene expression analysis. | 2005 [72]; 2006 [73]; 2010 [74] |
Unsupervised | PCA | Principal Component Analysis is a statistical procedure for reducing the dimensionality of the variable space. PCA consists of a linear coordinate transformation that projects variables from a high-dimensional space onto a low-dimensional space while preserving as much variance as possible [19]. One of the main limitations of this method is that it can capture only linear correlations between variables; to overcome this disadvantage, Sparse PCA and Nonlinear PCA have recently been introduced. | Dimensionality reduction. Cancer classification. SNP tagging. Visualization of genetic distances. Proteomic analysis. | 2004 [75]; 2007 [76]; 2009 [77]; 2011 [78]; 2013 [79]; 2014 [80] |
 | DBNs | A Dynamic Bayesian Network is a Bayesian Network (a probabilistic graphical model that uses Bayesian inference for probability computations) with a temporal extension, able to model stochastic processes over time [20]. The advantage of this kind of architecture is that it can model very complex time series and relationships between multiple time series. | Gene regulation analysis. Epigenetic data integration. Protein sequencing. | 2007 [81]; 2010 [82]; 2012 [83]; 2014 [84]; 2016 [85] |
 | LDA | Linear Discriminant Analysis is a linear dimensionality reduction technique for projecting a dataset onto a lower-dimensional space. LDA is very similar to PCA but, in addition to maximizing the overall data variance, it seeks axes that maximize the separation between classes (maximizing between-class variance while minimizing within-class variance). | Data pre-processing. Motif identification. Cancer genomics classification. | 2000 [86]; 2008 [87]; 2009 [88] |
 | k-Means | k-Means Clustering is a vector-quantization method for partitioning observations into k clusters. At each step, the algorithm recomputes each centroid as the barycenter of its cluster and re-assigns each data point to the nearest centroid. k-Means is at once a simple and an efficient algorithm for clustering problems. | Genome clustering. Gene expression pattern recognition. Image segmentation. | 2005 [89]; 2007 [90]; 2015 [91]; 2016 [92] |
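
To make the methods in Table 1 concrete, each entry is followed below by a minimal Python sketch on synthetic data. The sketches lean on widely used scikit-learn and NumPy APIs; all datasets and parameter values are illustrative assumptions, not the implementations used in the cited studies. First, ordinary least-squares Linear Regression:

```python
# Minimal Linear Regression sketch on synthetic data (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 samples, 3 independent variables
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)  # known linear model + noise

model = LinearRegression().fit(X, y)   # ordinary least-squares fit
print(model.coef_, model.intercept_)   # recovered coefficients and intercept
```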
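
A hedged SVM sketch, assuming scikit-learn's `SVC`: a binary classifier with an RBF kernel, illustrating non-linear separation via the kernel trick.

```python
# Minimal SVM sketch: binary classification with an RBF kernel.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary-classification problem in a 10-dimensional feature space.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)  # kernel method: non-linear decision boundary
print("test accuracy:", clf.score(X_te, y_te))
```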
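
A minimal Random Decision Forest sketch, assuming scikit-learn's `RandomForestClassifier`; the ensemble aggregates the predictions of many Decision Trees, and `feature_importances_` hints at the interpretability noted in the table.

```python
# Minimal Random Decision Forest sketch (illustrative data and parameters).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 100 decision trees; the forest aggregates their individual predictions.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:5]))           # class predictions for 5 samples
print(forest.feature_importances_)     # per-feature importance, useful for interpretation
```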
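
A minimal Naive Bayes sketch under a Gaussian feature model (scikit-learn's `GaussianNB`); the conditional-independence assumption is what makes training and prediction fast.

```python
# Minimal Naive Bayes sketch: Gaussian features, three classes.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# GaussianNB assumes features are mutually independent given the class.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
nb = GaussianNB().fit(X, y)
print(nb.predict_proba(X[:3]))         # posterior P(class | features) via Bayes' theorem
```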
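
A k-NN sketch using distance weighting so that nearer neighbours contribute more, matching the description in the table; the choice of k is illustrative.

```python
# Minimal k-NN sketch with distance-weighted neighbour contributions.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# weights="distance": closer neighbours get larger weights in the vote.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)
print(knn.predict(X[:3]))
```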
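
A PCA sketch on synthetic correlated data; `explained_variance_ratio_` shows how much variance the low-dimensional projection preserves, which is exactly the criterion the table describes.

```python
# Minimal PCA sketch: project correlated 5-D data onto 2 principal axes.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Correlated 5-D data: most variance lies along a few directions.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)   # variance preserved by each principal axis
X_low = pca.transform(X)               # projection onto the 2-D subspace
```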
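
Rather than a full DBN library, here is a hand-rolled sketch of a two-slice Dynamic Bayesian Network with a single binary hidden variable, for which filtering reduces to the HMM forward algorithm; all probability tables and the observation sequence are illustrative assumptions.

```python
# Minimal two-slice DBN sketch: one hidden binary state tracked over time.
import numpy as np

T = np.array([[0.9, 0.1],    # P(state_t | state_{t-1}): transition model (assumed)
              [0.2, 0.8]])
E = np.array([[0.8, 0.2],    # P(obs | state): emission model (assumed)
              [0.3, 0.7]])
belief = np.array([0.5, 0.5])          # uniform prior over the hidden state

for obs in [0, 0, 1, 1]:               # illustrative observed sequence
    belief = T.T @ belief              # predict: propagate one time slice forward
    belief = belief * E[:, obs]        # update: weight by the evidence likelihood
    belief /= belief.sum()             # normalize (Bayes' rule)
    print(belief)                      # filtered P(state_t | obs_1..t)
```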
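
An LDA sketch on the classic Iris data: with three classes the projection has at most two discriminant axes, chosen to maximize class separation rather than raw variance.

```python
# Minimal LDA sketch: supervised projection onto discriminant axes.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)      # 4 features, 3 classes

# At most (n_classes - 1) = 2 discriminant axes maximize class separation.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
print(lda.transform(X).shape)          # (150, 2): data projected onto the 2 axes
```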
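
Finally, a hand-rolled Lloyd-iteration sketch of k-Means, making explicit the two alternating steps named in the table (assign each point to its nearest centroid, then recompute each centroid as its cluster's barycenter); the data and the fixed iteration count are illustrative.

```python
# Minimal k-Means (Lloyd iteration) sketch on synthetic 2-D points, k = 3.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (-3, 0, 3)])
centroids = X[rng.choice(len(X), size=3, replace=False)]   # random initialization

for _ in range(10):
    # Assignment step: each point joins its nearest centroid.
    labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
    # Update step: recompute each centroid as its cluster's barycenter.
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(3)])

print(centroids)                       # final cluster centres
```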