Source: Circ Res. 2021 Feb 18;128(4):544–566. doi: 10.1161/CIRCRESAHA.120.317872

Table 1.

Common ML algorithms and their advantages and disadvantages

| Algorithm | Acronyms/Variations | Description | Advantages | Disadvantages | References |
|---|---|---|---|---|---|
| Regressions | LASSO, Ridge, Elastic Net | Fit a simple function's parameters to minimize the sum of squared distances to the observed data | Easily interpretable; uncovers causal relationships | Computationally impractical for large, high-dimensional datasets | Tibshirani^14, Kennedy^13, Zou & Hastie^15 |
| Support Vector Machines | SVM, SVR | Use support vectors to identify decision boundaries in the data | Memory efficient; robust, grounded in Vapnik–Chervonenkis theory | Not suitable for large datasets; no probabilistic interpretation of classifiers | Cortes & Vapnik^16 |
| k-nearest neighbor | k-NN | Classify new data based on the labels of surrounding neighbors | Strong theoretical underpinnings; easily interpretable | Not suitable for large datasets; large memory footprint | Cover & Hart^17 |
| k-means clustering | kMeans | Assign each point in the dataset to one of k clusters so as to minimize within-cluster variance relative to the cluster's centroid | Good convergence | Requires specifying k a priori; not suitable for high-dimensional data; sensitive to outliers | Lloyd^18 |
| Principal Component Analysis | PCA, POD, SVD, EVD, KLT | Change coordinates to an orthonormal basis that maximizes the variance of the data along the new coordinates | Can be used for feature extraction; computationally efficient algorithms available; extensively studied, leading to many generalizations | Principal components can depend on all input variables; learned subspace is linear | Bishop^8, Wold et al.^157 |
| Decision Tree and Random Forest | RF, AdaBoost, XGBoost | Build flowchart-like decision trees whose questions are learned from the data, then ensemble multiple trees to form a random forest | Very easy to interpret, as intermediary decisions can be read directly; performs well on large datasets | Requires manually crafted features; not suitable for perceptual data (e.g., images) | Breiman^20 |
| Artificial Neural Networks | (A)NN, CNN, RNN, DNN, GAN | Directed, weighted acyclic graph of neurons arranged in layers, using a propagation function to transmit information | Automatic feature learning; very good performance on imaging data; applicable to a wide range of problems; easy to continue training on additional data | May require large amounts of data; prone to overfitting on small datasets; hard to interpret | Chollet^7, Bishop^8 |
| Naïve Bayes | N/A | Classification using Bayes' theorem, assuming independence between features to model the class-conditional probability | Requires only a small number of training samples; easy to interpret | Assumes features are independent; not suitable for high-dimensional data | Bishop^8 |
| Linear discriminant analysis | LDA, NDA | Find a linear combination of features that separates the input data into classes | Strong performance when its assumptions are met | Independent variables are assumed to be normally distributed; sensitive to outliers | Duda et al.^158 |
| Gaussian Mixture Model | GMM | Assume the data follows a linear combination of Gaussian distributions whose parameters are estimated from the data | Fastest algorithm for learning mixture models; simple likelihood-based optimization | Covariance matrix estimation can be difficult; number of components must be specified a priori | Bishop^8 |
| Spectral Clustering | N/A | Use eigenvalue decomposition to cluster based on the similarity matrix whose entries A_ij express the degree of similarity between points i and j | Simple to implement; can be solved efficiently with linear algebra methods | Number of clusters needs to be specified in advance | Ng et al.^159 |
| Mean Shift | N/A | A centroid-based clustering method that iteratively searches through neighborhoods of points to locate the modes of density functions | No need to specify the number of clusters; the bandwidth parameter has physical meaning | Not scalable, as it requires many nearest-neighbor searches | Comaniciu & Meer^160 |
| Isomap | N/A | Non-linear dimensionality reduction using an isometric mapping (a distance-preserving transformation between metric spaces) | High computational efficiency; nonlinear; globally optimal | Sensitive to the parameter governing the connectivity of each point | Tenenbaum et al.^161 |
| Local Linear Embedding | LLE, HLLE, MLLE | Non-linear dimensionality reduction that reconstructs the data from linear combinations of projected neighborhood points | Faster than Isomap; can take advantage of sparse matrix algorithms | Sensitive to sampling density (i.e., performs poorly on non-uniform densities) | Roweis & Saul^162 |
| Diffusion Maps | N/A | Nonlinear feature extraction and dimensionality reduction in which distances between points are defined in terms of probabilities of diffusion | Nonlinear; computation is insensitive to the distribution of the points | Scaling parameter ε requires tuning | Coifman et al.^163 |
| t-distributed stochastic neighbor embedding | t-SNE | Data visualization tool that defines the similarity between two points as the conditional probability that one would pick the other as its neighbor if neighbors were picked according to Student-t probabilities centered at the first point | Constructs 2- or 3-dimensional representations of the data for easy visualization; nonlinear | Highly computationally expensive; sensitive to initial conditions due to its stochastic nature | Roweis & Hinton^164 |

Minimal, illustrative usage sketches for several of the algorithm families in this table are shown below.
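
The sketch below illustrates the supervised methods in the table (penalized regressions, SVM, k-NN, random forests, naïve Bayes, and LDA). It is a minimal example, assuming scikit-learn and synthetic data; the datasets and all hyperparameter values are illustrative choices of ours, not settings taken from the article.

```python
# Supervised-learning sketch for several Table 1 methods, using scikit-learn
# on synthetic data (all settings here are illustrative assumptions).
from sklearn.datasets import make_regression, make_classification
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

seed = 0  # fixed seed so the sketch is reproducible

# Penalized regressions (LASSO, Ridge, Elastic Net): fit a linear model by
# minimizing squared error plus an L1 and/or L2 penalty.
X_reg, y_reg = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=seed)
for model in (Lasso(alpha=0.1), Ridge(alpha=1.0), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X_reg, y_reg)
    print(type(model).__name__, "R^2:", round(model.score(X_reg, y_reg), 3))

# Classifiers compared with 5-fold cross-validation on a synthetic binary problem.
X_clf, y_clf = make_classification(n_samples=300, n_features=10, random_state=seed)
classifiers = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    "Naive Bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X_clf, y_clf, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```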
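A similar sketch for the clustering methods in the table (k-means, Gaussian mixtures, spectral clustering, and mean shift) follows, again assuming scikit-learn and synthetic blobs; the cluster counts and bandwidth quantile are illustrative assumptions.

```python
# Clustering sketch for the unsupervised methods in Table 1, using scikit-learn
# on synthetic blobs with known ground-truth labels for scoring.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, SpectralClustering, MeanShift, estimate_bandwidth
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

X, y_true = make_blobs(n_samples=500, centers=3, cluster_std=1.0, random_state=0)

# k-means and spectral clustering need the number of clusters up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
spectral_labels = SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                                     random_state=0).fit_predict(X)

# A Gaussian mixture also needs its number of components specified a priori.
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# Mean shift infers the number of clusters but needs a kernel bandwidth.
bandwidth = estimate_bandwidth(X, quantile=0.2)
ms_labels = MeanShift(bandwidth=bandwidth).fit_predict(X)

for name, labels in [("k-means", kmeans_labels), ("spectral", spectral_labels),
                     ("GMM", gmm_labels), ("mean shift", ms_labels)]:
    print(f"{name}: adjusted Rand index = {adjusted_rand_score(y_true, labels):.3f}")
```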
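Finally, a sketch of the dimensionality-reduction methods (PCA, Isomap, locally linear embedding, and t-SNE), assuming scikit-learn and the classic S-curve toy dataset; neighborhood sizes and perplexity are illustrative assumptions, and diffusion maps are omitted because they are not part of scikit-learn.

```python
# Dimensionality-reduction sketch for PCA, Isomap, LLE, and t-SNE from Table 1,
# applied to the S-curve toy dataset with scikit-learn.
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding, TSNE

X, color = make_s_curve(n_samples=1000, random_state=0)

# PCA: linear projection onto the directions of maximal variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Isomap and LLE: nonlinear embeddings built from each point's nearest neighbors;
# results are sensitive to n_neighbors, as noted in the table.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                               random_state=0).fit_transform(X)

# t-SNE: stochastic 2-D embedding for visualization; rerunning with a different
# seed can change the layout.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

for name, emb in [("PCA", X_pca), ("Isomap", X_iso), ("LLE", X_lle), ("t-SNE", X_tsne)]:
    print(name, "embedding shape:", emb.shape)
```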