
Figure 3. Commonly used machine learning algorithms.

A. Least squares regression. These models assume the data can be fit by a given (usually linear) function (the regression line) but may deviate from it because of noise; the function's parameters are chosen to minimize the sum of squared distances (errors) to the observed data.

B. Support vector machines. Typically used for binary classification in a supervised setting, the model seeks a decision boundary, defined by a subset of the data points (the support vectors), that maximizes the margin, i.e., the perpendicular distance between the decision boundary and the closest data points.

C. k-nearest neighbors. These are non-parametric models for classification and regression problems in which a vote (for example, a majority vote) of the k closest neighbors determines a new point's predicted value or label.

D. k-means clustering. The goal of this unsupervised clustering algorithm is to split the data into k clusters by repeatedly re-assigning each point to the cluster with the nearest centroid, until the total distance (typically Euclidean) between the points and their respective cluster centroids is minimized.

E. Random forests. Decision trees are a supervised approach to classification and regression problems in which input data are classified sequentially through a flowchart-like structure, where the learned features are questions about the data (e.g., "Is the patient's age less than 40?"). Random forests combine many decision trees into an ensemble output, yielding a more robust learning algorithm.

F. Principal component analysis. The goal is to re-express the data in an orthogonal basis (the principal components, in red) that maximizes the variance of the data along the new principal component directions. The representation can then be truncated after a suitable number of principal components, reducing the dimensionality of the data.

G. Neural networks. Shown here is a neural network autoencoder, which consists of artificial neurons (or nodes; gray and red circles) organized in layers (shaded in gray) and sharing weighted, directed connections (thin black lines), each neuron combining its inputs via a propagation function and passing its output further along the network. During encoding, the input data are passed through fully connected layers to produce a low-dimensional encoding (red circles); the encoding is then decoded through additional fully connected layers to produce the reconstructed image (here, an image of the number 5).
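As a concrete illustration of panel A, below is a minimal least squares sketch in Python using NumPy; the synthetic noisy line (slope 2, intercept 1, 50 points) is an assumption made purely for demonstration, not data from the article.

```python
# Minimal least squares regression sketch (panel A); synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)  # noisy line

# Design matrix [x, 1]; lstsq picks the parameters that minimize the sum
# of squared distances between the fitted line and the observed data.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
```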
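Panel B can be sketched with scikit-learn's linear support vector classifier; the two-blob toy dataset and the probe point are illustrative assumptions.

```python
# Minimal linear SVM sketch (panel B); toy two-class data, illustrative only.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The decision boundary is determined by a subset of the training points:
# the support vectors that lie closest to the margin.
print("support vectors per class:", clf.n_support_)
print("predicted class of a new point:", clf.predict([[0.0, 2.0]]))
```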
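A sketch of panel C's majority vote, assuming scikit-learn's bundled iris dataset as stand-in data:

```python
# Minimal k-nearest neighbors sketch (panel C): a new point's label is a
# majority vote among its k closest training points.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print("voted labels:", knn.predict(X[:3]))
```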
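Panel D's re-assignment loop is what scikit-learn's KMeans runs internally; the three synthetic blobs below are an assumed toy dataset.

```python
# Minimal k-means sketch (panel D): points are re-assigned to the nearest
# centroid and centroids are recomputed until the assignments stabilize.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("centroids:\n", km.cluster_centers_)
print("first 10 assignments:", km.labels_[:10])
```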
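For panel E, here is a minimal ensemble-of-trees sketch; the breast cancer dataset and the 100-tree forest size are illustrative choices, not the figure's setup.

```python
# Minimal random forest sketch (panel E): an ensemble of decision trees,
# each splitting on threshold questions about the features.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", rf.score(X_te, y_te))
```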
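Panel F's change of basis and truncation can be sketched in a few lines; keeping two components of the four-feature iris data is an assumed example.

```python
# Minimal PCA sketch (panel F): rotate the data onto orthogonal principal
# components ordered by explained variance, then keep only the first two.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # 4 features per sample
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)    # reduced to 2 features per sample
print("variance explained:", pca.explained_variance_ratio_)
```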
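Finally, a minimal sketch of panel G's autoencoder, written in PyTorch; the 784-unit input assumes flattened 28x28 digit images (e.g., MNIST), and all layer sizes are illustrative assumptions rather than the figure's exact architecture.

```python
# Minimal fully connected autoencoder sketch (panel G); PyTorch assumed.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_inputs=784, n_latent=32):
        super().__init__()
        # Encoder: fully connected layers down to a low-dimensional code.
        self.encoder = nn.Sequential(
            nn.Linear(n_inputs, 128), nn.ReLU(),
            nn.Linear(128, n_latent))
        # Decoder: fully connected layers back up to the input size.
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 128), nn.ReLU(),
            nn.Linear(128, n_inputs), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(1, 784)                      # stand-in for a flattened image
loss = nn.functional.mse_loss(model(x), x)  # reconstruction error
loss.backward()                             # a training step would follow
```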