Table 2. Examples of machine learning algorithms
Logistic regression: Often used to quantify effect sizes in traditional statistics, logistic regression (Fig. 1a) may be used as an ML model by assigning each feature of a dataset to a specific parameter and then tuning these parameters on a training set. Additional adjustments to the model design (e.g., parameter regularization) may be used to decrease the model’s reliance on any single feature and improve performance [44].
Support vector machines (SVM): SVMs (Fig. 1b) learn a decision boundary (“hyperplane”) that maximizes the separation (“margin”) between different sets of observations in the data space [45]. Decision boundaries may be made nonlinear through specialized methods [46]. The relative simplicity of an SVM’s decision boundary often makes this model a good choice for avoiding overfitting in complex datasets (e.g., longitudinal neuroimaging data such as fMRI) [47].
K-nearest neighbors (KNN): The KNN algorithm (Fig. 1c) predicts the outcome for a given set of input features from the outcomes of the k most similar data points, where k is a small integer chosen by the investigator [48]. In practice, KNN is often used as a data-informed strategy for imputing missing values in a dataset. For example, in longitudinal cohorts, KNN can be used to estimate missing variables (e.g., neurocognitive test scores [49]) by training on subjects for whom complete data are available.
Decision trees: Decision trees (Fig. 1d) are essentially flow charts that predict outcomes based on branching logic. Given input feature values, the model undertakes a series of binary decisions to reach an outcome. Over the course of training, the model learns to navigate each branch point with increasing accuracy [50]. These models are best used when a high degree of interpretability is sought, but their performance may suffer relative to more complex modeling strategies.
Random forests (RF): RFs (Fig. 1d) combine many randomly generated decision trees to provide an overall prediction. The final prediction is obtained by aggregating the results of the individual trees through procedures such as majority voting or arithmetic averaging [51]. So-called “boosting” algorithms [52] build related tree ensembles in which each successive tree minimizes the errors of earlier trees, often improving performance. Boosted tree ensembles are often a good benchmark for non-neural-network performance, with growing deployment in academic studies [53]. (Comparative sketches of these classical models follow.)
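The classical models above share a common train-then-evaluate workflow. The sketch below is illustrative only (it is not drawn from the cited studies): it assumes scikit-learn and substitutes a synthetic dataset for a real clinical feature matrix.

```python
# A minimal sketch comparing the classical models above through
# scikit-learn's shared fit/predict interface (synthetic data only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Hypothetical stand-in for a clinical feature matrix (n subjects x p features).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic regression (L2-regularized)": LogisticRegression(penalty="l2", max_iter=1000),
    "SVM (nonlinear RBF kernel)": SVC(kernel="rbf"),
    "k-nearest neighbors (k=5)": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(max_depth=4),
    "random forest (averaged trees)": RandomForestClassifier(n_estimators=200),
    "gradient boosting (boosted trees)": GradientBoostingClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)  # parameters are tuned on the training set
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.2f}")
```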
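The KNN-based imputation strategy mentioned above can be sketched in the same spirit; the toy cohort and the neighbor count below are arbitrary assumptions, and scikit-learn’s KNNImputer stands in for whatever implementation a given study used.

```python
# Sketch of KNN-based imputation of missing values (e.g., missing
# neurocognitive test scores) using scikit-learn's KNNImputer.
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical cohort: rows are subjects, columns are test scores;
# np.nan marks a missing measurement.
scores = np.array([
    [28.0, 14.0, 9.0],
    [25.0, np.nan, 7.0],   # subject with a missing second score
    [26.0, 13.0, 8.0],
    [18.0, 9.0, 4.0],
])

# Each missing entry is estimated from the k most similar subjects
# for whom complete data are available.
imputer = KNNImputer(n_neighbors=2)
completed = imputer.fit_transform(scores)
print(completed)
```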
Neural networks (deep learning): Neural networks (Fig. 1e) use chains of mathematical functions (or “layers”) to make predictions. When many such layers are connected, the network is said to be “deep.” Deep learning is a vast field that powers most of modern ML, and new network “architectures” are constantly emerging. However, we may note some crucial categories as follows (minimal sketches of several follow the list):
• Multilayer perceptrons (MLP): These networks consist of an input layer of feature values that undergo further calculations in “hidden layers” before they are used to make a final prediction [54]. Such relatively simple networks may be used as standalone models or as components of larger deep learning models
• Convolutional neural networks (CNN): CNNs use a series of filters that distill data patterns to their most essential properties. This is particularly useful in computer vision, where complex patterns of pixels (e.g., abscesses and organ boundaries) must be learned by building them up from simpler shapes (e.g., lines and edges) [55]. As such, these networks are the standard choice for tasks such as visual diagnosis and segmentation of critical structures in radiologic scans
• Recurrent neural networks (RNN): RNNs process sequential data using looped calculations that allow the network to form a “memory” of its previous processing. This is highly useful for linguistic data and for time series such as electrocardiograms [56] or electroencephalograms
• Transformers: Transformers are relatively new neural network models that differentially weight pieces of their input data using a mechanism known as “attention” [25]. This gives a model an apparent contextual awareness, as in language tasks [58] (e.g., differing importance of different parts of a sentence) or computer vision [58] (e.g., differing importance of different areas of an image)
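As a minimal sketch of an MLP, assuming PyTorch and with layer sizes chosen purely for illustration:

```python
# A small multilayer perceptron: an input layer of feature values,
# two "hidden layers" of calculations, and a final prediction.
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(20, 64),   # input layer: 20 feature values
    nn.ReLU(),           # nonlinearity in the first hidden layer
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),    # output layer: a single prediction
)

x = torch.randn(8, 20)   # a batch of 8 subjects, 20 features each
logits = mlp(x)          # forward pass through the chained layers
print(logits.shape)      # torch.Size([8, 1])
```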
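The convolutional and recurrent building blocks can be sketched similarly; the input shapes below (a 64×64 single-channel image, a 100-sample 12-lead ECG) are hypothetical stand-ins.

```python
# Minimal sketches of a convolutional layer and a recurrent layer.
import torch
import torch.nn as nn

# CNN: a bank of 8 filters sliding over a 1-channel 64x64 "image",
# distilling local pixel patterns (edges, lines) into feature maps.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
image = torch.randn(1, 1, 64, 64)      # (batch, channels, height, width)
feature_maps = conv(image)             # -> (1, 8, 64, 64)

# RNN: an LSTM carrying a "memory" (hidden state) across a sequence,
# e.g., 100 time points of a 12-lead electrocardiogram.
lstm = nn.LSTM(input_size=12, hidden_size=32, batch_first=True)
ecg = torch.randn(1, 100, 12)          # (batch, time, leads)
outputs, (hidden, cell) = lstm(ecg)    # hidden state accumulates memory
print(feature_maps.shape, outputs.shape)
```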
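Finally, the “attention” weighting at the heart of transformers reduces to a few lines; this sketch shows scaled dot-product attention, with random matrices standing in for the learned weights of a real model.

```python
# Scaled dot-product attention: each position of the input is
# re-weighted by its relevance to every other position.
import math
import torch

seq_len, d = 5, 16            # e.g., 5 tokens with 16-dim embeddings
x = torch.randn(seq_len, d)   # hypothetical input embeddings

# Queries, keys, and values are linear maps of the input; random
# matrices stand in for learned weights here.
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / math.sqrt(d)           # pairwise similarity scores
weights = torch.softmax(scores, dim=-1)   # each row sums to 1
output = weights @ V                      # context-aware representations
print(weights)                            # which positions attend to which
```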