Table 1.
Machine Learning Type | Key Features | Clinical Example |
Supervised learning: useful when the outcome variable of interest in the training data set is known (e.g., presence or absence of a disorder) | Task-driven classification or regression: We ‘supervise’ the program's learning by giving it labeled inputs (or ‘features’) and the corresponding outputs (or ‘target variables’). The program then ‘learns’ the relationship between them. Learning depends on a mathematical function (the loss function) that quantifies how well the model's predictions fit the known outputs and thereby determines the adjustments required to improve model performance. The goal is to use the training data to minimize errors such as misclassification. If successful, the model can then be used to predict an outcome, such as the presence or absence of a disease, from new data. | Distinguish between benign and malignant lesions |
Unsupervised learning: useful when the outcome of interest is unknown, unlabeled, or undefined, or when we want to explore and identify patterns within the data | Data-driven clustering: The model examines a collection of unlabeled examples, then groups them in an ‘unsupervised’ manner based on shared characteristics it detects. Unsupervised learning can include clustering and/or dimensionality reduction. In dimensionality reduction, the goal is to reduce the number of variables in the data set while retaining those that explain the most variation in the data. Ultimately, the output of unsupervised learning can support inference and serve as input for supervised learning. | A researcher interested in how symptoms or features of ADHD in a data set tend to cluster together may use unsupervised learning techniques to identify groups of patients. |
ADHD = attention-deficit/hyperactivity disorder.
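
The two rows of Table 1 can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy numbers are invented (they are not clinical data), logistic regression stands in for "a supervised model", two-cluster k-means stands in for "an unsupervised model", and the function names are hypothetical. The supervised part receives labeled examples and repeatedly adjusts its parameters to reduce a loss function; the unsupervised part receives only unlabeled values and groups them by proximity.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, xs, ys):
    """Mean cross-entropy loss; lower values mean a better fit to the labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def fit_supervised(xs, ys, lr=0.1, steps=500):
    """Fit a one-feature logistic regression by gradient descent on the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean cross-entropy loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        # Adjust the parameters in the direction that lowers the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return the predicted class (0 or 1) for a new, unlabeled example."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


def cluster_unsupervised(xs, steps=10):
    """Group unlabeled one-dimensional values into two clusters (k-means)."""
    centers = [min(xs), max(xs)]  # simple deterministic starting points
    for _ in range(steps):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Move each center to the mean of the values assigned to it.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs]
    return labels, centers


# Supervised: invented, standardized lesion scores with known labels
# (0 = benign, 1 = malignant). These are toy values, not real measurements.
scores = [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_supervised(scores, labels)
print("loss before training:", log_loss(0.0, 0.0, scores, labels))
print("loss after training: ", log_loss(w, b, scores, labels))
print("prediction for a new score of 2.5:", predict(w, b, 2.5))

# Unsupervised: invented symptom scores with no labels at all.
symptom_scores = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
groups, centers = cluster_unsupervised(symptom_scores)
print("cluster assignments:", groups)
```

In this sketch, the supervised model can only be trained because each score comes with a known label, whereas the clustering step never sees a label and simply reports which values sit close together. The cluster assignments could then be passed to a supervised model as an additional input, in line with the last sentence of the unsupervised row.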