2020 Dec 9;41(3):1427–1473. doi: 10.1002/med.21764

Table 1.

AI‐related learning techniques used in drug discovery

Category of learning | Definition
Supervised learning
  • A predictive model trained on data points with known outcomes (“labeled data”)

  • Two types of problems:

Regression: the model predicts continuous, real-valued outputs
Classification: the model assigns inputs to discrete classes or groups
Algorithm | Task | Description
Naïve Bayes | Classification
  • A “probabilistic classifier” that estimates the probability of the features occurring in each class, treating every feature as independent, and returns the most likely class according to Bayes' rule.

  • Particularly suited when the dimensionality of the inputs is high.
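
A minimal Gaussian naïve Bayes sketch in Python, assuming continuous features; all data values are invented for illustration:

```python
import math

# Toy training data: two features per sample, labels 0/1
# (e.g., inactive/active compound) -- all values hypothetical.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = [0, 0, 0, 1, 1, 1]

def fit_gaussian_nb(X, y):
    """Estimate per-class priors and per-feature Gaussian parameters."""
    params = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        params[c] = (prior, stats)
    return params

def predict(params, x):
    """Return the class with the highest posterior (Bayes rule, independent features)."""
    best, best_lp = None, -math.inf
    for c, (prior, stats) in params.items():
        lp = math.log(prior)
        for xi, (mean, var) in zip(x, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (xi - mean) ** 2 / (2 * var)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]))  # near the class-0 cluster
print(predict(model, [3.1, 4.0]))  # near the class-1 cluster
```
Treating features independently is what makes the class posterior factor into a cheap product of per-feature likelihoods, which is why the method scales well to high-dimensional inputs.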

Support vector machines | Classification
  • A discriminative classifier that outputs an optimal hyperplane to categorize new examples. The vectors that define the hyperplane are the support vectors.
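
A minimal sketch of a soft-margin linear SVM trained in the primal with stochastic sub-gradient descent on the hinge loss (Pegasos-style updates); the data values are invented, and a real implementation would use a dedicated solver:

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=300, seed=0):
    """Soft-margin linear SVM via stochastic sub-gradient descent on the
    regularized hinge loss. Labels must be +1/-1; a constant feature is
    appended so the bias is learned together with the weights."""
    Xa = [x + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    t = 0
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(Xa)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1:                 # point inside the margin: push it out
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w

def svm_predict(w, x):
    s = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if s >= 0 else -1

# Linearly separable toy data (hypothetical descriptor values)
X = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]]
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
```
The points whose margins are active (those with `margin < 1` at the optimum) are exactly the support vectors that define the separating hyperplane.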

Random forest | Classification/Regression
  • An ensemble of simple tree predictors that vote for the most popular class in classification problems. For regression problems, the tree responses are averaged to estimate the dependent variable.

  • Overfitting is less likely to occur as more decision trees are added to the forest.
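
A minimal sketch of the idea in Python, using depth-1 trees (decision stumps) for brevity rather than full decision trees, and bootstrap resampling (bagging) plus majority voting; all data values are invented:

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Best single-feature threshold split, scored by training accuracy."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[j] <= t]
            right = [yi for x, yi in zip(X, y) if x[j] > t]
            if not left or not right:
                continue
            lmaj = Counter(left).most_common(1)[0][0]
            rmaj = Counter(right).most_common(1)[0][0]
            acc = (sum(l == lmaj for l in left) + sum(r == rmaj for r in right)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, j, t, lmaj, rmaj)
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each stump is trained on a bootstrap resample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def forest_predict(forest, x):
    """Majority vote over the ensemble of stumps."""
    votes = Counter((lmaj if x[j] <= t else rmaj) for j, t, lmaj, rmaj in forest)
    return votes.most_common(1)[0][0]

# Two well-separated toy classes (hypothetical values)
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
```
Averaging votes over many resampled trees is what damps the variance of any single tree, which is why adding trees does not increase overfitting.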

K‐nearest‐neighbors | Classification/Regression
  • A nonparametric algorithm that predicts from feature similarity, assuming that similar examples lie close to one another.

  • Useful for classification when there is little or no prior knowledge about the data distribution.
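
A minimal k-nearest-neighbors classifier in Python (majority vote among the k closest training points); data values are invented:

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two compact classes
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X_train, y_train, [0.5, 0.5]))  # "A"
print(knn_predict(X_train, y_train, [5.5, 5.5]))  # "B"
```
Because no model is fitted in advance, all work happens at prediction time, which is exactly why no assumption about the data distribution is needed.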

Artificial neural networks | Classification/Regression
  • A method that learns from input data through layers of connected neurons: an input layer, one or more hidden layers, and an output layer.
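
A minimal sketch of such a network in Python: a 2-2-1 architecture with sigmoid neurons trained by gradient descent. For clarity the gradients are estimated by finite differences rather than backpropagation; the task (logical AND) and all initial values are invented:

```python
import math, random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(x, layers):
    """Propagate an input through fully connected layers
    (input layer -> hidden layer -> output layer)."""
    a = list(x)
    for W, b in layers:
        a = [sigmoid(sum(w * ai for w, ai in zip(row, a)) + bi)
             for row, bi in zip(W, b)]
    return a

def mse(layers, X, y):
    """Mean squared error of the network over the dataset."""
    return sum((forward(x, layers)[0] - t) ** 2 for x, t in zip(X, y)) / len(X)

def grad_step(layers, X, y, lr=0.5, eps=1e-5):
    """One gradient-descent step; gradients estimated by finite differences
    for clarity (real implementations use backpropagation)."""
    grads = []
    for W, b in layers:
        for row in W + [b]:          # weight rows plus the bias vector
            for j in range(len(row)):
                old = row[j]
                row[j] = old + eps; up = mse(layers, X, y)
                row[j] = old - eps; down = mse(layers, X, y)
                row[j] = old
                grads.append((row, j, (up - down) / (2 * eps)))
    for row, j, g in grads:          # apply updates after all gradients
        row[j] -= lr * g

# Hypothetical toy task: learn logical AND
rng = random.Random(1)
layers = [
    ([[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)], [0.0, 0.0]),
    ([[rng.uniform(-1, 1) for _ in range(2)]], [0.0]),
]
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
before = mse(layers, X, y)
for _ in range(200):
    grad_step(layers, X, y)
after = mse(layers, X, y)
```
The training loop steadily reduces the error, illustrating how the connection weights between layers are what the network actually "learns".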

Deep neural network | Classification/Regression
  • A collection of neurons organized in a sequence of multiple layers.

  • A type of artificial neural network with several advantages (i.e., shared weights [parameter sharing], spatial relations, and local receptive fields).

  • Learning can be supervised, unsupervised, or semisupervised.

  • End‐to‐end learning and transfer learning are major approaches used with deep neural networks.

  • Autoencoders and generative adversarial networks are the two specific forms of deep neural networks.

Multiple regression | Regression
  • A statistical approach for finding the relationship between a dependent variable and two or more independent variables.
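
A minimal multiple-regression sketch in Python, solving the ordinary-least-squares normal equations with Gaussian elimination; the data is generated noise-free from invented coefficients so the fit can be checked by eye:

```python
def fit_multiple_regression(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination; an intercept column is prepended."""
    A = [[1.0] + list(row) for row in X]          # design matrix with intercept
    n, p = len(A), len(A[0])
    XtX = [[sum(A[i][j] * A[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    Xty = [sum(A[i][j] * y[i] for i in range(n)) for j in range(p)]
    M = [row[:] + [b] for row, b in zip(XtX, Xty)]  # augmented matrix
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))  # partial pivot
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col and M[r][col]:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (noise-free for clarity)
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [1 + 2 * a + 3 * b for a, b in X]
print(fit_multiple_regression(X, y))  # ≈ [1.0, 2.0, 3.0]
```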

Unsupervised learning
  • A self‐organizing model that learns underlying patterns or structure directly from unlabeled data.

Algorithm | Task | Description
K‐means clustering | Clustering
  • A clustering method that divides data into k groups by minimizing within‐group distances to the centroid.
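
A minimal k-means sketch in Python (Lloyd's algorithm); the data values are invented, and the simple "first k points" initialization is only for determinism — real implementations use random restarts or k-means++:

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points, until stable."""
    centroids = [list(p) for p in points[:k]]   # naive deterministic init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new = [[sum(c) / len(cl) for c in zip(*cl)] if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:                    # assignments have settled
            break
        centroids = new
    return centroids

# Two tight, hypothetical groups of points
pts = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]]
cents = sorted(kmeans(pts, 2))
print(cents)
```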

Fuzzy clustering | Clustering
  • A form of clustering (Fuzzy C‐means clustering) in which each data point can belong to more than one cluster.

  • It computes the coefficients of being in the clusters for each data point.
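
A minimal fuzzy C-means sketch in Python, showing the alternating membership/centroid updates; data values are invented, and the deterministic initialization is only for illustration:

```python
import math

def fuzzy_c_means(points, c, m=2.0, iters=100):
    """Fuzzy C-means: every point gets a membership coefficient in each of
    the c clusters; centroids are membership-weighted means."""
    centroids = [list(p) for p in points[:c]]   # naive deterministic init
    U = []
    for _ in range(iters):
        # Membership update: u[i][k] depends on relative distances
        U = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centroids]
            row = [1.0 / sum((d[k] / d[j]) ** (2 / (m - 1)) for j in range(c))
                   for k in range(c)]
            U.append(row)
        # Centroid update: weighted mean with weights u^m
        for k in range(c):
            w = [U[i][k] ** m for i in range(len(points))]
            tot = sum(w)
            centroids[k] = [sum(wi * p[dim] for wi, p in zip(w, points)) / tot
                            for dim in range(len(points[0]))]
    return U, centroids

# Two hypothetical groups of points
pts = [(0, 0), (0.5, 0), (0, 0.5), (8, 8), (8.5, 8), (8, 8.5)]
U, centroids = fuzzy_c_means(pts, 2)
```
Each row of `U` sums to 1: a point belongs to all clusters at once, just with different coefficients, which is the defining difference from hard k-means.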

Hierarchical clustering | Clustering
  • A clustering method that builds a hierarchy of clusters by repeatedly merging the two closest clusters. The algorithm ends when only one cluster is left.
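
A minimal agglomerative sketch in Python using single linkage (cluster distance = minimum pairwise point distance), stopped at k clusters rather than at one for easy inspection; data values are invented:

```python
import math

def single_linkage(points, k):
    """Agglomerative clustering: start with one cluster per point and
    repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters

# Two hypothetical groups of points
pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
out = single_linkage(pts, 2)
```
Running the loop all the way down to one cluster records the full merge hierarchy (the dendrogram) instead of a single partition.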

Principal component analysis | Dimensionality reduction
  • A nonparametric statistical technique that applies an orthogonal transformation to convert a set of correlated features into new, linearly uncorrelated variables called principal components.
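
A minimal PCA sketch in Python for 2-D data: center the points, form the covariance matrix, and find its leading eigenvector (the first principal component) by power iteration; data values are invented:

```python
import math

def first_principal_component(X, iters=200):
    """Power iteration on the 2x2 covariance matrix to find the direction
    of maximum variance (the first principal component)."""
    n = len(X)
    mx = sum(p[0] for p in X) / n
    my = sum(p[1] for p in X) / n
    C = [(p[0] - mx, p[1] - my) for p in X]       # centered data
    sxx = sum(a * a for a, _ in C) / n
    syy = sum(b * b for _, b in C) / n
    sxy = sum(a * b for a, b in C) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)            # keep unit length
    return v

# Hypothetical points scattered mostly along the line y = x
pts = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]
v = first_principal_component(pts)
print(v)  # roughly (0.707, 0.707): the y = x direction
```
Projecting the data onto the first few such components is what achieves the dimensionality reduction.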

Independent component analysis | Dimensionality reduction
  • A statistical method that separates a multivariate signal into statistically independent additive components.

Autoencoders | Dimensionality reduction
  • A deep neural network trained with backpropagation to reconstruct its original input

Deep belief nets | Dimensionality reduction
  • Probabilistic generative models with many layers of stochastic, latent variables. Each layer is a restricted Boltzmann machine.

Generative adversarial networks | Anomaly detection
  • Deep generative models that use two neural networks, pitting one against the other (thus the “adversarial”) to generate new synthetic but realistic instances of data.

Self‐organizing map | Dimensionality reduction
  • A competitive learning network that reduces the input dimensionality to represent its distribution as a map.

Semisupervised learning
  • A combination of supervised and unsupervised learning methods that uses a small amount of labeled data together with a large amount of unlabeled data during training to gain more understanding of the sample population.

Active learning
  • A particular case of semisupervised learning, where the algorithm is allowed to query the user for the labels of a subset of training instances.

  • Used to construct a high‐performance classifier while keeping the size of the training data set to a minimum by actively selecting the most valuable data points.
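
A minimal pool-based uncertainty-sampling sketch in Python. The model here is a simple nearest-centroid classifier chosen for brevity (not prescribed by the table), and all data values are invented:

```python
import math

def centroid_classifier(X, y):
    """Nearest-centroid model: one mean vector per class."""
    cents = {}
    for c in set(y):
        rows = [x for x, yi in zip(X, y) if yi == c]
        cents[c] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return cents

def uncertainty(cents, x):
    """Margin between the two nearest class centroids; small = uncertain."""
    d = sorted(math.dist(x, ctr) for ctr in cents.values())
    return d[1] - d[0]

# Small labeled seed set plus an unlabeled pool (hypothetical values)
X_lab = [[0.0, 0.0], [4.0, 4.0]]
y_lab = ["inactive", "active"]
pool = [[0.2, 0.1], [2.0, 2.1], [3.9, 4.1]]

cents = centroid_classifier(X_lab, y_lab)
# Query the pool point the current model is least certain about
query = min(pool, key=lambda x: uncertainty(cents, x))
print(query)  # the mid-point [2.0, 2.1], closest to the decision boundary
```
The labeled set grows one informative example at a time: the user labels `query`, the model is retrained, and the loop repeats, keeping the training set small.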

Reinforcement learning
  • A dynamic‐programming‐based approach that trains algorithms through a system of reward and punishment to maximize cumulative performance.

Transfer learning
  • A deep learning technique that enables developers to harness a neural network trained for one task and apply it to another domain.

  • It allows the reuse of a pretrained deep neural network on a new task with only a small amount of data.

  • Useful when the data available for a new domain is insufficient to train a neural network from scratch, but a large preexisting data pool exists that can be transferred.

Multitask learning
  • An approach to inductive transfer that improves generalization performance of multiple related tasks by leveraging useful information among them.

  • Useful when there are multiple related tasks, each of which has limited training samples.

Multiple kernel learning
  • A flexible learning method that uses a predefined set of kernels and learns convex combinations of kernels over potentially different domains.

  • Used when there are heterogeneous sources of data for the task at hand.

Ensemble learning
  • A meta‐algorithm that combines decisions from multiple models into one predictive model to decrease variance (bagging), bias (boosting), or improve predictions (stacking).
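
A minimal hard-voting sketch in Python: several weak models are combined so the majority label wins. The three single-feature rules are invented for illustration:

```python
from collections import Counter

def vote(models, x):
    """Hard-voting ensemble: each model predicts, the majority label wins."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Three weak, hypothetical threshold rules over a single feature
models = [
    lambda x: "active" if x > 2 else "inactive",
    lambda x: "active" if x > 3 else "inactive",
    lambda x: "active" if x > 5 else "inactive",
]
print(vote(models, 4))  # "active": two of three rules agree
print(vote(models, 1))  # "inactive": all three agree
```
Bagging trains such models on bootstrap resamples to cut variance, boosting reweights examples sequentially to cut bias, and stacking replaces the vote with a learned combiner.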

End‐to‐end learning
  • A deep learning process in which all of the parameters are trained jointly rather than step by step. It allows a deep neural network to be trained on raw data without handcrafted descriptors. Because the pipeline is replaced with a single learning algorithm that goes directly from the input to the desired output, it overcomes limitations of the traditional stepwise approach.

Note: The rows with gray backgrounds show the basic learning categories and their definition, while the rows following supervised and unsupervised learning parts display the different algorithms used in these categories.