Supervised learning
Uses human-coded information to train machine learning models to classify unseen data.
Random forest
An ensemble of decision trees. An item is classified according to the most common output across all of the trees. Because each tree is trained on a different random sample of the training data, a random forest is less prone to over-fitting than a single decision tree.
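As an illustration (not taken from the source), a random forest can be trained and queried in a few lines; the sketch below assumes scikit-learn and uses its built-in iris dataset purely as stand-in labelled data.

```python
# Minimal sketch of a random forest classifier (illustrative only; the iris
# dataset stands in for "human-coded" training data).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees sees a different bootstrap sample of the training data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Unseen items are classified by the most common output across all trees.
print(forest.predict(X_test[:5]))
print("accuracy:", forest.score(X_test, y_test))
```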
Support vector machines (SVMs)
SVMs construct models that separate the training data into different classes by finding the boundary with the widest margin between them. When presented with new data, these models predict which class the data should belong to.
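A comparable hedged sketch for an SVM, again assuming scikit-learn and its iris dataset for demonstration:

```python
# Minimal sketch of a support vector machine classifier (illustrative only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The SVM learns a boundary that separates the training classes.
model = SVC(kernel="rbf")
model.fit(X_train, y_train)

# New data are assigned to whichever class region they fall in.
print(model.predict(X_test[:5]))
```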
Artificial neural networks (ANNs)
ANNs are modelled on the design of the brain. Owing to their structure of interconnected layers of neurons, they have been likened to the outer cortex of the brain. These interconnected layers analyse and classify the input data; the greater the number of layers a network has, the higher the level of analysis, and this forms the basis of deep learning. ANNs learn which connections are the most useful for classifying data and weight these connections accordingly.
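For illustration only, a small feed-forward network (a multilayer perceptron) can be built with scikit-learn; the two hidden-layer sizes below are arbitrary choices, not values from the source.

```python
# Minimal sketch of an artificial neural network (a small multilayer perceptron).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers of interconnected "neurons"; training adjusts the
# connection weights so that the most useful connections are strengthened.
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
net.fit(X_train, y_train)

print("accuracy:", net.score(X_test, y_test))
```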
Unsupervised learning
The model is not provided with human-coded outcomes, so it must classify the data based on its own analysis. This has the potential to identify novel relationships within the data.
Clustering techniques
These methods are similar in aim to SVMs; however, because the data are unlabelled, the model cannot classify them using human-coded information. Instead, it identifies the natural groupings, or clusters, in the data and uses these clusters to classify new data.
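A minimal clustering sketch, assuming scikit-learn's k-means implementation and an arbitrary choice of three clusters; the labels in the dataset are deliberately ignored to mimic unlabelled data.

```python
# Minimal sketch of clustering with k-means (illustrative only).
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)   # labels are deliberately ignored

# The model finds natural groupings in the unlabelled data...
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

# ...and new data are assigned to the nearest cluster.
print(kmeans.predict(X[:5]))
```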
Naive Bayes
A family of techniques that apply Bayes’ theorem (which describes how the probability of an event is updated in the light of prior evidence) to classify data, under the ‘naive’ assumption that the features of the data are independent of one another.
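As a hedged example, scikit-learn's Gaussian naive Bayes classifier applies this idea; note that this particular implementation is trained on labelled examples, so the sketch uses the iris labels purely for demonstration.

```python
# Minimal sketch of a (Gaussian) naive Bayes classifier, which combines
# Bayes' theorem with the assumption that features are independent.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB()
nb.fit(X_train, y_train)

# Each prediction is the class with the highest posterior probability.
print(nb.predict(X_test[:5]))
```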
Principal component analysis (PCA)
PCA is a technique that makes data easier to analyse by transforming potentially correlated variables into uncorrelated variables, known as principal components. These principal components allow features to be extracted from the original dataset.
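A minimal PCA sketch, assuming scikit-learn and reducing the four iris measurements to two principal components (an arbitrary choice for illustration):

```python
# Minimal sketch of principal component analysis (illustrative only).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Transform the four potentially correlated measurements into two
# uncorrelated principal components (a form of feature extraction).
pca = PCA(n_components=2)
components = pca.fit_transform(X)

print(components[:3])
print("variance explained:", pca.explained_variance_ratio_)
```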
Autoencoders
These are a type of ANN that encode the input into a compressed representation, learn from this compressed information, and then reconstruct the input as output. By compressing the input data, this technique aims to learn its most important features.
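As a rough illustration rather than a full deep-learning implementation, an autoencoder can be approximated in scikit-learn by training a small network to reproduce its own input through a narrow bottleneck layer; the 32-unit bottleneck below is an arbitrary choice.

```python
# Minimal sketch of an autoencoder, approximated with a small neural network
# trained to reconstruct its own input through a narrow hidden layer.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

X, _ = load_digits(return_X_y=True)
X = MinMaxScaler().fit_transform(X)          # scale pixel values to [0, 1]

# The 32-unit bottleneck forces the network to learn a compressed
# representation of the 64 input pixels before reconstructing them.
autoencoder = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000,
                           random_state=0)
autoencoder.fit(X, X)                        # the target is the input itself

reconstruction = autoencoder.predict(X[:1])
print("reconstruction error:", np.mean((reconstruction - X[:1]) ** 2))
```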
Reinforcement learning
The machine learns how to interact with its environment through trial and error in order to maximise a reward. It is analogous to how a baby learns to interact with its environment.
Q-Learning
In Q-learning, the agent learns a value (the ‘Q’ value) for each possible action in each situation it encounters, estimating the long-term reward of taking that action, and then chooses actions so as to maximise these values. (NB: whilst this is a powerful machine learning technique, its use in the medical field is limited at present.)
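For illustration only, the sketch below implements tabular Q-learning on a made-up five-state corridor in which the agent is rewarded for reaching the right-hand end; all states, actions and parameters are arbitrary assumptions for this example.

```python
# Minimal sketch of tabular Q-learning on a toy 5-state corridor:
# the agent starts at state 0 and earns a reward only on reaching state 4.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions)) # table of learned action values
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Epsilon-greedy: mostly exploit current Q values, sometimes explore.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))

        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == n_states - 1 else 0.0

        # Q-learning update: nudge Q towards reward + discounted best future value.
        Q[state, action] += alpha * (
            reward + gamma * np.max(Q[next_state]) - Q[state, action]
        )
        state = next_state

print(Q)  # in each state, moving right should end up with the higher value
```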