Table 1.
Algorithm | Learning type | Basic working principle |
---|---|---|
Regularised Linear Regression | Supervised | Extension of linear regression that adds a regularisation penalty to prevent overfitting. Finds a linear relationship between input variables and a continuous output variable |
Regularised Logistic Regression | Supervised | Extension of logistic regression that adds a regularisation penalty to prevent overfitting. Estimates probabilities using a logistic function, often used for binary classification |
Regularised Cox Proportional Hazards Model | Supervised | Extension of Cox regression that adds a regularisation penalty term. Used in survival analysis to model the time until an event occurs, focusing on the relationship between survival time and one or more predictors while accounting for censored data. Many machine learning algorithms have a variant that models time until an event occurs |
Decision Trees | Supervised | Splits data into branches to form a tree structure, making decisions based on features |
Random Forest | Supervised | Ensemble of Decision Trees, used for classification and regression, improving accuracy and reducing overfitting |
XGBoost | Supervised | A highly optimised gradient boosting library known for its speed and performance. It combines decision trees (‘weak learners’) sequentially into a stronger learner: each weak learner is trained on the errors of the preceding ones, targeting areas of poor model performance |
Support Vector Machines | Supervised | Separates data points into distinct categories by finding the dividing line (or hyperplane in higher-dimensional settings) that leaves the widest margin between the different sets of data points |
K-Nearest Neighbours | Supervised | Classifies data based on the majority vote of its ‘k’ nearest neighbours in the feature space |
K-Means Clustering | Unsupervised | Partitions data into ‘k’ distinct clusters based on feature similarity |
Hierarchical Clustering | Unsupervised | Creates a tree of clusters by iteratively grouping data points based on their similarity |
Self-Organising Maps | Unsupervised | Neural network based, used for dimensionality reduction and visualisation, organising high-dimensional data into a low-dimensional map |
Principal Component Analysis | Unsupervised | Reduces data dimensionality by transforming the original variables into a smaller set of uncorrelated variables (principal components) that retain most of the variance |
Neural Networks | Supervised/Unsupervised | Composed of interconnected nodes or neurons, mimicking the human brain, used in complex pattern recognition |
Deep Learning | Supervised/Unsupervised/Reinforcement | Utilises multilayered neural networks to analyse large and complex datasets, excelling in complex tasks like image and speech recognition and natural language processing. Usually not recommended for small datasets |
Q-Learning | Reinforcement | A model-free reinforcement learning algorithm that seeks to learn a policy, which tells an agent what action to take under what circumstances |
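
To make one row of Table 1 concrete, the majority-vote principle behind K-Nearest Neighbours can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `knn_predict`, the choice of Euclidean distance, and the toy data are all hypothetical and chosen only for this example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point (an assumption;
    # other distance measures can be used)
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours; the sort is stable, so ties keep input order
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in a 2D feature space
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))
```

The value of `k` controls how local the decision is: a small `k` follows individual neighbours closely, while a larger `k` smooths the vote over a wider region of the feature space.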