Table 1.
Comparison between supervised and unsupervised machine learning.
Aspect | Supervised ML | Unsupervised ML |
---|---|---|
Initial data | Prior knowledge of the expected type of output in the form of labelled examples, i.e. input variables (features) paired with a known output variable (label). | No examples of the expected type of output, i.e. only input variables (features). |
Goal | Predict the class or value of unseen data, e.g. given new input data (features), predict the output variable (label). | Model the underlying structure or distribution of the input data to detect novel patterns that are difficult or impossible to detect by manual human observation. |
Method | Algorithms search for statistical patterns within labelled examples (i.e. training data) so that similar patterns can be detected in unseen/future data. Learning from training data in this way can be thought of as a teacher supervising the learning process. | Unsupervised algorithms group unsorted/unordered information based on similarities and differences. These algorithms are given no labelled examples of the expected output. The unsupervised learning process can be thought of as learning without a teacher. |
Example | A dog image is not a random collection of pixels; there is a pattern specific to a dog. The more examples the algorithm sees, the more finely tuned it becomes in learning the relationship between the features (pixels) and the label (dog image), and the more accurate it can be in performing the classification task. | Given unsorted images of cats and dogs, unsupervised algorithms attempt to figure out on their own how to sort them according to similarities, relationships, and differences, even though the algorithm has no notion of the type of categories to expect. |
Expected output | Correct answers are known in the form of training data. The ML algorithm iteratively makes predictions on the training data to determine (learn) optimum algorithm values/parameters. Learning stops when the difference (loss) between prediction and correct answer reaches an acceptable minimum. | No labelled training data, so correct answers are typically unknown. ‘Appropriate interpretation of results’ and ‘validation that the algorithm has solved the intended problem’ are at the discretion of the user. |
ML = machine learning
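
The contrast in Table 1 can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration rather than a method taken from the table: the toy 2-D feature values, the "cat"/"dog" labels, the helper names (`fit_supervised`, `predict`, `fit_unsupervised`), and the choice of a nearest-centroid classifier for the supervised case and a basic k-means loop for the unsupervised case.

```python
# Toy 2-D feature vectors (invented for illustration): two well-separated groups.
points = [(1.0, 1.2), (0.8, 0.9), (1.1, 1.0), (5.0, 5.2), (4.8, 5.1), (5.2, 4.9)]
# Labels are available only in the supervised setting.
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]


def mean(pts):
    """Component-wise mean of a non-empty list of 2-D points."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def fit_supervised(points, labels):
    """Supervised: learn one centroid per label from labelled examples."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: mean(pts) for y, pts in groups.items()}


def predict(centroids, p):
    """Return the label whose learned centroid is closest to p."""
    return min(centroids, key=lambda y: sq_dist(centroids[y], p))


def fit_unsupervised(points, k=2, max_iter=20):
    """Unsupervised: group points into k clusters from features only (basic k-means)."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(centroids[j], p)) for p in points
        ]
        if new_assignment == assignment:  # stop once the grouping is stable
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = mean(members)
    return assignment


label_centroids = fit_supervised(points, labels)
print(predict(label_centroids, (0.9, 1.1)))  # label of the nearest learned centroid

cluster_ids = fit_unsupervised(points)
print(cluster_ids)  # arbitrary integer cluster ids, not "cat"/"dog" names
```

The supervised function returns a named label because it was given names to learn from. The unsupervised function can only return anonymous cluster ids, so deciding what each cluster means is left to the user, as the "Expected output" row notes.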