| Approach | Goal | Strengths | Limitations |
|---|---|---|---|
| Supervised learning (e.g., support vector machine [84], random forest [85]) | Learn a model that discriminates one class of biological phenomena from one or more other classes. | Precise model with both predictive and interpretive properties. | Requires large, roughly equal numbers of examples from each class. |
| Unsupervised learning (e.g., K-means [86], hierarchical clustering [87]) | Learn a model that describes the biological phenomena in the data. | Does not require class labels on the data. | Sensitive to the choice of similarity measure; results can be difficult to interpret. |
| Semi-supervised learning (e.g., transduction [88]) | Learn a model from a mixture of labeled and unlabeled data. | Uses all available data; typically outperforms learning from labeled data alone. | Sensitive to errors made when propagating class labels from labeled to unlabeled data. |
| Feature selection (e.g., PCA [89], LDA [90], wrapper methods [91]) | Reduce a large number of features to fewer, more informative ones. | Improves the efficiency and accuracy of learning. | Sensitive to the feature-evaluation metric; may discard informative features. |
| Active learning (e.g., uncertainty sampling [92], most informative instance [93]) | Identify the most informative instances to label so that an accurate model can be learned. | Reduces the number of labeled examples needed, lowering the burden on human experts and the cost of experiments. | May focus the learner on outliers rather than on the prominent classes. |
| Imbalanced-class learning (e.g., minority over-sampling [94], boosting [95]) | Learn in the presence of a large skew in the number of examples per class. | Can learn from relatively few examples of the biological phenomenon of interest. | May underfit or overfit the data, depending on the degree of bias toward the minority class. |
| Deep learning (e.g., DeepBind [14], DeepMotif [15]) | Learn complex representations of concepts in the data. | General-purpose, with high accuracy. | Sensitive to parameter choices; long training times. |
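To make the unsupervised row concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It assumes numeric feature vectors and, for reproducibility, seeds the centroids with the first k points; practical implementations typically use random or k-means++ seeding. The names `kmeans` and `euclidean` are illustrative. The distance function is a parameter, which is exactly where the sensitivity to the similarity measure noted in the table enters.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance; swapping in another metric can change the clusters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, max_iter: int = 100,
           dist: Callable[[Vector, Vector], float] = euclidean) -> List[int]:
    """Cluster `points` into k groups with Lloyd's algorithm; returns one label per point.

    Assumption: centroids start at the first k points (deterministic, for illustration).
    """
    centroids = [list(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels if labels is not None else []
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` places the first two points in one cluster and the last two in the other. No class labels are used anywhere, but interpreting what each cluster means is left to the analyst.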
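The semi-supervised row can be illustrated with a deliberately simple greedy label-propagation sketch. This is a swapped-in technique, not the specific transductive method cited above: repeatedly take the unlabeled point nearest to any labeled point and copy that neighbour's label. The function name `propagate_labels` and the Euclidean nearest-neighbour rule are assumptions for illustration. Because each newly assigned label is itself propagated further, a single early mistake can spread, which is the sensitivity noted in the table.

```python
import math
from typing import Dict, List, Sequence


def propagate_labels(points: List[Sequence[float]],
                     seed_labels: Dict[int, str]) -> Dict[int, str]:
    """Greedy nearest-neighbour label propagation (illustrative, not optimised).

    `seed_labels` maps point indices to known class labels; all other points
    are treated as unlabeled and receive a propagated label.
    """
    labeled = dict(seed_labels)
    unlabeled = set(range(len(points))) - set(labeled)
    if unlabeled and not labeled:
        raise ValueError("need at least one labeled point")
    while unlabeled:
        # Closest (unlabeled, labeled) pair under Euclidean distance.
        u, src = min(((u, s) for u in unlabeled for s in labeled),
                     key=lambda pair: math.dist(points[pair[0]], points[pair[1]]))
        labeled[u] = labeled[src]  # an error here is copied to later neighbours
        unlabeled.remove(u)
    return labeled
```

With points at 0, 1, 2, 9 and 10 on a line and only the two end points labeled (`{0: "A", 4: "B"}`), the labels spread inward so that 1 and 2 become `"A"` and 9 becomes `"B"`; all five points are then used, even though only two were labeled.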
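For the feature-selection row, the following is a minimal plain-Python sketch of the core step of PCA: centre the data, form the sample covariance matrix, and find its leading eigenvector by power iteration. Strictly, PCA constructs new features (linear combinations of the originals) rather than keeping a subset of them. The function name, the fixed iteration count and the uniform starting vector are assumptions; power iteration also assumes the largest eigenvalue is strictly dominant and that the starting vector is not orthogonal to its eigenvector.

```python
import math
from typing import List, Sequence


def first_principal_component(data: List[Sequence[float]], iters: int = 500) -> List[float]:
    """Unit vector along the direction of greatest variance, via power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # assumed starting vector (unit length)
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalise to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate input, e.g. zero variance in every feature
        v = [x / norm for x in w]
    return v
```

Projecting each centred row onto the returned vector (a dot product) replaces d original features with a single derived feature; repeating the procedure on the deflated covariance matrix would yield further components.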
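Uncertainty sampling, listed in the active-learning row, can be sketched as follows: score each unlabeled instance by how close the current model's predicted probability of the positive class is to 0.5, and send the least certain instances to the expert for labeling. This assumes a binary problem and a model that exposes a probability function; `uncertainty_sample` and the logistic `toy_model` are hypothetical stand-ins, not part of any cited method.

```python
import math
from typing import Callable, List, Sequence


def uncertainty_sample(pool: Sequence[Sequence[float]],
                       predict_proba: Callable[[Sequence[float]], float],
                       batch: int = 1) -> List[int]:
    """Indices of the `batch` pool instances the model is least certain about.

    Assumes binary labels; `predict_proba(x)` returns the model's P(y = 1 | x).
    """
    # Distance from 0.5: 0 means maximally uncertain, 0.5 means fully confident.
    margin = [abs(predict_proba(x) - 0.5) for x in pool]
    return sorted(range(len(pool)), key=lambda i: margin[i])[:batch]


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical current model: a logistic curve centred at x[0] = 5."""
    return 1.0 / (1.0 + math.exp(-(x[0] - 5.0)))


if __name__ == "__main__":
    pool = [[0.0], [4.8], [9.0], [5.5]]
    print(uncertainty_sample(pool, toy_model, batch=2))  # → [1, 3]
```

Only the two instances nearest the decision boundary are queried, which is how the approach saves labeling effort. The same rule explains the limitation in the table: noisy or atypical points often sit near the boundary and can be selected ahead of representative ones.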
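Minority over-sampling in the style of SMOTE (Synthetic Minority Over-sampling Technique) can be sketched as creating synthetic minority examples by interpolating between a randomly chosen minority example and one of its k nearest minority neighbours. This assumes numeric features and at least two minority examples; the name `oversample_minority`, the default `k = 3` and the fixed seed are illustrative choices, not details taken from the cited work.

```python
import math
import random
from typing import List, Sequence, Tuple


def oversample_minority(minority: List[Sequence[float]], n_new: int,
                        k: int = 3, seed: int = 0) -> List[Tuple[float, ...]]:
    """Generate n_new synthetic minority examples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)  # fixed seed for reproducibility
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself.
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic
```

Adding the synthetic points to the minority class before training reduces the class skew without discarding any majority examples. Generating too many points in a small region is one way the approach can overfit, as the table warns.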