2022 Oct 3;8:37. doi: 10.1038/s41540-022-00247-4

Table 3.

Learning-based Methods for Predicting Molecular Interactions.

| Type | Advantages | Disadvantages | Applications |
| --- | --- | --- | --- |
| Supervised learning | Uses the full label information of omics data. | Relies heavily on the size of the labeled data. Preprocessing for noise and features may be needed, but this causes information loss. | Logistic regression for genome-wise prediction of relevant functions, diseases, and traits [104]; multiple kernel learning for predicting the drug response of cancer cell lines from omics profiles and pathways [90]; support vector machine for drug-target interaction (DTI) [75]; convolutional neural network for identifying drug-drug interactions (DDIs) from documents [56]; random forests for predicting protein contacts [79]. |
| Unsupervised learning | Needs no data labels. Suitable when labeled data are scarce and expensive to obtain. | Loses the informative features that labels provide. | Autoencoder for denoising a single-cell RNA-sequencing model [93] and for extracting representative features from drug molecular structures and protein sequences [95]; hierarchical clustering for grouping patients by genome-wise similarity and variability [91]. |
| Semi-supervised learning | Combines the feature-extraction benefits of unsupervised learning with full use of the informative labeled data. | The algorithms rely on assumptions about the data. If those assumptions do not hold, the trained model loses generalization on test data. | Autoencoder-based semi-supervised learning for predicting DTI [99], DDI [100], and protein-protein interaction (PPI) [101]; label propagation for predicting DDIs [89]. |
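
To make the semi-supervised row of Table 3 concrete, the following is a minimal sketch of label propagation on a small similarity graph. It is an illustration only and not the method of the cited DDI work: the node names, the similarity weights, and the function name `propagate_labels` are all hypothetical. Labeled nodes keep their known scores, and each unlabeled node repeatedly takes the similarity-weighted average of its neighbors' scores. The sketch also shows the assumption the table warns about: it only works if similar nodes tend to share labels.

```python
def propagate_labels(weights, labels, n_iter=100, tol=1e-9):
    """Spread known labels over a similarity graph (illustrative sketch).

    weights: dict mapping node -> {neighbor: similarity}; every neighbor
             must also appear as a key (hypothetical input format).
    labels:  dict mapping labeled node -> score in [0, 1].
    Returns a dict mapping every node to a propagated score.
    """
    # Unlabeled nodes start at a neutral 0.5; labeled nodes start at their label.
    scores = {node: labels.get(node, 0.5) for node in weights}
    for _ in range(n_iter):
        max_change = 0.0
        new_scores = {}
        for node, neighbors in weights.items():
            if node in labels:
                # Clamp labeled nodes to their known value.
                new_scores[node] = labels[node]
                continue
            total = sum(neighbors.values())
            if total == 0:
                # Isolated node: nothing to propagate, keep the current score.
                new_scores[node] = scores[node]
                continue
            value = sum(w * scores[nb] for nb, w in neighbors.items()) / total
            max_change = max(max_change, abs(value - scores[node]))
            new_scores[node] = value
        scores = new_scores
        if max_change < tol:
            break
    return scores


# Hypothetical chain A - B - C - D with equal similarities.
# A is a known positive (1.0) and D a known negative (0.0); B and C are unlabeled.
similarity = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
known = {"A": 1.0, "D": 0.0}

scores = propagate_labels(similarity, known)
print({node: round(score, 3) for node, score in scores.items()})
# B settles near 2/3 and C near 1/3: closer to the positive node means a higher score.
```

In practice the graph would be built from real similarity measures, and the resulting scores would be thresholded or ranked; both choices are outside the scope of this sketch.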