2022 Mar 29;23(3):bbac106. doi: 10.1093/bib/bbac106

Table 5.

Summary of SL prediction methods and representative models

| Methods and representative models | Description | Advantages | Disadvantages | Application scenarios |
| --- | --- | --- | --- | --- |
| Statistical-based methods | Fit existing data based on a certain hypothesis | Work from the perspective of systems biology; do not require known SL data | The selection of the hypothesis or threshold is highly subjective and unstable | There are insufficient known SL data |
| DAISY [13] | Identifies SL interactions in cancer through three statistical procedures in parallel | Comprehensible to biologists; mines data from clinical cancer samples | The biological data are at times noisy and inaccurate | Identification of clinically relevant SL interactions in cancer |
| Network-based methods | Study SL pairs from the perspective of biological networks | Add network structure information to gain a more comprehensive, global understanding of genes | Network data are incomplete and contain a lot of noise | There are insufficient known SL data |
| IDLE [21] | Predicts enzymatic SDLs from a GSMM | The first computational method that captures enzymatic SDL effects in metabolic networks; uncovers the mechanisms behind SDLs | Does not integrate additional data sources such as patient-specific omics data | Identifies SDLs that have a significant impact on tumors in clinical settings |
| Fast-SL [22] | Rapidly identifies SL pairs in metabolic networks | Overcomes the issue of computational complexity | Does not identify human SL gene pairs | Identifies higher-order SL pairs in metabolic networks |
| Classic ML methods | Learn general patterns from a limited set of known SL data and use those patterns to make predictions about unknown or unobserved SL gene pairs | Good performance on small data sets; effectively integrate multidimensional feature data | Features are generated manually, which requires understanding which features represent the data; lack of negative samples | Require known SL data and feature data of high quality |
| De Kegel et al. study [26] | RF-based model to predict paralog SL pairs | Makes interpretable predictions for paralog SL pairs | Restricted to the identification of paralog SL pairs | Identifies context-specific paralog SL pairs |
| GRSMF [28] | A GRSMF model | Is data-adaptive and avoids having to determine the dimension of the latent space | Focuses on mapping genes to latent representations and cannot aggregate information from neighbor genes | There are not enough negative samples |
| Deep learning methods | Use a multistep feature transformation to obtain a feature representation of the original data, which is then fed into the prediction function to obtain the final result | Discover deep features for representation learning and pattern recognition from large data sets; do not require manual feature extraction | Demand a large amount of data and computational resources; limited by the quality and quantity of the data, which contain many false positives and false negatives; models are hard to train; poor interpretability; lack of negative samples | Require sufficient known SL data and feature data of high quality |
| EXP2SL [41] | A semisupervised neural network method | Utilizes unlabeled SL data to predict cell-line-specific SL pairs; demonstrates that L1000 expression profiles are effective feature data for SL prediction | Limited sample space and cell lines | Predicts cell-line-specific SL pairs; there are insufficient labeled SL samples |
| DDGCN [31] | A dual-dropout GCN method | Uses an SL dataset of better quality; aggregates information from neighbor genes | Focuses solely on known SL pairs and ignores other data sources of genes | There are sufficient SL samples of high quality and insufficient feature data |
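
To make the network-based rows more concrete, the sketch below runs a brute-force double-knockout screen on a tiny, entirely hypothetical reaction network. It is not the Fast-SL or IDLE algorithm: viability is approximated here by simple reachability of a `biomass` metabolite rather than by a genome-scale metabolic model, the mapping from genes to reactions is omitted, and all reaction names, metabolite names, and function names are assumptions made for illustration. A pair is reported as synthetic lethal when each single knockout is viable but the double knockout is not.

```python
from itertools import combinations

# Hypothetical toy network: reaction name -> (substrates, products).
TOY_REACTIONS = {
    "uptake": ({"nutrient"}, {"A"}),
    "r1": ({"A"}, {"B"}),        # direct route A -> B
    "r2": ({"A"}, {"C"}),        # alternative route, step 1
    "r3": ({"C"}, {"B"}),        # alternative route, step 2
    "growth": ({"B"}, {"biomass"}),
}


def is_viable(reactions, seeds, target, knocked_out=frozenset()):
    """Return True if `target` is still producible after the knockouts."""
    available = set(seeds)
    changed = True
    while changed:  # propagate until no reaction adds a new metabolite
        changed = False
        for name, (substrates, products) in reactions.items():
            if name in knocked_out:
                continue
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return target in available


def screen_knockouts(reactions, seeds, target):
    """Return (essential reactions, synthetic lethal reaction pairs)."""
    names = sorted(reactions)
    # Reactions whose single knockout already abolishes the target.
    essential = [n for n in names if not is_viable(reactions, seeds, target, {n})]
    # Only individually dispensable reactions can form an SL pair.
    candidates = [n for n in names if n not in essential]
    sl_pairs = [
        pair
        for pair in combinations(candidates, 2)
        if not is_viable(reactions, seeds, target, set(pair))
    ]
    return essential, sl_pairs


essential, sl_pairs = screen_knockouts(TOY_REACTIONS, {"nutrient"}, "biomass")
print("essential:", essential)
print("SL pairs:", sl_pairs)
```

Because every pair of dispensable reactions is tested, the cost grows quadratically with network size, and faster for higher-order combinations; this is the computational burden the table credits Fast-SL with overcoming.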