2022 Mar 29;23(3):bbac106. doi: 10.1093/bib/bbac106

Table 5.

Summary of SL prediction methods and representative models

Methods and representative models	Description	Advantages	Disadvantages	Application scenarios
Statistical-based methods	Fit existing data under a chosen hypothesis	Take a systems biology perspective; do not require known SL data	The choice of hypothesis or threshold is highly subjective and unstable	Known SL data are insufficient
DAISY [13]	Identifies SL interactions in cancer through three statistical procedures run in parallel	Comprehensible to biologists; mines data from clinical cancer samples	The biological data are at times noisy and inaccurate	Identifies clinically relevant SL interactions in cancer
Network-based methods	Study SL pairs from the perspective of biological networks	Add network structure information to gain a more comprehensive, global understanding of genes	Network data are incomplete and noisy	Known SL data are insufficient
IDLE [21]	Predicts enzymatic SDLs from a GSMM	The first computational method that captures enzymatic SDL effects in metabolic networks; uncovers the mechanisms behind SDLs	Does not integrate additional data sources such as patient-specific omics data	Identifies SDLs that have a significant impact on tumors in clinical settings
Fast-SL [22]	Rapidly identifies SL pairs in metabolic networks	Overcomes the issue of computational complexity	Does not identify human SL gene pairs	Identifies higher-order SL pairs in metabolic networks
Classic ML methods	Learn general patterns from a limited set of known SL data and use those patterns to make predictions about unknown or unobserved SL gene pairs	Perform well on small data sets; effectively integrate multidimensional feature data	Rely on manually generated features, which requires understanding which features represent the data; lack negative samples	Require known SL data and feature data of high quality
De Kegel et al. study [26]	An RF-based model that predicts paralog SL pairs	Makes interpretable predictions for paralog SL pairs	Restricted to identifying paralog SL pairs	Identifies context-specific paralog SL pairs
GRSMF [28]	A graph regularized self-representative matrix factorization (GRSMF) model	Is data-adaptive and avoids having to determine the dimension of the latent space	Focuses on mapping genes to latent representations and cannot aggregate information from neighbor genes	Negative samples are insufficient
Deep learning methods	Use multistep feature transformations to obtain a feature representation of the original data, which is then fed into a prediction function to obtain the final result	Discover deep features for representation learning and pattern recognition from large datasets; do not require manual feature extraction	Demand large amounts of data and computational resources; limited by the quality and quantity of the data, which contain many false positives and false negatives; hard to train; poor interpretability; lack negative samples	Require sufficient known SL data and feature data of high quality
EXP2SL [41]	A semisupervised neural network method	Utilizes unlabeled SL data to predict cell-line-specific SL pairs; demonstrates that L1000 expression profiles are effective feature data for SL prediction	Limited sample space and cell lines	Predicts cell-line-specific SL pairs; labeled SL samples are insufficient
DDGCN [31]	A dual-dropout GCN method	Uses an SL dataset of better quality; aggregates information from neighbor genes	Focuses solely on known SL pairs and ignores other data sources of genes	SL samples are sufficient and of high quality, but feature data are insufficient
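
The statistical-based row above describes fitting data under a chosen hypothesis without using known SL pairs, and notes that the choice of hypothesis and threshold is subjective. The following is a minimal sketch of that idea, not the procedure of DAISY or of any other method in the table. It assumes one illustrative hypothesis (SL partners are co-inactivated in fewer samples than chance would predict) and tests it with a lower-tail hypergeometric p-value. The function names, the gene labels, the input format and the toy data are all hypothetical.

```python
from math import comb


def co_inactivation_pvalue(inact_a, inact_b):
    """Lower-tail hypergeometric p-value for the overlap of two inactivation calls.

    inact_a, inact_b: equal-length lists of booleans, one entry per sample,
    True when the gene is called inactivated in that sample (assumed format).
    A small p-value means the genes are co-inactivated less often than
    expected by chance, which is the illustrative hypothesis of this sketch.
    """
    if len(inact_a) != len(inact_b):
        raise ValueError("both genes need one call per sample")
    n = len(inact_a)    # total number of samples
    k_a = sum(inact_a)  # samples with gene A inactivated
    k_b = sum(inact_b)  # samples with gene B inactivated
    observed = sum(1 for a, b in zip(inact_a, inact_b) if a and b)
    # P(overlap <= observed) when B's inactivations are placed at random
    favourable = sum(
        comb(k_a, x) * comb(n - k_a, k_b - x) for x in range(observed + 1)
    )
    return favourable / comb(n, k_b)


def candidate_pairs(calls, alpha=0.05):
    """Return (gene1, gene2, p) for every pair whose p-value is below alpha.

    alpha is the subjective threshold mentioned in the table; no correction
    for multiple testing is applied in this sketch.
    """
    genes = sorted(calls)
    hits = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            p = co_inactivation_pvalue(calls[g1], calls[g2])
            if p < alpha:
                hits.append((g1, g2, p))
    return hits


# Toy inactivation calls for ten samples (made-up data, hypothetical gene labels).
toy_calls = {
    "GENE_A": [True] * 5 + [False] * 5,
    "GENE_B": [False] * 5 + [True] * 5,
    "GENE_C": [True, True, False, False, False, True, True, False, False, False],
}

if __name__ == "__main__":
    for alpha in (0.05, 0.001):
        print(alpha, candidate_pairs(toy_calls, alpha))
```

On this toy input, GENE_A and GENE_B are never inactivated together and are the only pair flagged at alpha = 0.05. Tightening the threshold to 0.001 flags nothing, which illustrates how strongly the result depends on the chosen cutoff.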