| Method | Principle | Advantages | Disadvantages | Application scenario |
|---|---|---|---|---|
| Statistical-based methods | Fit existing data based on certain hypotheses | Study SL from the perspective of systems biology; do not require known SL data | The choice of hypothesis or threshold is highly subjective and unstable | Known SL data are insufficient |
| DAISY [13] | Identifies SL interactions in cancer through three statistical procedures run in parallel | Comprehensible to biologists; mines data from clinical cancer samples | The biological data are at times noisy and inaccurate | Identification of clinically relevant SL interactions in cancer |
| Network-based methods | Study SL pairs from the perspective of biological networks | Add network structure information to gain a more comprehensive, global understanding of genes | Network data are incomplete and contain a lot of noise | Known SL data are insufficient |
| IDLE [21] | Predicts enzymatic SDLs from a GSMM | The first computational method that captures enzymatic SDL effects in metabolic networks; uncovers the mechanisms behind SDLs | Does not integrate additional data sources such as patient-specific omics data | Identifies SDLs that have a significant impact on tumors in clinical settings |
| Fast-SL [22] | Rapidly identifies SL pairs in metabolic networks | Overcomes the computational complexity of exhaustive searches | Does not identify human SL gene pairs | Identifies higher-order SL combinations in metabolic networks |
| Classic ML methods | Learn general patterns from a limited set of known SL data and use those patterns to predict unknown or unobserved SL gene pairs | Good performance on small datasets; effectively integrate multidimensional feature data | Rely on manually engineered features, which requires understanding which features represent the data; lack of negative samples | Require known SL data and high-quality feature data |
| De Kegel et al. study [26] | Random forest (RF)-based model to predict paralog SL pairs | Makes interpretable predictions for paralog SL pairs | Restricted to the identification of paralog SL pairs | Identifies context-specific paralog SL pairs |
| GRSMF [28] | A graph regularized self-representative matrix factorization (GRSMF) model | Data-adaptive; avoids having to determine the dimension of the latent space | Focuses on mapping genes to latent representations and cannot aggregate information from neighbor genes | Negative samples are insufficient |
| Deep learning methods | Use multistep feature transformations to obtain a representation of the original data, which is then fed into a prediction function to produce the final result | Discover deep features for representation learning and pattern recognition from large datasets; do not require manual feature extraction | Demand large amounts of data and computational resources; limited by the quality and quantity of the data, which contain many false positives and false negatives; hard to train; poor interpretability; lack of negative samples | Require sufficient known SL data and high-quality feature data |
| EXP2SL [41] | A semisupervised neural network method | Utilizes unlabeled data to predict cell-line-specific SL pairs; demonstrates that L1000 expression profiles are effective features for SL prediction | Limited sample space and cell lines | Predicts cell-line-specific SL pairs when labeled SL samples are insufficient |
| DDGCN [31] | A dual-dropout graph convolutional network (GCN) method | Uses a higher-quality SL dataset; aggregates information from neighbor genes | Focuses solely on known SL pairs and ignores other data sources for genes | SL samples are sufficient and of high quality, but feature data are insufficient |
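The hypothesis-driven screening summarized in the statistical rows can be illustrated with a one-sided hypergeometric test for under-representation of co-inactivation across tumor samples: if two genes are SL, cells that lose both should rarely survive, so both genes should be inactivated together less often than chance predicts. This is a minimal sketch of that intuition, not DAISY's implementation; the function name, the binary per-sample inactivation input, and the choice of a hypergeometric test are assumptions made for illustration.

```python
from math import comb


def co_inactivation_depletion_pvalue(gene_a, gene_b):
    """One-sided hypergeometric p-value that genes A and B are co-inactivated
    in fewer samples than expected if their inactivation were independent.

    Hypothetical helper (not DAISY's code). gene_a, gene_b: equal-length
    lists of 0/1 flags, one per tumor sample (1 = gene inactivated)."""
    if len(gene_a) != len(gene_b):
        raise ValueError("profiles must cover the same samples")
    n = len(gene_a)  # total number of samples
    k_a = sum(gene_a)  # samples in which A is inactivated
    k_b = sum(gene_b)  # samples in which B is inactivated
    observed = sum(1 for a, b in zip(gene_a, gene_b) if a and b)
    # P(X <= observed) for X ~ Hypergeometric(population=n, successes=k_a, draws=k_b)
    numerator = sum(comb(k_a, x) * comb(n - k_a, k_b - x) for x in range(observed + 1))
    return numerator / comb(n, k_b)


if __name__ == "__main__":
    a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    # Mutually exclusive inactivation yields a small p-value (candidate SL pair).
    print(co_inactivation_depletion_pvalue(a, b))
```

The threshold used to call a pair SL (for example p < 0.05, possibly after multiple-testing correction) is exactly the kind of subjective choice the table lists as a disadvantage of statistical methods.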
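The contrast drawn between GRSMF (mapping genes to latent representations) and DDGCN (aggregating information from neighbor genes) can be made concrete with a single graph-convolution step over a graph whose edges are known SL pairs. This is a hypothetical, dependency-free sketch: the names `gcn_layer` and `sl_score`, the symmetric normalization `D^{-1/2}(A + I)D^{-1/2}`, the inner-product decoder, and the use of random edge dropout as a simplified stand-in for DDGCN's dual-dropout scheme are all assumptions, not DDGCN's published architecture.

```python
import math
import random


def gcn_layer(n, edges, features, weights, edge_dropout=0.0, rng=None):
    """One graph-convolution step H' = ReLU(A_hat @ H @ W) over an SL graph.

    Hypothetical sketch. n: number of genes; edges: known SL pairs (i, j);
    features: n x d matrix H; weights: d x k matrix W; edge_dropout:
    probability of dropping each SL edge (assumed stand-in for DDGCN's dropout)."""
    rng = rng or random.Random(0)
    # Neighbor sets for A + I: every gene keeps a self-loop.
    neighbors = [{i} for i in range(n)]
    for i, j in edges:
        if rng.random() >= edge_dropout:  # keep the edge with prob 1 - edge_dropout
            neighbors[i].add(j)
            neighbors[j].add(i)
    degree = [len(nb) for nb in neighbors]
    k = len(weights[0])
    # Linear transform H @ W.
    hw = [[sum(h[t] * weights[t][c] for t in range(len(h))) for c in range(k)]
          for h in features]
    out = []
    for i in range(n):
        row = [0.0] * k
        for j in neighbors[i]:
            # Symmetric normalization 1 / sqrt(deg(i) * deg(j)).
            norm = 1.0 / math.sqrt(degree[i] * degree[j])
            for c in range(k):
                row[c] += norm * hw[j][c]
        out.append([max(0.0, v) for v in row])  # ReLU
    return out


def sl_score(embeddings, i, j):
    """Sigmoid of the inner product: a predicted probability that (i, j) is SL."""
    dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
    return 1.0 / (1.0 + math.exp(-dot))


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    h = gcn_layer(3, [(0, 1)], identity, identity)
    print(h)
    print(sl_score(h, 0, 1), sl_score(h, 0, 2))
```

In this toy graph, genes 0 and 1 share an SL edge, so one propagation step mixes their features and their pair score rises above that of the unconnected pair (0, 2); a pure factorization model without graph propagation would not share information this way.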