Table 2. PLI prediction methods formulated as regression tasks based on ML frameworks in recent years^a.
Tool^b | Date | Input protein features | Input compound features | Protein feature extractor | Compound feature extractor | Methods |
---|---|---|---|---|---|---|
SimBoost [26] | 04/2017 | Target similarity | Drug similarity | – | – | Gradient boosting tree model |
ACNN [83] | 2017 | Atomic coordinates | Atomic coordinates | Atomic convolution layer | Atomic convolution layer | Atomic fully connected layer |
DeepDTA [84] | 09/2018 | Label encoding | Label encoding | CNN blocks | CNN blocks | Fully connected layer |
DeepAffinity [46] | 02/2019 | Structural property sequence (SPS) representation | Canonical SMILES | Seq2seq autoencoders | Seq2seq autoencoders | Unified RNN-CNN |
WideDTA [85] | 02/2019 | Textual information | Textual information | CNN blocks | CNN blocks | Fully connected layers |
GraphDTA [86] | 06/2019 | One-hot encoding | Molecular graph | Convolutional layers | 4 graph neural network variants | Fully connected layers |
RFScore [17] | 08/2019 | 36 intermolecular features | 36 intermolecular features | – | – | Random forest |
AttentionDTA [36] | 11/2019 | Label encoding | Label encoding | CNN block | CNN block | Attention block-fully connected layers |
Taba [87] | 01/2020 | The average distance between pairs of atoms | The average distance between pairs of atoms | – | – | Machine-learning model |
GAT_GCN [88] | 04/2020 | Peptide frequency | Graph structure | CNN | GCN | Fully connected layers |
SAnDReS [89] | 05/2020 | Docking scores | Docking scores | – | – | Machine-learning model |
DeepCDA [90] | 05/2020 | N-gram embedding | SMILES sequence | CNN-LSTM-two-sided attention mechanism | CNN-LSTM-two-sided attention mechanism | Fully connected layers |
DGraphDTA [91] | 06/2020 | Protein graph | Molecular graph | GNN | GNN | Fully connected layers |
JoVA [92] | 08/2020 | Multiple unimodal representations | Multiple unimodal representations | Joint view attention module | Joint view attention module | Prediction model |
Fusion [93] | 11/2020 | Atomic representation | Atomic representation | CNNs | SG-CNNs | Fully connected layers |
DeepGS [44] | 2020 | Symbolic sequences | Molecular structure | Prot2Vec-CNN-BiGRU blocks | Smi2Vec-CNN-BiGRU blocks | Fully connected layer |
DeepDTAF [94] | 01/2021 | Sequence, structural property information | SMILES string | Dilated/traditional convolution layers | Dilated convolution layers | Fully connected layers |
GanDTI [37] (classification and regression) | 03/2021 | Protein sequences | Molecule fingerprints-adjacency matrix | Attention module | Residual graph neural network | MLP |
Multi-PLI [38] (classification and regression) | 04/2021 | One-hot vectors | One-hot vectors | CNN blocks | CNN blocks | Fully connected layers |
ML-DTI [95] | 04/2021 | Protein sequences | SMILES string | CNN block (mutual learning) | CNN block (mutual learning) | Linear transformation layers |
DEELIG [47] | 06/2021 | Atomic level-structural information-sequences | Physical properties-fingerprints | CNN | Fully connected layers | Fully connected layers |
GEFA [55] | 07/2021 | Sequence embedding features | Graph representation | GCN | GCN | Linear layers |
SAG-DTA [96] | 08/2021 | Label encoding | Molecular graph | CNN | Graph convolutional layer-SAGPooling layer | Fully connected layers |
Tanoori et al. [97] | 08/2021 | SW sequence similarity | CS similarity | – | – | GBM |
EmbedDTI [56] | 11/2021 | Amino acids | Structural information | CNN | Attention-GCNs | Fully connected layers |
DeepPLA [45] | 12/2021 | Protein sequences (ProSE) | SMILES strings (Mol2Vec) | Head CNN modules-ResNet-based CNN module | Head CNN modules-ResNet-based CNN module | BiLSTM module-MLP module |
DeepGLSTM [98] | 01/2022 | Amino acids | Adjacency representation | BiLSTM | GCN | Fully connected layers |
MGraphDTA [99] | 01/2022 | Integer-encoded sequences | Graph structure | Multiscale convolutional neural network | GNN | MLP |
FusionDTA [100] | 01/2022 | Word embeddings | SMILES strings | BiLSTM | BiLSTM | Multi-head linear attention blocks/fully connected layer |
HoTS [39] (classification and regression) | 02/2022 | Protein sequences | Morgan/circular fingerprints | Transformer blocks | Transformer blocks | Fully connected layers |
ELECTRA-DTA [101] | 03/2022 | Protein sequences | SMILES string | Squeeze-and-excitation convolutional neural network blocks | Squeeze-and-excitation convolutional neural network blocks | Fully connected layers |
Note: “–” in the table indicates that the corresponding article does not report this information.
Abbreviations: CNN – convolutional neural network; RNN – recurrent neural network; GNN – graph neural network; GCN(s) – graph convolutional network(s); LSTM – long short-term memory; BiLSTM – bidirectional long short-term memory; BiGRU – bidirectional gated recurrent unit; SG-CNNs – spatial graph convolutional neural networks; SAGPooling – self-attention graph pooling; MLP – multilayer perceptron; SW – Smith-Waterman; CS – chemical structure; GBM – gradient boosting machine.
URLs for the listed tools: SimBoost – https://zenodo.org/record/164436; ACNN – https://github.com/deepchem/deepchem; DeepDTA – https://github.com/hkmztrk/DeepDTA; DeepAffinity – https://github.com/Shen-Lab/DeepAffinity; GraphDTA – https://github.com/thinng/GraphDTA; Taba – https://github.com/azevedolab/taba; SAnDReS – https://github.com/azevedolab/sandres; DeepCDA – https://github.com/LBBSoft/DeepCDA; Fusion – https://github.com/llnl/fast; DeepGS – https://github.com/jacklin18/DeepGS; DeepDTAF – https://github.com/KailiWang1/DeepDTAF; GanDTI – https://github.com/shuyu-wang/GanDTI; Multi-PLI – https://github.com/Deshan-Zhou/Multi-PLI; ML-DTI – https://github.com/guaguabujianle/ML-DTI.git; DEELIG – https://github.com/asadahmedtech/DEELIG; GEFA – https://github.com/ngminhtri0394/GEFA; EmbedDTI – https://github.com/Aurorayuan/EmbedDTI; DeepPLA – https://github.com/David-BominWei/DeepPLA; DeepGLSTM – https://github.com/MLlab4CS/DeepGLSTM.git; MGraphDTA – https://github.com/guaguabujianle/MGraphDTA; FusionDTA – https://github.com/yuanweining/FusionDTA; HoTS – https://github.com/GIST-CSBL/HoTS; ELECTRA-DTA – https://github.com/IILab-Resource/ELECTRA-DTA.
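Many of the sequence-based entries in Table 2 (e.g., DeepDTA, AttentionDTA, Multi-PLI) share the same high-level pipeline: label-encoded protein and compound sequences are embedded, passed through separate CNN blocks, and the concatenated representations are fed to fully connected layers that regress the binding affinity. The PyTorch sketch below illustrates only this shared pattern; the vocabulary sizes, filter counts, layer widths, and sequence lengths are illustrative assumptions, not the hyperparameters of any listed tool.

```python
import torch
import torch.nn as nn

class DTARegressor(nn.Module):
    """Minimal sketch of the sequence-CNN affinity-regression pattern:
    two label-encoded sequences -> embeddings -> per-branch 1-D CNN blocks
    -> concatenation -> fully connected regression head.
    All sizes below are illustrative assumptions."""

    def __init__(self, prot_vocab=26, comp_vocab=64, emb_dim=128):
        super().__init__()
        # Index 0 is reserved for padding in both vocabularies (assumption).
        self.prot_emb = nn.Embedding(prot_vocab, emb_dim, padding_idx=0)
        self.comp_emb = nn.Embedding(comp_vocab, emb_dim, padding_idx=0)

        def cnn_block():
            # Stacked 1-D convolutions followed by global max pooling,
            # mirroring the "CNN blocks" column of Table 2.
            return nn.Sequential(
                nn.Conv1d(emb_dim, 32, kernel_size=8), nn.ReLU(),
                nn.Conv1d(32, 64, kernel_size=8), nn.ReLU(),
                nn.Conv1d(64, 96, kernel_size=8), nn.ReLU(),
                nn.AdaptiveMaxPool1d(1),
            )

        self.prot_cnn = cnn_block()
        self.comp_cnn = cnn_block()
        # "Fully connected layers" regression head: one scalar affinity
        # (e.g., a pKd/pKi value) per protein-compound pair.
        self.head = nn.Sequential(
            nn.Linear(96 + 96, 512), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(512, 1),
        )

    def forward(self, prot_ids, comp_ids):
        # (batch, seq_len, emb_dim) -> (batch, emb_dim, seq_len) for Conv1d.
        p = self.prot_cnn(self.prot_emb(prot_ids).transpose(1, 2)).squeeze(-1)
        c = self.comp_cnn(self.comp_emb(comp_ids).transpose(1, 2)).squeeze(-1)
        return self.head(torch.cat([p, c], dim=1)).squeeze(-1)

# Toy usage: 4 label-encoded pairs with assumed maximum lengths.
prot = torch.randint(1, 26, (4, 1000))  # protein sequences, max length 1000
comp = torch.randint(1, 64, (4, 100))   # SMILES strings, max length 100
model = DTARegressor()
affinity = model(prot, comp)            # shape: (4,)
```

Swapping the compound branch for a GNN over a molecular graph (as in GraphDTA, DGraphDTA, or MGraphDTA) changes only the feature extractor; the concatenate-then-regress head stays the same.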