. 2019 Nov 5;10:1303. doi: 10.3389/fphar.2019.01303

Table 2.

Performance comparison of traditional ML and DL in drug discovery.

| Ref. | Performance, traditional ML | Performance, deep learning |
|---|---|---|
| (Koutsoukas et al., 2017) (1) | RF: MCC = 0.89 | DNN: MCC = 0.91 |
| (Dahl et al., 2014) (2) | RF: AUC = 0.78 | MT NN: AUC = 0.82 |
| (Lenselink et al., 2017) | SVM: MCC = 0.50, BEDROC = 0.88; RF: MCC = 0.56, BEDROC = 0.82 | DNN_MC: MCC = 0.57, BEDROC = 0.92 |
| (Mayr et al., 2016) | SVM: AUC = 0.71 | ST: AUC = 0.72; MT: AUC = 0.75 |
| (Feinberg et al., 2018) | RF: Pearson = 0.783 | GNN: Pearson = 0.822 |
| (Segler and Waller, 2017b) | LR: Acc = 0.86 (reaction prediction); LR: Acc = 0.64 (retrosynthesis) | NN: Acc = 0.92 (reaction prediction); NN: Acc = 0.78 (retrosynthesis) |
| (Wu et al., 2018) (3) | SVM: AUC = 0.822 | GC: AUC = 0.829 |
| (Xiong et al., 2019) (4) | SVM: AUC = 0.792 | Attentive FP: AUC = 0.832 |
| (Yang et al., 2019) (5) | RF: AUC = 0.619 | FFN: AUC = 0.788 |
| (Ma et al., 2015) (6) | RF: R² = 0.42 | DNN: R² = 0.49 |
| (Ramsundar et al., 2017) (7) | RF: R² = 0.428 | ST: R² = 0.448; MT: R² = 0.468 |

LR, ST, MT, GC, GNN, and FFN refer to Linear Regression, Single-Task, Multi-Task, Graph Convolution, Graph Neural Network, and Feedforward Neural Network, respectively. (1) Averaged performance on validation sets over 7 datasets. (2) Averaged performance on test sets over 19 datasets. (3) Performance on a test subset of the Tox21 dataset. (4) Performance on the HIV dataset. (5) Performance on the Tox21 dataset. (6) Averaged performance over 15 datasets. (7) Model performance on a test set.
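
Several rows of the table report MCC (the Matthews correlation coefficient). As a reminder of what that metric measures, the following is a minimal sketch, assuming binary classification and a standard 2x2 confusion matrix. The function name and the example counts are illustrative assumptions and are not taken from any of the cited studies.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Compute MCC from the four cells of a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is
    # empty, because the coefficient is undefined in that case.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts chosen only to show the range of the metric.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)     # every prediction correct
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)    # no better than random
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)    # every prediction wrong

print(perfect, chance, inverted)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so it is less sensitive to class imbalance than plain accuracy.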