Table 2.
Ref. | Performance (traditional ML) | Performance (deep learning) |
---|---|---|
(Koutsoukas et al., 2017) (1) | RF: MCC = 0.89 | DNN: MCC = 0.91 |
(Dahl et al., 2014) (2) | RF: AUC = 0.78 | MT NN: AUC = 0.82 |
(Lenselink et al., 2017) | SVM: MCC = 0.50, BEDROC = 0.88; RF: MCC = 0.56, BEDROC = 0.82 | DNN_MC: MCC = 0.57, BEDROC = 0.92 |
(Mayr et al., 2016) | SVM: AUC = 0.71 | ST: AUC = 0.72; MT: AUC = 0.75 |
(Feinberg et al., 2018) | RF: Pearson = 0.783 | GNN: Pearson = 0.822 |
(Segler and Waller, 2017b) | LR: Acc = 0.86 (reaction prediction); LR: Acc = 0.64 (retrosynthesis) | NN: Acc = 0.92 (reaction prediction); NN: Acc = 0.78 (retrosynthesis) |
(Wu et al., 2018) (3) | SVM: AUC = 0.822 | GC: AUC = 0.829 |
(Xiong et al., 2019) (4) | SVM: AUC = 0.792 | Attentive FP: AUC = 0.832 |
(Yang et al., 2019) (5) | RF: AUC = 0.619 | FFN: AUC = 0.788 |
(Ma et al., 2015) (6) | RF: R² = 0.42 | DNN: R² = 0.49 |
(Ramsundar et al., 2017) (7) | RF: R² = 0.428 | ST: R² = 0.448; MT: R² = 0.468 |
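RF, SVM, LR, NN, DNN, ST, MT, GC, GNN, and FFN denote random forest, support vector machine, linear regression, neural network, deep neural network, single-task, multi-task, graph convolution, graph neural network, and feedforward neural network, respectively. MCC, AUC, BEDROC, and Acc denote the Matthews correlation coefficient, the area under the ROC curve, the Boltzmann-enhanced discrimination of ROC, and accuracy. (1) Averaged performance on validation sets over 7 datasets. (2) Averaged performance on test sets over 19 datasets. (3) Performance on a test subset of the Tox21 dataset. (4) Performance on the HIV dataset. (5) Performance on the Tox21 dataset. (6) Averaged performance over 15 datasets. (7) Model performance on a test set.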
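The rows of Table 2 report different metrics on different datasets, so a value is meaningful only against the other value in its own row. As a reminder of what the most frequent metrics measure, the following is a minimal Python sketch written from the standard textbook definitions; it is not code from any of the cited studies, and the function names are illustrative. One assumption to flag: some QSAR papers report R² as the squared Pearson correlation rather than the coefficient of determination, so `r_squared` below may not match the exact definition a given study used.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as 1/2. This O(n_pos * n_neg) pairwise form is for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data, purely hypothetical.
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))                  # 1/sqrt(3), about 0.577
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))     # 0.75
    print(pearson([1, 2, 3], [2, 4, 6]))                    # 1.0
    print(r_squared([1, 2, 3], [1, 2, 4]))                  # 0.5
```

MCC uses all four confusion-matrix cells, which is why it is often preferred over accuracy on the imbalanced activity datasets common in this field, whereas AUC depends only on how the scores rank positives relative to negatives.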