Table 5.
Year | Data size | Training set | Test set | Method | Descriptors | Model validation | Statistical results | Refs. |
---|---|---|---|---|---|---|---|---|
2007 | 123 | 80 | 43 | OPLS-DA | Descriptors from SELMA software package | Y-rand | GA_TE = 0.79 | Matsson et al. [16] |
2009 | 122 | 83 | 39 | PLS-DA | Descriptors from DragonX version 3.0 | Y-rand | NA^a | Matsson et al. [115] |
2013 | 109 | 30 | 79 | Pharmacophore modeling | NA | NA | MCC_TE = 0.29, GA_TE = 0.66 | Pan et al. [11] |
2013 | 203 | 124 | 79 | NB | ECFP_6, FCFP_6 fingerprints | LOO CV | AUC_TR (LOO CV) = 0.795, MCC_TE = 0.69 | Pan et al. [11] |
2013 | 382 | 382 | NA | SVM, k-NN, RF, and consensus modeling | Dragon, MOE descriptors | Fivefold CV, Y-rand | BA_TR (fivefold CV) = 0.83 ± 0.04 (consensus) | Sedykh et al. [121] |
2014 | 275 | 96 | Test: 32, external set: 147 | Ensembles of ANN, ensembles of SVM | Descriptors from ADMET Modeler | NA | GA_TE = 0.87, GA_External = 0.67 (ensembles of ANN) | Eric et al. [122] |
2014 | 780 | 780 | NA | NB | ECFP_6 fingerprints | Tenfold CV | GA_TR (tenfold CV) = 0.919, AUC_TR (tenfold CV) = 0.854 | Montanari et al. [20] |
2015 | 394 | 197 | Test: 99, external set: 98 | SVM, k-NN, ANN, and consensus modeling | Dragon descriptors | NA | GA_TE = 0.878, MCC_TE = 0.73; GA_External = 0.745, MCC_External = 0.46 (ANN) | Belekar et al. [21] |
2016 | NA^a | NA | NA | GTM-kNNd, GTM-Bayes, RF, SVM, and k-NN | MOE descriptors | Fivefold CV with five repetitions | NA | Gimadiev et al. [123] |
2017 | 978 | 978 | NA | NB, LR, SVM, and RF | MACCS, Morgan, ECFP8 fingerprints, VolSurf descriptors | Tenfold CV, leave-sources-out validation | MCC_TR (tenfold CV) = 0.65, AUC_TR (tenfold CV) = 0.90 (LR) | Montanari et al. [22] |
2019 | 2799 | 2240 | 559 | NB, LR, SVM, k-NN, XGBoost, SGB, DNN, and consensus modeling | MOE descriptors and PubChem fingerprints | Fivefold CV | MCC_TE = 0.812, AUC_TE = 0.958, GA_TE = 0.911, BA_TE = 0.905 (SVM) | This study |
Values with ± are the mean ± standard deviation across fivefold CV
TR training set, TE test set, OPLS-DA orthogonal partial least-squares projection to latent structures discriminant analysis, NA not available, GA global accuracy, Y-rand Y-randomization test, PLS-DA partial least-squares projection to latent structures discriminant analysis, NB Naive Bayes, LOO CV leave-one-out cross-validation, AUC the area under the receiver operating characteristic curve, MCC Matthews correlation coefficient, SVM support vector machine, k-NN k-nearest neighbors, RF random forest, CV cross-validation, BA balanced accuracy, ANN artificial neural networks, GTM generative topographic mapping, LR logistic regression
Many of these studies developed several models based on different methods or descriptors; only the best statistical results for the test set or cross-validation are listed here
^a The exact values are not available in the publication
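
For readers comparing the statistics in the table, the following is a minimal sketch of how GA, BA, and MCC are conventionally computed from a binary confusion matrix. The function name `classification_metrics` and the counts in the usage lines are hypothetical and are not taken from any of the cited studies; the sketch assumes both classes are present in the evaluated set.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return GA, BA, and MCC from binary confusion-matrix counts.

    Assumes at least one positive and one negative sample,
    so that sensitivity and specificity are defined.
    """
    total = tp + tn + fp + fn

    # Global accuracy: fraction of all samples classified correctly.
    ga = (tp + tn) / total

    # Balanced accuracy: mean of sensitivity and specificity.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ba = (sensitivity + specificity) / 2

    # Matthews correlation coefficient; set to 0 by convention
    # when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"GA": ga, "BA": ba, "MCC": mcc}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

GA and BA coincide when the two classes are the same size, as in this hypothetical example. On imbalanced sets they can differ noticeably, which is one reason values of different metrics in the table are not directly comparable.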