2020 Mar 5;12:16. doi: 10.1186/s13321-020-00421-y

Table 5.

The reported classification models for BCRP inhibitors and non-inhibitors

| Year | Data size | Training set | Test set | Method | Descriptors | Model validation | Statistical results | Refs. |
|---|---|---|---|---|---|---|---|---|
| 2007 | 123 | 80 | 43 | OPLS-DA | Descriptors from SELMA software package | Y-rand | GA_TE = 0.79 | Matsson et al. [16] |
| 2009 | 122 | 83 | 39 | PLS-DA | Descriptors from DragonX version 3.0 | Y-rand | NA^a | Matsson et al. [115] |
| 2013 | 109 | 30 | 79 | Pharmacophore modeling | NA | NA | MCC_TE = 0.29, GA_TE = 0.66 | Pan et al. [11] |
| 2013 | 203 | 124 | 79 | NB | ECFP_6, FCFP_6 fingerprints | LOO CV | AUC_TR(LOO CV) = 0.795, MCC_TE = 0.69 | Pan et al. [11] |
| 2013 | 382 | 382 | NA | SVM, k-NN, RF, and consensus modeling | Dragon, MOE descriptors | Fivefold CV, Y-rand | BA_TR(fivefold CV) = 0.83 ± 0.04 (consensus) | Sedykh et al. [121] |
| 2014 | 275 | 96 | Test: 32, external set: 147 | Ensembles of ANN, ensembles of SVM | Descriptors from ADMET Modeler | NA | GA_TE = 0.87, GA_External = 0.67 (ensembles of ANN) | Eric et al. [122] |
| 2014 | 780 | 780 | NA | NB | ECFP_6 fingerprints | Tenfold CV | GA_TR(tenfold CV) = 0.919, AUC_TR(tenfold CV) = 0.854 | Montanari et al. [20] |
| 2015 | 394 | 197 | Test: 99, external set: 98 | SVM, k-NN, ANN, and consensus modeling | Dragon descriptors | NA | GA_TE = 0.878, MCC_TE = 0.73; GA_External = 0.745, MCC_External = 0.46 (ANN) | Belekar et al. [21] |
| 2016 | NA^a | NA | NA | GTM-kNNd, GTM-Bayes, RF, SVM, and k-NN | MOE descriptors | Fivefold CV with five repetitions | NA | Gimadiev et al. [123] |
| 2017 | 978 | 978 | NA | NB, LR, SVM, and RF | MACCS, Morgan, ECFP8 fingerprints, VolSurf descriptors | Tenfold CV, leave-sources-out validation | MCC_TR(tenfold CV) = 0.65, AUC_TR(tenfold CV) = 0.90 (LR) | Montanari et al. [22] |
| 2019 | 2799 | 2240 | 559 | NB, LR, SVM, k-NN, XGBoost, SGB, DNN, and consensus modeling | MOE descriptors and PubChem fingerprints | Fivefold CV | MCC_TE = 0.812, AUC_TE = 0.958, GA_TE = 0.911, BA_TE = 0.905 (SVM) | This study |

Mean ± standard deviation across fivefold CV

TR: training set; TE: test set; OPLS-DA: orthogonal partial least-squares projection to latent structures discriminant analysis; NA: not available; GA: global accuracy; Y-rand: Y-randomization test; PLS-DA: partial least-squares projection to latent structures discriminant analysis; NB: Naive Bayes; LOO CV: leave-one-out cross-validation; AUC: area under the receiver operating characteristic curve; MCC: Matthews correlation coefficient; SVM: support vector machine; k-NN: k-nearest neighbors; RF: random forest; CV: cross-validation; BA: balanced accuracy; ANN: artificial neural networks; GTM: generative topographic mapping; LR: logistic regression

Many of the cited studies developed several models based on different methods or descriptors; only the best statistical results for the test set or cross-validation are listed here.

^a The exact values are not available in the publication.
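
For readers comparing the statistics reported in the table, the following is a minimal sketch of how GA, BA, and MCC are conventionally computed from a binary confusion matrix. The function name `classification_metrics` and the example counts are hypothetical and are not taken from any of the cited studies; the sketch assumes the standard textbook definitions, which individual publications may apply with small variations.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute GA, BA, and MCC from binary confusion-matrix counts.

    tp/fn: inhibitors predicted correctly/incorrectly
    tn/fp: non-inhibitors predicted correctly/incorrectly
    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    total = tp + tn + fp + fn
    ga = (tp + tn) / total                      # global accuracy
    sensitivity = tp / (tp + fn)                # recall on inhibitors
    specificity = tn / (tn + fp)                # recall on non-inhibitors
    ba = (sensitivity + specificity) / 2        # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"GA": ga, "BA": ba, "MCC": mcc}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Because GA weights every compound equally, it can look favorable on imbalanced data sets; BA and MCC account for performance on both classes, which is one reason the table entries report different statistics side by side.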