2019 Jan 8;20:2. doi: 10.1186/s40360-018-0282-6

Table 2.

Performance of various machine learning classifiers to predict toxicity. The following classifiers are tested

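Values are given per toxicity classifier (columns LDA, MLP, RF, ET).

| Dataset | Metric | LDA | MLP | RF | ET |
|---|---|---|---|---|---|
| FDA-approved / TOXNET | ACC | 0.745 | 0.744 | **0.760** | 0.756 |
| FDA-approved / TOXNET | TPR / FPR | 0.723 / 0.232 | 0.679 / 0.180 | 0.733 / 0.218 | 0.719 / 0.186 |
| FDA-approved / TOXNET | MCC | 0.495 | 0.525 | **0.528** | 0.523 |
| KEGG-Drug / T3DB | ACC | 0.647 | 0.645 | 0.674 | **0.721** |
| KEGG-Drug / T3DB | TPR / FPR | 0.671 / 0.362 | 0.675 / 0.365 | 0.688 / 0.331 | 0.631 / 0.248 |
| KEGG-Drug / T3DB | MCC | 0.272 | 0.273 | 0.316 | **0.353** |
| TCM | Tox-score | 0.504 ± 0.013 | 0.537 ± 0.242 | 0.574 ± 0.143 | 0.552 ± 0.122 |
| TCM | % toxic | 63.9 | 61.8 | 68.5 | 59.7 |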

The classifiers tested are Linear Discriminant Analysis (LDA), Multi-Layer Perceptron (MLP), Random Forest (RF), and Extra Trees (ET). Individual models are first trained and 5-fold cross-validated against the FDA-approved and TOXNET datasets, and then applied to KEGG-Drug and T3DB as an additional validation against independent datasets. Performance on the FDA-approved / TOXNET and KEGG-Drug / T3DB datasets is assessed with the accuracy (ACC, Eq. 1), the true positive rate (TPR, Eq. 2), the false positive rate (FPR, Eq. 3), and the Matthews correlation coefficient (MCC, Eq. 4). The highest ACC and MCC values across all models for each dataset pair are highlighted in bold. Finally, the trained models are applied to estimate the toxicity of traditional Chinese medicines in the TCM dataset; the mean ± standard deviation of the Tox-score and the percentage of molecules predicted to be toxic are reported.
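Eqs. 1 to 4 are not reproduced on this page, so the following is only a minimal sketch of how the four metrics are commonly computed from confusion-matrix counts, assuming the paper uses the standard definitions. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return ACC, TPR, FPR and MCC from confusion-matrix counts.

    Assumes the standard textbook definitions; "positive" means "toxic".
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")

    acc = (tp + tn) / total
    # TPR: fraction of truly toxic molecules that are predicted toxic.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    # FPR: fraction of truly non-toxic molecules that are predicted toxic.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # MCC is set to 0 when any marginal sum is zero (a common convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"ACC": acc, "TPR": tpr, "FPR": fpr, "MCC": mcc}


# Hypothetical counts for illustration only; not data from the paper.
example = confusion_metrics(tp=70, fp=20, tn=80, fn=30)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it can be informative alongside ACC when the toxic and non-toxic classes are of unequal size.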