2019 Jan 8;20:2. doi: 10.1186/s40360-018-0282-6

Table 2.

Performance of various machine learning classifiers to predict toxicity. The following classifiers are tested:

Dataset	Metric	LDA	MLP	RF	ET
FDA-approved / TOXNET	ACC	0.745	0.744	0.760	0.756
FDA-approved / TOXNET	TPR / FPR	0.723 / 0.232	0.679 / 0.180	0.733 / 0.218	0.719 / 0.186
FDA-approved / TOXNET	MCC	0.495	0.525	0.528	0.523
KEGG-Drug / T3DB	ACC	0.647	0.645	0.674	0.721
KEGG-Drug / T3DB	TPR / FPR	0.671 / 0.362	0.675 / 0.365	0.688 / 0.331	0.631 / 0.248
KEGG-Drug / T3DB	MCC	0.272	0.273	0.316	0.353
TCM	Tox-score	0.504 ± 0.013	0.537 ± 0.242	0.574 ± 0.143	0.552 ± 0.122
TCM	% toxic	63.9	61.8	68.5	59.7

Linear Discriminant Analysis (LDA), Multi-Layer Perceptron (MLP), Random Forest (RF), and Extra Trees (ET). Individual models are first trained and 5-fold cross-validated on the FDA-approved and TOXNET datasets, and then applied to KEGG-Drug and T3DB as an additional validation against independent datasets. Performance on the FDA-approved / TOXNET and KEGG-Drug / T3DB datasets is assessed with the accuracy (ACC, Eq. 1), the true positive rate (TPR, Eq. 2), the false positive rate (FPR, Eq. 3), and the Matthews correlation coefficient (MCC, Eq. 4). The best performance across all models, in terms of the highest ACC and MCC values (RF on FDA-approved / TOXNET and ET on KEGG-Drug / T3DB), is highlighted in bold. Finally, the trained models are applied to estimate the toxicity of traditional Chinese medicines in the TCM dataset; the average ± standard deviation of the Tox-score and the percentage of molecules predicted to be toxic are reported.
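As an illustration of how the four metrics in the table relate to a confusion matrix, the following minimal sketch computes ACC, TPR, FPR, and MCC from raw counts. It uses the standard textbook definitions, which are assumed here to correspond to Eqs. 1-4 of the article. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return ACC, TPR, FPR, and MCC for a binary (toxic / non-toxic) classifier.

    tp, fp, tn, fn are confusion-matrix counts, with "toxic" as the positive class.
    Standard definitions are used; they are assumed to match Eqs. 1-4.
    """
    positives = tp + fn  # molecules that are actually toxic
    negatives = fp + tn  # molecules that are actually non-toxic
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present to compute TPR and FPR")

    acc = (tp + tn) / (positives + negatives)
    tpr = tp / positives  # sensitivity
    fpr = fp / negatives  # 1 - specificity

    # MCC is conventionally set to 0 when a marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"ACC": acc, "TPR": tpr, "FPR": fpr, "MCC": mcc}


# Hypothetical counts for illustration only (not data from the article).
metrics = classification_metrics(tp=70, fp=20, tn=80, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike ACC, MCC uses all four cells of the confusion matrix, so it remains informative when the toxic and non-toxic classes differ in size. This is one reason to report both values.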