2020 Jun 12;128(6):067010. doi: 10.1289/EHP6508

Table 1.

Performance of the C4.5 decision tree, logistic regression, and support vector machine models on the training set and test set.

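| Model | Data set | P/N | n | ACC | SE | SP | AUC | MCC |
|---|---|---|---|---|---|---|---|---|
| C4.5 Decision tree | Training set | 43/133 | 176 | 0.95 | 0.90 | 0.96 | 0.95 | 0.86 |
| C4.5 Decision tree | Test set | 20/44 | 64 | 0.92 | 0.83 | 0.98 | 0.95 | 0.83 |
| Logistic regression | Training set | 43/133 | 176 | 0.86 | 0.73 | 0.90 | 0.77 | 0.56 |
| Logistic regression | Test set | 20/44 | 64 | 0.80 | 0.73 | 0.82 | 0.77 | 0.58 |
| Support vector machine (Polynomial kernel) | Training set | 43/133 | 176 | 0.85 | 0.74 | 0.87 | 0.76 | 0.61 |
| Support vector machine (Polynomial kernel) | Test set | 20/44 | 64 | 0.83 | 0.80 | 0.84 | 0.77 | 0.50 |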

Note: Predictive accuracy was reflected by four indices: sensitivity (SE=TP/[TP + FN]), specificity (SP=TN/[TN + FP]), overall predictive accuracy (ACC=[TP + TN]/[TP + FP + TN + FN]), and Matthews correlation coefficient (MCC=[TP×TN − FP×FN]/√[(TP + FP)(TP + FN)(TN + FP)(TN + FN)]). The area under the receiver operating characteristic curve (AUC) is a measure of how well a model distinguishes positive and negative data points; the >95% model AUC illustrated a high classification power. FN, false negatives; FP, false positives; n, number of data points in the data set; P/N, ratio of positive/negative data points; TN, true negatives; TP, true positives.