Table 2.
Classification scheme^a | CM or No CM | NB or No NB | CD or No CD | CMCD or No CMCD |
---|---|---|---|---|
Accuracy (%) | 87±4 | 77±4 | 73±5 | 82±3 |
Significance with respect to random, P(N) | 1.1 × 10^−39 | 1.2 × 10^−24 | 2.4 × 10^−16 | 6.4 × 10^−26 |
Default class | CM | No NB | No CD | CMCD |
Majority class | CM | No NB | No CD | CMCD |
No. of cases | 533 | 496 | 513 | 634 |
Sensitivity (%) | 93±4 | 86±4 | 78±7 | 92±3 |
Specificity (%) | 83±3 | 78±2 | 78±3 | 85±2 |
Minority class | No CM | NB | CD | No CMCD |
No. of cases | 308 | 345 | 328 | 207 |
Sensitivity (%) | 67±5 | 64±5 | 66±9 | 50±11 |
Specificity (%) | 85±8 | 77±4 | 66±5 | 71±5 |
CM, central metabolism path compound; NB, nonbiodegradable path compound; CD, carbon dioxide path compound; CMCD, central metabolism and carbon dioxide path compounds. Accuracy is the percentage of compounds correctly classified. Sensitivity is the percentage of compounds correctly classified as belonging to a specific class, relative to the total number of cases of that class. Specificity is the percentage of compounds correctly classified as belonging to a specific class, relative to the total number of predictions for that class. Accuracy, sensitivity and specificity are given in the table as the average ± s.d. over the five iterations of the cross-validation experiment. The statistical significance of the observed difference in performance between the C4.5-based system and a random prediction is indicated by the P(N) of a sign test.
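
The three measures defined in the legend can be illustrated with a minimal Python sketch. The function names (`accuracy`, `class_metrics`) and the ten-compound toy labels are hypothetical and are not taken from the study; the code only follows the definitions as worded above. Note that "specificity" as defined here (correct predictions for a class divided by all predictions for that class) corresponds to what is often called precision.

```python
def accuracy(true_labels, predicted_labels):
    """Percentage of compounds whose predicted class equals the true class."""
    pairs = list(zip(true_labels, predicted_labels))
    return 100.0 * sum(1 for t, p in pairs if t == p) / len(pairs)


def class_metrics(true_labels, predicted_labels, target):
    """Return (sensitivity, specificity) in percent for class `target`,
    following the legend's definitions (assumed reading, see lead-in)."""
    pairs = list(zip(true_labels, predicted_labels))
    # Compounds of class `target` that were also predicted as `target`.
    correct = sum(1 for t, p in pairs if t == target and p == target)
    # Denominator for sensitivity: all true cases of the class.
    n_cases = sum(1 for t, _ in pairs if t == target)
    # Denominator for specificity: all predictions of the class.
    n_predictions = sum(1 for _, p in pairs if p == target)
    sensitivity = 100.0 * correct / n_cases if n_cases else float("nan")
    specificity = 100.0 * correct / n_predictions if n_predictions else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: 6 "CM" compounds and 4 "No CM" compounds.
true_labels = ["CM"] * 6 + ["No CM"] * 4
predicted_labels = ["CM"] * 8 + ["No CM"] * 2

print(accuracy(true_labels, predicted_labels))                 # 80.0
print(class_metrics(true_labels, predicted_labels, "CM"))      # (100.0, 75.0)
print(class_metrics(true_labels, predicted_labels, "No CM"))   # (50.0, 100.0)
```

In this toy case the majority class has high sensitivity and lower specificity, while the minority class shows the opposite pattern, which is the kind of asymmetry the per-class rows of the table are meant to expose.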