Table 5.
PubChem descriptor model performance for split chemical space validation sets
| Algorithm | Measure | Aliphatic halogen | Aromatic nitro | Aziridine | Bay region PAH | Carboxylic acid | Epoxide | Aromatic amine (primary) | Aromatic amine (secondary) | Aromatic amine (tertiary) |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM | AUC | 0.83 | 0.78 | --- | 0.73 | 0.96 | 0.85 | 0.82 | 0.91 | 0.77 |
| | BAC | 0.77 | 0.64 | NaN | 0.58 | 0.91 | 0.77 | 0.75 | 0.82 | 0.73 |
| | SEN | 0.83 | 0.96 | 1.00 | 0.97 | 0.88 | 0.87 | 0.86 | 0.82 | 0.83 |
| | SPEC | 0.71 | 0.32 | NaN | 0.20 | 0.94 | 0.67 | 0.64 | 0.82 | 0.64 |
| RF | AUC | 0.8 | 0.8 | --- | 0.72 | 0.94 | 0.87 | 0.84 | 0.95 | 0.82 |
| | BAC | 0.75 | 0.64 | NaN | 0.70 | 0.9 | 0.73 | 0.74 | 0.85 | 0.68 |
| | SEN | 0.87 | 0.96 | 1.00 | 1.00 | 0.88 | 0.92 | 0.9 | 0.88 | 0.78 |
| | SPEC | 0.63 | 0.32 | NaN | 0.40 | 0.92 | 0.54 | 0.58 | 0.82 | 0.57 |
| DT | AUC | 0.66 | 0.58 | --- | 0.65 | 0.87 | 0.56 | 0.71 | 0.82 | 0.57 |
| | BAC | 0.66 | 0.55 | NaN | 0.50 | 0.82 | 0.54 | 0.63 | 0.82 | 0.70 |
| | SEN | 0.7 | 0.96 | 0.92 | 1.0 | 0.74 | 1.00 | 0.76 | 0.65 | 0.83 |
| | SPEC | 0.63 | 0.13 | NaN | 0.00 | 0.9 | 0.08 | 0.50 | 1.00 | 0.57 |
| kNN | AUC | 0.77 | 0.79 | --- | 0.74 | 0.91 | 0.70 | 0.78 | 0.88 | 0.78 |
| | BAC | 0.73 | 0.64 | NaN | 0.5 | 0.82 | 0.65 | 0.69 | 0.8 | 0.70 |
| | SEN | 0.69 | 0.96 | 1.00 | 1.00 | 0.79 | 0.71 | 0.85 | 0.88 | 0.83 |
| | SPEC | 0.77 | 0.32 | NaN | 0.00 | 0.85 | 0.58 | 0.53 | 0.73 | 0.57 |

NaN = not a number (the result is undefined because all predictions were true positives); AUC = area under the curve; BAC = balanced accuracy; SEN = sensitivity; SPEC = specificity.
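
To make the relationship between the tabulated measures concrete, the following minimal sketch computes SEN, SPEC and BAC from confusion-matrix counts. It assumes the standard definitions (SEN = TP / (TP + FN), SPEC = TN / (TN + FP), BAC = (SEN + SPEC) / 2) and assumes that NaN arises when a subset contains no negative compounds, so that specificity has a zero denominator. The function names and all counts are hypothetical illustrations, not values or code from the study.

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity and balanced accuracy from counts.

    Assumes the standard definitions; NaN in SEN or SPEC propagates to BAC.
    """
    sen = safe_ratio(tp, tp + fn)    # fraction of actual positives recovered
    spec = safe_ratio(tn, tn + fp)   # fraction of actual negatives recovered
    bac = (sen + spec) / 2           # mean of the two class-wise recalls
    return {"SEN": sen, "SPEC": spec, "BAC": bac}


# Hypothetical counts for a subset containing both classes.
mixed = classification_metrics(tp=83, fn=17, tn=71, fp=29)

# Hypothetical subset with no negative compounds: SPEC and BAC are undefined.
positives_only = classification_metrics(tp=12, fn=0, tn=0, fp=0)
```

Under these assumptions, a subset with no negatives gives SEN = 1.00 with SPEC and BAC both NaN, which is the pattern seen in most of the Aziridine column. AUC is omitted from the sketch because it requires ranked prediction scores rather than counts.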