Table 3.
Statistical parameters of QSAR models obtained before and after curation.
| ID | Name | R2 | Q2 | R2EF | Sws | Scv | SEF | R2EVS | R2EVS(NM) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Rat | 0.96 | 0.84-0.93 | 0.89-0.92 | 0.11-0.13 | 0.16-0.24 | 0.20-0.26 | – | – |
| 2 | Rat(NM) | 0.91-0.97 | 0.89-0.95 | 0.45-0.88 | 0.10-0.18 | 0.14-0.28 | 0.28-0.58 | – | – |
| 3 | TP | 0.83 | – | 0.76 | 0.33 | – | 0.38 | 0.54 | −0.58 |
| 4 | TP(NM) | 0.85 | – | 0.54 | 0.31 | – | 0.54 | 0.49 | 0.44 |
| 5 | DILI non-curated | No modeling was possible | | | | | | | |
| 6 | DILI50 | Modeling set 5-fold external CV accuracy = 62-68%; external sets accuracy = 56-73% | | | | | | | |
| 7* | Ames non-curated (ref. 62) | SensitivityRF=83%; SensitivitySVM=87%; SpecificityRF=SpecificitySVM=75%; AUCGP=88%; AUCSVM=89%; AUCRF=83% | | | | | | | |
| 8* | Ames curated (ref. 63) | SensitivityRF=SensitivitySVM=79%; SpecificityRF=SpecificitySVM=81%; AUCGP=86%; AUCSVM=84%; AUCRF=83% | | | | | | | |

Where:
TP – Tetrahymena pyriformis dataset, (NM) – modeling set with various representations of nitro groups
R2 - determination coefficient, Q2 - cross-validated determination coefficient
R2EF - determination coefficient for external folds extracted from the modeling set
Sws - standard error of prediction for the work set
Scv - standard error of prediction for the work set under cross-validation
SEF - standard error of prediction for external folds extracted from the modeling set
A - number of PLS latent variables, D - number of descriptors, M - number of molecules in the work set
R2EVS - determination coefficient for external validation set
R2EVS(NM) - determination coefficient for external validation set with shuffled nitro groups
AUC – Area Under Curve statistical parameter
RF – Random Forest
SVM – Support Vector Machine
GP – Gaussian Processes
*Prediction performance is reported for the external validation set.
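
For readers less familiar with these statistics, the following is a minimal Python sketch of how a determination coefficient, a standard error of prediction, and sensitivity/specificity can be computed. It is illustrative only and is not the authors' implementation: the function names and toy data are hypothetical, and the standard error is assumed here to take the simple form sqrt(SS_res / n), whereas the source may apply a degrees-of-freedom correction. The sketch also shows why a determination coefficient on an external set can be negative, as for the TP model in the table: predictions that fit worse than the mean of the observed values give a residual sum of squares larger than the total sum of squares.

```python
import math


def r_squared(observed, predicted):
    """Determination coefficient computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def standard_error(observed, predicted):
    """Standard error of prediction, ASSUMED here to be sqrt(SS_res / n)."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 / 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, not taken from the table.
observed = [1.0, 2.0, 3.0, 4.0]
good = [1.1, 1.9, 3.2, 3.8]   # close to the observed values
bad = [4.0, 3.0, 2.0, 1.0]    # worse than predicting the mean

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

print("R2 (good predictions):", r_squared(observed, good))
print("R2 (bad predictions): ", r_squared(observed, bad))  # negative value
print("Standard error (good):", standard_error(observed, good))
print("Sensitivity, specificity:", sensitivity_specificity(y_true, y_pred))
```

The same functions apply to any of the subsets named in the legend (work set, cross-validation predictions, external folds, external validation set); only the observed/predicted pairs passed in change.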