Table 1. Results of PCM using different combinations of ligand descriptors and four protein field descriptors (polar, lipophilic, unstable and stable water fields).
Ligand descriptors | Correlation (R²) | Predictability (Q²) | RMSEE (a) | RMSEPcv (b) | RMSEPtest (c) | R²test (d) |
Random forest models | | | | | | |
RDKit | 0.957 | 0.737 | 0.360 | 0.799 | 0.810 | 0.716 |
MOE | 0.961 | 0.703 | 0.360 | 0.857 | 0.840 | 0.695 |
4-PFP | 0.928 | 0.566 | 0.480 | 1.025 | 0.990 | 0.569 |
GRIND | 0.951 | 0.430 | 0.470 | 1.175 | 1.150 | 0.426 |
RDKit (e) | 0.585 | 0.429 | 1.060 | 1.188 | 1.110 | 0.492 |
Target-only models (f) | 0.111 | 0.107 | 1.450 | 1.455 | 1.420 | 0.128 |
ID-based models (g) | 0.835 | 0.298 | 0.660 | 1.338 | 1.340 | 0.276 |
Partial least squares regression models with cross-terms | | | | | | |
RDKit | 0.671 | 0.588 | 0.884 | 1.024 | 1.007 | 0.557 |
MOE | 0.504 | 0.433 | 1.085 | 1.194 | 1.129 | 0.439 |
4-PFP | 0.554 | 0.451 | 1.029 | 1.216 | 1.136 | 0.437 |
GRIND | 0.311 | 0.264 | 1.278 | 1.348 | 1.285 | 0.273 |
RDKit (e) | 0.349 | 0.300 | 1.243 | 1.295 | 1.226 | 0.338 |
Target-only models (f) | 0.103 | 0.100 | 1.458 | 1.461 | 1.428 | 0.113 |
ID-based models (g) | 0.000 | –0.001 | 45.282 | 45.307 | 43.439 | 0.000 |
RDKit (no cross-terms) | 0.397 | 0.365 | 1.196 | 1.233 | 1.182 | 0.386 |
RDKit (only cross-terms) (h) | 0.598 | 0.471 | 0.977 | 1.144 | 1.107 | 0.465 |
(a) Root-mean-square error of estimation for observations in the training set.
(b) Root-mean-square error of prediction resulting from 5-fold cross-validation.
(c) Root-mean-square error of prediction calculated using the external test set.
(d) Correlation between the observed and predicted values of the external test set.
(e) Global QSAR models.
(f) Models based on protein fields, excluding ligand descriptors.
(g) Models using the ChEMBL IDs of compounds and targets as descriptors.
(h) Models based on cross-terms only, excluding protein and ligand descriptors.
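
The statistics reported in Table 1 can be illustrated with a minimal sketch. It assumes the conventional definitions: RMSE as the square root of the mean squared residual, R² and Q² as 1 - SS_res/SS_tot (computed on fitted and cross-validated predictions, respectively), and the test-set correlation as the squared Pearson coefficient. The table does not state the exact formulas used, so these definitions, the function names, and the toy activity values below are all assumptions made for illustration.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error; yields RMSEE, RMSEPcv or RMSEPtest
    depending on which set of predictions is passed in."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def explained_variance(observed, predicted):
    """1 - SS_res / SS_tot (assumed definition).
    With fitted values this is R2; with cross-validated predictions it is Q2.
    It can be negative when predictions are worse than the mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def squared_pearson(observed, predicted):
    """Squared Pearson correlation between observed and predicted values
    (assumed reading of the external test-set correlation)."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical activity values, not taken from the study.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]

print(round(rmse(observed, predicted), 3))                # 0.5
print(round(explained_variance(observed, predicted), 3))  # 0.8
print(round(squared_pearson(observed, predicted), 3))     # 0.8
```

Under the 1 - SS_res/SS_tot form, a value slightly below zero, such as the Q² of –0.001 for the ID-based partial least squares model, simply indicates predictions no better than the mean of the observed values.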