2017 Mar 15;8(5):1037–1045. doi: 10.1039/c6md00701e

Table 1. Results of proteochemometric modelling (PCM) using different combinations of ligand descriptors and four protein field descriptors (polar, lipophilic, unstable and stable water fields).

| Ligand descriptors | Correlation (R²) | Predictability (Q²) | RMSEE (a) | RMSEPcv (b) | RMSEPtest (c) | R²test (d) |
|---|---|---|---|---|---|---|
| Random forest models | | | | | | |
| RDKit | 0.957 | 0.737 | 0.360 | 0.799 | 0.810 | 0.716 |
| MOE | 0.961 | 0.703 | 0.360 | 0.857 | 0.840 | 0.695 |
| 4-PFP | 0.928 | 0.566 | 0.480 | 1.025 | 0.990 | 0.569 |
| GRIND | 0.951 | 0.430 | 0.470 | 1.175 | 1.150 | 0.426 |
| RDKit (e) | 0.585 | 0.429 | 1.060 | 1.188 | 1.110 | 0.492 |
| Target only models (f) | 0.111 | 0.107 | 1.450 | 1.455 | 1.420 | 0.128 |
| ID based models (g) | 0.835 | 0.298 | 0.660 | 1.338 | 1.340 | 0.276 |
| Partial least squares regression models with cross-terms | | | | | | |
| RDKit | 0.671 | 0.588 | 0.884 | 1.024 | 1.007 | 0.557 |
| MOE | 0.504 | 0.433 | 1.085 | 1.194 | 1.129 | 0.439 |
| 4-PFP | 0.554 | 0.451 | 1.029 | 1.216 | 1.136 | 0.437 |
| GRIND | 0.311 | 0.264 | 1.278 | 1.348 | 1.285 | 0.273 |
| RDKit (e) | 0.349 | 0.300 | 1.243 | 1.295 | 1.226 | 0.338 |
| Target only models (f) | 0.103 | 0.100 | 1.458 | 1.461 | 1.428 | 0.113 |
| ID based models (g) | 0.000 | -0.001 | 45.282 | 45.307 | 43.439 | 0.000 |
| RDKit (no cross-terms) | 0.397 | 0.365 | 1.196 | 1.233 | 1.182 | 0.386 |
| RDKit (only cross-terms) (h) | 0.598 | 0.471 | 0.977 | 1.144 | 1.107 | 0.465 |

(a) Root-mean-square error of estimation for observations in the training set.

(b) Root-mean-square error of prediction resulting from 5-fold cross-validation.

(c) Root-mean-square error of prediction calculated using the external test set.

(d) Correlation between the observed and predicted values of the external test set.

(e) Global QSAR models.

(f) Models based on protein fields with exclusion of ligand descriptors.

(g) Models with ChEMBL IDs of compounds and targets used as descriptors.

(h) Models based on cross-terms with exclusion of protein and ligand descriptors.
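The error and fit statistics named in the footnotes can be illustrated with a minimal Python sketch. The table does not give the formulas, so the definitions below are assumptions. RMSE is taken as the square root of the mean squared residual, and R²/Q² as one minus the ratio of the residual to the total sum of squares. That form can go negative, as the Q² of -0.001 in the table does. Because the test-set footnote describes a correlation, the squared Pearson correlation is included as an alternative. The function names and the toy activity values are hypothetical and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def coefficient_of_determination(observed, predicted):
    """1 - SS_res / SS_tot (assumed form of R2 and Q2; may be negative)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def squared_pearson(observed, predicted):
    """Squared Pearson correlation (alternative reading of R2 for the test set)."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov ** 2 / (var_obs * var_pred)


# Hypothetical activity values, for illustration only.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]

print(rmse(observed, predicted))
print(coefficient_of_determination(observed, predicted))
print(squared_pearson(observed, predicted))
```

Under these assumptions, the same `rmse` function would yield RMSEE when applied to training-set fits, RMSEPcv when applied to predictions pooled from the five cross-validation folds, and RMSEPtest when applied to the external test set. Likewise, `coefficient_of_determination` would give R² on fitted values and Q² on cross-validated predictions.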