2022 Apr 5;12(17):10686–10700. doi: 10.1039/d2ra00136e

Correlation coefficient values of best ML-QSAR regression models.

| ML method | Selected model descriptors (a,b,c) | r² (d) | r²_L20% (e) | r²_LOO (f) | r²_PRESS (g) |
|---|---|---|---|---|---|
| GFA-SVR | Hypo(5-R2-08), Hypo(6-R2-07), Hypo(8-R3-08), LEU244HNLD, VAL324HBLD, ASP325HALD, CHI_2, Num_Rings6 | 0.91 | 0.65 | 0.66 | 0.76 |
| GFA-RF | Hypo(5-R2-08), Hypo(6-R2-03), Hypo(8-R3-08), Hypo(2-R5-05), VAL324HBLD, ASP325HALD, CHI_2 | 0.94 | 0.63 | 0.57 | 0.77 |
| GFA-PNN | Hypo(3-R6-08), Hypo(1-R6-02), LEU210HD12CD, Num_Rings5, Kappa_3 | 0.96 | 0.07 | 0.01 | 0.71 |
| GFA-XGBoost | Hypo(5-R2-07), Hypo(6-R2-08), Hypo(8-R2-04), Hypo(2-R5-05), Kappa_3, Dipole_Y | 0.96 | 0.47 | 0.46 | 0.75 |
| GFA-MLR | log(1/IC50) = 0.12 Hypo(5-R6-08) + 0.129 Hypo(1-R2-08) − 0.276 LYS191HZ2CD − 0.22 VAL324HBLD + 0.433 Num_Rings5 − 0.002 PMI_x − 2.65 Shadow_XYfrac − 1.667 | 0.65 | 0.46 | 0.50 | 0.53 |
a

Hypo(5-R2-08) is the 8th pharmacophore model generated using training subset 5 (Table S2 under ESI) with the 2nd HYPOGEN run settings (Table S3 under ESI); Hypo(6-R2-07) is the 7th pharmacophore model generated using training subset 6 (Table S2) with the 2nd HYPOGEN run settings (Table S3); Hypo(8-R3-08) is the 8th pharmacophore model generated using training subset 8 (Table S2) with the 3rd HYPOGEN run settings (Table S3); Hypo(6-R2-03) is the 3rd pharmacophore model generated using training subset 6 (Table S2) with the 2nd HYPOGEN run settings (Table S3); Hypo(2-R5-05) is the 5th pharmacophore model generated using training subset 2 (Table S2) with the 5th HYPOGEN run settings (Table S3); Hypo(3-R6-08) is the 8th pharmacophore model generated using training subset 3 (Table S2) with the 6th HYPOGEN run settings (Table S3); Hypo(1-R6-02) is the 2nd pharmacophore model generated using training subset 1 (Table S2) with the 6th HYPOGEN run settings (Table S3); Hypo(5-R2-07) is the 7th pharmacophore model generated using training subset 5 (Table S2) with the 2nd HYPOGEN run settings (Table S3); Hypo(6-R2-08) is the 8th pharmacophore model generated using training subset 6 (Table S2) with the 2nd HYPOGEN run settings (Table S3); Hypo(8-R2-04) is the 4th pharmacophore model generated using training subset 8 (Table S2) with the 2nd HYPOGEN run settings (Table S3); Hypo(5-R6-08) is the 8th pharmacophore model generated using training subset 5 (Table S2) with the 6th HYPOGEN run settings (Table S3); Hypo(1-R2-08) is the 8th pharmacophore model generated using training subset 1 (Table S2) with the 2nd HYPOGEN run settings (Table S3). Table 2 shows the X, Y, Z coordinates of pharmacophores Hypo(5-R2-08), Hypo(6-R2-07), and Hypo(8-R3-08).

b

LEU244HNLD is the hydrogen atom attached to the peptidic N of Leu244, selected by the LibDock score scoring function; VAL324HBLD is the hydrogen atom attached to the beta carbon of Val324, selected by the LibDock score scoring function; ASP325HALD is the hydrogen atom attached to the alpha carbon of Asp325, selected by the LibDock score scoring function. Fig. 3 shows the position of these three atoms within the binding pocket. LEU210HD12CD is the hydrogen atom attached to the delta carbon of Leu210, selected by the CDocker interaction energy scoring function; LYS191HZ2CD is one of the hydrogen atoms of the terminal amine on the side chain of Lys191, selected by the CDocker interaction energy scoring function.

c

Num_Rings6: number of 6-membered rings. CHI_2: second-order connectivity index, positively correlated with molecular size. Num_Rings5: number of 5-membered rings. Kappa_3: third-order kappa shape index, related to molecular flexibility. Dipole_Y: 3D descriptor, the Y-vector component of the molecular dipole moment in debyes, as estimated from the partial atomic charges (calculated by the Gasteiger method) and the atomic coordinates. PMI_x: principal moment of inertia in the X-dimension. Shadow_XYfrac: area of the molecular shadow in the XY plane.54,55

d

Resubstitution correlation coefficient: the model is trained on the training set and then used to predict the bioactivities of that same training set.

e

Leave-20%-out correlation coefficient.

f

Leave-one-out correlation coefficient.

g

Predictive correlation coefficient on the external testing set.
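
The four coefficients defined in footnotes d to g differ only in which compounds are predicted and which are used for fitting. The following is a minimal sketch of that distinction, not the authors' implementation: it assumes a single-descriptor least-squares line in place of the ML models in the table, assigns compounds to folds by index, and uses the common conventions r² = 1 − PRESS/SS, with SS taken about the training-set mean for the external test set. All function names and the example values are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one descriptor)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(model, xs):
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def r2(observed, predicted, reference_mean):
    """1 - (sum of squared prediction errors) / (sum of squares about reference_mean)."""
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total = sum((o - reference_mean) ** 2 for o in observed)
    return 1.0 - press / total


def r2_resubstitution(xs, ys):
    """Footnote d: fit on the training set, predict the same training set."""
    return r2(ys, predict(fit_line(xs, ys), xs), mean(ys))


def r2_cross_validated(xs, ys, n_folds):
    """Footnotes e and f: each compound is predicted by a model that never saw it.

    n_folds=len(xs) gives leave-one-out; n_folds=5 gives leave-20%-out.
    Fold membership by index (i % n_folds) is an assumption of this sketch.
    """
    n = len(xs)
    held_out = [None] * n
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i, p in zip(test_idx, predict(model, [xs[i] for i in test_idx])):
            held_out[i] = p
    return r2(ys, held_out, mean(ys))


def r2_press(train_xs, train_ys, test_xs, test_ys):
    """Footnote g: fit on the training set, score on the external testing set."""
    model = fit_line(train_xs, train_ys)
    return r2(test_ys, predict(model, test_xs), mean(train_ys))


# Hypothetical descriptor values and log(1/IC50) activities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
activity = [5.1, 5.4, 5.2, 6.0, 6.3, 6.1, 6.9, 7.4, 7.0, 7.8]

print("resubstitution:", r2_resubstitution(descriptor, activity))
print("leave-20%-out: ", r2_cross_validated(descriptor, activity, 5))
print("leave-one-out: ", r2_cross_validated(descriptor, activity, len(descriptor)))
```

For a least-squares fit, the cross-validated values cannot exceed the resubstitution value, because held-out residuals are never smaller in total than fitted residuals. A large gap between the two, as in the GFA-PNN row (0.96 versus 0.07 and 0.01), is the usual sign of overfitting.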