2023 Feb 3;13(7):4623–4640. doi: 10.1039/d2ra07007c

Accuracy and Cohen's Kappa values for ML models developed using different ML learners combined with LRCFs and scoring function values as descriptors.

| Learner | Feature selector (a) | Descriptors (b) | Accuracy, L20% out (c) | Accuracy, Testing (d) | Cohen's Kappa, L20% out (c) | Cohen's Kappa, Testing (d) |
| --- | --- | --- | --- | --- | --- | --- |
| XGBoost | GFA | LigScore2, PLP2, PMF, PMF04, Cdocker energy, GLU 594 HG2, SER 613 HN, TRP 623 CZ2, VAL 637 HA, GLN 643 HG1, GLY 656 C, TYR 657 HE1, LYS 658 HG1, MET 660 HB1, MET 660 HE2, MET 660 O, PRO 669 CB, HOH 32 OH2, HOH 60 H2, HOH 107 OH2 | 0.744 | 0.734 | 0.414 | 0.404 |
| XGBoost | GFA + SHAP | PLP2, PMF, Cdocker energy, SER 613 HN | 0.714 | 0.719 | 0.348 | 0.368 |
| RF | GFA | PMF04, Cdocker energy, Cdocker interaction energy, SER 613 HG, GLN 633 HE22, PRO 639 CD, TYR 640 HE1, TYR 657 OH, LYS 658 CG, ILE 659 HG11, ALA 662 HA, HOH 32 OH2, HOH 37 H2, HOH 70 OH2, HOH 107 OH2, HOH 107 H2, HOH 170 OH2, HOH 255 H1, HOH 255 H2, HOH 269 H2 | 0.749 | 0.735 | 0.404 | 0.392 |
| RF | GFA + SHAP | Cdocker energy, Cdocker interaction energy | 0.634 | 0.635 | 0.169 | 0.181 |
(a) GFA: genetic function algorithm; SHAP: SHapley Additive exPlanations.

(b) Amino acid and water heavy atom contacts are coded according to the Protein Data Bank, while hydrogen atoms are coded according to Discovery Studio 4.5. LigScore2, PMF, PMF04, Cdocker energy, and Cdocker interaction energy are scoring function values.

(c) L20% out: leave-20%-out cross-validation for accuracy and Cohen's Kappa.

(d) Testing: accuracy and Cohen's Kappa determined against the testing set (Table S1 in the ESI).
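
For readers less familiar with the two metrics reported in the table, the following is a minimal sketch of how accuracy and Cohen's Kappa can be computed from true and predicted class labels. It uses the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the class frequencies. The function name `accuracy_and_kappa` and the toy labels are hypothetical illustrations; they are not the authors' code or data.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(y_true)

    # Observed agreement p_o: fraction of samples predicted correctly.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement p_e: for each class, the product of its frequency
    # among the true labels and among the predicted labels, summed.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    chance_numerator = sum(true_counts[c] * pred_counts[c] for c in true_counts)

    if chance_numerator == n * n:
        # Both sequences contain one identical class only: kappa is undefined.
        return p_o, float("nan")

    p_e = chance_numerator / (n * n)
    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Toy labels invented for this example (1 = active, 0 = inactive).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_accuracy, toy_kappa = accuracy_and_kappa(toy_true, toy_pred)
print(f"accuracy = {toy_accuracy:.3f}, kappa = {toy_kappa:.3f}")
```

Because Cohen's Kappa subtracts the agreement expected by chance, it is always lower than the accuracy whenever chance agreement is non-zero and the classifier is imperfect, which is the pattern seen in every row of the table.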