Author manuscript; available in PMC: 2022 Dec 17.
Published in final edited form as: Proc Mach Learn Res. 2022 Nov;165:78–87.

Table 3:

Full regression and classification results for the three characterization tasks on the four enzyme datasets. The baseline model uses Morgan bit-vector encoding and a fully connected MLP with two layers.

| Task | Metric | Phosphatase | Halogenase | Kinase^a | Aminotransferase |
|---|---|---|---|---|---|
| Dataset size | #Seq. × #Subs. | 218 × 168 | 42 × 62 | 318 × 72 | 25 × 18 |
| Simple task | R (Conv+ECFP6) | 0.816 | 0.892 | 0.845 | 0.838 |
| | R (Baseline) | 0.728 | 0.838 | 0.805 | 0.808 |
| | AU-PRC | 0.710 | 0.732 | 0.809 | 0.867 |
| | ROC-AUC | 0.901 | 0.937 | 0.905 | 0.905 |
| Substrates task | R (Conv+ECFP6) | 0.681 | 0.545 | 0.335 | 0.470 |
| | R (Baseline) | 0.649 | 0.521 | 0.205 | 0.322 |
| | AU-PRC | 0.588 | 0.606 | 0.403 | 0.756 |
| | ROC-AUC | 0.858 | 0.931 | 0.730 | 0.697 |
| Sequence task | R (Conv+ECFP6) | 0.465 | 0.673 | 0.735 | 0.790 |
| | R (Baseline) | 0.422 | 0.581 | 0.716 | 0.796 |
| | AU-PRC | 0.418 | 0.743 | 0.745 | 0.790 |
| | ROC-AUC | 0.695 | 0.909 | 0.889 | 0.842 |
^a Classification on the Kinase dataset uses self-defined labels.
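The table reports R, AU-PRC, and ROC-AUC without defining them. The following is a minimal sketch in plain Python and rests on assumptions that the table does not confirm: R is taken to be Pearson's correlation between measured and predicted activity, ROC-AUC the standard rank-based (Mann-Whitney) AUC, and AU-PRC the average precision. The helper names `pearson_r`, `roc_auc`, and `average_precision` are hypothetical. This is not the authors' evaluation code.

```python
# Hypothetical metric helpers; assumes R = Pearson correlation,
# ROC-AUC = rank-based AUC, AU-PRC = average precision.
from math import sqrt
from typing import Sequence


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AU-PRC as average precision: mean precision at each positive's rank (no tie handling)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


if __name__ == "__main__":
    # Toy inputs for illustration only; not data from the paper.
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

With the toy labels [1, 0, 1, 0] and scores [0.9, 0.8, 0.7, 0.1], `roc_auc` returns 0.75 and `average_precision` returns about 0.833. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same quantities and also handle ties and edge cases.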