Table 3:
Full regression and classification results for the three characterization tasks on the four enzyme datasets. The baseline model uses Morgan bit-vector encoding and a two-layer fully connected MLP.
| Tasks | Dataset | Phosphatase | Halogenase | Kinase<sup>a</sup> | Aminotransferase |
|---|---|---|---|---|---|
| | #Seq. × #Subs. | 218 × 168 | 42 × 62 | 318 × 72 | 25 × 18 |
| Simple task | R Conv+ECFP6 | 0.816 | 0.892 | 0.845 | 0.838 |
| | R Baseline | 0.728 | 0.838 | 0.805 | 0.808 |
| | AU-PRC | 0.710 | 0.732 | 0.809 | 0.867 |
| | ROC-AUC | 0.901 | 0.937 | 0.905 | 0.905 |
| Substrates task | R Conv+ECFP6 | 0.681 | 0.545 | 0.335 | 0.470 |
| | R Baseline | 0.649 | 0.521 | 0.205 | 0.322 |
| | AU-PRC | 0.588 | 0.606 | 0.403 | 0.756 |
| | ROC-AUC | 0.858 | 0.931 | 0.730 | 0.697 |
| Sequence task | R Conv+ECFP6 | 0.465 | 0.673 | 0.735 | 0.790 |
| | R Baseline | 0.422 | 0.581 | 0.716 | 0.796 |
| | AU-PRC | 0.418 | 0.743 | 0.745 | 0.790 |
| | ROC-AUC | 0.695 | 0.909 | 0.889 | 0.842 |
<sup>a</sup> Classification on the Kinase dataset uses self-defined labels.
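
For readers less familiar with the two classification metrics reported in Table 3, the following minimal sketch shows one common way to compute them from binary labels and predicted scores. The function names `roc_auc` and `average_precision` are illustrative and not taken from the source. Average precision is only one of several estimators of AU-PRC, and the table does not state which estimator was used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    Assumption: tied scores are not handled specially; they keep input order.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Toy data, not drawn from any of the enzyme datasets.
    y_true = [1, 0, 1, 0]
    y_score = [0.9, 0.8, 0.7, 0.1]
    print(roc_auc(y_true, y_score))
    print(average_precision(y_true, y_score))
```

ROC-AUC is insensitive to the fraction of positives, whereas AU-PRC depends on it, which is one reason the two metrics can diverge on the same dataset.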