Table 3.
Comparison of the ROC-AUC performance of our models and of other models on the five data sets with 2% labeled data. Our four models are AE-MBO (autoencoder with MBO), BT-MBO (transformer with MBO), ECFP-MBO (extended-connectivity fingerprints with MBO), and Consensus-MBO, which forms a consensus from our two top-scoring methods for a given data set and percentage of labeled data (see Section 2.6 for details). The BT-GBDT, BT-RF, and BT-SVM models use the BT-FPs as features for gradient boosting decision trees, random forests, and support vector machines, respectively (these models are denoted ‘SSLP-FP’ in [5]). AE-GBDT, AE-RF, and AE-SVM use the AE-FPs as features for the corresponding machine learning method. Performance is reported as the mean ROC-AUC ± standard deviation over 50 labeled sets; for Consensus-MBO, it is the mean ± standard deviation over 10 trials.
ROC-AUC Scores for 2% Labeled Data

Model | Ames | Bace | BBBP | Beet | ClinTox |
---|---|---|---|---|---|
BT-MBO (Proposed) | 0.677 ± .021 | 0.618 ± .037 | 0.736 ± .051 | 0.576 ± .075 | 0.704 ± .113 |
AE-MBO (Proposed) | 0.619 ± .016 | 0.589 ± .029 | 0.685 ± .048 | 0.548 ± .067 | 0.561 ± .030 |
ECFP-MBO (Proposed) | 0.672 ± .021 | 0.670 ± .034 | 0.682 ± .028 | 0.614 ± .089 | 0.551 ± .026 |
Consensus-MBO (Proposed) | 0.683 ± .023 | 0.642 ± .043 | 0.712 ± .058 | 0.593 ± .090 | 0.656 ± .076 |
BT-GBDT [5] | 0.674 ± .023 | 0.600 ± .036 | 0.643 ± .075 | 0.521 ± .035 | 0.513 ± .024 |
BT-RF [5] | 0.666 ± .025 | 0.588 ± .034 | 0.619 ± .071 | 0.510 ± .020 | 0.504 ± .010 |
BT-SVM [5] | 0.680 ± .017 | 0.605 ± .036 | 0.663 ± .082 | 0.522 ± .040 | 0.569 ± .080 |
AE-GBDT [17] | 0.632 ± .018 | 0.588 ± .038 | 0.614 ± .068 | 0.529 ± .036 | 0.504 ± .010 |
AE-RF [17] | 0.631 ± .019 | 0.581 ± .034 | 0.596 ± .054 | 0.517 ± .022 | 0.502 ± .004 |
AE-SVM [17] | 0.627 ± .015 | 0.580 ± .035 | 0.625 ± .066 | 0.512 ± .028 | 0.508 ± .012 |
ECFP2_512 [48] | 0.658 ± .018 | 0.629 ± .039 | 0.643 ± .051 | 0.552 ± .064 | 0.512 ± .017 |
ECFP2_1024 [48] | 0.663 ± .018 | 0.635 ± .038 | 0.621 ± .048 | 0.565 ± .057 | 0.508 ± .010 |
ECFP2_2048 [48] | 0.666 ± .019 | 0.634 ± .044 | 0.641 ± .041 | 0.541 ± .058 | 0.509 ± .010 |
ECFP4_512 [48] | 0.646 ± .019 | 0.634 ± .042 | 0.609 ± .052 | 0.542 ± .052 | 0.505 ± .008 |
ECFP4_1024 [48] | 0.655 ± .020 | 0.638 ± .036 | 0.617 ± .045 | 0.538 ± .050 | 0.505 ± .006 |
ECFP4_2048 [48] | 0.652 ± .021 | 0.645 ± .040 | 0.619 ± .053 | 0.549 ± .057 | 0.506 ± .008 |
ECFP6_512 [48] | 0.635 ± .025 | 0.632 ± .040 | 0.604 ± .051 | 0.531 ± .056 | 0.506 ± .011 |
ECFP6_1024 [48] | 0.639 ± .020 | 0.632 ± .046 | 0.592 ± .048 | 0.542 ± .049 | 0.503 ± .006 |
ECFP6_2048 [48] | 0.650 ± .022 | 0.635 ± .046 | 0.584 ± .046 | 0.531 ± .047 | 0.505 ± .007 |
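The evaluation protocol described in the caption (one ROC-AUC score per labeled set, then the mean ± standard deviation across the 50 sets) can be sketched as below. This is a minimal illustration, not the authors' code: ROC-AUC is computed with the standard pairwise (Mann-Whitney) formulation, the helper names `roc_auc` and `summarize` are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the caption does not state which estimator was used.

```python
import statistics
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative; tied pairs count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(aucs: Sequence[float]) -> str:
    """Format per-trial ROC-AUC scores as 'mean ± std' in the table's style."""
    mean = statistics.mean(aucs)
    std = statistics.stdev(aucs)  # sample (n - 1) std: an assumption, not stated in the caption
    std_str = f"{std:.3f}"
    if std_str.startswith("0"):
        std_str = std_str[1:]  # the table drops the leading zero on the std
    return f"{mean:.3f} ± {std_str}"


if __name__ == "__main__":
    # Hypothetical toy scores for three labeled sets, for illustration only.
    trials = [
        ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
        ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
        ([1, 0, 1, 0], [0.6, 0.5, 0.4, 0.1]),
    ]
    aucs = [roc_auc(y, s) for y, s in trials]
    print(summarize(aucs))
```

In the actual experiments, each entry of `aucs` would come from one of the 50 labeled sets (or one of the 10 Consensus-MBO trials), and `summarize` would produce a single table cell.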