Table 4.
Comparison of our models’ performance with that of other models on the five data sets with 5% labeled data. Our four models are BT-MBO (transformer with MBO), AE-MBO (autoencoder with MBO), ECFP-MBO (extended-connectivity fingerprints with MBO), and Consensus-MBO, which generates a consensus of our top two scoring methods for a given data set and percentage of labeled data (more details in Section 2.6). The BT-GBDT, BT-RF, and BT-SVM models use the BT-FPs as features for gradient boosting decision trees, random forest, and support vector machine, respectively (note that these models are denoted by ‘SSLP-FP’ in [5]); AE-GBDT, AE-RF, and AE-SVM analogously use the AE-FPs as features for the specified machine learning method. Performance is reported as the average ROC-AUC score over 50 labeled sets, with standard deviation; Consensus-MBO performance is reported as the average over 10 trials, with standard deviation.
**ROC-AUC Scores for 5% Labeled Data**

Model | Ames | Bace | BBBP | Beet | ClinTox |
---|---|---|---|---|---|
BT-MBO (Proposed) | 0.716 ± .014 | 0.680 ± .027 | 0.785 ± .036 | 0.621 ± .070 | 0.774 ± .058 |
AE-MBO (Proposed) | 0.653 ± .012 | 0.646 ± .024 | 0.730 ± .040 | 0.578 ± .047 | 0.596 ± .032 |
ECFP-MBO (Proposed) | 0.710 ± .012 | 0.720 ± .026 | 0.721 ± .025 | 0.662 ± .073 | 0.563 ± .029 |
Consensus-MBO (Proposed) | 0.722 ± .013 | 0.702 ± .032 | 0.765 ± .034 | 0.666 ± .070 | 0.712 ± .055 |
BT-GBDT [5] | 0.717 ± .009 | 0.654 ± .029 | 0.696 ± .061 | 0.551 ± .054 | 0.524 ± .029 |
BT-RF [5] | 0.709 ± .014 | 0.642 ± .030 | 0.684 ± .057 | 0.555 ± .058 | 0.513 ± .025 |
BT-SVM [5] | 0.721 ± .011 | 0.679 ± .027 | 0.739 ± .052 | 0.566 ± .051 | 0.635 ± .084 |
AE-GBDT [17] | 0.666 ± .011 | 0.653 ± .030 | 0.665 ± .050 | 0.549 ± .042 | 0.506 ± .007 |
AE-RF [17] | 0.662 ± .013 | 0.663 ± .028 | 0.632 ± .063 | 0.551 ± .040 | 0.503 ± .006 |
AE-SVM [17] | 0.653 ± .010 | 0.644 ± .028 | 0.716 ± .044 | 0.535 ± .036 | 0.520 ± .023 |
ECFP2_512 [48] | 0.700 ± .010 | 0.706 ± .025 | 0.703 ± .029 | 0.596 ± .059 | 0.517 ± .018 |
ECFP2_1024 [48] | 0.703 ± .011 | 0.701 ± .031 | 0.701 ± .028 | 0.603 ± .057 | 0.513 ± .010 |
ECFP2_2048 [48] | 0.705 ± .009 | 0.700 ± .031 | 0.696 ± .033 | 0.609 ± .063 | 0.512 ± .012 |
ECFP4_512 [48] | 0.685 ± .015 | 0.706 ± .025 | 0.677 ± .036 | 0.575 ± .058 | 0.513 ± .011 |
ECFP4_1024 [48] | 0.691 ± .010 | 0.704 ± .032 | 0.686 ± .036 | 0.587 ± .050 | 0.515 ± .012 |
ECFP4_2048 [48] | 0.700 ± .013 | 0.712 ± .025 | 0.678 ± .032 | 0.602 ± .064 | 0.510 ± .010 |
ECFP6_512 [48] | 0.669 ± .012 | 0.694 ± .029 | 0.664 ± .032 | 0.580 ± .057 | 0.508 ± .009 |
ECFP6_1024 [48] | 0.677 ± .009 | 0.698 ± .025 | 0.672 ± .033 | 0.574 ± .055 | 0.511 ± .009 |
ECFP6_2048 [48] | 0.688 ± .012 | 0.710 ± .028 | 0.670 ± .034 | 0.572 ± .057 | 0.510 ± .009 |
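The evaluation protocol behind the table (averaging ROC-AUC over many randomly drawn labeled sets and reporting mean ± standard deviation) can be sketched as follows. This is a minimal illustration, not the authors' code: the `roc_auc` helper is a toy rank-based implementation, and the labels/scores are synthetic placeholders standing in for a real model's predictions on a held-out set.

```python
import random
import statistics

def roc_auc(labels, scores):
    # ROC-AUC via the Mann-Whitney U statistic: the probability that a
    # randomly chosen positive is scored above a randomly chosen negative
    # (ties count as half).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical loop over 50 labeled sets, mirroring the table's protocol.
random.seed(0)
aucs = []
for _ in range(50):
    labels = [random.randint(0, 1) for _ in range(200)]
    # Placeholder scores: positives get a small bump so the toy "model"
    # is better than chance but imperfect.
    scores = [y * 0.3 + random.random() for y in labels]
    aucs.append(roc_auc(labels, scores))

print(f"ROC-AUC: {statistics.mean(aucs):.3f} \u00b1 {statistics.stdev(aucs):.3f}")
```

Each table entry corresponds to one such mean ± standard deviation, computed from the actual model scores rather than these synthetic ones.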