Table 4.
Comparison of our models’ performance with that of other models on the five data sets with 5% labeled data. Our four models are BT-MBO (transformer with MBO), AE-MBO (autoencoder with MBO), ECFP-MBO (extended-connectivity fingerprints with MBO), and Consensus-MBO, which generates a consensus of our top two scoring methods for a given data set and percentage of labeled data (more details in Section 2.6). The BT-GBDT, BT-RF, and BT-SVM models use the BT-FPs as features for gradient boosting decision trees, random forest, and support vector machine, respectively (note that these models are denoted by ‘SSLP-FP’ in [5]); AE-GBDT, AE-RF, and AE-SVM analogously use the AE-FPs as features for the specified machine learning method. Performance is reported as the average ROC-AUC score over 50 labeled sets, with standard deviation; Consensus-MBO performance is reported as the average over 10 trials, with standard deviation.
**ROC-AUC Scores for 5% Labeled Data**

Model | Ames | Bace | BBBP | Beet | ClinTox |
---|---|---|---|---|---|
BT-MBO (Proposed) | 0.716 ± .014 | 0.680 ± .027 | 0.785 ± .036 | 0.621 ± .070 | 0.774 ± .058 |
AE-MBO (Proposed) | 0.653 ± .012 | 0.646 ± .024 | 0.730 ± .040 | 0.578 ± .047 | 0.596 ± .032 |
ECFP-MBO (Proposed) | 0.710 ± .012 | 0.720 ± .026 | 0.721 ± .025 | 0.662 ± .073 | 0.563 ± .029 |
Consensus-MBO (Proposed) | 0.722 ± .013 | 0.702 ± .032 | 0.765 ± .034 | 0.666 ± .070 | 0.712 ± .055 |
BT-GBDT [5] | 0.717 ± .009 | 0.654 ± .029 | 0.696 ± .061 | 0.551 ± .054 | 0.524 ± .029 |
BT-RF [5] | 0.709 ± .014 | 0.642 ± .030 | 0.684 ± .057 | 0.555 ± .058 | 0.513 ± .025 |
BT-SVM [5] | 0.721 ± .011 | 0.679 ± .027 | 0.739 ± .052 | 0.566 ± .051 | 0.635 ± .084 |
AE-GBDT [17] | 0.666 ± .011 | 0.653 ± .030 | 0.665 ± .050 | 0.549 ± .042 | 0.506 ± .007 |
AE-RF [17] | 0.662 ± .013 | 0.663 ± .028 | 0.632 ± .063 | 0.551 ± .040 | 0.503 ± .006 |
AE-SVM [17] | 0.653 ± .010 | 0.644 ± .028 | 0.716 ± .044 | 0.535 ± .036 | 0.520 ± .023 |
ECFP2_512 [48] | 0.700 ± .010 | 0.706 ± .025 | 0.703 ± .029 | 0.596 ± .059 | 0.517 ± .018 |
ECFP2_1024 [48] | 0.703 ± .011 | 0.701 ± .031 | 0.701 ± .028 | 0.603 ± .057 | 0.513 ± .010 |
ECFP2_2048 [48] | 0.705 ± .009 | 0.700 ± .031 | 0.696 ± .033 | 0.609 ± .063 | 0.512 ± .012 |
ECFP4_512 [48] | 0.685 ± .015 | 0.706 ± .025 | 0.677 ± .036 | 0.575 ± .058 | 0.513 ± .011 |
ECFP4_1024 [48] | 0.691 ± .010 | 0.704 ± .032 | 0.686 ± .036 | 0.587 ± .050 | 0.515 ± .012 |
ECFP4_2048 [48] | 0.700 ± .013 | 0.712 ± .025 | 0.678 ± .032 | 0.602 ± .064 | 0.510 ± .010 |
ECFP6_512 [48] | 0.669 ± .012 | 0.694 ± .029 | 0.664 ± .032 | 0.580 ± .057 | 0.508 ± .009 |
ECFP6_1024 [48] | 0.677 ± .009 | 0.698 ± .025 | 0.672 ± .033 | 0.574 ± .055 | 0.511 ± .009 |
ECFP6_2048 [48] | 0.688 ± .012 | 0.710 ± .028 | 0.670 ± .034 | 0.572 ± .057 | 0.510 ± .009 |
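The evaluation protocol behind the table (averaging ROC-AUC over many randomly drawn labeled sets and reporting mean ± standard deviation) can be sketched as follows. This is a minimal illustration, not the authors' code: the `roc_auc` helper is a toy rank-based implementation, and the labels/scores are synthetic placeholders standing in for a real model's predictions on a held-out set.

```python
import random
import statistics

def roc_auc(labels, scores):
    # ROC-AUC via the Mann-Whitney U statistic: the probability that a
    # randomly chosen positive is scored above a randomly chosen negative
    # (ties count as half).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical loop over 50 labeled sets, mirroring the table's protocol.
random.seed(0)
aucs = []
for _ in range(50):
    labels = [random.randint(0, 1) for _ in range(200)]
    # Placeholder scores: positives get a small bump so the toy "model"
    # is better than chance but imperfect.
    scores = [y * 0.3 + random.random() for y in labels]
    aucs.append(roc_auc(labels, scores))

print(f"ROC-AUC: {statistics.mean(aucs):.3f} \u00b1 {statistics.stdev(aucs):.3f}")
```

Each table entry corresponds to one such mean ± standard deviation, computed from the actual model scores rather than these synthetic ones.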