Table 1.

Performance of different classifier models for predicting antibody polyreactivity

Model	Ab. set #1 (scFvs), accuracy	Ab. set #2 (scFvs), accuracy	Ab. set #3 (scFvs), AUC	Ab. sets #4 and #6 (scFvs), AUC	Ab. sets #4 and #5 (mAbs), AUC
Random forest #1 (7 SB, 11 SB-SE features)	0.956	0.865	0.951	0.928	0.819
Random forest #2 (10 SB, 12 SB-SE features)	0.971	0.856	0.966	0.851	0.788
Random forest #3 (10 ESM-2, 12 SB-SE features)	0.964	0.792	0.971	0.797	0.797
Random forest #4 (10 SB features)	0.899	0.823	0.958	0.803	0.762
Random forest #5 (10 ESM-2 features)	0.866	0.858	0.887	0.746	0.620
Random forest #6 (12 SB-SE features)	0.941	0.695	0.935	0.708	0.758
Random forest #7 (31 SB features)	0.938	0.842	0.971	0.912	0.756
Random forest #8 (320 ESM-2 features)	0.927	0.757	0.924	0.912	0.692
Fv: net charge >+2.1	0.838	0.774	0.929	0.674	0.671
V_H: net charge >+2.0	0.829	0.750	0.892	0.804	0.707
Fv: WYRK >36	0.788	0.730	0.832	0.587	0.621
SVC: PSSM	0.863	0.699	0.842	0.575	0.616 (−)
SGD: OneHot	0.988	0.748	0.950	0.826	0.534
AIMS: SB features	0.951	0.810	0.956	0.690	0.580

Performance of the best random forest model in this work on different antibody (Ab.) sets was compared with that of random forest models based on other feature sets, a support vector classifier (SVC) model that uses position-specific scoring matrix (PSSM) features, a stochastic gradient descent (SGD) model that uses one-hot encoding (OneHot) features, and an automated immune molecule separator (AIMS) model that uses sequence-based (SB) features. Three single molecular features, expressed as rules, were also evaluated for predicting antibody polyreactivity. The antibody sets are defined in the STAR Methods section. Performance is reported as accuracy for antibody sets #1 and #2 and as AUC for the other antibody sets.
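
As a rough illustration of how a single-feature rule such as "Fv: WYRK >36" could be scored with the two metrics used in this table, the sketch below computes an accuracy for the thresholded rule and an AUC for the underlying feature value. It is a minimal sketch under stated assumptions, not the procedure used in this work: the function names (`wyrk_count`, `rule_accuracy`, `auc`) are hypothetical, WYRK is assumed to mean the number of W, Y, R, and K residues in the sequence, label 1 is assumed to mean polyreactive, and the feature values and labels are toy numbers invented for the example.

```python
from typing import Sequence


def wyrk_count(seq: str) -> int:
    """Count W, Y, R and K residues (assumed meaning of the WYRK feature)."""
    return sum(seq.upper().count(aa) for aa in "WYRK")


def rule_accuracy(values: Sequence[float], labels: Sequence[int],
                  threshold: float) -> float:
    """Accuracy of the rule 'value > threshold means polyreactive (label 1)'."""
    preds = [1 if v > threshold else 0 for v in values]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative contribute 0.5.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from any antibody set).
toy_values = [40, 38, 35, 30, 37, 33]   # hypothetical WYRK counts
toy_labels = [1, 1, 1, 0, 0, 0]         # 1 = polyreactive (assumed encoding)

print("accuracy:", rule_accuracy(toy_values, toy_labels, threshold=36))
print("AUC:", auc(toy_values, toy_labels))
```

Accuracy depends on the chosen cutoff, whereas AUC depends only on how the feature ranks the antibodies, which is one reason the two kinds of columns in the table are not directly comparable.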