Table 1.
Model | Ab. set #1 (scFvs) | Ab. set #2 (scFvs) | Ab. set #3 (scFvs) | Ab. sets #4 and #6 (scFvs) | Ab. sets #4 and #5 (mAbs) |
---|---|---|---|---|---|
Random forest #1 (7 SB, 11 SB-SE features) | 0.956 | 0.865 | 0.951 | 0.928 | 0.819 |
Random forest #2 (10 SB, 12 SB-SE features) | 0.971 | 0.856 | 0.966 | 0.851 | 0.788 |
Random forest #3 (10 ESM-2, 12 SB-SE features) | 0.964 | 0.792 | 0.971 | 0.797 | 0.797 |
Random forest #4 (10 SB features) | 0.899 | 0.823 | 0.958 | 0.803 | 0.762 |
Random forest #5 (10 ESM-2 features) | 0.866 | 0.858 | 0.887 | 0.746 | 0.620 |
Random forest #6 (12 SB-SE features) | 0.941 | 0.695 | 0.935 | 0.708 | 0.758 |
Random forest #7 (31 SB features) | 0.938 | 0.842 | 0.971 | 0.912 | 0.756 |
Random forest #8 (320 ESM-2 features) | 0.927 | 0.757 | 0.924 | 0.912 | 0.692 |
Fv: net charge >+2.1 | 0.838 | 0.774 | 0.929 | 0.674 | 0.671 |
VH: net charge >+2.0 | 0.829 | 0.750 | 0.892 | 0.804 | 0.707 |
Fv: WYRK >36 | 0.788 | 0.730 | 0.832 | 0.587 | 0.621 |
SVC: PSSM | 0.863 | 0.699 | 0.842 | 0.575 | 0.616 (−) |
SGD: OneHot | 0.988 | 0.748 | 0.950 | 0.826 | 0.534 |
AIMS: SB features | 0.951 | 0.810 | 0.956 | 0.690 | 0.580 |
The performance of the best random forest model in this work on different antibody (Ab.) sets was compared with that of random forest models based on different feature sets, a support vector classifier (SVC) model that uses position-specific scoring matrix (PSSM) features, a stochastic gradient descent (SGD) model that uses one-hot encoding (OneHot) features, and an automated immune molecule separator (AIMS) model that uses sequence-based (SB) features. Three single molecular features, expressed as threshold rules, were also evaluated for predicting antibody polyreactivity. The antibody sets are defined in the STAR Methods section. The performance metric is accuracy for antibody sets #1 and #2 and AUC for the other antibody sets.
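As an illustration of how a single-feature rule such as "Fv: net charge >+2.1" could be scored with the two metrics named in the caption, the following is a minimal sketch and not the authors' implementation. The function names, the toy feature values, and the toy labels are hypothetical. Two further assumptions are made: an antibody above the threshold is called polyreactive, and the AUC for a rule is computed by ranking antibodies on the raw feature value.

```python
from typing import List, Sequence


def rule_predict(values: Sequence[float], threshold: float) -> List[int]:
    """Apply a threshold rule: 1 (assumed polyreactive) if value > threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of antibodies whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison: the probability that a positive
    example scores higher than a negative one (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the paper): Fv net charge and polyreactivity labels.
net_charge = [3.5, 2.4, 0.1, -1.0, 2.9, 1.5]
labels = [1, 1, 0, 0, 0, 1]

preds = rule_predict(net_charge, threshold=2.1)
print("accuracy:", accuracy(labels, preds))
print("AUC:", auc(labels, net_charge))
```

Accuracy depends on the chosen threshold, whereas the AUC computed this way depends only on how the feature ranks the antibodies. This is one plausible reason a table might report a different metric for different sets, but the source does not state the authors' reasoning.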