Table 1. AUC and accuracy of binding prediction for IVIG antibodies obtained with the eight base classifiers of the ML-advanced machine learning approach*.
| Attribute vector | AUC | Accuracy |
|---|--:|--:|
| Frequencies of amino acids | 0.870 | 80.7% |
| Difference between frequencies | 0.868 | 80.3% |
| Frequencies of subsequences | 0.867 | 80.5% |
| Physico-chemical properties | 0.873 | 81.2% |
| Frequencies of amino acid classes | 0.866 | 80.5% |
| Frequencies of subsequences of classes | 0.865 | 80.6% |
| Frequencies of pairs of amino acids | 0.873 | 81.2% |
| Frequencies of amino acids at a distance from the first position | 0.863 | 80.3% |
*The base classifiers were cross-validated on the balanced training set (equal numbers of binding and non-binding peptides). Balanced data were used because the base classifiers were always trained on balanced data; the original data were used only in the final step, when the base classifiers' results were merged. The training set was used instead of the test set because comparing methods on the test set can lead to choosing among them based on those results, which defeats the purpose of an independent test set.
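The evaluation protocol in the footnote (balance the classes, cross-validate on the training set, report AUC and accuracy) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline. Balancing is assumed to be random undersampling of the larger class. The "Frequencies of amino acids" attribute vector is assumed to be the relative count of each of the 20 standard amino acids. `centroid_scorer` is a hypothetical nearest-centroid stand-in for the actual base classifiers. Five folds and pooling of scores across folds (rather than averaging per fold) are also assumptions. AUC is computed as the probability that a randomly chosen binder scores higher than a randomly chosen non-binder, with ties counted as 1/2.

```python
import random

# Assumption: the 20 standard amino acids, in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(peptide):
    """Relative frequency of each standard amino acid (a 20-dim attribute vector)."""
    n = len(peptide)
    return [peptide.count(a) / n for a in AMINO_ACIDS]


def balance(items, labels, seed=0):
    """Randomly undersample the larger class so binders and non-binders are equal in number."""
    rng = random.Random(seed)
    pos = [x for x, y in zip(items, labels) if y == 1]
    neg = [x for x, y in zip(items, labels) if y == 0]
    k = min(len(pos), len(neg))
    data = [(x, 1) for x in rng.sample(pos, k)] + [(x, 0) for x in rng.sample(neg, k)]
    rng.shuffle(data)
    return data


def auc(scores, labels):
    """Probability that a random binder outscores a random non-binder (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.0):
    """Fraction of peptides whose predicted class (score > threshold) matches the label."""
    return sum((s > threshold) == (y == 1) for s, y in zip(scores, labels)) / len(labels)


def centroid_scorer(train):
    """Hypothetical stand-in base classifier: positive score means closer to the binder centroid."""
    def centroid(label):
        vecs = [x for x, y in train if y == label]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    c_pos, c_neg = centroid(1), centroid(0)
    return lambda x: dist(x, c_neg) - dist(x, c_pos)


def cross_validate(data, folds=5):
    """k-fold cross-validation on an already balanced set; AUC and accuracy pooled over folds."""
    scores, labels = [], []
    for f in range(folds):
        test = data[f::folds]
        train = [d for i, d in enumerate(data) if i % folds != f]
        score = centroid_scorer(train)
        scores += [score(x) for x, _ in test]
        labels += [y for _, y in test]
    return auc(scores, labels), accuracy(scores, labels)


if __name__ == "__main__":
    # Toy data (invented for illustration): K/R-rich "binders", D/E-rich "non-binders".
    rng = random.Random(1)
    binders = ["".join(rng.choice("KR") for _ in range(10)) for _ in range(30)]
    non_binders = ["".join(rng.choice("DE") for _ in range(10)) for _ in range(50)]
    features = [aa_frequencies(p) for p in binders + non_binders]
    data = balance(features, [1] * 30 + [0] * 50)  # 30 binders + 30 non-binders
    print(cross_validate(data))  # (1.0, 1.0): the toy classes are perfectly separable
```

Evaluating on balanced folds also means that a trivial classifier that always predicts the same class scores 50% accuracy, which gives a convenient baseline for reading the accuracies in Table 1.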