Table 1.
Method | Dataset | F1-metal | F1-XNA | F1-small | F1-all |
---|---|---|---|---|---|
bindEmbed21DL | DevSet1014 | 24 ± 2% | 18 ± 3% | 26 ± 2% | 39 ± 2% |
bindEmbed21DL | TestSet300 | 22 ± 4% | 24 ± 6% | 33 ± 3% | 43 ± 2% |
bindEmbed21DL | TestSetNew46 | 26 ± 14% | 19 ± 11% | 29 ± 9% | 37 ± 6% |
Random | TestSet300 | 1 ± 1% | 6 ± 2% | 6 ± 1% | 9 ± 1% |
bindEmbed21DL | TestSet225 | n/a | n/a | n/a | 47 ± 2% |
bindPredictML17 | TestSet225 | n/a | n/a | n/a | 34 ± 2% |
bindEmbed21DL | TestSet300XNA66 | n/a | 31 ± 5% | n/a | n/a |
ProNA2020 | TestSet300XNA66 | n/a | 33 ± 7% | n/a | n/a |
bindEmbed21DL | TestSet300Zinc51 | 58 ± 8% | n/a | n/a | n/a |
PredZinc | TestSet300Zinc51 | 58 ± 10% | n/a | n/a | n/a |
ZincBindPredict | TestSet300Zinc51 | 17 ± 9% | n/a | n/a | n/a |
Measure: F1 (Eq. 3); ±: 95% confidence intervals (±1.96 standard errors). Methods: bindEmbed21DL: method introduced here; bindPredictML17 [5]: MSA-based method predicting binding; ProNA2020 [19]: method specialized on predicting binding to DNA, RNA, and other proteins; PredZinc [20] and ZincBindPredict [29]: methods specialized on predicting zinc-binding; Random: random prediction obtained by shuffling the original output probabilities of bindEmbed21DL. Data: DevSet1014: development (validation) set with 1014 proteins; TestSet300: test set with 300 proteins created during method development; TestSet225: subset of the test set shared with bindPredictML17; TestSetNew46: 46 proteins added since development of this work began, all sequence-unique with respect to each other and to all other proteins used; TestSet300XNA66: subset of our test set with DNA- or RNA-binding (dubbed XNA) proteins; TestSet300Zinc51: subset of our test set with zinc-binding proteins.
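The table entries of the form "mean ± 1.96 standard errors" and the shuffled Random baseline can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the caption: F1 is taken in its standard form 2PR/(P+R), it is computed per protein and then averaged over proteins, and the standard error is the sample standard deviation divided by the square root of the number of proteins. The function names (`f1_score`, `mean_with_ci95`, `shuffled_baseline`) are hypothetical and are not taken from the bindEmbed21DL code.

```python
import math
import random


def f1_score(predicted, observed):
    """Standard F1 = 2PR / (P + R) from per-residue binary labels of one protein."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fp = sum(1 for p, o in pairs if p and not o)
    fn = sum(1 for p, o in pairs if o and not p)
    if tp == 0:
        return 0.0  # convention assumed here: no true positives gives F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_with_ci95(values):
    """Return (mean, half-width), with half-width = 1.96 * standard error.

    Assumption: standard error = sample standard deviation / sqrt(n).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean, 1.96 * standard_error


def shuffled_baseline(probabilities, seed=0):
    """Random baseline: the same output probabilities in shuffled order."""
    rng = random.Random(seed)
    shuffled = list(probabilities)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Toy per-protein F1 values (illustrative only, not data from the table).
    per_protein_f1 = [0.2, 0.4, 0.6]
    mean, half_width = mean_with_ci95(per_protein_f1)
    print(f"F1 = {100 * mean:.0f} ± {100 * half_width:.0f}%")
```

Shuffling keeps the distribution of predicted probabilities unchanged, so the baseline predicts the same number of binding residues as the original output but at random positions.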