Sci Rep. 2021 Dec 13;11:23916. doi: 10.1038/s41598-021-03431-4

Table 1.

F1 score (harmonic mean of precision and recall).

Method           Dataset            F1-metal   F1-XNA     F1-small   F1-all
bindEmbed21DL    DevSet1014         24 ± 2%    18 ± 3%    26 ± 2%    39 ± 2%
bindEmbed21DL    TestSet300         22 ± 4%    24 ± 6%    33 ± 3%    43 ± 2%
bindEmbed21DL    TestSetNew46       26 ± 14%   19 ± 11%   29 ± 9%    37 ± 6%
Random           TestSet300         1 ± 1%     6 ± 2%     6 ± 1%     9 ± 1%
bindEmbed21DL    TestSet225         n/a        n/a        n/a        47 ± 2%
bindPredictML17  TestSet225         n/a        n/a        n/a        34 ± 2%
bindEmbed21DL    TestSet300XNA66    n/a        31 ± 5%    n/a        n/a
ProNA2020        TestSet300XNA66    n/a        33 ± 7%    n/a        n/a
bindEmbed21DL    TestSet300Zinc51   58 ± 8%    n/a        n/a        n/a
PredZinc         TestSet300Zinc51   58 ± 10%   n/a        n/a        n/a
ZincBindPredict  TestSet300Zinc51   17 ± 9%    n/a        n/a        n/a

Measure: F1 (Eq. 3). ±: 95% confidence interval (±1.96 standard errors).

Methods: bindEmbed21DL: method introduced here; bindPredictML17 [5]: MSA-based method predicting binding; ProNA2020 [19]: method specialized in predicting binding to DNA, RNA, and other proteins; PredZinc [20] and ZincBindPredict [29]: methods specialized in predicting zinc binding; Random: random prediction obtained by shuffling the original output probabilities of bindEmbed21DL.

Data: DevSet1014: development (validation) set with 1014 proteins; TestSet300: test set with 300 proteins, created during method development; TestSet225: subset of the test set shared with bindPredictML17; TestSetNew46: 46 proteins added since development of this work began, all sequence-unique with respect to each other and to all other proteins used; TestSet300XNA66: subset of our test set with DNA- or RNA-binding (dubbed XNA) proteins; TestSet300Zinc51: subset of our test set with zinc-binding proteins.
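The quantities in this table can be illustrated with a minimal sketch. It is not the authors' evaluation code. Eq. 3 is not reproduced here, so the sketch assumes the standard definition F1 = 2 · precision · recall / (precision + recall). It also assumes that F1 is computed per protein and then averaged, and that the standard error is the sample standard deviation divided by the square root of the number of proteins. The function names, the 0.5 probability cutoff, and the toy data are all hypothetical.

```python
import math
import random
import statistics


def f1_score(predicted, observed):
    """F1 as the harmonic mean of precision and recall for binary residue labels."""
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)
    if tp == 0:
        # Assumed convention: F1 is 0 when there are no true positives.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_with_ci95(values):
    """Return (mean, half-width), where half-width = 1.96 standard errors."""
    mean = statistics.fmean(values)
    stderr = statistics.stdev(values) / math.sqrt(len(values))
    return mean, 1.96 * stderr


def shuffled_baseline(probabilities, seed=0):
    """Random baseline: the same probabilities, randomly reassigned to residues."""
    rng = random.Random(seed)
    shuffled = list(probabilities)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical toy data: (per-residue probabilities, binding annotations) per protein.
proteins = [
    ([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0]),
    ([0.7, 0.6, 0.3, 0.2], [1, 0, 0, 0]),
    ([0.1, 0.9, 0.4, 0.8], [0, 1, 1, 1]),
]
THRESHOLD = 0.5  # assumed cutoff for calling a residue "binding"

scores = [
    f1_score([p >= THRESHOLD for p in probs], labels)
    for probs, labels in proteins
]
random_scores = [
    f1_score([p >= THRESHOLD for p in shuffled_baseline(probs)], labels)
    for probs, labels in proteins
]

mean_f1, ci95 = mean_with_ci95(scores)
mean_rand, ci95_rand = mean_with_ci95(random_scores)
print(f"Method: F1 = {100 * mean_f1:.0f} ± {100 * ci95:.0f}%")
print(f"Random: F1 = {100 * mean_rand:.0f} ± {100 * ci95_rand:.0f}%")
```

Shuffling keeps the distribution of output probabilities unchanged, so the baseline predicts the same number of binding residues as the method and differs only in where they are placed.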