Table 1. Performance of sequence-based DBP prediction under various redundancy thresholds and DBP definitions.
| Dataset | DBS only | AA composition only | AA + DBS | All features |
|---|---|---|---|---|
| NR90-FT | 63.50 | 86.00 | 86.20 | 86.40 |
| NR70-FT | 63.00 | 85.60 | 84.90 | 86.20 |
| NR50-FT | 61.30 | 84.40 | 84.90 | 85.30 |
| NR25-FT | 63.00 | 82.80 | 83.30 | 83.60 |
| NR90-GO | 64.60 | 85.60 | 85.50 | 85.60 |
| NR70-GO | 65.00 | 85.40 | 85.60 | 85.40 |
| NR50-GO | 64.70 | 82.60 | 82.90 | 83.10 |
| NR25-GO | 62.30 | 79.90 | 80.00 | 80.20 |
| NR90-Pfam | 61.90 | 85.40 | 85.30 | 86.10 |
| NR70-Pfam | 61.10 | 85.00 | 85.40 | 85.50 |
| NR50-Pfam | 61.60 | 83.20 | 83.80 | 83.90 |
| NR25-Pfam | 62.70 | 81.40 | 81.80 | 82.20 |
| NR90-PDB | 56.60 | 79.70 | 79.80 | 80.60 |
| NR70-PDB | 53.40 | 80.30 | 79.50 | 81.50 |
| NR50-PDB | 55.60 | 77.70 | 76.70 | 77.30 |
| NR25-PDB | 55.60 | 75.90 | 75.90 | 75.20 |
| Mean | 60.99 | 82.56 | 82.59 | 83.01 |
| P-value (t-test), AA + DBS versus all features | 0.0058 | | | |
The feature set based on amino acid composition is the best of the three in most cases. However, adding DBS features improves performance in almost all prediction models, and the improvement is statistically significant (P = 0.0058, t-test). Abbreviations: NRxx: data are non-redundant at an xx% sequence identity threshold; FT: DBP definition taken from UniProt sequence features; GO: DBP definition taken from GO; DBS: DNA-binding site predictions; AA composition: amino acid composition of the full-length protein.
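
As a rough illustration of how the Mean row and a t-test comparison of two columns can be computed, the following minimal sketch uses the AA + DBS and All features values transcribed from the table. The table does not state which t-test variant was used, so the paired form here is an assumption, and the function name `paired_t_statistic` is hypothetical. Because the inputs are the rounded table values and the test variant is assumed, the result need not reproduce the reported P-value.

```python
from math import sqrt
from statistics import mean, stdev

# Values transcribed from Table 1, rows NR90-FT through NR25-PDB.
aa_dbs = [86.20, 84.90, 84.90, 83.30, 85.50, 85.60, 82.90, 80.00,
          85.30, 85.40, 83.80, 81.80, 79.80, 79.50, 76.70, 75.90]
all_features = [86.40, 86.20, 85.30, 83.60, 85.60, 85.40, 83.10, 80.20,
                86.10, 85.50, 83.90, 82.20, 80.60, 81.50, 77.30, 75.20]


def paired_t_statistic(a, b):
    """Paired t statistic for b versus a (assumed test variant, see lead-in)."""
    diffs = [y - x for x, y in zip(a, b)]
    # Mean difference divided by its standard error (sample std / sqrt(n)).
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


mean_aa_dbs = mean(aa_dbs)        # column mean, as in the Mean row
mean_all = mean(all_features)
t_stat = paired_t_statistic(aa_dbs, all_features)

print(f"Mean AA + DBS:     {mean_aa_dbs:.2f}")
print(f"Mean all features: {mean_all:.2f}")
print(f"Paired t statistic (df = {len(aa_dbs) - 1}): {t_stat:.3f}")
```

Converting the statistic to a P-value requires the t distribution with 15 degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(all_features, aa_dbs)` returns both the statistic and a two-sided P-value in one call.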