2017 Nov 25;46(1):54–70. doi: 10.1093/nar/gkx1166

Table 1. Sequence-based DBP prediction performance under various redundancy thresholds and DBP definitions.

Dataset      DBS only   AA composition only   AA + DBS   All features
NR90-FT      63.50      86.00                 86.20      86.40
NR70-FT      63.00      85.60                 84.90      86.20
NR50-FT      61.30      84.40                 84.90      85.30
NR25-FT      63.00      82.80                 83.30      83.60
NR90-GO      64.60      85.60                 85.50      85.60
NR70-GO      65.00      85.40                 85.60      85.40
NR50-GO      64.70      82.60                 82.90      83.10
NR25-GO      62.30      79.90                 80.00      80.20
NR90-Pfam    61.90      85.40                 85.30      86.10
NR70-Pfam    61.10      85.00                 85.40      85.50
NR50-Pfam    61.60      83.20                 83.80      83.90
NR25-Pfam    62.70      81.40                 81.80      82.20
NR90-PDB     56.60      79.70                 79.80      80.60
NR70-PDB     53.40      80.30                 79.50      81.50
NR50-PDB     55.60      77.70                 76.70      77.30
NR25-PDB     55.60      75.90                 75.90      75.20
Mean         60.99      82.56                 82.59      83.01

P-value (t-test), AA + DBS versus all features: 0.0058

The feature set based on amino acid composition is the best of the three in most cases. However, adding DBS features to the model improves its performance in almost all the prediction models, and the improvement is statistically significant (P = 0.0058, t-test). Abbreviations: FT, DBP definition taken from UniProt sequence features; NRxx, data are non-redundant at an xx% sequence identity threshold; GO, DBP definition taken from GO; DBS, DNA-binding site predictions; AA composition, amino acid composition of the full-length protein.
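The Mean row and the per-dataset comparison in Table 1 can be re-derived from the tabulated values. The following is a minimal sketch, not the authors' analysis code: the names `TABLE_1`, `COLUMNS`, `column_means`, `paired_differences` and `paired_t_statistic` are hypothetical, and the use of a paired t statistic is an assumption, since the table does not state whether the reported test was paired or one- or two-tailed. The statistic computed here is therefore illustrative and need not reproduce the reported P-value.

```python
from math import sqrt
from statistics import mean, stdev

# Values copied from Table 1, one tuple per dataset:
# (dataset, DBS only, AA composition only, AA + DBS, all features)
TABLE_1 = [
    ("NR90-FT", 63.50, 86.00, 86.20, 86.40),
    ("NR70-FT", 63.00, 85.60, 84.90, 86.20),
    ("NR50-FT", 61.30, 84.40, 84.90, 85.30),
    ("NR25-FT", 63.00, 82.80, 83.30, 83.60),
    ("NR90-GO", 64.60, 85.60, 85.50, 85.60),
    ("NR70-GO", 65.00, 85.40, 85.60, 85.40),
    ("NR50-GO", 64.70, 82.60, 82.90, 83.10),
    ("NR25-GO", 62.30, 79.90, 80.00, 80.20),
    ("NR90-Pfam", 61.90, 85.40, 85.30, 86.10),
    ("NR70-Pfam", 61.10, 85.00, 85.40, 85.50),
    ("NR50-Pfam", 61.60, 83.20, 83.80, 83.90),
    ("NR25-Pfam", 62.70, 81.40, 81.80, 82.20),
    ("NR90-PDB", 56.60, 79.70, 79.80, 80.60),
    ("NR70-PDB", 53.40, 80.30, 79.50, 81.50),
    ("NR50-PDB", 55.60, 77.70, 76.70, 77.30),
    ("NR25-PDB", 55.60, 75.90, 75.90, 75.20),
]
COLUMNS = ("DBS only", "AA composition only", "AA + DBS", "All features")


def column_means(rows):
    """Mean of each feature-set column, rounded to two decimals like the Mean row."""
    return {
        name: round(mean(row[i + 1] for row in rows), 2)
        for i, name in enumerate(COLUMNS)
    }


def paired_differences(rows, baseline, candidate):
    """Per-dataset difference (candidate minus baseline) between two columns."""
    b = COLUMNS.index(baseline) + 1
    c = COLUMNS.index(candidate) + 1
    return [row[c] - row[b] for row in rows]


def paired_t_statistic(diffs):
    """Paired t statistic: mean difference divided by its standard error."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


if __name__ == "__main__":
    print(column_means(TABLE_1))
    diffs = paired_differences(TABLE_1, "AA + DBS", "All features")
    wins = sum(d > 0 for d in diffs)
    print(f"All features beats AA + DBS on {wins} of {len(diffs)} datasets")
    print(f"paired t statistic (assumed test form): {paired_t_statistic(diffs):.3f}")
```

Converting the statistic to a P-value would additionally require the t distribution with 15 degrees of freedom and a choice of one- or two-tailed test, neither of which is specified in the table.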