2017 Nov 25;46(1):54–70. doi: 10.1093/nar/gkx1166

Table 1. Performance of sequence-based DBP prediction under various redundancy conditions and DBP definitions.

Dataset	DBS only	AA composition only	AA + DBS	All features
NR90-FT	63.50	86.00	86.20	86.40
NR70-FT	63.00	85.60	84.90	86.20
NR50-FT	61.30	84.40	84.90	85.30
NR25-FT	63.00	82.80	83.30	83.60
NR90-GO	64.60	85.60	85.50	85.60
NR70-GO	65.00	85.40	85.60	85.40
NR50-GO	64.70	82.60	82.90	83.10
NR25-GO	62.30	79.90	80.00	80.20
NR90-Pfam	61.90	85.40	85.30	86.10
NR70-Pfam	61.10	85.00	85.40	85.50
NR50-Pfam	61.60	83.20	83.80	83.90
NR25-Pfam	62.70	81.40	81.80	82.20
NR90-PDB	56.60	79.70	79.80	80.60
NR70-PDB	53.40	80.30	79.50	81.50
NR50-PDB	55.60	77.70	76.70	77.30
NR25-PDB	55.60	75.90	75.90	75.20
Mean	60.99	82.56	82.59	83.01
P-value (t-test), AA + DBS versus all features				0.0058

The feature set based on amino acid composition is the best of the three in most cases. However, adding DBS features to the model improves its performance in almost all prediction models, and the improvement is statistically significant (P = 0.0058, t-test). Abbreviations: DBP: DNA-binding protein; NRxx: data are non-redundant at an xx% sequence identity threshold; FT: DBP definition taken from UniProt sequence features; GO: DBP definition taken from GO; DBS: DNA-binding site predictions; AA composition: amino acid composition of the full-length protein.
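
The summary rows of Table 1 can be rechecked from the per-dataset values. The following is a minimal sketch, not the authors' analysis code: the helper names (`column`, `column_means`, `wins`, `paired_t`) are hypothetical, and the paired form of the t statistic is an assumption, since the table does not say which t-test variant or how many tails were used. For that reason the sketch stops at the t statistic and does not attempt to reproduce the reported P-value.

```python
from math import sqrt
from statistics import fmean, stdev

# Column order as in Table 1.
COLUMNS = ("DBS only", "AA composition only", "AA + DBS", "All features")

# Per-dataset values copied from Table 1.
TABLE = {
    "NR90-FT":   (63.50, 86.00, 86.20, 86.40),
    "NR70-FT":   (63.00, 85.60, 84.90, 86.20),
    "NR50-FT":   (61.30, 84.40, 84.90, 85.30),
    "NR25-FT":   (63.00, 82.80, 83.30, 83.60),
    "NR90-GO":   (64.60, 85.60, 85.50, 85.60),
    "NR70-GO":   (65.00, 85.40, 85.60, 85.40),
    "NR50-GO":   (64.70, 82.60, 82.90, 83.10),
    "NR25-GO":   (62.30, 79.90, 80.00, 80.20),
    "NR90-Pfam": (61.90, 85.40, 85.30, 86.10),
    "NR70-Pfam": (61.10, 85.00, 85.40, 85.50),
    "NR50-Pfam": (61.60, 83.20, 83.80, 83.90),
    "NR25-Pfam": (62.70, 81.40, 81.80, 82.20),
    "NR90-PDB":  (56.60, 79.70, 79.80, 80.60),
    "NR70-PDB":  (53.40, 80.30, 79.50, 81.50),
    "NR50-PDB":  (55.60, 77.70, 76.70, 77.30),
    "NR25-PDB":  (55.60, 75.90, 75.90, 75.20),
}


def column(name):
    """Return one feature-set column as a list, in dataset order."""
    i = COLUMNS.index(name)
    return [row[i] for row in TABLE.values()]


def column_means():
    """Mean of each column, rounded to two decimals like the Mean row."""
    return {name: round(fmean(column(name)), 2) for name in COLUMNS}


def wins(baseline, candidate):
    """Number of datasets on which candidate is strictly higher than baseline."""
    return sum(c > b for b, c in zip(baseline, candidate))


def paired_t(baseline, candidate):
    """Paired t statistic for candidate - baseline (n - 1 degrees of freedom).

    Assumption: a paired test over the 16 datasets; the source only says "t-test".
    """
    diffs = [c - b for b, c in zip(baseline, candidate)]
    return fmean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


if __name__ == "__main__":
    print(column_means())
    print(wins(column("AA + DBS"), column("All features")))
    print(paired_t(column("AA + DBS"), column("All features")))
```

The rounded column means from this sketch match the Mean row of the table. Converting the t statistic to a P-value would additionally require a t-distribution CDF and a choice of one- or two-tailed test, neither of which is specified in the table.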