2015 Mar 6;16(3):5194–5215. doi: 10.3390/ijms16035194

Table 1.

Commonly used data sets for DNA-binding site identification.

ID | Ref. | Notes
--- | --- | ---
DB179 | [33] | 179 DNA-binding proteins, almost entirely nonredundant at 40% sequence identity
NB3797 | [33] | 3797 nonbinding proteins, with significant redundancy at the 35% sequence identity level (only 3482 independent clusters)
PD138 | [27] | 138 DNA-binding proteins, almost entirely nonredundant at 35% sequence identity, divided into seven structural classes
DISIS | [3] | 78 DNA-binding proteins, close to nonredundant at 20% sequence identity
PDNA62 | [34] | 62 DNA-binding proteins, 78 chains, 57 nonredundant sequences at 30% sequence identity
NB110 | [34] | 110 nonbinding proteins, nonredundant at the 30% sequence identity level; derived from the RS126 secondary structure data set by removing DNA-related entries
BIND54 | [35] | Reported as 54 binding proteins but actually 58 chains; nonredundant at 30% sequence identity; the original list of proteins was reported in [1]
NB250 | [35] | 250 nonbinding proteins, mostly nonredundant at 35% sequence identity
DBP374 | [18] | 374 DNA-binding proteins, with significant redundancy at the 25% sequence identity level
TS75 | [18] | 75 DNA-binding proteins, designed to be independent of DBP374 and PDNA62 but sharing some redundant entries with both at the 35% sequence identity level
PDNA-316 | [29] | 316 target proteins used in the metaDBSite web server, at 30% sequence identity
DNABindR171 | [16] | 171 proteins with pairwise sequence identity ≤30%, each with at least 40 amino acid residues; all structures have resolution better than 3.0 Å and an R factor below 0.3
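Many entries above are characterized by redundancy at a sequence identity threshold (for example, NB3797 collapses to 3482 independent clusters at 35%), and DNABindR171 also applies structural quality filters. The following is a minimal sketch of how such properties might be checked, assuming a hypothetical `Chain` record, a greedy longest-first clustering in the spirit of CD-HIT, and a crude ungapped identity in place of a real alignment. All names, the toy sequences, and the reading of "nonredundant at X%" as "no chain exceeds X% identity to a cluster representative" are illustrative assumptions, not the procedures used to build these data sets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Chain:
    """Hypothetical record for one protein chain; field names are illustrative."""
    chain_id: str
    sequence: str
    resolution: float  # crystallographic resolution in Å (lower is better)
    r_factor: float


def passes_quality_filters(c: Chain) -> bool:
    """Criteria listed for DNABindR171: at least 40 residues, resolution
    better than 3.0 Å, and R factor below 0.3."""
    return len(c.sequence) >= 40 and c.resolution < 3.0 and c.r_factor < 0.3


def ungapped_identity(a: Chain, b: Chain) -> float:
    """Crude stand-in for alignment-based identity (assumption): fraction of
    identical residues at matching positions over the shorter sequence, no gaps."""
    n = min(len(a.sequence), len(b.sequence))
    if n == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a.sequence, b.sequence))
    return matches / n


def greedy_clusters(
    chains: Sequence[Chain],
    identity: Callable[[Chain, Chain], float],
    threshold: float,
) -> Dict[str, List[str]]:
    """Greedy incremental clustering: visit chains longest first; a chain joins
    the first representative it shares more than `threshold` identity with,
    otherwise it becomes the representative of a new cluster."""
    clusters: Dict[str, List[str]] = {}
    representatives: List[Chain] = []
    for c in sorted(chains, key=lambda ch: len(ch.sequence), reverse=True):
        for rep in representatives:
            if identity(c, rep) > threshold:
                clusters[rep.chain_id].append(c.chain_id)
                break
        else:
            # No representative is similar enough: start a new cluster.
            representatives.append(c)
            clusters[c.chain_id] = [c.chain_id]
    return clusters


# Toy input with made-up identifiers and sequences.
toy_chains = [
    Chain("toy1_A", "A" * 50, 2.1, 0.21),
    Chain("toy2_B", "A" * 45 + "C" * 5, 2.5, 0.24),  # 90% identical to toy1_A
    Chain("toy3_C", "C" * 50, 1.8, 0.19),
    Chain("toy4_D", "A" * 30, 2.0, 0.20),  # dropped: fewer than 40 residues
    Chain("toy5_E", "C" * 60, 3.5, 0.25),  # dropped: resolution not better than 3.0 Å
]
kept = [c for c in toy_chains if passes_quality_filters(c)]
clusters = greedy_clusters(kept, ungapped_identity, threshold=0.30)
print(len(clusters), clusters)
# → 2 {'toy1_A': ['toy1_A', 'toy2_B'], 'toy3_C': ['toy3_C']}
```

Under these assumptions, the number of clusters plays the role of the "independent clusters" count, and a set would be called nonredundant at the threshold when every cluster has a single member. Because each chain is compared only against representatives, this is a cluster count rather than an exhaustive pairwise check; real pipelines use proper alignments and dedicated clustering tools.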