2006 May 2;7:236. doi: 10.1186/1471-2105-7-236

Table 4.

Accuracy estimates (100% − error rate) for transcription factor binding site (TFBS) identification under different parameter settings, based on twenty repetitions of ten-fold cross-validation (200 runs in total).

Promoter range                                       1 kb upstream                       5 kb upstream
PWM collection                                       All Profiles      Limited Profiles  All Profiles      Limited Profiles
Classifier  Expression Lower Upper Feature Selection Accuracy  SD      Accuracy  SD      Accuracy  SD      Accuracy  SD
IB1         Threshold  0.2   0.8   InfoGain          91.56%    2.35%   81.65%    4.22%   93.06%    1.80%   93.27%    2.35%
IB1         Threshold  0.33  0.66  InfoGain          91.89%    2.95%   90.72%    2.90%   95.57%    2.04%   93.62%    1.78%
IB1         Threshold  0.2   0.8   ChiSquared        89.96%    2.74%   81.00%    4.04%   93.92%    1.75%   92.63%    2.22%
IB1         Threshold  0.33  0.66  ChiSquared        91.10%    2.90%   90.67%    2.79%   94.07%    2.43%   93.43%    2.31%
IB1         Tanh       0.25  0.75  InfoGain          92.71%    2.43%   92.74%    2.30%   92.13%    2.47%   92.00%    3.01%
Naive Bayes Threshold  0.2   0.8   InfoGain          90.47%    2.85%   82.35%    3.78%   96.04%    1.34%   94.98%    1.41%
Naive Bayes Threshold  0.2   0.8   InfoGain          91.67%    2.53%   83.18%    3.11%   94.39%    1.73%   93.78%    2.00%
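
The evaluation protocol in the caption (twenty repetitions of ten-fold cross-validation, each run scored as 100% minus its error rate, summarized as a mean and SD) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes IB1 denotes a plain 1-nearest-neighbour classifier (the name used in Weka), omits attribute normalization, uses unstratified folds for brevity, runs on hypothetical toy data, and assumes each reported SD is taken over all 200 fold-level accuracies. The function names are hypothetical.

```python
import random
import statistics


def one_nn_predict(train, x):
    """IB1-style prediction: label of the single nearest training example
    (squared Euclidean distance; attribute normalization omitted)."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        dist = sum((a - b) ** 2 for a, b in zip(features, x))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def repeated_cv_accuracy(data, repetitions=20, folds=10, seed=1):
    """Per-fold accuracies (100% - error rate) from repeated k-fold CV.

    data is a list of (feature_tuple, label) pairs. Each repetition reshuffles
    the data and splits it into `folds` interleaved, unstratified folds, so
    the result holds repetitions * folds values.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repetitions):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for k in range(folds):
            test = shuffled[k::folds]  # every folds-th example, starting at k
            train = [d for i, d in enumerate(shuffled) if i % folds != k]
            errors = sum(one_nn_predict(train, x) != y for x, y in test)
            accuracies.append(100.0 * (1 - errors / len(test)))
    return accuracies


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: two well-separated classes of 2-D points.
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "low") for _ in range(50)]
    data += [((rng.gauss(3, 1), rng.gauss(3, 1)), "high") for _ in range(50)]
    runs = repeated_cv_accuracy(data)  # 20 repetitions x 10 folds = 200 runs
    print(len(runs), "runs")
    print(f"Accuracy {statistics.mean(runs):.2f}%  SD {statistics.stdev(runs):.2f}%")
```

Swapping `one_nn_predict` for another classifier (for example a Naive Bayes model) leaves the evaluation loop unchanged; only the per-run accuracies, and hence the mean and SD, differ.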

Table 4 shows how variations in the parameters used to construct the connectivity network affect accuracy. The genomic region searched for transcription factor binding sites extended either 1000 bp (1 kb) or 5000 bp (5 kb) upstream of known genes. Two collections of position weight matrices (PWMs) were also used: 1) all TRANSFAC matrices relevant to mammalian genes (All Profiles), or 2) the subset of PWMs that TRANSFAC identifies as 'high quality' (Limited Profiles).
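
Searching a fixed upstream region for binding sites with a PWM can be sketched as a sliding-window log-odds scan. This is a generic illustration under stated assumptions, not TRANSFAC's own matching tool or the authors' pipeline. The count matrix, pseudocount, uniform background, score threshold, toy sequence, and the helper names `counts_to_log_odds`, `upstream_region`, and `scan` are all hypothetical, genes are assumed to lie on the plus strand, and only one strand is searched.

```python
import math

BASES = "ACGT"


def counts_to_log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert a count matrix (one {base: count} dict per motif position)
    into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def upstream_region(chromosome, tss, length):
    """Up to `length` bases immediately upstream of a plus-strand TSS (0-based)."""
    return chromosome[max(0, tss - length):tss]


def scan(sequence, pwm, threshold):
    """(offset, score) for every window whose summed log-odds score >= threshold.

    Only the given strand is scanned; windows with non-ACGT characters are skipped.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(base not in BASES for base in window):
            continue
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Hypothetical count matrix favouring the 4-mer "TGAC".
    counts = [
        {"A": 1, "C": 1, "G": 1, "T": 7},
        {"A": 1, "C": 1, "G": 7, "T": 1},
        {"A": 7, "C": 1, "G": 1, "T": 1},
        {"A": 1, "C": 7, "G": 1, "T": 1},
    ]
    pwm = counts_to_log_odds(counts)
    # Toy chromosome; the TSS is assumed to lie just past its last base.
    chromosome = "ACGT" * 300 + "TTGACAA" + "ACGT" * 50
    for length in (1000, 5000):  # 1 kb and 5 kb upstream windows
        region = upstream_region(chromosome, len(chromosome), length)
        print(length, len(region), scan(region, pwm, threshold=4.0))
```

In this sketch the 5 kb region always contains the 1 kb region, so widening the window can only add candidate sites; the table's 1 kb and 5 kb columns compare how that choice plays out in classification accuracy.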