Discovering gapped binding sites of yeast transcription factors

Chen et al. 10.1073/pnas.0712188105.

Supporting Information

Files in this Data Supplement:

SI Figure 4
SI Table 4
SI Table 5
SI Table 6
SI Table 7
SI Table 8




SI Figure 4

Fig. 4. Two examples of gapped motifs.

Table 4 shows how the parameter "Upper bound of the Hit/Seq ratio of a block" affects the performance of the method. Our method relies on the second mining step to discover gapped motifs by concatenating short compact motifs (3 or 4 bp long). Before the second step begins, a derived compact block is filtered out if its Hit/Seq ratio exceeds this upper bound. A large Hit/Seq ratio implies that the block is frequently repeated within a single promoter region. Including such short repeats when growing gapped motifs would produce a large number of pseudogapped motifs, which slows down the mining process and complicates the ranking procedure. However, an improper setting of this parameter might accidentally filter out true gapped motifs. Several trial runs on the validation set suggest that setting this parameter to 15 achieves reasonable computational time without sacrificing the quality of the derived motifs.
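
The filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the Hit/Seq ratio of a block is the total number of occurrences of the block divided by the number of promoter sequences containing it at least once, and that overlapping occurrences are counted separately; neither detail is stated in this supplement. The function names (count_hits, hit_seq_ratio, filter_blocks) are hypothetical.

```python
from typing import List


def count_hits(block: str, seq: str) -> int:
    """Count occurrences of block in seq, allowing overlaps (an assumption)."""
    count = 0
    start = seq.find(block)
    while start != -1:
        count += 1
        start = seq.find(block, start + 1)
    return count


def hit_seq_ratio(block: str, promoters: List[str]) -> float:
    """Assumed definition: total hits / number of sequences with >= 1 hit."""
    hits_per_seq = [count_hits(block, s) for s in promoters]
    n_seqs = sum(1 for h in hits_per_seq if h > 0)
    if n_seqs == 0:
        return 0.0  # block never occurs, so there is nothing to filter on
    return sum(hits_per_seq) / n_seqs


def filter_blocks(blocks: List[str], promoters: List[str],
                  upper_bound: float = 15.0) -> List[str]:
    """Keep only blocks whose Hit/Seq ratio does not exceed the upper bound."""
    return [b for b in blocks if hit_seq_ratio(b, promoters) <= upper_bound]


if __name__ == "__main__":
    # Toy promoter sequences, invented for illustration only.
    promoters = ["ATATATATAT", "CGGACGTTT"]
    # A low bound is used here so the toy data shows a block being dropped.
    print(filter_blocks(["ATA", "CGG"], promoters, upper_bound=2.0))
```

In this toy input the block ATA occurs four times, all within one sequence, so it is dropped at an upper bound of 2, while CGG occurs once in one sequence and is kept. The default of 15 mirrors the setting reported above.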