Table 1.
Dataset | yst01g | yst02g | yst03m | yst04r | yst05r | yst06g | yst08r | yst09g | Total | Average |
---|---|---|---|---|---|---|---|---|---|---|
Number of sequences | 8 | 4 | 8 | 6 | 3 | 7 | 11 | 16 | | |
Sequence length (bp) | 1000 | 500 | 500 | 1000 | 500 | 500 | 1000 | 1000 | | |
Number of known signals | 6 | 5 | 18 | 7 | 4 | 7 | 14 | 13 | 47 | |
Phase 1: solid words extracted | 65 | 255 | 286 | 162 | 337 | 214 | 88 | 40 | | |
Phase 2: clusters (80% similarity threshold) | 50 | 126 | 141 | 84 | 157 | 111 | 46 | 31 | | |
Phase 3A: extended clusters (95% threshold for the MV function) | 50 | 123 | 137 | 81 | 154 | 106 | 44 | 29 | | |
Phase 3B: extended clusters (80% threshold for the MV function) | 50 | 126 | 141 | 84 | 157 | 110 | 46 | 31 | | |
Phase 1: signals found in solid words | 2 | 5 | 14 | 6 | 4 | 7 | 10 | 12 | 38 | |
Phase 2: signals found in clusters | 2 | 5 | 14 | 6 | 4 | 7 | 10 | 12 | 38 | |
Phase 3A: signals found in extended clusters | 0 | 5 | 10 | 7 | 3 | 5 | 7 | 6 | 30 | |
Phase 3B: signals found in extended clusters | 1 | 5 | 16 | 7 | 4 | 7 | 9 | 11 | 40 | |
Phase 1: sensitivity | 0.33 | 1.00 | 0.78 | 0.86 | 1.00 | 1.00 | 0.71 | 0.92 | | 0.83 |
Phase 2: sensitivity | 0.33 | 1.00 | 0.78 | 0.86 | 1.00 | 1.00 | 0.71 | 0.92 | | 0.83 |
Phase 3A: sensitivity | 0.00 | 1.00 | 0.56 | 1.00 | 0.75 | 0.71 | 0.50 | 0.46 | | 0.67 |
Phase 3B: sensitivity | 0.17 | 1.00 | 0.89 | 1.00 | 1.00 | 1.00 | 0.64 | 0.85 | | 0.84 |
Phase 2: maximum number of signals per cluster | 1 | 4 | 9 | 5 | 3 | 6 | 8 | 8 | 28 | |
Phase 2: number of maximal clusters | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | | |
Phase 2: sensitivity of the maximal cluster | 0.17 | 0.80 | 0.50 | 0.71 | 0.75 | 0.86 | 0.57 | 0.62 | | 0.63 |
Phase 3A: maximum number of signals per cluster | 0 | 1 | 5 | 4 | 3 | 5 | 4 | 3 | 18 | |
Phase 3A: number of maximal clusters | - | 11 | 3 | 2 | 1 | 1 | 1 | 4 | | |
Phase 3A: sensitivity of the maximal cluster | 0.00 | 0.20 | 0.28 | 0.57 | 0.75 | 0.71 | 0.29 | 0.23 | | 0.42 |
Phase 3B: maximum number of signals per cluster | 1 | 3 | 8 | 3 | 3 | 6 | 4 | 7 | 24 | |
Phase 3B: number of maximal clusters | 1 | 1 | 1 | 3 | 1 | 1 | 2 | 1 | | |
Phase 3B: sensitivity of the maximal cluster | 0.17 | 0.60 | 0.44 | 0.43 | 0.75 | 0.86 | 0.29 | 0.54 | | 0.54 |
Datasets are identified by the names originally used by Tompa et al. For each dataset, the table reports the number of sequences, the length of the promoter sequences, and the number of known signals. Rows four to seven report the output of the successive MOST analysis steps; the third step was run under two conditions (3A and 3B, with 95% and 80% thresholds for the MV function). The next eight rows give the number and the proportion (sensitivity) of known signals per dataset that are represented in the output of each of these steps. The last part of the table reports, for each dataset, the number and the proportion (sensitivity) of known signals represented in the maximal cluster, that is, the cluster containing the largest number of known signals, together with the number of clusters that reach this maximum.
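The sensitivity values are the number of known signals recovered divided by the number of known signals in the dataset (for example, 14/18 ≈ 0.78 for yst03m in Phase 1). The short Python sketch below recomputes them from the counts in the table; the helper names `sensitivity` and `summarize` are illustrative and are not part of MOST. As a consistency check, the summary values in the last columns are reproduced only when sums and means are taken over the first six datasets (yst01g to yst06g), and the averages are means of per-dataset sensitivities rather than pooled ratios (Phase 1 gives 0.83, whereas 38/47 ≈ 0.81). The `n_datasets=6` default below is an assumption inferred from these numbers, not a rule stated in the table.

```python
from statistics import mean

# Per-dataset counts copied from Table 1 (same column order as the table).
DATASETS = ["yst01g", "yst02g", "yst03m", "yst04r",
            "yst05r", "yst06g", "yst08r", "yst09g"]
KNOWN = [6, 5, 18, 7, 4, 7, 14, 13]

# Number of known signals recovered by each analysis step.
FOUND = {
    "Phase 1": [2, 5, 14, 6, 4, 7, 10, 12],
    "Phase 2": [2, 5, 14, 6, 4, 7, 10, 12],
    "Phase 3A": [0, 5, 10, 7, 3, 5, 7, 6],
    "Phase 3B": [1, 5, 16, 7, 4, 7, 9, 11],
}

# Known signals contained in the maximal cluster of each step.
MAX_PER_CLUSTER = {
    "Phase 2": [1, 4, 9, 5, 3, 6, 8, 8],
    "Phase 3A": [0, 1, 5, 4, 3, 5, 4, 3],
    "Phase 3B": [1, 3, 8, 3, 3, 6, 4, 7],
}


def sensitivity(found, known):
    """Per-dataset proportion of known signals that were recovered."""
    return [f / k for f, k in zip(found, known)]


def summarize(counts, known, n_datasets=6):
    """Return (sum of counts, mean per-dataset sensitivity rounded to 2 places).

    Only the first n_datasets columns are used; n_datasets=6 is an
    assumption that reproduces the summary values printed in Table 1.
    """
    sens = sensitivity(counts, known)[:n_datasets]
    return sum(counts[:n_datasets]), round(mean(sens), 2)


if __name__ == "__main__":
    for label, table in (("signals found", FOUND),
                         ("maximal cluster", MAX_PER_CLUSTER)):
        for phase, counts in table.items():
            per_dataset = [round(s, 2) for s in sensitivity(counts, KNOWN)]
            total, avg = summarize(counts, KNOWN)
            print(label, phase, per_dataset, total, avg)
```

Running the script prints one line per row group; for instance, the Phase 3A "signals found" line ends with a total of 30 and an average of 0.67, matching the table. Passing `n_datasets=8` instead yields 0.62 for that same row, which is why the six-dataset restriction is inferred.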