Table 3.
Evaluation of contiguous motifs on Prosite data.
| PS entry | Motif | NumSeqs | DiffNGrams | Rel. Supp (%) | Supp Rank | ZScore | LogOdd | Pratt | IG | Info |
|---|---|---|---|---|---|---|---|---|---|---|
| PS00341 | IPCCPV | 9 | 702 | 77.8 | 9 | 21 | 65 | 166 | 13 | 217 |
| PS00415 | LRRRLSDS | 12 | 3582 | 91.6 | 9 | 503 | 1058 | 2103 | 11 | 1784 |
| PS00047 | GAKRH | 105 | 653 | 93.3 | 21 | 61 | 109 | 216 | 27 | 460 |
| PS00984 | CFWKYC | 19 | 1256 | 100 | 1 | 1 | 1 | 785 | 1 | 5 |
| PS00541 | SKRKYRK | 6 | 144 | 100 | 1 | 85 | 110 | 131 | 3 | 134 |
| PS00822 | PFDRHDW | 9 | 2251 | 100 | 1 | 1 | 5 | 204 | 1 | 400 |
| PS00419 | CDGPGRGGTC | 207 | 32936 | 100 | 1 | 1 | 1 | 3 | 1 | 158 |
| PS00349 | RKRKYFKKHEKR | 18 | 2929 | 100 | 1 | 38 | 86 | 2884 | 19 | 310 |
| PS00861 | GWTLNSAGYLLGP | 32 | 888 | 100 | 1 | 66 | 301 | 179 | 1 | 569 |
| PS01024 | EFDYLKSLEIEEKIN | 60 | 5527 | 100 | 1 | 620 | 2427 | 5266 | 1 | 5244 |
| PS00291 | AGAAAAGAVVGGLGGY | 136 | 2423 | 100 | 1 | 1033 | 1770 | 184 | 3 | 1984 |
| Rm | | | | | 0.2340 | 4.526E-3 | 1.854E-3 | 9.075E-4 | 0.1358 | 9.764E-4 |
Ranking results for eleven Prosite datasets, each identified by its Prosite (PS) entry. For each dataset, the table gives the number of protein sequences (NumSeqs), the number of different n-grams (DiffNGrams), where n equals the motif length, and the relative support of the target motif (Rel. Supp). Motifs are ranked with information-theoretic measures. Ranks obtained by support (Supp Rank) and information gain (Info) are also provided for comparison. The last row gives the Rm value of each measure; the highest values, and thus the best results, are obtained by support and IG.
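The Rm row can be cross-checked against the rank columns. The sketch below assumes that Rm is the reciprocal of the mean rank over the eleven datasets (number of datasets divided by the sum of ranks); this definition is not stated in the table itself, but it reproduces the tabulated values to the precision shown. The names `ranks` and `reciprocal_mean_rank` are illustrative only.

```python
import math

# Rank of the target motif under each measure, copied from Table 3
# (one entry per Prosite dataset, in table order).
ranks = {
    "Supp Rank": [9, 9, 21, 1, 1, 1, 1, 1, 1, 1, 1],
    "ZScore": [21, 503, 61, 1, 85, 1, 1, 38, 66, 620, 1033],
    "LogOdd": [65, 1058, 109, 1, 110, 5, 1, 86, 301, 2427, 1770],
    "Pratt": [166, 2103, 216, 785, 131, 204, 3, 2884, 179, 5266, 184],
    "IG": [13, 11, 27, 1, 3, 1, 1, 19, 1, 1, 3],
    "Info": [217, 1784, 460, 5, 134, 400, 158, 310, 569, 5244, 1984],
}


def reciprocal_mean_rank(rank_list):
    """Assumed definition of Rm: 1 / mean(rank) = N / sum(ranks)."""
    return len(rank_list) / sum(rank_list)


# Rm per measure; a larger value means the target motif ranks higher on average.
rm = {measure: reciprocal_mean_rank(r) for measure, r in ranks.items()}

for measure, value in rm.items():
    print(f"{measure:10s} {value:.4g}")
```

Under this assumed definition a measure that always ranks the target motif first would score 1, so the gap between support or IG and the remaining measures reflects how often those measures place the target motif far down the list.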