2012 Mar 13;40(13):5832–5847. doi: 10.1093/nar/gks206

Table 1.

DRIMUST – fixed-structure motifs algorithm

  • Input:
    • A ranked list of sequences $S = (S_1, S_2, \ldots, S_N)$
    • A range of motif lengths $[k_1, k_2]$
    • A p-value threshold τ for reporting
  • Output:
    • A list of sequence motifs of lengths between $k_1$ and $k_2$ that are rank imbalanced in $S$, i.e. whose mHG p-value is below τ.
  • Preprocessing:
    • Construct a generalized suffix tree for $S$ such that:
      • All suffixes of all sequences $S_i \in S$ are represented by paths from the root to leaves in the tree.
      • Each leaf contains information about the occurrences of the corresponding suffix $w$ in $S$. This information is represented as a list $(m_1(w), m_2(w), \ldots)$. The values $m_i(w)$ are the indices, among $S_1, \ldots, S_N$, of the sequences in which $w$ occurs.
    • /* The construction is implemented using Ukkonen's algorithm (41) */

  • Algorithm:
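    • for $k = k_1$ to $k_2$ do:
      • Traverse the tree to find all paths of length $k$, and for each path $P$ calculate $P$'s enrichment as follows:
        • Get the ordered list $C = \{c_1, c_2, \ldots, c_B\}$, with $c_1 < c_2 < \cdots < c_B$, of the indices (ranks) of the sequences containing $P$, extracted from the leaves of the subtree rooted below $P$. /* $P$ occurs in the union of the lists of all leaves of that subtree, since it is a prefix of every suffix represented by these leaves. For example, if $P$ appears in $S_8$, $S_{14}$, $S_{31}$ and $S_{36}$, then $C = \{8, 14, 31, 36\}$. */
        • Calculate the mHG score for $P$: $\mathrm{mHG}(P) = \min_{1 \le i \le B} \mathrm{HGT}(i; N, B, c_i)$, where $\mathrm{HGT}(b; N, B, n) = \sum_{j=b}^{\min(n,B)} \binom{n}{j}\binom{N-n}{B-j} \big/ \binom{N}{B}$ is the probability that at least $b$ of the $B$ sequences containing $P$ fall among the top $n$ of the $N$ sequences. /* Following the example above and assuming the input contains $N = 100$ sequences: $\mathrm{mHG}(P) = \min\{\mathrm{HGT}(1;100,4,8), \mathrm{HGT}(2;100,4,14), \mathrm{HGT}(3;100,4,31), \mathrm{HGT}(4;100,4,36)\}$. In this case the minimum is attained at $i = 4$, where $\mathrm{HGT}(4;100,4,36) = \binom{36}{4} \big/ \binom{100}{4} \approx 0.015$. */
        • Report $P$ if $p\text{-value}(\mathrm{mHG}(P)) < \tau$ holds. /* see (20) */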
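The preprocessing step can be illustrated on small inputs with the following sketch. It is not Ukkonen's algorithm (41) and does not produce a compressed suffix tree. Instead, it builds a naive generalized suffix *trie* by inserting every suffix one character at a time, which costs quadratic time and space. The names `TrieNode`, `build_suffix_trie` and `occurrence_ranks` are hypothetical. As in the table, ranks are 1-based, so `seqs[0]` is $S_1$. Occurrence information is stored at the nodes where suffixes end and is collected from the subtree below a path.

```python
class TrieNode:
    """Node of a naive generalized suffix trie (one character per edge)."""

    def __init__(self):
        self.children = {}       # character -> TrieNode
        self.suffix_ids = set()  # 1-based ranks of sequences whose suffix ends here


def build_suffix_trie(seqs):
    """Insert every suffix of every sequence; O(total length^2) time and space.

    seqs[0] is the top-ranked sequence, so seqs[r - 1] is S_r.
    """
    root = TrieNode()
    for rank, s in enumerate(seqs, start=1):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, TrieNode())
            node.suffix_ids.add(rank)  # this suffix of S_rank ends at `node`
    return root


def occurrence_ranks(root, pattern):
    """Return the sorted ranks of the sequences containing `pattern`.

    Walk down the path spelled by `pattern`, then take the union of the
    suffix_ids stored anywhere in the subtree below it, because `pattern`
    is a prefix of every suffix that ends in that subtree.
    """
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []  # pattern does not occur in any sequence
    ranks, stack = set(), [node]
    while stack:
        current = stack.pop()
        ranks |= current.suffix_ids
        stack.extend(current.children.values())
    return sorted(ranks)
```

For example, with `seqs = ["ACGT", "TTTT", "GACG", "CCCC"]`, `occurrence_ranks(build_suffix_trie(seqs), "ACG")` returns `[1, 3]`, which is the ordered list $C$ for that path.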
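The mHG step and the table's worked example can be checked numerically with the following sketch. It assumes the standard mHG definition of (20) and computes HGT with exact binomial coefficients (`math.comb`, Python 3.8+). Only the cutoffs $n = c_i$ are evaluated. For a fixed $b$, $\mathrm{HGT}(b; N, B, n)$ grows with $n$, so the minimum over all cutoffs is reached at one of these positions. The function names `hgt` and `mhg` are hypothetical.

```python
from math import comb


def hgt(b, N, B, n):
    """HGT(b; N, B, n): probability that at least b of the B sequences containing
    the motif are among the top n of N sequences under a uniformly random ranking."""
    top = min(n, B)
    hits = sum(comb(n, j) * comb(N - n, B - j) for j in range(b, top + 1))
    return hits / comb(N, B)


def mhg(C, N):
    """Return (mHG score, i) for the sorted 1-based ranks C = [c_1, ..., c_B].

    Only the cutoffs n = c_i are evaluated: for fixed b, HGT grows with n,
    so the minimum over all cutoffs is reached at one of these positions.
    """
    B = len(C)
    scores = [hgt(i, N, B, c) for i, c in enumerate(C, start=1)]
    best = min(range(B), key=lambda idx: scores[idx])
    return scores[best], best + 1  # i is reported 1-based, as in the table


score, i = mhg([8, 14, 31, 36], N=100)
print(i, round(score, 4))  # prints: 4 0.015
```

For the example in the table, the four HGT values are roughly 0.287, 0.093, 0.087 and 0.015. The minimum is therefore attained at $i = 4$ and equals $58905/3921225 \approx 0.015$.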
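The fixed-structure loop can be run end to end on a toy ranked list with the following sketch, which makes two simplifying assumptions. First, a dictionary mapping each k-mer to the set of ranks containing it replaces the suffix-tree traversal. It yields the same ordered rank list $C$ for every length-$k$ path. Second, the exact mHG p-value computation of (20) is not reproduced here. Each motif's p-value is replaced by the conservative union bound $\min(1, N \cdot \mathrm{mHG})$. This bound holds because mHG is the minimum of at most $N$ hypergeometric tail probabilities, and under a random ranking each of them falls at or below $t$ with probability at most $t$. The function names `hgt`, `mhg_score` and `drimust_fixed` are hypothetical.

```python
from math import comb


def hgt(b, N, B, n):
    """HGT(b; N, B, n): P(at least b of the B motif-containing sequences are in the top n)."""
    top = min(n, B)
    return sum(comb(n, j) * comb(N - n, B - j) for j in range(b, top + 1)) / comb(N, B)


def mhg_score(C, N):
    """mHG score of sorted 1-based ranks C, evaluated at the cutoffs n = c_i."""
    B = len(C)
    return min(hgt(i, N, B, c) for i, c in enumerate(C, start=1))


def drimust_fixed(seqs, k1, k2, tau):
    """Hypothetical fixed-structure motif search over a ranked list of sequences.

    seqs[0] is the top-ranked sequence S_1. Returns (motif, mHG, p_bound) tuples
    with p_bound < tau, sorted by mHG score.
    """
    N = len(seqs)
    reported = []
    for k in range(k1, k2 + 1):
        # Stand-in for traversing length-k paths of the suffix tree:
        # map every k-mer to the set of ranks of sequences containing it.
        ranks = {}
        for rank, s in enumerate(seqs, start=1):
            for j in range(len(s) - k + 1):
                ranks.setdefault(s[j:j + k], set()).add(rank)
        for motif, rank_set in ranks.items():
            score = mhg_score(sorted(rank_set), N)
            # Conservative union bound, NOT the exact mHG p-value of (20).
            p_bound = min(1.0, N * score)
            if p_bound < tau:
                reported.append((motif, score, p_bound))
    return sorted(reported, key=lambda r: r[1])


seqs = ["GATTACA"] * 4 + ["CCCCCCC"] * 16  # toy ranked list, N = 20
for motif, score, p_bound in drimust_fixed(seqs, 6, 7, tau=0.01):
    print(motif, f"{score:.2e}", f"{p_bound:.2e}")
```

On this toy list, `GATTAC`, `ATTACA` and `GATTACA` are reported. Each occurs only in the four top-ranked sequences, giving mHG $= 1/4845$ and a bound of $20/4845 \approx 0.004$. The poly-C k-mers are not reported because they occur only in the bottom 16 sequences.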