Abstract
We develop an efficient multicore algorithm, PMS6MC, for the (l, d)-motif discovery problem in which we are to find all strings of length l that appear in every string of a given set of strings with at most d mismatches. PMS6MC is based on PMS6, which is currently the fastest single-core algorithm for motif discovery in large instances. The speedup, relative to PMS6, attained by our multicore algorithm ranges from a high of 6.62 for the (17,6) challenging instances to a low of 2.75 for the (13,4) challenging instances on an Intel 6-core system. We estimate that PMS6MC is 2 to 4 times faster than other parallel algorithms for motif search on large instances.
Keywords: Planted motif search, parallel string algorithms, multi-core algorithms
I. Introduction
Motifs are patterns found in biological sequences. These common patterns across different sequences help in understanding gene function and can guide the design of better drugs to combat diseases. Several versions of the motif search problem have been studied in the literature. In this paper, we consider the version known as Planted Motif Search (PMS), or (l, d) motif search. In PMS, we are given n input strings and two integers l and d, and we are to find all strings M of length l (also referred to as l-mers) that occur as a substring of every input sequence with at most d mismatches. The d-neighborhood of an l-mer s is defined to be the set of all strings that differ from s in at most d positions. So, for an l-mer M to be a motif of the n input strings, each input string must contain a substring that lies in the d-neighborhood of M.
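To make these definitions concrete, here is a minimal Python sketch (our own illustration, not the paper's implementation) of the d-neighborhood and the brute-force motif test; the function names are hypothetical.

```python
from itertools import combinations, product

def hamming(s, t):
    """Number of positions at which equal-length strings s and t differ."""
    return sum(a != b for a, b in zip(s, t))

def neighborhood(lmer, d, alphabet="ACGT"):
    """All strings within Hamming distance d of lmer (its d-neighborhood).

    Choosing exactly d positions and allowing a substituted character to
    equal the original covers every distance from 0 up to d."""
    result = set()
    for positions in combinations(range(len(lmer)), d):
        for letters in product(alphabet, repeat=d):
            candidate = list(lmer)
            for pos, ch in zip(positions, letters):
                candidate[pos] = ch
            result.add("".join(candidate))
    return result

def is_motif(m, strings, d):
    """True iff every string has a substring of length |m| within distance d of m."""
    l = len(m)
    return all(
        any(hamming(m, s[i:i + l]) <= d for i in range(len(s) - l + 1))
        for s in strings
    )
```

Brute-force checking of this kind is only feasible for tiny inputs; the algorithms surveyed below exist precisely because the search space grows exponentially in l and d.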
The PMS problem is known to be NP-hard [13]. Consequently, PMS is often solved by approximation algorithms that are not guaranteed to produce every motif present in the input. Exact algorithms for PMS, on the other hand, have exponential worst-case complexity but find every motif. MEME [3] and GibbsDNA [19] are two approximation algorithms that compute the probability of each character occurring at each motif position using statistical tools. CONSENSUS [15] first aligns the input sequences using statistical measures and then tries to extract the motifs. Randomized algorithms, such as the random-projection algorithm of Buhler and Tompa [4], have also been proposed for PMS. Local search strategies, e.g., that of Price et al. [22], search the d-neighborhoods of some set of l-mers from the input. MULTIPROFILER [17] and Profile-Branching [22] are two other algorithms that use local search. The WINNOWER algorithm [21], proposed by Pevzner and Sze, maps the PMS problem to that of finding large cliques in a graph and then applies approximation techniques commonly used in graph algorithms.
Although exact algorithms have worst-case exponential complexity, for many instances of interest they are able to find all motifs within a reasonable amount of time on a modern computer. MITRA [12] is an exact algorithm for PMS that uses a modified trie, called a mismatch tree, to spell out the motifs one character at a time. SPELLER [26], SMILE [20], RISO [5], RISOTTO [23], and CENSUS [14] all use some form of suffix tree or trie to direct motif discovery. Voting algorithms such as [6] use an indicator array whose size equals the number of possible strings of length l. For each l-mer in the d-neighborhood of every l-mer from the input, the corresponding entry in the array is set. The entries that have been set by, or have "votes" from, every input sequence are the motifs. Kuksa and Pavlovic [18] designed an algorithm that outputs motif stems, i.e., a superset of the motifs, using regular expressions. The PMS series of algorithms (PMS1-PMS6, PMSP, and PMSPrune) solves PMS instances relatively fast using a reasonable amount of storage for data structures. PMS1, PMS2, and PMS3 [24] first sort the d-neighborhoods of the input l-mers using radix sort and then intersect them to find the motifs. PMS4 [25] proposes a very general technique to reduce the run time of any exact algorithm by examining only k of the n input sequences. PMSP [9] extends this idea further by examining only the d-neighborhoods of the l-mers from the first input sequence. PMSPrune [9] improves upon PMSP by using a dynamic-programming branch-and-bound algorithm to explore the d-neighborhoods. Pampa [10] uses wildcards to first determine the motif patterns and then does an exhaustive search within the possible mappings of each pattern to find the motifs. PMS5 [11] improves upon earlier algorithms in the PMS series by efficiently computing the intersection of the d-neighborhoods of l-mers without generating the entire d-neighborhood of each l-mer.
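The voting idea described above can be sketched in a few lines of Python (an illustrative version of the approach of [6], using a dictionary in place of the indicator array of size |Σ|^l; the names are ours):

```python
from itertools import combinations, product

def neighborhood(lmer, d, alphabet="ACGT"):
    """All strings within Hamming distance d of lmer."""
    result = set()
    for positions in combinations(range(len(lmer)), d):
        for letters in product(alphabet, repeat=d):
            cand = list(lmer)
            for pos, ch in zip(positions, letters):
                cand[pos] = ch
            result.add("".join(cand))
    return result

def voting_motifs(strings, l, d):
    """An l-mer that receives a vote from every input string is a motif."""
    votes = {}
    for s in strings:
        voters = set()  # each input string votes for an l-mer at most once
        for i in range(len(s) - l + 1):
            voters |= neighborhood(s[i:i + l], d)
        for m in voters:
            votes[m] = votes.get(m, 0) + 1
    return {m for m, v in votes.items() if v == len(strings)}
```

The per-string `voters` set is what realizes "votes from every input sequence": duplicate l-mers within the same string contribute only one vote.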
PMS6 [2], which is the fastest algorithm in the series, gets its speedup by grouping l-mers whose d-neighborhood computations follow a similar process.
Since exact algorithms for motif search are compute intensive, it is natural to attempt parallelizations that reduce the observed run time. Dasari, Desh, and Zubair [7] proposed a multi-core motif search algorithm based on the voting approach. They followed this work with a parallel algorithm for Graphics Processing Units (GPUs) [8] based on examining the branches of a suffix tree in parallel.
In this paper, we develop a multi-core version of PMS6 that generates and processes many d-neighborhoods in parallel. In Section II, we introduce the notation and definitions used throughout the paper and describe the PMS6 algorithm in detail. The techniques used to develop PMS6MC are described in Section III. The performance of PMS6MC is compared to that of PMS6 and other parallel motif search algorithms in Section IV.
II. PMS6
A. Notations and Definitions
We use the same notations and definitions as in [11]. An l-mer is simply any string of length l. r is an l-mer of s iff (a) r is an l-mer and (b) r is a substring of s. The notation r ∈l s denotes an l-mer r of s. The Hamming distance, dH(s, t), between two equal-length strings s and t is the number of positions at which they differ, and the d-neighborhood, Bd(s), of a string s is {x | dH(x, s) ≤ d}. Let N(l, d) = |Bd(s)|. It is easy to see that N(l, d) = Σ_{i=0}^{d} C(l, i)(|Σ| − 1)^i, where Σ is the alphabet in use. We also define Bd(x, y, z) to be Bd(x) ∩ Bd(y) ∩ Bd(z). For a set of triples C, we define Bd(C) as ∪_{(x,y,z)∈C} Bd(x, y, z). We note that x is an (l, d) motif of a set S of strings if and only if (a) |x| = l and (b) every s ∈ S has an l-mer (called an instance of x) whose Hamming distance from x is at most d. The set of (l, d) motifs of S is denoted Ml,d(S).
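The closed form for N(l, d) can be checked against direct enumeration for small parameters; the following Python sketch (our own, for illustration) does exactly that:

```python
from itertools import product
from math import comb

def N(l, d, sigma_size=4):
    """N(l, d) = sum_{i=0}^{d} C(l, i) * (|Sigma| - 1)^i."""
    return sum(comb(l, i) * (sigma_size - 1) ** i for i in range(d + 1))

def neighborhood_size(s, d, alphabet="ACGT"):
    """|B_d(s)| computed by enumerating every string of length |s|."""
    return sum(
        sum(a != b for a, b in zip(s, t)) <= d
        for t in product(alphabet, repeat=len(s))
    )
```

Note that N(l, d) does not depend on the particular string s, only on its length; this is why the pre-computed tables used later by PMS6 are possible.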
B. Overview
PMS6, which is presently the fastest exact algorithm to compute Ml,d(S) for large (l, d), was proposed by Bandyopadhyay, Sahni, and Rajasekaran [2]. This algorithm (Figure 1) first computes a superset, Q′, of the motifs of S. This superset is then pruned to Ml,d(S) by the function outputMotifs, which examines the l-mers in Q′ one by one to determine which are valid motifs. This determination is done in a brute-force manner.
Figure 1.
PMS6 [2]
To compute Q′, PMS6 examines triples (x, y, z), where x is an l-mer of s1 and y and z are l-mers of s2k and s2k+1, respectively, for some fixed k. These triples are first partitioned into equivalence classes based on the number of positions in the l-mers of a triple that are of each of 5 different types (see below). Next, we compute Bd for all triples by class. This two-step process is elaborated below.
Step 1: Form Equivalence Classes. Classify each position i of the triple (x, y, z), into one of the following five types [11]:
Type 1: x[i] = y[i] = z[i].
Type 2: x[i] = y[i] ≠ z[i].
Type 3: x[i] = z[i] ≠ y[i].
Type 4: x[i] ≠ y[i] = z[i].
Type 5: x[i] ≠ y[i], x[i] ≠ z[i], y[i] ≠ z[i].
The triples (x, y, z) of l-mers such that x ∈l s1, y ∈l s2k, and z ∈l s2k+1 are partitioned into classes C(n1, ⋯, n5), where nj denotes the number of type j positions in the triple (x, y, z) (for 1 ≤ j ≤ 5).
Step 2: Compute Bd for all triples by class. For each class C(n1, ⋯, n5), the union, Bd(C), of Bd(x, y, z) over all triples in that class is computed. We note that the union of all Bd(C)s is the set of all motifs of x, s2k, and s2k+1.
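The classification of positions into the five types can be sketched as follows (a Python illustration; the function name is ours):

```python
def position_types(x, y, z):
    """Classify each position of the triple (x, y, z) into types 1..5
    and return the class signature (n1, ..., n5)."""
    counts = [0] * 5
    for a, b, c in zip(x, y, z):
        if a == b == c:
            t = 1          # Type 1: all three agree
        elif a == b:
            t = 2          # Type 2: x[i] = y[i] != z[i]
        elif a == c:
            t = 3          # Type 3: x[i] = z[i] != y[i]
        elif b == c:
            t = 4          # Type 4: x[i] != y[i] = z[i]
        else:
            t = 5          # Type 5: all three differ
        counts[t - 1] += 1
    return tuple(counts)
```

Every triple with the same signature (n1, ⋯, n5) lands in the same equivalence class, which is what lets PMS6 process the d-neighborhood computation for all of them by a common procedure.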
III. PMS6MC
A. Overview
PMS6MC exploits the parallelism present in the PMS6 algorithm. First, there is outer-level parallelism: the motif search for many x's from s1 can be carried out in parallel (i.e., several iterations of the outer for loop of Figure 1 run in parallel). Second, there is inner-level parallelism: the individual steps of the inner for loop of Figure 1 can be done in parallel. Outer-level parallelism is limited by the amount of memory available. We have designed PMS6MC to be flexible in its memory and thread requirements: the total number of threads can be set according to the number of cores and the available memory of the system. The threads are grouped into thread blocks. Each thread block operates on a different x from s1, and the threads assigned to a block cooperate to find the motifs corresponding to that x. The threads use the syncthreads() primitive to synchronize; this function can be implemented using the thread-library synchronization mechanisms available under different operating systems. We denote thread block i by T[i] and the threads within thread block i by T[i][j].
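A syncthreads()-style barrier for a thread block can indeed be built from standard thread-library primitives; here is a minimal Python sketch using threading.Barrier (the structure and names are ours, not the paper's pthreads implementation):

```python
import threading

def run_thread_block(block_size, work_fn):
    """Run block_size threads; each calls work_fn(tid, sync), where sync()
    blocks until every thread in the block has reached it (syncthreads-style)."""
    barrier = threading.Barrier(block_size)
    threads = [
        threading.Thread(target=work_fn, args=(tid, barrier.wait))
        for tid in range(block_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The barrier guarantees that no thread proceeds past sync() until all threads in the block have completed the preceding step, which is the property the cooperative steps of Section III-C rely on.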
B. Outer-level parallelism
In outer-level parallelism, each thread block processes a different x from s1 and executes the function findMotifAtThisX() (Figure 2) to determine whether there is any motif in the d-neighborhood of x. Once a thread block is done with its assigned x, it moves on to the next x from s1 that has not yet been processed.
Figure 2.
PMS6MC outer level loop
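The hand-off of unprocessed x's to thread blocks can be sketched with a shared counter protected by a lock (an illustrative Python sketch; the paper's pthreads code may organize this differently):

```python
import threading

class XDispenser:
    """Hands out indices of x's from s1; a thread block that finishes its
    current x calls next_x() to claim the next unprocessed one."""
    def __init__(self, num_lmers):
        self._next = 0
        self._n = num_lmers
        self._lock = threading.Lock()

    def next_x(self):
        with self._lock:  # ensure each index is claimed exactly once
            if self._next >= self._n:
                return None  # all x's have been processed
            i = self._next
            self._next += 1
            return i
```

Because blocks claim work dynamically rather than by a static split, a block that draws an easy x does not sit idle while others finish harder ones.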
C. Inner-level parallelization
Finding the motifs in the d-neighborhood of a particular x from s1 is done by finding the motifs of x and the string pairs s2k and s2k+1 for 1 ≤ k ≤ (n − 1)/2. As described in Figure 1, this is a 4-step process, and the steps are done cooperatively by all threads in a thread block. First, we find the equivalence classes for x and the l-mers from s2k and s2k+1. For any triple (x, y, z) in an equivalence class, we know from pre-computed tables the number of l-mers w that are within distance d of each of x, y, and z. Hence, by multiplying the number of triples in a class by the number of possible w's per triple, we determine the total number of w's for that equivalence class; we denote this number by |Bd(C)|. Next, the threads of the thread block compute the Bd(C)s for these equivalence classes in parallel, in decreasing order of |Bd(C)|. This order helps balance the load across the threads and is akin to using the LPT scheduling rule to minimize finish time. Then, we store the Bd(C)s in Q if k = 1, i.e., when finding the motifs of x, s2, and s3; this can be done in the previous step while computing Bd(C). For k ≥ 2, we intersect the set of all Bd(C)s with Q. When the size of Q falls below a certain threshold, we execute the function outputMotifs to determine which l-mers in Q are valid motifs. The different steps of findMotifAtThisX() are given in Figure 3. The reader is referred to [1] for a detailed description of each step along with pseudocode.
Figure 3.
Finding motifs in parallel
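Processing the equivalence classes in decreasing order of |Bd(C)| is an instance of the LPT (Longest Processing Time) rule, which can be sketched as follows (our own illustration, with explicit job-to-thread assignment via a min-heap of thread loads):

```python
import heapq

def lpt_assign(sizes, n_threads):
    """Longest Processing Time rule: consider jobs in decreasing size and
    always give the next job to the currently least-loaded thread."""
    heap = [(0, t) for t in range(n_threads)]  # (current load, thread id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_threads)]
    for job, size in sorted(enumerate(sizes), key=lambda p: -p[1]):
        load, t = heapq.heappop(heap)      # least-loaded thread so far
        assignment[t].append(job)
        heapq.heappush(heap, (load + size, t))
    return assignment
```

Handing out the largest classes first keeps the final loads close to each other: a large class picked up late would otherwise leave one thread running long after the others have finished.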
IV. Experimental results
We evaluated the performance of PMS6MC on the challenging instances described in [11]. For each (l, d) that characterizes a challenging instance, we generated 20 random strings of length 600 each. Next, a random motif of length l was generated and planted at random positions in each of the 20 strings. The planted motif was then randomly mutated in exactly d randomly chosen positions. For each (l, d) value up to (19,7), we generated 20 instances and for larger (l, d) values, we generated 5 instances. The average run times for each (l, d) value are reported in this section. Since the variation in run times across instances was rather small, we do not report the standard deviation. Even though we test our algorithm using only synthetic data sets, several authors (e.g., [11]) have shown that PMS codes that work well on the kind of synthetic data used by us also work well on real data.
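The instance-generation procedure just described can be sketched as follows (Python, for illustration; the seed handling and names are ours):

```python
import random

def make_instance(n, length, l, d, alphabet="ACGT", seed=0):
    """Generate an (l, d) planted-motif instance: n random strings, with a
    random motif, mutated in exactly d randomly chosen positions, planted
    at a random offset in each string."""
    rng = random.Random(seed)
    motif = "".join(rng.choice(alphabet) for _ in range(l))
    strings = []
    for _ in range(n):
        s = [rng.choice(alphabet) for _ in range(length)]
        variant = list(motif)
        for pos in rng.sample(range(l), d):  # mutate exactly d positions
            variant[pos] = rng.choice([c for c in alphabet if c != variant[pos]])
        off = rng.randrange(length - l + 1)
        s[off:off + l] = variant
        strings.append("".join(s))
    return motif, strings
```

Because each planted occurrence is at Hamming distance exactly d from the motif, the planted motif is guaranteed to be an (l, d) motif of the generated instance.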
A. PMS6MC implementation
PMS6MC is implemented using the pthreads (POSIX Threads) library under Linux on an Intel 6-core system with each core running at 3.3GHz. We experimented with different degrees of outer-level (number of thread blocks) and inner-level (number of threads in a thread block) parallelism for different challenging instances. For smaller instances (e.g., (13,4) and (15,5)), performance is limited by the memory bandwidth of the system; hence, increasing the degree of inner- or outer-level parallelism has little effect on the run time, as most threads stall on memory accesses. For larger instances, the number of thread blocks is limited by the available memory of the system. Figure 4 gives the number of thread blocks and the number of threads per thread block that produce the best performance for the different challenging instances.
Figure 4.
Degree of inner and outer level parallelism for PMS6MC
B. PMS6 and PMS6MC
We compare the run times of PMS6 and PMS6MC on an Intel 6-core system with each core running at 3.3GHz. PMS6 takes 22 seconds on average to solve (15,5) instances and 19 hours on average to solve (23,9) instances. PMS6MC, on the other hand, takes 8 seconds on average to solve (15,5) instances and 3.5 hours on average to solve (23,9) instances. The speedup achieved by PMS6MC over PMS6 varies from a low of 2.75 for (13,4) instances to a high of 6.62 for (21,8) instances. For (19,7) and larger instances, PMS6MC achieves a speedup of over 5. The run times for the various challenging instances are given in Figure 5.
Figure 5.
Run times for PMS6 and PMS6MC
C. PMS6MC and other parallel algorithms
We estimate that PMS6MC can solve (19,7) instances 3.6 times faster than mSPELLER-16 using the 16-core CPU of [8] and about 2 times faster than gSPELLER-4 using 4 GPU devices. The detailed derivation of these estimates is given in [1].
V. Conclusion
We have developed a multicore version of PMS6 that achieves a speedup that ranges from a low of 2.75 for (13,4) challenging instances to a high of 6.62 for (17,6) challenging instances on a 6-core CPU. Our multicore algorithm is able to solve (23,9) challenging instances in 3.5 hours while the single core PMS6 algorithm takes 19 hours. We estimate that our multicore algorithm is faster than other parallel algorithms for the motif search problem on large challenging instances. For example, we estimate that PMS6MC can solve (19,7) instances 3.6 times faster than mSPELLER-16 using the 16-core CPU of [8] and about 2 times faster than gSPELLER-4 using 4 GPU devices.
Acknowledgments
This research was supported, in part, by the National Science Foundation under grants CNS-0963812, CNS-1115184, and NETS 0963812 and by the National Institutes of Health under grant R01-LM010101.
Footnotes
We use the terms sequence and string interchangeably in this paper.
Contributor Information
Shibdas Bandyopadhyay, VMware Inc, Palo Alto, CA 94304, sbandyopadhyay@vmware.com.
Sartaj Sahni, Department of CISE, University of Florida, Gainesville, FL 32611, sahni@cise.ufl.edu.
Sanguthevar Rajasekaran, Department of CSE, University of Connecticut, Storrs, CT 06269, USA, rajasek@engr.uconn.edu.
REFERENCES
- 1. Bandyopadhyay S, Sahni S, Rajasekaran S. www.cise.ufl.edu/~sahni/papers/pms6mc.pdf.
- 2. Bandyopadhyay S, Sahni S, Rajasekaran S. PMS6: A fast algorithm for motif discovery. ICCABS. 2012. doi: 10.1109/ICCABS.2012.6182627.
- 3. Bailey TL, Williams N, Misleh C, Li WW. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Research. 2006;34:369–373. doi: 10.1093/nar/gkl198.
- 4. Buhler J, Tompa M. Finding motifs using random projections. RECOMB. 2001. doi: 10.1089/10665270252935430.
- 5. Carvalho A, Freitas A, Oliveira A, Sagot M. A highly scalable algorithm for the extraction of cis-regulatory regions. APBC. 2005.
- 6. Chin FYL, Leung HCM. Voting algorithms for discovering long motifs. APBC. 2005:261–271.
- 7. Dasari NS, Desh R, Zubair M. An efficient multicore implementation of planted motif problem. HPCS. 2010.
- 8. Dasari NS, Desh R, Zubair M. High performance implementation of planted motif problem using suffix trees. HPCS. 2011.
- 9. Davila J, Balla S, Rajasekaran S. Fast and practical algorithms for planted (l, d) motif search. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2007. doi: 10.1109/TCBB.2007.70241.
- 10. Davila J, Balla S, Rajasekaran S. Pampa: An improved branch and bound algorithm for planted (l, d) motif search. Tech report, University of Connecticut; 2007.
- 11. Dinh H, Rajasekaran S, Kundeti V. PMS5: An efficient exact algorithm for the (l, d) motif finding problem. BMC Bioinformatics. 2011;12:410. doi: 10.1186/1471-2105-12-410.
- 12. Eskin E, Pevzner P. Finding composite regulatory patterns in DNA sequences. Bioinformatics. 2002;18(Suppl 1):S354–S363. doi: 10.1093/bioinformatics/18.suppl_1.s354.
- 13. Evans PA, Smith A, Wareham HT. On the complexity of finding common approximate substrings. Theoretical Computer Science. 2003;306:407–430.
- 14. Evans PA, Smith A. Toward optimal motif enumeration. WADS. 2003:47–58.
- 15. Hertz G, Stormo G. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics. 1999;15:563–577. doi: 10.1093/bioinformatics/15.7.563.
- 16. Horowitz E, Sahni S, Mehta D. Fundamentals of Data Structures in C++. 2nd ed. Silicon Press; 2006.
- 17. Keich U, Pevzner P. Finding motifs in the twilight zone. Bioinformatics. 2002;18:1374–1381. doi: 10.1093/bioinformatics/18.10.1374.
- 18. Kuksa PP, Pavlovic V. Efficient motif finding algorithms for large-alphabet inputs. BMC Bioinformatics. 2010;11(Suppl 8):S1. doi: 10.1186/1471-2105-11-S8-S1.
- 19. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993;262:208–214. doi: 10.1126/science.8211139.
- 20. Marsan L, Sagot MF. Extracting structured motifs using a suffix tree - algorithms and application to promoter consensus identification. RECOMB. 2000. doi: 10.1089/106652700750050826.
- 21. Pevzner P, Sze S-H. Combinatorial approaches to finding subtle signals in DNA sequences. ISMB. 2000;8:269–278.
- 22. Price A, Ramabhadran S, Pevzner P. Finding subtle motifs by branching from the sample strings. ECCB. 2003. doi: 10.1093/bioinformatics/btg1072.
- 23. Pisanti N, Carvalho AM, Marsan L, Sagot MF. RISOTTO: Fast extraction of motifs with mismatches. LATIN. 2006;3887:757–768.
- 24. Rajasekaran S, Balla S, Huang CH. Exact algorithms for planted motif challenge problems. Journal of Computational Biology. 2005;12(8):1117–1128. doi: 10.1089/cmb.2005.12.1117.
- 25. Rajasekaran S, Dinh H. A speedup technique for (l, d) motif finding algorithms. BMC Research Notes. 2011;4:54. doi: 10.1186/1756-0500-4-54.
- 26. Sagot MF. Spelling approximate repeated or common motifs using a suffix tree. LATIN. 1998:111–127.