MSCAN |
Web server and source code: http://www.cisreg.ca/cgi-bin/mscan/MSCAN MSCAN identifies binding site clusters by assessing the significance of the observed sites, while correcting for the local compositional bias of the sequence [13]. |
MCAST |
Source code: http://metameme.sdsc.edu/doc/mcast.html MCAST searches for statistically significant clusters of non-overlapping matches to the query motifs [14]. |
ClusterBuster |
Web server and source code: http://zlab.bu.edu/cluster-buster/ ClusterBuster searches for regions that resemble a statistical model of a motif cluster more than a model of ‘background DNA’. The motif cluster model assumes that motifs occur randomly, with a uniform distribution, across the region, and the background model consists of probabilities of independent, random nucleotides. It first performs one pass of the Forward algorithm to obtain the log likelihood score for each subsequence, keeping track of the subsequences with the maximal score. It then runs the Backward algorithm on those tracked subsequences, from their ends to their starts, to refine the optimal start point. Finally, it merges the tracked subsequences with a greedy algorithm [15]. |
Stubb |
Web server and source code: http://stubb.rockefeller.edu/ Stubb parses a CRM as a collection of binding sites interspersed with random bases, while considering correlations between binding sites. It assumes that the sequences are generated by a probabilistic process, a hidden Markov model. At each step, the process chooses either a motif at random or the background motif. The transition probabilities of the motifs and the background, and the correlated transition probabilities between pairs of motifs, are trained with the Expectation-Maximization algorithm, which converges iteratively to a local optimum [16]. |
StubbMS |
Web server and source code: http://stubb.rockefeller.edu/ The Stubb HMM framework is integrated with multiple-species comparison by using sequence alignment as a first step. For two species, Lagan is used to find the best syntenic parse of ungapped conserved blocks. The binding site matches in the conserved blocks are evaluated using an HMM phylogenetic model. The unaligned sequences are treated as single-species sequence and contribute independently to the final score of a homologous window [16]. |
MorphMS |
Source code: http://veda.cs.uiuc.edu/Morphalign/supplement/ MorphMS implements a pair-HMM statistical alignment method to generate alignments between two species, so that the uncertainty of the alignment can be quantified probabilistically. All parameters except the window length are estimated automatically from the input sequences. For each window, it then uses an HMM to generate orthologous CRMs. MorphMS produces two log likelihood ratio (LLR) scores for each position of the input sequence: the LLR1 score compares the likelihood that a sequence is generated by a mixture of motifs with the likelihood that it is generated by the background model; the LLR2 score reflects the likelihood that the two orthologous sequences are generated independently [18]. |
CisModule |
Source code: http://www.stat.ucla.edu/~zhou/CisModule/ CisModule is a hierarchical mixture (HMx) model that describes CRMs at two levels: at the first level, the sequences are viewed as a mixture of CRMs interspersed with pure background sequence; at the second level, CRMs are modelled as a mixture of motifs and within-module background. Bayesian inference is performed with a Gibbs sampling algorithm for the simultaneous detection of modules, TFBSs, and motif patterns, based on their joint posterior distribution [19]. |
MultiModule |
Source code: http://www.stat.ucla.edu/~zhou/MultiModule/index.html MultiModule uses a hidden Markov model to model the co-localization of TFBSs within each species, then couples the locations of TFBSs and modules through multiple alignments. Different evolutionary models are developed to capture the difference in conservation between the TFBSs and the background. A Markov chain Monte Carlo algorithm is developed to sample CRMs and their TFBSs simultaneously from their joint posterior distribution [20]. |
CisPlusFinder |
Source code: http://jakob.genetik.uni-koeln.de/bioinformatik/people/nora/nora.html CisPlusFinder predicts CRMs by identifying high-density regions of perfect local ungapped sequences (PLUSs) based on multiple-species conservation, with a second signal of locally overrepresented sequence motifs. A PLUS is selected if it contains at least one locally overrepresented core motif and additional PLUSs occur within its immediate neighbourhood [21]. |
EEL |
Web server and source code: http://www.cs.helsinki.fi/u/kpalin/EEL/ EEL locates the highest-energy elements by considering both conservation and a biochemical and physical model of TF binding. The parameters that contribute to the EEL score include the binding affinities of the TFs to their respective binding sites and the distances between adjacent binding sites. Differences in these distances between the two aligned species are also counted, as are differences in the angles of the TFs [24]. |
RP |
Source code: http://www.bx.psu.edu/projects/rp/ RP identifies regulatory regions by statistically modelling the frequencies of short alignment patterns in regulatory regions and background sequences. It describes two-species alignments with five symbols: match involving A and T, match involving C and G, transition, transversion and gap. It identifies a set of k-mers of these symbols that are more overrepresented in regulatory regions than in neutral DNA. The sequences are modelled by a Markov chain of order (k-1), and the parameters are learnt from experimentally confirmed regulatory regions and aligned ancestral interspersed repeats [23]. |
HexDiff |
Source code: http://www.ics.uci.edu/~bobc/hexdiff.html HexDiff learns a set of hexamers that occur more frequently in known CRMs than in non-CRMs, and applies them to predict CRMs in regulatory systems [28]. |
PhylCRM |
Source code: http://the_brain.bwh.harvard.edu/PhylCRM/ PhylCRM quantifies the clustering of the motifs identified by MONKEY in multiple alignments [29]. |
EMMA |
Source code: http://veda.cs.uiuc.edu/emma/ EMMA captures different evolutionary modes of TFBSs, and takes the uncertainty of alignments and the gains and losses of TFBSs into account. It uses a statistical alignment method, and substitutions are estimated with the HKY model [85]. For TFBS evolution, it uses the population-genetics-based Halpern-Bruno (HB) model [86]. It models the functional gains and losses of binding sites by switching between the models that govern the evolution of TFBSs and non-TFBSs, similar to [30], [87]–[89]. |
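
As a rough illustration of the five-symbol alignment encoding described for RP in the table above, the following Python sketch reduces a pairwise alignment to those symbols and counts k-mers over the result. It is not the RP implementation: the one-letter labels (W, S, I, V, G), the function names and the example sequences are all assumptions made here for illustration, and the later steps of RP (selecting discriminative k-mers and training the Markov models) are not shown.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES

# Hypothetical one-letter labels for the five alignment symbols:
# W = match involving A and T, S = match involving C and G,
# I = transition, V = transversion, G = gap.
AT_MATCH, CG_MATCH, TRANSITION, TRANSVERSION, GAP = "W", "S", "I", "V", "G"


def encode_column(a, b):
    """Reduce one column of a two-species alignment to one of five symbols."""
    a, b = a.upper(), b.upper()
    if a == "-" or b == "-":
        return GAP
    if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
        raise ValueError(f"unexpected alignment characters: {a!r}, {b!r}")
    if a == b:
        return AT_MATCH if a in "AT" else CG_MATCH
    pair = {a, b}
    # A transition stays within purines or within pyrimidines;
    # any other mismatch is a transversion.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return TRANSITION
    return TRANSVERSION


def encode_alignment(seq1, seq2):
    """Encode two aligned, equal-length sequences as a five-symbol string."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return "".join(encode_column(a, b) for a, b in zip(seq1, seq2))


def count_kmers(symbols, k):
    """Count overlapping k-mers in an encoded alignment string."""
    return Counter(symbols[i:i + k] for i in range(len(symbols) - k + 1))


if __name__ == "__main__":
    encoded = encode_alignment("ACGTA-", "AGGCGT")
    print(encoded)                  # WVSIIG
    print(count_kmers(encoded, 2))  # each of the five 2-mers occurs once
```

Comparing such k-mer counts between alignments of known regulatory regions and alignments of neutral DNA would be the natural next step toward the overrepresentation analysis the table describes, but how RP scores and selects k-mers is left to the cited reference.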