Dempster-Shafer Theory for the Prediction of Auxin-Response Elements (AuxREs) in Plant Genomes

Nesrine Sghaier; Rayda Ben Ayed; Riadh Ben Marzoug; Ahmed Rebai

BioMed Research International. 2018 Nov 1;2018:3837060. doi: 10.1155/2018/3837060

PMCID: PMC6236769 PMID: 30515394

Abstract

Auxin is a major regulator of plant growth and development; its action involves transcriptional activation. The identification of Auxin-response elements (AuxREs) is one of the most important issues in understanding the Auxin regulation of gene expression. Over the past few years, a large number of motif identification tools have been developed. Despite these considerable efforts by computational biologists, building reliable models to predict regulatory elements remains a difficult challenge. In this context, we propose a data fusion approach for the prediction of AuxREs. Our method is based on the combined use of Dempster-Shafer evidence theory and fuzzy theory. To evaluate our model, we scanned the DORNRÖSCHEN promoter. All proven AuxREs present in the promoter were detected, and at the 0.9 threshold there were no false positives. Comparison with previous motif-finding tools shows that our model predicts AuxREs more successfully and produces fewer false positives. Comparison of the results before and after combination shows the importance of the Dempster-Shafer combination in reducing false positives and improving the reliability of prediction. For an overall evaluation, we compared the performance of our approach with other methods. The results indicate that the data fusion method has the highest sensitivity (Sn) and Positive Predictive Value (PPV).

1. Introduction

Plants are a genetically very diverse group and play a vital role in nutrition and livelihood, in particular for rural and tribal populations, for whom they provide employment and income. They respond to various developmental conditions and severe environmental changes by regulating gene expression. Transcription is at the core of physiological and developmental processes that require well-coordinated players. Auxin is a major regulator of plant growth and development that plays important roles during all the stages of plant life, and its action involves transcriptional activation. This phytohormone controls multiple fundamental aspects of plant development [1] and environmental responses such as apical dominance [2], root development [3], phototropism, and gravitropism [4]. Auxin is also crucially involved in cell division, cell elongation, and cell differentiation [5]. The action of this plant hormone centres on the activation of early-response genes [6], and microarray studies have identified a large number of early Auxin-response genes [7]. Many players are implicated in the transcriptional mechanism that regulates Auxin target gene expression. The Auxin-response element (AuxRE) is a key element in this process. The first and second reactions involve recognition of this specific element, which contains the core sequence TGTCTC [8].

The identification of AuxREs is one of the most important issues to understand the Auxin regulation of gene expression at the genome level. Cis-regulatory elements can be elucidated by experimental technologies in vitro such as ChIP-chip [9], ChIP-seq [10, 11], and ChIP-PET [12]. However, using laboratory techniques is laborious and the process requires significant time and resources [13]. This is why many computational methods have been developed to allow fast and efficient identification of hormone receptor regulatory elements [14, 15]. Computational prediction of TFBS motifs remains a central goal in bioinformatics and intensive efforts have been dedicated to identifying putative cis-regulatory elements.

Several algorithms have been developed for the detection of consensus sequences. They can be categorized into two main strategies [16, 17]: enumeration of short words (counting and comparing oligonucleotide frequencies) [18, 19] and probabilistic methods [20, 21]. Usually, a motif-finding tool identifies short DNA sequence ‘motifs' that are statistically overrepresented in regulatory regions (promoters) [21, 22]. A statistically overrepresented motif is a motif that occurs more often than one would expect by chance [16]. Many computational approaches have been applied, such as heuristic, greedy [23], and stochastic algorithms; others use expectation maximization (EM) [24], Gibbs Sampling algorithms [25], Hidden Markov models (HMM) [13], Bayesian networks [26], Genetic algorithms (GA) [25], and other techniques [16].

A pattern can be represented as a consensus sequence or a position weight matrix (PWM) [46]. PWMs are frequently applied for transcription factor binding site prediction [23, 47]. A PWM describes the probability of finding each of the nucleotides A, C, G, and T at each position of a motif [48]. Searching for matches with a PWM is more accurate than consensus string matching, but it also produces a large number of false positives [49, 50]. Other methods use localized distribution as a supplementary criterion to detect functional elements [51]. Over the past few years, a large number of motif identification tools have been developed, to name a few, MAPPER [52], AlignACE [21], MEME [53], Weeder [54], MotifSampler [55], and GAME [56]. Because of this diversity of available algorithms and programs, many studies present a comprehensive review of motif predictors that provides comparison and guidance to researchers, such as Stormo [48], Das and Dai [16], and Tompa et al. [57]. These studies show that, despite the considerable efforts of computational biologists, building reliable models to predict regulatory elements remains a challenging task. Stormo and Zhao [57] suggested that the majority of the current approaches are not accurate or complete and that it is necessary to find more accurate prediction methods with higher specificity and sensitivity. A new bioinformatics framework is therefore required. Tompa et al. [57] recommended using a few complementary tools and following up the top motifs by combining information from different predictions. Hu et al. [58] discussed the limitations of motif discovery algorithms and developed a new one, named EMD, which is more significant for shorter input sequences [59].
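As an illustration of PWM-based scanning, the following Python sketch builds a log-odds matrix from a few aligned sites and scores every window of a sequence. The aligned sites, the pseudocount of 1, and the uniform background frequency of 0.25 are assumptions made for this example; they do not reproduce the matrix of reference [60] or the behaviour of any tool cited above.

```python
import math

def build_pwm(sites, pseudocount=1.0):
    """Return one dict per motif position mapping base -> log2 odds score."""
    n = len(sites)
    pwm = []
    for column in zip(*sites):
        scores = {}
        for base in "ACGT":
            p = (column.count(base) + pseudocount) / (n + 4 * pseudocount)
            scores[base] = math.log2(p / 0.25)  # uniform background (assumption)
        pwm.append(scores)
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((start, sum(col[b] for col, b in zip(pwm, window))))
    return hits

# Hypothetical aligned sites built around the TGTCTC core.
sites = ["TGTCTC", "TGTCTC", "TGTCGG", "TGTCAC"]
pwm = build_pwm(sites)
hits = scan("AAATGTCTCAAA", pwm)
best_position, best_score = max(hits, key=lambda h: h[1])
```

Because the score depends only on base composition at each position, any window resembling the training sites scores highly, which is one reason PWM scans tend to return many false positives.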

In this context, we propose in this work a data fusion approach for the prediction of Auxin-response elements. Our method is based on the combined use of Dempster-Shafer (DS) evidence theory and fuzzy sets. It consists of modelling detection uncertainty and fusing the features using DS combination rule.

2. Material and Methods

2.1. Training Set (Data Collection)

A training set of 64 experimentally verified hormone response elements was collected from published data (Table 1). The whole genome dataset and upstream sequences of Arabidopsis thaliana were downloaded from TAIR (http://arabidopsis.org/).

Table 1.

Datasets.

Cis-regulatory element	Abbreviation	Number	References
Auxin-response element	AuxRE	16	[27–37]
ABA response element	ABRE	12	[38–40]
TATA Box	TATA Box	16	[41]
Ypatch	Ypatch	11	[41]
drought-responsive element	DRE	9	[38, 39]

The position weight matrix used for the comparison was obtained from Ponomarenko and Ponomarenko [60]. Linear discriminant analysis was performed using SPSS (v. 16.0, Statistical Package for the Social Sciences, Chicago, IL, USA).

Microarray data on the primary response to Auxin in Arabidopsis were taken from the Genevestigator database (https://genevestigator.com/gv/) [61]. The response in seedlings was selected: 1 μM IAA for 1 h [62].

2.2. Implementation of the Algorithm

The main algorithm was implemented in the R environment. All measurements were performed on a single-CPU Intel Core i3 computer running at 2.8 GHz, with 6 GB of main memory. The source code is available upon request.

2.3. Some Fundamentals of Dempster-Shafer Theory

The Dempster-Shafer (DS) evidence theory is a mathematical theory that originated from the earlier works of Arthur P. Dempster in 1967 [63, 64] and was extended by Glenn Shafer in 1976 [65]. DS theory can be considered as a generalization of Bayesian probability theory which uses the notions of imprecise, uncertain, and incomplete information. It has been applied in various domains such as medical diagnosis, image processing, and expert systems [66, 67]. DS theory can be used to combine information from different sources. DS theory uses ‘belief' rather than probability. The ‘belief' function is used to represent the uncertainty of the hypothesis. In DS theory, there is a finite set of N elements called the frame of discernment Θ = {H1, H2,…, H_N}. It is a set of mutually exclusive and exhaustive propositions.

Information sources can distribute mass values on subsets of the frame of discernment. A numerical measure of uncertainty, termed basic probability masses, may be assigned to sets of hypotheses as well as individual hypotheses.

The mass functions verify the following constraints:

\begin{matrix} 0 \leq m(A_{i}) \leq 1 \\ m(\emptyset) = 0 \\ \sum_{A_{i} \in 2^{\Theta}} m(A_{i}) = 1 \end{matrix}

(1)

where A_i designates a simple hypothesis H_i or a composite hypothesis (a union of simple hypotheses), that is, A_i ∈ 2^Θ.

If we consider two mass distributions m₁ and m₂ from two different information sources, m1 and m2 can be combined with Dempster's orthogonal rule, and a new distribution m = m₁ ⊕ m₂ is calculated in the following manner:

\begin{matrix} m (A_{i}) = {(1 - K)}^{- 1} \sum_{A_{p} \cap A_{q} = A_{i}} m_{1} (A_{p}) m_{2} (A_{q}) \end{matrix}

(2)

where

\begin{matrix} K = \sum_{A_{p} \cap A_{q} = \emptyset} m_{1} (A_{p}) m_{2} (A_{q}) \end{matrix}

(3)

K is the conflict between the two sources.
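The combination rule of equations (2) and (3) can be sketched in Python as follows. This is a minimal illustration, not the authors' R implementation: the representation of a mass function as a dictionary keyed by frozensets, the function name combine, and the two example mass distributions are assumptions made here.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's orthogonal sum of two mass functions.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass. Returns the fused mass function and the conflict K.
    """
    fused = {}
    conflict = 0.0  # K in equation (3)
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Normalisation by (1 - K), as in equation (2).
    return {s: w / (1.0 - conflict) for s, w in fused.items()}, conflict

H1 = frozenset({"AuxRE"})
H2 = frozenset({"not AuxRE"})
THETA = H1 | H2  # composite hypothesis (ignorance)

# Two illustrative sources that partly disagree.
m1 = {H1: 0.67, THETA: 0.33}
m2 = {H2: 0.33, THETA: 0.67}
fused, K = combine(m1, m2)
```

In this example the only empty intersection is H1 ∩ H2, so K = 0.67 × 0.33 = 0.2211, and the remaining products are rescaled so that the fused masses sum to 1.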

Dempster-Shafer uses ‘belief' rather than probability. Belief function is used to represent the uncertainty of the hypothesis.

To evaluate the uncertainty of the hypothesis, two functions can be calculated from a mass distribution: the belief function (Bel) and the plausibility function (Pls). Belief and plausibility functions can be considered as lower and upper estimations of probabilities.

\begin{matrix} Bel(A_{i}) = \sum_{A_{j} \subseteq A_{i}} m(A_{j}) \\ Pls(A_{i}) = \sum_{A_{j} \cap A_{i} \neq \emptyset} m(A_{j}) \end{matrix}

(4)

Bel(A) = 0 represents lack of evidence about A.
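A short Python sketch of the two functions follows, using the standard definitions: belief sums the masses of the focal elements contained in A, and plausibility sums the masses of the focal elements that have a non-empty intersection with A. The dictionary representation and the example masses are assumptions made for illustration.

```python
def belief(m, a):
    """Sum of the masses of all focal elements included in a."""
    return sum(w for s, w in m.items() if s <= a)

def plausibility(m, a):
    """Sum of the masses of all focal elements that intersect a."""
    return sum(w for s, w in m.items() if s & a)

H1 = frozenset({"AuxRE"})
H2 = frozenset({"not AuxRE"})
THETA = H1 | H2

# Illustrative mass distribution: some support for H1, the rest is ignorance.
m = {H1: 0.67, THETA: 0.33}

bel_h1, pls_h1 = belief(m, H1), plausibility(m, H1)
bel_h2, pls_h2 = belief(m, H2), plausibility(m, H2)
```

Here Bel(H1) = 0.67 and Pls(H1) = 1, whereas Bel(H2) = 0 and Pls(H2) = 0.33: there is no direct evidence for H2, but the mass left on ignorance keeps it plausible.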

3. Results and Discussion

3.1. Modelling Uncertainty of AuxRE Detection

The objective of our study is detection of AuxRE. We applied a data fusion approach which consists of a combination of predictions coming from two techniques commonly used in pattern finding: overrepresented motifs and linear discriminant analysis. The idea is to extract, for each method, some features (parameters) and combine these parameters using the Dempster-Shafer (DS) rule, called orthogonal sum. We have applied our model to the Arabidopsis thaliana genome. The Arabidopsis genome sequence was obtained from TAIR [68].

Two hypotheses are involved: “this motif is an AuxRE” and “this motif is not an AuxRE” (i.e., not a motif or a motif other than an AuxRE). In terms of the Dempster-Shafer evidence theory, the frame of discernment is constructed of two single hypotheses H1 and H2 and one composite hypothesis H3 = H1 ∪ H2 (union of H1 and H2). H3 represents the ignorance.

The modelling process proceeds in six major steps (Figure 1):

Step 1: extraction of parameters
Step 2: construction of learning graphs
Step 3: determination of confidence regions
Step 4: modelling the doubt on the hypotheses
Step 5: fuzzification of the learning graphs
Step 6: data fusion methodology

3.1.1. Extraction of Parameters

From the first method (detection of overrepresented motifs), we prepared four parameters: position P, significance score Sc, occurrence O, and density D. The position was measured from the ATG. The significance score was obtained from the Weeder algorithm [54]. The occurrence represents the total number of occurrences of a validated motif sequence in the whole genome of Arabidopsis thaliana. We defined the density as the proportion of occurrences of a validated AuxRE motif sequence found in the promoters (-1000 bp) of Auxin-response genes. To compute the density, we extracted the 2-fold Auxin-response genes from the microarray data.

\begin{matrix} D = \frac{\text{Number of occurrences of a validated motif sequence in the promoters of 2-fold Auxin-response genes}}{\text{Total number of occurrences of a validated motif sequence in all the promoters of Arabidopsis genes}} \end{matrix}

(5)
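The density of equation (5) can be computed as sketched below in Python. The toy promoter sequences are invented for the example, and the choices to count overlapping matches and to search the forward strand only are assumptions, since the text does not specify them.

```python
def count_occurrences(motif, sequences):
    """Count (possibly overlapping) occurrences of motif in a list of sequences."""
    total = 0
    for seq in sequences:
        start = seq.find(motif)
        while start != -1:
            total += 1
            start = seq.find(motif, start + 1)
    return total

def density(motif, responsive_promoters, all_promoters):
    """Equation (5): share of motif occurrences located in Auxin-response promoters."""
    denominator = count_occurrences(motif, all_promoters)
    if denominator == 0:
        return 0.0
    return count_occurrences(motif, responsive_promoters) / denominator

# Toy data (hypothetical): two Auxin-responsive promoters and two other promoters.
responsive = ["AATGTCTCGG", "TGTCTCTGTCTC"]
others = ["CCTGTCTCAA", "GGGGAAAA"]
d = density("TGTCTC", responsive, responsive + others)
```

With these toy sequences the motif occurs three times in the responsive promoters and four times overall, giving D = 0.75.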

We used the Z-curve parameters [69] and the GC% as potentially discriminative parameters and we performed a linear discriminant analysis. The Z-curve is a unique three-dimensional curve representation of a DNA sequence. We used three Z-curve parameters which are

\begin{matrix} x_{1} = (a_{1} + g_{1}) - (c_{1} + t_{1}) \\ y_{1} = (a_{1} + c_{1}) - (g_{1} + t_{1}) \\ z_{1} = (a_{1} + t_{1}) - (g_{1} + c_{1}) \end{matrix}

(6)
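A minimal Python sketch of these features is given below. It assumes that a1, c1, g1, and t1 denote the frequencies of the four nucleotides in the motif sequence, which the text does not state explicitly; the function name is hypothetical.

```python
def zcurve_gc(seq):
    """Return (x1, y1, z1, GC%) for a DNA sequence.

    a, c, g, t are taken to be nucleotide frequencies in the sequence
    (an assumption about the notation of equation (6)).
    """
    seq = seq.upper()
    n = len(seq)
    a, c, g, t = (seq.count(base) / n for base in "ACGT")
    x1 = (a + g) - (c + t)   # purine versus pyrimidine
    y1 = (a + c) - (g + t)   # amino versus keto
    z1 = (a + t) - (g + c)   # weak versus strong hydrogen bonding
    gc_percent = 100.0 * (g + c)
    return x1, y1, z1, gc_percent

x1, y1, z1, gc = zcurve_gc("TGTCTC")
```

For the core sequence TGTCTC (no A, two C, one G, three T) this yields x1 = -2/3, y1 = -1/3, z1 = 0, and a GC content of 50%.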

3.1.2. Construction of Learning Graphs

In the following sections, two methods are presented that use the available data on a positive and a negative training set to construct a discriminative prediction model. A training set of 64 experimentally proven hormone response elements was collected from published data.

Method 1: Overrepresented Motifs. First, the validated motifs are studied in feature spaces, which makes the interpretation of the link between the selected features (P, Sc, O, and D) and the type of motif straightforward. We chose to study the knowledge provided by position P and significance score Sc separately from that provided by occurrence and density, in order to separate AuxREs from other types of cis-regulatory elements as much as possible. Two learning graphs were created (Figures 2 and 3). Figure 2 represents the distribution of validated motifs according to their position P and significance score Sc. We distinguish, at the bottom of the graph, a region containing only AuxREs; the other part of the graph corresponds to an area of uncertainty which contains all types of motifs. This figure shows that only AuxREs are located relatively far from the translational start site (start codon). However, position alone is not a discriminative parameter, as many AuxREs were found in the -500 bp upstream regions. Therefore, we decided to study two other parameters (occurrence and density) in order to improve the classification and to differentiate AuxREs, especially those found in the mixed region shown in Figure 2.

Learning graph 1: distribution of the different types of motifs in the significance score/position feature space.

Learning graph 2: distribution of the different types of motifs in the occurrence/density feature space.

Figure 3 illustrates the classification of training cis-elements based on two parameters: the occurrence of the patterns in the -1000 bp upstream regions and the density.

Method 2: Linear Discriminant Analysis. For the linear discriminant analysis, we used the Z-curve parameters and the GC%. Figure 4 shows the first two discriminant functions, which allow a good discrimination of AuxREs from other motifs except Ypatch. The first discriminant function explains 59.6% of the variability and has the highest correlation with GC% (-0.88) and Z1 (0.85), while the second function (32% of the variability) is correlated with X1 (0.75).

Learning graph 3: distribution of the different types of motifs in the f1/f2 feature space.

3.1.3. Confidence Regions

None of the previous graphs allows a clear discrimination of AuxREs from other motifs. Each graph can be subdivided in several ways into different regions that are enriched in one or a few motif types. Here, we have chosen to partition each graph into five confidence regions based on the percentage of AuxREs that belong to each region. The partitions are shown on the three learning graphs (Figures 2, 3, and 4) and in Tables 2, 3, and 4.

Table 2.

Proportion of false positive and true positive in the regions of the significance score/position feature space and associated propositions.

Region R _ij	% AuxRE	% non AuxRE	Proposition
Z1: R11	7	93	P4(H2)
Z2: R12	25	75	P2(H2)
Z3: R13	0	100	P4(H2)
Z4: R14, R24, R34	0	100	P4(H2)
Z5: R23	80	20	P3(H1)
Z6: R21, R31, R32, R33, R22	100	0	P4(H1)

Table 3.

Proportion of false positive and true positive in the regions of the occurrence/density feature space and associated propositions.

Region Dij	% AuxRE	% non AuxRE	Proposition
Z7: D11	43	57	P1
Z8: D12	13	87	P3(H2)
Z9: D22	91	9	P3(H1)
Z10: D41	100	0	P4(H1)
Z11: D21, D31	0	100	P4(H2)
Z12: D13, D23, D32, D33, D42, D43	0	100	P4(H2)

Table 4.

Proportion of false positive and true positive in the regions of the f1/f2 feature space and associated propositions.

Region Q _ij	% AuxRE	% non AuxRE	Proposition
Z11: Q11, Q21, R12	0	100	P4(H2)
Z12: Q22	70	30	P3(H1)
Z13: Q13, Q23, Q33	0	100	P4(H2)
Z14: Q31, Q32	10	90	P3(H2)
Z15: Q41, Q42, Q43	0	100	P4(H2)

3.1.4. Modelling the Doubt on the Hypotheses

In order to make the graph partition an automatic process we attributed a confidence level to any unknown detected motif that would be located on the graph.

For that purpose, we define a gradual doubt through a set of four propositions:

P1(Hi,Hj): total ignorance
P2(Hi,Hj): low preference for the Hi hypothesis but high doubt between Hi and Hj
P3(Hi,Hj): strong preference for the Hi hypothesis but low doubt between Hi and Hj
P4(Hi): total confidence in the Hi hypothesis, no doubt

Next, these propositions are translated in terms of masses as detailed in Table 5. The preference level for a hypothesis from P1 to P4 is gradually represented by a mass value equal to 0, 0.33, 0.67, and 1, respectively [66]. Likewise, the gradual doubt between hypotheses is modelled by a mass value. In the case of total doubt, the mass value assigned to the hypothesis equals 0. Conversely, the mass value assigned in the case of total confidence equals 1.

Table 5.

Association of propositions with mass values.

Proposition	m(H1) (AuxRE)	m(H2) (not AuxRE)	m(H1 ∪ H2) (ignorance)
P1(H1,H2)	0	0	1
P2(H1,H2)	0.33	0	0.67
P3(H1,H2)	0.67	0	0.33
P4(H1)	1	0	0
P2(H2,H1)	0	0.33	0.67
P3(H2,H1)	0	0.67	0.33
P4(H2)	0	1	0

Finally, a proposition is assigned to each region based on the previous analyses of the percentages of AuxREs and other motifs in each region. The link between the percentages and the related propositions is presented in Tables 2, 3, and 4.
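The mapping from regions to propositions and from propositions to masses amounts to two lookup tables, sketched here in Python with the values of Table 5 and the region assignment of Table 2. The short labels such as "P3(H1)" are taken to name the preferred hypothesis, so that "P3(H1)" corresponds to P3(H1,H2) in Table 5; this correspondence and the function name are assumptions.

```python
# Masses (m(H1), m(H2), m(H1 ∪ H2)) for each proposition, following Table 5.
MASSES = {
    "P1": (0.0, 0.0, 1.0),
    "P2(H1)": (0.33, 0.0, 0.67),
    "P3(H1)": (0.67, 0.0, 0.33),
    "P4(H1)": (1.0, 0.0, 0.0),
    "P2(H2)": (0.0, 0.33, 0.67),
    "P3(H2)": (0.0, 0.67, 0.33),
    "P4(H2)": (0.0, 1.0, 0.0),
}

# Region-to-proposition assignment of the significance score/position graph (Table 2).
REGION_PROPOSITION = {
    "R11": "P4(H2)", "R12": "P2(H2)", "R13": "P4(H2)",
    "R14": "P4(H2)", "R24": "P4(H2)", "R34": "P4(H2)",
    "R23": "P3(H1)",
    "R21": "P4(H1)", "R31": "P4(H1)", "R32": "P4(H1)",
    "R33": "P4(H1)", "R22": "P4(H1)",
}

def region_mass(region):
    """Return the mass triple attached to a region of learning graph 1."""
    return MASSES[REGION_PROPOSITION[region]]

mass_r23 = region_mass("R23")
```

A motif falling in region R23, where 80% of the training motifs are AuxREs, therefore receives m(H1) = 0.67 and m(H1 ∪ H2) = 0.33.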

3.1.5. Fuzzification of the Learning Graphs

In the previous section we used a discrete representation to define regions, which is not very objective because it can assign significantly different confidence levels to two neighbouring motifs located on either side of a boundary. Moreover, the boundaries between regions are not well defined, and the transition from one region of the graph to another is not abrupt but smooth. Thus, in order to obtain a gradual, continuous transition, we introduce fuzzy logic theory. We define fuzzy sets for each measured feature to predict its membership degrees to the different possible feature families. For the significance score, four sets were defined (small, average, high, and very high). For the position, three sets were defined (core, proximal, and distal). For the occurrence and the density, three sets were defined for each (small, average, and high).
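The idea of fuzzy sets for one feature can be sketched in Python as follows, here for the position. The trapezoidal shape of the membership functions and every breakpoint (50, 150, 400, and 600 bp upstream of the ATG) are hypothetical values chosen for illustration; the text does not give the shapes or boundaries actually used.

```python
INF = float("inf")

def trapezoid(x, a, b, c, d):
    """Membership rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical fuzzy sets for the distance (in bp) upstream of the ATG.
POSITION_SETS = {
    "core": (-INF, -INF, 50, 150),      # left shoulder
    "proximal": (50, 150, 400, 600),
    "distal": (400, 600, INF, INF),     # right shoulder
}

def position_memberships(distance):
    """Membership degree of a position in each of the three fuzzy sets."""
    return {name: trapezoid(distance, *p) for name, p in POSITION_SETS.items()}

near_boundary = position_memberships(100)
far_upstream = position_memberships(700)
```

A motif 100 bp upstream then belongs partly to the core set and partly to the proximal set (0.5 each), instead of being assigned entirely to one region, which is the smooth transition the fuzzification is meant to provide.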

3.1.6. Data Fusion Methodology

The process of data fusion consists of fusing a number of learning graphs based on the definition of the so-called masses.

For each detected motif, three masses are calculated, corresponding to the three learning graphs. They are given, respectively, by

\begin{matrix} m(O \in S / Sc \& P) = \sum_{i = 1, j = 1}^{i = 4, j = 3} \mu_{Sc(i)}(x) * \mu_{P(j)}(y) * m_{R_{ij}}(O \in S / Sc \& P) \\ m(O \in S / O \& M) = \sum_{i = 1, j = 1}^{i = 3, j = 3} \mu_{O(i)}(x) * \mu_{M(j)}(y) * m_{R_{ij}}(O \in S / O \& M) \\ m(O \in S / f_{1} \& f_{2}) = \sum_{i = 1, j = 1}^{i = 3, j = 3} \mu_{f_{1}(i)}(x) * \mu_{f_{2}(j)}(y) * m_{R_{ij}}(O \in S / f_{1} \& f_{2}) \end{matrix}

(7)

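where S represents any subset of the hypotheses and m_{R_ij}(O ∈ S/Sc&P), m_{R_ij}(O ∈ S/O&M), and m_{R_ij}(O ∈ S/f1&f2) designate the masses corresponding to the region R_ij of, respectively, the significance score/position graph, the occurrence/density graph, and the f1/f2 graph. The μ terms are the fuzzy membership degrees of the two feature values x and y of the detected motif in the sets defined in Section 3.1.5.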
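One line of equation (7) can be sketched in Python as a membership-weighted sum of region masses. The membership degrees, the region indices, and the region masses below are illustrative values, not data from the learning graphs; the function name is hypothetical.

```python
def fuzzy_mass(mu_x, mu_y, region_masses):
    """Membership-weighted mass of one learning graph, as in equation (7).

    mu_x, mu_y    : dicts mapping a fuzzy-set index to a membership degree
    region_masses : dict mapping (i, j) to (m(H1), m(H2), m(H1 ∪ H2))
    """
    total = [0.0, 0.0, 0.0]
    for i, wx in mu_x.items():
        for j, wy in mu_y.items():
            region = region_masses[(i, j)]
            for k in range(3):
                total[k] += wx * wy * region[k]
    return tuple(total)

# Illustrative case: the motif is fully in set 2 of the first feature and
# lies between sets 2 and 3 of the second feature.
mu_first = {2: 1.0}
mu_second = {2: 0.4, 3: 0.6}
region_masses = {
    (2, 2): (1.0, 0.0, 0.0),    # a region carrying P4(H1)
    (2, 3): (0.67, 0.0, 0.33),  # a region carrying P3(H1)
}
mass = fuzzy_mass(mu_first, mu_second, region_masses)
```

When the membership degrees of each feature sum to 1, the weighted result is still a valid mass distribution: here m(H1) = 0.4 × 1 + 0.6 × 0.67 = 0.802 and m(H1 ∪ H2) = 0.6 × 0.33 = 0.198.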

First, we have to fuse the two masses of method 1; the mass m₁(O ∈ S) is obtained by combining the two masses from the two feature spaces of method 1 using the orthogonal sum of Dempster:

\begin{matrix} m_{1}(O \in S) = m_{R_{ij}}(O \in S / Sc \& P) \oplus m_{R_{ij}}(O \in S / O \& M) \end{matrix}

(8)

The final mass function is then calculated by fusing the two masses m₁(O ∈ S) and m_{R_ij}(O ∈ S/f1&f2) using the orthogonal sum of Dempster:

\begin{matrix} m_{fusion}(O \in S) = m_{1}(O \in S) \oplus m_{R_{ij}}(O \in S / f_{1} \& f_{2}) \end{matrix}

(9)
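The two successive fusions of equations (8) and (9) can be sketched in Python for the two-hypothesis frame used here. The tuple representation (m(H1), m(H2), m(H1 ∪ H2)), the function names, and the decision rule that compares the fused mass of H1 with the 0.9 threshold are assumptions made for this sketch; the three input masses are illustrative.

```python
from functools import reduce

def combine(a, b):
    """Dempster's rule on the frame {H1, H2}; masses are (m(H1), m(H2), m(H1 ∪ H2))."""
    a1, a2, a_theta = a
    b1, b2, b_theta = b
    conflict = a1 * b2 + a2 * b1          # mass falling on H1 ∩ H2 = ∅
    if conflict >= 1.0:
        raise ValueError("total conflict between sources")
    norm = 1.0 - conflict
    h1 = (a1 * b1 + a1 * b_theta + a_theta * b1) / norm
    h2 = (a2 * b2 + a2 * b_theta + a_theta * b2) / norm
    return (h1, h2, a_theta * b_theta / norm)

def fuse(masses):
    """Fuse the masses of the learning graphs one after the other."""
    return reduce(combine, masses)

def is_auxre(masses, threshold=0.9):
    """Assumed decision rule: accept when the fused mass of H1 reaches the threshold."""
    return fuse(masses)[0] >= threshold

# Illustrative case: each of the three graphs supports H1 with mass 0.67.
graph_masses = [(0.67, 0.0, 0.33)] * 3
fused = fuse(graph_masses)
```

In this example no single graph reaches the 0.9 threshold, but three concordant sources give a fused m(H1) of about 0.964, which illustrates how agreement between sources reinforces the prediction.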

3.2. Scan of the Auxin Responsive DRN Promoter

The DORNRÖSCHEN (DRN) promoter is one of the most studied Auxin-responsive promoters; DRN has an essential role in Auxin transport and perception in Arabidopsis embryogenesis [70]. Two AuxREs that were not used in training have been experimentally identified in this promoter. To verify the reliability of the prediction, we tested our method on the DRN promoter. At a threshold of 0.9, the scan of the DRN promoter by the model detected the two validated AuxREs without detecting any false positive. Among the 1200 motifs scanned, we considered the two proven AuxREs as true positives and any other detected motif as a false positive (Figure 5).

Scanning of the DRN promoter by the data fusion method.

3.3. Comparison between Method 1, Method 2, and Fusion

In order to study the influence of data fusion by Dempster-Shafer combination, Figure 6 presents the ratio between true and false positives before and after combination. Figure 6 shows that method 1 and method 2, taken separately, produce a large number of false positives: their percentage exceeds 90% in both cases. After combination, the number of false positives decreases significantly and reaches zero when the credibility value equals 0.9. The reliability of detection is thus improved by data fusion. In parallel, the comparison of the three ROC curves shown in Figure 7 confirms the higher predictive reliability of the model after fusion compared with that based on a single method when scanning the DRN promoter.

Evolution of the positive and false detection as a function of credibility obtained before and after data fusion.

ROC curves before and after data fusion (scan of DRN promoter).

3.4. Scan of DRN Promoter by Other Methods

To evaluate our method, we scanned the DRN promoter with previous tools: Consensus [71], MEME [20], Gibbs Sampler [25], MDScan [72], and Weeder [54]. On the analysis platform MELINA II [73], the results indicate that the four motif-finding tools do not detect any AuxRE. These basic tools are unable to detect specific hormone-responsive elements, although they detect cis-elements in general. We also compared our model to the PWM method. The PWM detects the two AuxREs, but in return it produces a high frequency of false positive predictions: four false positives were detected at a threshold of 0.9. For example, the PWM detects the motif TTGTCAAA as an AuxRE with a score of 0.93, because this motif sequence is similar to the AuxRE sequence and the PWM is based only on composition. Conversely, this motif was not detected by our method, since its prediction depends on several parameters. Likewise, the Plant Promoter Database (PPDB) does not report the two validated AuxREs present in the DRN promoter. In this database, cis-regulatory elements are identified by the Local Distribution of Short Sequences (LDSS) and by a prediction method based on microarray data (RARf-based approach) [74].

3.5. Scan of RD29B Promoter

The promoter of RD29B gene contains no AuxRE according to the literature. Several studies have shown the presence of other types of cis-regulatory elements such as ABA and DRE. The scan of this promoter by our model did not detect any false positives.

3.6. Validation of the Results

Because of the limited number of confirmed Auxin-responsive elements, there are not enough data to divide them into training and validation sets. We therefore performed the Gold Standard test [75] to evaluate our model. A library of 100 random DNA sequences was generated using Unipro UGENE software version 1.26.1 (http://ugene.unipro.ru/) [76]. A set of 14 AuxREs was prepared. In each randomly generated DNA sequence, a single AuxRE from the prepared set was inserted at a random position using the SeqKit toolkit [77]. A TSV file containing the list of inserted AuxRE sequences and their insertion positions was generated using csvtk (https://github.com/shenwei356/csvtk).
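The construction of such a benchmark can be sketched with the Python standard library. This sketch replaces UGENE, SeqKit, and csvtk with the random module; the sequence length of 1000 bp, the uniform base composition, the seed, and the two example motifs are assumptions, since the text does not report these details.

```python
import random

def make_benchmark(motifs, n_sequences=100, length=1000, seed=0):
    """Generate random DNA sequences, each with one motif inserted at a random position.

    Returns a list of (sequence, motif, position) records; the position is
    the 0-based index at which the motif starts in the final sequence.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_sequences):
        background = "".join(rng.choice("ACGT") for _ in range(length))
        motif = rng.choice(motifs)
        position = rng.randrange(length + 1)
        sequence = background[:position] + motif + background[position:]
        records.append((sequence, motif, position))
    return records

# Hypothetical motif set: the TGTCTC core and its reverse complement.
benchmark = make_benchmark(["TGTCTC", "GAGACA"])
```

Keeping the inserted motif and its position with each sequence plays the role of the TSV file described above: it provides the ground truth against which predictions are counted as true or false positives.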

In the next step, to further investigate the prediction performance and to choose the optimum cutoff, we applied our prediction method and examined the variation of the Positive Predictive Value (PPV). The results showed that the maximal PPV is achieved for a cutoff value of 0.9 (Figure 8).

Variation of Positive Predictive Value (PPV).

For an overall evaluation we have chosen to present the performance of our approach in comparison with other methods. The chosen methods are the five individual TFBS prediction tools evaluated by Jayaram et al. [78].

We first summed the true/false positives and negatives, and then calculated statistical parameters in order to identify the best predictive approach. Table 6 presents the results. Our method, based on the joint use of Dempster-Shafer (DS) evidence theory and fuzzy sets, has the highest sensitivity (Sn) and Positive Predictive Value (PPV), with values of 79 and 48.17, respectively, compared to the best previous methods. The Youden index (YI) and the Χ² test parameter are also higher than those of the other reference tools. Moreover, Table 6 shows that our approach (Data fusion), followed by the Clover computer program implemented by Frith et al. [42], is the best-performing transcription factor binding site (TFBS) prediction tool for individual sites. On the other hand, Table 6 shows that the Find Individual Motif Occurrences (FIMO) method described by Grant et al. [44] has the worst sensitivity (Sn=22) among the six presented tools. In addition, the position specific scoring matrix search tool (PoSSuMsearch) developed by Beckstette et al. [45] and the FIMO tool have lower Positive Predictive Values (PPV) than the other previous methods, with values of 40.74 and 42.31, respectively.

Table 6.

Comparison between our method and the other published methods.

Method	Sn	Sp	PPV	NPV	FPR	FNR	YI	QCY	Χ²	Ref
Data fusion	79	99.91	48.17	99.98	51.83	0.02	0.79	1	36490.3
Clover	69	99.92	47.9	99.97	52.08	0.03	0.69	1	31702.8	[42]
Matrix-Scan	51	99.94	46.36	99.95	53.64	0.05	0.51	1	22661.6	[43]
Patser	63.64	99.92	43.45	99.96	56.55	0.04	0.64	1	27285.9	[43]
FIMO	22	99.97	42.31	99.92	57.69	0.08	0.22	1	8911.9	[44]
PoSSuMsearch	56.41	83.84	40.74	90.71	59.26	9.29	0.4	0.74	30	[45]

Sn: average sensitivity; Sp: specificity; PPV: Positive Predictive Value; NPV: Negative Predictive Value; FPR: False Positive Rate; FNR: False Negative Rate; YI: Youden index; QCY: Q coefficient of Yule; Χ²: Χ² test value. The best-performing tools are Data fusion and Clover.
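For reference, several of these parameters can be computed from the counts of true/false positives and negatives as sketched below in Python. The formulas are the standard textbook definitions and the counts are hypothetical; the text does not give the exact formulas or counts behind Table 6.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard performance measures computed from a confusion matrix."""
    sn = tp / (tp + fn)            # sensitivity
    sp = tn / (tn + fp)            # specificity
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return {"Sn": sn, "Sp": sp, "PPV": ppv, "NPV": npv, "YI": sn + sp - 1.0}

# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=8, fp=2, tn=88, fn=2)
```

The Youden index combines sensitivity and specificity in a single value (YI = Sn + Sp - 1), which agrees with the Data fusion row of Table 6, where Sn = 79% and Sp = 99.91% give YI of about 0.79.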

Our method strikes a good balance between sensitivity and PPV.

4. Conclusion

In this study, we applied a data fusion approach for the prediction of Auxin-response elements. Our method is based on the combined use of Dempster-Shafer (DS) evidence theory and fuzzy theory. We tested our model on the DRN promoter and compared its predictions to those of previous tools. The results show that false positives are significantly decreased.

Acknowledgments

This work was supported by the Tunisian Ministry of Higher Education and Scientific Research.

Data Availability

All the data used in this manuscript are included within the article and will be freely accessible upon its publication in BioMed Research International.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

1. Möller B., Weijers D. Auxin control of embryo patterning. Cold Spring Harbor Perspectives in Biology. 2009;1(5):p. a001545. doi: 10.1101/cshperspect.a001545.
2. Leyser O. The fall and rise of apical dominance. Current Opinion in Genetics & Development. 2005;15(4):468–471. doi: 10.1016/j.gde.2005.06.010.
3. Bennett T., Scheres B. Root development-two meristems for the price of one? Current Topics in Developmental Biology. 2010;91(C):67–102. doi: 10.1016/S0070-2153(10)91003-X.
4. Muday G. K. Auxins and tropisms. Journal of Plant Growth Regulation. 2001;20(3):226–243. doi: 10.1007/s003440010027.
5. Ding Z., Friml J. Auxin regulates distal stem cell differentiation in Arabidopsis roots. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(26):12046–12051. doi: 10.1073/pnas.1000672107.
6. Abel S., Theologis A. Early genes and auxin action. Plant Physiology. 1996;111(1):9–17. doi: 10.1104/pp.111.1.9.
7. Nemhauser J. L., Hong F., Chory J. Different plant hormones regulate similar processes through largely nonoverlapping transcriptional responses. Cell. 2006;126(3):467–475. doi: 10.1016/j.cell.2006.05.050.
8. Ulmasov T., Liu Zhan-Bin, Hagen G., Guilfoyle T. J. Composite structure of auxin response elements. The Plant Cell. 1995;7(10):1611–1623. doi: 10.1105/tpc.7.10.1611.
9. Weinmann A. S., Farnham P. J. Identification of unknown target genes of human transcription factors using chromatin immunoprecipitation. Methods. 2002;26(1):37–47. doi: 10.1016/S1046-2023(02)00006-3.
10. Robertson G., Hirst M., Bainbridge M., et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature Methods. 2007;4(8):651–657. doi: 10.1038/nmeth1068.
11. Barski A., Cuddapah S., Cui K., et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129(4):823–837. doi: 10.1016/j.cell.2007.05.009.
12. Loh Y., Wu Q., Chew J., et al. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nature Genetics. 2006;38(4):431–440. doi: 10.1038/ng1760.
13. Sandelin A., Wasserman W. W. Prediction of nuclear hormone receptor response elements. Molecular Endocrinology. 2005;19(3):595–606. doi: 10.1210/me.2004-0101.
14. Lenhard B., Sandelin A., Mendoza L., Engström P., Jareborg N., Wasserman W. W. Identification of conserved regulatory elements by comparative genome analysis. Journal of Biology. 2003;2(2, article no. 13) doi: 10.1186/1475-4924-2-13.
15. Brazma A., Jonassen I., Vilo J., Ukkonen E. Predicting gene regulatory elements in silico on a genomic scale. Genome Research. 1998;8(11):1202–1215. doi: 10.1101/gr.8.11.1202.
16. Das M. K., Dai H.-K. A survey of DNA motif finding algorithms. BMC Bioinformatics. 2007;8(supplement 7, article S21) doi: 10.1186/1471-2105-8-s7-s21.
17. Davis I. W., Benninger C., Benfey P. N., Elich T. Powrs: Position-sensitive motif discovery. PLoS ONE. 2012;7(7) doi: 10.1371/journal.pone.0040373.
18. Linhart C., Halperin Y., Shamir R. Transcription factor and microRNA motif discovery: The Amadeus platform and a compendium of metazoan target sets. Genome Research. 2008;18(7):1180–1189. doi: 10.1101/gr.076117.108.
19. Georgiev S., Boyle A. P., Jayasurya K., Ding X., Mukherjee S., Ohler U. Evidence-ranked motif identification. Genome Biology. 2010;11(2, article R19) doi: 10.1186/gb-2010-11-2-r19.
20. Bailey T. L., Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers; 1994; pp. 28–36.
21. Hughes J. D., Estep P. W., Tavazoie S., Church G. M. Computational identification of Cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. Journal of Molecular Biology. 2000;296(5):1205–1214. doi: 10.1006/jmbi.2000.3519.
22. Van Helden J., André B., Collado-Vides J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology. 1998;281(5):827–842. doi: 10.1006/jmbi.1998.1947.
23. Hertz G. Z., Hartzell G. W., Stormo G. D. Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Bioinformatics. 1990;6(2):81–92. doi: 10.1093/bioinformatics/6.2.81.
24. Lawrence C. E., Reilly A. A. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins: Structure, Function, and Bioinformatics. 1990;7(1):41–51. doi: 10.1002/prot.340070105.
25.Lawrence C. E., Altschul S. F., Boguski M. S., Liu J. S., Neuwald A. F., Wootton J. C. Detecting subtle sequence signals: a gibbs sampling strategy for multiple alignment. Science. 1993;262(5131):208–214. doi: 10.1126/science.8211139. [DOI] [PubMed] [Google Scholar]
26.Siddharthan R., Siggia E. D., Van Nsmwegea E. PhyloGibbs: A gibbs sampling motif finder that incorporates phylogeny that incorporates phylogeny. PLoS Computational Biology. 2005;1(7):0534–0556. doi: 10.1371/journal.pcbi.0010067. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Okushima Y., Mitina I., Quach H. L., Theologis A. AUXIN RESPONSE FACTOR 2 (ARF2): A pleiotropic developmental regulator. The Plant Journal. 2005;43(1):29–46. doi: 10.1111/j.1365-313X.2005.02426.x. [DOI] [PubMed] [Google Scholar]
28.Ismail I. Function and Regulation of Xylem Cysteine Protease 1 and Xylem Cysteine Protease 2 in Arabidopsis. 2003, http://scholar.lib.vt.edu/theses/available/etd-08152004-231624.
29.Donner T. J., Sherr I., Scarpella E. Regulation of preprocambial cell state acquisition by auxin signaling in Arabidopsis leaves. Development. 2009;136(19):3235–3246. doi: 10.1242/dev.037028. [DOI] [PubMed] [Google Scholar]
30.Scacchi E., Salinas P., Gujas B., et al. Spatio-temporal sequence of cross-regulatory events in root meristem growth. Proceedings of the National Acadamy of Sciences of the United States of America. 2010;107(52):22734–22739. doi: 10.1073/pnas.1014716108. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Zhao Z., Andersen S. U., Ljung K., et al. Hormonal control of the shoot stem-cell niche. Nature. 2010;465(7301):1089–1092. doi: 10.1038/nature09126. [DOI] [PubMed] [Google Scholar]
32.Walcher C. L., Nemhauser J. L. Bipartite promoter element required for auxin response. Plant Physiology. 2012;158(1):273–282. doi: 10.1104/pp.111.187559. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Hirota A., Kato T., Fukaki H., Aida M., Tasaka M. The auxin-regulated AP2/EREBP gene PUCHI is required for morphogenesis in the early lateral root primordium of Arabidopsis. The Plant Cell. 2007;19(7):2156–2168. doi: 10.1105/tpc.107.050674. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Berendzen K. W., Weiste C., Wanke D., Kilian J., Harter K., Dröge-Laser W. Bioinformatic cis-element analyses performed in Arabidopsis and rice disclose bZIP- and MYB-related binding sites as potential AuxRE-coupling elements in auxin-mediated transcription. BMC Plant Biology. 2012;12, article no. 125 doi: 10.1186/1471-2229-12-125. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Zhu C., Perry S. E. Control of expression and autoregulation of AGL15, a member of the MADS-box family. The Plant Journal. 2005;41(4):583–594. doi: 10.1111/j.1365-313X.2004.02320.x. [DOI] [PubMed] [Google Scholar]
36.Schlereth A., Möller B., Liu W., et al. MONOPTEROS controls embryonic root initiation by regulating a mobile transcription factor. Nature. 2010;464(7290):913–916. doi: 10.1038/nature08836. [DOI] [PubMed] [Google Scholar]
37.Cheng Z. J., Wang L., Sun W., et al. Pattern of auxin and cytokinin responses for shoot meristem induction results from the regulation of cytokinin biosynthesis by AUXIN RESPONSE FACTOR3. Plant Physiology. 2013;161(1):240–251. doi: 10.1104/pp.112.203166. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Yamaguchi-Shinozaki K., Shinozaki K. A novel cis-acting element in an Arabidopsis gene is involved in responsiveness to drought, low-temperature, or high-salt stress. The Plant Cell. 1994;6(2):251–264. doi: 10.1105/tpc.6.2.251. [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Nakashima K., Fujita Y., Katsura K., et al. Transcriptional regulation of ABI3- and ABA-responsive genes including RD29B and RD29A in seeds, germinating embryos, and seedlings of Arabidopsis. Plant Molecular Biology. 2006;60(1):51–68. doi: 10.1007/s11103-005-2418-5. [DOI] [PubMed] [Google Scholar]
40.Denekamp M., Smeekens S. C. Integration of wounding and osmotic stress signals determines the expression of the AtMYB102 transcription factor gene. Plant Physiology. 2003;132(3):1415–1423. doi: 10.1104/pp.102.019273. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Hieno A., Naznin H. A., Hyakumachi M., et al. Ppdb: Plant promoter database version 3.0. Nucleic Acids Research. 2014;42(1):D1188–D1192. doi: 10.1093/nar/gkt1027. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Frith M. C., Fu Y., Yu L., Chen J.-F., Hansen U., Weng Z. Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Research. 2004;32(4):1372–1381. doi: 10.1093/nar/gkh299. [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Turatsinze J.-V., Thomas-Chollier M., Defrance M., van Helden J. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nature Protocols. 2008;3(10):1578–1588. doi: 10.1038/nprot.2008.97. [DOI] [PubMed] [Google Scholar]
44.Grant C., Bailey T., Noble W. Scanning for occurrences of a given motif. Bioinformatics. 2011;27(2):1017–1018. doi: 10.1093/bioinformatics/btr064. [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Beckstette M., Homann R., Giegerich R., Kurtz S. Fast index based algorithms and software for matching position specific scoring matrices. BMC Bioinformatics. 2006;7, article no. 389 doi: 10.1186/1471-2105-7-389. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Barnes M. R. Bioinformatics Challenges for the Geneticist. Bioinformatics for Geneticists: A Bioinformatics Primer for the Analysis of Genetic Data: Second Edition. 2007:1–16. [Google Scholar]
47.Stormo G. D., Schneider T. D., Gold L., Ehrenfeucht A. Use of the ‘perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Research. 1982;10(9):2997–3011. doi: 10.1093/nar/10.9.2997. [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Stormo G. D. DNA binding sites: representation and discovery. Bioinformatics. 2000;16(1):16–23. doi: 10.1093/bioinformatics/16.1.16. [DOI] [PubMed] [Google Scholar]
49.Cuellar-Partida G., Buske F. A., McLeay R. C., Whitington T., Noble W. S., Bailey T. L. Epigenetic priors for identifying active transcription factor binding sites. Bioinformatics. 2012;28(1):56–62. doi: 10.1093/bioinformatics/btr614. [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Lähdesmäki H., Rust A. G., Shmulevich I. Probabilistic inference of transcription factor binding from multiple data sources. PLoS ONE. 2008;3(3) doi: 10.1371/journal.pone.0001820. [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Yamamoto Y. Y., Ichida H., Matsui M., et al. Identification of plant promoter constituents by analysis of local distribution of short sequences. BMC Genomics. 2007;8, article no. 67 doi: 10.1186/1471-2164-8-67. [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Marinescu V. D., Kohane I. S., Riva A. MAPPER: A search engine for the computational identification of putative transcription factor binding sites in multiple genomes. BMC Bioinformatics. 2005;6, article no. 79 doi: 10.1186/1471-2105-6-79. [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Bailey T. L., Elkan C. The value of prior knowledge in discovering motifs with MEME. Proceedings / . International Conference on Intelligent Systems for Molecular Biology; ISMB. International Conference on Intelligent Systems for Molecular Biology. 1995;3:21–29. [PubMed] [Google Scholar]
54.Pavesi G., Mereghetti P., Mauri G., Pesole G. Weeder web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Research. 2004;32:W199–W203. doi: 10.1093/nar/gkh465. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Thijs G., Lescot M., Marchal K., et al. A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics. 2002;17(12):1113–1122. doi: 10.1093/bioinformatics/17.12.1113. [DOI] [PubMed] [Google Scholar]
56.Wei Z., Jensen S. T. GAME: detecting cis-regulatory elements using a genetic algorithm. Bioinformatics. 2006;22(13):1577–1584. doi: 10.1093/bioinformatics/btl147. [DOI] [PubMed] [Google Scholar]
57.Tompa M., Li N., Bailey T. L., et al. Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology. 2005;23(1):137–144. doi: 10.1038/nbt1053. [DOI] [PubMed] [Google Scholar]
58.Hu J., Yang Y. D., Kihara D. EMD: An ensemble algorithm for discovering regulatory motifs in DNA sequences. BMC Bioinformatics. 2006;7, article no. 342 doi: 10.1186/1471-2105-7-342. [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Hu J., Li B., Kihara D. Limitations and potentials of current motif discovery algorithms. Nucleic Acids Research. 2005;33(15):4899–4913. doi: 10.1093/nar/gki791. [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Ponomarenko P. M., Ponomarenko M. P. Sequence-based prediction of transcription upregulation by auxin in plants. Journal of Bioinformatics and Computational Biology. 2015;13(1) doi: 10.1142/s0219720015400090.1540009 [DOI] [PubMed] [Google Scholar]
61.Zimmermann P., Hirsch-Hoffmann M., Hennig L., Gruissem W. GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiology. 2004;136(1):2621–2632. doi: 10.1104/pp.104.046367. [DOI] [PMC free article] [PubMed] [Google Scholar]
62.Goda H., Sasaki E., Akiyama K., et al. The AtGenExpress hormone and chemical treatment data set: Experimental design, data evaluation, model data analysis and data access. The Plant Journal. 2008;55(3):526–542. doi: 10.1111/j.1365-313X.2008.03510.x. [DOI] [PubMed] [Google Scholar]
63.Dempster A. P. New methods for reasoning towards posterior distributions based on sample data. Annals of Mathematical Statistics. 1966;37:355–374. doi: 10.1214/aoms/1177699517. [DOI] [Google Scholar]
64.Dempster A. P. Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics. 1967;38:325–339. doi: 10.1214/aoms/1177698950. [DOI] [Google Scholar]
65.Shafer G. A Mathematical Theory of Evidence. Princeton, NJ, USA: Princeton University Press; 1976. [Google Scholar]
66.Kaftandjian V., Zhu Y. M., Dupuis O., Babot D. The combined use of the evidence theory and fuzzy logic for improving multimodal nondestructive testing systems. IEEE Transactions on Instrumentation and Measurement. 2005;54(5):1968–1977. doi: 10.1109/TIM.2005.854255. [DOI] [Google Scholar]
67.Zhu Y. M., Bentabet L., Dupuis O., Kaftandjian V., Babot D., Rombaut M. Automatic determination of mass functions in Dempster-Shafer theory using fuzzy c-means and spatial neighborhood information for image segmentation. Optical Engineering. 2002;41(4):760–770. doi: 10.1117/1.1457458. [DOI] [Google Scholar]
68.Huala E., Dickerman A. W., Garcia-Hernandez M., et al. The Arabidopsis Information Resource (TAIR): A comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Research. 2001;29(1):102–105. doi: 10.1093/nar/29.1.102. [DOI] [PMC free article] [PubMed] [Google Scholar]
69.Guo F.-B., Ou H.-Y., Zhang C.-T. ZCURVE: A new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Research. 2003;31(6):1780–1789. doi: 10.1093/nar/gkg254. [DOI] [PMC free article] [PubMed] [Google Scholar]
70.Cole M., Chandler J., Weijers D., Jacobs B., Comelli P., Werr W. DORNRÖSCHEN is a direct target of the auxin response factor MONOPTEROS in the Arabidopsis embryo. Development. 2009;136(10):1643–1651. doi: 10.1242/dev.032177. [DOI] [PubMed] [Google Scholar]
71.Stormo G. D., Hartzell G. W., III Identifying protein-binding sites from unaligned DNA fragments. Proceedings of the National Acadamy of Sciences of the United States of America. 1989;86(4):1183–1187. doi: 10.1073/pnas.86.4.1183. [DOI] [PMC free article] [PubMed] [Google Scholar]
72.Liu X. S., Brutlag D. L., Liu J. S. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnology. 2002;20(8):835–839. doi: 10.1038/nbt717. [DOI] [PubMed] [Google Scholar]
73.Poluliakh N., Takagi T., Nakai K. MELINA: Motif extraction from promoter regions of potentially co-regulated genes. Bioinformatics. 2003;19(3):423–424. doi: 10.1093/bioinformatics/btf872. [DOI] [PubMed] [Google Scholar]
74.Yamamoto Y. Y., Yoshioka Y., Hyakumachi M., et al. Prediction of transcriptional regulatory elements for plant hormone responses based on microarray data. BMC Plant Biology. 2011;11, article no. 39 doi: 10.1186/1471-2229-11-39. [DOI] [PMC free article] [PubMed] [Google Scholar]
75.Rudd P. In search of the gold standard for compliance measurement. JAMA Internal Medicine. 1979;139(6):627–628. doi: 10.1001/archinte.139.6.627. [DOI] [PubMed] [Google Scholar]
76.Okonechnikov K., Golosova O., Fursov M., et al. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 2012;28(8):1166–1167. doi: 10.1093/bioinformatics/bts091. [DOI] [PubMed] [Google Scholar]
77.Shen W., Le S., Li Y., Hu F. SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE. 2016;11(10) doi: 10.1371/journal.pone.0163962. [DOI] [PMC free article] [PubMed] [Google Scholar]
78.Jayaram N., Usvyat D., R. Martin A. C. Evaluating tools for transcription factor binding site prediction. BMC Bioinformatics. 2016 doi: 10.1186/s12859-016-1298-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Data Availability Statement

All the data used in this manuscript are included within the article and will be freely accessible upon its publication in BioMed Research International.

[B1] 1.Möller B., Weijers D. Auxin control of embryo patterning. Cold Spring Harbor Perspectives in Biology. 2009;1(5):p. a001545. doi: 10.1101/cshperspect.a001545. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B2] 2.Leyser O. The fall and rise of apical dominance. Current Opinion in Genetics & Development. 2005;15(4):468–471. doi: 10.1016/j.gde.2005.06.010. [DOI] [PubMed] [Google Scholar]

[B3] 3.Bennett T., Scheres B. Root development-two meristems for the price of one? Current Topics in Developmental Biology. 2010;91(C):67–102. doi: 10.1016/S0070-2153(10)91003-X. [DOI] [PubMed] [Google Scholar]

[B4] 4.Muday G. K. Auxins and tropisms. Journal of Plant Growth Regulation. 2001;20(3):226–243. doi: 10.1007/s003440010027. [DOI] [PubMed] [Google Scholar]

[B5] 5.Ding Z., Friml J. Auxin regulates distal stem cell differentiation in Arabidopsis roots. Proceedings of the National Acadamy of Sciences of the United States of America. 2010;107(26):12046–12051. doi: 10.1073/pnas.1000672107. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6] 6.Abel S., Theologis A. Early genes and auxin action. Plant Physiology. 1996;111(1):9–17. doi: 10.1104/pp.111.1.9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B7] 7.Nemhauser J. L., Hong F., Chory J. Different plant hormones regulate similar processes through largely nonoverlapping transcriptional responses. Cell. 2006;126(3):467–475. doi: 10.1016/j.cell.2006.05.050. [DOI] [PubMed] [Google Scholar]

[B8] 8.Ulmasov T., Liu Zhan-Bin, Hagen G., Guilfoyle T. J. Composite structure of auxin response elements. The Plant Cell. 1995;7(10):1611–1623. doi: 10.1105/tpc.7.10.1611. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B9] 9.Weinmann A. S., Farnham P. J. Identification of unknown target genes of human transcription factors using chromatin immunoprecipitation. Methods. 2002;26(1):37–47. doi: 10.1016/S1046-2023(02)00006-3. [DOI] [PubMed] [Google Scholar]

[B10] 10.Robertson G., Hirst M., Bainbridge M., et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature Methods. 2007;4(8):651–657. doi: 10.1038/nmeth1068. [DOI] [PubMed] [Google Scholar]

[B11] 11.Barski A., Cuddapah S., Cui K., et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129(4):823–837. doi: 10.1016/j.cell.2007.05.009. [DOI] [PubMed] [Google Scholar]

[B12] 12.Loh Y., Wu Q., Chew J., et al. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nature Genetics. 2006;38(4):431–440. doi: 10.1038/ng1760. [DOI] [PubMed] [Google Scholar]

[B13] 13.Sandelin A., Wasserman W. W. Prediction of nuclear hormone receptor response elements. Molecular Endocrinology. 2005;19(3):595–606. doi: 10.1210/me.2004-0101. [DOI] [PubMed] [Google Scholar]

[B14] 14.Lenhard B., Sandelin A., Mendoza L., Engström P., Jareborg N., Wasserman W. W. Identification of conserved regulatory elements by comparative genome analysis. Journal of Biology. 2003;2(2, article no. 13) doi: 10.1186/1475-4924-2-13. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B15] 15.Brazma A., Jonassen I., Vilo J., Ukkonen E. Predicting gene regulatory elements in silico on a genomic scale. Genome Research. 1998;8(11):1202–1215. doi: 10.1101/gr.8.11.1202. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16] 16.Das M. K., Dai H.-K. A survey of DNA motif finding algorithms. BMC Bioinformatics. 2007;8(supplement 7, article S21) doi: 10.1186/1471-2105-8-s7-s21. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B17] 17.Davis I. W., Benninger C., Benfey P. N., Elich T. Powrs: Position-sensitive motif discovery. PLoS ONE. 2012;7(7) doi: 10.1371/journal.pone.0040373. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B18] 18.Linhart C., Halperin Y., Shamir R. Transcription factor and microRNA motif discovery: The Amadeus platform and a compendium of metazoan target sets. Genome Research. 2008;18(7):1180–1189. doi: 10.1101/gr.076117.108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] 19.Georgiev S., Boyle A. P., Jayasurya K., Ding X., Mukherjee S., Ohler U. Evidence-ranked motif identification. Genome Biology. 2010;11(2, article R19) doi: 10.1186/gb-2010-11-2-r19. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B20] 20.Bailey T. L., Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers; 1994; pp. 28–36. [PubMed] [Google Scholar]

[B21] 21.Hughes J. D., Estep P. W., Tavazoie S., Church G. M. Computational identification of Cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. Journal of Molecular Biology. 2000;296(5):1205–1214. doi: 10.1006/jmbi.2000.3519. [DOI] [PubMed] [Google Scholar]

[B22] 22.Van Helden J., André B., Collado-Vides J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology. 1998;281(5):827–842. doi: 10.1006/jmbi.1998.1947. [DOI] [PubMed] [Google Scholar]

[B23] 23.Hertz G. Z., Hartzell G. W., Stormo G. D. Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Bioinformatics. 1990;6(2):81–92. doi: 10.1093/bioinformatics/6.2.81. [DOI] [PubMed] [Google Scholar]

[B24] 24.Lawrence C. E., Reilly A. A. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins: Structure, Function, and Bioinformatics. 1990;7(1):41–51. doi: 10.1002/prot.340070105. [DOI] [PubMed] [Google Scholar]

[B25] 25.Lawrence C. E., Altschul S. F., Boguski M. S., Liu J. S., Neuwald A. F., Wootton J. C. Detecting subtle sequence signals: a gibbs sampling strategy for multiple alignment. Science. 1993;262(5131):208–214. doi: 10.1126/science.8211139. [DOI] [PubMed] [Google Scholar]

[B26] 26.Siddharthan R., Siggia E. D., Van Nsmwegea E. PhyloGibbs: A gibbs sampling motif finder that incorporates phylogeny that incorporates phylogeny. PLoS Computational Biology. 2005;1(7):0534–0556. doi: 10.1371/journal.pcbi.0010067. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B63] 27.Okushima Y., Mitina I., Quach H. L., Theologis A. AUXIN RESPONSE FACTOR 2 (ARF2): A pleiotropic developmental regulator. The Plant Journal. 2005;43(1):29–46. doi: 10.1111/j.1365-313X.2005.02426.x. [DOI] [PubMed] [Google Scholar]

[B64] 28.Ismail I. Function and Regulation of Xylem Cysteine Protease 1 and Xylem Cysteine Protease 2 in Arabidopsis. 2003, http://scholar.lib.vt.edu/theses/available/etd-08152004-231624.

[B65] 29.Donner T. J., Sherr I., Scarpella E. Regulation of preprocambial cell state acquisition by auxin signaling in Arabidopsis leaves. Development. 2009;136(19):3235–3246. doi: 10.1242/dev.037028. [DOI] [PubMed] [Google Scholar]

[B66] 30.Scacchi E., Salinas P., Gujas B., et al. Spatio-temporal sequence of cross-regulatory events in root meristem growth. Proceedings of the National Acadamy of Sciences of the United States of America. 2010;107(52):22734–22739. doi: 10.1073/pnas.1014716108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B67] 31.Zhao Z., Andersen S. U., Ljung K., et al. Hormonal control of the shoot stem-cell niche. Nature. 2010;465(7301):1089–1092. doi: 10.1038/nature09126. [DOI] [PubMed] [Google Scholar]

[B68] 32.Walcher C. L., Nemhauser J. L. Bipartite promoter element required for auxin response. Plant Physiology. 2012;158(1):273–282. doi: 10.1104/pp.111.187559. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B69] 33.Hirota A., Kato T., Fukaki H., Aida M., Tasaka M. The auxin-regulated AP2/EREBP gene PUCHI is required for morphogenesis in the early lateral root primordium of Arabidopsis. The Plant Cell. 2007;19(7):2156–2168. doi: 10.1105/tpc.107.050674. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B70] 34.Berendzen K. W., Weiste C., Wanke D., Kilian J., Harter K., Dröge-Laser W. Bioinformatic cis-element analyses performed in Arabidopsis and rice disclose bZIP- and MYB-related binding sites as potential AuxRE-coupling elements in auxin-mediated transcription. BMC Plant Biology. 2012;12, article no. 125 doi: 10.1186/1471-2229-12-125. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B71] 35.Zhu C., Perry S. E. Control of expression and autoregulation of AGL15, a member of the MADS-box family. The Plant Journal. 2005;41(4):583–594. doi: 10.1111/j.1365-313X.2004.02320.x. [DOI] [PubMed] [Google Scholar]

[B72] 36.Schlereth A., Möller B., Liu W., et al. MONOPTEROS controls embryonic root initiation by regulating a mobile transcription factor. Nature. 2010;464(7290):913–916. doi: 10.1038/nature08836. [DOI] [PubMed] [Google Scholar]

[B73] 37.Cheng Z. J., Wang L., Sun W., et al. Pattern of auxin and cytokinin responses for shoot meristem induction results from the regulation of cytokinin biosynthesis by AUXIN RESPONSE FACTOR3. Plant Physiology. 2013;161(1):240–251. doi: 10.1104/pp.112.203166. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B74] 38.Yamaguchi-Shinozaki K., Shinozaki K. A novel cis-acting element in an Arabidopsis gene is involved in responsiveness to drought, low-temperature, or high-salt stress. The Plant Cell. 1994;6(2):251–264. doi: 10.1105/tpc.6.2.251. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B75] 39.Nakashima K., Fujita Y., Katsura K., et al. Transcriptional regulation of ABI3- and ABA-responsive genes including RD29B and RD29A in seeds, germinating embryos, and seedlings of Arabidopsis. Plant Molecular Biology. 2006;60(1):51–68. doi: 10.1007/s11103-005-2418-5. [DOI] [PubMed] [Google Scholar]

[B76] 40.Denekamp M., Smeekens S. C. Integration of wounding and osmotic stress signals determines the expression of the AtMYB102 transcription factor gene. Plant Physiology. 2003;132(3):1415–1423. doi: 10.1104/pp.102.019273. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B77] 41.Hieno A., Naznin H. A., Hyakumachi M., et al. Ppdb: Plant promoter database version 3.0. Nucleic Acids Research. 2014;42(1):D1188–D1192. doi: 10.1093/nar/gkt1027. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B60] 42.Frith M. C., Fu Y., Yu L., Chen J.-F., Hansen U., Weng Z. Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Research. 2004;32(4):1372–1381. doi: 10.1093/nar/gkh299. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B78] 43.Turatsinze J.-V., Thomas-Chollier M., Defrance M., van Helden J. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nature Protocols. 2008;3(10):1578–1588. doi: 10.1038/nprot.2008.97. [DOI] [PubMed] [Google Scholar]

[B61] 44.Grant C., Bailey T., Noble W. Scanning for occurrences of a given motif. Bioinformatics. 2011;27(2):1017–1018. doi: 10.1093/bioinformatics/btr064. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B62] 45.Beckstette M., Homann R., Giegerich R., Kurtz S. Fast index based algorithms and software for matching position specific scoring matrices. BMC Bioinformatics. 2006;7, article no. 389 doi: 10.1186/1471-2105-7-389. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B27] 46.Barnes M. R. Bioinformatics Challenges for the Geneticist. Bioinformatics for Geneticists: A Bioinformatics Primer for the Analysis of Genetic Data: Second Edition. 2007:1–16. [Google Scholar]

[B28] 47.Stormo G. D., Schneider T. D., Gold L., Ehrenfeucht A. Use of the ‘perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Research. 1982;10(9):2997–3011. doi: 10.1093/nar/10.9.2997. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B29] 48.Stormo G. D. DNA binding sites: representation and discovery. Bioinformatics. 2000;16(1):16–23. doi: 10.1093/bioinformatics/16.1.16. [DOI] [PubMed] [Google Scholar]

[B30] 49.Cuellar-Partida G., Buske F. A., McLeay R. C., Whitington T., Noble W. S., Bailey T. L. Epigenetic priors for identifying active transcription factor binding sites. Bioinformatics. 2012;28(1):56–62. doi: 10.1093/bioinformatics/btr614. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B31] 50.Lähdesmäki H., Rust A. G., Shmulevich I. Probabilistic inference of transcription factor binding from multiple data sources. PLoS ONE. 2008;3(3) doi: 10.1371/journal.pone.0001820. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B32] 51.Yamamoto Y. Y., Ichida H., Matsui M., et al. Identification of plant promoter constituents by analysis of local distribution of short sequences. BMC Genomics. 2007;8, article no. 67 doi: 10.1186/1471-2164-8-67. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B33] 52.Marinescu V. D., Kohane I. S., Riva A. MAPPER: A search engine for the computational identification of putative transcription factor binding sites in multiple genomes. BMC Bioinformatics. 2005;6, article no. 79 doi: 10.1186/1471-2105-6-79. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B34] 53.Bailey T. L., Elkan C. The value of prior knowledge in discovering motifs with MEME. Proceedings / . International Conference on Intelligent Systems for Molecular Biology; ISMB. International Conference on Intelligent Systems for Molecular Biology. 1995;3:21–29. [PubMed] [Google Scholar]

[B35] 54.Pavesi G., Mereghetti P., Mauri G., Pesole G. Weeder web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Research. 2004;32:W199–W203. doi: 10.1093/nar/gkh465. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B36] 55.Thijs G., Lescot M., Marchal K., et al. A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics. 2002;17(12):1113–1122. doi: 10.1093/bioinformatics/17.12.1113. [DOI] [PubMed] [Google Scholar]

[B37] 56.Wei Z., Jensen S. T. GAME: detecting cis-regulatory elements using a genetic algorithm. Bioinformatics. 2006;22(13):1577–1584. doi: 10.1093/bioinformatics/btl147. [DOI] [PubMed] [Google Scholar]

[B38] 57.Tompa M., Li N., Bailey T. L., et al. Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology. 2005;23(1):137–144. doi: 10.1038/nbt1053. [DOI] [PubMed] [Google Scholar]

[B39] 58.Hu J., Yang Y. D., Kihara D. EMD: An ensemble algorithm for discovering regulatory motifs in DNA sequences. BMC Bioinformatics. 2006;7, article no. 342 doi: 10.1186/1471-2105-7-342. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B40] 59.Hu J., Li B., Kihara D. Limitations and potentials of current motif discovery algorithms. Nucleic Acids Research. 2005;33(15):4899–4913. doi: 10.1093/nar/gki791. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B41] 60.Ponomarenko P. M., Ponomarenko M. P. Sequence-based prediction of transcription upregulation by auxin in plants. Journal of Bioinformatics and Computational Biology. 2015;13(1) doi: 10.1142/s0219720015400090.1540009 [DOI] [PubMed] [Google Scholar]

[B42] 61.Zimmermann P., Hirsch-Hoffmann M., Hennig L., Gruissem W. GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiology. 2004;136(1):2621–2632. doi: 10.1104/pp.104.046367. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B43] 62.Goda H., Sasaki E., Akiyama K., et al. The AtGenExpress hormone and chemical treatment data set: Experimental design, data evaluation, model data analysis and data access. The Plant Journal. 2008;55(3):526–542. doi: 10.1111/j.1365-313X.2008.03510.x. [DOI] [PubMed] [Google Scholar]

[B44] 63.Dempster A. P. New methods for reasoning towards posterior distributions based on sample data. Annals of Mathematical Statistics. 1966;37:355–374. doi: 10.1214/aoms/1177699517. [DOI] [Google Scholar]

[B45] 64.Dempster A. P. Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics. 1967;38:325–339. doi: 10.1214/aoms/1177698950. [DOI] [Google Scholar]

[B46] 65.Shafer G. A Mathematical Theory of Evidence. Princeton, NJ, USA: Princeton University Press; 1976. [Google Scholar]

[B47] 66. Kaftandjian V., Zhu Y. M., Dupuis O., Babot D. The combined use of the evidence theory and fuzzy logic for improving multimodal nondestructive testing systems. IEEE Transactions on Instrumentation and Measurement. 2005;54(5):1968–1977. doi: 10.1109/TIM.2005.854255.

[B48] 67. Zhu Y. M., Bentabet L., Dupuis O., Kaftandjian V., Babot D., Rombaut M. Automatic determination of mass functions in Dempster-Shafer theory using fuzzy c-means and spatial neighborhood information for image segmentation. Optical Engineering. 2002;41(4):760–770. doi: 10.1117/1.1457458.

[B49] 68. Huala E., Dickerman A. W., Garcia-Hernandez M., et al. The Arabidopsis Information Resource (TAIR): A comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Research. 2001;29(1):102–105. doi: 10.1093/nar/29.1.102.

[B50] 69. Guo F.-B., Ou H.-Y., Zhang C.-T. ZCURVE: A new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Research. 2003;31(6):1780–1789. doi: 10.1093/nar/gkg254.

[B51] 70. Cole M., Chandler J., Weijers D., Jacobs B., Comelli P., Werr W. DORNRÖSCHEN is a direct target of the auxin response factor MONOPTEROS in the Arabidopsis embryo. Development. 2009;136(10):1643–1651. doi: 10.1242/dev.032177.

[B52] 71. Stormo G. D., Hartzell G. W., III. Identifying protein-binding sites from unaligned DNA fragments. Proceedings of the National Academy of Sciences of the United States of America. 1989;86(4):1183–1187. doi: 10.1073/pnas.86.4.1183.

[B53] 72. Liu X. S., Brutlag D. L., Liu J. S. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnology. 2002;20(8):835–839. doi: 10.1038/nbt717.

[B54] 73. Poluliakh N., Takagi T., Nakai K. MELINA: Motif extraction from promoter regions of potentially co-regulated genes. Bioinformatics. 2003;19(3):423–424. doi: 10.1093/bioinformatics/btf872.

[B55] 74. Yamamoto Y. Y., Yoshioka Y., Hyakumachi M., et al. Prediction of transcriptional regulatory elements for plant hormone responses based on microarray data. BMC Plant Biology. 2011;11, article no. 39. doi: 10.1186/1471-2229-11-39.

[B56] 75. Rudd P. In search of the gold standard for compliance measurement. JAMA Internal Medicine. 1979;139(6):627–628. doi: 10.1001/archinte.139.6.627.

[B57] 76. Okonechnikov K., Golosova O., Fursov M., et al. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 2012;28(8):1166–1167. doi: 10.1093/bioinformatics/bts091.

[B58] 77. Shen W., Le S., Li Y., Hu F. SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE. 2016;11(10). doi: 10.1371/journal.pone.0163962.

[B59] 78. Jayaram N., Usvyat D., Martin A. C. R. Evaluating tools for transcription factor binding site prediction. BMC Bioinformatics. 2016. doi: 10.1186/s12859-016-1298-9.
