TABLE 1.
General information on the tools used here
Tool | Method | Training sequence data set<sup>a</sup> | No. of E. coli sigma factors | Availability | Yr | Reference | No. of citations (Google Scholar)<sup>b</sup> |
---|---|---|---|---|---|---|---|
BPROM | Weight matrices of different motifs combined with linear discriminant analysis | Positive: experimentally validated promoters from E. coli (14). Negative: inner regions of protein-coding ORFs. | 70 | Web server http://www.softberry.com/berry.phtml?topic=bprom&group=programs&subgroup=gfindb | 2011 | 33 | 427 |
bTSSfinder | Position weight matrices for promoter elements, oligomer frequencies, and physicochemical properties as features, with Mahalanobis distance for feature selection and a neural network for classification | Positive: experimentally validated TSSs from Regulon DB. [−200, +51]. Negative: genomic regions with no experimental evidence for the presence of TSSs. | 24, 28, 32, 38, 70 | Stand-alone and Web server http://www.cbrc.kaust.edu.sa/btssfinder/ | 2016 | 23 | 26 |
BacPP | Weighted rules extracted from neural network | Positive: Regulon DB available promoters. [−60, +20]. Negative: randomly generated sequences (with established nucleotide frequencies) and intergenic sequences. | 24, 28, 32, 38, 54, 70 | Web server http://www.bacpp.bioinfoucs.com/home | 2011 | 17 | 22 |
Virtual Footprint | PWMs from different available databases | | | Web server http://www.prodoric.de/vfp/vfp_promoter.php | 2005 | 36 | 370 |
IBBP | Image-based and evolutionary approach that generates “images” (template-image strings that keep features of spatial sequence relationships) | Positive: sigma 70 promoters from Regulon DB. [−60, +20]. Negative: randomly generated from protein-coding sequences. | 70 (expandable approach) | Source code https://github.com/hahatcdg/IBPP | 2018 | 35 | 1 |
iPro70-FMWin | 22,595 features extracted from the sequence, with AdaBoost to select the most representative among them; logistic regression classifier | Positive: Regulon DB annotated promoters. [−60, +20]. Negative: randomly generated from protein-coding and intergenic region sequences. | 70 | Web server http://ipro70.pythonanywhere.com/ | 2019 | 38 | 4 |
70ProPred | Support vector machine using position-specific tendencies of trinucleotide and electron-ion interaction pseudopotentials as features | Positive: promoters from Regulon DB. [−60, +20]. Negative: randomly generated from coding and noncoding sequences. | 70 | Web server http://server.malab.cn/70ProPred/ | 2017 | 39 | 33 |
CNNProm | Convolutional neural networks | Positive: promoters from Regulon DB. [−60, +20]. Negative: the opposite chain of randomly selected protein-coding genes. | 70 | http://www.softberry.com/berry.phtml?topic=index&group=programs&subgroup=deeplearn | 2017 | 34 | 60 |
MULTiPly | Support vector machine using biprofile Bayes, KNN features, k-tuple nucleotide compositions, and dinucleotide-based auto-covariance as features | Positive: promoters from Regulon DB. [−60, +20]. | 24, 28, 32, 38, 54, 70 | Web server and stand-alone http://flagshipnt.erc.monash.edu/MULTiPly/ | 2019 | 41 | 30 |
iPromoter-2L | Multiwindow-based pseudo k-tuple nucleotide composition with physicochemical properties as features and Random Forest as a predictor | Positive: promoters from Regulon DB. [−60, +20]. Negative: randomly extracted from the middle regions of long coding sequences and convergent intergenic regions (none of the promoters in each set has more than 0.8 pairwise sequence identity). | 24, 28, 32, 38, 54, 70 | Web server http://bioinformatics.hitsz.edu.cn/iPromoter-2L/ | 2018 | 40 | 180 |
<sup>a</sup> Positive: sequences expected to be promoters. Negative: sequences expected not to contain promoters. Brackets give the sequence interval used, with boundaries expressed relative to the TSS ([−60, +20], [−60, +19], or [−200, +51]).
<sup>b</sup> Citation counts checked on 3 May 2020.
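
To make the bracket notation concrete, the following minimal Python sketch cuts a window such as [−60, +20] out of a genome sequence around a known TSS. It is an illustration only and is not taken from any of the tools listed: the function name `extract_window`, its parameters, and the convention that the TSS itself is counted as position 0 (so that [−60, +20] yields 81 nucleotides) are all assumptions made for this example, and individual tools may number positions differently.

```python
# Hypothetical helper, not part of any tool in Table 1.
# Assumption: the TSS is treated as position 0, so [-60, +20] spans 81 nt.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract_window(genome: str, tss_index: int, upstream: int = 60,
                   downstream: int = 20, strand: str = "+") -> str:
    """Return the window [-upstream, +downstream] around a TSS.

    genome     -- forward-strand sequence (uppercase A/C/G/T)
    tss_index  -- 0-based index of the TSS on the forward strand
    strand     -- "+" or "-"; for "-", the reverse complement is returned
    """
    if strand == "+":
        start = tss_index - upstream
        end = tss_index + downstream
    elif strand == "-":
        # On the reverse strand, upstream lies at higher forward coordinates.
        start = tss_index - downstream
        end = tss_index + upstream
    else:
        raise ValueError("strand must be '+' or '-'")

    if start < 0 or end >= len(genome):
        raise ValueError("window exceeds sequence bounds")

    window = genome[start:end + 1]  # inclusive of both boundaries
    if strand == "-":
        window = window[::-1].translate(COMPLEMENT)
    return window


if __name__ == "__main__":
    # Toy sequence: 60 nt upstream, the TSS base, 20 nt downstream.
    toy = "A" * 60 + "G" + "C" * 20
    win = extract_window(toy, tss_index=60)
    print(len(win), win[60])
```

Changing `upstream` and `downstream` to 200 and 51 gives the wider [−200, +51] interval under the same assumed numbering.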