2020 Aug 25;5(4):e00439-20. doi: 10.1128/mSystems.00439-20

TABLE 1.

General information on the tools used here

| Tool | Method | Training sequence data set^a | No. of E. coli sigma factors | Availability | Yr | Reference | No. of citations (Google Scholar)^b |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BPROM | Weight matrices of different motifs combined with linear discriminant analysis | Positive: experimentally validated promoters from E. coli (14). Negative: inner regions of protein-coding ORFs. | 70 | Web server http://www.softberry.com/berry.phtml?topic=bprom&group=programs&subgroup=gfindb | 2011 | 33 | 427 |
| bTSSfinder | Position weight matrices for promoter elements, oligomer frequencies, and physicochemical properties as features, with Mahalanobis distance for feature selection and a neural network for classification | Positive: experimentally validated TSSs from Regulon DB. [−200, +51]. Negative: genomic regions with no experimental evidence for the presence of TSSs. | 24, 28, 32, 38, 70 | Stand-alone and Web server http://www.cbrc.kaust.edu.sa/btssfinder/ | 2016 | 23 | 26 |
| BacPP | Weighted rules extracted from neural network | Positive: Regulon DB available promoters. [−60, +20]. Negative: randomly generated sequences (with established nucleotide frequencies) and intergenic sequences. | 24, 28, 32, 38, 54, 70 | Web server http://www.bacpp.bioinfoucs.com/home | 2011 | 17 | 22 |
| Virtual Footprint | PWMs from different available databases | | | Web server http://www.prodoric.de/vfp/vfp_promoter.php | 2005 | 36 | 370 |
| IBPP | Image-based and evolutionary approach which generates “images” (template-image strings that keep features of spatial sequence relationships) | Positive: sigma 70 promoters from Regulon DB. [−60, +20]. Negative: randomly generated from protein-coding sequences. | 70 (expandable approach) | Source code https://github.com/hahatcdg/IBPP | 2018 | 35 | 1 |
| iPro70-FMWin | 22,595 features extracted from the sequence, with AdaBoost to select the most representative among them; logistic regression classifier | Positive: Regulon DB annotated promoters. [−60, 20]. Negative: randomly generated from protein-coding and intergenic region sequences. | 70 | Web server http://ipro70.pythonanywhere.com/ | 2019 | 38 | 4 |
| 70ProPred | Support vector machine using position-specific tendencies of trinucleotides and electron-ion interaction pseudopotentials as features | Positive: promoters from Regulon DB. [−60, 20]. Negative: randomly generated from coding and noncoding sequences. | 70 | Web server http://server.malab.cn/70ProPred/ | 2017 | 39 | 33 |
| CNNProm | Convolutional neural networks | Positive: promoters from Regulon DB. [−60, 20]. Negative: the opposite chain of randomly selected protein-coding genes. | 70 | http://www.softberry.com/berry.phtml?topic=index&group=programs&subgroup=deeplearn | 2017 | 34 | 60 |
| MULTiPly | Support vector machine using biprofile Bayes, KNN features, k-tuple nucleotide compositions, and dinucleotide-based auto-covariance as features | Positive: promoters from Regulon DB. [−60, 20]. | 24, 28, 32, 38, 54, 70 | Web server and stand-alone http://flagshipnt.erc.monash.edu/MULTiPly/ | 2019 | 41 | 30 |
| iPromoter-2L | Multiwindow-based pseudo k-tuple nucleotide composition with physicochemical properties as features and Random Forest as a predictor | Positive: promoters from Regulon DB. [−60, 20]. Negative: randomly extracted from the middle regions of long coding sequences and convergent intergenic regions (none of the promoters in each set has more than 0.8 pairwise sequence identity). | 24, 28, 32, 38, 54, 70 | Web server http://bioinformatics.hitsz.edu.cn/iPromoter-2L/ | 2018 | 40 | 180 |
^a Positive, positive sequences (sequences expected to be promoters). Negative, negative sequences (sequences expected not to include promoters). The interval of the sequence, with boundary positions given relative to the TSS, is indicated within brackets ([−60, +20], [−60, +19], or [−200, +51]).

^b Citations checked on 3 May 2020.
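
The bracketed intervals in footnote a describe a window of sequence around a TSS. As an illustration only, the following minimal Python sketch shows one way such a window could be cut from a longer sequence. The function name `extract_window`, its parameters, and the coordinate convention are assumptions made for this example and are not taken from any of the tools listed: the TSS is treated as offset 0 on the forward strand, and both interval bounds are inclusive, so [−60, +20] yields 81 nucleotides. Individual tools may count positions differently.

```python
def extract_window(sequence: str, tss_index: int,
                   upstream: int = 60, downstream: int = 20) -> str:
    """Return the subsequence covering [-upstream, +downstream] around a TSS.

    Assumptions (hypothetical, not specified in the table):
    - tss_index is a 0-based position on the forward strand;
    - the TSS itself is offset 0;
    - both interval bounds are inclusive.
    """
    start = tss_index - upstream
    end = tss_index + downstream + 1  # +1 because the upper bound is inclusive
    if start < 0 or end > len(sequence):
        raise ValueError("window extends past the ends of the sequence")
    return sequence[start:end]


# Toy example: a 200-nt artificial sequence with a TSS placed at index 100.
toy_sequence = "ACGT" * 50
window = extract_window(toy_sequence, tss_index=100)
print(len(window))  # 81
```

Under the same assumptions, the longer interval [−200, +51] would be requested with `upstream=200, downstream=51`; the function raises `ValueError` rather than returning a truncated window when the TSS lies too close to either end of the sequence.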