Abstract
Motivation
Prediction and identification of core promoter elements and transcription factor binding sites is essential for understanding the mechanism of transcription initiation and deciphering the biological activity of a specific locus. Thus, there is a need for an up-to-date tool to detect and curate core promoter elements/motifs in any given nucleotide sequence.
Results
Here, we introduce ElemeNT 2023, a new and enhanced version of the Elements Navigation Tool that provides novel capabilities for assessing evolutionary conservation and for readily evaluating the quality of high-throughput transcription start site (TSS) datasets by leveraging preferential motif positioning. ElemeNT 2023 is accessible both as a fast web-based tool and via command line (no coding skills are required to run the tool). While this tool is focused on core promoter elements, it can also be used to search for any user-defined motif, including sequence-specific DNA binding sites. Furthermore, ElemeNT's CORE database, which contains predicted core promoter elements around annotated TSSs, has been expanded to cover 10 species, ranging from worms to human. In this applications note, we describe the new workflow and present a case study using ElemeNT 2023 for core promoter composition analysis of diverse species, revealing motif prevalence and highlighting evolutionary insights. We discuss how this tool facilitates the exploration of uncharted transcriptomic data, appraises TSS quality, and aids in designing synthetic promoters for gene expression optimization. Taken together, ElemeNT 2023 provides researchers with comprehensive tools for detailed analysis of sequence elements and gene expression strategies.
Availability and implementation
ElemeNT 2023 is freely available at https://www.juven-gershonlab.org/resources/element-v2023/. The source code and command line version of ElemeNT 2023 are available at https://github.com/OritAdato/ElemeNT. No coding skills are required to run the tool.
1 Introduction
Successful analysis of gene regulation depends on knowledge of the underlying DNA sequence motifs. To understand the factors and mechanisms that mediate the expression of a given gene of interest, it is therefore critical to define these motifs accurately. The core promoter, often referred to as "the gateway to transcription" (Heintzman and Ren 2007, Juven-Gershon et al. 2008), is an 80-bp region that may contain one or more short DNA sequences, termed core promoter elements/motifs. These elements and their composition are central to transcription initiation by RNA polymerase II (Pol II) (Thomas and Chiang 2006, Ohler and Wassarman 2010, Wang et al. 2014, Danino et al. 2015, Haberle and Lenhard 2016, Vo Ngoc et al. 2019, Sloutskin et al. 2021). Notably, many sequence-specific transcription factors (TFs) bind to the proximal promoter region (within ∼ −150 to −50, relative to the TSS). Together, the spacing and composition of core promoter elements and TF binding sites play an important role in the spatio-temporal pattern of Pol II initiation (Lenhard et al. 2012, Spitz and Furlong 2012, Lu et al. 2020).
ElemeNT (Sloutskin et al. 2015) is a tool used for detection and curation of core promoter elements within user-provided sequences. It utilizes position weight matrices generated based on experimentally validated sequences, rather than over-represented motifs. This web-based interactive tool can be easily used to predict and display putative core promoter elements and their biologically relevant combinations. Here, we present a new and advanced version, ElemeNT 2023, that enables researchers to identify core promoter elements, transcription factor binding sites (TFBSs) or any DNA sequence motif of interest, examine their evolutionary conservation across species, and obtain valuable biological insights. In addition, ElemeNT 2023 facilitates a unique approach to rapidly assess the quality of high-throughput TSS datasets leveraging natural spacing preferences of diverse DNA sequence motifs.
2 Materials and methods
ElemeNT 2023 is implemented in Perl, and the installation package of the command line version is available for download from GitHub. This version was extended to include four new core promoter elements, namely the pause button (PB), the BBCA+1BW initiator, Ohler Motif 1 (Motif 1) and the GAGA factor binding site. The position weight matrices (PWMs) of the PB, BBCA+1BW initiator and Motif 1 are based on published experimental data (Ohler et al. 2002, Hendrix et al. 2008, Vo Ngoc et al. 2017). The PWM of the GAGA factor binding site is based on CISBP motif M5247, version 1.02 (Weirauch et al. 2014).
2.1 Normalization to GC content
The user-provided GC fraction, P_GC, was divided by 2 to represent the background probability of each of G and C. Accordingly, the background probabilities of A and T were each calculated as (1 − P_GC)/2. These background probabilities were used to normalize the PWM; i.e. the score of nucleotide j at position i of the PWM, where P(i,j) is the probability of nucleotide j at position i, was calculated as follows:

$$\mathrm{score}(i,j) = \begin{cases} \log_2\left(\dfrac{P(i,j)}{P_{GC}/2}\right), & j \in \{G, C\} \\[6pt] \log_2\left(\dfrac{P(i,j)}{(1 - P_{GC})/2}\right), & j \in \{A, T\} \end{cases}$$
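The normalization formula can be illustrated with a short Python sketch. It assumes the PWM is given as per-position nucleotide probabilities and that the GC fraction lies strictly between 0 and 1. The function name `gc_normalized_pwm` and the treatment of zero probabilities (mapped to negative infinity) are illustrative assumptions, not ElemeNT's Perl implementation.

```python
import math

BASES = "ACGT"


def gc_normalized_pwm(prob_pwm, gc_fraction):
    """Convert a probability PWM into GC-normalized log2 scores (hypothetical helper).

    prob_pwm: list of dicts, one per motif position, mapping base -> probability.
    gc_fraction: genomic GC fraction, strictly between 0 and 1.
    """
    if not 0.0 < gc_fraction < 1.0:
        raise ValueError("gc_fraction must be strictly between 0 and 1")
    # G and C share the GC fraction equally; A and T share the remainder equally
    background = {
        "G": gc_fraction / 2,
        "C": gc_fraction / 2,
        "A": (1 - gc_fraction) / 2,
        "T": (1 - gc_fraction) / 2,
    }
    scores = []
    for column in prob_pwm:
        row = {}
        for base in BASES:
            p = column[base]
            # Assumption: a zero probability has no finite log-odds, so it is set to -inf
            row[base] = math.log2(p / background[base]) if p > 0 else float("-inf")
        scores.append(row)
    return scores
```

With a uniform column (0.25 for each base) and a GC fraction of 0.5, every score is 0, as expected when a position carries no information beyond the background. Lowering the GC fraction shrinks the G/C background and therefore raises the scores of G and C relative to A and T.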
2.2 Visualizing the distributions of core promoter elements among nascent transcription peaks
We recently performed nascent RNA sequencing using capped-small RNA sequencing (csRNA-seq) (Duttke et al. 2019) across the first 8 h of Drosophila melanogaster embryonic development. The data are available at Gene Expression Omnibus (GEO) under accession number GSE221852. csRNA-seq data (GSE135498) of human K562 cells and mouse bone marrow-derived macrophages (BMDM) (Duttke et al. 2019) were also analyzed. Peak calling was performed using HOMER (Duttke et al. 2019). For the identified annotated transcription start regions, the genomic sequences of the core promoter region (±100 bp around the TSS) were downloaded and used as input to ElemeNT 2023 for detection of the TATA box (TATA), the Drosophila initiator (dInr), PB, the downstream core promoter element (DPE), GAGA, Motif 1 and dTCT motifs. For each searched element, a Python script was used to calculate, for each position, the percentage of transcripts in which ElemeNT identified the element at that position (out of all transcripts containing the element), as well as the median score of the element at that position. The graphs were generated using a custom R script (available on GitHub).
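The per-position summary described above can be sketched in Python as follows. The input format (one `(transcript_id, position, score)` tuple per ElemeNT hit, for a single element, with positions relative to the TSS) and the function name `positional_summary` are hypothetical; this is not the authors' script.

```python
from collections import defaultdict
from statistics import median


def positional_summary(hits):
    """Summarize hits of ONE element by position (hypothetical helper).

    hits: iterable of (transcript_id, position, score) tuples.
    Returns {position: (percent_of_transcripts, median_score)}, sorted by position.
    """
    hits = list(hits)
    # Denominator: all transcripts that contain the element at any position
    total = len({tid for tid, _, _ in hits})
    transcripts_at = defaultdict(set)
    scores_at = defaultdict(list)
    for tid, pos, score in hits:
        transcripts_at[pos].add(tid)
        scores_at[pos].append(score)
    return {
        pos: (100.0 * len(transcripts_at[pos]) / total, median(scores_at[pos]))
        for pos in sorted(transcripts_at)
    }
```

Because a transcript can contain the element at several positions, the percentages across positions may sum to more than 100%.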
3 Results
3.1 ElemeNT workflow
ElemeNT detects and annotates core promoter elements (or any user-defined TFBS or sequence motif) and calculates the PWM log score of each occurrence, reporting those with a score equal to or higher than the defined cutoff (Fig. 1A). The output can be used as a quality control tool for transcriptomic datasets and to gain new biological insights into the examined data. For example, the quality of the input data and the sequence conservation of the elements can be evaluated based on the distribution and scores of DNA motif occurrences relative to the reported TSSs.
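As an illustration of the scoring step, the following minimal Python sketch slides a log2-odds PWM along a sequence and reports every window scoring at or above a cutoff. It assumes the PWM is a list of per-position dictionaries and that windows containing ambiguous bases (e.g. N) are skipped; `scan_sequence` and `cutoff` are illustrative names, not ElemeNT's interface.

```python
def scan_sequence(sequence, log_pwm, cutoff):
    """Report sense-strand windows whose summed log2 score is >= cutoff (hypothetical helper).

    log_pwm: list of dicts, one per motif position, mapping base -> log2 score.
    Returns a list of (start_index, window, score) with 0-based start indices.
    """
    seq = sequence.upper()
    width = len(log_pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Assumption: windows with N or other ambiguity codes are not scored
        if any(base not in "ACGT" for base in window):
            continue
        # The PWM score of a window is the sum of its per-position log2 scores
        score = sum(column[base] for column, base in zip(log_pwm, window))
        if score >= cutoff:
            hits.append((start, window, score))
    return hits
```

For a toy two-position PWM favoring "TA", scanning "GTACTA" with a cutoff of 2 returns the two "TA" windows at indices 1 and 4.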
Figure 1.
(A) General workflow of analysis with ElemeNT 2023. Sequences and PWMs of core promoter elements are provided as input to the ElemeNT utility. The output of the utility run includes the location and score for every detected element. (B) Results of ElemeNT 2023 run on TSSs derived from nascent csRNA-seq of the first 8 h of D. melanogaster embryonic development. The X-axis represents the predicted element position relative to the TSS for the TATA box, dInr and PB, and the predicted position relative to the A+1 of the dInr for the DPE. The Y-axis represents the percentage of transcripts containing the element at the specific position (out of all transcripts containing the element). The color hue represents the mean score of the predicted elements within a specific position, as indicated in the color legend.
Specifically, the new functionality added to ElemeNT 2023 includes:
An updated web-based user interface that includes UCSC sessions (vCORE; https://www.juven-gershonlab.org/resources/vcore/) for visualization of detected elements in ten species, from worm to human
PWM search scores are calculated as log likelihoods: the PWMs used for motif search in ElemeNT 2023 are now expressed as log likelihoods (log2), providing higher resolution for small-scale differences. The PWMs can be downloaded as an Excel table from https://www.juven-gershonlab.org/resources/element-v2023/element-v-2023-manual/
Newly added features:
normalization to GC content;
element search on both DNA strands (i.e. sense and antisense); and
expansion of the collection of experimentally validated core promoter elements to include the human BBCA+1BW initiator, as well as three motifs that were previously implicated in Pol II pausing, namely Motif 1, the Pol II PB and the GAGA factor binding site (Li and Gilmour 2013).
An extended capability for quality control analysis of next-generation sequencing transcriptomic data, based on the spacing of sequence motifs relative to the TSSs. Following the suggested workflow (Fig. 1A), the output of ElemeNT 2023 can be used as input to the script available on GitHub.
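Searching both strands amounts to scoring each window and its reverse complement with the same PWM. The sketch below is a hypothetical Python illustration of that idea (the names `reverse_complement` and `scan_both_strands` are assumptions). It reports antisense hits by the leftmost sense-strand index of the window, which is only one of several possible coordinate conventions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan_both_strands(sequence, log_pwm, cutoff):
    """Score every window on the sense (+) and antisense (-) strands (hypothetical helper).

    Returns (strand, start, score) tuples; start is the 0-based index of the
    window's leftmost base on the sense sequence, for both strands.
    """
    seq = sequence.upper()
    width = len(log_pwm)

    def score(window):
        return sum(column[base] for column, base in zip(log_pwm, window))

    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) - set("ACGT"):
            continue  # skip windows containing ambiguous bases
        plus = score(window)
        if plus >= cutoff:
            hits.append(("+", start, plus))
        # Scoring the reverse complement of the window is equivalent to reading the antisense strand
        minus = score(reverse_complement(window))
        if minus >= cutoff:
            hits.append(("-", start, minus))
    return hits
```

Note that a reverse-complement palindrome (e.g. "TA") scores identically on both strands and is therefore reported twice.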
3.2 Case study for ElemeNT 2023 usage
3.2.1 Application of ElemeNT 2023 to nascent RNA data
In the case study presented in this applications note, ElemeNT 2023 was applied to genomic sequences around experimentally determined nascent TSSs captured by csRNA-seq (Duttke et al. 2019) during the first 8 h of D. melanogaster embryonic development (Fig. 1B). The analysis revealed conserved positional preferences and motif scores, which reflect biological significance and can therefore be used to assess the quality of high-throughput TSS datasets. For instance, most of the identified dInr and DPE motifs were precisely positioned relative to the TSS, at −2 and +28, respectively. This agrees with the strict spacing dependency of the DPE on the dInr motif (Burke and Kadonaga 1997, Kutach and Kadonaga 2000). The higher mean scores (indicated by the darker color in Fig. 1B) of motifs occurring at the expected positions further underscore the requirement for exact positioning of these motifs for their biological function.
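The strict dInr-to-DPE spacing can be expressed as a simple coordinate check. This Python sketch is hypothetical: it assumes 0-based hit start indices on the same sense-strand sequence, places the A+1 of the dInr (consensus TCAKTY) at offset 2 within the motif, and uses TSS-relative coordinates without a position 0, so that +28 lies 27 nucleotides downstream of +1. The function name `paired_dinr_dpe` and the `tolerance` parameter are illustrative.

```python
# Assumptions (hypothetical helper, not ElemeNT's code):
#  - hit starts are 0-based indices into the same sense-strand sequence;
#  - the A+1 of the dInr (TCAKTY) sits at offset 2 within the motif;
#  - there is no position 0, so the DPE start at +28 lies 27 bases after A+1.
DINR_A_OFFSET = 2
DPE_POSITION = 28


def paired_dinr_dpe(dinr_starts, dpe_starts, tolerance=0):
    """Return (dinr_start, dpe_start) pairs whose spacing matches +28 relative to A+1."""
    dpe_set = set(dpe_starts)
    pairs = []
    for dinr in sorted(dinr_starts):
        a_plus_1 = dinr + DINR_A_OFFSET
        expected = a_plus_1 + (DPE_POSITION - 1)
        # tolerance=0 enforces the exact spacing; larger values allow small shifts
        for shift in range(-tolerance, tolerance + 1):
            if expected + shift in dpe_set:
                pairs.append((dinr, expected + shift))
    return pairs
```

For a dInr starting at index 10, the A+1 is at index 12 and a DPE is expected to start at index 39; a DPE at index 40 is accepted only with `tolerance=1`.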
In contrast, TATA box sequences exhibited broader spatial variability, in line with previous studies (Ohler et al. 2002, Gershenzon and Ioshikhes 2005, Carninci et al. 2006, Ponjavic et al. 2006). Notably, although the PB motif is not strictly positioned (Hendrix et al. 2008), its most prevalent position was downstream of the TSS [in line with Hendrix et al. (2008)], with a marked preference for +26. The GAGA factor binding site, which has also been implicated in Pol II pausing (Li and Gilmour 2013), was not strictly positioned (Supplementary Fig. S1). Both Motif 1 and dTCT were strictly localized, peaking at −5 and −2 relative to the TSS, respectively (Supplementary Fig. S1). Analysis of available csRNA-seq data (GSE135498) of human K562 cells and mouse bone marrow-derived macrophages (BMDM) (Supplementary Fig. S2) demonstrates that the use of ElemeNT 2023 is not limited to D. melanogaster. These findings underscore ElemeNT's capacity to precisely detect and characterize core promoter elements within their relevant genomic context, shedding light on their distribution and conservation.
3.2.2 The CORE database was expanded to 10 species, from worm to human
The ElemeNT CORE database, which contains predicted core promoter elements around Cap Analysis of Gene Expression (CAGE)-annotated TSSs (https://epd.expasy.org/epd/EPDnew_database.php), was expanded from Drosophila to 10 species, ranging from worm to human: D. melanogaster (dm6), Homo sapiens (hg38), Macaca mulatta (rheMac8), Mus musculus (mm10), Rattus norvegicus (rn6), Gallus gallus (galGal5), Canis familiaris (canFam3), Apis mellifera (amel5), Danio rerio (danRer7) and Caenorhabditis elegans (ce6). This analysis is available at https://www.juven-gershonlab.org/resources/core-2023/ as a precompiled file named CORE.
Comparing promoter sequences from these species (Dreos et al. 2015, Meylan et al. 2020) revealed that the positions of both the TATA box and the dInr motif are largely conserved (Supplementary Fig. S3). However, the mean score of the dInr is higher in fly and honeybee than in the other species analyzed. The PB motif, which was first discovered in flies (Hendrix et al. 2008), appears to be fly-specific (Supplementary Fig. S3). Although it was previously detected between +20 and +30 (Hendrix et al. 2008), both the nascent transcriptomics and the EPDnew data indicate its enrichment at position +26 in D. melanogaster (Fig. 1B and Supplementary Fig. S3).
Overall, both the locations and the scores of the detected motifs were similar across species, suggesting that the function of these elements is conserved across the diverse metazoan species analyzed. The updated CORE database is available at https://www.juven-gershonlab.org/resources/core-2023/. UCSC visual sessions (vCORE) are available at https://www.juven-gershonlab.org/resources/vcore/.
3.2.3 ElemeNT 2023 as a tool to evaluate high-throughput transcriptomics datasets and transcription start site quality
Many core promoter elements exhibit strict functional spacing dependencies relative to the TSS. ElemeNT 2023 quickly determines the distributions of core promoter elements relative to the TSS, enabling the assessment of TSS quality based on biological properties. In this case study, we compared EPDnew data (Supplementary Fig. S3), which are largely based on CAGE, with nascent csRNA-seq data (Fig. 1) and with RNA annotation and mapping of promoters for the analysis of gene expression (RAMPAGE) data of D. melanogaster (Supplementary Fig. S4, GSE89299; Batut and Gingeras 2017). This comparison revealed that the positional preferences of the dInr, PB and DPE were most stringent among csRNA-seq-captured nascent TSSs, whereas the TATA box appeared more strictly positioned in the RAMPAGE data than in the csRNA-seq data.
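One way to make "positional stringency" quantitative when comparing TSS datasets is to measure how concentrated the hit positions of an element are. The Python sketch below computes the modal position, the fraction of hits within a chosen window of that mode, and the Shannon entropy (in bits) of the position distribution. These metrics and the name `positional_stringency` are illustrative assumptions, not the measure used in this study.

```python
import math
from collections import Counter


def positional_stringency(positions, window=0):
    """Summarize how tightly an element's hits cluster (hypothetical metric).

    positions: hit positions relative to the TSS, one per hit.
    Returns (mode_position, fraction_within_window_of_mode, entropy_bits).
    """
    if not positions:
        raise ValueError("no positions given")
    counts = Counter(positions)
    total = len(positions)
    # Ties for the mode resolve to the position encountered first
    mode, _ = counts.most_common(1)[0]
    near = sum(n for pos, n in counts.items() if abs(pos - mode) <= window)
    # Entropy is 0 when all hits share one position and grows as hits spread out
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return mode, near / total, entropy
```

Under this assumption, a dataset whose dInr hits give a higher fraction near the mode and a lower entropy would indicate more precisely mapped TSSs.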
Together, this case study demonstrates how ElemeNT, combined with biological features such as core promoter element spacing constraints, can be used to evaluate high-throughput transcriptomics data.
4 Discussion
Predicting and identifying core promoter elements and TFBSs is essential for understanding the mechanism of transcription initiation and deciphering the biological activity of a specific locus. ElemeNT 2023 is a simple, fast, and user-friendly web-based interactive tool for the prediction and display of any sequence element. It is also available as a command line tool on GitHub. ElemeNT 2023 was designed to annotate these elements, in any combination, in any genomic sequence. In addition to the location and score of identified elements, it also displays their biologically relevant combinations (e.g. the dependency of downstream core promoter elements on the presence of an Inr motif and on precise spacing from it), without the need to determine the TSS in advance. Notably, given the spatial preference of many DNA sequence motifs relative to the TSSs (Fig. 1) (Delos Santos et al. 2022), ElemeNT 2023 also provides a unique approach to assess the quality of high-throughput TSS datasets and to confirm the existence of a core promoter region.
ElemeNT 2023 can search for any user-provided PWM in a given sequence. It can also be used to predict core promoter elements and TFBSs near enhancer RNA (eRNA) TSSs, enabling a comparison between promoters and enhancers, which were previously suggested to share a unified architecture of initiation regions (Core et al. 2014, Andersson et al. 2015). The presented tool and resources add complementary information to existing tools such as JASPAR (Castro-Mondragon et al. 2022) and PINTS (Yao et al. 2022). As combinations of core promoter elements and TFBSs can also be used to enhance gene expression (Juven-Gershon et al. 2006), ElemeNT 2023 can be utilized to engineer potent promoters. Therefore, ElemeNT 2023 fills the need for an easy and convenient web-based tool to quickly annotate sequence elements, empowering analysis, quality control and discovery.
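For searching a user-provided motif, a PWM can be derived from aligned example sites. This Python sketch converts equal-length sites into a log2-odds PWM. The function name `pwm_from_sites`, the pseudocount of 0.5 and the uniform default background are assumptions for illustration; ElemeNT 2023 may build or accept PWMs differently.

```python
import math


def pwm_from_sites(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PWM from equal-length aligned sites (hypothetical helper).

    sites: list of DNA strings of identical length.
    background: optional dict base -> probability; uniform (0.25 each) if omitted.
    Returns a list of dicts, one per position, mapping base -> log2 score.
    """
    if not sites:
        raise ValueError("need at least one site")
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive to avoid log2(0)")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    bg = background or {base: 0.25 for base in "ACGT"}
    pwm = []
    for i in range(width):
        # Start every base at the pseudocount, then add observed counts
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            base = site[i].upper()
            if base in counts:  # ambiguous bases are ignored
                counts[base] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / bg[b]) for b in "ACGT"})
    return pwm
```

A GC-specific background (for example, G and C each set to half the GC fraction and A and T each to half the remainder) can be passed via `background`.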
Acknowledgements
We thank Dr Tirza Doniger for fruitful discussions and Dr Diana Ideses for critical reading of the manuscript. We are indebted to Yehuda Bar Lev for invaluable help and support in setting up the website.
Contributor Information
Orit Adato, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel.
Anna Sloutskin, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel.
Hodaya Komemi, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel.
Ian Brabb, School of Molecular Biosciences, College of Veterinary Medicine, Washington State University, Pullman, WA 99164, United States.
Sascha Duttke, School of Molecular Biosciences, College of Veterinary Medicine, Washington State University, Pullman, WA 99164, United States.
Philipp Bucher, Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland.
Ron Unger, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel.
Tamar Juven-Gershon, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel.
Supplementary data
Supplementary data are available at Bioinformatics online.
Conflict of interest
None declared.
Funding
The work was partially supported by the National Institutes of Health [NIGMS R00-GM135515] to S.D.
References
- Andersson R, Sandelin A, Danko CG. A unified architecture of transcriptional regulatory elements. Trends Genet 2015;31:426–33.
- Batut PJ, Gingeras TR. Conserved noncoding transcription and core promoter regulatory code in early Drosophila development. Elife 2017;6:e29005.
- Burke TW, Kadonaga JT. The downstream core promoter element, DPE, is conserved from Drosophila to humans and is recognized by TAFII60 of Drosophila. Genes Dev 1997;11:3020–31.
- Carninci P, Sandelin A, Lenhard B et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 2006;38:626–35.
- Castro-Mondragon JA, Riudavets-Puig R, Rauluseviciute I et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2022;50:D165–73.
- Core LJ, Martins AL, Danko CG et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat Genet 2014;46:1311–20.
- Danino YM, Even D, Ideses D et al. The core promoter: at the heart of gene expression. Biochim Biophys Acta 2015;1849:1116–31.
- Delos Santos NP, Duttke S, Heinz S et al. MEPP: more transparent motif enrichment by profiling positional correlations. NAR Genom Bioinform 2022;4:lqac075.
- Dreos R, Ambrosini G, Périer RC et al. The Eukaryotic Promoter Database: expansion of EPDnew and new promoter analysis tools. Nucleic Acids Res 2015;43:D92–6.
- Duttke SH, Chang MW, Heinz S et al. Identification and dynamic quantification of regulatory elements using total RNA. Genome Res 2019;29:1836–46.
- Gershenzon NI, Ioshikhes IP. Synergy of human Pol II core promoter elements revealed by statistical sequence analysis. Bioinformatics 2005;21:1295–300.
- Haberle V, Lenhard B. Promoter architectures and developmental gene regulation. Semin Cell Dev Biol 2016;57:11–23.
- Heintzman ND, Ren B. The gateway to transcription: identifying, characterizing and understanding promoters in the eukaryotic genome. Cell Mol Life Sci 2007;64:386–400.
- Hendrix DA, Hong J-W, Zeitlinger J et al. Promoter elements associated with RNA Pol II stalling in the Drosophila embryo. Proc Natl Acad Sci U S A 2008;105:7762–7.
- Juven-Gershon T, Cheng S, Kadonaga JT. Rational design of a super core promoter that enhances gene expression. Nat Methods 2006;3:917–22.
- Juven-Gershon T, Hsu J-Y, Theisen JW et al. The RNA polymerase II core promoter: the gateway to transcription. Curr Opin Cell Biol 2008;20:253–9.
- Kutach AK, Kadonaga JT. The downstream promoter element DPE appears to be as widely used as the TATA box in Drosophila core promoters. Mol Cell Biol 2000;20:4754–64.
- Lenhard B, Sandelin A, Carninci P. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat Rev Genet 2012;13:233–45.
- Li J, Gilmour DS. Distinct mechanisms of transcriptional pausing orchestrated by GAGA factor and M1BP, a novel transcription factor. EMBO J 2013;32:1829–41.
- Lu D, Sin H-S, Lu C et al. Developmental regulation of cell type-specific transcription by novel promoter-proximal sequence elements. Genes Dev 2020;34:663–77.
- Meylan P, Dreos R, Ambrosini G et al. EPD in 2020: enhanced data visualization and extension to ncRNA promoters. Nucleic Acids Res 2020;48:D65–9.
- Ohler U, Liao G-C, Niemann H et al. Computational analysis of core promoters in the Drosophila genome. Genome Biol 2002;3:RESEARCH0087.
- Ohler U, Wassarman DA. Promoting developmental transcription. Development 2010;137:15–26.
- Ponjavic J, Lenhard B, Kai C et al. Transcriptional and structural impact of TATA-initiation site spacing in mammalian core promoters. Genome Biol 2006;7:R78.
- Sloutskin A, Danino YM, Orenstein Y et al. ElemeNT: a computational tool for detecting core promoter elements. Transcription 2015;6:41–50.
- Sloutskin A, Shir-Shapira H, Freiman RN et al. The core promoter is a regulatory hub for developmental gene expression. Front Cell Dev Biol 2021;9:666508.
- Spitz F, Furlong EE. Transcription factors: from enhancer binding to developmental control. Nat Rev Genet 2012;13:613–26.
- Thomas MC, Chiang CM. The general transcription machinery and general cofactors. Crit Rev Biochem Mol Biol 2006;41:105–78.
- Vo Ngoc L, Cassidy CJ, Huang CY et al. The human initiator is a distinct and abundant element that is precisely positioned in focused core promoters. Genes Dev 2017;31:6–11.
- Vo Ngoc L, Kassavetis GA, Kadonaga JT. The RNA polymerase II core promoter in Drosophila. Genetics 2019;212:13–24.
- Wang Y-L, Duttke SHC, Chen K et al. TRF2, but not TBP, mediates the transcription of ribosomal protein genes. Genes Dev 2014;28:1550–5.
- Weirauch MT, Yang A, Albu M et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 2014;158:1431–43.
- Yao L, Liang J, Ozer A et al. A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat Biotechnol 2022;40:1056–65.

