Genes Dev. 2023 May 1;37(9-10):377–382. doi: 10.1101/gad.350572.123

Analysis of the Drosophila and human DPR elements reveals a distinct human variant whose specificity can be enhanced by machine learning

Long Vo ngoc, Torrey E Rhyne, James T Kadonaga
PMCID: PMC10270198  PMID: 37163335

In this study, Vo ngoc et al. used machine learning to compare the downstream core promoter region (DPR), a DNA sequence element within the core promoter that is involved in transcription initiation, in humans and Drosophila. They identify synthetic DPR variants with specificity for human or Drosophila transcription factors and discuss the implications of their synthetic variant modeling strategy for the functional annotation of DNA sequence elements.

Keywords: transcription, RNA polymerase II, core promoter, gene expression, Drosophila

Abstract

The RNA polymerase II core promoter is the site of convergence of the signals that lead to the initiation of transcription. Here, we performed a comparative analysis of the downstream core promoter region (DPR) in Drosophila and humans by using machine learning. These studies revealed a distinct human-specific version of the DPR and led to the use of machine learning models for the identification of synthetic extreme DPR motifs with specificity for human transcription factors relative to Drosophila factors and vice versa. More generally, machine learning models could similarly be used to design synthetic DNA elements with customized functional properties.


The initiation of transcription by RNA polymerase II is an important step in the expression of genes (for example, see Haberle and Stark 2018; Cramer 2019; Roeder 2019; Vo ngoc et al. 2019; Schier and Taatjes 2020; Zeitlinger 2020; Sloutskin et al. 2021; Galouzis and Furlong 2022). Transcription initiates at the core promoter, which comprises the stretch of DNA from about −40 to +40 relative to the +1 transcription start site (TSS). The core promoter is often referred to as the “gateway to transcription,” as it is the site of convergence of the signals that direct the initiation of transcription. Core promoter activity is driven by DNA sequence elements such as the TATA box, the initiator (Inr), and the downstream core promoter region (DPR), which spans +17 to +35 relative to the +1 TSS and is the revised name for the combined motif ten element (MTE) (Lim et al. 2004) and downstream core promoter element (DPE) (Burke and Kadonaga 1996) (Vo ngoc et al. 2020). There are no universal core promoter elements. Moreover, individual core promoter motifs are involved in transcriptional regulatory functions such as enhancer–core promoter specificity (for example, see Ohtsuki et al. 1998; Butler and Kadonaga 2001; Zabidi et al. 2015; Galouzis and Furlong 2022).

A key challenge in the study of the core promoter has been to predict the presence and activity of specific core promoter elements. In the past, the presence of core promoter motifs has generally been assessed by the similarity of DNA sequences to consensus sequences, such as TATAAA for the TATA box. This method can lead to incorrect motif assignments and does not provide a quantitative prediction of the transcription strength of a putative element.
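To make the limitation of consensus-based calling concrete, the following minimal Python sketch (a hypothetical helper, not a published tool) scans a sequence for TATAAA with an optional mismatch allowance. It returns only match positions, that is, a yes/no call, and says nothing about how strongly a sequence would drive transcription; a single mismatch is enough to flip the call.

```python
def consensus_hits(seq, consensus, max_mismatches=0):
    """Return 0-based start positions where seq matches consensus
    with at most max_mismatches substitutions."""
    seq, consensus = seq.upper(), consensus.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(1 for a, b in zip(seq[i:i + k], consensus) if a != b)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# Hypothetical sequences: an exact TATAAA, and a TATATA that misses by one base.
print(consensus_hits("GCGTATAAAAGGC", "TATAAA"))     # [3]
print(consensus_hits("GCGTATATAAGGC", "TATAAA"))     # []
print(consensus_hits("GCGTATATAAGGC", "TATAAA", 1))  # [3, 5]
```

Relaxing the mismatch allowance recovers the second sequence but also admits a shifted hit at position 5, which illustrates how consensus rules can misassign motif positions.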

To address this problem, we used a two-step approach in which we first determined the transcription strengths of each of hundreds of thousands of DNA sequence variants of a core promoter motif (by high-throughput analysis of randomized promoter elements [HARPE]) and then used the resulting data to create a support vector regression (SVR; a form of machine learning and artificial intelligence) model of the element (Vo ngoc et al. 2020). The SVR model for a core promoter element provides an objective, data-based, quantitative prediction of the transcription strength of any test DNA sequence. Thus far, we have generated SVR models for the human TATA box and the human DPR (Vo ngoc et al. 2020).
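The second step of this approach, learning a mapping from sequence to measured strength, can be illustrated with a deliberately simplified sketch. Assumptions: sequences are one-hot encoded; the model is a linear ε-insensitive SVR fitted by stochastic subgradient descent in pure Python; and the training data are a hypothetical toy set in which strength depends only on a G at +29 (index 12 of the +17 to +35 window). This is not the published SVR model or its training procedure, which were based on the actual HARPE measurements and the authors' own settings.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as a flat 4 x len(seq) binary vector (A, C, G, T per position)."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svr(X, y, epsilon=0.1, C=1.0, lr=0.01, epochs=100, seed=0):
    """Linear epsilon-insensitive SVR fitted by stochastic subgradient descent.

    Minimizes 0.5*||w||^2 + C * sum_i max(0, |y_i - (w.x_i + b)| - epsilon).
    """
    rng = random.Random(seed)
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            residual = y[i] - (dot(w, X[i]) + b)
            # Subgradient of the loss w.r.t. the prediction: zero inside the epsilon tube.
            g = -C if residual > epsilon else (C if residual < -epsilon else 0.0)
            # The 1/n factor spreads the L2 penalty over the n per-sample steps.
            w = [wj - lr * (wj / n + g * xj) for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def predict(w, b, seq):
    return dot(w, one_hot(seq)) + b


# Hypothetical toy "HARPE-like" data: random 19-nt DPR variants whose strength
# is 3.0 when index 12 (+29) is G and 0.0 otherwise.
rng = random.Random(1)
seqs = ["".join(rng.choice(BASES) for _ in range(19)) for _ in range(200)]
strengths = [3.0 if s[12] == "G" else 0.0 for s in seqs]
w, b = train_linear_svr([one_hot(s) for s in seqs], strengths)

probe = "A" * 19
with_g = probe[:12] + "G" + probe[13:]
print(round(predict(w, b, with_g) - predict(w, b, probe), 2))
```

Because the two probe sequences differ only at +29, the printed difference reflects the learned weight contrast at that position and should be close to the toy effect size of 3.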

In this study, we carried out a comparative analysis of the DPR motif in humans and in Drosophila. To accomplish this objective, we generated high-throughput HARPE data and a machine learning model for the Drosophila DPR. We felt that this work would yield new insights into the DPR from the perspectives of a protostome and a deuterostome, which have an estimated species divergence time of ∼700 million years (Kumar et al. 2022). In addition, the DPE has been found to occur frequently in Drosophila promoters (for example, see Kutach and Kadonaga 2000; Ohler et al. 2002; FitzGerald et al. 2006; Chen et al. 2014); hence, the analysis of the Drosophila DPR would provide useful information for this important model organism. We thus embarked on the HARPE and SVR analyses of the Drosophila DPR and compared its properties with those of its human counterpart. These studies unexpectedly revealed a human-specific DPR variant, which further led to the use of SVR models for the design of synthetic DPR motifs with customized functional properties.

Results and Discussion

The SVRd model provides a reliable prediction of DPR activity in Drosophila

To generate an SVR model for the Drosophila DPR, we carried out HARPE (Fig. 1A) and SVR analyses by the method of Vo ngoc et al. (2020). To measure basal transcription activity from the core promoter, we used the Drosophila embryo extract system developed by Soeller et al. (1988), which has been found to mediate accurate initiation of transcription in vitro that is essentially identical to that seen in vivo in embryos for a wide range of promoters (for example, see Biggin and Tjian 1988; Perkins et al. 1988; Kerrigan et al. 1991; Kutach and Kadonaga 2000; Lim et al. 2004). The HARPE data from two independent replicates were found to be reproducible (PCC = 0.97) (Supplemental Fig. S1). Only a small fraction of the randomized sequences exhibited DPR activity (Fig. 1B), and HOMER analysis (Heinz et al. 2010) of the top 0.1% most active sequences yielded a motif with a strong resemblance to the Drosophila DPE consensus (RGWYS from +28 to +32 relative to the +1 TSS) (Fig. 1C; Kutach and Kadonaga 2000).
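Selecting the most active variants and summarizing them position by position can be sketched as below. HOMER performs de novo motif discovery relative to background sequences; the position frequency matrix here is a much simpler, swapped-in summary, and the variant list is hypothetical.

```python
from collections import Counter


def top_fraction(variants, fraction):
    """Sequences of the most active (sequence, strength) variants; keeps at least one."""
    ranked = sorted(variants, key=lambda v: v[1], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return [seq for seq, _ in ranked[:k]]


def position_frequency_matrix(seqs):
    """Per-position base frequencies for equal-length sequences."""
    pfm = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        pfm.append({b: counts[b] / len(seqs) for b in "ACGT"})
    return pfm


def consensus(pfm):
    """Most frequent base at each position (ties resolved in A, C, G, T order)."""
    return "".join(max(col, key=col.get) for col in pfm)


# Hypothetical (sequence, strength) pairs for a 5-nt window.
variants = [
    ("AGTCG", 9.0), ("GGACC", 8.5), ("AGTCC", 8.0),
    ("TTTTT", 0.3), ("CACAC", 0.2), ("ATATA", 0.1),
]
top = top_fraction(variants, 0.5)
print(top)                                        # ['AGTCG', 'GGACC', 'AGTCC']
print(consensus(position_frequency_matrix(top)))  # AGTCC
```

For the 437,002 variants in the HARPE data set, a fraction of 0.001 corresponds to the top 437 sequences (int(437002 * 0.001)).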

Figure 1.


HARPE and SVR analyses of the DPR in Drosophila melanogaster. (A) Use of the HARPE method for the analysis of the Drosophila DPR. The TATA-less SCP1m promoter backbone, which contains two GC boxes upstream of the mutated TATA box, is identical to that used in the analysis of the human DPR (Vo ngoc et al. 2020). The DPR is randomized from +17 to +35 relative to the +1 TSS. (B) Most DPR sequence variants exhibit low transcriptional activity, but a small fraction of the variants is highly active. The graph depicts the transcription strength of each of the 437,002 DPR variants, which are ranked along the X-axis in order of decreasing activity. The data are the average values from two independent biological replicates. (C) HOMER analysis of the 0.1% most active DPR variants reveals a distinct motif that contains a DPE-like sequence. Shown is the web logo of the top HOMER motif obtained from the data in B. The DPE-like sequence (RGWYS) is from +28 to +32. (D) The SVRd model of the Drosophila DPR accurately predicts the transcription strengths of DPR sequence variants. The SVRd machine learning model was generated by training with 200,000 variants in the HARPE data set and provides a numerical prediction for the activity of any potential DPR sequence. SVRd was then tested with 7115 independent sequences (i.e., not used in the training of SVRd) from the HARPE data set. For each of the independent test sequences, the predicted SVRd score was compared with the observed transcription strength. The value of the SVRd score is not identical to the transcription strength. (PCC) Pearson's correlation coefficient, (ρ) Spearman's rank correlation coefficient. (E) Approximately two-thirds of natural Drosophila promoters are predicted to contain an active DPR. The graph shows the cumulative frequencies of SVRd scores of sequences in the DPR (+17 to +35) in 4489 natural Drosophila promoters that are active in Drosophila embryos. 
This analysis revealed that ∼68% of Drosophila promoters in embryos are predicted to have an active DPR (SVRd score ≥ 1.5) (Supplemental Figs. S3, S4). In contrast, only ∼19% of 500,000 random 19-nt sequences with the same overall G/C content (51.3%) as Drosophila core promoters are predicted to have DPR activity. To examine how the percentage of DPR usage varies with the degree of TSS focus, we determined the percentage of DPR usage in promoters for which the minimum focus index (FImin) (Vo ngoc et al. 2017) was varied from 0.65 to 0.85 and observed a range of 61%–71% DPR usage in embryos.

We then used the HARPE data (200,000 sequence variants, each with an experimentally determined transcription strength) to train and optimize an SVR model, which we refer to as SVRd, for SVR model of the Drosophila DPR (Supplemental Fig. S2). There is a strong correlation (PCC = 0.89, ρ = 0.91) between the observed transcription strengths of 7115 independent (i.e., not used in the training of SVRd) test DPR sequences and their predicted DPR activities (Fig. 1D). Thus, SVRd provides a strong prediction of DPR activity in Drosophila.
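The two agreement statistics reported for the held-out test set can be computed as follows. This stdlib-only sketch assumes paired lists of observed strengths and predicted scores; Spearman's ρ is computed as Pearson's r on ranks, with tied values given their average rank. The example values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient for paired lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


observed = [0.5, 1.2, 3.4, 8.0, 2.2]   # hypothetical measured strengths
predicted = [0.7, 1.0, 2.9, 5.5, 2.5]  # hypothetical model scores
print(round(pearson(observed, predicted), 3), round(spearman(observed, predicted), 3))
```

Because ρ depends only on ranks, it is unaffected by the fact that the SVRd score is on a different scale from the transcription strength.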

To characterize the effectiveness of SVRd, we carried out a performance assessment and found that an SVRd score of 1.5 is the best threshold for distinguishing active (SVRd ≥ 1.5) from inactive (SVRd < 1.5) DPR elements (Supplemental Fig. S3). Moreover, DPR sequences with SVRd scores ≥1.5 exhibit at least 11-fold higher activity than the median inactive sequence (Supplemental Fig. S4). We therefore consider DPR sequences with SVRd scores ≥1.5 to be active. Strikingly, by this measure, ∼68% of focused natural Drosophila promoters in embryos contain active DPR motifs, whereas only ∼19% of random sequences with the same G/C content are predicted to function as active DPR elements (Fig. 1E). Similarly, we observed that ∼68% of focused promoters in Drosophila S2 cells are predicted to have an active DPR (Supplemental Fig. S5A). The DPR thus appears to be a widely used core promoter element in Drosophila.
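Calling DPRs active at the SVRd ≥ 1.5 threshold, and building a G/C-matched random background like the one used for comparison, might look like the sketch below. Assumptions: the scores would come from a trained SVRd model, which is not included here; the background generator draws each base independently with P(G or C) = 0.513; and the function names are hypothetical.

```python
import random

THRESHOLD = 1.5  # SVRd cutoff for calling a DPR active, as stated in the text


def fraction_active(scores, threshold=THRESHOLD):
    """Fraction of scores at or above the activity threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


def random_gc_matched(n, length=19, gc=0.513, seed=0):
    """Generate n random sequences whose expected G+C fraction is `gc`."""
    rng = random.Random(seed)

    def draw_base():
        return rng.choice("GC") if rng.random() < gc else rng.choice("AT")

    return ["".join(draw_base() for _ in range(length)) for _ in range(n)]


def gc_fraction(seqs):
    """Observed G+C fraction pooled over all sequences."""
    total = sum(len(s) for s in seqs)
    return sum(s.count("G") + s.count("C") for s in seqs) / total


background = random_gc_matched(2000)
print(round(gc_fraction(background), 3))
print(fraction_active([0.2, 1.5, 3.0, 1.4]))  # 0.5
```

In practice, each background sequence would be scored with the trained model and passed to fraction_active, giving the background rate against which the ∼68% usage in natural promoters is compared.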

We also compared the performance of SVRd with that of the DPE consensus sequence. We found that a perfect match to the DPE consensus is a poor predictor of DPR activity, whereas the SVRd model provides excellent assessments of the activities of the same sequences (Supplemental Fig. S6). Hence, the SVRd model is superior to the DPE consensus sequence for the prediction of DPR activity in Drosophila. Importantly, the SVRd model predicts the quantitative strength of each test DPR sequence.
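For contrast with SVRd, a perfect-match test for the DPE consensus is sketched here using IUPAC codes (R = A/G, W = A/T, Y = C/T, S = C/G). It assumes a 19-nt DPR string covering +17 to +35, so +28 falls at index 11; the IUPAC table is a partial subset and the helper names are hypothetical. Like any consensus test, it returns True or False, with no quantitative estimate of strength.

```python
# Partial IUPAC table: only the codes needed here plus N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "N": "ACGT",
}
DPR_START = 17  # TSS-relative position of the first base in the 19-nt DPR window


def matches_iupac(kmer, pattern):
    """True if every base in kmer is allowed by the IUPAC code at the same position."""
    return len(kmer) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(kmer.upper(), pattern)
    )


def has_dpe_consensus(dpr, start=28, pattern="RGWYS"):
    """Perfect-match test for the DPE consensus beginning at TSS-relative `start`."""
    i = start - DPR_START
    return matches_iupac(dpr[i:i + len(pattern)], pattern)


dpr = "C" * 11 + "AGTCG" + "CCC"  # hypothetical 19-nt DPR with RGWYS at +28 to +32
print(has_dpe_consensus(dpr))            # True
print(has_dpe_consensus(dpr, start=27))  # False
```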

A human-specific DPR variant is used in humans but not in Drosophila

With the Drosophila HARPE data and the SVRd model, we compared the properties of the Drosophila DPR with those of the human DPR (Vo ngoc et al. 2020). (Note that in this study, the SVRb model in Vo ngoc et al. [2020] is termed SVRh, for SVR model of the human DPR.) The direct comparison of the observed transcription strengths (as assessed by HARPE) of DPR sequence variants in Drosophila versus humans revealed many general variants that are active in both organisms as well as a distinct set of “human-specific” (i.e., specific for humans relative to Drosophila) variants that are more active in humans than in Drosophila (Fig. 2A). HOMER analysis indicated that the general variants contain the canonical DPR element with the DPE motif (RGWYS) in the standard position (from +28 to +32 relative to the +1 TSS), whereas the human-specific variants contain the DPE motif shifted 1 nt upstream (to +27 to +31 relative to the +1 TSS) (Fig. 2B).

Figure 2.


Identification of a species-specific DPR variant that is present in humans but not in Drosophila. (A) Comparison of the observed transcription strengths, as assessed in HARPE assays, of 437,002 DPR sequence variants in humans versus Drosophila. General DPR variants with high activity in both humans and Drosophila are depicted in blue. Human-specific DPR variants with high activity in humans and low activity in Drosophila are denoted in red. All other variants are shown in gray. We did not observe a distinct class of Drosophila-specific DPR variants. The dashed light-violet lines depict 5× mean activities of the DPR variants in humans (vertical line) and in Drosophila (horizontal line). The black diagonal dashed line demarcates the general variants and the human-specific variants. (B) The human-specific DPR variants are positioned 1 bp upstream of the general/canonical DPR variants. All canonical DPR variants (blue dots in A) as well as all human-specific DPR variants (red dots in A) were analyzed with HOMER. The web logo of the top HOMER motif for each of the classes of variants is shown. (C) The human-specific −1 DPR variant appears to be present in ∼25% of the 3161 predicted active human DPR elements (SVRh ≥ 2) (Vo ngoc et al. 2020) in 11,932 natural human promoters. (D) The human-specific −1 DPR variant appears to be present in ∼1% of 3070 predicted active DPR elements (SVRd ≥ 1.5, dashed light-violet line) (Supplemental Figs. S3, S4) in 4489 natural Drosophila promoters. In both C and D, the black diagonal dashed line demarcates the canonical DPR sequences (blue dots) and the human-specific −1 DPR variants (red dots).

We also examined the occurrence of human-specific variants in natural human and Drosophila promoters. In humans, ∼25% (799 out of 3161) of predicted active DPR sequences (which corresponds to ∼7% [799 out of 11,932] of all focused promoters) are the human-specific variants (Fig. 2C), whereas in Drosophila, only ∼1% (29 out of 3070) of predicted active DPR sequences (which corresponds to ∼0.6% [29 out of 4489] of all focused promoters) are similar to the human-specific variants (Fig. 2D; see also Supplemental Fig. S5).
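The percentages quoted above follow directly from the stated counts; a quick check:

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


for label, part, whole in [
    ("human: -1 variants among active DPRs", 799, 3161),
    ("human: -1 variants among focused promoters", 799, 11932),
    ("Drosophila: -1-like among active DPRs", 29, 3070),
    ("Drosophila: -1-like among focused promoters", 29, 4489),
]:
    print(f"{label}: {percent(part, whole):.1f}%")
# prints 25.3%, 6.7%, 0.9% and 0.6% on successive lines
```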

The −1 spacing is the basis of the human specificity of the DPR variants

We next sought to gain a better understanding of the human-specific DPR variants. We first tested whether these variants are important for core promoter function in humans. To this end, we analyzed five natural human-specific variants with the −1 DPR spacing and found that they are important for transcriptional activity in the context of their entire core promoter regions from −36 to +50 relative to the +1 TSS (Supplemental Fig. S7). Hence, the human-specific DPR variants are functionally important in natural promoters.

Next, we investigated whether the lack of activity of the human-specific variants in Drosophila is due to the −1 spacing of the DPR relative to the canonical position. Although the human-specific variants possess the −1 spacing of the DPR (Fig. 2B), we did not know whether the human specificity is due to this altered spacing. To address this question, we subjected five canonical and five human-specific variants to in vitro transcription analysis with either human or Drosophila factors (Fig. 3; Supplemental Fig. S7A). To enable a direct comparison of the activities of the different DPR sequences, we placed each test DPR sequence into the SCP1m promoter, which lacks a TATA box but contains an Inr element (Juven-Gershon et al. 2006). In this way, the activity of each DPR sequence was tested in the same promoter context.

Figure 3.


The −1 DPR element is active with human transcription factors but not with Drosophila transcription factors. In these experiments, five canonical DPR elements and five human-specific −1 DPR variants (Supplemental Fig. S7A) were analyzed by replacing the DPR motif in the TATA-less SCP1m reference promoter with DPR sequences from the indicated human genes. (LOC) LOC100505495 genes, (HNRNP) HNRNPA2B1 genes. The resulting promoter constructs were subjected to in vitro transcription analysis with either human or Drosophila nuclear extracts. The indicated DPR activities are relative to that of the strong DPR in the SCP1m promoter. The data are shown as the mean ± standard deviation with n = 3 biologically independent samples. Autoradiograms of representative experiments are shown in Supplemental Figure S8, A and B, and the quantitated results from each experiment are given in Supplemental Table S1. It is also relevant to note that all of the test DPR sequences have been found to be active in their natural promoter contexts by in vitro transcription analysis (Supplemental Fig. S7B; Vo ngoc et al. 2020). (A) Both the canonical and human-specific −1 DPR elements have strong transcription activity with human transcription factors. (B) Five different human-specific −1 DPR elements exhibit little or no activity with Drosophila transcription factors. (C) The translocation of −1 DPR variant sequences to the canonical DPR position has little or no effect upon their activity with human transcription factors. The observed and predicted (with SVRh) fold changes in activity (downstream-shifted DPR relative to wild-type −1 DPR) are indicated. (D) The translocation of −1 DPR variant sequences to the canonical DPR position results in a substantial increase in activity with Drosophila transcription factors. The observed and predicted (with SVRd) fold changes in activity (downstream-shifted DPR relative to wild-type −1 DPR) are indicated.

First, we observed that the human transcription factors are able to function with both the canonical and the human-specific DPR variants, whereas the Drosophila factors function with the canonical DPR elements but not with the human-specific −1 variants (Fig. 3A,B; Supplemental Fig. S8A). We then tested whether the Drosophila factors could function with the human-specific variants if they were shifted 1 nt downstream to the canonical position. These experiments revealed that the Drosophila factors could indeed function with the human-specific DPR elements after the 1-nt downstream shift (Fig. 3C,D; Supplemental Fig. S8B). Thus, the inability of the Drosophila factors to transcribe the human-specific −1 DPR variants can be rescued by shifting the element 1 nt downstream to the canonical position. These transcriptional effects were also predicted reasonably well by the SVRh and SVRd models (Fig. 3C,D).

It remained possible, however, that the Drosophila factors might be able to transcribe some −1 DPR variants. To address this point, we analyzed our HARPE data set of 437,002 DPR sequence variants (Fig. 1A,B) and found that those with the canonical +28 positioning of the RGWYS DPE motif (Fig. 1C) possess much higher transcription strengths than variants with the −1 positioning of this motif at +27 (Supplemental Fig. S9A,B). Furthermore, in the analysis of natural Drosophila promoters, the RGWYS motif is much more commonly found at the +28 position than at the +27 position (Supplemental Figs. S5C, S9C). We therefore conclude that the DPR in the noncanonical −1 position is an active core promoter element in humans but not in Drosophila.
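A comparison of this kind, grouping variants by whether RGWYS starts at +28 or at +27 and averaging their measured strengths, can be sketched as follows. The regular expression encodes RGWYS inside a lookahead so that overlapping matches are not missed; the three variants and their strengths are hypothetical, and the actual analysis is the one shown in Supplemental Fig. S9.

```python
import re
from statistics import mean

DPR_START = 17  # TSS-relative position of index 0 in the 19-nt DPR window
# RGWYS in regex form, wrapped in a lookahead so overlapping matches are all reported.
RGWYS = re.compile(r"(?=[AG]G[AT][CT][CG])")


def rgwys_starts(dpr):
    """Set of TSS-relative start positions of RGWYS matches in a DPR window."""
    return {DPR_START + m.start() for m in RGWYS.finditer(dpr.upper())}


def mean_strength_by_start(variants, starts=(27, 28)):
    """Average measured strength of variants carrying RGWYS at each given start."""
    groups = {s: [] for s in starts}
    for seq, strength in variants:
        found = rgwys_starts(seq)
        for s in starts:
            if s in found:
                groups[s].append(strength)
    return {s: mean(vals) for s, vals in groups.items() if vals}


variants = [
    ("C" * 11 + "AGTCG" + "CCC", 10.0),  # RGWYS at +28 (canonical)
    ("C" * 11 + "GGACC" + "CCC", 8.0),   # RGWYS at +28 (canonical)
    ("C" * 10 + "AGTCG" + "CCCC", 1.0),  # RGWYS at +27 (-1 position)
]
print(mean_strength_by_start(variants))  # {27: 1.0, 28: 9.0}
```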

Use of SVR models to identify synthetic extreme DNA sequence elements

The identification of canonical and human-specific variants of the DPR led us to explore new applications for the SVR models. We were first interested in using the machine learning models to enhance the specificity of the −1 DPR variants for human relative to Drosophila factors. To this end, we substantially expanded the range of DPR sequence candidates (∼100-fold relative to the HARPE library, as in Fig. 1A) by generating 50 million random 19-nt sequences and determining their predicted DPR scores with SVRd and SVRh (Supplemental Fig. S10). In this analysis, the top 0.001% of variants (500 sequences) with the highest SVRh:SVRd score ratios yielded a distinct HOMER motif in which the 1-nt upstream shift of the −1 DPR can be clearly seen (Fig. 4A).
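The screening logic can be sketched as a streaming filter-and-rank step: generate random 19-mers, keep those whose human score passes a minimum (the Fig. 4A legend uses SVRh ≥ 5), and retain the k sequences with the highest SVRh:SVRd ratio in a bounded heap rather than storing all 50 million scores. The two scoring functions are hypothetical toy stand-ins, not SVRh or SVRd: the "human" one rewards a G at +28 or +29 and the "Drosophila" one only at +29. The floor on the denominator is our assumption for handling near-zero or negative scores.

```python
import heapq
import random


def random_dprs(n, length=19, seed=0):
    """Yield n random DNA sequences one at a time (uniform base composition)."""
    rng = random.Random(seed)
    for _ in range(n):
        yield "".join(rng.choice("ACGT") for _ in range(length))


def top_by_ratio(seqs, score_num, score_den, min_num, k, floor=0.1):
    """Return the k (ratio, seq) pairs with the highest score_num/score_den ratio.

    Only sequences with score_num >= min_num are considered; score_den is clipped
    to `floor` (an assumption) so the ratio stays finite and positive.
    """
    heap = []  # min-heap holding the current best k candidates
    for seq in seqs:
        num = score_num(seq)
        if num < min_num:
            continue
        ratio = num / max(score_den(seq), floor)
        if len(heap) < k:
            heapq.heappush(heap, (ratio, seq))
        elif ratio > heap[0][0]:
            heapq.heapreplace(heap, (ratio, seq))
    return sorted(heap, reverse=True)


# Hypothetical toy scorers; index 11 is +28 and index 12 is +29 in the +17 to +35 window.
def toy_human_score(seq):
    return 1.0 + 5.0 * (seq[11] == "G" or seq[12] == "G")


def toy_drosophila_score(seq):
    return 1.0 + 5.0 * (seq[12] == "G")


hits = top_by_ratio(random_dprs(20000), toy_human_score, toy_drosophila_score,
                    min_num=5.0, k=10)
for ratio, seq in hits[:3]:
    print(round(ratio, 1), seq)
```

Every retained toy hit carries a G at +28 but not at +29, the toy analogue of the 1-nt upstream shift. With 50 million real candidates and trained models, the same heap keeps memory bounded at k entries (k = 500 for 0.001%).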

Figure 4.


Use of machine learning to generate DPR variants with specificity for transcription with human factors relative to Drosophila factors and vice versa. (A) SVR analysis of 50 million random 19-nt sequences with SVRh and SVRd reveals a distinct preferred motif for the human-specific −1 DPR variant. The top panel is the HOMER web logo for the human-specific −1 DPR variant, as assessed with the top 500 sequences with SVRh ≥ 5 that have the highest SVRh:SVRd score ratios. (B) Identification of DPR variants that are predicted to be specific for transcription in humans relative to Drosophila and vice versa. Four DPR sequences, termed E1 to E4, have high SVRh:SVRd score ratios, whereas four other sequences, termed E5 to E8, have high SVRd:SVRh score ratios. The specific sequences and their predicted SVRd and SVRh scores are given in Supplemental Table S2. The position of the DPR in SCP1m is also indicated. (C) The synthetic E1 to E4 DPR sequences exhibit specificity for transcription with human factors relative to Drosophila factors, whereas the E5 to E8 DPR motifs have a distinct preference for Drosophila factors relative to human factors. The synthetic DPR sequences in B were analyzed in the SCP1m promoter backbone, and their transcriptional activities with human factors and with Drosophila factors were compared with that of the reference DPR in SCP1m (Juven-Gershon et al. 2006). Autoradiograms of representative experiments are shown in Supplemental Figure S11. The quantitated results from each experiment are given in Supplemental Table S1.

Next, to test the transcriptional properties of the predicted extreme DPR elements, we selected four synthetic sequences, which we termed E1 to E4 (Fig. 4B; Supplemental Fig. S11A), with high SVRh:SVRd ratios and found that they are highly active with human factors and possess little or no detectable activity with Drosophila factors (Fig. 4C; Supplemental Fig. S11B). The E2, E3, and E4 variants are particularly human-specific, with human:Drosophila transcription ratios (with respect to the SCP1m DPR reference) of at least 75. Hence, these findings suggest that machine learning analysis of a wide range of sequence variants can be used to identify DNA elements with specific functional properties.
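The human:Drosophila ratio described here compares two normalized activities, each expressed relative to the SCP1m DPR reference measured with the same set of factors. A minimal sketch with hypothetical signal values (the function name is ours):

```python
def species_ratio(test_a, ref_a, test_b, ref_b):
    """Ratio of reference-normalized activities with factor set A versus factor set B.

    Each test signal is divided by the signal of the reference promoter
    (assumed here to be SCP1m with its own DPR) measured with the same factors.
    """
    relative_a = test_a / ref_a
    relative_b = test_b / ref_b
    return relative_a / relative_b


# Hypothetical quantitated signals (arbitrary units).
human_vs_fly = species_ratio(test_a=1800.0, ref_a=2000.0, test_b=30.0, ref_b=3000.0)
fly_vs_human = species_ratio(test_a=30.0, ref_a=3000.0, test_b=1800.0, ref_b=2000.0)
print(human_vs_fly, fly_vs_human)
```

The Drosophila:human ratio used for the E5 to E8 variants is simply the reciprocal, obtained by swapping the two factor sets.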

We next examined whether it is possible to perform the reciprocal experiment, that is, to use the SVR models to identify Drosophila-specific DPR sequences. To this end, we identified four synthetic DPR sequences, termed E5 to E8 (Fig. 4B; Supplemental Fig. S11A), with high SVRd:SVRh ratios and found that they exhibit stronger activity with Drosophila factors than with human factors (Fig. 4C; Supplemental Fig. S11C). The E8 variant exhibited ∼17-fold higher transcription activity with Drosophila factors relative to human factors (with respect to the SCP1m DPR reference). Thus, although there is not a distinct Drosophila-specific class of DPR elements, it is possible to identify synthetic DPR sequences that have stronger transcription activity with Drosophila relative to human factors, as assessed under the same conditions that were used to generate the data for the machine learning models. These results further support the conclusion that machine learning models can be used to identify synthetic DNA sequence motifs with customized properties.

Summary and perspectives

In this study, we performed a comparative analysis of the DPR core promoter motif in Drosophila and humans. This work led to the identification of a human-specific DPR variant in which the DPR is located 1 nt upstream of the canonical DPR (Figs. 2–4A). The human-specific −1 DPR appears to be used in ∼25% of DPR elements in natural human promoters (Fig. 2C). Strikingly, the DPR is predicted to be present in about two-thirds of natural Drosophila promoters in embryos and in cells (Fig. 1E; Supplemental Fig. S5A). Moreover, even though key promoter characteristics such as CpG islands are present in humans but not in Drosophila (Deaton and Bird 2011), the predicted optimal canonical DPR has remained mostly unchanged over the estimated 700 million years of divergence between Drosophila and humans (Supplemental Fig. S12). In this manner, the analysis of the DPR in both Drosophila and humans has led to new insights that would not have been obtained by studying the DPR in either organism alone. The altered spacing in the human-specific DPR reveals differences in the transcription machinery in humans relative to Drosophila. It remains to be determined, however, whether the expanded range of function of the human transcription factors is used to achieve new modes of regulation.

The existence of the human-specific DPR motif inspired us to explore the use of the SVR models to predict extreme versions of the DPR that exhibit specificity for transcription with human factors relative to Drosophila factors and vice versa. In this work, we used the SVR models to expand the scope of the analysis to 50 million DPR sequence variants (Fig. 4; Supplemental Figs. S10–S12). Among these 50 million variants, the extreme human- or Drosophila-specific DPR motifs predicted by the SVR models proved to be excellent candidates, as assessed by transcriptional analyses (Fig. 4C; Supplemental Fig. S11). The SVR model predictions were good but not quantitatively perfect, possibly because of the extreme or fringe nature of the candidates, but they did yield DPR motifs that were much more active with human transcription factors relative to Drosophila factors and vice versa. It is also expected that the accuracy of the machine learning models will continue to improve in the future.

Thus, these experiments demonstrate the use of machine learning models for the identification of DNA sequence motifs with custom-tailored functions. For example, SVR models could be made for a promoter element that stimulates transcription in condition 1 (SVR1) as well as in condition 2 (SVR2). Then, by using an approach analogous to the one in this work, DNA sequence motifs that activate transcription in condition 1 but not in condition 2, and vice versa, could be identified. Hence, the use of machine learning models for the study of DNA sequence elements can extend beyond the analysis of natural DNA sequence elements to the prediction and identification of synthetic sequence variants with specifically desired properties.

Materials and methods

HARPE method

The HARPE plasmid libraries for the DPR were described in Vo ngoc et al. (2020). Sample and data processing was performed as in Vo ngoc et al. (2020), with the exception that the in vitro transcription reactions were carried out with Drosophila nuclear extracts for 30 min by the method of Wampler et al. (1990). Additional information is provided in the Supplemental Material. Sequencing of the PCR amplicons was performed on an Illumina Novaseq 6000 at the Institute for Genomic Medicine Genomics Center, University of California, San Diego (Moores Cancer Center, supported by National Institutes of Health [NIH] grant P30 CA023100 and NIH Shared Instrument Grant S10 OD026929). The genome-wide data have been deposited at the Gene Expression Omnibus (GEO; accession no. GSE225570) and will be released upon acceptance of the study for publication.

Transcription of individual test sequences

The plasmids that were used for testing individual DPR sequences were constructed with the Q5 site-directed mutagenesis kit (New England Biolabs) as recommended by the manufacturer. Transcription reactions were performed as described in Vo ngoc et al. (2020) for human factors and as described in Wampler et al. (1990) for Drosophila factors. The transcripts were subjected to primer extension analysis, and the reverse transcription products were resolved by 6% polyacrylamide/8 M urea gel electrophoresis and quantified by using a Typhoon imager (GE Health Sciences) and Amersham Typhoon control software v1.1. Quantification of radiolabeled samples was performed with ImageJ version 2.1.0. All experiments with individual promoter constructs were performed independently at least three times to ensure reproducibility of the data. The quantitated data from the transcription reactions are in Supplemental Table S1. The sequences of the core promoters and DPR elements used in this study are given in Supplemental Table S2.

Supplementary Material

Supplemental Material

Acknowledgments

We thank George Kassavetis, Grisel Cruz-Becerra, and Jack Cassidy for critical reading of the manuscript. J.T.K. is the Amylin Chair in the Life Sciences. This work was supported by National Institutes of Health grant R35 GM118060 to J.T.K.

Author contributions: L.V.n. and J.T.K. initially conceived the project and oversaw the overall execution of this work. L.V.n. and T.E.R. performed the laboratory experiments as well as the computational analyses. L.V.n., T.E.R., and J.T.K. prepared the figures and wrote the manuscript.

Footnotes

Supplemental material is available for this article.

Article published online ahead of print. Article and publication date are online at http://www.genesdev.org/cgi/doi/10.1101/gad.350572.123.

Competing interest statement

The authors declare no competing interests.

References

  1. Biggin MD, Tjian R. 1988. Transcription factors that activate the Ultrabithorax promoter in developmentally staged extracts. Cell 53: 699–711. 10.1016/0092-8674(88)90088-8
  2. Burke TW, Kadonaga JT. 1996. Drosophila TFIID binds to a conserved downstream basal promoter element that is present in many TATA-box-deficient promoters. Genes Dev 10: 711–724. 10.1101/gad.10.6.711
  3. Butler JE, Kadonaga JT. 2001. Enhancer–promoter specificity mediated by DPE or TATA core promoter motifs. Genes Dev 15: 2515–2519. 10.1101/gad.924301
  4. Chen ZX, Sturgill D, Qu J, Jiang H, Park S, Boley N, Suzuki AM, Fletcher AR, Plachetzki DC, FitzGerald PC, et al. 2014. Comparative validation of the D. melanogaster modENCODE transcriptome annotation. Genome Res 24: 1209–1223. 10.1101/gr.159384.113
  5. Cramer P. 2019. Organization and regulation of gene transcription. Nature 573: 45–54. 10.1038/s41586-019-1517-4
  6. Deaton AM, Bird A. 2011. CpG islands and the regulation of transcription. Genes Dev 25: 1010–1022. 10.1101/gad.2037511
  7. FitzGerald PC, Sturgill D, Shyakhtenko A, Oliver B, Vinson C. 2006. Comparative genomics of Drosophila and human core promoters. Genome Biol 7: R53. 10.1186/gb-2006-7-7-r53
  8. Galouzis CC, Furlong EEM. 2022. Regulating specificity in enhancer–promoter communication. Curr Opin Cell Biol 75: 102065. 10.1016/j.ceb.2022.01.010
  9. Haberle V, Stark A. 2018. Eukaryotic core promoters and the functional basis of transcription initiation. Nat Rev Mol Cell Biol 19: 621–637. 10.1038/s41580-018-0028-8
  10. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. 2010. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38: 576–589. 10.1016/j.molcel.2010.05.004
  11. Juven-Gershon T, Cheng S, Kadonaga JT. 2006. Rational design of a super core promoter that enhances gene expression. Nat Methods 3: 917–922. 10.1038/nmeth937
  12. Kerrigan LA, Croston GE, Lira LM, Kadonaga JT. 1991. Sequence-specific transcriptional antirepression of the Drosophila Krüppel gene by the GAGA factor. J Biol Chem 266: 574–582. 10.1016/S0021-9258(18)52474-1
  13. Kumar S, Suleski M, Craig JM, Kasprowicz AE, Sanderford M, Li M, Stecher G, Hedges SB. 2022. TimeTree 5: an expanded resource for species divergence times. Mol Biol Evol 39: msac174. 10.1093/molbev/msac174
  14. Kutach AK, Kadonaga JT. 2000. The downstream promoter element DPE appears to be as widely used as the TATA box in Drosophila core promoters. Mol Cell Biol 20: 4754–4764. 10.1128/MCB.20.13.4754-4764.2000
  15. Lim CY, Santoso B, Boulay T, Dong E, Ohler U, Kadonaga JT. 2004. The MTE, a new core promoter element for transcription by RNA polymerase II. Genes Dev 18: 1606–1617. 10.1101/gad.1193404
  16. Ohler U, Liao GC, Niemann H, Rubin GM. 2002. Computational analysis of core promoters in the Drosophila genome. Genome Biol 3: RESEARCH0087. 10.1186/gb-2002-3-12-research0087
  17. Ohtsuki S, Levine M, Cai HN. 1998. Different core promoters possess distinct regulatory activities in the Drosophila embryo. Genes Dev 12: 547–556. 10.1101/gad.12.4.547
  18. Perkins KK, Dailey GM, Tjian R. 1988. In vitro analysis of the Antennapedia P2 promoter: identification of a new Drosophila transcription factor. Genes Dev 2: 1615–1626. 10.1101/gad.2.12a.1615
  19. Roeder RG. 2019. 50+ years of eukaryotic transcription: an expanding universe of factors and mechanisms. Nat Struct Mol Biol 26: 783–791. 10.1038/s41594-019-0287-x
  20. Schier AC, Taatjes DJ. 2020. Structure and mechanism of the RNA polymerase II transcription machinery. Genes Dev 34: 465–488. 10.1101/gad.335679.119
  21. Sloutskin A, Shir-Shapira H, Freiman RN, Juven-Gershon T. 2021. The core promoter is a regulatory hub for developmental gene expression. Front Cell Dev Biol 9: 666508. 10.3389/fcell.2021.666508
  22. Soeller WS, Poole SJ, Kornberg T. 1988. In vitro transcription of the Drosophila engrailed gene. Genes Dev 2: 68–81. 10.1101/gad.2.1.68
  23. Vo ngoc L, Cassidy CJ, Huang CY, Duttke SHC, Kadonaga JT. 2017. The human initiator is a distinct and abundant element that is precisely positioned in focused core promoters. Genes Dev 31: 6–11. 10.1101/gad.293837.116
  24. Vo ngoc L, Kassavetis GA, Kadonaga JT. 2019. The RNA polymerase II core promoter in Drosophila. Genetics 212: 13–24. 10.1534/genetics.119.302021
  25. Vo ngoc L, Huang CY, Cassidy CJ, Medrano C, Kadonaga JT. 2020. Identification of the human DPR core promoter element using machine learning. Nature 585: 459–463. 10.1038/s41586-020-2689-7
  26. Wampler SL, Tyree CM, Kadonaga JT. 1990. Fractionation of the general RNA polymerase II transcription factors from Drosophila embryos. J Biol Chem 265: 21223–21231. 10.1016/S0021-9258(17)45349-X
  27. Zabidi MA, Arnold CD, Schernhuber K, Pagani M, Rath M, Frank O, Stark A. 2015. Enhancer–core–promoter specificity separates developmental and housekeeping gene regulation. Nature 518: 556–559. 10.1038/nature13994
  28. Zeitlinger J. 2020. Seven myths of how transcription factors read the cis-regulatory code. Curr Opin Syst Biol 23: 22–31. 10.1016/j.coisb.2020.08.002


Articles from Genes & Development are provided here courtesy of Cold Spring Harbor Laboratory Press
