Summary
Mammalian gene expression is inherently stochastic1,2, resulting in discrete bursts of RNA molecules synthesised from each allele3–7. Although transcriptional bursting is known to be regulated by promoters and enhancers, it is unclear how cis-regulatory sequences encode transcriptional burst kinetics. Characterization of transcriptional bursting, including the burst size and frequency, has mainly relied on live-cell4,6,8 or single-molecule RNA-FISH3,5,8,9 recordings of selected loci. Here, we inferred transcriptome-wide burst frequencies and sizes for endogenous genes using allele-sensitive single-cell RNA-sequencing (scRNA-seq). We show that core promoter elements affect burst size and uncover synergistic effects between TATA and Initiator elements, which were masked at mean expression levels. Importantly, we provide transcriptome-wide support for enhancers controlling burst frequencies and we additionally demonstrate that cell-type-specific gene expression is primarily shaped by changes in burst frequencies. Altogether, our data show that burst frequency is primarily encoded in enhancers and burst size in core promoters, and that allelic scRNA-seq is a powerful paradigm for investigating transcriptional kinetics.
It was postulated over 20 years ago that enhancers might increase the probability of transcription10, yet supporting data are scarce (e.g. beta-globin in mouse9 and sna in flies8) and it remains unclear whether these observations generalize across different types of promoters and enhancers11. An important goal is therefore to determine how promoters and enhancers modulate gene expression through altering burst frequencies and sizes.
Single-cell analyses of allelic transcription have revealed frequent monoallelic expression consistent with episodic transcription12–14. Inspired by recent developments in transcriptome-wide inference of burst kinetics6,15,16, we modelled the expression distribution at each allele independently16 using the two-state model of transcription17. Profile-likelihood was used to infer point estimates (using maximum likelihood) and confidence intervals directly on burst frequency (kon; in units of mean mRNA degradation rate) and size (ksyn/koff; the individual parameters are unidentifiable in larger parts of parameter space) (Extended Data Fig. 1a-b, exemplified for Mbnl2 in Fig. 1a; Methods). By simulations, we determined the boundaries of parameter space wherein kinetic parameters could be robustly inferred (Fig. 1b), and how cell numbers and incomplete sampling (i.e. sensitivity) in scRNA-seq affected inference (Extended Data Fig. 2).
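To make the quantities above concrete, the following is a minimal sketch of the two-state model at steady state, where the mRNA count per allele follows a beta-Poisson distribution with rates expressed in units of the mRNA degradation rate. It is not the profile-likelihood procedure used in this study: as a simpler stand-in, it recovers burst frequency (kon) and burst size (ksyn/koff) from ratios of factorial moments. The function names and the simulated parameter values are assumptions made for illustration only.

```python
import numpy as np


def simulate_two_state(kon, koff, ksyn, n_cells, rng):
    """Draw steady-state mRNA counts for one allele under the two-state model.

    All rates are in units of the mRNA degradation rate. At steady state the
    count is Poisson with mean ksyn * p, where p ~ Beta(kon, koff).
    """
    active_fraction = rng.beta(kon, koff, size=n_cells)
    return rng.poisson(ksyn * active_fraction)


def kinetics_from_moment_ratios(r1, r2, r3):
    """Invert ratios of successive factorial moments to (kon, koff, ksyn).

    r1 = E[X], r2 = E[X(X-1)] / E[X], r3 = E[X(X-1)(X-2)] / E[X(X-1)].
    """
    denom = r1 * r2 - 2 * r1 * r3 + r2 * r3
    curvature = r1 - 2 * r2 + r3
    kon = 2 * r1 * (r3 - r2) / denom
    koff = 2 * (r2 - r1) * (r1 - r3) * (r3 - r2) / (denom * curvature)
    ksyn = -denom / curvature
    return kon, koff, ksyn


def estimate_burst_kinetics(counts):
    """Moment-based estimates of burst frequency and burst size (a stand-in
    for maximum likelihood, not the method used in the paper)."""
    x = np.asarray(counts, dtype=float)
    e1 = x.mean()
    e2 = (x * (x - 1)).mean()
    e3 = (x * (x - 1) * (x - 2)).mean()
    kon, koff, ksyn = kinetics_from_moment_ratios(e1, e2 / e1, e3 / e2)
    return {"burst_frequency": kon, "burst_size": ksyn / koff}


# Hypothetical parameters: burst frequency 1.0, burst size 50 / 4 = 12.5
rng = np.random.default_rng(0)
counts = simulate_two_state(kon=1.0, koff=4.0, ksyn=50.0, n_cells=100_000, rng=rng)
print(estimate_burst_kinetics(counts))
```

In this parametrisation the noisy term `curvature` cancels in both kon and the ratio ksyn/koff, which gives some intuition for why burst frequency and size can be estimated even where ksyn and koff individually cannot.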
To investigate transcriptome-wide patterns of transcriptional bursting, kinetic parameters were inferred for 7,186 genes using transcriptomes from 224 individual primary mouse fibroblasts for each allele (CAST/EiJ × C57BL/6J) (Supplementary Table 1). Inference was performed on Smart-seq2 scRNA-seq libraries both at RPKM and molecule levels (Methods), with improved goodness-of-fit towards the two-state model for molecule-level data (Extended Data Fig. 1c-f). The inferred kinetic parameters inhabited regions in parameter space for which the estimated precision was high (i.e. small confidence intervals) for most genes (Fig. 1b). Observed burst frequencies (Fig. 1c) and burst sizes (Fig. 1d) were in the range of those observed in imaging-based single-gene analyses6, and had a general relationship with expression levels (Extended Data Fig. 1g-h) similar to a previous study18. Kinetics inferred for the two alleles correlated (Spearman r = 0.79 and 0.63 for burst frequency and size, respectively) and were consistent with two independent transcriptional processes (Extended Data Fig. 1i), as previously reported12. Incorporating gene-specific mRNA half-lives had a minor effect on burst frequency estimation (Fig. 1e), since burst frequencies had larger magnitudes of variation than mRNA degradation rates. Interestingly, the average waiting time between bursts was approximately 4 hours (per allele) (Fig. 1f) and inferred kon values were consistently much smaller than the corresponding koff values (Fig. 1g), demonstrating that genes are mainly in an idle state with occasional bursts of transcription.
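Because kon is expressed in units of the mRNA degradation rate, converting it to a waiting time in hours requires an mRNA half-life. The sketch below shows that unit conversion under the approximation that the waiting time between bursts is 1/kon, which holds when kon is much smaller than koff. The function names and the numerical inputs are hypothetical and chosen only for illustration; they are not values from this study.

```python
import math


def burst_waiting_time_hours(kon_rel, mrna_half_life_hours):
    """Approximate mean waiting time between bursts for one allele, in hours.

    kon_rel is the burst frequency in units of the mRNA degradation rate.
    """
    degradation_rate = math.log(2) / mrna_half_life_hours  # per hour
    bursts_per_hour = kon_rel * degradation_rate
    return 1.0 / bursts_per_hour


def fraction_of_time_active(kon, koff):
    """Steady-state probability that the allele is in the active state."""
    return kon / (kon + koff)


# Hypothetical inputs, for illustration only
print(burst_waiting_time_hours(kon_rel=1.5, mrna_half_life_hours=4.0))
print(fraction_of_time_active(kon=1.5, koff=60.0))
```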
We dissected molecular determinants of burst size variation. House-keeping genes tend to be highly expressed and have compact gene structure19. Indeed, we observed a negative correlation between gene locus length and burst size (Extended Data Fig. 3a, Spearman r = -0.86, P = 2.3e-228) but not burst frequency (Extended Data Fig. 3b, r = 0.03, P = 0.6), and this effect was not associated with spliced mRNA transcript length (Extended Data Fig. 3c). To assess the roles of core promoter elements on burst kinetics (Fig. 2a), we formulated linear regression models that included gene length, TATA and Initiator (Inr) elements (Methods), which identified several significant factors and interactions (Supplementary Table 2). Genes with TATA elements in their core promoters had significantly larger burst sizes (P = 4.0e-6, F test, adjusted for gene length) (Fig. 2b) than genes without TATA elements, in agreement with previous reports in yeast20 and mammals21. In TATA-containing core promoters, we observed that the Inr element significantly boosted burst sizes (P = 7.2e-4, F test, adjusted for gene length) whereas Inr elements alone (in promoters lacking a TATA element) had no effect on burst size (Fig. 2b). Notably, the effects of the TATA and Inr elements were masked in mean gene expression levels (Extended Data Fig. 3d) and absent on burst frequencies (Extended Data Fig. 3e). Thus, a separation of expression into burst kinetics was required to identify the effects of core promoter elements on transcriptional dynamics, since variations in burst frequencies distort the burst size effect at mean expression levels (Extended Data Fig. 3f). The observed synergy between TATA and Inr elements significantly extends an earlier report of the positive effect of Inr elements on gene expression for promoters with TATA elements22. Interestingly, we observed distinct gene-length dependencies for the different core promoter elements (Fig. 2c and Extended Data Fig. 3g) and the effects declined around 80 kbp.
We conclude that core promoter sequence elements affect burst size and that a transcriptome-wide inference of burst kinetics can deduce individual and synergistic effects of cis-regulatory elements on transcription.
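The kind of regression described above, with gene length, TATA, Inr and a TATA-by-Inr interaction as predictors of burst size, can be sketched as follows. This is an illustrative least-squares fit on synthetic data, not the model specification or the F tests reported in Supplementary Table 2; the variable names, effect sizes and element frequencies are all assumptions.

```python
import numpy as np

COEF_NAMES = ["intercept", "log_length", "tata", "inr", "tata:inr"]


def fit_promoter_model(log_length, tata, inr, log_burst_size):
    """Ordinary least squares with a TATA x Inr interaction term."""
    design = np.column_stack(
        [np.ones_like(log_length), log_length, tata, inr, tata * inr]
    )
    coef, *_ = np.linalg.lstsq(design, log_burst_size, rcond=None)
    return dict(zip(COEF_NAMES, coef))


# Synthetic stand-in data (hypothetical, not the paper's measurements):
# Inr has no effect alone but boosts burst size when TATA is present.
rng = np.random.default_rng(0)
n_genes = 4000
log_length = rng.normal(10.0, 1.0, n_genes)
tata = (rng.random(n_genes) < 0.2).astype(float)
inr = (rng.random(n_genes) < 0.4).astype(float)
log_burst_size = (
    3.0
    - 0.2 * log_length
    + 0.4 * tata
    + 0.0 * inr
    + 0.5 * tata * inr
    + rng.normal(0.0, 0.5, n_genes)
)

fit = fit_promoter_model(log_length, tata, inr, log_burst_size)
for name in COEF_NAMES:
    print(f"{name}: {fit[name]:.3f}")
```

A positive `tata:inr` coefficient alongside an `inr` coefficient near zero is the pattern that corresponds to synergy: Inr contributes only in the presence of TATA.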
We applied the same procedure to 188 mouse embryonic stem cells (mESC, C57BL/6 × CAST/EiJ) (Supplementary Table 3) and determined kinetic differences between the 4,854 genes that were expressed (and had inferable kinetics) in both mESC and fibroblasts. We detected 1,552 genes with significant (FDR<5%) differences in burst frequency (Fig. 3a) and 1,075 genes for burst size (Fig. 3b), given our current power to detect changes (Extended Data Fig. 4 and Supplementary Table 4). We next investigated whether alterations in burst sizes or frequencies account for cell-type-specific differential expression. When binning genes by expression difference between the cell types, it became apparent that cell-type-specific expression levels are mainly shaped by changes in burst frequencies (Fig. 3c and Extended Data Fig. 5a-d). We hypothesised that burst frequencies were generally regulated by enhancers8–10. A strong linear dependence was observed between differential enhancer activity (normalized H3K27Ac ChIP-seq read density in enhancer regions) and differential burst frequencies (Fig. 3d and Extended Data Fig. 5e-f), with only a modest effect on burst size, providing genome-wide support for enhancer-mediated regulation of burst frequencies. To validate the allele-level kinetics using a complementary method, we performed single-molecule RNA FISH (smFISH) on male fibroblasts and mESCs for a selection of X-linked genes (expressed from a single allele) with significant cell-type differences in burst frequencies or size (Hdac6, Msl3, Mpp1) and without significant differences (Ibgp1) (Extended Data Fig. 6). We observed a general agreement between methods in burst frequencies with some discrepancies and larger burst size estimates in the smFISH data (Extended Data Figs. 7a-f).
Importantly, significant cell-type differences were corroborated in the kinetics inferred from the smFISH data, as we observed significant increases in burst frequencies (Hdac6, in ES cells; Msl3, in fibroblasts) and burst size (Mpp1, in fibroblasts) using both methods (Fig. 3e and Extended Data Fig. 7g-j).
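The question of whether expression differences between cell types arise from burst frequency or burst size can be stated as a simple decomposition. As a sketch under the two-state model, with all rates in units of the mRNA degradation rate and assuming that this rate is the same in the two cell types A and B, the mean expression per allele factorises approximately into frequency times size when koff is much larger than kon:

```latex
\begin{align}
\langle X \rangle
  &= k_{\mathrm{syn}} \, \frac{k_{\mathrm{on}}}{k_{\mathrm{on}} + k_{\mathrm{off}}}
  \;\approx\; k_{\mathrm{on}} \cdot \frac{k_{\mathrm{syn}}}{k_{\mathrm{off}}}
  \qquad (k_{\mathrm{off}} \gg k_{\mathrm{on}}) \\
\log_2 \frac{\langle X \rangle_A}{\langle X \rangle_B}
  &\approx \log_2 \frac{k_{\mathrm{on},A}}{k_{\mathrm{on},B}}
  + \log_2 \frac{(k_{\mathrm{syn}}/k_{\mathrm{off}})_A}{(k_{\mathrm{syn}}/k_{\mathrm{off}})_B}
\end{align}
```

Under this approximation the log fold change in mean expression is the sum of the log fold changes in burst frequency and burst size, so the relative contribution of each term can be read off per gene.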
To further investigate the effects of enhancers on burst frequencies, we identified genes with significant kinetics differences between the C57 and CAST allele in fibroblasts (Burst frequency: 307 genes; Burst size: 276 genes) (Fig. 4a-b, Extended Data Fig. 8a-d and Supplementary Table 5). Interestingly, genes with burst frequency differences had significantly higher densities of single-nucleotide polymorphisms (SNPs) in their enhancer regions (Fig. 4c) but not in their promoters (Extended Data Fig. 5g). No effect was found when performing similar analyses on genes with significant differences in burst sizes (data not shown). That genes with strain-specific burst frequency have more genetic changes in their enhancers supports the notion of enhancers regulating burst frequencies.
To functionally assess whether enhancers regulate burst frequencies, we sequenced and inferred transcriptional kinetics from 57 normal mESCs (CAST/EiJ × 129SvEv) and 174 mESCs harbouring a Sox2 enhancer deletion on the CAST allele23 (Fig. 4d). Markedly, cells with Sox2 enhancer deletion had significantly (P < 4.2e-40) reduced Sox2 burst frequency specifically on the affected CAST allele (Fig. 4e), whereas no significant change in burst size was observed (P = 0.48). By simulations, we demonstrated that the observed kinetics for the affected CAST allele were in the region of parameter space expected for an exclusive reduction in burst frequency (Fig. 4e, green region; Methods). This provides direct evidence for the role of enhancers in modulating transcriptional frequencies and validates that allele-sensitive scRNA-seq is sufficiently accurate to infer transcriptional kinetics after a perturbation.
The conservation of gene expression patterns and levels among mammals has been extensively studied24, yet little is known about the conservation of transcriptional kinetics. We inferred transcriptional kinetics for 2,484 genes in 163 human fibroblast cells after phasing their transcribed SNPs. One-to-one orthologs (1,609 genes) showed significant positive correlations in burst frequency, size and mean expression (Extended Data Fig. 9a,b,c and Supplementary Table 6). Interestingly, kinetic parameters were more similar across species than expected merely by preserved expression levels (Extended Data Fig. 9d), although larger data sets would enable in-depth analysis.
A caveat to inference of burst kinetics by scRNA-seq is that the estimates may be partly affected by cell-cycle features. A decreased burst frequency per allele25 combined with an additional copy of each allele after genome duplication would balance the numbers of RNA molecules recorded per allele in scRNA-seq. However, most cells analysed in our study were in the G1 phase, and genes with different kinetics between phases were mostly related to cell-cycle functions (Extended Data Fig. 10).
Transcription is regulated at multiple levels, including enhancer-promoter interactions11, the formation of the transcription preinitiation complex (PIC), recruitment of RNA polymerase (Pol) II26, Pol II initiation27 and elongation28 control. Active transcription typically results in multiple Pol II complexes simultaneously transcribing the locus, generating spurts of RNA molecules21. Such dynamics become averaged out in bulk RNA-seq data and obscured even in scRNA-seq when lacking allelic resolution. We have demonstrated the opportunities in analysing burst size and frequency to obtain a more accurate characterization of transcription. Mechanistically, specialized TATA binding protein-associated factors and TATA binding protein-related factors bind different types of core promoters1 and our data suggest that such complexes ultimately affect burst size, potentially through modulation of subsequent steps such as RNA Pol II elongation control28. Hundreds of genes had significant differences in burst size between both cell types and genotypes, suggesting that modulations of both the levels of trans-acting factors and variations in the DNA elements they bind can regulate burst size. Our data provide transcriptome-wide evidence for the role of enhancers in controlling burst frequencies. Thus, the primary role of enhancers might lie in forming a PIC without affecting the size of the transcriptional burst. The strategy introduced here paves the way for unprecedented mechanistic insights into how burst size and frequency control are governed by cis-regulatory sequences and the systematic dissection of transcription.
Acknowledgement
We thank Qiaolin Deng for ES cell culturing, Sarantis Giatrellis for assistance with FACS sorting, Gert-Jan Hendriks for fruitful discussions, and the remainder of the Sandberg lab. This work was supported by grants to R.S. from the European Research Council (648842), the Swedish Research Council (2017-01062), the Knut and Alice Wallenberg’s foundation (2017.0110) and the Bert L. and N. Kuggie Vallee Foundation.
Footnotes
Competing financial interests. The authors declare no competing financial interests.
Author contribution. Conceived the study: RS. Developed computational methodology: AL. Explored and interpreted data: AL and RS. Prepared single-cell transcriptome data: PJ, LH, BjR and ÅS. Designed modified smart-seq2: MHJ and ORF. Provided Sox2 enhancer deletion cells: CMR and BR. Performed single-molecule RNA FISH: PJ. Generated figures: AL, RS. Wrote the manuscript: RS and AL.
Data Availability Statement
Sequencing data have been deposited at ENA (EBI) (E-MTAB-6362, E-MTAB-6385 and E-MTAB-7098) and code for transcriptional kinetic inference and analyses is provided through GitHub: https://github.com/sandberg-lab/txburst
References
- 1. Levine M, Tjian R. Transcription regulation and animal diversity. Nature. 2003;424:147–151. doi: 10.1038/nature01763.
- 2. Raj A, van Oudenaarden A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell. 2008;135:216–226. doi: 10.1016/j.cell.2008.09.050.
- 3. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 2006;4:e309. doi: 10.1371/journal.pbio.0040309.
- 4. Chubb JR, Trcek T, Shenoy SM, Singer RH. Transcriptional pulsing of a developmental gene. Curr Biol. 2006;16:1018–1025. doi: 10.1016/j.cub.2006.03.092.
- 5. Levsky JM, Shenoy SM, Pezo RC, Singer RH. Single-cell gene expression profiling. Science. 2002;297:836–840. doi: 10.1126/science.1072241.
- 6. Suter DM, et al. Mammalian genes are transcribed with widely different bursting kinetics. Science. 2011;332:472–474. doi: 10.1126/science.1198817.
- 7. Nicolas D, Phillips NE, Naef F. What shapes eukaryotic transcriptional bursting? Mol Biosyst. 2017. doi: 10.1039/c7mb00154a.
- 8. Fukaya T, Lim B, Levine M. Enhancer Control of Transcriptional Bursting. Cell. 2016;166:358–368. doi: 10.1016/j.cell.2016.05.025.
- 9. Bartman CR, Hsu SC, Hsiung CC-S, Raj A, Blobel GA. Enhancer Regulation of Transcriptional Bursting Parameters Revealed by Forced Chromatin Looping. Mol Cell. 2016;62:237–247. doi: 10.1016/j.molcel.2016.03.007.
- 10. Walters MC, et al. Enhancers increase the probability but not the level of gene expression. Proc Natl Acad Sci USA. 1995;92:7125–7129. doi: 10.1073/pnas.92.15.7125.
- 11. Zabidi MA, et al. Enhancer-core-promoter specificity separates developmental and housekeeping gene regulation. Nature. 2015;518:556–559. doi: 10.1038/nature13994.
- 12. Deng Q, Ramsköld D, Reinius B, Sandberg R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science. 2014;343:193–196. doi: 10.1126/science.1245316.
- 13. Reinius B, et al. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq. Nat Genet. 2016;48:1430–1435. doi: 10.1038/ng.3678.
- 14. Reinius B, Sandberg R. Random monoallelic expression of autosomal genes: stochastic transcription and allele-level regulation. Nat Rev Genet. 2015;16:653–664. doi: 10.1038/nrg3888.
- 15. Kim JK, Marioni JC. Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data. Genome Biol. 2013;14:R7. doi: 10.1186/gb-2013-14-1-r7.
- 16. Jiang Y, Zhang NR, Li M. SCALE: modeling allele-specific gene expression by single-cell RNA sequencing. Genome Biol. 2017;18:74. doi: 10.1186/s13059-017-1200-8.
- 17. Peccoud J, Ycart B. Markovian Modeling of Gene-Product Synthesis. Theoretical Population Biology. 1995;48:222–234.
- 18. Dar RD, et al. Transcriptional burst frequency and burst size are equally modulated across the human genome. Proc Natl Acad Sci USA. 2012;109:17454–17459. doi: 10.1073/pnas.1213530109.
- 19. Eisenberg E, Levanon EY. Human housekeeping genes are compact. Trends Genet. 2003;19:362–365. doi: 10.1016/S0168-9525(03)00140-9.
- 20. Hornung G, et al. Noise-mean relationship in mutated promoters. Genome Res. 2012;22:2409–2417. doi: 10.1101/gr.139378.112.
- 21. Tantale K, et al. A single-molecule view of transcription reveals convoys of RNA polymerases and multi-scale bursting. Nat Commun. 2016;7:12248. doi: 10.1038/ncomms12248.
- 22. Malecová B, Gross P, Boyer-Guittaut M, Yavuz S, Oelgeschläger T. The initiator core promoter element antagonizes repression of TATA-directed transcription by negative cofactor NC2. J Biol Chem. 2007;282:24767–24776. doi: 10.1074/jbc.M702776200.
- 23. Li Y, et al. CRISPR reveals a distal super-enhancer required for Sox2 expression in mouse embryonic stem cells. PLoS ONE. 2014;9:e114485. doi: 10.1371/journal.pone.0114485.
- 24. Merkin J, Russell C, Chen P, Burge CB. Evolutionary dynamics of gene and isoform regulation in Mammalian tissues. Science. 2012;338:1593–1599. doi: 10.1126/science.1228186.
- 25. Padovan-Merhar O, et al. Single mammalian cells compensate for differences in cellular volume and DNA copy number through independent global transcriptional mechanisms. Mol Cell. 2015;58:339–352. doi: 10.1016/j.molcel.2015.03.005.
- 26. Hantsche M, Cramer P. Conserved RNA polymerase II initiation complex structure. Curr Opin Struct Biol. 2017;47:17–22. doi: 10.1016/j.sbi.2017.03.013.
- 27. Roeder RG. The role of general initiation factors in transcription by RNA polymerase II. Trends Biochem Sci. 1996;21:327–335.
- 28. Jonkers I, Lis JT. Getting up to speed with transcription elongation by RNA polymerase II. Nat Rev Mol Cell Biol. 2015;16:167–177. doi: 10.1038/nrm3953.