SUMMARY
How transcriptional bursting relates to gene regulation is a central question that has persisted for more than a decade. Here, we measure nascent transcriptional activity in early Drosophila embryos and characterize the variability in absolute activity levels across expression boundaries. We demonstrate that boundary formation follows a common transcription principle: a single control parameter determines the distribution of transcriptional activity, regardless of gene identity, boundary position, or enhancer-promoter architecture. We infer the underlying bursting kinetics and identify the key regulatory parameter as the fraction of time a gene is in a transcriptionally active state. Unexpectedly, both the rate of polymerase initiation and the switching rates are tightly constrained across all expression levels, predicting synchronous patterning outcomes at all positions in the embryo. These results point to a shared simplicity underlying the apparently complex transcriptional processes of early embryonic patterning and indicate a path to general rules in transcriptional regulation.
INTRODUCTION
A central question in gene regulation concerns how discrete molecular interactions generate a continuum of expression levels observed at the transcriptome level (Lionnet and Singer, 2012; Scholes et al., 2016). A large set of molecular activities are required to elicit RNA transcription, including transcription factor binding, chromatin modifications, and long-range enhancer–promoter interactions (Voss and Hager, 2014). However, in most cases it is unclear which of these interactions predominantly regulate RNA synthesis rates and variability for a given gene (Coulon et al., 2013). In general, for genes whose transcription rates depend on levels of external inputs, we do not know which regulatory steps are preferably tuned to achieve required mRNA expression levels. Overall, it is unknown whether constraints exist that might select common mechanisms for modulating transcriptional activity across genes, space and time.
Addressing these questions requires measuring the kinetic rates of transcription, in absolute units. Many studies using single molecule counting approaches have documented the inherently stochastic nature of transcription (Little et al., 2013; Raj et al., 2006; Taniguchi et al., 2010; Zenklusen et al., 2008). In organisms ranging from bacteria to vertebrates, genes exhibit transcription bursts characterized by intermittent intervals of mRNA production followed by protracted quiescent periods (Bothma et al., 2014; Golding et al., 2005; Suter et al., 2011). This inherent stochasticity in gene activation results in higher cell-to-cell variability than expected from constitutive expression (Blake et al., 2003). A simple telegraph or two-state model has been used to explain the measured variability in the context of transcriptional bursting (Peccoud and Ycart, 1995). In this model a locus switches at random between inactive and active states, with only the latter permitting transcription initiation. Despite its prevalent use, it is largely unknown which molecular events determine the kinetic rates of this model (Coulon et al., 2013). Nor is it widely understood which of these kinetic rates are modulated by external input signals or to what extent. However, with precise measurements and quantitative modeling, it is possible to gain intuition for the mechanisms of transcriptional bursting based on their signature in the measured variability (Jones et al., 2014; Larson et al., 2013; Molina et al., 2013; Senecal et al., 2014; Zoller et al., 2015).
Drosophila embryos provide an ideal model to investigate transcriptional regulation (Gregor et al., 2014). Early embryos express many genes in graded patterns in response to modulatory inputs (Struhl et al., 1992). Spatial domains, where gene expression levels transition from highly active to nearly silent, are functionally the most critical for the developing embryo, as they determine specification of cell identities (Kornberg and Tabata, 1993). Among the earliest expressed genes in Drosophila development are the gap genes, which encode transcription factors responsible for anterior-posterior (AP) patterning (Jaeger, 2011). Each gap gene is expressed in its own unique domain, and the expression boundaries arise at distinct and precise positions (Dubuis et al., 2013). Gene expression levels are spatially graded across several cell diameters, and the intermediate levels of these gap genes confer patterning information necessary for segmentation (Lawrence, 1992). Thus the precise control of expression levels is essential for properly patterned cell fate specification.
The regulation of gap genes appears highly complex. Many activating and repressing factors determine expression boundaries through complex layers of homo- and heterotypic protein interactions at multiple promoters and enhancers (Estrada et al., 2016; Jaeger et al., 2004; Kvon et al., 2014; Perry et al., 2011; Segal et al., 2008). The collective activity of these factors generates expression rates that vary with position in the embryo (Briscoe and Small, 2015; Lawrence, 1992; Manu et al., 2009). Given the diversity of cis-regulatory architecture and trans-acting factors regulating these genes, an intuitive expectation is that expression rates emerge from carefully tuned transcription factor concentrations and binding affinities. Since various bursting kinetics could achieve such rates, a straightforward prediction is that the underlying bursting kinetics will differ between boundaries. This expectation is consistent with prior studies in cultured cells suggesting that many regulatory strategies exist (Carey et al., 2013; Molina et al., 2013; Senecal et al., 2014; Siddharth S et al., 2015). However, it is unknown how bursting rates are modulated across multiple expression boundaries in intact tissues.
To address these questions, we developed a single molecule fluorescent in situ hybridization (smFISH) method that generates accurate counts of nascent RNA molecules in individual nuclei. We applied this method to assess absolute transcriptional activity of the gap genes in terms of the number and variability of RNA polymerase II (Pol II) molecules at transcribing loci. This approach reveals a common principle that unifies transcriptional activity across expression boundaries. Surprisingly, a single common control parameter globally determines the distribution of transcriptional activity. We use a simple telegraph model to interpret our measurements. We show that the key regulatory parameter is the fraction of time a gene is in a transcriptionally active state, while the Pol II initiation rate is constant. Contrary to the expectation of diverse bursting kinetics, the promoter switching rates are tightly constrained across boundaries. This constraint highlights the conservation of the switching correlation time, and predicts synchronous transcriptional outcomes regardless of expression level, gene identity, or position in the embryo. We propose that this synchronicity is important for ensuring precise patterning. Moreover, our results suggest an emergent simplicity in the modulation of bursting that governs the apparently complex process of embryo segmentation. Overall, our quantitative approach provides a framework for uncovering unifying principles of transcriptional regulation that can be applied across genes in any biological context.
RESULTS
Precise measurements of transcriptional activity
During early fly development, gene expression boundaries arise from spatially varying transcription factor concentrations. Early embryos thus provide a natural context in which to ask how input factors shape transcription dynamics. Here we enhanced a previously developed smFISH method (Little et al., 2013) to yield a 3- to 4-fold increase in sensitivity, enabling precise counting of nascent transcripts and measurement of transcriptional activity across boundaries (STAR Methods). We labeled single mRNA molecules in fixed embryos with fluorescent oligonucleotide probes, imaged them by confocal microscopy, and estimated the intensities of transcription sites (i.e., spatially co-localized nascent transcripts) and of individual cytoplasmic mRNAs. This method measures instantaneous activity per nucleus in units of the intensity of an individual cytoplasmic mRNA, the “cytoplasmic unit” (C.U.), by normalizing the total intensity of each locus to that of single cytoplasmic mRNAs (Fig. 1A, B).
We measured the transcriptional activity of the four major gap genes hunchback (hb), Krüppel (Kr), knirps (kni), and giant (gt) along the embryo’s AP axis. These genes are expressed early in development in broad spatial domains, permitting measurements of thousands of synchronized nuclei across small numbers of embryos; these factors all favor low measurement error (Fig. 1C and 1D, embryos per combination of gene/genotype). Analysis of expression levels in mid- to late interphase 13 ensures sufficient time to attain steady-state levels of transcribing RNA polymerase II (Pol II; Fig. S1A–D, STAR Methods). Because DNA replication occurs in early interphase (Blumenthal et al., 1974), measuring at this stage also eliminates ambiguity arising from varying numbers of loci. Since loci on recently duplicated chromatids are often closely apposed in space, we measure total transcription per nucleus (Little et al., 2013), then infer properties of individual loci. As a control, we generated data from embryos heterozygous for a hb deficiency, and observed half the wild-type level of expression per nucleus (Fig. 1C). Importantly, we observe a corresponding decrease in variance to half of wild-type (Fig. 1D), as expected for a sum of independent contributions and supporting previous findings that all loci behave independently (Little et al., 2013). These results demonstrate the suitability of using total transcriptional activity per nucleus to infer the behavior of individual loci.
Since biological variance greatly constrains models of regulatory processes, we needed to determine how variability arises from measurement error, embryo-to-embryo differences, and intrinsic fluctuations in individual nuclei. We assessed the performance of our measurements by labeling each mRNA in alternating colors along the length of the strand. This allowed us to perform independent normalization in each channel, thus characterizing sources of measurement error, such as noise stemming from imaging and normalization (Fig. 2A). Estimation of the variance of the mean across embryos (Fig. 2B) enables further splitting of the variability in terms of embryo alignment along the anterior-posterior axis and inherent embryo-to-embryo variability (Fig. S1E–H, STAR Methods). For all genes and at all positions, measurement variability (imaging and spatial alignment) represents less than of the total variance on average (Fig. 2C), indicating that biological variability dominates our measurements (Dubuis et al., 2013). Importantly, this variability arises almost entirely from differences between nuclei, rather than differences between embryos (Fig. 2D); the low embryo-to-embryo variability in the maximally expressed regions ( CV, Fig. 2E) emphasizes that the mean expression levels across embryos are reproducible in absolute units (Fig. 1C). Thus the measured expression noise mainly stems from zygotic transcription, and is intrinsic to the molecular processes of transcription rather than arising from extrinsic sources of variability. Low measurement error and the predominance of intrinsic variability facilitate analysis of the noise–mean relationship, permitting inference of bursting kinetics from several hundred nuclei at each position along the AP axis (Fig. 1B), as detailed below.
Single parameter distribution of transcriptional activity across all expression boundaries
The expression patterns of the gap genes are determined by multiple enhancer elements at varying distances from their promoters (Kvon et al., 2014; Perry et al., 2011). Each enhancer contains a variable number of binding sites for multiple patterning input factors with cross-regulatory interactions (Ochoa-Espinosa et al., 2005; Schroeder et al., 2004). These features and evidence from genetic manipulations (Hoch et al., 1990; Jacob et al., 1991; Pankratz et al., 1992) indicate that many molecular processes regulate transcription rates generating observed mRNA levels with their stereotypical modulation as a function of position (Fig. 1C). Given the diversity of input factors and molecular control elements, it would appear likely that different genes should exhibit vastly different, uniquely defined transcriptional kinetics. To make progress in understanding these complex relationships, we capitalize on the fact that the kinetics of the processes underlying transcription determines not only mean expression levels but also the variability (Fig. 1D). Thus we can use the noise–mean relationship to characterize the transcription kinetics for individual genes.
To characterize noise–mean relationships in our system, we examined the dependence of variability on mean transcription levels (Fig. 3A). In agreement with prior measurements (Little et al., 2013), genes span a similar dynamic range of expression levels across boundaries, from nearly zero to a maximum value of C.U. (Fig. 1C). Moreover, transcription is inherently variable: at all positions and for all genes, variability exceeds that expected from a simple model of constitutive activity, with noise (measured as CV²) approximately 10 times larger than Poisson for mean transcriptional activity below 10 C.U. (Fig. 3A). However, the noise–mean relationship follows an unexpectedly similar overall trend (Fig. 3A; STAR Methods). Unlike many other systems (bacteria, yeast, mammalian cell culture), there is no clearly identifiable noise floor at high expression (Keren et al., 2015; Taniguchi et al., 2010; Zoller et al., 2015). The absence of such an extrinsic noise floor is likely a key feature of early embryo development: nuclei are highly synchronized within the cell cycle and share the same environment of the syncytial blastoderm. Sources of extrinsic noise that affect gene expression in cultured cells are thus minimized. Moreover, the collapse on a unique curve is unexpected and atypical given the different promoter–enhancer architectures (Hornung et al., 2012; Sanchez and Golding, 2013).
This result is even more striking when we convert our units of transcriptional activity from C.U. to the actual number count of Pol II molecules, g. Such a conversion is necessary as the intensity at a given active transcription locus is dependent on the length of the individual gene, the copy number, and the probe arrangement (Fig. S2A, STAR Methods). Accounting for these factors, we can describe the shape of the distribution of Pol II counts per locus by calculating the 2nd, 3rd and 4th cumulants for each gene across each boundary. While again the expectation is that Pol II counts should differ between different genes, an extra data collapse is observed instead: the 2nd, 3rd and 4th cumulants for all data points are nearly uniquely determined by a single parameter, the mean activity (Fig. 3B–D and Fig. S2B–D). Thus, transcriptional activity for all genes and across the entire expression range is characterized by a unique, common single-parameter distribution. This observation is model-free and indicates that a single parameter determines the generation of all gene expression boundaries. The uniqueness of the Pol II count distribution suggests that despite the well-documented diversity of cis-regulatory elements and trans-acting factors, a common conserved set of processes is regulated to determine transcription kinetics across all boundaries in the early embryo.
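As an illustration of the quantities involved, the following is a minimal sketch of how the first four cumulants could be estimated from a list of per-locus Pol II counts. It uses plain moment estimators, which is an assumption made here for brevity and not necessarily the estimator used in the STAR Methods; the function name `cumulants` and the example data are hypothetical.

```python
from statistics import fmean


def cumulants(counts):
    """Return the first four sample cumulants (k1, k2, k3, k4) of `counts`.

    Plain moment estimators (slightly biased for small samples):
    k1 = mean, k2 = m2, k3 = m3, k4 = m4 - 3 * m2**2,
    where m_r is the r-th central moment.
    """
    mean = fmean(counts)
    m2, m3, m4 = (fmean((x - mean) ** r for x in counts) for r in (2, 3, 4))
    return mean, m2, m3, m4 - 3 * m2 ** 2


# Hypothetical toy data, only to show the call.
k1, k2, k3, k4 = cumulants([1, 2, 3, 4, 5])
print(k1, k2, k3, k4)
```

Plotting the second, third and fourth cumulants against the first, one point per gene and position, is the kind of comparison in which a data collapse onto a single curve would become visible.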
Two-state model identifies the unique control parameter
The shared Pol II count distribution suggests that a common general model can describe the regulation of all gap genes. The observed intrinsic super-Poissonian variability in our data suggests that these genes operate in a bursting regime. While constitutive genes can be modeled by a single parameter, i.e. the effective initiation rate, multiple independent parameters are required to model transcription kinetics of bursting genes. A popular minimalist model accounting for bursting is the two-state or telegraph model (Peccoud and Ycart, 1995). It has been widely used to describe the distribution of mature mRNA and protein counts (Bar-Even et al., 2006; Raj et al., 2006; Zenklusen et al., 2008). Such a simple mechanistic model enables estimation of kinetic rates underlying bursting (Fig. 3E and Table 1), i.e. the switching rates between promoter states ($k_{on}$ and $k_{off}$) as well as the effective initiation rate $k_{ini}$ (Larson et al., 2013; Senecal et al., 2014; Suter et al., 2011).
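Table 1:
Kinetic rates | Units | Parameterization $\{k_{ini}, k_{on}, k_{off}\}$ | Parameterization $\{k_{ini}, \langle n\rangle, \tau_n\}$ |
---|---|---|---|
Pol II initiation rate | [min⁻¹] | $k_{ini}$ | $k_{ini}$ |
Promoter switching on-rate | [min⁻¹] | $k_{on}$ | $\langle n\rangle/\tau_n$ |
Promoter switching off-rate | [min⁻¹] | $k_{off}$ | $(1-\langle n\rangle)/\tau_n$ |
Bursting parameters | Units | Parameterization $\{k_{ini}, k_{on}, k_{off}\}$ | Parameterization $\{k_{ini}, \langle n\rangle, \tau_n\}$ |
Promoter mean occupancy | # | $k_{on}/(k_{on}+k_{off})$ | $\langle n\rangle$ |
Switching correlation time | [min] | $1/(k_{on}+k_{off})$ | $\tau_n$ |
Burst size | # | $k_{ini}/k_{off}$ | $k_{ini}\tau_n/(1-\langle n\rangle)$ |
Burst frequency | [min⁻¹] | $k_{on}k_{off}/(k_{on}+k_{off})$ | $\langle n\rangle(1-\langle n\rangle)/\tau_n$ |
Mean transcript synthesis rate | [min⁻¹] | $k_{ini}k_{on}/(k_{on}+k_{off})$ | $k_{ini}\langle n\rangle$ |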
Within the context of the two-state model the most intuitive parameterization is given by the kinetic rates $k_{ini}$, $k_{on}$ and $k_{off}$. However, fluctuation analysis of transcriptional activity and the inference approach both revealed that the three independent and uncorrelated variables $k_{ini}$, $\langle n\rangle$ and $\tau_n$ provide a more natural parameterization, in which only $\langle n\rangle$ is modulated, while $k_{ini}$ and $\tau_n$ are both constant. Bursting parameters are clearly identified in both parameterizations.
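To make the two-state picture concrete, the sketch below simulates a single locus with a Gillespie-type algorithm and counts the Pol II molecules still on the gene at the time of observation. Several simplifications are assumptions of this sketch, not statements about the analysis in the STAR Methods: each Pol II occupies the gene for a fixed elongation time `t_elong`, the promoter starts in the OFF state, and every engaged Pol II counts as one unit regardless of probe position. The names `k_ini`, `k_on`, `k_off` denote the initiation and switching rates; the switching rates and the 2 min elongation time are hypothetical values, while 7.2 initiations per minute and the 15 min cycle are taken from the text.

```python
import random


def simulate_pol2_count(k_ini, k_on, k_off, t_elong, t_total, rng):
    """Simulate one locus; return the Pol II count on the gene at t_total.

    A Pol II initiated at time s is still elongating at t_total
    if s > t_total - t_elong. All rates must be positive.
    """
    t, on, count = 0.0, False, 0
    while True:
        rate_switch = k_off if on else k_on   # leave the current promoter state
        rate_init = k_ini if on else 0.0      # initiation only in the ON state
        total = rate_switch + rate_init
        t += rng.expovariate(total)           # waiting time to the next event
        if t >= t_total:
            return count
        if rng.random() < rate_init / total:
            if t > t_total - t_elong:
                count += 1                    # this Pol II is still on the gene
        else:
            on = not on                       # promoter switches state


rng = random.Random(0)
counts = [simulate_pol2_count(7.2, 0.5, 0.5, 2.0, 15.0, rng) for _ in range(2000)]
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / len(counts)
print(mean, variance)
```

With these settings the mean should lie close to the product of initiation rate, occupancy and elongation time, while the variance clearly exceeds the mean, which is the super-Poissonian signature of bursting.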
Our measurements of nascent transcriptional activity represent instantaneous counts of the number of Pol II molecules engaged in transcription, providing a more direct measurement of transcriptional activity compared to counts of mature mRNAs or proteins. The two-state model presents a straightforward and parameter-sparse means to describe how discrete randomly occurring events generate a continuum of expression rates. Assuming the Pol II elongation rate is constant and identical for all gap genes (Garcia et al., 2013; O’Brien and Lis, 1993), this model predicts the dependence of variability on mean activity for different scenarios of parameter modulation. Specifically, it predicts which kinetic rates are modulated to form gene expression boundaries.
Given that the first four cumulants of our data are uniquely determined by the mean activity, we sought to explore modulation of the mean arising from varying a single parameter, where such parameters could consist of combinations of the kinetic rates. When we solve the master equation for such a model (STAR Methods), a comparison of predicted noise (Fig. 3F) with our data (Fig. 3A) eliminates modulation of $k_{ini}$. Indeed, solely varying $k_{ini}$ leads to saturation of noise at high activity, which is not observed. This is true no matter the values of $k_{on}$ and $k_{off}$, which only affect the level of the plateau. Instead, our measurements are consistent with modulation of the fractional mean promoter occupancy $\langle n\rangle$, defined as $\langle n\rangle = k_{on}/(k_{on}+k_{off})$. (Here occupancy refers to the active or “ON” state; thus $\langle n\rangle$ is bounded between zero and one.) This value is the fraction of time spent in the active state and is equivalent to the probability of finding a locus in the active state (Lucas et al., 2013; Xu et al., 2015). Varying $\langle n\rangle$ is the only solution leading to a concave function for the variance observed in the data (Fig. 3B,G; STAR Methods Eq. 9). Modulation of the mean production rate is thus determined by $\langle n\rangle$ rather than the rate at which Pol II molecules enter into productive elongation.
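The contrast between these scenarios can be illustrated with a closed-form noise expression. The sketch below assumes a simplified picture that is not taken from the STAR Methods: initiations form a Poisson process modulated by a two-state promoter, and each Pol II stays on the gene for a fixed time `t_elong`. Under these assumptions the squared coefficient of variation of the Pol II count is $1/\mu + \frac{1-\langle n\rangle}{\langle n\rangle}\,\Phi(T/\tau_n)$ with $\Phi(x) = 2(x - 1 + e^{-x})/x^2$, where $\mu$ is the mean count, $\langle n\rangle$ the occupancy and $\tau_n$ the switching correlation time. All numerical values are hypothetical.

```python
import math


def switching_filter(x):
    """Fraction of switching noise remaining after averaging over x = T / tau."""
    return 2.0 * (x - 1.0 + math.exp(-x)) / x ** 2


def cv2(k_ini, k_on, k_off, t_elong):
    """Squared CV of the steady-state Pol II count on one gene (sketch)."""
    occ = k_on / (k_on + k_off)            # mean promoter occupancy
    tau = 1.0 / (k_on + k_off)             # switching correlation time
    mean = k_ini * occ * t_elong           # mean Pol II count
    return 1.0 / mean + (1.0 - occ) / occ * switching_filter(t_elong / tau)


T = 2.0  # hypothetical elongation time in minutes

# Scenario 1: vary only the initiation rate. The noise in excess of
# Poisson stays constant, so the total noise saturates at a plateau.
for k_ini in (1.0, 10.0, 100.0):
    mean = k_ini * 0.5 * T
    print("k_ini", k_ini, "excess", cv2(k_ini, 0.5, 0.5, T) - 1.0 / mean)

# Scenario 2: vary the occupancy at fixed correlation time (tau = 1 min).
# The excess noise vanishes as the occupancy approaches one.
for occ in (0.1, 0.5, 0.9):
    mean = 7.2 * occ * T
    print("occ", occ, "excess", cv2(7.2, occ, 1.0 - occ, T) - 1.0 / mean)
```

In the first loop the excess term is independent of the initiation rate, which reproduces the plateau described above; in the second it decreases steadily with occupancy, so no noise floor appears at high activity.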
In principle, either or both of the rates $k_{on}$ and $k_{off}$ may be tuned to modulate $\langle n\rangle$. To test which of these scenarios reproduces the noise and the shape of the cumulants (Fig. 3A–D), we first set the value of $k_{ini}$ to match the Poisson background in the data (Fig. 3B, dashed line, STAR Methods). For the special case in which both switching rates are modulated simultaneously, we achieved effective single parameter modulation by fixing the switching correlation time $\tau_n = 1/(k_{on}+k_{off})$, the characteristic time-scale for changes in promoter activity. This quantity reveals how fast the switching occurs, how much time is required for the mean number of Pol II molecules engaged in transcription to reach steady state, and what fraction of the switching noise is filtered by the elongation process (STAR Methods). When $\tau_n$ is fixed, both switching rates, $k_{on}$ and $k_{off}$, are fully determined by $\langle n\rangle$, i.e.
$$k_{on} = \frac{\langle n\rangle}{\tau_n}, \qquad k_{off} = \frac{1-\langle n\rangle}{\tau_n} \qquad \text{(Eq. 1)}$$
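The relation between the two parameterizations follows directly from the definitions of occupancy (fraction of time in the active state) and switching correlation time (inverse of the sum of the switching rates). A minimal sketch of the conversion in both directions is given below; the function names are hypothetical and the numerical values are chosen only for illustration.

```python
def rates_from_occupancy(occ, tau):
    """Return (k_on, k_off) for a given mean occupancy and correlation time."""
    return occ / tau, (1.0 - occ) / tau


def occupancy_from_rates(k_on, k_off):
    """Return (occupancy, correlation time) for given switching rates."""
    total = k_on + k_off
    return k_on / total, 1.0 / total


# At fixed tau, raising the occupancy raises k_on and lowers k_off by the
# same amount, so that their sum stays equal to 1 / tau.
for occ in (0.1, 0.5, 0.9):
    k_on, k_off = rates_from_occupancy(occ, tau=1.5)
    print(occ, k_on, k_off, k_on + k_off)
```

The loop makes explicit that modulation at fixed correlation time requires the two switching rates to change in a coordinated, opposite manner.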
In the three scenarios (tuning $k_{on}$, $k_{off}$ or $\langle n\rangle$), the single free parameter (the remaining switching rate or $\tau_n$) was estimated by fitting the set of modeled cumulants to the data, assuming steady-state Pol II levels (Fig. 3G–I and Fig. S2E). Modulation of $k_{off}$ alone is ruled out, since this does not capture the noise below 10 C.U. (Fig. 3F). However, modulation of $k_{on}$ alone or of $\langle n\rangle$ at fixed $\tau_n$ recapitulates the noise and the cumulants (Fig. 3F–I). Thus, in addition to conserved $k_{ini}$, the model predicts a second conserved quantity across genes and positions, either $k_{off}$ alone or a combination of $k_{on}$ and $k_{off}$.
Finally, we examined whether the fitted cumulants assuming steady state are compatible with the finite duration of the nuclear cycle (15 min). The time during which a gene relaxes from an inactive state devoid of elongating Pol II (start of interphase 13) to steady state is determined by the correlation time $\tau_n$ (Fig. S3A). Since each parameter modulation predicts a different dependency of $\tau_n$ on $\langle n\rangle$ (Fig. S3B), we tested under each scenario whether the mean and the cumulants at mid-cycle would be attained in time. It follows that modulation of $\langle n\rangle$ through $k_{on}$ alone or at fixed $\tau_n$ predicts a time-dependent solution at mid-cycle that is consistent with the steady state assumption above (Fig. S3C–G, STAR Methods). Thus the two-state model explains the data collapse and predicts that tuning only the mean occupancy uniquely describes the formation of expression boundaries regardless of their position in the embryo.
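The role of the correlation time in reaching steady state can be sketched with the mean of the two-state model. Assuming that the promoter starts in the inactive state with no Pol II on the gene, the probability of being active relaxes as $\langle n\rangle(1 - e^{-t/\tau_n})$, and the mean Pol II count is the initiation rate times the integral of this probability over the last elongation time. The fixed elongation time of 2 min and the two correlation times compared below are hypothetical values; the 15 min cycle duration is taken from the text.

```python
import math


def mean_pol2_count(t, k_ini, occ, tau, t_elong):
    """Mean Pol II count on the gene at time t after an OFF, empty start."""
    start = max(0.0, t - t_elong)  # earliest initiation still on the gene
    # Integral of (1 - exp(-s / tau)) over s from `start` to t.
    integral = (t - start) - tau * (math.exp(-start / tau) - math.exp(-t / tau))
    return k_ini * occ * integral


k_ini, occ, t_elong = 7.2, 0.5, 2.0
steady = k_ini * occ * t_elong
mid_cycle = 7.5  # half of a 15 min interphase

fast = mean_pol2_count(mid_cycle, k_ini, occ, 1.0, t_elong) / steady
slow = mean_pol2_count(mid_cycle, k_ini, occ, 10.0, t_elong) / steady
print(fast, slow)
```

A correlation time that is short relative to the cycle brings the mean essentially to its steady-state value by mid-cycle, whereas a correlation time comparable to the cycle duration leaves the mean well below it.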
Transcriptional bursting in absolute units
Further insight into transcriptional mechanisms requires the absolute scales of kinetic parameters. To go beyond arguments based on cumulants, we adopted an approach that is agnostic to the modulation strategy. To resolve whether $k_{off}$ or $\tau_n$ is constant and exclude other non-trivial forms of modulation (i.e., multiple rates changing simultaneously), we inferred all kinetic rates from the full distribution of transcriptional activity, for each gene and at each position independently. We performed dual-color smFISH, tagging the 5’ and 3’ regions of the transcripts with differently colored probe sets that provide two complementary readouts of nascent activity (Fig. 4A) (Brody et al., 2011; Xu et al., 2016). The measured activities are correlated via a finite Pol II elongation time (Fig. S4A–C, STAR Methods) and provide two snapshots of the state of the gene. Jointly measuring the 5’ and 3’ activities constrains the possible configurations of nascent transcripts and Pol II molecules at each locus (Fig. 4B).
Given a stochastic model of transcription, it is possible to extract the transcriptional parameters underlying the activities of each gene (Fig. 4C–D and Fig. S5). Using the two-state model, we calculated the likelihood of the joint distribution of 5’ and 3’ activities at each AP position while accounting for measurement noise (Fig. 4E, STAR Methods). The rate parameters $k_{ini}$, $k_{on}$ and $k_{off}$ for each AP position were inferred from the likelihood of the data according to Bayes’ rule. We sampled the joint posterior distribution of the parameters (Hastings, 1970), which provides a probability for each parameter combination given the observed data. All inferred parameters with respective errors were estimated from the sampled joint posterior distribution (Fig. 4E and S5C). Validating our approach, inference on synthetic data clearly shows that the parameters are identifiable as long as the Pol II elongation rate is measured independently (Fig. S6A–F). Moreover, the previously measured Pol II elongation rate kb/min (Garcia et al., 2013) provides an absolute time scale, enabling inference of endogenous kinetics from chemically cross-linked, inert embryos.
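The general idea of sampling a posterior distribution can be sketched on a deliberately simple problem. The example below runs a random-walk Metropolis sampler on a toy model, a single Poisson rate with a flat prior, and is not the two-color likelihood described above; the function names, the proposal width and the data are all hypothetical and serve only to show the accept/reject logic and how a parameter estimate with its uncertainty is read off the samples.

```python
import math
import random


def log_posterior(rate, counts):
    """Log posterior (up to a constant) of a Poisson rate, flat prior on rate > 0."""
    if rate <= 0:
        return -math.inf
    return sum(c * math.log(rate) - rate for c in counts)


def metropolis(counts, n_steps, step, rng, start=1.0):
    """Random-walk Metropolis sampler; returns the chain of sampled rates."""
    current, current_lp = start, log_posterior(start, counts)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


toy_counts = [6, 8, 7, 9, 5, 7, 8, 6]      # hypothetical observations
chain = metropolis(toy_counts, 20000, 1.0, random.Random(1))
kept = chain[2000:]                        # discard burn-in
post_mean = sum(kept) / len(kept)
post_sd = math.sqrt(sum((x - post_mean) ** 2 for x in kept) / len(kept))
print(post_mean, post_sd)
```

For this toy model the posterior is a gamma distribution, so the sampled mean and spread can be checked against the analytical values; for a multi-parameter model the same chain would yield joint estimates and their correlations.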
The inferred kinetic rates revealed nearly identical modulation across all expression boundaries, regardless of gene identity or boundary position (Fig. 5). Consistent with predictions based on cumulants (Fig. 3), the initiation rate $k_{ini}$ is constant at 7.2 ± 1.0 Pol II initiations per minute and does not change across genes or positions (Fig. 5A). Thus while in the ‘ON’ state, these genes share the same rate-limiting step(s) in the cascade of molecular interactions leading to productive Pol II elongation, as reported for constitutive genes (Choubey et al., 2015). We also observe close agreement between measured and inferred mean activity, as well as good agreement between all other cumulants (Fig. S6G–J). Our inference confirms that all expression boundaries are generated through modulation of the mean promoter occupancy $\langle n\rangle$ (Fig. 5B). This result supports the view that the processes that determine $k_{ini}$ are disfavored as mechanisms for controlling mRNA synthesis rates. Because these rates are determined by $\langle n\rangle$ for all genes, and span a similar dynamic range for all boundaries (Fig. S6K), we advocate that promoter occupancy represents the key control parameter describing expression boundary formation.
Current models of boundary formation imply a careful tuning of multiple input factor concentrations and DNA binding affinities (Briscoe and Small, 2015; Jaeger, 2011). The complexity and diversity of these inputs lead to an intuitive expectation that kinetic switching rates will differ between genes. This expectation seems all the more reasonable given that many combinations of $k_{on}$ and $k_{off}$ generate the same $\langle n\rangle$. Surprisingly, both $k_{on}$ and $k_{off}$ are tightly constrained for all genes and across all boundaries when portrayed as a function of mean occupancy (Fig. 5C, D). This suggests that some combination of $k_{on}$ and $k_{off}$ must be conserved. Indeed, as predicted by the cumulant analysis above, our measurements confirm that the conserved combination is in fact the correlation time of the switching process $\tau_n$, which is roughly constant at all positions over the entire expression range for every gene (Fig. 5E).
Our inference thus revealed that the more natural parameterization of this system is expressed in terms of the three independent, uncorrelated variables {$k_{ini}$, $\langle n\rangle$, $\tau_n$}, in which only $\langle n\rangle$ is modulated (Table 1). The conservation of correlation time implies that $k_{on}$ and $k_{off}$ must be carefully coordinated such that all boundaries emerge from quantitatively identical modulation of switching rates. In addition, these conclusions are unaffected by changes in elongation rate, which only rescale the kinetic parameters (Fig. S6L–N, STAR Methods).
Our observation of constant $k_{ini}$ and $\tau_n$ has several implications. Much prior work has characterized bursting in terms of burst size (the average number of transcripts produced per burst) and burst frequency (which reduces to $k_{on}$ for short burst durations, i.e. small $\langle n\rangle$) (Dar et al., 2012; Siddharth S et al., 2015). Interestingly, by virtue of the constancy of $k_{ini}$ and $\tau_n$, at high activity mainly the burst size changes (Fig. 5F), while at low activity it is the burst frequency that changes (Fig. 5G). These results recapitulate recent observations of frequency modulation (Bartman et al., 2016; Fukaya et al., 2016; Larson et al., 2013; Li et al., 2018; Senecal et al., 2014) and might explain previously observed global trends in burst size (Sanchez and Golding, 2013).
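How burst size and burst frequency respond to occupancy when the initiation rate and the correlation time are held constant can be worked out from the two-state definitions: the burst size is the initiation rate times the mean duration of an active period, and the burst frequency is the inverse of the mean duration of one full OFF-ON cycle. The sketch below evaluates both; the function name and the correlation time of 1 min are hypothetical, and 7.2 initiations per minute is the value quoted in the text.

```python
def burst_parameters(k_ini, occ, tau):
    """Return (burst_size, burst_frequency) for given k_ini, occupancy, tau."""
    k_on = occ / tau
    k_off = (1.0 - occ) / tau
    burst_size = k_ini / k_off                        # initiations per ON period
    burst_frequency = k_on * k_off / (k_on + k_off)   # ON periods per minute
    return burst_size, burst_frequency


k_ini, tau = 7.2, 1.0
for occ in (0.05, 0.1, 0.2, 0.8, 0.9):
    size, freq = burst_parameters(k_ini, occ, tau)
    # size * freq equals the mean synthesis rate k_ini * occ.
    print(occ, round(size, 2), round(freq, 3), round(size * freq, 3))
```

At low occupancy, doubling the occupancy nearly doubles the frequency while the size barely moves; at high occupancy, the size grows steeply while the frequency no longer increases, in line with the two regimes described above.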
Provided all genes become transcriptionally competent at the same time following mitosis (Blythe and Wieschaus, 2015; Blythe et al., 2016), the conserved correlation time we measure here implies that all genes reach steady state simultaneously (Fig. S3C,F). Consistent with prior observations (Dubuis et al., 2013; Garcia et al., 2013), synchronicity suggests that the relative mean synthesis rates are maintained (i.e. unmodulated) across the patterning boundaries during early development (Fig. S3F). In addition, a short correlation time ( min, small relative to the 15 min duration of interphase 13) ensures effective temporal averaging of the switching noise by accumulation of stable transcripts, further suggesting that both expression timing and noise minimization jointly constrain switching rates. These dynamic constraints may be essential for precise and reproducible patterning outcomes, affecting the range of permissible values of $k_{on}$ and $k_{off}$. Together these results show that for the gap genes, the apparently complex process of regulating expression rates is explained by a conceptually simple, shared modulation strategy of bursting kinetics. Our approach opens a path to uncovering general principles to unify the regulation of transcription across genes.
DISCUSSION
A multitude of processes influence eukaryotic transcription rates. It is unclear which events might be more likely than others to determine the kinetics of bursting—either globally or in a gene specific manner. Nor is it known how bursting kinetics compare across endogenous genes over a range of expression levels. Our quantitative bursting measurements reveal that all gap gene expression boundaries arise from the same underlying kinetics regardless of the differences in regulatory elements. Thus, from the complex combination of diverse interactions specific to each gene emerges a simple, common strategy for transcriptional regulation.
Our recognition of shared regulation surfaced only upon development of a highly precise single molecule method of quantification. Conclusions about bursting depend heavily upon understanding sources and extent of measurement error and minimizing variability from extrinsic sources. Extrinsic processes, such as cell growth and division, DNA duplication, and mRNA transport and decay, can significantly affect the apparent variability between cells, and thus also bursting rates (Battich et al., 2015; Halpern et al., 2015; Zopf et al., 2013). We minimize these effects by measuring transcription at nascent sites in an endogenous system with synchronized cell divisions. Moreover, explicit quantification of measurement error resulted in a noise model that significantly constrained our inference framework. All these approaches are generally applicable to enable precise quantification in any system.
The fundamental mean–cumulant relationships we uncovered demonstrate that a single parameter distribution globally determines transcriptional activity (Fig. 3B–D). Employing the telegraph model (Peccoud and Ycart, 1995), we find that the modulation of mean occupancy predicts mean mRNA synthesis rates comparable with previous measurements (Fig. S6O, Garcia et al., 2013) and reproduces the distribution of nascent activity (Fig. S6G–J), whereas $k_{ini}$ and $\tau_n$ are conserved. The global behavior we observe is surprising, given that bursting is generally believed to be gene- and promoter-specific. Multiple factors and processes impinge on bursting rates, including enhancer-promoter interactions, chromatin context, nucleosome occupancy, Pol II pausing, and transcription factor interactions (Bartman et al., 2016; Brown and Boeger, 2014; Carey et al., 2013; Dar et al., 2012; Fukaya et al., 2016; Molina et al., 2013; Senecal et al., 2014; Siddharth S et al., 2015; Suter et al., 2011; Weinberger et al., 2012; Zenklusen et al., 2008). It remains to be determined whether the same processes are modulated in the same manner, or conversely whether different regulatory strategies have converged, to generate identical transcriptional activity across genes.
These observations raise the question of whether the common transcriptional bursting kinetics carry a functional advantage (Eldar and Elowitz, 2010). In early embryos, the precise positioning of cell fates requires minimizing variability between nuclei, which is achieved by a combination of long mRNA lifetimes permitting accumulation and spatial averaging through the syncytial cytoplasm (Little et al., 2013). In principle, modulating $k_{ini}$ at a constitutive promoter would generate the theoretical minimal (Poisson) transcriptional noise at all levels (Sanchez et al., 2013). The fact that neither constitutive activity ($\langle n\rangle = 1$) nor Pol II saturation (bp Pol II footprint) is ever observed suggests that some constraint prohibits this system from maintaining a continuous active state, and/or that it is not straightforward to alter $k_{ini}$. Instead, a constant switching correlation time suggests that this value is important in facilitating robust patterning. We propose that both expression timing and noise minimization jointly constrain switching rates.
The mechanistic origins of the conserved parameters are unknown. One possibility is that protein–DNA affinities have been individually selected to confer the switching rates we observe. However, it is unclear how transient transcription factor interactions, usually on the order of seconds, could generate bursts on the order of minutes (Elf et al., 2007; Izeddin et al., 2014; Karpova et al., 2008). Another possibility is that the fast transcription factor binding kinetics is masked by the slower dynamics of common general factors involved in the transcription process. In fact, recent evidence suggests that Mediator and TBP binding, as well as the core promoter and its shape play a key role in bursting (Li et al., 2018; Schor et al., 2017; Tantale et al., 2016). Alternatively, processes of potentially even slower dynamics such as long-range enhancer–promoter interactions, chromatin modification, or Pol II pausing may determine common bursting kinetics (Chen et al., 2018; Henriques et al., 2018; Nicolas et al., 2018).
The observed constancy of the switching correlation time will guide further modeling and identification of the molecular mechanisms. This constancy is connected to the binomial noise level (STAR Methods, Eq. 9). Extensions of the two-state model must provide similar filtering of the binomial noise, which will restrict the possible class of models. For example, we tested two particular extensions of the two-state model. One possibility is a 3-state model, consisting of a two-step reversible activation (Rieckh and Tkačik, 2014). Alternatively, a model with an additional noise term such as input noise stemming from input transcription factor diffusion (Kaizu et al., 2014; Tkačik et al., 2008) could explain dual modulation of switching rates observed under the two-state model. However, distinguishing these models will require live imaging.
The common transcriptional parameters of the gap genes highlight a form of complexity reduction: despite the variety of upstream regulatory elements, all expression boundaries result from similar bursting kinetics. Whether this signature results from an underlying molecular simplicity has yet to be determined. Regardless of the mechanistic means by which these similarities are achieved, the convergence suggests general constraints that limit the range of permitted bursting rates and/or minimize transcription variability. The unexpected conservation of the initiation rate and the correlation time might indicate a path to general rules in transcriptional regulation. It is now possible to inquire about the breadth of these generalities and whether they apply to the same gene expressed in different cell types, or to the transcriptome as a whole, or even across organisms. Indeed, it appears plausible that other classes of genes share similarly constrained bursting kinetics (Sanchez and Golding, 2013). The methods we utilize here are applicable in a variety of systems and permit the discovery of the molecular mechanism(s) conferring unified transcription kinetics.
STAR METHODS
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Thomas Gregor (tg2@princeton.edu).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Fly strains
Oregon-R (Ore-R) embryos were used as wild-type. Embryos heterozygous for a deficiency spanning hb were collected from crosses of heterozygous adults of the strain w1118; Df(3R)BSC477/TM6C. Heterozygotes of the hb deficiency, as well as wild-type male and female embryos stained for gt, were distinguished from siblings by visual inspection of nascent transcription sites.
METHOD DETAILS
DNA oligonucleotides
Oligonucleotide sequences complementary to the open reading frames of each gene of interest were chosen using the Biosearch Technologies Stellaris RNA FISH probe designer (www.biosearchtech.com/support/tools/design-software/stellaris-probe-designer). Amine-modified oligonucleotides were obtained from Biosearch Technologies, chemically coupled to NHS-ester-Atto565 (Sigma-Aldrich; 72464) or -Atto633 (Sigma-Aldrich; 01464) and purified by HPLC. Probes are listed in Table S1.
smFISH protocol
We modified our smFISH protocol (Little et al., 2013) to minimize background and maximize signal. Embryos were crosslinked in 1x PBS containing 16% paraformaldehyde for 2 minutes before devitellinization. Embryos were washed four times in methanol, 5 minutes per wash, with gentle rocking at room temperature, followed by an extended 30–60 minute wash in methanol. Fixed embryos were then used immediately for smFISH without intervening storage. Embryos were washed three times in 1x PBS, 5 minutes per wash, at room temperature with rocking. Embryos were then washed three times in smFISH wash buffer (Little et al., 2013), 10 minutes per wash, at room temperature. During this time, probes diluted in hybridization buffer (Little et al., 2013) were preheated to 37°C. Hybridization was performed for 1.5 hr at 37°C with vigorous mixing every 15 minutes. During hybridization, smFISH wash buffer was preheated to 37°C. Embryos were washed four times with large excess volumes of wash buffer for 3–5 minutes per wash, rinsed twice briefly in PBS, stained with DAPI, and mounted in VECTASHIELD (Vector Laboratories; H-1000). Imaging was performed within 48 hr to ensure high quality signal.
Imaging
Imaging was performed by laser-scanning confocal microscopy on a Leica SP5 inverted microscope. We used a 63x HCX PL APO CS 1.4 NA oil immersion objective with pixels of nm2 and z spacing of nm. We typically obtained stacks representing μm in total axial thickness starting at the embryo surface. The microscope was equipped with “HyD Hybrid Detector” avalanche photodiodes (APDs) that we utilized in photon counting mode. This is in contrast to our prior approach (Little et al. 2013) in which standard photomultiplier tubes (PMTs) were used to collect two separate smFISH image stacks at two different laser intensities: a low power stack for measuring transcription intensities, and a high power stack to distinguish single mRNAs. The use of low-noise photon-counting APDs in place of standard photomultipliers provided sufficient dynamic range to capture high signal transcription sites and to separate relatively dim cytoplasmic single mRNAs from background fluorescence with a single laser power. This also abrogated the need to calibrate the high- and low-power stacks for comparison. The removal of the calibration step provided an additional reduction in measurement error.
Image analysis
Raw data were processed according to a previously developed image analysis pipeline (Little et al., 2013). Briefly, raw images were filtered using a Difference-of-Gaussians (DoG) filter to detect spot objects. A master threshold was applied to separate candidate spots from background. True point-like sources of fluorescence were identified by their appearance on multiple consecutive z-slices () at the same location. All candidate particles were then labeled as transcription sites, cytoplasmic transcripts, or noise based on global thresholds. The threshold separating cytoplasmic transcripts from noise was defined as the bottom of the valley between the two peaks of the particle intensity distribution. The threshold for transcription sites depended on both intensity and position, as transcription sites cluster in z and are enclosed in nuclei (segmented from DAPI staining). The intensity of transcription sites was obtained by integrating the signal over a fixed cylinder volume (, determined from the objective’s PSF).
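The valley-based threshold between noise and cytoplasmic transcripts can be illustrated with a minimal Python sketch. This is not the pipeline of Little et al. (2013): the function name valley_threshold, the number of bins, and the synthetic intensities are hypothetical choices made for illustration. The sketch histograms the particle intensities and selects the bin lying deepest below the lower of the tallest bins on either side.

```python
def valley_threshold(intensities, n_bins=30):
    """Return the intensity at the valley of a bimodal histogram."""
    lo, hi = min(intensities), max(intensities)
    if hi <= lo:
        raise ValueError("intensities must span a non-zero range")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in intensities:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    # Depth of an interior bin: how far it lies below the lower of the
    # tallest bins found on its left and on its right.
    depths = {}
    for i in range(1, n_bins - 1):
        depths[i] = min(max(counts[:i]), max(counts[i + 1:])) - counts[i]
    deepest = max(depths.values())
    floor_bins = [i for i in sorted(depths) if depths[i] == deepest]
    valley_bin = floor_bins[len(floor_bins) // 2]  # middle of a flat valley floor
    return lo + (valley_bin + 0.5) * width


# Synthetic (hypothetical) particle intensities in arbitrary units:
# a dim noise peak near 100 and a brighter single-mRNA peak near 300.
noise = [100 + 10 * j for j in range(-5, 6) for _ in range(10 * (6 - abs(j)))]
mrna = [300 + 10 * j for j in range(-5, 6) for _ in range(5 * (6 - abs(j)))]
threshold = valley_threshold(noise + mrna)
n_transcripts = sum(1 for x in noise + mrna if x > threshold)
```

With real data the histogram is noisy, so smoothing it before searching for the valley would likely be needed.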
Calibration in absolute units
We calibrated the integrated intensity of transcription sites by first characterizing the relationship between the fluorescence signal and the density of cytoplasmic transcripts. We defined summation volumes in the embryo () avoiding regions of high tissue deformation and excluding transcription site locations. For each summation volume, we counted the number of detected cytoplasmic transcripts and integrated the fluorescence intensity. At low count density, the fluorescence per summation volume scales linearly with density (Little et al., 2013). Fitting a simple linear relationship , where corresponds to background, enables estimation of a scaling factor to calibrate transcription sites in “cytoplasmic units” (C.U.) for each embryo. Namely, the intensity in C.U. is given by where b is the background intensity per pixel in each nucleus. The resulting quantification of transcriptional activity for all gap genes is provided in Supplemental Data.
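The calibration logic can be sketched in a few lines of Python. The exact expression used for the conversion is not reproduced here, so the form below is an assumption: the fluorescence per summation volume is fitted as slope times transcript count plus intercept, and a transcription site is converted to C.U. by subtracting a per-pixel background over the integration volume and dividing by the slope. All names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def site_intensity_cu(site_intensity, background_per_pixel, n_pixels, slope):
    """Assumed conversion: background-subtracted intensity per single mRNA."""
    return (site_intensity - background_per_pixel * n_pixels) / slope


# Hypothetical summation volumes: transcript counts and integrated fluorescence.
counts = [0, 5, 10, 20]
fluorescence = [40.0, 140.0, 240.0, 440.0]
slope, intercept = fit_line(counts, fluorescence)  # fluorescence per mRNA, background

# Hypothetical transcription site: 1000 a.u. over 100 pixels, background 2 a.u./pixel.
activity_cu = site_intensity_cu(1000.0, 2.0, 100, slope)
```

The fit is only meaningful in the low-density regime, where the fluorescence scales linearly with the transcript count.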
Measurement error
Embryo staging
To assess the timing of the embryos, we first manually ranked them based on timing estimates from DAPI staining. We estimated the interphase stage from morphological features of the nuclei (shape and texture) in the DAPI channel. We then verified whether accumulation of cytoplasmic mRNAs correlates with our manual ranking (Fig. S1A). Both approaches lead to similar results and provide a reasonable proxy for timing. By comparing the average activity of the different embryos in the maximally expressed regions with the cytoplasmic density, we assessed the effect of timing on the mean activity (Fig. S1B). We estimated the Pearson correlation coefficient for the different genes and regions (gt anterior and posterior regions). Overall, timing explains up to = 44% of the embryo variability (defined as the variance of the mean activity among embryos ) in the maximally expressed regions (Fig. S1C), with the exception of kni, which is highly correlated . We thus separated the kni embryos into two sub-populations, early and late embryos. We performed the split by finding the cytoplasmic density threshold that minimizes the sum of within-population variances in mean activity. We then calculated the staging variability , defined as the variability in mean activity explained by timing between late and early embryos (Fig. S1D). Given the overall small staging variability , the total mean activity is stable enough to warrant the assumption of steady state.
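The split into early and late embryos can be sketched as an exhaustive search over candidate thresholds. The sketch below is a minimal illustration under stated assumptions: the function name split_by_density, the minimum group size of two embryos, and the example values are hypothetical, and the cost is taken literally as the sum of the two population variances.

```python
from statistics import pvariance


def split_by_density(density, activity, min_size=2):
    """Find the density threshold minimizing the summed within-group variance."""
    pairs = sorted(zip(density, activity))
    best_cost, best_threshold = None, None
    for k in range(min_size, len(pairs) - min_size + 1):
        early = [a for _, a in pairs[:k]]
        late = [a for _, a in pairs[k:]]
        cost = pvariance(early) + pvariance(late)
        if best_cost is None or cost < best_cost:
            best_cost = cost
            # Place the threshold midway between the two neighboring embryos.
            best_threshold = 0.5 * (pairs[k - 1][0] + pairs[k][0])
    return best_threshold, best_cost


# Hypothetical embryos: cytoplasmic mRNA density and mean activity (C.U.).
density = [1, 2, 3, 4, 10, 11, 12, 13]
activity = [5.0, 5.2, 4.8, 5.0, 9.0, 9.2, 8.8, 9.0]
threshold, cost = split_by_density(density, activity)
```

Requiring at least two embryos per group avoids the degenerate solution in which a single embryo forms a zero-variance population.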
Imaging noise model
We quantified measurement noise due to imaging and calibration using a two-color smFISH approach, labeling each mRNA in alternating colors along the length of the mRNA. We included 15 hb embryos in the analysis, which corresponds to approximately 4,000 nuclear activity measurements. We then normalized the activity (fluorescence signal) of the nuclei in cytoplasmic units independently in each channel. In the absence of noise and provided accurate normalization, both channels would correlate perfectly with a slope of one. By plotting one channel against the other (Fig. 2A), we assessed the slope and characterized the spread of the data along the expected line.
We built a simple effective model to describe measurement noise:
(Eq. 2)
where stands for the fluorescent signal in cytoplasmic units and the total nascent transcripts (in C.U.) in the absence of noise. We assumed that the measurement errors were normally distributed and independent in both channels, which was motivated by the absence of correlation in the background. We further assumed that the variance would depend on activities, consistent with the increasing spread observed in the data. In order to estimate the variance specific to each channel, we fitted a straight line assuming error on both and . We expanded the variance as a function of the scalar projection along the line :
Assuming the same error along and , we then maximized the following likelihood to estimate the parameters :
Using the Akaike information criterion, we selected the best model which was parameterized by with . The best fitting parameters were: and (Fig. 2A). The variances in the noise measurement model (Eq. 2) are then given by:
where . The resulting imaging noise is shown in Figure S1E. In the maximally expressed regions, we measure transcriptional activity with an error of and relate it to absolute units with an uncertainty below (the largest deviation of the slope from ). This represents an error reduction by 3- to 4-fold compared to our previous measurements (assuming multiplicative errors; vs ) (Little et al., 2013).
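Fitting a straight line with errors in both channels can be illustrated with orthogonal (Deming) regression. This is a deliberately simplified stand-in for the likelihood described above: it assumes equal and constant error variance in both channels, whereas the noise model here lets the variance grow with activity. The function name orthogonal_fit and the two-channel values are hypothetical.

```python
import math


def orthogonal_fit(x, y):
    """Orthogonal regression of y on x (equal error variance in x and y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("channels are uncorrelated; slope is undefined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


# Hypothetical activities of the same nuclei in two channels (C.U.).
channel_1 = [1.0, 2.0, 3.0, 4.0]
channel_2 = [1.1, 1.9, 3.2, 3.8]
slope_12, offset_12 = orthogonal_fit(channel_1, channel_2)
slope_21, offset_21 = orthogonal_fit(channel_2, channel_1)
```

Unlike ordinary least squares, this fit treats both channels symmetrically, so exchanging the channels inverts the slope.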
Splitting of the total variance
The Anterior-Posterior axis (AP) was determined based on a mid-sagittal elliptic mask of the embryo in the DAPI channel (Little et al., 2013). Position is obtained by registration of high- and low-magnification DAPI images of the surface. We then fitted constrained splines to approximate the mean activity as a function of the AP position. We used different features of the mean profiles such as maxima and inflection points to refine the alignment between the different embryos. Overall, this realignment procedure enables us to estimate an alignment error of the order of egg length.
After alignment, we defined spatial bins along the AP-axis with a width of of egg length. Such a width was a good compromise to balance the sampling and binning error. We next sought to decompose the measured total variance of the transcriptional activity (Fig. 1D) into different components related to imaging, alignment, embryo and nuclei variability (Fig. 2B–D). We first estimated the variability of the mean across embryos in each bin (Fig. 2B); we split the total variance in each bin according to the law of total variance:
where is the total number of embryos and the global mean.
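The decomposition by the law of total variance can be checked numerically with a short Python sketch. The weighting is an assumption: each embryo is weighted by its number of nuclei in the bin, which reduces to equal weights per embryo when all embryos contribute the same number of nuclei. Function and variable names are hypothetical.

```python
from statistics import pvariance


def decompose_variance(groups):
    """Split pooled variance into within-embryo and between-embryo parts."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    within, between = 0.0, 0.0
    for g in groups:
        weight = len(g) / n_total
        mean = sum(g) / len(g)
        within += weight * sum((v - mean) ** 2 for v in g) / len(g)
        between += weight * (mean - grand_mean) ** 2
    return within, between


# Hypothetical activities (C.U.) of nuclei from three embryos in one AP bin.
groups = [[1.0, 2.0, 3.0], [4.0, 6.0], [10.0]]
within, between = decompose_variance(groups)
total = pvariance([v for g in groups for v in g])
```

Here `within` is the average nuclear variability inside embryos and `between` is the variability of the embryo means; their sum recovers the pooled variance.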
Next we aimed to determine what fraction of is explained by residual misalignment. Assuming that all the variability in the mean at boundaries results from spatial misalignment of the different embryos, one can find an upper bound on the residual alignment error :
where is the global mean profile as a function of AP position . For each gene, we estimated the residual alignment error required to explain as much embryo variability as possible (Fig. S1F, diagonal dashed line). Overall, we found that is of the order of egg length. The total embryo variability in the maximally expressed regions cannot be explained by misalignment as and leads to a noise floor (Fig. S1F, horizontal dashed line). This noise floor can be partly explained by variability in the stage (early versus late interphase) of the different embryos (Fig. S1C–D). In the following we thus split where is the residual embryo-to-embryo variability.
Finally, we assessed what fraction of the total variance corresponds to combined measurement noise where was estimated above (STAR Methods, Imaging noise model). Total measurement noise remains below of the total variance for all genes and all positions (Fig. S1G), and on average reaches . The remaining variability corresponds to biological variability where is the nuclei variability and was defined as:
Overall, the non-nuclear variability remains below of the total variance for all genes and all positions (Fig. S1H), and on average reaches . Thus, the nuclei variability largely dominates in our data and represents of the total variance on average (Fig. 1E–F).
Single parameter distribution of transcriptional activity
Noise-mean relationship in the FISH data
In practice, we measure transcriptional activity in cytoplasmic units (intensity in equivalent number of fully elongated transcripts) and not in Pol II counts directly. The measured mean activity in cytoplasmic units is proportional to the mean Pol II counts for a single gene copy , i.e. where is a conversion factor accounting for the FISH probe locations on the gene and the number of gene copies (for most gap genes , except for gt male and hb deficient, which only have 2 copies). Assuming independence of loci, the measured variance follows a similar relationship, i.e. with . The conversion factors and are constants that are unique to each gene and are calculated below (STAR Methods, Conversion factor for Pol II counts).
As we will see later (Eq. 9), one can derive the following functional form for the variance in Pol II counts for a single gene copy:
where is the maximal mean Pol II counts on the gene that is determined by the Pol II initiation rate and elongation time , and a quantity that is related to the dynamics of the promoter activity and bounded . Of note, for a constitutively expressed gene such that the variance reduces to (Poisson variance). In principle, the values of both and are gene-specific and could have specific dependency on . The interpretation of the equation above and the quantities and is discussed in greater detail below (STAR Methods, Two-state model of transcriptional activity). Using the relationships between the cytoplasmic units and Pol II counts for the mean and variance above, we can express the measured noise as:
where is the maximal mean expression level in cytoplasmic units. In practice, and (Table S2, 5’ probe location) such that the Poisson noise background in cytoplasmic units is approximately . By setting , we further simplify the equation above and obtain:
(Eq. 3)
with . By assuming and constant, we found that the above noise-mean relationship (Eq. 3) captures the overall trend in the data well (Fig. 3A), with and ). Although both gt male and hb deficient follow a similar trend, they deviate from the black line, () and () respectively. Interestingly, despite the fact that and could a priori be gene-specific, is roughly conserved across genes and differences in can be explained by variation in gene copies ( copies for gt male and hb deficient instead of ) and gene length (gt is shorter than hb, Table S3). This suggests that some key quantities underlying transcription are conserved among the gap genes and can be highlighted by proper normalization of the measured activity.
Normalized cumulants for a single gene copy
To further investigate the transcriptional commonalities of the gap genes, we calculated the 2nd, 3rd and 4th cumulants from the data (Fig. 3B–D). For independent random variables, cumulants are extensive (the cumulants of a sum equal the sum of the cumulants), which is convenient as the measured transcriptional activities result from the sum of or independent gene copies. We first converted the th cumulants computed from the data in cytoplasmic units to Pol II counts (or number of nascent transcripts) for a single gene copy with a normalized gene length:
where is the th cumulant in Pol II counts for a single gene copy, the gene length, the gene copy number ( for most genes, except gt male and hb deficient that only have copies) and a conversion factor for the th cumulant to ensure proper normalization of the Poisson background (Eq. 4 and Table S2). The annotated gene length varies between to kb for the gap genes. In the following we used an effective gene length that is slightly larger and takes into account the possible lingering of fully elongated transcripts at the loci (Table S3). This effective gene length can be estimated from the dual color FISH data (STAR Methods, Dual color smFISH and effective gene length). For the normalization, we used a normalized gene length of kb.
We then fitted a second-order polynomial of the mean activity to the variance (Fig. S2A and Fig. 3B) in order to estimate the maximal activity , which was defined as the second crossing point between the Poisson background (Fig. S2A, dashed line) and the fitted variance (solid line). We found Pol II for a normalized gene length of kb. Similarly, we fitted 3rd and 4th order polynomials of the mean activity to the cumulants and (Fig. 3C–D), constrained to reach the Poisson limit at . Of note, the cumulants of the Poisson distribution are all equal to the mean. As observed in Figure 3B–D, the polynomial fits (solid lines) capture the main trend in the data, suggesting a simple relationship between the cumulants and the mean. It follows that the underlying activity distribution is essentially a universal single-parameter distribution whose parameter is the mean activity. To test the extent of the universality, we repeated the analysis above for each gap gene individually (Fig. S2B–D). The individual fits (colored solid lines) remain relatively close to each other. Although the fits for hb slightly deviate from the other genes, the global shape of the cumulants is conserved.
Conversion factor for Pol II counts
As mentioned above, the cumulants of the transcriptional activity in cytoplasmic units are related to the cumulants in number of nascent transcripts or Pol II counts on the gene by conversion factors . We calculated these conversion factors to ensure proper normalization of the Poisson background, meaning that the conversion of cumulants in C.U. for a constitutive gene would yield the correct cumulants in Pol II counts. Knowing the exact location of the fluorescent probe binding regions along the gene, one can calculate the contribution of a single nascent transcript to the signal in C.U. as a function of its length :
where is the unit step function, the end position of the probe binding region and the total number of probes. Here, we assumed that each fluorescent probe contributes equally to the signal and that each transcribed probe region is bound. The number of probes bound to a transcript of length is given by and will be denoted for with the length of a fully elongated transcript. The total fluorescent signal in cytoplasmic units for transcripts is given by
where , with the number of transcripts whose length belongs to the length interval . Assuming that follows a Poisson distribution with parameter where , the mean fluorescent signal is then given by
where and the conversion factor that relates the mean number of transcripts to the mean fluorescent signal in cytoplasmic units. This relation remains valid for the two-state model with (Eq. 8).
As for the mean, one can calculate the conversion factors for the higher moments and cumulants assuming a Poisson background. The second moment is given by
where since initiation events are assumed independent. This only holds for the Poisson background and is no longer exact for the two-state model as the switching process would introduce correlations. Nevertheless, the conversion factors for the higher moments and cumulants calculated below remain a good approximation under the two-state model, provided most probes are located in the 5’ region. The variance of the signal is finally given by
The calculation above can be generalized to the 3rd and 4th cumulants. We found the following correction factor for the Poisson background:
(Eq. 4)
Calculated values of for each gene and two different configurations of probe locations (5’ or 3’ region) are given in Table S2.
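The conversion factors for a Poisson background can be sketched numerically. The sketch rests on two assumptions that should be checked against Eq. 4: nascent transcripts are uniformly distributed along the gene, and the signal of a transcript is the fraction of probe regions already transcribed. For independent Poisson counts, the cumulant of order k of the weighted sum is then the mean transcript number times the average of that fraction raised to the power k. Probe positions and gene length below are hypothetical.

```python
def conversion_factor(probe_ends, gene_length, order):
    """Average of s(l)**order over the gene, with s(l) the bound probe fraction."""
    ends = sorted(probe_ends)
    n_probes = len(ends)
    total = 0.0
    for j, start in enumerate(ends, start=1):
        stop = ends[j] if j < n_probes else gene_length
        # Between the j-th probe end and the next one, j probes are bound.
        total += (stop - start) * (j / n_probes) ** order
    return total / gene_length


# Hypothetical 3 kb gene with two probe regions ending at 0 bp and 1500 bp.
factors = [conversion_factor([0.0, 1500.0], 3000.0, k) for k in (1, 2, 3, 4)]

# With all probes at the 5' end, every nascent transcript is fully labeled.
five_prime = conversion_factor([0.0, 0.0, 0.0], 3000.0, 2)
```

Probes placed near the 5' end give factors close to one at every order, consistent with the remark that the approximation is best when most probes lie in the 5' region.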
Two-state model of transcriptional activity
Master equation
Transcriptional activity of a single gene copy was modeled as a telegraph process (on-off promoter switching) with transcript initiation occurring as a Poisson process during the ‘on’ periods (Peccoud and Ycart, 1995). Within the two-state model (Figure 3E), the distribution of nascent transcripts on a gene results from random Pol II initiation in the active state coupled with elongation and termination (Choubey et al., 2015; Senecal et al., 2014; Xu et al., 2016). For simplicity, we combined elongation and termination as an effective process that was modeled as a deterministic progression (constant Pol II elongation rate). In addition, we assumed that all the kinetic rates of the model are constant in time and identical across embryos. The kinetic parameters of the model are the initiation rate , the promoter switching rates and , and the elongation time .
The master equation that governs the temporal evolution of nascent transcripts at loci is given by
(Eq. 5)
with the number of nascent transcripts (or alternatively the number of Pol II) on the gene and the promoter state. We used the convention that and correspond to the ‘on’ state and ‘off’ state respectively, and the following periodic conditions and . Here, stands for the Kronecker delta since initiation only occurs in the active state. Of note, we only considered the promoter switching and the initiation of elongation (Eq. 5); we did not explicitly model release of transcripts after termination. The rationale is the following: only the initiation events occurring during the time interval contribute to the signal at time , i.e., the elongation time determines the ‘memory’ of the system. This is correct as long as the release events are instantaneous and termination is fast compared to elongation. Thus, the dynamics of nascent transcript accumulation on the gene for is obtained by solving the master equation with zero initial transcripts on the gene and an arbitrary initial distribution of promoter states.
Summary statistics
We can derive the temporal evolutions of the central moments from the master equation (Eq. 5) (Lestas et al., 2008; Sanchez and Kondev, 2008). The means of nascent transcripts and promoter states satisfy the following equations:
(Eq. 6)
At steady state (), the mean occupancy of the promoter is simply given by . Similarly, the covariance satisfies the following set of equations:
(Eq. 7)
Assuming zero initial transcripts and promoter at steady state, one can solve both the mean and variance for . Thus, the initial conditions are given by , , and . Solving these equations (Eq. 6 & 7) for the elongation time leads to:
(Eq. 8)
(Eq. 9)
where is the maximal mean nascent transcript number or equivalently the mean number of transcripts in a constitutive regime (gene always ‘on’) and a noise filtering function that takes into account the fluctuation correlation times. Here, the relevant time scales are the elongation time and the promoter switching correlation time . The variance is the sum of two contributions, the Poisson variance stemming from the stochastic initiation of transcripts and the propagated switching noise:
For deterministic elongation, we find that the noise filtering function is given by:
In the limit of fast and slow promoter switching respectively, the noise filtering function reduces to
Thus, the noise is minimal in the fast switching regime and reaches the Poisson limit . In the slow switching regime , by contrast, none of the switching noise is filtered and the variance is described by a second-order polynomial of the mean occupancy , i.e. . Of note, for exponentially distributed transcript lifetimes, such as those of cytoplasmic mRNAs subject to degradation, the results above remain valid except that the noise filtering function becomes with the average lifetime of the transcripts.
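The behavior of the noise filtering function in both limits can be explored with a short Python sketch. The closed forms below are the standard results for a Poisson process driven by a telegraph signal, counted over a fixed window (deterministic elongation) or with exponential lifetimes; they are assumed, not confirmed by the text as reproduced here, to correspond to the expressions above. The symbol names (g0 for the maximal mean transcript number, tau_e, tau_n, tau_m) and the numerical values are illustrative.

```python
import math


def noise_filter(tau_e, tau_n):
    """Assumed filtering function for deterministic elongation, x = tau_e/tau_n."""
    x = tau_e / tau_n
    if x < 1e-4:
        return 1.0 - x / 3.0  # series expansion avoids numerical cancellation
    return 2.0 * (math.exp(-x) + x - 1.0) / x ** 2


def noise_filter_exponential(tau_m, tau_n):
    """Assumed filtering function for exponentially distributed lifetimes."""
    return tau_n / (tau_n + tau_m)


def nascent_variance(g0, occupancy, tau_e, tau_n):
    """Poisson term plus filtered binomial (switching) term."""
    binomial = g0 ** 2 * occupancy * (1.0 - occupancy)
    return g0 * occupancy + binomial * noise_filter(tau_e, tau_n)


# Hypothetical values: g0 = 10 Pol II, elongation time 2 min.
slow = nascent_variance(10.0, 0.5, 2.0, 1e6)   # slow switching: full binomial term
fast = nascent_variance(10.0, 0.5, 2.0, 1e-6)  # fast switching: Poisson limit
```

In the slow switching example the variance approaches the unfiltered second-order polynomial of the occupancy, and in the fast switching example it approaches the mean.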
Following the same approach as above, higher-order moments and cumulants were calculated analytically from the master equation (Eq. 5). The 2nd and 3rd cumulants are equal to the corresponding central moments, while higher-order cumulants can be expressed as a combination of central moments. The 4th cumulant is given by , where is the 4th central moment and the variance. Assuming the promoter is at steady state, we solved the equations for the 3rd and 4th moments of and derived the following analytical expressions for the 3rd and 4th cumulants, and :
(Eq. 10)
(Eq. 11)
where and are noise filtering functions that vanish in the fast switching regime () and tend to one in the slow switching regime ():
The above expressions for the cumulants are exact and were tested numerically. The cumulants are polynomials of the mean promoter activity , which follows from the propagation of the binomial cumulants from the switching process. Since the cumulants are extensive, the cumulants for independent gene copies are obtained by multiplying by the expression for a single gene copy (Eq. 9, 10 & 11).
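Two properties used throughout this analysis can be verified numerically: the cumulants of a Poisson distribution all equal its mean, and cumulants add for independent gene copies. The Python sketch below computes the first four cumulants from a probability mass function, using the relation that the 4th cumulant is the 4th central moment minus three times the squared variance. Function names and the Poisson parameter are hypothetical.

```python
import math


def cumulants(pmf):
    """First four cumulants of a distribution on 0, 1, 2, ... given its pmf."""
    mean = sum(k * p for k, p in enumerate(pmf))
    mu2 = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    mu3 = sum((k - mean) ** 3 * p for k, p in enumerate(pmf))
    mu4 = sum((k - mean) ** 4 * p for k, p in enumerate(pmf))
    return mean, mu2, mu3, mu4 - 3.0 * mu2 ** 2


def poisson_pmf(lam, n_max):
    """Poisson probabilities for 0..n_max, built iteratively."""
    p = math.exp(-lam)
    pmf = [p]
    for k in range(1, n_max + 1):
        p *= lam / k
        pmf.append(p)
    return pmf


def convolve(a, b):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


single_copy = poisson_pmf(3.0, 80)                 # hypothetical constitutive gene
one = cumulants(single_copy)
two = cumulants(convolve(single_copy, single_copy))  # two independent copies
```

The same `cumulants` function applies to any numerically computed distribution of Pol II counts, not only to the Poisson case.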
Cumulant analysis
Noise-mean relationship and cumulants predicted by the two-state model
Within the context of the two-state model, we tested whether any transcriptional parameter modulations could explain the global trends in the noise and the cumulants (Figure 3A–D). Since we showed based on the cumulants that the distribution of activity is a single parameter distribution, we restricted the analysis to single parameter modulations of the mean activity (Fig. 3F–I). It is worth mentioning a few important observations that will simplify this task.
First, we see by close inspection of the steady state cumulants (Eq. 9, 10 & 11) that sets the scale, i.e. all parameters are defined with respect to . In practice, the cumulants only depend on the three following independent parameters and . Thus, there is some freedom to set the scale of these rates. Here, we used min, which is approximately the Pol II elongation time for the normalized gene length ( kb and kb/min; Garcia et al., 2013) and is considered fixed. Second, the magnitude of determines whether the Poisson (first term ) or the binomial (second term ) component dominates in the expression of the variance (Eq. 9). We immediately see that increasing the mean Pol II number on the gene by only modulating cannot explain the data, since it would lead to a monotonic increase of the variance whereas the observed trend is concave with a global maximum at mid-expression levels. The only way of achieving such a trend is by modulating provided the binomial term dominates the Poisson one. This condition implies that has to be sufficiently large for intermediate values of , i.e. . Alternatively, if is known, this inequality sets some constraints on the possible values of . Third, it is possible to give an estimate of from the polynomial fit of the measured variance (Fig. S2A and Fig. 3B). The second intercept of the fitted curve (black line) with the Poisson background (dashed line), which should occur at , allows us to estimate . Assuming is maintained constant as is modulated, we have , which gives min−1 for min (see above).
We then investigated three different types of single-parameter modulation to vary the mean Pol II number consistent with the observations above, namely, modulations of the mean occupancy from to by either varying alone, alone or both and while keeping the switching correlation time constant. The latter modulation also corresponds to a single-parameter modulation since and are then fully determined by . For each of these three types of modulation, one parameter is free (either or ) and sets the amplitude of the cumulants (Fig. S2E). In order to infer these free parameters, we fitted (maximum likelihood) the measured cumulants with the modeled ones (Eq. 9, 10 & 11) predicted by each modulation strategy (Fig. 3G–I). We found:
modulation: min−1 and
modulation: min−1 and
modulation at fixed min with and
We then calculated the noise-mean relationship (Eq. 3). We also show an example of a modulation alone (Fig. 3F, gray line); regardless of the values of and , this modulation cannot reproduce the trend in the data, as explained above. The modulation of alone (green line) fails to capture the noise at low expression (Fig. 3F). On the other hand, both the modulation of alone (blue line) and at constant (red line) provide good qualitative agreement with the data (Fig. 3F–I). As mentioned above, it is important to keep in mind that the units of and estimated here depend on the value of the elongation rate. Here, we used a conservative estimate of kb/min (Garcia et al., 2013), which is possibly too small for the gap genes (Fukaya et al., 2017). A different elongation rate would simply imply a rescaling of the rates and the correlation time without affecting the fitting results (STAR Methods, Effect of elongation rate on inference). Namely, the mean occupancy would remain unchanged while the rates would be rescaled by a factor and the correlation time by , where corresponds to the new elongation rate.
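The rescaling argument can be made concrete with a small Python sketch. It assumes the standard telegraph-model definitions (mean occupancy as the on rate divided by the sum of the switching rates, correlation time as the inverse of that sum) and that all rates scale with the ratio of new to old elongation rate. Parameter names and values are hypothetical and are not the fitted values.

```python
def rescale_rates(k_ini, k_on, k_off, k_elo_old, k_elo_new):
    """Rescale rates (per min) inferred with k_elo_old to a new elongation rate."""
    factor = k_elo_new / k_elo_old
    return k_ini * factor, k_on * factor, k_off * factor


def dimensionless_summary(k_ini, k_on, k_off, gene_length_kb, k_elo):
    """Quantities that determine the steady-state cumulants."""
    tau_e = gene_length_kb / k_elo       # elongation time (min)
    tau_n = 1.0 / (k_on + k_off)         # switching correlation time (min)
    occupancy = k_on / (k_on + k_off)    # mean fraction of time 'on'
    return occupancy, k_ini * tau_e, tau_n / tau_e


# Hypothetical rates inferred with 1.5 kb/min, re-expressed for 3.0 kb/min.
old = (5.0, 0.3, 0.7)
new = rescale_rates(*old, k_elo_old=1.5, k_elo_new=3.0)
before = dimensionless_summary(*old, gene_length_kb=3.0, k_elo=1.5)
after = dimensionless_summary(*new, gene_length_kb=3.0, k_elo=3.0)
```

Because the occupancy, the maximal mean transcript number, and the ratio of correlation time to elongation time are unchanged, the fitted cumulants are unaffected by the choice of elongation rate.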
Time-dependent cumulant analysis
Next, we investigated whether the single-parameter modulations fitted above under the steady-state assumption are consistent with the finite duration of the nuclear cycle (approximately 15 min in nc13). Namely, assuming all the data were taken at mid cycle, we asked under each modulation scenario whether steady state could be reached in a timely manner (mid cycle), as supported by our staging analysis (Fig. S1A–D) and other studies (Garcia et al., 2013). The relaxation time to steady state is determined by the switching correlation time . By solving the equation for the temporal evolution of the mean Pol II number (Eq. 6) with initial conditions (no Pol II on the gene) and (gene initially ‘off’), one finds:
As mentioned above, the relaxation of the mean to its steady state value is determined by the correlation time through the exponential factor . As increases, the relaxation gets slower and slower (Fig. S3A). It follows that the finite duration of nc13 should set some upper bound on the possible value of . According to Fig. S3A, should not exceed 3 min for to reach approximately of the maximum activity ( for ) at mid cycle as observed in the data (Fig. 3B).
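The relaxation argument above can be illustrated with a minimal sketch. It assumes the standard two-state result that a promoter starting in the 'off' state relaxes exponentially to its steady-state occupancy with the switching correlation time, and that elongation is deterministic so that only polymerases initiated within the last elongation time remain on the gene. The names `n_ss`, `tau`, `k_ini` and `t_elong` are illustrative labels introduced here, not the notation of the original equations.

```python
import math


def occupancy(t, n_ss, tau):
    """Mean promoter occupancy at time t for a gene that starts 'off'."""
    return n_ss * (1.0 - math.exp(-t / tau))


def mean_nascent(t, n_ss, tau, k_ini, t_elong):
    """Mean Pol II number on the gene at time t, starting from an empty gene.

    Only polymerases initiated during the last t_elong minutes are still on
    the gene, so the mean is k_ini times the integral of the occupancy over
    the window [max(0, t - t_elong), t].
    """
    a = max(0.0, t - t_elong)
    integral = n_ss * ((t - a) + tau * (math.exp(-t / tau) - math.exp(-a / tau)))
    return k_ini * integral


def fraction_of_steady_state(t, tau, t_elong):
    """Fraction of the steady-state mean Pol II number reached at time t."""
    # n_ss and k_ini cancel in the ratio, so unit values are used.
    return mean_nascent(t, 1.0, tau, 1.0, t_elong) / t_elong
```

For example, `fraction_of_steady_state(7.5, tau, t_elong)` can be scanned over `tau` to see how larger correlation times delay the approach to steady state at a chosen observation time.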
Each of the three single-parameter modulations fitted above predicts a different dependency of on the mean occupancy (Fig. S3B). Importantly, these values of were obtained for kb/min (Garcia et al., 2013). A larger elongation rate would lead to smaller correlation times (Fukaya et al., 2017) (STAR Methods, Effect of elongation rate on inference). The main benefit of using a potentially smaller elongation rate is that it provides a stronger guarantee that the time-dependent solution reaches steady state in time (as the relaxation is slower). For each modulation (Fig. S3C), we estimated what fraction of the steady state value is attained as a function of at mid cycle ( min). It turns out that the modulation clearly fails to reach steady state in time for higher occupancy, whereas both modulation of and at fixed cover the measured range of activity at mid cycle ( to of ). Each modulation predicts different boundary formation dynamics (Fig. S3D–F). For , the highly expressed regions (large ) relax much faster than the lowly expressed ones (small ), whereas for it is the opposite. Interestingly, at fixed , each position relaxes in synchrony and the activity ratio between them is conserved. The latter modulation appears more consistent with previous experimental observations (Dubuis et al., 2013; Garcia et al., 2013).
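How the correlation time depends on occupancy under each strategy can be sketched as follows. This assumes the usual two-state definitions, in which the mean occupancy equals the on-rate divided by the sum of the on- and off-rates and the correlation time equals the inverse of that sum. The function names and the arguments `k_on`, `k_off` and `tau` are labels chosen here for illustration.

```python
def modulate_on_rate(n, k_off):
    """Vary the on-rate at fixed off-rate to reach occupancy n (0 < n < 1)."""
    k_on = n * k_off / (1.0 - n)
    return k_on, k_off, 1.0 / (k_on + k_off)


def modulate_off_rate(n, k_on):
    """Vary the off-rate at fixed on-rate to reach occupancy n (0 < n < 1)."""
    k_off = k_on * (1.0 - n) / n
    return k_on, k_off, 1.0 / (k_on + k_off)


def modulate_at_fixed_tau(n, tau):
    """Vary both rates so that the correlation time tau stays constant."""
    return n / tau, (1.0 - n) / tau, tau
```

Under these assumptions the correlation time shrinks with occupancy when only the on-rate varies, grows with occupancy when only the off-rate varies, and is identical at all positions in the third case, which is the property behind the synchronous relaxation described above.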
Next, we investigated the shape of the higher order time-dependent cumulants. Although the higher order time-dependent cumulants can be calculated from the moment equations, their analytical expressions are cumbersome. Alternatively, one can calculate the time-dependent cumulants directly from the time-dependent distribution of Pol II , which is easily computed numerically. With the same initial condition as the mean above, the time-dependent distribution of Pol II is given by:
where is the propagator of the telegraph model (STAR Methods, Distribution of nascent transcripts, Eq. 13) and the propagator of the switching process alone:
We then computed the 2nd, 3rd and 4th time-dependent cumulants from for each fitted modulation (Fig. S3G). Provided the elapsed time is sufficiently large compared to the correlation time and the elongation time, the time-dependent cumulants closely follow the steady state solution. Thus, both the modulation of alone and at fixed fitted assuming steady state predict time-dependent mean versus cumulant curves at mid cycle ( min) that are consistent with the data. In addition, under these conditions, the time-dependent mean activity closely reflects the time-dependent mean occupancy :
Together, this implies that even away from steady state, provided the elapsed time is sufficiently large (), the inference based on steady state solutions should yield good estimates of the parameters. Indeed, for fixed , the relationships between the mean and the cumulants at steady state are uniquely determined by , and . As long as the time-dependent cumulants run along the steady state curves (Fig. S3G), the estimation of and will be correct while the estimation of the mean occupancy will in fact correspond to the instantaneous mean occupancy as .
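Computing cumulants directly from a numerically obtained distribution, as described above, takes only a few lines. The sketch below is generic: it takes any probability mass function over the number of nascent transcripts and returns the first four cumulants through the central moments. The Poisson helper is included only as a sanity check, since all cumulants of a Poisson distribution equal its mean.

```python
import math


def cumulants(pmf):
    """First four cumulants of a distribution given as pmf[g] = P(g)."""
    mean = sum(g * p for g, p in enumerate(pmf))
    mu2 = sum((g - mean) ** 2 * p for g, p in enumerate(pmf))
    mu3 = sum((g - mean) ** 3 * p for g, p in enumerate(pmf))
    mu4 = sum((g - mean) ** 4 * p for g, p in enumerate(pmf))
    # kappa_2 and kappa_3 equal the central moments; kappa_4 needs a correction.
    return mean, mu2, mu3, mu4 - 3.0 * mu2 ** 2


def poisson_pmf(lam, n_max):
    """Truncated Poisson distribution, used here only to check cumulants()."""
    pmf = [math.exp(-lam)]
    for g in range(1, n_max + 1):
        pmf.append(pmf[-1] * lam / g)
    return pmf
```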
Inferring transcription kinetics of endogenous genes from dual color smFISH
Dual color smFISH and effective gene length
We performed dual-color smFISH tagging the 5’ and 3’ regions of the transcripts with different probe sets (Figure 4A, Table S1). After normalization in cytoplasmic units, both channels offer a consistent readout of the mean and the variability (Figure S4A,B). For each gene, given the 5’ and 3’ FISH probe configurations and assuming constant elongation rate, we calculated the expected ratio of 3’ over 5’ signal according to Eq. 4 using the annotated gene length (Fig. S4C–D, Table S3). The predicted ratios are consistent with the measured ones, albeit with small deviations likely stemming from termination (Fig. S4E). This suggests that nascent transcripts might be retained at transcription sites for a short duration. We then calculated for each gene, the effective length that would be consistent with the measured ratio (Fig. S4F, Table S3). Assuming an elongation rate kb/min (Garcia et al., 2013), we estimated the lag consistent with the length difference between the effective and annotated length (Fig. S4F inset). Nascent transcripts remain at the loci for at most s, which remains small compared to the typical elongation time for the gap genes min. In this study, we used the effective elongation time for each gene that includes the short lingering time, which was calculated from the effective gene length.
The two channels enable estimation of the total nascent transcripts (5’ channel) and the fractional occupancy of transcripts along the 5’ and 3’ portions of the gene at each locus (Figure 4B,C). Because the 5’ and 3’ activities are temporally correlated through the elongation process, additional information about transcription can be extracted that is not available with a single channel/color (Figure 4B,D). Combining measurements from multiple embryos (Figure 4C,D), we selected nuclei at similar positions (bins of 2.5% egg length) to generate the joint distribution of 5’ and 3’ activity across AP position bins (Figure 4D).
Distribution of nascent transcripts
Modeling the joint distribution of 5’ and 3’ activity based on the two-state model first requires calculating two key distributions, namely the steady-state distribution of nascent transcripts (or Pol II number) on the gene and the propagator that describes the temporal evolution of an arbitrary distribution of nascent transcripts. Both distributions can be derived from the master equation (Eq. 5). Although the master equation can be solved using generating functions (Xu et al., 2016), we followed another route that can be easily extended to multi-state systems and remains computationally tractable. The master equation can be written in terms of an operator containing the propensity functions of the different reactions:
After appropriate truncation on the transcript number (setting an upper bound for the maximum number of nascent transcripts) (Munsky and Khammash, 2006), the operator can be written in terms of a sum of tensor products of different matrices:
(Eq. 12)
with standing for the identity matrix of size where is the maximum number of transcripts after truncation. The matrix encodes the rates of the possible transitions for the two-state promoter and indicates in which promoter state initiation occurs:
while describes the initiation of transcripts:
The propagator of the resulting finite system can be expressed as a matrix exponential of the operator:
(Eq. 13)
where stands for the set of kinetic parameters . Although the propagator explicitly depends on the kinetic parameters, we chose to omit in the following for readability. The propagator dictates how an initial joint distribution of transcript and promoter state evolves after time in :
The distribution of nascent transcripts for a gene of length is typically calculated using the propagator above with the elongation time and the initial conditions. Since sets the ‘memory’ of the system, can be calculated with initially zero nascent transcript on the gene and is then given by:
(Eq. 14)
where specifies the initial distribution of promoter state. The distribution can be computed efficiently by directly estimating the action of the initial vector on the matrix exponential (Sidje, 1998). Assuming the promoter at steady state, is then given by:
with the mean occupancy .
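The truncated-operator calculation described above can be sketched in a few lines for a single gene copy. This is not the implementation used in the study: instead of a Krylov-based matrix exponential (Sidje, 1998), the sketch applies uniformization, a different but exact series method for continuous-time Markov chains that needs no linear algebra library. The names `k_on`, `k_off`, `k_ini`, `t_elong` and `n_max` are labels chosen here, and the truncation simply blocks initiation out of the last state so that probability is conserved.

```python
import math


def apply_uniformized_step(off, on, k_on, k_off, k_ini, lam):
    """Apply B = I + Q/lam to the joint distribution (off[g], on[g])."""
    n_max = len(off) - 1
    new_off = [0.0] * (n_max + 1)
    new_on = [0.0] * (n_max + 1)
    for g in range(n_max + 1):
        ini = k_ini if g < n_max else 0.0  # no initiation out of the last state
        new_off[g] += off[g] * (1.0 - k_on / lam) + on[g] * k_off / lam
        new_on[g] += on[g] * (1.0 - (k_off + ini) / lam) + off[g] * k_on / lam
        if g < n_max:
            new_on[g + 1] += on[g] * k_ini / lam
    return new_off, new_on


def nascent_distribution(k_on, k_off, k_ini, t_elong, n_max=80, tol=1e-12):
    """Distribution of nascent transcripts after one elongation time.

    Starts from an empty gene with the promoter at steady state and sums the
    uniformization series exp(Q t) p0 = sum_k Poisson(k; lam*t) B^k p0.
    """
    p_on = k_on / (k_on + k_off)
    off = [0.0] * (n_max + 1)
    on = [0.0] * (n_max + 1)
    off[0], on[0] = 1.0 - p_on, p_on
    lam = max(k_on, k_off + k_ini)  # bound on the exit rate of any state
    mean_jumps = lam * t_elong
    weight = math.exp(-mean_jumps)
    acc_off = [weight * x for x in off]
    acc_on = [weight * x for x in on]
    total, k = weight, 0
    while 1.0 - total > tol and k < 100000:
        k += 1
        off, on = apply_uniformized_step(off, on, k_on, k_off, k_ini, lam)
        weight *= mean_jumps / k
        total += weight
        for g in range(n_max + 1):
            acc_off[g] += weight * off[g]
            acc_on[g] += weight * on[g]
    # Marginalize over the promoter state.
    return [acc_off[g] + acc_on[g] for g in range(n_max + 1)]
```

A quick consistency check is that the mean of the returned distribution equals the initiation rate times the mean occupancy times the elongation time, and that the variance exceeds the mean, as expected for a bursty source.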
Provided each gene copy is independent and indistinguishable, the combination of two and four gene copies can be represented by three- and five-state promoter models, respectively. The corresponding and matrices are given by:
The distribution of nascent transcripts is calculated according to Eq. 14, with the propagator computed from the updated operator (Eq. 12 & 13). The steady-state distribution of the -gene copy system is given by:
(Eq. 15)
where is the steady state mean occupancy of a single promoter.
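For independent and indistinguishable copies, the number of active promoters at steady state follows a binomial distribution, which is the form assumed in the sketch below. The sketch also shows an equivalent route to the multi-copy transcript distribution that holds under the same independence assumption: convolving the single-copy distribution with itself. This is an illustration, not the multi-state operator construction used in the text, and the function names are hypothetical.

```python
from math import comb


def active_copy_distribution(n_copies, occupancy):
    """Steady-state probability of k active copies out of n_copies."""
    return [
        comb(n_copies, k) * occupancy ** k * (1.0 - occupancy) ** (n_copies - k)
        for k in range(n_copies + 1)
    ]


def convolve(p, q):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def multi_copy_nascent(single_copy_pmf, n_copies):
    """Nascent transcript distribution summed over independent gene copies."""
    total = [1.0]
    for _ in range(n_copies):
        total = convolve(total, single_copy_pmf)
    return total
```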
Joint distribution of 5’ and 3’ activity
Here, we lay out the approach used to calculate the joint distribution of 5’ and 3’ activity for an arbitrary configuration of 5’ and 3’ FISH probes. Analytic solutions for steady-state distributions with idealized single-color probe configurations exist (Xu et al., 2016), but solutions for arbitrary probe configurations and multi-color FISH are cumbersome. Here, the computational approach is general and can be applied to a large class of transcription models, at or out of steady state (transient relaxation), provided the elongation process is assumed deterministic.
The measured 5’ and 3’ transcriptional activities result from partially elongated nascent transcripts. Each fluorescent probe is assumed to be instantaneously bound and to contribute equally to the total fluorescence. Thus, the fluorescent signal of each nascent transcript is proportional to the number of probe binding regions that have been transcribed. In order to calculate the joint distribution, one needs to proceed backward in time. Starting from the 3’ end up to the 5’ end of the gene, we accumulate the contribution of nascent transcripts to the signal that could have been initiated in the interval separating two successive probe regions. Since we assumed elongation to occur at constant speed, the distance between two successive probe regions can be converted into a time. Doing so for each interval leads to the following temporal hierarchy (Fig. S4G). We used the following naming conventions for the durations : the superscript stands for the probe channel, either for the 3’ probes (red channel) or for the 5’ probes (green channel), whereas the subscript denotes the interval separating probe from probe where increments are performed along the 3’ end to 5’ end direction.
For instance, if the 5’ and 3’ signal is measured at time , only transcripts initiated during the time interval fully contribute (1 C.U.) to the 3’ (red) signal, since only those get fully bound by 3’ FISH probes. On the other hand, transcripts initiated during will contribute less to the signal since the last probe region has not yet been transcribed at the time of the measurement . Thus, the individual contribution of these transcripts to the total 3’ signal is C.U., where is the total number of probes for the 3’ channel. As we will see below, the probability to initiate nascent transcripts during any duration is given by the propagator (Eq. 13), where and are the promoter states before and after .
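The per-transcript contribution described above can be made concrete with a small sketch. It assumes deterministic elongation at constant speed, instantaneous probe binding, equal brightness per probe, and release as soon as the polymerase passes the end of the gene. The probe positions, gene length and elongation rate used in any call are hypothetical inputs, not the configurations listed in Table S1.

```python
def probe_signal(age_min, probe_end_kb, elong_rate_kb_min, gene_length_kb):
    """Signal in cytoplasmic units (C.U.) of one nascent transcript.

    age_min: time since initiation, in minutes.
    probe_end_kb: distance from the transcription start to the 3' edge of
        each probe binding region of one channel, in kb.
    """
    position = elong_rate_kb_min * age_min  # polymerase position along the gene
    if position < 0.0 or position > gene_length_kb:
        return 0.0  # not yet initiated, or already released
    bound = sum(1 for end in probe_end_kb if end <= position)
    return bound / len(probe_end_kb)
```

With four hypothetical 3' probe regions ending at 3.0, 3.5, 4.0 and 4.5 kb on a 5 kb gene and a rate of 1 kb/min, a transcript initiated 4.2 min before fixation has three of four regions transcribed and contributes 0.75 C.U. to that channel.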
For any model of promoter activity that considers only the stochastic initiation of transcripts (as a Poisson process) and deterministic elongation with instantaneous release, the propagator will satisfy the following equality:
Thus, one only needs to calculate , which can be computed much faster than the matrix exponential (Eq. 13) (Sidje, 1998). It then follows that the Chapman-Kolmogorov equation for the time propagation reduces to a discrete convolution:
This property is used extensively in the following calculation of the joint distribution.
The computation of the joint distribution is performed according to a dynamic programming approach that can in principle be applied to an arbitrary number of color probes. We first calculate recursively the 3’ contribution (red probes) to the signal , where stands for the total signal in probe space, the total number of nascent transcripts, the promoter state and the total number of probes covering the 3’ region. We then calculate the 5’ contribution in a similar fashion, . Lastly, we combine both components to generate the final joint distribution in probe space.
Step 1: calculate the 3’ contribution.
The initial distribution is given by:
where and are the initial distributions of promoter state at time and respectively. Assuming promoters at steady state, both distributions are then given by Eq. 15 for a multi-gene system. We then perform the following recursion scheme for :
where .
Step 2: calculate the 5’ contribution.
The initial distribution is given by:
We then perform the following recursion scheme for :
where . Lastly, we sum out :
Step 3: combine 3’ and 5’ contributions.
The final joint distribution of 5’ and 3’ activity in probe space is then given by:
where . and are the joint distributions computed at step 1 and 2. Since the actual signal resolution is of the order of 1 cytoplasmic unit (a fully tagged transcript with fluorescent probes), the joint distribution can be coarse-grained by aggregating the states by a block of size corresponding to a single cytoplasmic unit. The coarse-grained distribution will be denoted in the following. In addition, it is possible to compute faster and with good accuracy using a reduced effective number of probes , provided the original probe configuration is well approximated. Lastly, we remind the readers that implicitly depends on the kinetic parameters through the two-state model propagator, the elongation rate and the position of the probes through the temporal hierarchy (Fig. S4G).
Likelihood and inference
We modeled the joint distribution of 5’ and 3’ activity based on the two-state model and the exact probe location assuming steady state and constant Pol II elongation rate (Figure 4E; STAR Methods, Joint distribution of 5’ and 3’ activity). The resulting modeled activity distribution, together with the measurement noise model (Figure 2A; STAR Methods, Imaging noise model), enables calculation of the likelihood of the 5’ and 3’ activities in C.U. (i.e. ) given a set of kinetic parameters . Specifically, the likelihood of the data given the parameters is expressed in terms of the measurement noise model (Eq. 2) and the joint distribution :
where is the total amount of data, i.e. the total number of measured nuclei per AP-bin for a given gene.
The general idea underlying “classical” inference is to maximize the probability of the data under some model, namely to find the parameters that maximize the likelihood of the data . In this manuscript we adopted a Bayesian approach, estimating the probability of the kinetic rate parameters of the two-state model given the observed data (i.e. the joint posterior distribution) using Bayes’ rule:
where is the prior that encodes prior knowledge about the parameter values. We used a non-informative and independent prior for each kinetic parameter, which was chosen as log-uniform . Note that in the absence of a prior , the most likely parameters are the ones that maximize . In that case, the Bayesian approach is essentially equivalent to “classical” maximum likelihood. The main advantage of the Bayesian approach over maximum likelihood is that it provides a natural way to estimate the uncertainty on the parameters through the joint posterior and allows us to determine whether the parameters are identifiable. Indeed, as the uncertainty grows, the posterior distribution becomes wider/flatter, which directly reflects on the range of the parameter confidence intervals.
Importantly, we set the elongation rate to the experimentally measured value of kb/min (Garcia et al., 2013). At steady state, a known value of is required to set the temporal scale of the other transcriptional parameters, which can be seen by inspecting the expressions of the various cumulants of the nascent transcript distribution (Eq. 9, 10 & 11). Since all cumulants can be parameterized by the three independent parameters , and the ratio , it follows that the model is not identifiable when the temporal scale is not set.
We then sampled the joint posterior distribution using a Markov chain Monte Carlo (MCMC) algorithm (Hastings, 1970), for each gene and at each AP position individually. The sampled joint posterior distribution enables estimation of the marginal posterior distribution for each kinetic rate and any combination of these rates, such as and . All the parameters of the model and the error bars were estimated from the marginal posterior distribution, as the median and the percentiles respectively (Figure 4E). The best-fitting distributions predicted by the model match the data closely (Fig. S5B), and outliers are mainly explained by measurement and binning noise. Importantly, our inference approach does not require any a priori assumptions about the underlying parameter modulation, nor does it assume any continuity between datasets. In principle, the inferred parameters could be different for each gene and be modulated in any arbitrary way.
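The sampling step can be illustrated with a minimal Metropolis-Hastings sketch. It is deliberately a toy: the likelihood below is a one-parameter Poisson model standing in for the joint 5’ and 3’ likelihood, and the proposal width, bounds and chain length are arbitrary choices made here. What it does share with the procedure described above is the log-uniform prior, implemented by sampling the logarithm of the parameter within fixed bounds, and the use of the posterior median and percentiles as estimate and error bars.

```python
import math
import random


def log_likelihood(log_rate, counts):
    """Toy Poisson log-likelihood (up to a constant); a stand-in only."""
    rate = math.exp(log_rate)
    return sum(k * log_rate - rate for k in counts)


def metropolis_log_space(counts, n_steps=5000, step=0.05,
                         log_lo=-5.0, log_hi=5.0, seed=0):
    """Random-walk Metropolis on log(rate) with a log-uniform prior."""
    rng = random.Random(seed)
    x = 0.0
    logp = log_likelihood(x, counts)
    samples = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)
        if log_lo <= y <= log_hi:  # prior is flat in log space inside bounds
            logp_y = log_likelihood(y, counts)
            if rng.random() < math.exp(min(0.0, logp_y - logp)):
                x, logp = y, logp_y
        samples.append(math.exp(x))
    return samples


def posterior_summary(samples, burn_in=1000):
    """10th percentile, median and 90th percentile after discarding burn-in."""
    kept = sorted(samples[burn_in:])
    n = len(kept)
    return kept[int(0.10 * n)], kept[n // 2], kept[min(n - 1, int(0.90 * n))]
```

In a multi-parameter setting the same loop would propose a vector of log-parameters, and marginal summaries would be taken per parameter or per derived combination of parameters.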
Parameter identifiability and performance
As mentioned above, the two-state model is fully identifiable (structural identifiability) as long as is fixed. Indeed, in that case the steady state and time-dependent solution depend on three independent parameters, such as or . In principle, provided one has enough data and measurement noise is small, each parameter can be resolved individually. On the other hand, it is true that some regimes might require a very large/infinite amount of data to infer the different parameters without ambiguity (practical identifiability). For instance, in the case of instantaneous bursts, namely when and become large (i.e. approach infinity, but with finite ratio), only the burst size and the burst frequency are well defined. Thus it is not possible to infer the exact values of and individually. Such a scenario can be clearly diagnosed based on the marginal posterior distributions and (from which the median and the error bars of the parameters are estimated). Indeed, since we used non-informative priors, the variance of these marginal posterior distributions would become extremely large and thus less informative. More intuitively, and would no longer be sharply peaked around a mean value, but would take all possible values (consistent with the prior) that satisfy some error on . This would consequently lead to very large error bars on and . Thus, the error bars extracted from the marginal posterior distribution are indicative of whether or not we can estimate these parameters.
To validate our inference framework, we tested the inference on simulated data using a broad range of parameter values and in the presence of measurement noise. Using the Gillespie algorithm (Gillespie, 1977), we generated simulated nuclei activity data based on 4 independent gene copies modeled by the telegraph model. We used the probe configuration and gene length of hb and assumed a typical elongation rate of kb/min (Garcia et al., 2013). Measurement noise was included in the simulated data according to the characterization performed previously on real data (Imaging noise model). We investigated different parameter regimes and modulation schemes of the mean activity , to test whether the input parameters used to generate the data could be inferred properly (Fig. S6A–E). Namely, we tested:
Modulation of the initiation rate alone with min and (cyan dash line).
Modulation of the on-rate alone with min−1 and min−1 (green dash line).
Modulation of the off-rate alone with min−1 and min−1 (blue dash line).
Modulation of the mean occupancy alone with min−1 and min (red dash line).
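A stripped-down version of such a simulation is sketched below. It uses the Gillespie algorithm for independent telegraph-model gene copies and counts the polymerases initiated during one elongation time, which under deterministic elongation is the number of nascent transcripts on the gene. The sketch omits the probe configuration and the measurement noise model, and all parameter names and values are placeholders rather than the settings of the four scenarios listed above.

```python
import random


def simulate_nascent_count(k_on, k_off, k_ini, t_elong, rng):
    """Gillespie simulation of one gene copy over one elongation time."""
    # Draw the initial promoter state from its steady-state distribution.
    state_on = rng.random() < k_on / (k_on + k_off)
    t, count = 0.0, 0
    while True:
        total = (k_off + k_ini) if state_on else k_on
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_elong:
            return count
        if state_on and rng.random() < k_ini / total:
            count += 1  # initiation event
        else:
            state_on = not state_on  # promoter switches state


def simulate_dataset(n_nuclei, k_on, k_off, k_ini, t_elong, n_copies=4, seed=0):
    """Total nascent transcripts per nucleus, summed over independent copies."""
    rng = random.Random(seed)
    return [
        sum(simulate_nascent_count(k_on, k_off, k_ini, t_elong, rng)
            for _ in range(n_copies))
        for _ in range(n_nuclei)
    ]
```

The sample mean of such a dataset should approach the number of copies times the initiation rate, the mean occupancy and the elongation time, which gives a simple check before adding probe geometry and noise.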
For each scenario, we generated 8 batches of data covering the range of normalized activity . Each batch was made of 10 independently sampled datasets of 500 nuclei activity measurements. We performed the inference on each dataset individually and reported the mixture of posterior distributions over the 10 datasets to take into account the finite-size variability in the generated data. We conclude that the inference framework performs well, since all the inferred quantities cover the true values within error bars. In addition, we estimated globally for all synthetic data the fractional inference error from the MCMC sampled parameters . For all inferred parameters, the median of the error never exceeds (Fig. S6F). Overall, the inference allows us to distinguish the different tested modulation strategies without ambiguity. In addition, the sampled joint posterior distributions are clearly peaked in the parameter space (Fig. S5C), indicating that practical identifiability is not an issue with real data.
Effect of elongation rate on inference
As discussed above, the elongation rate sets the temporal scale of the transcriptional parameters; thus, a different elongation rate would lead to different values of the parameters. In the manuscript, we used a value of kb/min which we previously measured (Garcia et al., 2013). A recent study suggests that this value might be overall larger in the blastoderm embryo, of the order of kb/min (Fukaya et al., 2017). We thus sought to determine to what extent this new value would affect our results.
In principle, a different value of rescales the transcriptional parameters in a very predictable way. No matter the elongation rate, the three quantities , and should be perfectly identifiable. It follows that the new parameters (denoted by the * superscript) have to satisfy the following equations:
Inferring the transcriptional parameters from the data with kb/min instead of kb/min (as in the main text) confirms the rescaling above (Fig. S6L–N). As predicted, and are rescaled by a factor and respectively, whereas is conserved.
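The rescaling can be written out as a short sketch. It rests on one assumption stated here explicitly: because the elongation time is the gene length divided by the elongation rate, the data constrain only the products of each rate with the elongation time, so changing the assumed elongation rate multiplies all rates by the ratio of new to old elongation rate and divides the correlation time by the same ratio. The function name and the dictionary keys are illustrative.

```python
def rescale_for_elongation_rate(k_ini, k_on, k_off, v_old, v_new):
    """Rescale two-state parameters inferred with v_old to a new rate v_new."""
    factor = v_new / v_old
    k_ini_new, k_on_new, k_off_new = k_ini * factor, k_on * factor, k_off * factor
    return {
        "k_ini": k_ini_new,
        "k_on": k_on_new,
        "k_off": k_off_new,
        # The correlation time scales inversely with the elongation rate.
        "tau": 1.0 / (k_on_new + k_off_new),
        # The mean occupancy is a ratio of rates and is unchanged.
        "occupancy": k_on_new / (k_on_new + k_off_new),
    }
```

For instance, doubling a placeholder elongation rate from 1.0 to 2.0 kb/min doubles every rate and halves the correlation time while leaving the occupancy untouched.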
QUANTIFICATION AND STATISTICAL ANALYSIS
We imaged hunchback wild-type (labeled hb wt) in embryos; a hunchback deficiency fly line with half the hb dosage (hb def) ; Krüppel (Kr) ; knirps during early (kni early) and late nc13 (kni late) ; giant females with two alleles (gt female) and giant males with one allele (gt male) . On average the number of quantified nuclei per AP bin (2.5% egg length) is (hb wt), (hb def), (Kr), (kni early), (kni late), (gt female anterior region), (gt female posterior region), (gt male anterior region) and (gt male posterior region). The confidence intervals for all point estimators of the data (mean, variance, noise, third cumulant and fourth cumulant; Figures 1, 2 and 3) were built by bootstrapping the empirical distribution of activity in each individual embryo. We used the 68% confidence intervals for the point estimators. All the error bars for the inferred parameters (Figure 5) correspond to the 10 to 90th percentiles of the marginal posterior distributions.
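A percentile bootstrap of the kind described above can be sketched as follows. The resampling unit, the number of bootstrap replicates and the percentile convention are assumptions made for this illustration; the text specifies only that the empirical activity distribution of each embryo was bootstrapped and that 68% intervals were reported.

```python
import random


def bootstrap_ci(values, estimator, n_boot=1000, level=0.68, seed=0):
    """Percentile bootstrap confidence interval for a point estimator."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        estimator([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = stats[int((1.0 - level) / 2.0 * n_boot)]
    hi = stats[min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot))]
    return lo, hi


def sample_mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```

The same function applies to any estimator, for example `bootstrap_ci(activities, sample_variance)`, where `activities` is a hypothetical list of per-nucleus measurements from one AP bin.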
Supplementary Material
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Experimental Models: Organisms/Strains | ||
D. melanogaster: Oregon-R, wild-type | Laboratory stock | Flybase: FBst1000077 |
D. melanogaster: chromosomal deletion spanning hb w[1118]; Df(3R)BSC477/TM6C, Sb[1] cu[1] | Bloomington Drosophila Stock Center | Flybase: FBab0045343 BDSC: 24981 |
Oligonucleotides | ||
smFISH probes for hb, see Table S1 | This paper | N/A |
smFISH probes for Kr, see Table S1 | This paper | N/A |
smFISH probes for kni, see Table S1 | This paper | N/A |
smFISH probes for gt, see Table S1 | This paper | N/A |
Software and Algorithms | ||
FiSH Toolbox | Little et al. (2013) | N/A |
ACKNOWLEDGEMENTS
We thank C. Bartman, W. Bialek, P. Francois, M. Levo, J. Mozzicanocci, F. Naef, A. Raj, T. Sokolowski, G. Tkacik and E. Wieschaus for insightful discussion and valuable comments on the manuscript. B. Zoller was supported by the Swiss National Science Foundation early Postdoc.Mobility fellowship. This study was funded by grants from the National Institutes of Health (U01 EB021239, U01 DA047730, R01 GM097275) and the National Science Foundation (PHY-1734030).
Footnotes
DECLARATION OF INTERESTS
The authors declare no competing interests.
REFERENCES
- Bar-Even A, Paulsson J, Maheshri N, Carmi M, O’Shea EK, Pilpel Y, and Barkai N (2006). Noise in protein expression scales with natural protein abundance. Nat. Genet 38, 636–643. [DOI] [PubMed] [Google Scholar]
- Bartman CR, Hsu SC, Hsiung CCS, Raj A, and Blobel GA (2016). Enhancer Regulation of Transcriptional Bursting Parameters Revealed by Forced Chromatin Looping. Mol. Cell 62, 237–247. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Battich N, Stoeger T, and Pelkmans L (2015). Control of Transcript Variability in Single Mammalian Cells. Cell 163, 1596–1610. [DOI] [PubMed] [Google Scholar]
- Blake WJ, Kærn M, Cantor CR, and Collins JJ (2003). Noise in eukaryotic gene expression. Nature 249, 247–249. [DOI] [PubMed] [Google Scholar]
- Blumenthal AB, Kriegstein HJ, and Hogness DS (1974). The units of DNA replication in Drosophila melanogaster chromosomes. Cold Spring Harb. Symp. Quant. Biol 38, 205–223. [DOI] [PubMed] [Google Scholar]
- Blythe SA, and Wieschaus EF (2015). Zygotic genome activation triggers the DNA replication checkpoint at the midblastula transition. Cell 160, 1169–1181. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blythe SA, Wieschaus EF, Ali-Murthy Z, Lott S, Eisen M, Kornberg T, Almouzni G, Wolffe A, Amodeo A, Jukam D, et al. (2016). Establishment and maintenance of heritable chromatin structure during early Drosophila embryogenesis. Elife 5, 1752–1765. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bothma JP, Garcia HG, Esposito E, Schlissel G, Gregor T, and Levine M (2014). Dynamic regulation of eve stripe 2 expression reveals transcriptional bursts in living Drosophila embryos. Proc. Natl. Acad. Sci 111, 10598–10603. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Briscoe J, and Small S (2015). Morphogen rules: design principles of gradient-mediated embryo patterning. Development 142, 3996–4009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brody Y, Neufeld N, Bieberstein N, Causse SZ, Böhnlein EM, Neugebauer KM, Darzacq X, and Shav-Tal Y (2011). The in vivo kinetics of RNA polymerase II elongation during co-transcriptional splicing. PLoS Biol. 9, e1000573. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brown CR, and Boeger H (2014). Nucleosomal promoter variation generates gene expression noise. Proc Natl Acad Sci U S A 111, 17893–17898. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carey LB, van Dijk D, Sloot PMA, Kaandorp JA, and Segal E (2013). Promoter Sequence Determines the Relationship between Expression Level and Noise. PLoS Biol. 11, e1001528. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen H, Levo M, Barinov L, Fujioka M, Jaynes JB, and Gregor T (2018). Dynamic interplay between enhancer–promoter topology and gene activity. Nat. Genet 304. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Choubey S, Kondev J, and Sanchez A (2015). Deciphering Transcriptional Dynamics In Vivo by Counting Nascent RNA Molecules. PLoS Comput. Biol 11, e1004345–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Coulon A, Chow CC, Singer RH, and Larson DR (2013). Eukaryotic transcriptional dynamics: from single molecules to cell populations. Nat. Rev. Genet 14, 572–584. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dar RD, Razooky BS, Singh A, Trimeloni TV, McCollum JM, Cox CD, Simpson ML, and Weinberger LS (2012). Transcriptional burst frequency and burst size are equally modulated across the human genome. Proc. Natl. Acad. Sci. U. S. A 109, 17454–17459. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dubuis JO, Samanta R, and Gregor T (2013). Accurate measurements of dynamics and reproducibility in small genetic networks. Mol. Syst. Biol 9, 1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eldar A, and Elowitz MB (2010). Functional roles for noise in genetic circuits. Nature 467, 167–173. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elf J, Li G-W, and Xie XS (2007). Probing Transcription Factor Dynamics at the Single-Molecule Level in a Living Cell. Science (80-. ). 316, 1191–1194. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Estrada J, Wong F, DePace A, and Gunawardena J (2016). Information Integration and Energy Expenditure in Gene Regulation. Cell 166, 234–244. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fukaya T, Lim B, and Levine M (2016). Enhancer Control of Transcriptional Bursting. Cell 166, 358–368. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fukaya T, Lim B, and Levine M (2017). Rapid Rates of Pol II Elongation in the Drosophila Embryo. Curr. Biol 27, 1387–1391. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Garcia HG, Tikhonov M, Lin A, and Gregor T (2013). Quantitative Imaging of Transcription in Living Drosophila Embryos Links Polymerase Activity to Patterning. Curr. Biol 23, 1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gillespie DT (1977). Exact Stochastic Simulation of Coupled Chemical Reactions. J Phys. Chem 81, 2340–2361. [Google Scholar]
- Golding I, Paulsson J, Zawilski SM, and Cox EC (2005). Real-time kinetics of gene activity in individual bacteria. Cell 123, 1025–1036. [DOI] [PubMed] [Google Scholar]
- Gregor T, Garcia HG, and Little SC (2014). The embryo as a laboratory: quantifying transcription in Drosophila. Trends Genet. 30, 1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Halpern KB, Tanami S, Landen S, Chapal M, Szlak L, Hutzler A, Nizhberg A, and Itzkovitz S (2015). Bursty gene expression in the intact mammalian liver. Mol. Cell 58, 147–156. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hastings WK (1970). Monte carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109. [Google Scholar]
- Henriques T, Scruggs BS, Inouye MO, Muse GW, Williams LH, Burkholder AB, Lavender CA, Fargo DC, and Adelman K (2018). Widespread transcriptional pausing and elongation control at enhancers. Genes Dev. 32, 26–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hoch M, Schröder C, Seifert E, and Jäckle H (1990). cis-acting control elements for Krüppel expression in the Drosophila embryo. EMBO J. 9, 2587–2595. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hornung G, Bar-Ziv R, Rosin D, Tokuriki N, Tawfik DS, Oren M, and Barkai N (2012). Noise-mean relationship in mutated promoters. Genome Res. 22, 2409–2417. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Izeddin I, Récamier V, Bosanac L, Cissé II, Boudarene L, Dugast-Darzacq C, Proux F, Bénichou O, Voituriez R, Bensaude O, et al. (2014). Single-molecule tracking in live cells reveals distinct target-search strategies of transcription factors in the nucleus. Elife 2014, e02230. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jacob Y, Sather S, Martin JR, and Ollo R (1991). Analysis of Krüppel control elements reveals that localized expression results from the interaction of multiple subelements. Proc. Natl. Acad. Sci. U. S. A 88, 5912–5916. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jaeger J (2011). The gap gene network. Cell. Mol. Life Sci. 68, 243–274. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jaeger J, Surkova S, Blagov M, Janssens H, Kosman D, Kozlov KN, Manu, Myasnikova E, Vanario-Alonso CE, Samsonova M, et al. (2004). Dynamic control of positional information in the early Drosophila embryo. TL - 430. Nature 430 VN-, 368–371. [DOI] [PubMed] [Google Scholar]
- Jones DL, Brewster RC, and Phillips R (2014). Promoter architecture dictates cell-to-cell variability in gene expression. Science (80-. ). 346, 1533–1536. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kaizu K, De Ronde W, Paijmans J, Takahashi K, Tostevin F, and Wolde PR Ten (2014). The berg-purcell limit revisited. Biophys. J 106, 976–985. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Karpova TS, Kim MJ, Spriet C, Nalley K, Stasevich TJ, Kherrouche Z, Heliot L, and McNally JG (2008). Concurrent fast and slow cycling of a transcriptional activator at an endogenous promoter. Science 319, 466–469.
- Keren L, Van Dijk D, Weingarten-Gabbay S, Davidi D, Jona G, Weinberger A, Milo R, and Segal E (2015). Noise in gene expression is coupled to growth rate. Genome Res. 25, 1893–1902.
- Kornberg TB, and Tabata T (1993). Segmentation of the Drosophila embryo. Curr. Opin. Genet. Dev. 3, 585–593.
- Kvon EZ, Kazmar T, Stampfel G, Yáñez-Cuna JO, Pagani M, Schernhuber K, Dickson BJ, and Stark A (2014). Genome-scale functional characterization of Drosophila developmental enhancers in vivo. Nature 512, 91–95.
- Larson DR, Fritzsch C, Sun L, Meng X, Lawrence DS, and Singer RH (2013). Direct observation of frequency modulated transcription in single cells using light activation. eLife 2, e00750.
- Lawrence PA (1992). The making of a fly: The genetics of animal design.
- Lestas I, Paulsson J, Ross NE, and Vinnicombe G (2008). Noise in Gene Regulatory Networks. IEEE Trans. Automat. Contr. 53, 189–200.
- Li C, Cesbron F, Oehler M, Brunner M, and Höfer T (2018). Frequency Modulation of Transcriptional Bursting Enables Sensitive and Rapid Gene Regulation. Cell Syst.
- Lionnet T, and Singer RH (2012). Transcription goes digital. EMBO Rep. 13, 313–321.
- Little SC, Tikhonov M, and Gregor T (2013). Precise developmental gene expression arises from globally stochastic transcriptional activity. Cell 154, 789–800.
- Lucas T, Ferraro T, Roelens B, De Las Heras Chanes J, Walczak AM, Coppey M, and Dostatni N (2013). Live imaging of bicoid-dependent transcription in Drosophila embryos. Curr. Biol. 23, 2135–2139.
- Manu, Surkova S, Spirov AV, Gursky VV, Janssens H, Kim AR, Radulescu O, Vanario-Alonso CE, Sharp DH, Samsonova M, et al. (2009). Canalization of gene expression in the Drosophila blastoderm by gap gene cross regulation. PLoS Biol. 7, 0591–0603.
- Molina N, Suter DM, Cannavo R, Zoller B, Gotic I, and Naef F (2013). Stimulus-induced modulation of transcriptional bursting in a single mammalian gene. Proc. Natl. Acad. Sci. U. S. A. 110, 20563–20568.
- Munsky B, and Khammash M (2006). The finite state projection algorithm for the solution of the chemical master equation. J. Chem. Phys. 124, 44104.
- Nicolas D, Zoller B, Suter DM, and Naef F (2018). Modulation of transcriptional burst frequency by histone acetylation. Proc. Natl. Acad. Sci. U. S. A. 115, 7153–7158.
- O’Brien T, and Lis JT (1993). Rapid changes in Drosophila transcription after an instantaneous heat shock. Mol. Cell. Biol. 13, 3456–3463.
- Ochoa-Espinosa A, Yucel G, Kaplan L, Pare A, Pura N, Oberstein A, Papatsenko D, and Small S (2005). The role of binding site cluster strength in Bicoid-dependent patterning in Drosophila. Proc. Natl. Acad. Sci. U. S. A. 102, 4960–4965.
- Pankratz MJ, Busch M, Hoch M, Seifert E, and Jäckle H (1992). Spatial Control of the Gap Gene Knirps in the Drosophila Embryo by Posterior Morphogen System. Science 255, 986–989.
- Peccoud J, and Ycart B (1995). Markovian Modeling of Gene-Product Synthesis. Theor. Popul. Biol. 48, 222–234.
- Perry MW, Boettiger AN, and Levine M (2011). Multiple enhancers ensure precision of gap gene-expression patterns in the Drosophila embryo. Proc. Natl. Acad. Sci. U. S. A. 108, 1–12.
- Raj A, Peskin CS, Tranchina D, Vargas DY, and Tyagi S (2006). Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, 1707–1719.
- Rieckh G, and Tkačik G (2014). Noise and information transmission in promoters with multiple internal states. Biophys. J. 106, 1194–1204.
- Sanchez A, and Golding I (2013). Genetic Determinants and Cellular Constraints in Noisy Gene Expression. Science 342, 1188–1193.
- Sanchez A, and Kondev J (2008). Transcriptional control of noise in gene expression. Proc. Natl. Acad. Sci. U. S. A. 105, 707904105.
- Sanchez A, Choubey S, and Kondev J (2013). Regulation of Noise in Gene Expression. Annu. Rev. Biophys. 42, 1–23.
- Scholes C, DePace AH, and Sánchez Á (2016). Combinatorial Gene Regulation through Kinetic Control of the Transcription Cycle. Cell Syst. 1–22.
- Schor IE, Degner JF, Harnett D, Cannavò E, Casale FP, Shim H, Garfield DA, Birney E, Stephens M, Stegle O, et al. (2017). Promoter shape varies across populations and affects promoter evolution and expression noise. Nat. Genet. 13, 212–233.
- Schroeder MD, Pearce M, Fak J, Fan H-Q, Unnerstall U, Emberly E, Rajewsky N, Siggia ED, and Gaul U (2004). Transcriptional Control in the Segmentation Gene Network of Drosophila. PLoS Biol. 2, e271.
- Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U, and Gaul U (2008). Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451, 535–540.
- Senecal A, Munsky B, Proux F, Ly N, Braye FE, Zimmer C, Mueller F, and Darzacq X (2014). Transcription factors modulate c-Fos transcriptional bursts. Cell Rep. 8, 75–83.
- Dey SS, Foley JE, Limsirichai P, Schaffer DV, and Arkin AP (2015). Orthogonal control of expression mean and variance by epigenetic features at different genomic loci. Mol. Syst. Biol. 11, 806.
- Sidje RB (1998). Expokit: A Software Package for Computing Matrix Exponentials. ACM Trans. Math. Softw. 24, 130–156.
- Struhl G, Johnston P, and Lawrence PA (1992). Control of Drosophila body pattern by the hunchback morphogen gradient. Cell 69, 237–249.
- Suter DM, Molina N, Gatfield D, Schneider K, Schibler U, and Naef F (2011). Mammalian Genes Are Transcribed with Widely Different Bursting Kinetics. Science 332, 472–474.
- Taniguchi Y, Choi PJ, Li GW, Chen H, Babu M, Hearn J, Emili A, and Xie XS (2010). Quantifying E. coli Proteome and Transcriptome with Single-Molecule Sensitivity in Single Cells. Science 329, 533–538.
- Tantale K, Mueller F, Kozulic-Pirher A, Lesne A, Victor J-M, Robert M-C, Capozi S, Chouaib R, Bäcker V, Mateos-Langerak J, et al. (2016). A single-molecule view of transcription reveals convoys of RNA polymerases and multi-scale bursting. Nat. Commun. 7, 12248.
- Tkačik G, Gregor T, and Bialek W (2008). The role of input noise in transcriptional regulation. PLoS One 3, e2774.
- Voss TC, and Hager GL (2014). Dynamic regulation of transcriptional states by chromatin and transcription factors. Nat. Rev. Genet. 15, 69–81.
- Weinberger L, Voichek Y, Tirosh I, Hornung G, Amit I, and Barkai N (2012). Expression Noise and Acetylation Profiles Distinguish HDAC Functions. Mol. Cell 47, 193–202.
- Xu H, Sepúlveda LA, Figard L, Sokac AM, and Golding I (2015). Combining protein and mRNA quantification to decipher transcriptional regulation. Nat. Methods 12, 739–742.
- Xu H, Skinner SO, Sokac AM, and Golding I (2016). Stochastic Kinetics of Nascent RNA. Phys. Rev. Lett. 117, 128101–128106.
- Zenklusen D, Larson DR, and Singer RH (2008). Single-RNA counting reveals alternative modes of gene expression in yeast. Nat. Struct. Mol. Biol. 15, 1263–1271.
- Zoller B, Nicolas D, Molina N, and Naef F (2015). Structure of silent transcription intervals and noise characteristics of mammalian genes. Mol. Syst. Biol. 11, 823.
- Zopf CJ, Quinn K, Zeidman J, and Maheshri N (2013). Cell-Cycle Dependence of Transcription Dominates Noise in Gene Expression. PLoS Comput. Biol. 9, e1003161.