. 2020 Dec 22;83(1):3. doi: 10.1007/s11538-020-00827-7

Statistics of Nascent and Mature RNA Fluctuations in a Stochastic Model of Transcriptional Initiation, Elongation, Pausing, and Termination

Tatiana Filatova 1,2, Nikola Popovic 2, Ramon Grima 1
PMCID: PMC7755674  PMID: 33351158

Abstract

Recent advances in fluorescence microscopy have made it possible to measure the fluctuations of nascent (actively transcribed) RNA. These closely reflect transcription kinetics, as opposed to conventional measurements of mature (cellular) RNA, whose kinetics is affected by additional processes downstream of transcription. Here, we formulate a stochastic model which describes promoter switching, initiation, elongation, premature detachment, pausing, and termination while being analytically tractable. We derive exact closed-form expressions for the mean and variance of nascent RNA fluctuations on gene segments, as well as of total nascent RNA on a gene. We also obtain exact expressions for the first two moments of mature RNA fluctuations and approximate distributions for total numbers of nascent and mature RNA. Our results, which are verified by stochastic simulation, uncover the explicit dependence of the statistics of both types of RNA on transcriptional parameters and potentially provide a means to estimate parameter values from experimental data.

Keywords: Stochastic gene expression, Master equation, RNA fluctuations, Singular perturbation theory, Distributions of RNA molecules, Stochastic simulations

Introduction

Transcription, the production of RNA from a gene, is an inherently stochastic process. Specifically, the interval of time between two successive transcription events is a random variable whose statistics depend on multiple single-molecule events behind transcription (Sanchez and Golding 2013). When the distribution of this random variable is exponential, we say that expression is constitutive; in that case, the number of transcripts produced in a certain interval of time follows a Poisson distribution. On the other hand, when the distribution of times between two successive transcripts is non-exponential, then the number of transcripts is non-Poissonian. A special case of such non-constitutive behaviour is bursty expression, whereby transcripts are produced in short bursts that are separated by long silent intervals (Suter et al. 2011; Halpern et al. 2015). In yeast, genes whose expression is constitutive include MDN1, KAP104, and DOA1, whereas PDR5 is an example of a gene whose expression is bursty (Zenklusen et al. 2008).

For two decades, mathematical models of gene expression have been developed to predict the distribution of RNA abundance. By matching the theoretical distribution with experimental measurements from microscopy-based methods (Raj et al. 2008), one hopes to obtain insight into the underlying kinetics of transcription and to estimate transcriptional parameters. The standard model of gene expression which has been used for these analyses is the telegraph model (Peccoud and Ycart 1995), whereby a gene can be in two states. Transcription occurs in one of the states, whereupon RNA degrades; first-order kinetics is assumed for all processes. While the distribution obtained from the telegraph model can typically fit cellular RNA abundance data, there are innate difficulties with the interpretation of that fit: fluctuations in cellular RNA numbers and, hence, the shape of the experimental RNA distribution do not only reflect transcription, but also many processes downstream thereof, such as splicing, RNA degradation, and partitioning during cell division.

To counteract these difficulties, in the past few years, mathematical models (Choubey et al. 2015; Choubey 2018; Heng et al. 2016; Cao and Grima 2020) have been developed to predict the statistics of nascent RNA, i.e. of RNA in the process of being synthesised by the RNA polymerase molecule (RNAP), which can be visualised and quantified due to recent advances in fluorescence microscopy (Lenstra et al. 2016; Skinner et al. 2016; Larson et al. 2011; Antoine et al. 2014; Brouwer and Lenstra 2019). In contrast to cellular RNA, the statistics of nascent RNA is a direct reflection of the transcription process; hence, these models can potentially give more insight than the simpler, but cruder telegraph model. Choubey and collaborators (Choubey et al. 2015; Choubey 2018) have developed a stochastic model with the following properties: (i) a gene can be in two states (active or inactive); (ii) from the active state, transcription initiation occurs in two sequential steps: the pre-initiation complex is formed, after which the RNA polymerase escapes the promoter; (iii) once on the gene, the polymerase moves from one base pair to the next (with some probability) until the end of the gene is reached, when transcription is terminated and polymerase detaches. Queuing theory is used to derive analytical expressions for the transient and steady-state means and variances of numbers of RNAP that are attached to the gene in the long-gene limit when the elongation time is practically deterministic. Heng et al. (2016) have considered a coarse-grained version of that model, whereby the movement of RNAP from one base pair to the next is not explicitly modelled, obtaining an analytical expression for the total RNAP distribution in steady-state conditions. 
More recently, Cao and Grima (2020) have studied a model of eukaryotic gene expression that yields approximate time-dependent distributions of both nascent and cellular RNA abundance as a function of the parameters controlling gene switching, DNA duplication, partitioning at cell division, gene dosage compensation, and RNA degradation; in their coarse-grained model, the movement of RNAP is not explicitly modelled, while the elongation time is assumed to be exponentially distributed, which simplifies the requisite analysis.

The complexity of nascent RNA models has thus far not allowed the same detailed level of analysis as has been possible with the much simpler telegraph model. A few shortcomings of current models can be summarised as follows: (i) distributions of nascent RNA have been derived from models that do not explicitly model the movement of RNAP along a gene (Heng et al. 2016; Cao and Grima 2020), resulting in a disconnect between theoretical description and the microscopic processes underlying transcription; (ii) while the analysis of single-cell sequencing data and electron micrograph data yields the positions of individual polymerases along the gene, allowing for the calculation of statistics (means and variances) of the numbers of RNAP on gene segments that are obtained after binning, detailed models of RNAP elongation (Choubey et al. 2015; Choubey 2018) provide analytical results only for total RNAP on a gene and hence cannot be used to understand gene segment data; (iii) analytical calculations of the statistics of nascent RNA ignore important details of the transcription process such as pausing, traffic jams, backtracking, and premature termination, some of which have to date been explored via stochastic simulation (Klumpp and Hwa 2008; Rajala et al. 2010; Choubey et al. 2015; Rodriguez et al. 2019; Md Zulfikar et al. 2020).

In this paper, we overcome some of the aforementioned shortcomings of analytically tractable models for the transcription process. In Sect. 2, we study a stochastic model for promoter switching and the stochastic movement of RNAP along a gene, allowing for premature termination. We derive exact closed-form expressions for the first and second moments (means and variances) of local RNAP fluctuations on gene segments of arbitrary length, which allows us to study how these statistics vary along a gene as a function of transcriptional parameters; we also obtain expressions for the mean and variance of the total RNAP on the gene which generalise previous work by Choubey et al. (2015). In Sect. 3, we investigate approximations for the distributions of total RNAP and mature RNA, showing in particular that Negative Binomial distributions can provide an accurate approximation in certain biologically meaningful limits. In Sect. 4, we illustrate the difference between the statistics of local and total RNAP fluctuations and those of light fluorescence due to tagged nascent RNA. In Sect. 5, we extend our model to include pausing by deriving approximate expressions for the mean, variance, and distribution of observables. We conclude with a discussion of our results in Sect. 6.

Detailed Stochastic Model of Transcription: Set-up and Analysis

In this section, we specify the stochastic model studied here; then, we derive closed-form expressions for the moments of mature RNA and of local and total RNAP fluctuations in various parameter regimes.

Set-up of Model

We consider a stochastic model of transcription that includes the processes of initiation, elongation, and termination, as illustrated in Fig. 1. For simplicity, we divide the gene into $L$ segments; the RNAP on gene segment $i$ is then denoted by $P_i$. The promoter can be either in the inactive state ($G_{off}$) or the active state ($G_{on}$), switching from the inactive state to the active one with rate $s_u$ and from the active state to the inactive one with rate $s_b$. When the promoter is active, initiation commences via the binding of an RNAP with rate $r$; the newly bound RNAP occupies the first gene segment and is denoted by $P_1$. Subsequently, the RNAP either moves from a gene segment to the neighbouring segment with rate $k$, or it prematurely detaches with rate $d$. Note that here we have made two assumptions: (i) the movement of RNAP is unidirectional, away from the promoter site and hence left to right, with no pausing or backtracking allowed; (ii) the detachment and elongation rates are independent of the position of RNAP on the gene. Each RNAP has associated with it a nascent RNA tail that grows longer as the RNAP transcribes more of the gene. When the RNAP reaches the last gene segment, termination occurs, i.e. the RNAP–nascent RNA complex dissociates from the gene, leading to a mature RNA ($M$) which degrades with rate $d_m$. Note that for simplicity, we have not considered excluded-volume interaction between adjacent RNAPs here; hence, we make the implicit assumption of low ‘traffic’, which is plausible when the initiation rate is sufficiently low. (We test the validity of this assumption through simulations below.)

Fig. 1.

Model of transcription. a The gene is arbitrarily divided into $L$ segments, with RNAP (blue) on gene segment $i$ denoted by $P_i$. The promoter switches from the active state $G_{on}$ to the inactive state $G_{off}$ with rate $s_b$, while the reverse switching occurs with rate $s_u$. When the promoter is active, initiation of RNAP occurs with rate $r$. Initiation is followed by elongation, which is modelled as RNAP ‘hopping’ from gene segment $i$ to the neighbouring segment $i+1$ with rate $k$, i.e. as the transformation of species $P_i$ to $P_{i+1}$. RNAP prematurely detaches from the gene with rate $d$. A nascent RNA tail (red), attached to the RNAP, grows as elongation proceeds. Termination is modelled by the change of $P_L$ with rate $k$ to mature RNA ($M$), which subsequently degrades with rate $d_m$. In b, we show the probability distribution $P(T)$ of the total elongation time $T$ (the time between initiation and termination) as predicted by the stochastic simulation algorithm (SSA; histogram) and our theory (Erlang distribution with shape parameter $L$ and rate $k+d$; solid line). The parameter values used are $L=50$, $k=10$/min, and $d=1.5$/min. In c, we show the dependence of the mean of the distribution $P(T)$ on the RNAP detachment rate ($d$), as predicted by SSA (dots) and our theory ($\langle T\rangle=L/(k+d)$; solid line). The relevant parameter values are $L=50$ and $k=10$/min (Color figure online)

Note that, while the choice of $L$ is arbitrary, $L$ needs to be sufficiently large for the dynamics to be described at a fine spatial resolution. However, $L$ also has to be small enough for the length of each gene segment to be much larger than the footprint of an RNAP; the latter is needed to ensure the validity of the low-traffic assumption. The elongation time $T$, i.e. the total time from initiation to termination, conditioned on those realisations for which the RNAP does not prematurely detach, is Erlang distributed with mean $L/(k+d)$ and coefficient of variation $1/\sqrt{L}$; see ‘Appendix A’ for a derivation and Fig. 1b, c for verification through stochastic simulation (SSA).
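The Erlang statement can be checked with a short simulation. The following is a minimal sketch rather than the code used for Fig. 1: the function name `simulate_elongation_times` and the parameter values are assumptions made for illustration. Each RNAP waits an exponential time with rate $k+d$ on a segment and then either hops (probability $k/(k+d)$) or detaches; only runs that reach termination are retained.

```python
import random
import statistics

def simulate_elongation_times(L, k, d, trials, seed=1):
    """Return elongation times of RNAPs that traverse all L segments.

    Illustrative helper (not from the paper): on every segment the RNAP
    waits an exponential time with rate k + d, then hops forward with
    probability k / (k + d) or detaches prematurely otherwise.
    """
    rng = random.Random(seed)
    hop_probability = k / (k + d)
    times = []
    for _ in range(trials):
        elapsed = 0.0
        for _segment in range(L):
            elapsed += rng.expovariate(k + d)
            if rng.random() >= hop_probability:
                break  # premature detachment: this run is discarded
        else:
            times.append(elapsed)  # all L segments traversed: termination
    return times

if __name__ == "__main__":
    L, k, d = 10, 10.0, 1.5  # assumed values; rates in 1/min
    times = simulate_elongation_times(L, k, d, trials=20000)
    mean_T = statistics.fmean(times)
    cv2_T = statistics.pvariance(times) / mean_T**2
    print(f"mean T: {mean_T:.3f}  (theory L/(k+d) = {L / (k + d):.3f})")
    print(f"CV^2:   {cv2_T:.3f}  (theory 1/L = {1 / L:.3f})")
```

A smaller $L$ than in Fig. 1 is used here only because the fraction of RNAPs that survive to termination, $\mu^L$ with $\mu=k/(k+d)$, becomes very small for $L=50$ and $d=1.5$/min.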

Note that the total number of RNAPs transcribing the gene is equal to the number of nascent RNA molecules present, irrespective of their lengths; to shed light on the fluctuations of nascent RNA, in this section we therefore focus on the calculation of statistics of local and total RNAP fluctuations. We define the vector of molecule numbers $m=(n_0,n_1,\dots,n_L,n)$, and we write $\langle n_0\rangle$, $\langle n_i\rangle$ ($i=1,2,\dots,L$), and $\langle n\rangle$ for the average numbers of molecules of active gene, RNAP, and mature RNA, respectively. The above model can then be conveniently described by $L+2$ species interacting via a set of $2L+4$ reactions with the following rate functions:

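| Species | Molecule numbers | Position (in $m$) |
|---|---|---|
| $G_{on}$ | $n_0$ | $1$ |
| $P_i$, $i\in\{1,\dots,L\}$ | $n_i$ | $i+1$ |
| $M$ | $n$ | $L+2$ |

| Reaction | Rate function $f_j$ |
|---|---|
| $G_{on}\xrightarrow{s_b}G_{off}$ | $f_1=s_b\langle n_0\rangle$ |
| $G_{off}\xrightarrow{s_u}G_{on}$ | $f_2=s_u(1-\langle n_0\rangle)$ |
| $G_{on}\xrightarrow{r}G_{on}+P_1$ | $f_3=r\langle n_0\rangle$ |
| $P_i\xrightarrow{k}P_{i+1}$, $i\in\{1,\dots,L-1\}$ | $f_{i+3}=k\langle n_i\rangle$ |
| $P_L\xrightarrow{k}M$ | $f_{L+3}=k\langle n_L\rangle$ |
| $P_i\xrightarrow{d}\emptyset$, $i\in\{1,\dots,L\}$ | $f_{i+L+3}=d\langle n_i\rangle$ |
| $M\xrightarrow{d_m}\emptyset$ | $f_{2L+4}=d_m\langle n\rangle$ |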

Note that $G_{off}$ is not an independent species; the reason is that the binary state of the gene implies a conservation law, with the sum of the numbers of $G_{on}$ and $G_{off}$ equalling 1. Hence, the number of independent species in the model is $L+2$. The rate functions $f_j$ are the averaged propensities from the underlying chemical master equation (CME); note that, because our reaction network is composed of first-order reactions, these rate functions also equal the reaction rates in the corresponding deterministic rate equations. The description of our model is completed by the $(L+2)\times(2L+4)$-dimensional stoichiometric matrix $S$; the element $S_{ij}$ of $S$ gives the net change in the number of molecules of the $i$th species when the $j$th reaction occurs. Given the ordering of species and reactions as described in the tables above, it follows that the nonzero elements of the matrix $S$ have the simple form

$$S_{1,1}=-1,\quad S_{1,2}=1,\quad S_{i,i+1}=1,\quad S_{i,i+2}=-1,\quad S_{i,i+L+2}=-1,\quad S_{L+2,L+3}=1,\quad S_{L+2,2L+4}=-1, \tag{1}$$

where $i=2,\dots,L+1$.

Closed-Form Expressions for Moments of Mature RNA and Local RNAP

In this subsection, we outline the derivation of the steady-state means and variances of local RNAP fluctuations (on each gene segment), as well as of mature RNA. Our results are summarised in the following two propositions.

Proposition 1

Let $\eta=s_u/(s_u+s_b)$ be the fraction of time the gene spends in the active state, let $\rho_k=r/k$ be the mean number of RNAPs binding to the promoter site in the time it takes for a single RNAP to move from one gene segment to the next, let $\rho=r/d_m$ be the mean number of RNAPs binding to the promoter site in the time it takes for a mature RNA to decay, and let $\mu=k/(k+d)$ be the probability that an RNAP molecule moves to the next gene segment rather than detaching prematurely. Then, the steady-state mean numbers of molecules of active gene, RNAP, and mature RNA are given by

$$\langle n_0\rangle=\eta, \tag{2a}$$
$$\langle n_i\rangle=\eta\rho_k\mu^{i}\quad\text{for } i=1,\dots,L, \tag{2b}$$
$$\langle n\rangle=\eta\rho\mu^{L}, \tag{2c}$$

respectively.

Proposition 1 can be proved in a straightforward fashion, as follows. Using the underlying CME, one can show from the corresponding moment equations (Warren et al. 2006) that the time evolution of the vector $\langle m\rangle$ of mean molecule numbers in a system of zeroth-order or first-order reactions, i.e. with propensities that are linear in the number of molecules, is given by $d\langle m\rangle/dt=S\cdot f$. Given the form of the stoichiometric matrix $S$ and of the rate functions $f_j$, as described in Sect. 2.1, it follows that the mean numbers of all species in steady state can be obtained by solving the following system of $L+2$ algebraic equations:

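$$\begin{aligned}
0&=s_u(1-\langle n_0\rangle)-s_b\langle n_0\rangle,\\
0&=r\langle n_0\rangle-(k+d)\langle n_1\rangle,\\
0&=k\langle n_{i-1}\rangle-(k+d)\langle n_i\rangle\quad\text{for } i=2,\dots,L,\\
0&=k\langle n_L\rangle-d_m\langle n\rangle.
\end{aligned} \tag{3}$$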

These equations can easily be solved simultaneously to yield the steady-state value of m, as given in Eq. (2).
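As a sanity check, the steady-state means of Eq. (2) can be substituted into $S\cdot f$ numerically. The sketch below is illustrative only: the function names and the parameter values are assumptions, and indices are shifted to 0-based (row 0 is $G_{on}$, rows 1 to $L$ are $P_1,\dots,P_L$, row $L+1$ is $M$).

```python
def steady_state_means(L, su, sb, r, k, d, dm):
    """Means from Eq. (2), ordered as [<n_0>, <n_1>, ..., <n_L>, <n>]."""
    eta, mu = su / (su + sb), k / (k + d)
    local = [eta * (r / k) * mu**i for i in range(1, L + 1)]
    return [eta] + local + [eta * (r / dm) * mu**L]

def stoichiometric_matrix(L):
    """S from Eq. (1), with 0-based rows (species) and columns (reactions)."""
    S = [[0] * (2 * L + 4) for _ in range(L + 2)]
    S[0][0], S[0][1] = -1, 1       # G_on lost / regained by promoter switching
    for p in range(1, L + 1):      # row p holds P_p
        S[p][p + 1] = 1            # made by initiation (p = 1) or a hop from P_{p-1}
        S[p][p + 2] = -1           # lost by hopping on (or by termination if p = L)
        S[p][p + L + 2] = -1       # lost by premature detachment
    S[L + 1][L + 2] = 1            # M made by termination
    S[L + 1][2 * L + 3] = -1       # M degraded
    return S

def rate_vector(means, L, su, sb, r, k, d, dm):
    """Rate functions f_1, ..., f_{2L+4} in the order of the reaction table."""
    n0, local, n = means[0], means[1:L + 1], means[L + 1]
    return ([sb * n0, su * (1 - n0), r * n0]
            + [k * ni for ni in local]     # hops and termination
            + [d * ni for ni in local]     # premature detachment
            + [dm * n])                    # mature RNA degradation

def mean_derivative(L, su, sb, r, k, d, dm):
    """Evaluate S.f at the means of Eq. (2); every entry should vanish."""
    means = steady_state_means(L, su, sb, r, k, d, dm)
    f = rate_vector(means, L, su, sb, r, k, d, dm)
    return [sum(s * fj for s, fj in zip(row, f)) for row in stoichiometric_matrix(L)]

if __name__ == "__main__":
    # Assumed parameter values, chosen only for illustration (rates in 1/min).
    residual = mean_derivative(L=5, su=0.44, sb=4.7, r=6.7, k=6.0, d=0.2, dm=0.04)
    print(max(abs(x) for x in residual))
```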

Proposition 2

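Let $\tau_p=1/(d+k)$, $\tau_g=1/(s_u+s_b)$, and $\tau_m=1/d_m$ be the timescales of fluctuations of RNAP, gene, and mature RNA, respectively, and define the three new parameters

$$\alpha=\frac{1}{1+\tau_p/\tau_g},\quad \gamma=\frac{1}{1+\tau_p/\tau_m},\quad\text{and}\quad \theta=\frac{1}{1+\tau_m/\tau_g}.$$

Furthermore, let $\beta=s_b/s_u$ denote the ratio of gene inactivation and activation rates. Then, the variances and covariances of molecule number fluctuations of active gene, RNAP, and mature RNA are given by

$$\mathrm{Var}(n_0)=\langle n_0\rangle^2\beta, \tag{4a}$$
$$\mathrm{Cov}(n_0,n_i)=\langle n_0\rangle\langle n_i\rangle\alpha\beta\cdot f_{1i},\quad\text{where } f_{1i}=\alpha^{i-1}, \tag{4b}$$
$$\mathrm{Cov}(n_0,n)=\langle n_0\rangle\langle n\rangle\alpha\beta\cdot f_{1M},\quad\text{where } f_{1M}=\theta\alpha^{L-1}, \tag{4c}$$
$$\mathrm{Cov}(n_i,n_j)=\delta_{ij}\langle n_i\rangle+\langle n_i\rangle\langle n_j\rangle\alpha\beta\cdot f_{ij},\quad\text{where } f_{ij}=f(i,j)+f(j,i), \tag{4d}$$
$$\mathrm{Cov}(n_i,n)=\langle n_i\rangle\langle n\rangle\alpha\beta\cdot f_{iM},\quad\text{where } f_{iM}=\gamma^{i}\theta\alpha^{L-1}+(1-\gamma)\sum_{q=1}^{i}\gamma^{i-q}f_{qL}, \tag{4e}$$
$$\mathrm{Var}(n)=\langle n\rangle+\langle n\rangle^2\alpha\beta\cdot f_{MM},\quad\text{where } f_{MM}=f_{LM}, \tag{4f}$$

and where $i,j=1,\dots,L$. Here, $\delta_{ij}$ is the Kronecker delta; moreover,

$$f(i,j)=\frac{\alpha^{i+j-1}}{(2\alpha-1)^{i}}+\frac{1}{2^{i+j-1}}\binom{i+j-1}{i}\left[1-\frac{2\alpha-1}{2\alpha}\,{}_2F_1\!\left(1,i+j;j;\frac{1}{2\alpha}\right)\right],$$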

where 2F1 denotes the generalised hypergeometric function of the second kind (Digital Library of Mathematical Functions 2020a), which is defined as

$${}_2F_1(a_1,a_2;b_1;z)=\sum_{s=0}^{\infty}\frac{(a_1)_s(a_2)_s}{(b_1)_s}\frac{z^s}{s!},$$

with $(a)_s=\Gamma(a+s)/\Gamma(a)$ the Pochhammer symbol.

Here, we note that an alternative representation of the functions fij in Eq. (4d), in terms of finite sums, is given in Eq. (B.33) of ‘Appendix B’.

As above, since the underlying propensities are linear in the number of molecules, the CME implies (Warren et al. 2006) that the corresponding second moments in steady state are exactly given by a Lyapunov equation. That equation, which is precisely the same as the one that is obtained from the linear-noise approximation (LNA) (Elf and Ehrenberg 2003), takes the form

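$$J\cdot C+C\cdot J^{T}+D=0. \tag{5}$$

Here, $C$, $J$, and $D$ are $(L+2)\times(L+2)$-dimensional matrices; $C$ is a variance–covariance matrix that is symmetric ($C_{ij}=C_{ji}$), $J$ is the Jacobian matrix with elements $J_{ij}=\partial(S\cdot f)_i/\partial\langle n_j\rangle$, and $D=S\cdot\mathrm{Diag}(f)\cdot S^{T}$ is a diffusion matrix, where $\mathrm{Diag}(f)$ is a diagonal matrix whose elements are the entries in the rate function vector $f$. The nonzero elements of $J$ are given by

$$\begin{aligned}
&J_{11}=-(s_u+s_b),\quad J_{21}=r,\quad J_{22}=-(k+d),\\
&J_{i,i-1}=k,\quad J_{ii}=-(k+d)\quad\text{for } i=3,\dots,L+1,\\
&J_{L+2,L+1}=k,\quad J_{L+2,L+2}=-d_m,
\end{aligned} \tag{6}$$

while the nonzero elements of $D$ read

$$\begin{aligned}
&D_{11}=s_b\langle n_0\rangle+s_u(1-\langle n_0\rangle),\quad D_{22}=r\langle n_0\rangle+(k+d)\langle n_1\rangle,\quad D_{23}=-k\langle n_1\rangle,\\
&D_{i,i-1}=-k\langle n_{i-2}\rangle,\quad D_{ii}=k\langle n_{i-2}\rangle+(k+d)\langle n_{i-1}\rangle\quad\text{for } i=3,\dots,L+1,\\
&D_{i,i+1}=-k\langle n_{i-1}\rangle\quad\text{for } i=3,\dots,L,\\
&D_{L+2,L+1}=-k\langle n_L\rangle,\quad D_{L+2,L+2}=k\langle n_L\rangle+d_m\langle n\rangle.
\end{aligned} \tag{7}$$

Note that row and column $i$ of $J$ and $D$ refer to the species at position $i$ in $m$, so that position $i$ corresponds to $P_{i-1}$ for $i=2,\dots,L+1$; the diagonal entry $D_{ii}$ therefore collects the production of $P_{i-1}$ from $P_{i-2}$ and its loss through hopping and detachment.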

Given the structure of the matrices J and D above, the Lyapunov Eq. (5) can be solved explicitly for the covariance matrix C whose elements are given by Eq. (4). The solution by induction is involved and can be found in ‘Appendix B’, which proves Proposition 2.
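The Lyapunov equation can also be solved numerically as a check on Eq. (4). The sketch below is not the inductive proof of ‘Appendix B’; it is an illustrative computation that exploits the fact that $J$ is lower bidiagonal, so that Eq. (5) can be solved entry by entry as $C_{ij}=-(D_{ij}+J_{i,i-1}C_{i-1,j}+J_{j,j-1}C_{i,j-1})/(J_{ii}+J_{jj})$. Function names and parameter values are assumptions, indices are 0-based, and the hypergeometric series is summed directly, which restricts the check to $\alpha>1/2$.

```python
import math

def hyp2f1(a, b, c, z, terms=500):
    """Truncated Gauss series for 2F1(a, b; c; z); only valid for |z| < 1."""
    total, term = 0.0, 1.0
    for s in range(terms):
        total += term
        term *= (a + s) * (b + s) / ((c + s) * (s + 1)) * z
    return total

def f_ij(i, j, alpha):
    """f_ij = f(i,j) + f(j,i) from Eq. (4d); here alpha > 1/2 is assumed."""
    def f(i, j):
        bracket = 1.0 - (2 * alpha - 1) / (2 * alpha) * hyp2f1(1, i + j, j, 1 / (2 * alpha))
        return (alpha ** (i + j - 1) / (2 * alpha - 1) ** i
                + math.comb(i + j - 1, i) / 2 ** (i + j - 1) * bracket)
    return f(i, j) + f(j, i)

def lyapunov_covariance(L, su, sb, r, k, d, dm):
    """Solve J.C + C.J^T + D = 0 by forward substitution; returns (means, C)."""
    eta, mu = su / (su + sb), k / (k + d)
    n = [eta] + [eta * (r / k) * mu**i for i in range(1, L + 1)] + [eta * (r / dm) * mu**L]
    N = L + 2
    diag = [-(su + sb)] + [-(k + d)] * L + [-dm]   # J_ii
    sub = [0.0, r] + [k] * L                       # J_{i,i-1}; entry 0 is unused
    D = [[0.0] * N for _ in range(N)]
    D[0][0] = sb * n[0] + su * (1 - n[0])
    D[1][1] = r * n[0] + (k + d) * n[1]
    for i in range(2, L + 1):
        D[i][i] = k * n[i - 1] + (k + d) * n[i]
    for i in range(1, L + 1):
        D[i][i + 1] = D[i + 1][i] = -k * n[i]
    D[L + 1][L + 1] = k * n[L] + dm * n[L + 1]
    C = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            rhs = D[i][j]
            if i > 0:
                rhs += sub[i] * C[i - 1][j]
            if j > 0:
                rhs += sub[j] * C[i][j - 1]
            C[i][j] = -rhs / (diag[i] + diag[j])
    return n, C

def max_relative_error(L, su, sb, r, k, d, dm):
    """Largest relative gap between the numerical C and Eqs. (4a), (4b), (4d)."""
    n, C = lyapunov_covariance(L, su, sb, r, k, d, dm)
    alpha, beta = (k + d) / (k + d + su + sb), sb / su
    worst = abs(C[0][0] / (n[0] ** 2 * beta) - 1)
    for i in range(1, L + 1):
        worst = max(worst, abs(C[0][i] / (n[0] * n[i] * alpha * beta * alpha ** (i - 1)) - 1))
        for j in range(1, L + 1):
            exact = (n[i] if i == j else 0.0) + n[i] * n[j] * alpha * beta * f_ij(i, j, alpha)
            worst = max(worst, abs(C[i][j] / exact - 1))
    return worst

if __name__ == "__main__":
    # Assumed rates (1/min), chosen so that alpha = 0.875 > 1/2.
    print(max_relative_error(L=4, su=0.3, sb=0.2, r=2.0, k=3.0, d=0.5, dm=0.4))
```

For bursty parameter sets, where $\alpha<1/2$, the series above diverges; the finite-sum representation in Eq. (B.33), or a library routine that analytically continues ${}_2F_1$, would be needed instead.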

Simplification in Bursty and Constitutive Limits

Bursty limit: We now consider a particular parameter regime, namely the limit of large initiation rate $r$ and large gene inactivation rate $s_b$ such that $b=r/s_b$ is constant. Since the fraction of time spent in the active state is $\eta=s_u/(s_u+s_b)$, it follows that the gene is mostly in the inactive state in that limit. During the short periods of time when it transitions to the active state, a burst of initiation events occurs; in particular, a mean number $b$ of RNAPs bind to the promoter during activation. Hence, such genes are often termed bursty, since transcription proceeds via sporadic bursts of activity, and $b$ is called the mean transcriptional burst size. For $r$ and $s_b$ large with $b$ constant, the expressions for the first two moments of RNAP at every gene segment and of mature RNA from Eqs. (2) and (4), respectively, simplify to

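$$\langle n_i\rangle_b=b\,\upsilon_k\mu^{i}, \tag{8a}$$
$$\langle n\rangle_b=b\,\upsilon_m\mu^{L}, \tag{8b}$$
$$\mathrm{Cov}(n_i,n_j)_b=\delta_{ij}\langle n_i\rangle_b+\langle n_i\rangle_b\langle n_j\rangle_b(\upsilon_k\mu)^{-1}\cdot h_{ij},\quad\text{where } h_{ij}=\frac{1}{2^{i+j-2}}\frac{\Gamma(i+j-1)}{\Gamma(i)\Gamma(j)}, \tag{8c}$$
$$\mathrm{Cov}(n_i,n)_b=\langle n_i\rangle_b\langle n\rangle_b(\upsilon_k\mu)^{-1}\cdot h_{iM},\quad\text{where } h_{iM}=(1-\gamma)\sum_{q=1}^{i}\gamma^{i-q}\cdot h_{qL}, \tag{8d}$$
$$\mathrm{Var}(n)_b=\langle n\rangle_b+\langle n\rangle_b^2(\upsilon_k\mu)^{-1}\cdot h_{MM},\quad\text{where } h_{MM}=h_{LM}; \tag{8e}$$

here, the subscript $b$ denotes the moments in the bursty limit. Moreover, $\upsilon_k=s_u/k$, $\upsilon_m=s_u/d_m$, and $h_{ij}=f_{ij}|_{\alpha\to 0}$ denotes the simplified function $f_{ij}$ in the limit of $\alpha\to 0$, which is achieved when $s_b\to\infty$. We note that the above expressions for the functions $h_{ij}$ are derived from the expressions for $f_{ij}$ that are given in Eq. (B.33), rather than from those in Eq. (4d). The reason is that, in the bursty limit, we have that $1\gg 2\alpha$, in which case the identity in Eq. (B.36) does not hold. The bursty limit in Eq. (B.33) is simply taken by collecting terms that are not dependent on $\alpha$, since $\alpha\to 0$ in that limit.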
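To make the bursty-limit expressions concrete, the sketch below evaluates the local mean and Fano factor implied by Eqs. (8a) and (8c). It uses the identity $\Gamma(i+j-1)/(\Gamma(i)\Gamma(j))=\binom{i+j-2}{i-1}$ for integer arguments; with it, the Fano factor on segment $i$ reduces to $1+b\,\mu^{i-1}h_{ii}$. The function names are assumptions, and the parameter values are the PDR5 values quoted in the caption of Fig. 2 with $d=0.01$/min.

```python
import math

def h(i, j):
    """h_ij from Eq. (8c), written with a binomial coefficient."""
    return math.comb(i + j - 2, i - 1) / 2 ** (i + j - 2)

def bursty_local_stats(L, su, sb, r, k, d):
    """Return (i, mean, Fano factor) per segment in the bursty limit, Eq. (8)."""
    b, mu, upsilon_k = r / sb, k / (k + d), su / k
    rows = []
    for i in range(1, L + 1):
        mean = b * upsilon_k * mu**i                               # Eq. (8a)
        variance = mean + mean**2 / (upsilon_k * mu) * h(i, i)     # Eq. (8c), i = j
        rows.append((i, mean, variance / mean))
    return rows

if __name__ == "__main__":
    L, T, d = 30, 4.5, 0.01        # segments, elongation time (min), detachment rate (1/min)
    k = L / T - d                  # elongation rate fixed by the mean elongation time
    for i, mean, fano in bursty_local_stats(L, su=0.44, sb=4.7, r=6.7, k=k, d=d)[::5]:
        print(f"segment {i:2d}: mean = {mean:.4f}, Fano factor = {fano:.4f}")
```

Because $h_{ii}$ decreases with $i$, the Fano factor computed this way falls from $1+b$ on the first segment towards 1 further along the gene.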

To test the accuracy of our theory, in Fig. 2 we compare our analytical expressions for the mean of local RNAP numbers, as well as for various measures of local RNAP fluctuations (the coefficient of variation CV, the Fano factor FF, and the Pearson correlation coefficient CC), with those calculated from stochastic simulation using Gillespie’s algorithm (SSA) (Gillespie 1977). Simulations are performed for two different scenarios: (i) without volume exclusion, where the footprint of RNAPs is not taken into account; and (ii) with volume exclusion, where RNAPs are treated as solid objects with a footprint of 35 bp, which is the value reported in Md Zulfikar et al. (2020). For our simulations in Fig. 2, we use parameter values characteristic of the gene PDR5 of length 3070 bp, as reported in Zenklusen et al. (2008). Our choice of $L=30$ implies that the length of each gene segment is about 100 bp and, hence, that at most 3 RNAPs can fit in each segment when volume exclusion is taken into account. In this case, Gillespie’s algorithm is modified such that the initiation and RNAP ‘hopping’ rates are proportional to the available volume in the gene segment which the RNAP is moving to. That is achieved by rescaling the transcription initiation rate as $r\to r(1-n_1/3)$ and the RNAP hopping rate from the $i$th to the $(i+1)$th gene segment as $k\to k(1-n_{i+1}/3)$. Since we use parameters measured for a gene that demonstrates bursty expression (PDR5) (Zenklusen et al. 2008), we test the accuracy of both the exact theory from Eqs. (2) and (4) and the approximate expressions given in Eq. (8).
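A compact version of such a simulation is sketched below. It is an illustrative implementation of Gillespie’s direct method for the reaction scheme of Sect. 2.1, not the code used to produce Fig. 2; the function name, the optional `capacity` argument, and the parameter values in the usage example are assumptions. With `capacity=None` volume exclusion is ignored; with `capacity=3` the initiation and hopping propensities are rescaled by one minus the occupancy of the target segment divided by 3.

```python
import random

def ssa_local_means(L, su, sb, r, k, d, t_end, capacity=None, seed=0):
    """Time-averaged RNAP number on each segment from one SSA trajectory."""
    rng = random.Random(seed)

    def free(i):
        # Fraction of (0-based) segment i still available to an incoming RNAP.
        return 1.0 if capacity is None else max(0.0, 1.0 - n[i] / capacity)

    gene_on, n, t = 1, [0] * L, 0.0
    occupancy = [0.0] * L                      # time integral of n_i
    while True:
        props = [sb if gene_on else su, r * gene_on * free(0)]
        props += [k * n[i] * (free(i + 1) if i + 1 < L else 1.0) for i in range(L)]
        props += [d * n[i] for i in range(L)]
        total = sum(props)                     # > 0, since switching is always possible
        tau = rng.expovariate(total)
        step = min(tau, t_end - t)
        for i in range(L):
            occupancy[i] += n[i] * step
        if t + tau >= t_end:
            break
        t += tau
        u, idx = rng.random() * total, 0       # choose which reaction fires
        while idx < len(props) - 1 and u >= props[idx]:
            u -= props[idx]
            idx += 1
        if props[idx] == 0.0:
            continue                           # guard against floating-point round-off
        if idx == 0:
            gene_on = 1 - gene_on              # promoter switching
        elif idx == 1:
            n[0] += 1                          # initiation
        elif idx < 2 + L:
            i = idx - 2                        # hop from segment i, or termination
            n[i] -= 1
            if i + 1 < L:
                n[i + 1] += 1
        else:
            n[idx - 2 - L] -= 1                # premature detachment
    return [x / t_end for x in occupancy]

if __name__ == "__main__":
    # Assumed illustrative rates (1/min); Eq. (2b) gives means 1.0, 0.8, 0.64.
    L, su, sb, r, k, d = 3, 1.0, 1.0, 5.0, 2.0, 0.5
    print("no exclusion:  ", ssa_local_means(L, su, sb, r, k, d, t_end=5000.0))
    print("with exclusion:", ssa_local_means(L, su, sb, r, k, d, t_end=5000.0, capacity=3))
```

The time-weighted average used here is one of several possible estimators; sampling the trajectory at fixed intervals, as described in the caption of Fig. 2, is an alternative.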

Fig. 2.

First and second moments of the distribution of local RNAP for the PDR5 gene in yeast, which demonstrates bursty expression. In a–d, we show the dependence of the mean, coefficient of variation squared, Fano factor, and Pearson correlation coefficient, respectively, of local RNAP fluctuations on gene segment $i$, as predicted by our exact theory (Eqs. (2) and (4); solid lines), the approximate theory in the bursty limit (Eq. (8); dashed lines), and simulation via Gillespie’s stochastic simulation algorithm (SSA). We performed simulations for two different cases: without volume exclusion (dots) and with volume exclusion (open circles). The parameters are fixed to $s_u=0.44$/min, $s_b=4.7$/min, and $r=6.7$/min, which are characteristic of the PDR5 gene in yeast, as reported in Supplemental Table 2 of Zenklusen et al. (2008). The number of gene segments is arbitrarily chosen to be $L=30$. The total elongation time $T=4.5$ min is also reported for PDR5, described as the synthesis time and denoted by $\tau$ in Zenklusen et al. (2008). The elongation rate by definition takes the value $k=L/T-d\approx L/T$, since $d\ll k$. The detachment rate $d$ is arbitrarily chosen to be $d=0.01$/min (red lines and dots) or $d=0.2$/min (black lines and dots). Note that, for the SSA, moments are calculated from one long trajectory with a few million time points, sampled at unit intervals (Color figure online)

The perfect agreement between our exact theory (solid lines) and simulation without volume exclusion (dots) provides a numerical validation of that theory. Our approximate theory (dashed lines) also yields a reasonably good approximation; the mismatch can be decreased if the degree of burstiness is increased, i.e. by increasing the parameters r and sb relative to the other rates in the model. We also note that the theory is in good agreement with simulation with volume exclusion (open circles), which shows that the ‘low traffic’ assumption upon which our theory is based is valid.

The following interesting observations can be made from these figures: (i) if the rate of premature detachment is greater than zero, then the mean of local RNAP decreases monotonically with the distance $i$ from the promoter, decaying geometrically as $\mu^{i}$ by Eq. (2b), whereas that mean is constant along the gene if there is no premature detachment, as expected; (ii) the size of RNAP fluctuations, as measured by CV, decreases with $i$ for small premature detachment rates, but increases with $i$ for sufficiently large values of the detachment rate; (iii) the Fano factor approaches 1 (the value of FF for a Poissonian distribution) as $i$ increases, which is due to the dispersal of the burst as stochastic elongation proceeds; (iv) the correlation coefficient between the local RNAP on two neighbouring gene segments decreases monotonically with $i$, which is exacerbated by premature detachment and is a direct result of the stochasticity inherent in the elongation process.

The observation in (iii) can be explained in detail as follows. When the detachment rate is zero, a burst of RNAPs rapidly bind to the promoter, leading to large fluctuations near that site; however, thereafter each RNAP moves distinctly from all others due to stochastic elongation. Hence, the burst is gradually dispersed as elongation proceeds, which implies a decrease in the variance of fluctuations with increasing $i$. When the detachment rate is nonzero, then the same effect is at play; however, the decrease in the variance of fluctuations along the gene is now counteracted by the decrease in mean RNAP numbers, which leads to two types of behaviour: for small $i$, CV decreases with $i$, since the variance dominates over the mean, while for large $i$, the opposite occurs and CV increases with $i$.

Constitutive limit: The other common parameter regime is that of constitutive gene expression, where the gene spends most of its time in the active state and transcription is continuous, which corresponds to the limit of very small sb. In that limit, the expressions from Eqs. (2) and (4) simplify to

$$\langle n_i\rangle_c=\mathrm{Var}(n_i)_c=\rho_k\mu^{i}\quad\text{and}\quad\langle n\rangle_c=\mathrm{Var}(n)_c=\rho\mu^{L}, \tag{9}$$

while the covariances $\mathrm{Cov}(n_i,n_j)_c$ (for $i\neq j$) and $\mathrm{Cov}(n_i,n)_c$ between the species are zero; here, the subscript $c$ denotes the constitutive limit. This drastic simplification reflects the fact that, in the constitutive limit, the distributions of mature RNA and local RNAP are Poissonian: as the regulatory network is then effectively given by $P_1\to P_2\to\dots\to P_L\to M$, the result follows directly from the exact solution provided in Jahnke and Huisinga (2007).

To further test the accuracy of our theory, in Fig. 3 we compare our analytical expressions for the mean of local RNAP numbers, as well as for various measures of local RNAP fluctuations, with those calculated from stochastic simulation using Gillespie’s algorithm, where we use parameters measured for a gene that demonstrates constitutive expression (DOA1) (Zenklusen et al. 2008). As before, we test the accuracy of both the exact theory given by Eqs. (2) and (4) and the approximate expressions from Eq. (9). Unsurprisingly, we observe agreement between exact theory (solid lines) and simulation (dots); the mismatch between our approximate theory and simulation is due to the fact that the gene does not spend 100% of its time in the active state, which would be the true constitutive limit, but, rather, $s_u/(s_u+s_b)\approx 85\%$. The local mean RNAP number decreases with distance from the promoter, as was the case for bursty expression in the previous subsubsection, which is to be expected. The various measures which depend on the second moments are, however, considerably different: CV increases monotonically with $i$, independently of the rate of premature detachment, while FF and CC are very close to 1 and zero, respectively; moreover, the latter two measures show very little variation along the gene. The lack of transcriptional bursting explains all these effects in a straightforward fashion.

Fig. 3.

First and second moments of the distribution of local RNAP for the DOA1 gene in yeast, which demonstrates constitutive expression. In a–d, we show the dependence of the mean, coefficient of variation squared, Fano factor, and Pearson correlation coefficient, respectively, of local RNAP fluctuations on gene segment $i$, as predicted by our exact theory (Eqs. (2) and (4); solid lines), the approximate theory in the constitutive limit (Eq. (9); dashed lines), and simulation via Gillespie’s stochastic simulation algorithm (SSA; dots). The parameters are fixed to $s_u=0.7$/min, $s_b=0.12$/min and $r=0.14$/min, which are characteristic of the DOA1 gene in yeast, as reported in Supplemental Table 2 of Zenklusen et al. (2008). The number of gene segments is arbitrarily chosen to be $L=30$. The total elongation time $T=2.9$ min is also reported for DOA1, described as the synthesis time and denoted by $\tau$ in Zenklusen et al. (2008). The elongation rate by definition takes the value $k=L/T-d\approx L/T$, since $d\ll k$. The detachment rate $d$ is arbitrarily chosen to be $d=0.01$/min (red lines and dots) or $d=0.2$/min (black lines and dots). Note that, for the SSA, moments are calculated from one long trajectory with a few billion time points, sampled at unit intervals (Color figure online)

Finally, we remark that the accuracy of our expressions for the mean and variance of mature RNA, as given in Eqs. (2) and (4), is verified by simulation (SSA) in Fig. 4a, b for parameters typical of the bursty PDR5 gene. The meaning of the dependence of descriptive statistics on $L$ is discussed in the next section.

Fig. 4.

Mean and variance of the distributions of mature RNA and total RNAP for the PDR5 gene in yeast. In a, b, we show the dependence of the moments of mature RNA fluctuations on the number of gene segments $L$, as predicted by our theory (Eqs. (2) and (4); solid lines) and SSA (dots). In c, d, we show the dependence of the moments of total RNAP on $L$, as predicted by our exact theory (Eq. (10); solid lines) and SSA (dots). The parameters $s_u$, $s_b$, $r$, and $T$ are characteristic of the PDR5 gene and are the same as in Fig. 2. The premature detachment rate is chosen to be $d=0.01$/min; the elongation rate is then given by $k\approx L/T$. The degradation rate of mature RNA is $d_m=0.04$/min, which is chosen such that the mean mature RNA is roughly consistent with that reported in Fig. 6(b) of Zenklusen et al. (2008). Note that, for the SSA, moments are calculated from one long trajectory with a few billion time points, sampled at unit intervals

Closed-Form Expressions for Moments of Total RNAP

While local RNAP fluctuations are measurable in experiment, as discussed in the Introduction, measurements of total RNAP on a gene are typically reported. Hence, in this section, we briefly discuss descriptive statistics of total RNAP fluctuations.

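Recalling that $n_i$ is the number of RNAP molecules on the $i$th gene segment, the total number of RNAPs on the gene (arbitrarily divided into $L$ segments) is given by $n_{tot}=\sum_{i=1}^{L}n_i$. Given Eqs. (2) and (4), the steady-state mean $\langle n_{tot}\rangle=\sum_{i=1}^{L}\langle n_i\rangle$ and the steady-state variance $\mathrm{Var}(n_{tot})=\sum_{i,j=1}^{L}\mathrm{Cov}(n_i,n_j)$ of the total RNAP distribution are given by

$$\langle n_{tot}\rangle=\eta\rho_k\mu\,\frac{\mu^{L}-1}{\mu-1}\quad\text{and}\quad\mathrm{Var}(n_{tot})=\langle n_{tot}\rangle+\alpha\beta(\eta\rho_k)^2\sum_{i,j=1}^{L}\mu^{i+j}\cdot f_{ij}. \tag{10}$$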

For a detailed derivation of the variance in Eq. (10), we refer to ‘Appendix C’. These expressions for the mean and variance of the total RNAP distribution simplify in the bursty and constitutive limits, as can be seen in ‘Appendix D’. The accuracy of Eq. (10) is tested by comparing against stochastic simulation with SSA in Fig. 4c, d. Both mean and variance are seen to increase monotonically with the number of gene segments $L$, as we keep the mean elongation time constant; the mean shows very little dependence on $L$, while the dependence of the variance is more pronounced. We recall that, while the parameter $L$ is arbitrary in principle, it actually determines the size of fluctuations in the elongation time. Since that time is the sum of $L$ independent exponential variables with mean $1/(k+d)$ each, it follows that the distribution of the elongation time $T$ is Erlang with mean $\langle T\rangle=L/(k+d)$ and coefficient of variation squared equal to $1/L$. Hence, the larger $L$ is, the narrower is the distribution of $T$ and the more deterministic is elongation itself. Thus, Fig. 4c, d predicts that the mean and variance of total RNAP increase with decreasing fluctuations in the elongation time $T$. It hence follows that models in which the elongation time is assumed to be exponentially distributed (Cao and Grima 2020), which correspond to the case where $L=1$ in our model, underestimate the size of nascent RNA fluctuations.

Special Case of Deterministic Elongation

Next, we derive expressions for the descriptive statistics of total RNAP and mature RNA in the limit of large L taken at constant mean elongation time, which corresponds to deterministic elongation. As is shown in Fig. 4, these statistics converge quickly to the ones obtained in the large-L limit; hence, the resulting limiting expressions are likely to be useful across a variety of genes.

Moments of total RNAP distribution: We define the non-dimensional parameters $\delta_g=\tau_g/\tau_d$, $T_g=\langle T\rangle/\tau_g$, and $T_d=\langle T\rangle/\tau_d$, which correspond to the ratio of the gene timescale and the polymerase detachment timescale, the ratio of the mean elongation time and the gene timescale, and the ratio of the mean elongation time and the polymerase detachment timescale, respectively; here, $\tau_d=1/d$, as before. Substituting $k=L/\langle T\rangle-d$ into Eq. (10) and taking the limit of deterministic elongation, i.e. letting $L\to\infty$ at constant $\langle T\rangle$, we obtain the following expressions for the mean, variance, and $\mathrm{CV}^2$ of total RNAP:

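$$\begin{aligned}
\langle n_{\mathrm{tot}}\rangle_{\infty}&=\eta\,\frac{r}{d}\left(1-e^{-T_d}\right),\\
\mathrm{Var}(n_{\mathrm{tot}})_{\infty}&=\langle n_{\mathrm{tot}}\rangle_{\infty}+\langle n_{\mathrm{tot}}\rangle_{\infty}^{2}\cdot\beta\,\frac{\delta_g\left[(\delta_g-1)+(\delta_g+1)e^{-2T_d}-2\delta_g e^{-T_g}e^{-T_d}\right]}{(\delta_g-1)(\delta_g+1)\left(1-e^{-T_d}\right)^{2}},\\
\mathrm{CV}^2_{\infty}(n_{\mathrm{tot}})&=\langle n_{\mathrm{tot}}\rangle_{\infty}^{-1}+\beta\,\frac{\delta_g\left[(\delta_g-1)+(\delta_g+1)e^{-2T_d}-2\delta_g e^{-T_g}e^{-T_d}\right]}{(\delta_g-1)(\delta_g+1)\left(1-e^{-T_d}\right)^{2}}.
\end{aligned}\tag{11}$$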

Here, the subscript $\infty$ denotes the limit of $L\to\infty$. A detailed derivation of the variance in Eq. (11) can be found in Lemma C.1 of ‘Appendix C’.

In the special case when RNAP does not prematurely detach from the gene, i.e. for d=0, the expressions in Eq. (11) simplify to

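$$\begin{aligned}
\langle n_{\mathrm{tot}}\rangle_{(\infty;0)}&=\eta r\langle T\rangle,\\
\mathrm{Var}(n_{\mathrm{tot}})_{(\infty;0)}&=\langle n_{\mathrm{tot}}\rangle_{(\infty;0)}+\langle n_{\mathrm{tot}}\rangle_{(\infty;0)}^{2}\cdot 2\beta\,T_g^{-1}\left(1-T_g^{-1}+T_g^{-1}e^{-T_g}\right),\\
\mathrm{CV}^2_{(\infty;0)}&=\langle n_{\mathrm{tot}}\rangle_{(\infty;0)}^{-1}+2\beta\,T_g^{-1}\left(1-T_g^{-1}+T_g^{-1}e^{-T_g}\right),
\end{aligned}\tag{12}$$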

where the subscript $(\infty;0)$ denotes the limit of $(L,d)\to(\infty,0)$. The expressions in Eq. (12) have been previously reported in Choubey et al. (2015), where they were derived using queuing theory. Hence, our expressions in Eq. (11) constitute a generalisation of known results, by further taking into account premature detachment of RNAP from the gene.
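The relationship between Eqs. (10), (11), and (12) can be explored numerically. The sketch below is illustrative only and rests on some assumptions: $\eta=s_u/(s_u+s_b)$, $\beta=s_b/s_u$, and $\rho_k=r/k$ are taken to be the definitions used earlier in the paper, $\tau_g=1/(s_u+s_b)$ follows from $T_g=(s_u+s_b)\langle T\rangle$, and all function names are hypothetical.

```python
import math


def mean_total_rnap_finite_L(su, sb, r, d, T, L):
    """Mean of total RNAP from Eq. (10), with k fixed by k = L/T - d."""
    k = L / T - d
    eta = su / (su + sb)   # assumed: fraction of time the gene is active
    rho_k = r / k          # assumed meaning of rho_k
    mu = k / (k + d)
    if d == 0:
        return eta * rho_k * L   # the geometric sum for mu = 1
    return eta * rho_k * mu * (mu**L - 1) / (mu - 1)


def moments_deterministic(su, sb, r, d, T):
    """Mean and variance of total RNAP from Eq. (11); requires d > 0."""
    eta, beta = su / (su + sb), sb / su
    tau_g = 1 / (su + sb)
    Tg, Td, dg = T / tau_g, T * d, tau_g * d
    one_minus_exp = -math.expm1(-Td)          # 1 - exp(-Td), accurate for small Td
    mean = eta * (r / d) * one_minus_exp
    num = dg * ((dg - 1) + (dg + 1) * math.exp(-2 * Td)
                - 2 * dg * math.exp(-Tg) * math.exp(-Td))
    den = (dg - 1) * (dg + 1) * one_minus_exp**2
    return mean, mean + mean**2 * beta * num / den


def moments_no_detachment(su, sb, r, T):
    """Mean and variance of total RNAP from Eq. (12), i.e. for d = 0."""
    eta, beta = su / (su + sb), sb / su
    Tg = (su + sb) * T
    mean = eta * r * T
    noise = 2 * beta / Tg * (1 - 1 / Tg + math.exp(-Tg) / Tg)
    return mean, mean + mean**2 * noise


if __name__ == "__main__":
    pdr5 = dict(su=0.44, sb=4.7, r=6.7, T=4.5)   # Table 1a values
    print(moments_deterministic(d=1e-4, **pdr5))
    print(moments_no_detachment(**pdr5))
```

For a small detachment rate, the two printed pairs nearly coincide, and `mean_total_rnap_finite_L` approaches the mean of Eq. (11) as `L` grows. The expression in Eq. (11) has a removable singularity at $\delta_g=1$, which this sketch does not handle.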

Equation (12) shows that the coefficient of variation squared of total RNAP, denoted by $\mathrm{CV}^2_{(\infty;0)}$, can be written as the sum of two terms: (i) the inverse of the mean which is expected if the distribution of total RNAP is Poissonian, and (ii) a term that increases with increasing $\beta$ and decreasing $T_g$. Hence, the latter term provides a measure for the deviation of the total RNAP distribution from a Poissonian. In particular, it shows that the deviation is significant in genes for which (i) the fraction of time spent in the inactive state is large (large $\beta$), and (ii) the elongation time is much shorter than the switching time between the active and inactive states (small $T_g$).

Moments of mature RNA distribution: Similarly, in the limit of deterministic elongation, it is straightforward to show that the expressions for the mean and variance of the distribution of mature RNA given by Eqs. (2) and (4) reduce to

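$$\langle n\rangle_{\infty}=\eta\rho\,e^{-T_d}\quad\text{and}\quad\mathrm{Var}(n)_{\infty}=\langle n\rangle_{\infty}+\langle n\rangle_{\infty}^{2}\cdot\beta\theta.\tag{13}$$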

These expressions can be further simplified in the special case of no premature detachment to read

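$$\langle n\rangle_{(\infty;0)}=\eta\rho\quad\text{and}\quad\mathrm{Var}(n)_{(\infty;0)}=\langle n\rangle_{(\infty;0)}+\langle n\rangle_{(\infty;0)}^{2}\cdot\beta\theta.\tag{14}$$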

Note that the mean and variance are precisely the same as would be obtained from the telegraph model, for which the corresponding Fano factor in the bursty limit is given by Eq. (16) below. Hence, we anticipate that, in the limit of no premature detachment and deterministic elongation, the distribution of mature RNA from our transcription model is the same as the distribution obtained from the coarser telegraph model. A formal proof of that claim will be given in Sect. 3.

Relationship between Fano factors of total RNAP and mature RNA: Specifying to the case of no premature detachment, it is interesting to note that in the bursty limit, i.e. for $r,s_b\to\infty$ at constant mean burst size $b=r/s_b$ in Eq. (12), the Fano factor of total RNAP is given by

$$\mathrm{FF}_{n(b;\infty;0)}=1+2b;\tag{15}$$

see also Eq. (D.3) in ‘Appendix D’. Here, the subscript $n$ denotes nascent RNA (total RNAP). Equation (15) is in contrast to the Fano factor of mature RNA in the same bursty limit:

$$\mathrm{FF}_{m(b;\infty;0)}=1+b,\tag{16}$$

see Eq. (D.8) in ‘Appendix D’, where the subscript $m$ denotes mature RNA. (Note that $\mathrm{FF}_{m(b;\infty;0)}$ also equals the Fano factor of the telegraph model in the same bursty limit (Raj et al. 2006).) Hence, by comparing Eqs. (15) and (16), we can deduce the following for bursty expression: (i) if the telegraph model is used to estimate the mean transcriptional burst size from total RNAP data where the elongation time is deterministic, then the mean burst size will be overestimated by a factor of two; in other words, the implicit assumption that the elongation time is exponentially distributed is inadequate; (ii) fluctuations in total RNAP (nascent RNA) deviate more from Poisson statistics, for which the Fano factor equals one, than fluctuations in mature RNA.
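The bursty limits in Eqs. (15) and (16) can be reproduced numerically by taking $s_b$ large at fixed $b=r/s_b$. This is a hedged sketch rather than the derivation of ‘Appendix D’: the nascent Fano factor is computed from Eq. (12) with $\eta=s_u/(s_u+s_b)$ and $\beta=s_b/s_u$ assumed, and the mature Fano factor uses the standard telegraph-model expression $1+r s_b/[(s_u+s_b)(s_u+s_b+d_m)]$, which is assumed here to coincide with Eq. (14). Function names are hypothetical.

```python
import math


def fano_nascent(su, sb, r, T):
    """Fano factor of total RNAP from Eq. (12): deterministic elongation, d = 0."""
    eta, beta = su / (su + sb), sb / su
    Tg = (su + sb) * T
    mean = eta * r * T
    return 1 + mean * 2 * beta / Tg * (1 - 1 / Tg + math.exp(-Tg) / Tg)


def fano_mature(su, sb, r, dm):
    """Telegraph-model Fano factor of mature RNA (assumed equal to Eq. (14))."""
    return 1 + r * sb / ((su + sb) * (su + sb + dm))


if __name__ == "__main__":
    b, su, T, dm = 2.0, 0.44, 4.5, 0.04
    for sb in (1e1, 1e3, 1e6):
        r = b * sb   # keep the mean burst size b = r / sb fixed
        print(sb, fano_nascent(su, sb, r, T), fano_mature(su, sb, r, dm))
```

As `sb` increases, the two printed Fano factors approach $1+2b$ and $1+b$, respectively.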

More generally, if we do not enforce the bursty limit, then we find the following relationship between the Fano factors of total RNAP and mature RNA, which are calculated from Eqs. (12) and (14), respectively:

$$\frac{\mathrm{FF}_{n(\infty;0)}}{\mathrm{FF}_{m(\infty;0)}}=1+\frac{e^{-T_g}\,T_rT_{s_b}\,\Xi}{T_g^{2}\left[T_rT_{s_b}+T_g(T_g+T_m)\right]}.\tag{17}$$

Here,

$$\Xi=2(T_g+T_m)+e^{T_g}\left[2(T_g-1)T_m+(T_g-2)T_g\right],\tag{18}$$

while $T_g=(s_u+s_b)\langle T\rangle$, $T_r=r\langle T\rangle$, $T_m=d_m\langle T\rangle$, and $T_{s_b}=s_b\langle T\rangle$ are non-dimensional parameters representing the ratio of the mean elongation time to the timescales of promoter switching, initiation, decay of mature RNA, and gene deactivation, respectively. From Eq. (17), we deduce that $\mathrm{FF}_{n(\infty;0)}>\mathrm{FF}_{m(\infty;0)}$ if and only if $\Xi>0$. From the contour plot of $\Xi$ in Fig. 5, one can deduce that

$$\Xi>0\quad\text{if and only if}\quad T_m\gtrsim 1-\tfrac{5}{8}T_g.\tag{19}$$

Hence, the Fano factor of nascent RNA is larger than that of mature RNA if and only if the above (approximate) condition is satisfied. In the bursty limit, $T_g\to\infty$ due to $s_b\to\infty$ which, together with $T_m>0$, implies that Eq. (19) holds; the condition is also satisfied if promoter switching is very fast compared to elongation. By contrast, if $T_m<1$ and $T_g<1$, then it is possible to have the opposite scenario where the Fano factor of mature RNA is larger than that of nascent RNA, which occurs, for example, if promoter switching and mature RNA decay are very slow compared to elongation.
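The sign condition on $\Xi$ can be probed directly from Eqs. (17) and (18). The sketch below is only an illustration with hypothetical function names; the bisection bracket is an assumption chosen so that $\Xi$ changes sign inside it when $T_m=0$.

```python
import math


def xi(Tg, Tm):
    """Xi from Eq. (18)."""
    return 2 * (Tg + Tm) + math.exp(Tg) * (2 * (Tg - 1) * Tm + (Tg - 2) * Tg)


def fano_ratio(Tg, Tm, Tr, Tsb):
    """Ratio of nascent to mature Fano factors from Eq. (17)."""
    return 1 + math.exp(-Tg) * Tr * Tsb * xi(Tg, Tm) / (
        Tg**2 * (Tr * Tsb + Tg * (Tg + Tm)))


def zero_of_xi_in_Tg(Tm, lo=0.5, hi=3.0, tol=1e-10):
    """Bisection for the Tg at which Xi changes sign, at fixed Tm."""
    assert xi(lo, Tm) < 0 < xi(hi, Tm), "bracket must straddle the zero"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if xi(mid, Tm) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Eq. (19) predicts the zero at Tg = 8/5 when Tm = 0.
    print(zero_of_xi_in_Tg(Tm=0.0))
```

The zero found for $T_m=0$ lies close to $T_g=8/5$, in line with the approximate boundary $T_m\approx 1-\tfrac{5}{8}T_g$.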

Fig. 5.

Comparison between the Fano factors of nascent and mature RNA. Contour plot showing the variation of $\Xi$ (a measure of the difference between the two Fano factors which is defined in Eq. (18)) with the non-dimensional parameters $T_g$ and $T_m$ which denote the ratio of the mean elongation time to the timescales of promoter switching and decay of mature RNA, respectively. As can be appreciated from Eq. (17), $\Xi$ is positive if the Fano factor of nascent RNA is larger than that of mature RNA and negative if the reverse is true. The line $T_m\approx 1-\tfrac{5}{8}T_g$, where $\Xi=0$, shows where the two Fano factors are identical

Sensitivity of coefficient of variation of total RNAP and mature RNA: Since we have found explicit expressions for the first two moments of the distributions of total RNAP and of mature RNA, we can now estimate the sensitivity of the noise in each of those to small perturbations in the transcriptional parameters. Specifically, we calculate the logarithmic sensitivity (LS), which is also known as the relative sensitivity, of the coefficient of variation (CV) to a parameter $s$, which is defined as $\Lambda_s=(s/\mathrm{CV})(\partial\,\mathrm{CV}/\partial s)$. (That definition implies that a 1% change in the value of the parameter $s$ results in a change of $\Lambda_s$% in CV.)

In Table 1b, we report the logarithmic sensitivity of the coefficient of variation of total RNAP fluctuations, which is obtained from Eq. (12), to perturbations in the parameters su, sb, r, and T. Similarly, in Table 1c, we report the logarithmic sensitivity of the coefficient of variation of mature RNA fluctuations from Eq. (14) to perturbations in the parameters su, sb, r, and dm. In both cases, these sensitivities are calculated for parameter values estimated for five genes in yeast, as reported in Zenklusen et al. (2008); see Table 1a.
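The logarithmic sensitivity can be approximated by a central finite difference of the CV obtained from Eq. (12). The sketch below is not the procedure used to produce Table 1: the function names and the relative step size are illustrative assumptions, and $\eta=s_u/(s_u+s_b)$, $\beta=s_b/s_u$ are assumed as defined earlier in the paper.

```python
import math


def cv_total_rnap(su, sb, r, T):
    """CV of total RNAP from Eq. (12): deterministic elongation, d = 0."""
    eta, beta = su / (su + sb), sb / su
    Tg = (su + sb) * T
    mean = eta * r * T
    noise = 2 * beta / Tg * (1 - 1 / Tg + math.exp(-Tg) / Tg)
    return math.sqrt(1 / mean + noise)


def log_sensitivity(cv, params, name, rel_step=1e-6):
    """Lambda_s = (s / CV) dCV/ds, estimated by a central difference in s."""
    up, down = dict(params), dict(params)
    up[name] *= 1 + rel_step
    down[name] *= 1 - rel_step
    return (cv(**up) - cv(**down)) / (2 * rel_step * cv(**params))


if __name__ == "__main__":
    pdr5 = dict(su=0.44, sb=4.7, r=6.7, T=4.5)   # Table 1a values
    for name in pdr5:
        print(name, round(log_sensitivity(cv_total_rnap, pdr5, name), 2))
```

The signs returned are those implied by the definition of $\Lambda_s$. With the PDR5 values of Table 1a, the magnitudes obtained this way are close to the corresponding entries of Table 1b.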

Table 1.

Logarithmic sensitivity (LS) of the coefficient of variation CV of total RNAP and mature RNA fluctuations for five genes in yeast; see Sect. 2.4 for a discussion. (a) Parameter values from Supplemental Tables 2 and 4 in Zenklusen et al. (2008). The degradation rate dm of mature mRNA is estimated from the reported mean number of mature RNA, the parameters su, sb, r, and Eq. (14) for the mean. (b) Logarithmic sensitivity of CV of total RNAP fluctuations. (c) Logarithmic sensitivity of CV of mature mRNA fluctuations. The most sensitive parameter and the next most sensitive one are marked in dark bold and italic, respectively

(a) Parameter              PDR5    POL1    DOA1    MDN1    KAP104
Mean mature RNA number     13.40   3.13    2.59    6.12    4.93
$T$ (min)                  4.50    3.75    2.90    16.75   3.50
$s_u$ (min$^{-1}$)         0.44    0.07    0.70    0.70    0.70
$s_b$ (min$^{-1}$)         4.70    0.68    0.12    0.12    0.12
$r$ (min$^{-1}$)           6.70    2.00    0.14    0.19    0.27
$d_m$ (min$^{-1}$)         0.04    0.06    0.05    0.03    0.05

(b) LS, total RNAP         PDR5    POL1    DOA1    MDN1    KAP104
$\Lambda_{s_u}$            0.52    0.51    −0.09   −0.12   −0.11
$\Lambda_{s_b}$            0.18    0.29    0.09    0.09    0.10
$\Lambda_{r}$              −0.15   −0.12   0.49    0.47    0.47
$\Lambda_{T}$              0.48    0.34    0.49    0.50    0.49

(c) LS, mature RNA         PDR5    POL1    DOA1    MDN1    KAP104
$\Lambda_{s_u}$            0.50    0.52    −0.09   −0.10   −0.11
$\Lambda_{s_b}$            0.23    0.20    0.08    0.08    0.09
$\Lambda_{r}$              −0.23   −0.15   0.49    0.48    0.48
$\Lambda_{d_m}$            0.50    0.47    0.50    0.50    0.50

The following observations can be made regarding the sensitivity of the noise in total RNAP fluctuations: (i) for the two genes PDR5 and POL1 which spend most of their time in the inactive state due to $s_b\gg s_u$, CV is most sensitive to changes in the parameters $s_u$ and $T$; (ii) for the genes DOA1, MDN1, and KAP104 which spend most of their time in the active state due to $s_u\gg s_b$, CV is most sensitive to changes in the parameters $r$ and $T$; (iii) the size of mature RNA fluctuations is found to be most sensitive to perturbations in $s_u$ and $d_m$ for PDR5 and POL1, and to perturbations in $r$ and $d_m$ for the other three genes. We furthermore note that for both total RNAP and mature RNA, CV is least sensitive to $r$ for the genes which are mostly inactive, whereas $r$ is among the parameters to which CV is most sensitive for genes that are mostly active.

Approximate Distributions of Total RNAP and Mature RNA

Thus far, we have derived expressions for the first two moments of the distributions of total RNAP and mature RNA. Naturally, it would also be useful to derive closed-form expressions for the distributions themselves; such a derivation is, however, analytically intractable in general (Jahnke and Huisinga 2007) due to the presence of the catalytic reaction $G_{\mathrm{on}}\to G_{\mathrm{on}}+P_1$, which models initiation of the transcription process. Still, there are two special cases where analytical distributions are known: (i) when the elongation time is considered to be fixed, which corresponds to our model with $L\to\infty$ at constant $\langle T\rangle$ (Heng et al. 2016); (ii) when the elongation time is exponentially distributed, corresponding to our model with $L=1$, in which case the distribution of total RNAP is identical to the one which is derived from the telegraph model (Peccoud and Ycart 1995; Raj et al. 2006). While one may argue that the analytical distribution of RNAP for deterministic elongation times may well approximate the stochastic (finite-$L$) case, the issue remains that the exact solution is not given in terms of simple functions unless promoter switching is slow compared to initiation, elongation, and termination, in which case the solution reduces to a weighted sum of two Poisson distributions (Heng et al. 2016). Hence, it is generally very difficult to apply in practice, such as to infer parameters from data using a Bayesian approach. Moreover, to our knowledge, no exact solutions are known for the distribution of mature RNA in our model. In this section, we aim to devise a simple approximation for the distribution of total RNAP numbers in terms of the Negative Binomial (NB) distribution; these simple distributions have shown great flexibility in describing complex gene expression models with a large number of parameters (Cao and Grima 2020). Finally, by means of singular perturbation theory, we will obtain the distribution of mature RNA under the assumption that RNA polymerase elongation is faster than degradation of mature RNA.

Approximation of Total RNAP Distribution

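We approximate the distribution of total RNAP transcribing the gene via a Negative Binomial distribution, as follows. The mean and variance of the Negative Binomial distribution $\mathrm{NB}(q,p)$ are given by $pq/(1-p)$ and $pq/(1-p)^2$, respectively. By assuming that these are equal to the exact mean and variance, respectively, of the total RNAP distribution, see Eq. (10), we obtain effective values for the parameters $p$ and $q$:

$$n_{\mathrm{tot}}\sim\mathrm{NB}(q,p)=\mathrm{NB}\!\left(\frac{\langle n_{\mathrm{tot}}\rangle^{2}}{\mathrm{Var}(n_{\mathrm{tot}})-\langle n_{\mathrm{tot}}\rangle},\;\frac{\mathrm{Var}(n_{\mathrm{tot}})-\langle n_{\mathrm{tot}}\rangle}{\mathrm{Var}(n_{\mathrm{tot}})}\right).\tag{20}$$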
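The moment matching behind Eq. (20) can be written out in a few lines. In this sketch, which uses hypothetical function names, the probability mass function is taken to be $P(n)=\Gamma(n+q)/[n!\,\Gamma(q)]\,p^{n}(1-p)^{q}$; that parametrisation is an assumption chosen because it has mean $pq/(1-p)$ and variance $pq/(1-p)^2$, as stated in the text.

```python
import math


def nb_parameters(mean, var):
    """Effective NB parameters (q, p) from Eq. (20); needs var > mean."""
    if var <= mean:
        raise ValueError("Negative Binomial matching requires var > mean")
    q = mean**2 / (var - mean)
    p = (var - mean) / var
    return q, p


def nb_pmf(n, q, p):
    """P(n) = Gamma(n+q) / (n! Gamma(q)) * p^n * (1-p)^q, evaluated in log space."""
    log_pmf = (math.lgamma(n + q) - math.lgamma(n + 1) - math.lgamma(q)
               + n * math.log(p) + q * math.log(1 - p))
    return math.exp(log_pmf)


if __name__ == "__main__":
    # Illustrative input moments; in practice they would come from Eq. (10).
    q, p = nb_parameters(mean=2.58, var=8.47)
    print(q, p, [round(nb_pmf(n, q, p), 4) for n in range(5)])
```

Because only two moments are matched, the quality of the approximation depends on how close the true distribution is to a unimodal, overdispersed shape.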

In Fig. 6, we show a comparison between the distributions of total RNAP obtained from SSA (dots) and the Negative Binomial approximation in Eq. (20) (solid lines). Our results are presented for two different values of the number of gene segments: $L=1$ (exponentially distributed elongation time; left column) and $L=50$ (quasi-deterministic elongation time; right column). Additionally, we rescale our gene inactivation rate as $s_b\mapsto s_b\epsilon$, and we present results for three different values of the parameter $\epsilon$: $10^{-3}$, the constitutive limit of the gene being mostly in the active state (top row); $10^{-1}$, where the gene spends almost equal amounts of time in the active and inactive states, with $s_b\approx s_u$ (middle row); and $1$, the bursty limit, where the gene spends most of its time in the inactive state (bottom row).

Fig. 6.

(Color Figure Online) Steady-state distribution of total RNAP and its approximation by a Negative Binomial distribution. We compare the approximation from Eq. (20) (blue lines) with the distribution of total RNAP obtained from stochastic simulation (SSA; red dots). With the exception of $s_b$, the parameters are for the PDR5 gene in yeast and are hence the same as in Fig. 2, with $d=0.01$/min. Results are presented for two different values of $L$, corresponding to an exponentially distributed elongation time ($L=1$) and a quasi-deterministic elongation time ($L=50$); $k$ is rescaled such that the two have the same mean elongation time. Additionally, we rescale the gene inactivation rate via $s_b\mapsto s_b\epsilon$, where $\epsilon=10^{-3},10^{-1},1$, corresponding to constitutive, general, and bursty expression, respectively. (Here, general expression is neither clearly constitutive nor bursty, since the gene spends roughly equal amounts of time in the inactive and active states.) Note that $\epsilon=1$ results in a distribution of nascent RNA that is consistent with that measured for PDR5; the experimental data from Fig. 6(b) of Zenklusen et al. (2008) are plotted for comparison. The Negative Binomial approximation is found to be accurate in the limits of constitutive and bursty expression (top and bottom rows), independently of $L$

We can make several observations, as follows. For both L=1 and L=50, the Negative Binomial approximation performs well for bursting and constitutive expression (top and bottom rows), whereas it is appreciably poor when expression is in between those two limits (middle row). Intuitively, this observation can be explained via the following reasoning. In the limits of the gene being mostly in the active state (constitutive expression) or the inactive state (bursty expression), the distribution of total RNAP is necessarily unimodal. However, when the gene spends a considerable amount of time in each state, the distribution is the sum of two conditional distributions which can manifest either as bimodality or as a wide unimodal distribution, neither of which can be captured by a Negative Binomial distribution. Assuming bursty expression, the Negative Binomial distribution is a more accurate approximation to the distribution obtained from SSA for L=1 than it is for L=50; the reason is that L=1 corresponds to the telegraph model (Raj et al. 2006), in which case it can be proven analytically that the distribution reduces to a Negative Binomial in the limit of bursty expression. For constitutive expression, the Negative Binomial approximation is equally good for L=1 and L=50, as the distribution is necessarily Poissonian then and as it is well known that a Negative Binomial distribution can approximate a Poissonian to a high degree of accuracy. In summary, our results hence indicate that Eq. (20) yields a good approximation for the total RNAP distribution of bursty and constitutively expressed genes.

We also note from Fig. 6 that the comparison between the SSA distributions for $L=1$ and $L=50$, with equal mean elongation times, highlights the importance of modelling elongation with the correct distribution of elongation times for genes that are non-constitutive, i.e. for $\epsilon=10^{-1}$ or $\epsilon=1$. In particular, if the elongation time is quasi-deterministic ($L=50$), there appears to be a significant increase in the probability of observing zero total RNAP transcribing the gene compared to models with an exponentially distributed elongation time ($L=1$).

Approximation of Mature RNA Distribution

Next, we apply singular perturbation theory to formally derive the distribution of mature RNA when the elongation rate is much larger than the degradation rate of mature RNA.

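We start by defining $P_j(\mathbf{n};t)$ ($j=0,1$) as the probability of the state $\mathbf{n}=(n_1,\dots,n_L,n)$ at time $t$ while the gene is either active ($0$) or inactive ($1$). Note that $n_i$ is the number of RNAPs on gene segment $i$ for $i=1,\dots,L$, while $n$ is the number of mature RNAs. The time evolution of the probabilities $P_j(\mathbf{n};t)$ can be described by a system of coupled CMEs:

$$\begin{aligned}
\partial_t P_0&=s_uP_1-s_bP_0+r\,(\mathbb{E}_{n_1}^{-1}-1)P_0+k\sum_{i=1}^{L-1}(\mathbb{E}_{n_i}\mathbb{E}_{n_{i+1}}^{-1}-1)\,n_iP_0+k\,(\mathbb{E}_{n_L}\mathbb{E}_{n}^{-1}-1)\,n_LP_0\\
&\quad+d\sum_{i=1}^{L}(\mathbb{E}_{n_i}-1)\,n_iP_0+d_m(\mathbb{E}_{n}-1)\,nP_0,\\
\partial_t P_1&=s_bP_0-s_uP_1+k\sum_{i=1}^{L-1}(\mathbb{E}_{n_i}\mathbb{E}_{n_{i+1}}^{-1}-1)\,n_iP_1+k\,(\mathbb{E}_{n_L}\mathbb{E}_{n}^{-1}-1)\,n_LP_1\\
&\quad+d\sum_{i=1}^{L}(\mathbb{E}_{n_i}-1)\,n_iP_1+d_m(\mathbb{E}_{n}-1)\,nP_1,
\end{aligned}\tag{21}$$

where $\mathbb{E}_{n_i}^{c}[f(\mathbf{n})]=f(n_1,n_2,\dots,n_i+c,\dots,n_L,n)$, with $c\in\mathbb{Z}$, denotes the standard step operator. We assume that the elongation rate $k$ is much larger than the degradation rate $d_m$ of mature RNA, i.e. that $k/d_m\gg 1$. Since $k=L/\langle T\rangle-d$, it follows that in the limit of deterministic elongation ($k\to\infty$), i.e. for $L\to\infty$ at constant mean elongation time $\langle T\rangle$, the condition $k/d_m\gg 1$ is naturally satisfied.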

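In order to find an analytical expression for the propagator probabilities $P(\mathbf{n};t)$ which satisfy the system of CMEs in Eq. (21), we define the probability-generating function as $F=\sum_j F_j$, with $F_j(\mathbf{z};t)=\sum_{\mathbf{n}}P_j(\mathbf{n};t)\,\mathbf{z}^{\mathbf{n}}$ and $\mathbf{z}^{\mathbf{n}}=z_1^{n_1}\cdots z_L^{n_L}z^{n}$; here, $\mathbf{z}=(z_1,\dots,z_L,z)$ is a vector of variables corresponding to the state $\mathbf{n}$. Given the equations for $P_j(\mathbf{n};t)$ from Eq. (21), we obtain the following system of PDEs for the corresponding generating functions $F_j(\mathbf{z};t)$:

$$\mathcal{L}[F_0]=s_uF_1-s_bF_0+r(z_1-1)F_0,\qquad\mathcal{L}[F_1]=s_bF_0-s_uF_1,\tag{22}$$

where

$$\mathcal{L}=\partial_t+k\sum_{i=1}^{L-1}(z_i-z_{i+1})\,\partial_{z_i}+k(z_L-z)\,\partial_{z_L}+d\sum_{i=1}^{L}(z_i-1)\,\partial_{z_i}+d_m(z-1)\,\partial_z\tag{23}$$

is a differential operator acting on the generating functions $F_0$ and $F_1$. Equation (22) represents a system of coupled, linear, first-order partial differential equations (PDEs). Now, we introduce the new variables $u_i=z_i-1$ ($i=1,\dots,L$) and $u=z-1$ to rewrite Eq. (22) as

$$\mathcal{L}[F_0]=s_uF_1-s_bF_0+ru_1F_0,\qquad\mathcal{L}[F_1]=s_bF_0-s_uF_1;\tag{24}$$

here, the operator in Eq. (23) now takes the form

$$\mathcal{L}=\partial_t+k\sum_{i=1}^{L-1}(u_i-u_{i+1})\,\partial_{u_i}+k(u_L-u)\,\partial_{u_L}+d\sum_{i=1}^{L}u_i\,\partial_{u_i}+d_m u\,\partial_u.\tag{25}$$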

In order to find an analytical solution to Eq. (24), we rescale all rates and the time variable by the decay rate of mature RNA; then, we apply the method of characteristics, with $s$ being the characteristic variable. The first characteristic equation gives $d_m(\mathrm{d}t/\mathrm{d}s)=1$, with solution $s=d_mt$; hence, we can use the rescaled time, which we again denote by $t$, as the independent variable and thus convert the system of PDEs in Eq. (24) into a characteristic system of ordinary differential equations (ODEs),

$$\begin{aligned}
\dot{u}_i&=(k/d_m)\left[u_i-u_{i+1}+(d/k)\,u_i\right]\quad\text{for } i=1,\dots,L-1, &&\text{(26a)}\\
\dot{u}_L&=(k/d_m)\left[u_L-u+(d/k)\,u_L\right], &&\text{(26b)}\\
\dot{u}&=u, &&\text{(26c)}\\
\dot{F}_0&=(s_u/d_m)F_1-(s_b/d_m)F_0+(r/d_m)\,u_1F_0, &&\text{(26d)}\\
\dot{F}_1&=(s_b/d_m)F_0-(s_u/d_m)F_1, &&\text{(26e)}
\end{aligned}$$

where the overdot denotes differentiation with respect to $t$. The existence of an integral-form solution to Eq. (26) follows from the fact that the reaction scheme in Fig. 1 contains first-order reactions only. Under the assumption that $k\gg d_m$, we define $\varepsilon=d_m/k$; then, we apply Geometric Singular Perturbation Theory (GSPT) (Fenichel 1979; Jones 1995), with $0<\varepsilon\ll 1$ as the (small) singular perturbation parameter. We hence separate the system in Eq. (26) into fast and slow dynamics, which will allow us to find an asymptotic approximation for $F_0$ and $F_1$ in steady state. A brief introduction to GSPT can be found in ‘Appendix E’. Given the above definition of $\varepsilon$, Eqs. (26a) and (26b), the governing equations for $u_i$ in the ‘slow system’, become

$$\varepsilon\dot{u}_i=u_i-u_{i+1}+(d/k)\,u_i\quad\text{for } i=1,\dots,L-1,\qquad\varepsilon\dot{u}_L=u_L-u+(d/k)\,u_L,\tag{27}$$

where $u_i$ ($i=1,\dots,L$) are the fast variables and $u$, $F_0$, and $F_1$ are the slow ones. Setting $\varepsilon=0$ in Eq. (27), we can express the variables $u_i$ as $u_i=\mu\cdot u_{i+1}$, with $\mu=k/(k+d)$ for $i=1,\dots,L$ (where $u_{L+1}\equiv u$). Finally, we write the variable $u_1$ as $u_1=\mu^{L}\cdot u$. Next, given Eq. (26c), we apply the chain rule, with $\mathrm{d}/\mathrm{d}t=u\,\mathrm{d}/\mathrm{d}u$, to rewrite Eqs. (26d) and (26e) as

$$\begin{aligned}
d_m u\,F_0'&=s_uF_1-s_bF_0+r\mu^{L}u\,F_0, &&\text{(28a)}\\
d_m u\,F_1'&=s_bF_0-s_uF_1, &&\text{(28b)}
\end{aligned}$$

where the prime now denotes differentiation with respect to u. Solving Eq. (28a) for F1 and substituting the result into Eq. (28b), we obtain the second-order ODE

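$$d_m^{2}\,u\,F_0''+d_m\left(d_m+s_b+s_u-r\mu^{L}u\right)F_0'-r\mu^{L}(d_m+s_u)\,F_0=0\tag{29}$$

for $F_0(u)$. Equation (29) is a confluent hypergeometric differential equation (Kummer’s equation) (Digital Library of Mathematical Functions 2020b) which admits the solution

$$F_0(u)=C\cdot{}_1F_1\!\left(\frac{d_m+s_u}{d_m};\,\frac{d_m+s_b+s_u}{d_m};\,\frac{r}{d_m}\mu^{L}u\right),\tag{30}$$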

where ${}_1F_1$ denotes the confluent hypergeometric function; here, we consider only one of two independent fundamental solutions of Kummer’s differential equation, as we are seeking a solution in steady state where the variable $u$ is bounded. The constant $C$ in Eq. (30) is a constant of integration that is determined from the normalisation condition on the full generating function: $F=F_0+F_1$. From Eq. (28), one finds that $F$ satisfies

$$F'=\frac{r}{d_m}\,\mu^{L}F_0.\tag{31}$$

Making use of Eq. (31) and applying the normalisation condition $F|_{u=0}=1$, we find that the generating function in steady state reads

$$F(z)={}_1F_1\!\left(\frac{s_u}{d_m};\,\frac{s_b+s_u}{d_m};\,\frac{r}{d_m}\mu^{L}(z-1)\right).\tag{32}$$

The probability distribution P(n) of mature RNA can be found from the formula

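$$P(n)=\frac{1}{n!}\,\frac{\mathrm{d}^{n}}{\mathrm{d}z^{n}}F(z)\Big|_{z=0},$$

which yields the analytical expression

$$P(n)=\frac{1}{n!}\,\frac{(s_u/d_m)_n}{\left((s_b+s_u)/d_m\right)_n}\left(\frac{r}{d_m}\right)^{n}\left(\mu^{L}\right)^{n}{}_1F_1\!\left(\frac{s_u}{d_m}+n;\,\frac{s_b+s_u}{d_m}+n;\,-\frac{r}{d_m}\mu^{L}\right),\tag{33}$$

where $(\cdot)_n$ is the Pochhammer symbol, as before. Note that the mean and variance of mature mRNA, as calculated from the distribution in Eq. (33), agree exactly with Eqs. (2c) and (4f) in the limit of fast elongation ($k\to\infty$). Note also that the solution in Eq. (33) depends on the parameter $\mu^{L}$, which represents the survival probability of an RNAP molecule, i.e. the probability that RNAP will not prematurely detach from the gene. Finally, we take the limit of deterministic elongation, letting $L\to\infty$ at constant $\langle T\rangle$, which leads to

$$P(n)=\frac{1}{n!}\,\frac{(s_u/d_m)_n}{\left((s_b+s_u)/d_m\right)_n}\left(\frac{r}{d_m}\right)^{n}e^{-nd\langle T\rangle}\,{}_1F_1\!\left(\frac{s_u}{d_m}+n;\,\frac{s_b+s_u}{d_m}+n;\,-\frac{r}{d_m}e^{-d\langle T\rangle}\right).\tag{34}$$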

Note that in the limit of no premature detachment ($d=0$), Eq. (34) is precisely equal to the distribution of mature RNA predicted by the telegraph model, which is in wide use in the literature (Raj et al. 2006). Hence, our perturbative approach can be seen as a means to formally derive the conventional telegraph model of gene expression starting from a more fundamental and microscopic model. In Fig. 7, we verify our analytical solution with stochastic simulation for two different genes in yeast. We also note that, for nonzero premature detachment rates ($d\neq 0$), Eq. (34) is the steady-state solution predicted by the telegraph model, with parameter $r$ renormalised to $re^{-d\langle T\rangle}$; that is to be expected, as the latter is the rate at which RNAPs undergo termination, leading to mature RNAs.
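A telegraph-type distribution of this form, with initiation rate renormalised to $re^{-d\langle T\rangle}$, can be evaluated with the standard library alone. The sketch below is one possible numerical route rather than the method used for Fig. 7: the Pochhammer symbols are taken with arguments $s_u/d_m$ and $(s_u+s_b)/d_m$, as required by the generating function in Eq. (32), and Kummer's transformation ${}_1F_1(a;b;-c)=e^{-c}\,{}_1F_1(b-a;b;c)$ is used so that the series has only positive terms. Function names and tolerances are illustrative assumptions.

```python
import math


def hyp1f1_positive(a, b, x, tol=1e-16, max_terms=10000):
    """Power series for 1F1(a; b; x) with a, b, x > 0 (all terms positive)."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * x / ((b + k) * (k + 1))
        total += term
        if term < tol * total:
            break
    return total


def mature_rna_pmf(n, su, sb, r, dm, d, T):
    """P(n) for mature RNA with deterministic elongation, cf. Eq. (34)."""
    a = su / dm
    b = (su + sb) / dm
    c = (r / dm) * math.exp(-d * T)   # renormalised initiation over decay
    # log of c^n / n! * (a)_n / (b)_n * exp(-c); exp(-c) comes from Kummer
    log_pref = (n * math.log(c) - math.lgamma(n + 1)
                + math.lgamma(a + n) - math.lgamma(a)
                - math.lgamma(b + n) + math.lgamma(b) - c)
    # 1F1(a+n; b+n; -c) = exp(-c) * 1F1(b-a; b+n; c)
    return math.exp(log_pref) * hyp1f1_positive(b - a, b + n, c)


if __name__ == "__main__":
    doa1 = dict(su=0.7, sb=0.12, r=0.14, dm=0.05, d=0.01, T=2.9)
    probs = [mature_rna_pmf(n, **doa1) for n in range(40)]
    print(sum(probs), sum(n * p for n, p in enumerate(probs)))
```

The printed normalisation should be close to one and the mean close to $\eta\,(r/d_m)\,e^{-d\langle T\rangle}$ with $\eta=s_u/(s_u+s_b)$, which offers a quick consistency check on the implementation.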

Fig. 7.

Steady-state distribution of mature RNA for two different genes in yeast. We compare the distribution obtained from SSA (dots) to the perturbative approximation in Eq. (33) (solid lines) for two different genes. In a, we consider the PDR5 gene, fixing the parameters as in Fig. 2: $s_u=0.44$/min, $s_b=4.7$/min, $r=6.7$/min, $d=0.01$/min, and $\langle T\rangle=4.5$ min. The degradation rate of mature RNA takes the values $d_m=0.04, 0.10, 0.40$/min; note that the experimental value is $d_m=0.04$/min. In b, we consider the DOA1 gene, fixing the parameters as in Fig. 3: $s_u=0.7$/min, $s_b=0.12$/min, $r=0.14$/min, $d=0.01$/min, and $\langle T\rangle=2.9$ min. The degradation rate of mature RNA again takes the values $d_m=0.04, 0.10, 0.40$/min; the experimental value is $d_m=0.05$/min. For both genes, the agreement between SSA and our perturbative approximation increases with $k/d_m$, as expected, since Eq. (33) is derived under the assumption that $k\gg d_m$. Note that the distribution is practically independent of $L$, since Eq. (33) depends on $L$ only through $\mu^{L}$, which for small premature detachment rates $d$ implies $\mu^{L}\approx 1$ for any $L$ (Color figure online)

Statistics of Fluorescent Nascent RNA Signal

Thus far, we have determined the statistics of the total number of RNAP transcribing the given gene; these are also the statistics of the number of nascent RNA molecules. However, in experiments using single-molecule fluorescence in situ hybridisation [smFISH (Heng et al. 2016)], molecule numbers of nascent RNA cannot be directly determined. Rather, the experimentally measured RNA ‘abundance’ is the fluorescent signal emitted by oligonucleotide probes bound to the RNA. Since the length of the nascent RNA grows as RNAP moves away from the promoter, it follows that we must account for the increase in the fluorescent signal as elongation proceeds.

In this section, we take into account these experimental details to obtain closed-form expressions for the mean and variance of the fluorescent signal of local and total nascent RNA. We assume that the signal from nascent RNA on the $i$th gene segment is given by $r_i=(\nu/L)\,i\,n_i$ for $i=1,\dots,L$, where $\nu$ is some experimental constant; the value of the parameter $(\nu/L)\,i$ is increasing with $i$, which models the fact that the fluorescent signal becomes stronger as RNAP moves along the gene. The formula for the mean fluorescent signal at gene segment $i$ is then given by $\langle r_i\rangle=(\nu/L)\,i\,\langle n_i\rangle$, where $\langle n_i\rangle$ follows from Eq. (2b); the covariance of two fluorescent signals along the gene, $r_i$ and $r_j$ ($i,j=1,\dots,L$), is given by $\mathrm{Cov}(r_i,r_j)=(\nu/L)^2\,ij\,\mathrm{Cov}(n_i,n_j)$, where $\mathrm{Cov}(n_i,n_j)$ is obtained from Eq. (4d). In Fig. 8a, b, we plot the mean and Fano factor of the local signal as a function of the gene segment $i$; note the contrast between the statistics of the fluorescent signal and the corresponding statistics of local RNAP (which is the statistics of nascent RNA) shown in Fig. 2a, c.

Fig. 8.

First and second moments of the local and total fluorescent signal for the bursty gene PDR5 in yeast. In a, b, we show the dependence of the mean and the Fano factor of local fluorescent signal fluctuations on the gene segment $i$, as predicted by our exact theory (solid lines) and SSA (dots), respectively. The plots for $\mathrm{CV}^2(r_i)$ and $\mathrm{CC}(r_i,r_{i+1})$ are identical to those of $\mathrm{CV}^2(n_i)$ and $\mathrm{CC}(n_i,n_{i+1})$ in Fig. 2. The number of gene segments is arbitrarily chosen to be $L=30$. In c, d, we show the dependence of the mean and variance of total fluorescent signal fluctuations on the number of gene segments $L$, as predicted by our exact theory (Eq. (35); solid lines) and SSA (dots). The parameters $s_u$, $s_b$, $r$, and $\langle T\rangle$ are characteristic of the PDR5 gene and take the same values as in Fig. 2, as do the rates of elongation and RNAP detachment. The value of the parameter $\nu$ is arbitrarily chosen to be $\nu=10$

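Similarly, denoting by $r_{\mathrm{tot}}=\sum_{i=1}^{L}r_i$ the total fluorescent signal across the gene, we find the following expressions for the steady-state mean $\langle r_{\mathrm{tot}}\rangle=\sum_{i=1}^{L}\langle r_i\rangle$ and the steady-state variance $\mathrm{Var}(r_{\mathrm{tot}})=\sum_{i,j=1}^{L}\mathrm{Cov}(r_i,r_j)$:

$$\begin{aligned}
\langle r_{\mathrm{tot}}\rangle&=\nu\eta\rho_k\,\mu\,\frac{\mu^{L}\left[L\mu-(L+1)\right]+1}{L(\mu-1)^{2}},\\
\mathrm{Var}(r_{\mathrm{tot}})&=\left(\frac{\nu}{L}\right)^{2}\eta\rho_k\sum_{i=1}^{L}i^{2}\mu^{i}+\left(\frac{\nu}{L}\right)^{2}\alpha\beta(\eta\rho_k)^{2}\sum_{i,j=1}^{L}ij\cdot\mu^{i+j}\cdot f_{ij}.
\end{aligned}\tag{35}$$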

For a detailed derivation of the variance in Eq. (35), see Eq. (F.1) in ‘Appendix F’; see also ‘Appendix G’ for the corresponding expressions in the bursty, constitutive, and deterministic elongation limits. In Fig. 8c, d, we show the mean and Fano factor of the total signal as a function of the number of gene segments (L); as above, we note the contrasting difference between the statistics of the fluorescent signal and the corresponding statistics of total RNAP—which is the statistics of total nascent RNA—shown in Fig. 4c, d.
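The closed-form mean in Eq. (35) is the weighted geometric sum $\sum_{i=1}^{L}(\nu/L)\,i\,\langle n_i\rangle$ with $\langle n_i\rangle=\eta\rho_k\mu^{i}$, which can be confirmed by direct summation. The sketch below uses hypothetical function names; the values of $\eta$ and $\rho_k=r/k$ are illustrative and rely on the definitions assumed from earlier in the paper.

```python
def mean_total_signal_direct(nu, eta, rho_k, mu, L):
    """Sum of the local means <r_i> = (nu / L) * i * eta * rho_k * mu**i."""
    return sum((nu / L) * i * eta * rho_k * mu**i for i in range(1, L + 1))


def mean_total_signal_closed(nu, eta, rho_k, mu, L):
    """Closed-form mean of the total signal from Eq. (35); requires mu != 1."""
    return nu * eta * rho_k * mu * (mu**L * (L * mu - (L + 1)) + 1) / (
        L * (mu - 1) ** 2)


if __name__ == "__main__":
    # PDR5-like values with L = 30 and nu = 10, as in the caption of Fig. 8.
    su, sb, r, d, T, L, nu = 0.44, 4.7, 6.7, 0.01, 4.5, 30, 10
    k = L / T - d
    args = dict(nu=nu, eta=su / (su + sb), rho_k=r / k, mu=k / (k + d), L=L)
    print(mean_total_signal_direct(**args), mean_total_signal_closed(**args))
```

The closed form loses accuracy when $\mu$ is extremely close to one, because of cancellation in the numerator; the direct sum is the safer choice in that regime.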

Hence, the calculation of the statistics of the number of nascent RNAs from the raw signal intensity presents a challenge and has to be approached carefully. The expressions presented above allow for the inference of transcriptional parameters from the first two moments of the fluorescent signal by means of moment-based inference techniques (Zechner et al. 2012). Quantitative information about nascent RNA can also be obtained from electron micrograph images (El Hage et al. 2010), which avoids the challenges presented by smFISH.

Model Extension with Pausing of RNAP

Thus far, we have studied a model where RNAPs do not pause as they move along the gene. A natural extension is provided by a modified model in which RNAPs pause along the gene at random sites and elongation is characterised by three processes: forward hopping, pausing, and unpausing of RNAP. The motivation for studying this extended model, which has recently been considered via stochastic simulation in Md Zulfikar et al. (2020), is that experiments have revealed that RNAP exhibits pauses of varying duration, typically on the timescale of a few seconds (Forde et al. 2002; Adelman et al. 2002).

Closed-Form Expressions for Moments of Local RNAP Fluctuations

We extend the model described in Fig. 1 by assuming that the RNAP on gene segment $i$ can switch between a non-paused (actively moving) state $P_i$ and a paused state $\bar{P}_i$. The actively moving state $P_i$ switches to $\bar{P}_i$ with rate $r_p$, while the reverse reaction occurs with rate $r_a$. Premature detachment from the actively moving RNAP occurs with rate $d_a$, whereas it occurs with rate $d_p$ from the paused RNAP. The resulting extended model is illustrated in Fig. 9a. In ‘Appendix A’, we derive the mean and variance of the corresponding elongation time, which is not Erlang distributed now, as was the case for the model without pausing. Furthermore, we find two interesting properties of the coefficient of variation $\mathrm{CV}_T^2$ of the elongation time: (i) in the limit of large $L$ at constant mean elongation time, $\mathrm{CV}_T^2$ does not tend to zero, which implies that elongation is not deterministic; (ii) for small rates of premature detachment, $\mathrm{CV}_T^2$ is at its maximum when $r_p\approx r_a$, i.e. when RNAP spends roughly half of its time in the paused state. See ‘Appendix A’ for details and Fig. 9b for a confirmation through stochastic simulation.

Fig. 9.

Model of transcription that includes RNAP pausing. In a, we extend the model in Fig. 1 so that it takes into account pausing of RNAP at random segments on the gene. Pausing on gene segment $i$ is modelled by the transition from the active state $P_i$ to the paused state $\bar{P}_i$ with rate $r_p$, while the reverse (‘unpausing’) transition occurs with rate $r_a$. Premature termination of RNAP occurs with rate $d_a$ from the actively moving state, and with rate $d_p$ from the paused state. In b, we show the dependence of the coefficient of variation squared ($\mathrm{CV}_T^2$) of the elongation time distribution on the pausing rate ($r_p$), as predicted from SSA (dots) and theory (Eq. (A.7); solid lines). Results are shown for two different parameter regimes: $D_0$, with $d_a=d_p=0$/min (no premature polymerase detachment), and $D_1$, with $d_a=d_p=0.05$/min (premature polymerase detachment). The remaining parameters are fixed to $L=50$, $k=10$/min, and $r_a=0.1$/min

Proposition 3

Let the number of RNAP molecules in the active state $P_i$ be denoted by $n_i^{a}$, let the number of molecules in the paused state $\bar{P}_i$ be $n_i^{p}$, and let the number of molecules of mature RNA be denoted by $n$. Let $\sigma=r_p/r_a$ be the ratio of the pausing and activation rates, let $\pi_{r_a}=r_a/(r_a+d_p)$ be the probability of RNAP switching to the actively moving state from the paused state, and let $\pi_{d_p}=d_p/(r_a+d_p)$ be the probability of premature RNAP detachment from the paused state. Furthermore, define the new parameters $\tilde{\mu}=k/(k+d_a+r_p\pi_{d_p})$ and $\lambda=\sigma\pi_{r_a}$.

Then, it follows that the steady-state mean number of RNAP molecules in the active and paused states on gene segment $i$ ($i=1,\dots,L$) is given by

$$\langle n_i^{a}\rangle=\eta\rho_k\,\tilde{\mu}^{i}\quad\text{and}\quad\langle n_i^{p}\rangle=\langle n_i^{a}\rangle\,\lambda.\tag{36}$$

Hence, the total mean number of RNAP molecules on each gene segment $i$ reads

$$\langle n_i\rangle=\langle n_i^{a}\rangle+\langle n_i^{p}\rangle=\langle n_i^{a}\rangle(1+\lambda).\tag{37}$$

The proof of Proposition 3 can be found in ‘Appendix H’. Note that in the limit of no pausing, i.e. for rp=0, Eq. (37) reduces to the expression for the mean of RNAP reported in Eq. (2b).
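The means in Proposition 3 can be evaluated and checked against steady-state flux balance on each segment. The sketch below is illustrative only: $\eta=s_u/(s_u+s_b)$ and $\rho_k=r/k$ are assumed as before, the pausing rates in the usage example are made-up values, and the function name is hypothetical.

```python
def pausing_means(su, sb, r, k, da, dp, rp, ra, L):
    """Mean active and paused RNAP numbers per segment, Eqs. (36) and (37)."""
    eta = su / (su + sb)       # assumed: fraction of time the gene is active
    rho_k = r / k              # assumed meaning of rho_k
    pi_ra = ra / (ra + dp)     # probability of unpausing rather than detaching
    pi_dp = dp / (ra + dp)     # probability of detaching from the paused state
    lam = (rp / ra) * pi_ra    # lambda = sigma * pi_ra
    mu_tilde = k / (k + da + rp * pi_dp)
    active = [eta * rho_k * mu_tilde**i for i in range(1, L + 1)]
    paused = [lam * a for a in active]
    return active, paused


if __name__ == "__main__":
    active, paused = pausing_means(su=0.44, sb=4.7, r=6.7, k=10.0,
                                   da=0.05, dp=0.05, rp=0.2, ra=0.1, L=5)
    for i, (a, p) in enumerate(zip(active, paused), start=1):
        print(i, round(a, 4), round(p, 4), round(a + p, 4))
```

Setting `rp=0` removes the paused population and recovers the mean of the model without pausing, in line with the remark following Proposition 3.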

Proposition 4

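Let $\tau_{ra} = 1/r_a$ be the timescale of RNAP activation from the paused state, let $\tau_{dp} = 1/d_p$ be the timescale of premature termination of paused RNAP, let $\tau_p = 1/(k+d_a)$ be the typical time that an actively moving RNAP spends on a gene segment, and let $\tau_{pp} = 1/(r_a+d_p)$ be the typical time spent in the paused state. Furthermore, define the new parameters $\lambda_{rp} = \pi_{rp}/(1-\pi_{rp})$, where $\pi_{rp} = r_p/(r_p+k+d_a)$ is the probability of the actively moving RNAP switching to the paused state, as well as

\[
\omega_{ra} = \frac{\pi_{ra}\tau_g}{\pi_{ra}\tau_{ra}+\tau_g}, \quad \tilde{\alpha} = \frac{\tau_g+\lambda_{rp}\pi_{dp}\tau_g}{\tau_g+\tau_p+\lambda_{rp}\tau_g(1-\omega_{ra})}, \quad\text{and}\quad \omega = \frac{\tau_g}{\tau_{pp}+\tau_g}. \tag{38}
\]

Assume that the elongation rate is faster than the rates of RNAP pausing, activation, and premature termination, i.e. that $k \gg r_a, r_p, d_a, d_p$. Then, it follows that to leading order in $1/k$, asymptotic expressions for the variances and covariances of molecule number fluctuations of active and paused RNAP are given by

\[
\begin{aligned}
\mathrm{Cov}(n_i^a,n_j^a) &= \delta_{ij}\langle n_i^a\rangle + \langle n_i^a\rangle\langle n_j^a\rangle\,\tilde{\alpha}\beta\cdot g_{ij}^{aa}, &&\text{where } g_{ij}^{aa} = g^{aa}(i,j)+g^{aa}(j,i),\\
\mathrm{Cov}(n_i^a,n_j^p) &= \langle n_i^a\rangle\langle n_j^p\rangle\,\tilde{\alpha}\beta\cdot g_{ij}^{ap}, &&\text{where } g_{ij}^{ap} = \omega\tilde{\alpha}^{j-1},\\
\mathrm{Cov}(n_i^p,n_j^a) &= \langle n_i^p\rangle\langle n_j^a\rangle\,\tilde{\alpha}\beta\cdot g_{ij}^{pa}, &&\text{where } g_{ij}^{pa} = \omega\tilde{\alpha}^{i-1},\\
\mathrm{Cov}(n_i^p,n_j^p) &= \delta_{ij}\langle n_i^p\rangle + \langle n_i^p\rangle\langle n_j^p\rangle\,\tilde{\alpha}\beta\cdot g_{ij}^{pp}, &&\text{where } g_{ij}^{pp} = (g_{ij}^{ap}+g_{ij}^{pa})/2;
\end{aligned} \tag{39}
\]

here, $i,j = 1,2,\dots,L$ and

\[
g^{aa}(i,j) = \frac{\tilde{\alpha}^{i+j-1}}{(2\tilde{\alpha}-1)^{i}} + \frac{1}{2^{i+j-1}}\binom{i+j-1}{i}\left[1-\frac{2\tilde{\alpha}-1}{2\tilde{\alpha}}\,{}_2F_1\!\left(1,i+j;j;\frac{1}{2\tilde{\alpha}}\right)\right].
\]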

These results are proved in full in ‘Appendix H’. From ‘Appendix A’, we also have that the mean elongation time in the pausing model is given by

\[
\langle T\rangle = \frac{L\left[(r_a+d_p)^2+r_ar_p\right]}{(r_a+d_p)\left[(k+d_a)(r_a+d_p)+d_pr_p\right]}. \tag{40}
\]

Solving Eq. (40) for the elongation rate $k$, we find that in the limit of $L\to\infty$ taken at constant mean elongation time, $k$ tends to infinity and hence is much larger than $r_a$, $r_p$, $d_a$, and $d_p$, which implies that the results of Proposition 4 hold naturally in that limit.
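The inversion of Eq. (40) mentioned above can be written out explicitly. The sketch below is illustrative only; the function names are hypothetical, and the rearrangement is a direct algebraic solution of Eq. (40) for $k$ rather than a formula quoted from the paper.

```python
def mean_elongation_time(L, k, r_p, r_a, d_a, d_p):
    """Mean elongation time of the pausing model, Eq. (40)."""
    A = r_a + d_p
    return L * (A ** 2 + r_a * r_p) / (A * ((k + d_a) * A + d_p * r_p))


def elongation_rate_for(T_mean, L, r_p, r_a, d_a, d_p):
    """Elongation rate k that yields a prescribed mean elongation time.

    Obtained by rearranging Eq. (40); a negative result means the requested
    T_mean cannot be reached with the given pausing parameters.
    """
    A = r_a + d_p
    return (L * (A ** 2 + r_a * r_p) / (A * T_mean) - d_p * r_p) / A - d_a
```

With the mean elongation time held fixed, the returned $k$ grows linearly with $L$, so the condition $k \gg r_a, r_p, d_a, d_p$ of Proposition 4 is eventually satisfied for long genes.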

Approximate Distributions of Total RNAP and Mature RNA

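Negative Binomial approximation of total RNAP distribution: We define the total number of RNAP molecules as $n_{tot} = \sum_{i=1}^{L} n_i$. It then immediately follows from Eq. (37) that the mean of the total RNAP distribution in the pausing model is given by

\[
\langle n_{tot}\rangle = \eta\rho_k(1+\lambda)\,\tilde{\mu}\,\frac{\tilde{\mu}^{L}-1}{\tilde{\mu}-1}. \tag{41}
\]

It can also be shown that the variance of total RNAP fluctuations reads

\[
\mathrm{Var}(n_{tot}) = \langle n_{tot}\rangle + (\eta\rho_k)^2\tilde{\alpha}\beta\left[2\sum_{i,j=1}^{L}g^{aa}(i,j) + \lambda(2+\lambda)\,\omega L\,\frac{\tilde{\alpha}^{L}-1}{\tilde{\alpha}-1}\right]; \tag{42}
\]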

see ‘Appendix H’. Next, we approximate the distribution of total RNAP by a Negative Binomial distribution whose mean and variance match those just derived, i.e. we consider Eq. (20) with the mean and variance of the total RNAP distribution given by Eqs. (41) and (42) now, respectively. The resulting approximate Negative Binomial distribution is compared with the distribution obtained from SSA in Fig. 10a, b for two different yeast genes, PDR5 and DOA1. The results verify that our approximation is accurate provided the elongation rate k is significantly larger than the other parameters, as assumed in Proposition 4.
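Moment matching to a Negative Binomial distribution can be sketched as follows. Eq. (20) is not reproduced in this section, so the parameterisation below (size parameter and success probability, as in the standard form of the distribution) is an assumption and may differ from the one used there; the function names are hypothetical.

```python
import math


def negative_binomial_from_moments(mean, var):
    """Return (size, p) of a Negative Binomial with the given mean and variance.

    Assumes the standard form P(n) = Gamma(n + size) / (Gamma(size) n!) p^size (1 - p)^n,
    which requires var > mean (super-Poissonian fluctuations).
    """
    if var <= mean:
        raise ValueError("moment matching needs var > mean")
    p = mean / var
    size = mean ** 2 / (var - mean)
    return size, p


def nb_pmf(n, size, p):
    """Probability of observing n molecules, evaluated in log space for stability."""
    log_pmf = (math.lgamma(n + size) - math.lgamma(size) - math.lgamma(n + 1)
               + size * math.log(p) + n * math.log(1.0 - p))
    return math.exp(log_pmf)
```

In use, the mean and variance passed in would be the values of Eqs. (41) and (42); the check that `var > mean` reflects the fact that a Negative Binomial cannot represent Poissonian or sub-Poissonian statistics.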

Fig. 10.


Dependence of the steady-state probability distributions of total RNAP and mature RNA on the RNAP pausing rate $r_p$ for two different genes in yeast. In a, b, we compare the distribution $P(n_{tot})$ of the total number of RNAP molecules, as predicted by our model (solid lines), with that obtained from SSA (dots) for yeast genes PDR5 and DOA1, respectively. The model prediction involves fitting a Negative Binomial distribution with a mean and variance given by the closed-form expressions in Eqs. (41) and (42). In c, d, we compare the distribution $P(n)$ of mature RNA, as obtained from singular perturbation theory (Eq. (43); solid lines) with the SSA (dots) for yeast genes PDR5 and DOA1, respectively. Note that for both genes, we keep all parameters fixed (including the elongation rate $k$) while varying the pausing rate $r_p$ to simulate an experiment where the pausing rate can be perturbed directly. The parameters for each gene can be found in Table 1a; we furthermore used $L=50$ and fixed $k$ to $L/T$, where $T$ is the mean elongation time measured experimentally and reported in Table 1a. Note that the actual mean elongation time is not fixed, as it depends on the pausing rate ($r_p$) via Eq. (40). The remaining parameters are fixed to $r_a=0.1$/min, $d_a=0.01$/min, and $d_p=0.03$/min. The value of $d_a$ is taken from Table 1 in Rajala et al. (2010), where it is reported as the premature termination rate of polymerase in E. coli; the value of $d_p$ was chosen to be larger than that of $d_a$ to simulate a scenario where premature detachment is enhanced in the paused state. Note that our theory is less accurate for PDR5 than it is for DOA1, as all parameters are very small compared to the elongation rate in the latter case, hence satisfying better the assumptions behind the theory (Color figure online)

Perturbative approximation of mature RNA distribution: We can apply singular perturbation theory to formally derive the distribution of mature RNA, assuming that $k/d_m \gg 1$ and $r_a/d_m \gg 1$. Following the derivation in Sect. 3.2, we find the following analytical expression for the steady-state probability distribution of mature RNA:

\[
P(n) = \frac{1}{n!}\,\frac{(s_u)_n}{(s_b+s_u)_n}\left(\frac{r}{d_m}\right)^{n}\left(\tilde{\mu}^{L}\right)^{n}\,{}_1F_1\!\left(\frac{s_u}{d_m}+n;\frac{s_b+s_u}{d_m}+n;-\frac{r}{d_m}\tilde{\mu}^{L}\right); \tag{43}
\]

see ‘Appendix I’ for details. Note that the solution in Eq. (43) depends on the parameter $\tilde{\mu}^{L}$, which gives the probability that an RNAP molecule does not prematurely detach before termination; see ‘Appendix A’. Also, note that in the limit of zero premature termination, i.e. for $d_a = 0 = d_p$, Eq. (43) is identical to the distribution of mature RNA predicted by the telegraph model. Finally, by solving Eq. (40) for $k$, then substituting the resulting expression into Eq. (43) and taking the long-gene limit of $L\to\infty$ at constant $\langle T\rangle$, we obtain that the probability distribution of mature RNA has the same functional form as in Eq. (43), albeit with

\[
\lim_{L\to\infty}\tilde{\mu}^{L} = e^{-\psi\langle T\rangle}, \quad\text{where}\quad \psi = \frac{d_a+r_p\pi_{dp}}{1+\sigma\pi_{ra}}. \tag{44}
\]

Note that Eqs. (43) and (44) equal the steady-state solution predicted by the telegraph model, with the initiation rate $r$ renormalised to $r\tilde{\mu}^{L}$ or $re^{-\psi\langle T\rangle}$, respectively. In Fig. 10c, d, we verify the accuracy of our analytical solution using stochastic simulation for two different genes in yeast. Note that a change in the pausing rate $r_p$ has relatively little effect on the distribution of mature RNA, as compared to the effect on the distribution of total RNAP; cf. panels (a) and (b) of Fig. 10 in comparison with panels (c) and (d), respectively.
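The renormalised telegraph distribution can be evaluated with the standard library alone. The sketch below makes two assumptions that are not taken from the text: the Pochhammer symbols are evaluated at the rates scaled by $d_m$, as in the usual telegraph-model solution, and the confluent hypergeometric function is computed via Kummer's transformation ${}_1F_1(a;b;-x) = e^{-x}\,{}_1F_1(b-a;b;x)$ so that the series has only positive terms. All function names are hypothetical.

```python
import math


def survival_probability(L, k, r_p, r_a, d_a, d_p):
    """Probability mu_tilde^L that an RNAP reaches termination without detaching."""
    pi_dp = d_p / (r_a + d_p)
    return (k / (k + d_a + r_p * pi_dp)) ** L


def hyp1f1_series(a, b, z, terms=500):
    """Power series of Kummer's function 1F1(a; b; z); used here only for z >= 0."""
    total, term = 1.0, 1.0
    for m in range(terms):
        term *= (a + m) / (b + m) * z / (m + 1)
        total += term
        if abs(term) < 1e-16 * abs(total):
            break
    return total


def mature_rna_pmf(n, r, s_u, s_b, d_m, survival=1.0):
    """Telegraph-model distribution with initiation rate r renormalised to r * survival."""
    a = s_u / d_m
    b = (s_u + s_b) / d_m
    x = r * survival / d_m
    # log of (a)_n / (b)_n * x^n / n!
    log_pref = (math.lgamma(a + n) - math.lgamma(a)
                - math.lgamma(b + n) + math.lgamma(b)
                + n * math.log(x) - math.lgamma(n + 1))
    # Kummer's transformation: 1F1(a + n; b + n; -x) = exp(-x) * 1F1(b - a; b + n; x)
    return math.exp(log_pref - x) * hyp1f1_series(b - a, b + n, x)
```

Setting `survival=1.0` gives the plain telegraph model, which corresponds to the case $d_a = 0 = d_p$ discussed above.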

Summary and Conclusion

In this paper, we have analysed a detailed stochastic model of transcription. Our model extends previous analytical work (Choubey et al. 2015; Heng et al. 2016) by (i) taking into account salient processes, such as premature detachment and pausing of RNAP, that were previously not considered analytically; (ii) deriving explicit expressions for the mean and variance of RNAP numbers (nascent RNA) on gene segments as well as on the entire gene; (iii) deriving explicit expressions for the mean and variance of the fluorescent nascent RNA signal obtained from smFISH and identifying differences between the statistics thereof and those of direct measurements of nascent RNA; and (iv) finding approximate distributions of total nascent RNA fluctuations on a gene, without assuming slow promoter switching. A number of interesting observations from our work include the following:

  • (i)

    When the premature detachment rate of RNAP is nonzero and gene expression is bursty, the coefficient of variation of local RNAP fluctuations can either decrease or increase with distance from the promoter. By contrast, when expression is constitutive, the coefficient of variation increases monotonically with distance from the promoter. Other statistical measures such as the mean, Fano factor, and correlation coefficient of local RNAP numbers decrease monotonically with distance from the promoter.

  • (ii)

    In the limits of bursty expression, deterministic elongation, and no premature detachment or pausing, the Fano factor of total nascent RNA equals 1+2b, whereas that of mature RNA is 1+b, where b denotes the mean burst size. An implication is that the telegraph model will result in an overestimate of the mean burst size from nascent RNA data by a factor of 2. Another implication is that deviations from Poisson fluctuations are more apparent in data for nascent RNA than they are for mature RNA. One can further state the following relationship: the Fano factor of nascent RNA equals twice the Fano factor of mature RNA, minus 1. If expression is non-bursty, then the Fano factor of nascent RNA can be larger or smaller than that of mature RNA, as determined by the condition in Eq. (19).

  • (iii)

    For genes characterised by bursty expression, the sensitivity of the noise in total RNAP fluctuations is highest to perturbations in the gene activation rate and the mean elongation time; for constitutive genes, the most sensitive parameters are the initiation rate and the mean elongation time.

  • (iv)

    A Negative Binomial distribution, parameterised with the expressions for the mean and variance of total nascent RNA derived here, provides a good approximation to the true distribution of total nascent RNA fluctuations on a gene when expression is either bursty or constitutive; the approximation is not accurate when the gene spends roughly equal amounts of time in the active and inactive states. We show that the distribution of nascent RNA is highly sensitive to the distribution of elongation times. In particular, if the elongation time is assumed to be exponentially distributed, as is implicitly assumed by telegraph models of nascent RNA, then the probability of observing zero RNA is much lower than if the elongation time is assumed to be fixed.

  • (v)

    Using geometric singular perturbation theory (GSPT), we have rigorously proven that, in the limit of deterministic (or fast) elongation, with neither pausing nor premature detachment, the steady-state distribution of mature RNA in our model is identical to that in the telegraph model (Raj et al. 2006). Consideration of pausing and premature detachment leads to a distribution that can also be obtained from a telegraph model with appropriately renormalised parameters.

A summary of the main theoretical results can be found in Table 2, with all requisite parameters and functions defined in Table 3. The main limiting assumption of our theoretical approach is that the initiation rate is slow enough such that RNAP molecules do not frequently collide with each other while moving along the gene. Hence, the expressions we have derived are reasonable for all but the strongest promoters which are characterised by very fast initiation rates. We anticipate that approximate closed-form expressions for the corresponding moments can also be derived when volume exclusion between RNAPs is taken into account by a modification of methods previously devised to understand molecular movement and kinetics in crowded conditions (Cianci et al. 2016; Smith et al. 2017). It is also possible to extend our model by including translation of mature RNA to protein; one can then again apply GSPT to derive distributions for protein numbers in the limit of RNA decaying much faster than protein; however, given item (v) above, we anticipate that the resulting protein distribution will be very similar to those derived from models that do not explicitly take into account nascent RNA (Shahrezaei and Swain 2008; Popović et al. 2016). Further research is required to develop simple approximations of the nascent RNA distribution that are accurate independently of the ratio of gene switching rates. Finally, given the strong recent interest in the development of statistical inference techniques in molecular biology (Gorin et al. 2020; Zechner et al. 2012; Kaan Öcal et al. 2019), we expect that our closed-form expressions for the moments and distributions of nascent and mature RNA will be useful for developing computationally efficient and accurate methods for estimating transcriptional parameters.

Table 2.

Summary of main results


The cartoon represents our model in various limits: no pausing ($r_p = 0$), pausing ($r_p \neq 0$), stochastic elongation ($T$ Erlang distributed), deterministic elongation ($T$ fixed), bursty limit ($r, s_b \to \infty$), and premature RNAP detachment ($d, d_a, d_p \neq 0$). We summarise our analytical expressions for the approximate distributions and moments of total RNAP and mature RNA

Table 3.

Definition of parameters and functions

$f(i,j) = \dfrac{\alpha^{i+j-1}}{(2\alpha-1)^{i}} + \dfrac{1}{2^{i+j-1}}\dbinom{i+j-1}{i}\left[1-\dfrac{2\alpha-1}{2\alpha}\,{}_2F_1\!\left(1,i+j;j;\dfrac{1}{2\alpha}\right)\right]$
$g^{aa}(i,j) = \dfrac{\tilde{\alpha}^{i+j-1}}{(2\tilde{\alpha}-1)^{i}} + \dfrac{1}{2^{i+j-1}}\dbinom{i+j-1}{i}\left[1-\dfrac{2\tilde{\alpha}-1}{2\tilde{\alpha}}\,{}_2F_1\!\left(1,i+j;j;\dfrac{1}{2\tilde{\alpha}}\right)\right]$
$\Xi = 2(T_g+T_m)+e^{T_g}\left[2(T_g-1)T_m+(T_g-2)T_g\right]$
$\eta = s_u/(s_u+s_b)$: Fraction of time the gene spends in the active state
$\rho_k = r/k$: Mean number of bound RNAPs in the time $1/k$
$\rho = r/d_m$: Mean number of bound RNAPs in the time $1/d_m$
$\mu = k/(k+d)$: Local RNAP survival probability (no-pausing case)
$\tau_p = 1/(d+k)$: Timescale of fluctuations of RNAP
$\tau_g = 1/(s_u+s_b)$: Timescale of fluctuations of gene
$\tau_d = 1/d$: Timescale of RNAP detachment
$\tau_m = 1/d_m$: Timescale of fluctuations of mature RNA
$\alpha = 1/(1+\tau_p/\tau_g)$: Non-dimensional parameter
$\gamma = 1/(1+\tau_p/\tau_m)$: Non-dimensional parameter
$\theta = 1/(1+\tau_m/\tau_g)$: Non-dimensional parameter
$\beta = s_b/s_u$: Ratio of gene inactivation and activation rates
$b = r/s_b$: Mean burst size
$\upsilon_k = s_u/k$: Ratio of gene activation and RNAP elongation rates
$\upsilon_m = s_u/d_m$: Ratio of gene activation and mature RNA degradation rates
$\delta_g = \tau_g/\tau_d$: Ratio of gene timescale and RNAP detachment timescale
$T_g = T/\tau_g$: Ratio of elongation timescale and gene timescale
$T_d = T/\tau_d$: Ratio of elongation timescale and RNAP detachment timescale
$T_r = rT$: Ratio of the mean elongation time to the timescale of initiation
$T_m = d_mT$: Ratio of the mean elongation time to the timescale of decay of mature RNA
$T_{sb} = s_bT$: Ratio of the mean elongation time to the timescale of gene deactivation
$\sigma = r_p/r_a$: Ratio of the pausing and activation rates
$\pi_{ra} = r_a/(r_a+d_p)$: Probability of RNAP activation
$\pi_{dp} = d_p/(r_a+d_p)$: Probability of premature RNAP detachment from the paused state
$\lambda = \sigma\pi_{ra}$: Probability of RNAP pausing from active state
$\tilde{\mu} = k/(k+d_a+r_p\pi_{dp})$: Local RNAP survival probability (in pausing case)
$\tau_{ra} = 1/r_a$: Timescale of RNAP activation from the paused state
$\tau_{dp} = 1/d_p$: Timescale of premature termination of paused RNAP
$\tau_p = 1/(k+d_a)$: Typical time that an actively moving RNAP spends on a gene segment (pausing case)
$\tau_{pp} = 1/(r_a+d_p)$: Typical time spent in the paused state
$\lambda_{rp} = \pi_{rp}/(1-\pi_{rp})$: Ratio of active RNAP timescale over RNAP pausing timescale
$\pi_{rp} = r_p/(r_p+k+d_a)$: Probability of the actively moving RNAP switching to the paused state
$\omega_{ra} = \pi_{ra}\tau_g/(\pi_{ra}\tau_{ra}+\tau_g)$: Non-dimensional parameter
$\tilde{\alpha} = (\tau_g+\lambda_{rp}\pi_{dp}\tau_g)/(\tau_g+\tau_p+\lambda_{rp}\tau_g(1-\omega_{ra}))$: Non-dimensional parameter

Acknowledgements

R.G. acknowledges useful discussions with Zhixing Cao and Tineke Lenkstra. This work was supported by a departmental PhD studentship to T.F.

Appendix

A. Distribution of Elongation Time

In this section, we answer the following question: what is the distribution of the elongation time, i.e. the time between initiation and termination? In other words, with reference to Fig. 9 (which includes the non-pausing model in Fig. 1 as a special case), we want to find the distribution of the time at which RNAP leaves gene segment $L$ (termination) if it was in the active state on gene segment 1 at time $t=0$ (initiation).

Let $z_i(t)$ be the probability of an RNAP to be on gene segment $i$ in the active state at time $t$, let $\tilde{z}_i(t)$ be the probability of the RNAP to be on gene segment $i$ in the paused state at time $t$, and let $z_i^{*}(t)$ be the probability that the RNAP has moved to gene segment $i+1$ by time $t$; note that $z_L^{*}(t)$ is the probability of the RNAP falling off the gene and forming a mature RNA, since for $i=L$, gene segment $L+1$ does not exist. Then, it follows from the reaction scheme illustrated in Fig. 9 that the master equations describing the Markovian dynamics on gene segment $i$ are given by

\begin{align}
\partial_t z_i(t) &= -(r_p+k+d_a)\,z_i(t) + r_a\tilde{z}_i(t), \tag{A.1a}\\
\partial_t \tilde{z}_i(t) &= -(d_p+r_a)\,\tilde{z}_i(t) + r_p z_i(t), \tag{A.1b}\\
\partial_t z_i^{*}(t) &= k\,z_i(t). \tag{A.1c}
\end{align}

Now, we use these equations to find the distribution of the time when RNAP jumps to gene segment $i+1$, given that it is on gene segment $i$ in the active state at $t=0$, i.e. that $z_i(0)=1$ and $\tilde{z}_i(0)=0$. Taking the Laplace transform of Eqs. (A.1a) and (A.1b), we find

\begin{align}
s\hat{z}_i(s)-1 &= -(r_p+k+d_a)\,\hat{z}_i(s) + r_a\hat{\tilde{z}}_i(s), \tag{A.2a}\\
s\hat{\tilde{z}}_i(s) &= -(d_p+r_a)\,\hat{\tilde{z}}_i(s) + r_p\hat{z}_i(s), \tag{A.2b}
\end{align}

where $\hat{f}(s) = \int_0^{\infty}e^{-st}f(t)\,dt$. Solving these equations simultaneously, we obtain

\[
\hat{z}_i(s) = \frac{s+d_p+r_a}{(s+k+d_a)(s+d_p+r_a)+r_p(s+d_p)}. \tag{A.3}
\]

Let $w(t)\,dt$ be the probability that the RNAP moves from segment $i$ to $i+1$ in the time interval $(t,t+dt)$. Then, it follows from Eq. (A.1c) that $w(t) = \partial_t z_i^{*}(t) = k z_i(t)$. Integrating $w(t)$ over all times gives us the probability that the RNAP ultimately moves to the next segment $i+1$,

\[
\int_0^{\infty}w(t)\,dt = \hat{w}(0) = k\hat{z}_i(0) = \frac{k(r_a+d_p)}{(d_a+k)(r_a+d_p)+d_pr_p}. \tag{A.4}
\]

Note that $\hat{w}(0)$ is identical to the parameter $\tilde{\mu}$, as defined in Proposition 3. Let $y(t)\,dt$ be the probability that the RNAP moves from gene segment $i$ to segment $i+1$ in the time interval $(t,t+dt)$, conditioned on those realisations that lead to an RNAP moving to the next gene segment $i+1$. (In other words, we exclude those realisations that lead to premature detachment.) Then, it follows by the definition of conditional probabilities that $y(t) = w(t)/\hat{w}(0)$, which implies

\[
\hat{y}(s) = \frac{\hat{w}(s)}{\hat{w}(0)} = \frac{\left[(d_a+k)(r_a+d_p)+d_pr_p\right](r_a+d_p+s)}{(r_a+d_p)\left[(d_a+k+s)(r_a+d_p+s)+r_p(d_p+s)\right]}. \tag{A.5}
\]

It follows that the mean $\langle t\rangle$ and variance $\mathrm{Var}(t)$ of the time $t$ it takes RNAP to move to the next gene segment are given by

\begin{align}
\langle t\rangle &= -\left.\frac{d\hat{y}(s)}{ds}\right|_{s=0} = \frac{(r_a+d_p)^2+r_ar_p}{(r_a+d_p)\left[(d_a+k)(r_a+d_p)+d_pr_p\right]}, \tag{A.6a}\\
\mathrm{Var}(t) &= \left.\frac{d^2\hat{y}(s)}{ds^2}\right|_{s=0} - \left(\left.\frac{d\hat{y}(s)}{ds}\right|_{s=0}\right)^{2} = \frac{2r_ar_p(r_a+d_p)(d_a+r_a+d_p+k)+(r_a+d_p)^4+r_ar_p^2(r_a+2d_p)}{(r_a+d_p)^2\left[(d_a+k)(r_a+d_p)+d_pr_p\right]^2}, \tag{A.6b}
\end{align}

respectively. Since RNAP can only move forwards in our model (irreversible motion), it follows that the time it takes an RNAP to move from the $i$th to the $(i+1)$th gene segment is independent of the time taken to move from another, $j$th segment to the $(j+1)$th segment. Hence, the time required for an RNAP to move across the entire gene from the first to the $L$th segment, i.e. the ‘elongation’ time $T$ from initiation to termination, is a sum of $L$ independent and identically distributed random variables. Thus, we can immediately state that the mean elongation time is $\langle T\rangle = L\langle t\rangle$, whereas the variance of the elongation time is $\mathrm{Var}(T) = L\,\mathrm{Var}(t)$. The coefficient of variation squared takes the form

\[
CV_T^2 = \frac{\mathrm{Var}(T)}{\langle T\rangle^2} = \frac{1}{L}\left\{1+\frac{2r_ar_p\left[(d_a+k)(r_a+d_p)+d_pr_p\right]}{\left[(r_a+d_p)^2+r_ar_p\right]^2}\right\}. \tag{A.7}
\]

From Eq. (A.7), it can be shown that for small premature detachment rates, the coefficient of variation of the elongation time is maximised when $r_p \approx r_a$. Taking the limit of infinitely many gene segments at constant mean elongation time, i.e. solving for $k$ from the expression for the mean elongation time $\langle T\rangle = L\langle t\rangle$ with $\langle t\rangle$ given by Eq. (A.6a), substituting into Eq. (A.7), and taking the limit of $L\to\infty$, we obtain

\[
\lim_{L\to\infty}CV_T^2 = \frac{2r_ar_p}{\langle T\rangle(r_a+d_p)\left[(r_a+d_p)^2+r_ar_p\right]}. \tag{A.8}
\]

For the non-pausing model shown in Fig. 1, the above results simplify considerably due to $r_p = 0 = d_p$ and $d_a = d$; in that case, the inverse Laplace transform of Eq. (A.5) implies that $y(t)$ is an exponential distribution with parameter $k+d$. Hence, the total time it takes an RNAP to move across the entire gene is the sum of $L$ independent and identically distributed exponential random variables, i.e. it follows an Erlang distribution with shape parameter $L$ and rate $k+d$, which implies that the mean elongation time is $L/(k+d)$, with coefficient of variation $1/\sqrt{L}$ (i.e. $CV_T^2 = 1/L$). It can be seen from Eq. (A.8) that deterministic elongation can only be observed when there is no pausing, i.e. when $r_p=0$.
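The moment formulas of this appendix can be cross-checked numerically. The following sketch evaluates Eqs. (A.6a), (A.6b), (A.7), and (A.8); the function names are hypothetical and the parameter values in the usage note are illustrative.

```python
def segment_time_moments(k, r_p, r_a, d_a, d_p):
    """Mean and variance of the per-segment transit time, Eqs. (A.6a) and (A.6b)."""
    A = r_a + d_p
    B = (d_a + k) * A + d_p * r_p
    mean = (A ** 2 + r_a * r_p) / (A * B)
    var = (2 * r_a * r_p * A * (d_a + A + k) + A ** 4
           + r_a * r_p ** 2 * (r_a + 2 * d_p)) / (A ** 2 * B ** 2)
    return mean, var


def cv2_elongation(L, k, r_p, r_a, d_a, d_p):
    """Squared coefficient of variation of the elongation time, Eq. (A.7)."""
    A = r_a + d_p
    B = (d_a + k) * A + d_p * r_p
    return (1 + 2 * r_a * r_p * B / (A ** 2 + r_a * r_p) ** 2) / L


def cv2_long_gene_limit(T_mean, r_p, r_a, d_p):
    """Limit of CV_T^2 for L -> infinity at constant mean elongation time, Eq. (A.8)."""
    A = r_a + d_p
    return 2 * r_a * r_p / (T_mean * A * (A ** 2 + r_a * r_p))
```

For instance, scanning `cv2_elongation(50, 10.0, r_p, 0.1, 0.0, 0.0)` over a grid of `r_p` values locates the maximum at `r_p = 0.1`, i.e. at $r_p = r_a$ when there is no premature detachment.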

B. Solution of Lyapunov Equation

Proof of Proposition 2

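We start by defining the symmetric functions $f_{ij} = f_{ji}$ for $i,j = 1,\dots,L$ as

\[
f_{00}=1,\quad f_{0j}=\alpha^{j-1},\quad f_{0M}=\theta\alpha^{L-1},\quad f_{ij}=(f_{i-1,j}+f_{i,j-1})/2,\quad f_{iM}=\gamma f_{i-1,M}+(1-\gamma)f_{iL},\quad f_{MM}=f_{LM}, \tag{B.1}
\]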

where the non-dimensional parameters α, γ, and θ are defined in Proposition 2. The elements of the Lyapunov equation given by Eq. (5) can be written explicitly as a set of simultaneous equations:

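\begin{align}
C_{11}\cdot 2J_{11} &= -D_{11}, \tag{B.2a}\\
C_{12}\cdot(J_{11}+J_{22}) &= -J_{21}C_{11}, \tag{B.2b}\\
C_{1j}\cdot(J_{11}+J_{jj}) &= -J_{j,j-1}C_{1,j-1} \quad\text{for } j=3,\dots,L+1, \tag{B.2c}\\
C_{1,L+2}\cdot(J_{11}+J_{L+2,L+2}) &= -J_{L+2,L+1}C_{1,L+1}, \tag{B.2d}\\
C_{22}\cdot 2J_{22} &= -2J_{21}C_{12}-D_{22}, \tag{B.2e}\\
C_{23}\cdot(J_{22}+J_{33}) &= -J_{21}C_{13}-J_{32}C_{22}-D_{23}, \tag{B.2f}\\
C_{2j}\cdot(J_{22}+J_{jj}) &= -J_{21}C_{1j}-J_{j,j-1}C_{2,j-1} \quad\text{for } j=4,\dots,L+1, \tag{B.2g}\\
C_{2,L+2}\cdot(J_{22}+J_{L+2,L+2}) &= -J_{21}C_{1,L+2}-J_{L+2,L+1}C_{2,L+1}, \tag{B.2h}\\
C_{ii}\cdot 2J_{ii} &= -2J_{i,i-1}C_{i-1,i}-D_{ii} \quad\text{for } i=3,\dots,L+1, \tag{B.2i}\\
C_{i,i+1}\cdot(J_{ii}+J_{i+1,i+1}) &= -J_{i,i-1}C_{i-1,i+1}-J_{i+1,i}C_{ii}-D_{i,i+1} \quad\text{for } i=3,\dots,L, \tag{B.2j}\\
C_{ij}\cdot(J_{ii}+J_{jj}) &= -J_{i,i-1}C_{i-1,j}-J_{j,j-1}C_{i,j-1} \quad\text{for } i=3,\dots,L+1 \text{ and } j=i+2,\dots,L+1, \tag{B.2k}\\
C_{i,L+2}\cdot(J_{ii}+J_{L+2,L+2}) &= -J_{i,i-1}C_{i-1,L+2}-J_{L+2,L+1}C_{i,L+1} \quad\text{for } i=3,\dots,L+1, \tag{B.2l}\\
C_{L+2,L+2}\cdot 2J_{L+2,L+2} &= -2J_{L+2,L+1}C_{L+1,L+2}-D_{L+2,L+2}. \tag{B.2m}
\end{align}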

Now, we substitute the elements of the Jacobian matrix J and the diffusion matrix D from Eqs. (6) and (7), respectively, into the above system of algebraic equations, which we then solve to find the elements of the covariance matrix C. Note that, for the following mathematical derivation, we take into account the expressions for the steady-state mean numbers of species given in Eq. (2), as well as the definition of the functions fij in Eq. (B.1).

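From Eq. (B.2a), one easily obtains $C_{11} = \eta^2\beta$. Then, it follows from Eq. (B.2b) that

\[
C_{12} = \frac{r}{s_u+s_b+k+d}\,C_{11} = \rho_k\mu\alpha\,(\eta^2\beta) = \eta\,(\eta\rho_k\mu)\,\alpha\beta = \eta\langle n_1\rangle\alpha\beta\cdot f_{01}. \tag{B.3}
\]

Eq. (B.2c) implies that, for $j = 3,\dots,L+1$:

\[
C_{1j} = \frac{k}{s_u+s_b+k+d}\,C_{1,j-1} = \mu\alpha\cdot C_{1,j-1} = (\mu\alpha)^{j-2}C_{12} = (\mu\alpha)^{j-2}\left(\eta\langle n_1\rangle\alpha\beta\right) = \eta\langle n_{j-1}\rangle\alpha\beta\cdot f_{0,j-1}. \tag{B.4}
\]

From Eq. (B.2d), we have that

\[
C_{1,L+2} = \frac{k}{s_u+s_b+d_m}\,C_{1,L+1} = \frac{k}{d_m}\theta\left(\eta\langle n_L\rangle\alpha\beta\cdot f_{0L}\right) = \eta\left(\frac{k}{d_m}\langle n_L\rangle\right)(\alpha\beta)(\theta\cdot f_{0L}) = \eta\langle n\rangle\alpha\beta\cdot f_{0M}; \tag{B.5}
\]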

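from Eq. (B.2e), we find

\[
C_{22} = \frac{r\langle n_0\rangle+(k+d)\langle n_1\rangle}{2(k+d)} + \frac{r}{k+d}\,C_{12} = \frac{\rho_k\mu\eta+\langle n_1\rangle}{2} + (\rho_k\mu)\left(\eta\langle n_1\rangle\alpha\beta\cdot f_{01}\right) = \langle n_1\rangle+\langle n_1\rangle^2\alpha\beta\cdot f_{11}, \tag{B.6}
\]

since $f_{11} = (f_{01}+f_{10})/2 = f_{01}$ from the definition in Eq. (B.1).

From Eq. (B.2f), we obtain

\[
\begin{aligned}
C_{23} &= -\frac{k}{2(d+k)}\langle n_1\rangle + \frac{r}{2(k+d)}\,C_{13} + \frac{k}{2(d+k)}\,C_{22}\\
&= -\frac{\langle n_2\rangle}{2} + \frac{1}{2}(\rho_k\mu\eta)\langle n_2\rangle\alpha\beta\cdot f_{02} + \frac{1}{2}\mu\left[\langle n_1\rangle+\langle n_1\rangle^2\alpha\beta\cdot f_{11}\right]\\
&= -\frac{\langle n_2\rangle}{2} + \frac{1}{2}\langle n_1\rangle\langle n_2\rangle\alpha\beta\cdot f_{02} + \frac{\langle n_2\rangle}{2} + \frac{1}{2}(\mu\langle n_1\rangle)\langle n_1\rangle\alpha\beta\cdot f_{11}\\
&= \frac{1}{2}\langle n_1\rangle\langle n_2\rangle\alpha\beta\cdot f_{02} + \frac{1}{2}\langle n_2\rangle\langle n_1\rangle\alpha\beta\cdot f_{11}
= \langle n_1\rangle\langle n_2\rangle\alpha\beta\,\frac{1}{2}(f_{02}+f_{11}) = \langle n_1\rangle\langle n_2\rangle\alpha\beta\cdot f_{12},
\end{aligned} \tag{B.7}
\]

since $f_{12} = (f_{02}+f_{11})/2$ from the definition in Eq. (B.1).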

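From Eq. (B.2g), we have that, for $j = 4,\dots,L+1$,

\[
C_{2j} = \frac{r}{2(k+d)}\,C_{1j} + \frac{k}{2(k+d)}\,C_{2,j-1} = \frac{\rho_k\mu}{2}\,C_{1j} + \frac{\mu}{2}\,C_{2,j-1} = \frac{\rho_k\mu}{2}\sum_{q=0}^{j-4}\left(\frac{\mu}{2}\right)^{q}C_{1,j-q} + \left(\frac{\mu}{2}\right)^{j-3}C_{23}. \tag{B.8}
\]

The proof of Eq. (B.8) is given in Lemma B.1. The above expression for $C_{2j}$ can be further simplified to

\[
\begin{aligned}
C_{2j} &= \frac{\rho_k\mu}{2}\sum_{q=0}^{j-4}\left(\frac{\mu}{2}\right)^{q}\eta\langle n_{j-q-1}\rangle\alpha\beta\cdot f_{0,j-q-1} + \left(\frac{\mu}{2}\right)^{j-3}\langle n_1\rangle\langle n_2\rangle\alpha\beta\cdot f_{12}\\
&= \sum_{q=0}^{j-4}\left(\frac{1}{2}\right)^{q+1}(\rho_k\mu\eta)\left(\mu^{q}\langle n_{j-q-1}\rangle\right)\alpha\beta\cdot f_{0,j-q-1} + \left(\frac{1}{2}\right)^{j-3}\langle n_1\rangle\left(\mu^{j-3}\langle n_2\rangle\right)\alpha\beta\cdot f_{12}\\
&= \sum_{q=0}^{j-4}\left(\frac{1}{2}\right)^{q+1}\langle n_1\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{0,j-q-1} + \left(\frac{1}{2}\right)^{j-3}\langle n_1\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{12}\\
&= \langle n_1\rangle\langle n_{j-1}\rangle\alpha\beta\left[\sum_{q=0}^{j-4}\left(\frac{1}{2}\right)^{q+1}f_{0,j-q-1} + \left(\frac{1}{2}\right)^{j-3}f_{12}\right] = \langle n_1\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{1,j-1}.
\end{aligned} \tag{B.9}
\]

For the proof of the last equality in Eq. (B.9), see Lemma B.2.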

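From Eq. (B.2h), we have that

\[
\begin{aligned}
C_{2,L+2} &= \frac{r}{k+d+d_m}\,C_{1,L+2} + \frac{k}{k+d+d_m}\,C_{2,L+1} = \rho_k\mu\gamma\,C_{1,L+2} + \mu\gamma\,C_{2,L+1}\\
&= (\rho_k\mu\gamma)\left(\eta\langle n\rangle\alpha\beta\cdot f_{0M}\right) + (\mu\gamma)\left(\langle n_1\rangle\langle n_L\rangle\alpha\beta\cdot f_{1L}\right)\\
&= (\rho_k\eta\mu)\langle n\rangle\alpha\beta\cdot\gamma f_{0M} + \mu\frac{d_m}{k}\langle n_1\rangle\left(\frac{k}{d_m}\langle n_L\rangle\right)\alpha\beta\cdot\gamma f_{1L}\\
&= \langle n_1\rangle\langle n\rangle\alpha\beta\cdot\left[\gamma f_{0M} + \mu\frac{d_m}{k}\gamma f_{1L}\right] = \langle n_1\rangle\langle n\rangle\alpha\beta\cdot\left[\gamma f_{0M} + (1-\gamma)\cdot f_{1L}\right] = \langle n_1\rangle\langle n\rangle\alpha\beta\cdot f_{1M},
\end{aligned} \tag{B.10}
\]

where $f_{1M}$ is defined in Eq. (B.1).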

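Eqs. (B.2i) through (B.2k) yield the system

\[
\begin{aligned}
C_{ii} &= \frac{k\langle n_{i-2}\rangle+(k+d)\langle n_{i-1}\rangle}{2(k+d)} + \frac{k}{k+d}\,C_{i-1,i} = \langle n_{i-1}\rangle + \mu C_{i-1,i},\\
C_{i,i+1} &= \frac{\mu}{2}\,C_{i-1,i+1} + \frac{\mu}{2}\,C_{ii} - \frac{\mu}{2}\langle n_{i-1}\rangle = \frac{\mu}{2}\left(C_{i-1,i+1}+\mu C_{i-1,i}\right),\\
C_{ij} &= \frac{\mu}{2}\left(C_{i-1,j}+C_{i,j-1}\right),
\end{aligned} \tag{B.11}
\]

which can be rewritten more compactly as

\[
C_{ij} = \delta_{ij}\langle n_{i-1}\rangle + \langle n_{i-1}\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{i-1,j-1} \quad\text{for } i,j = 3,\dots,L+1, \tag{B.12}
\]

where $\delta_{ij}$ is the Kronecker delta. A detailed derivation is given in Lemma B.3.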

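From Eq. (B.2l), we have that for $i = 3,\dots,L+1$,

\[
\begin{aligned}
C_{i,L+2} &= \frac{k}{k+d+d_m}\,C_{i-1,L+2} + \frac{k}{k+d+d_m}\,C_{i,L+1} = \mu\gamma\,C_{i-1,L+2} + \frac{k}{d_m}(1-\gamma)\,C_{i,L+1}\\
&= \gamma\left(\mu\langle n_{i-2}\rangle\right)\langle n\rangle\alpha\beta\cdot f_{i-2,M} + (1-\gamma)\langle n_{i-1}\rangle\left(\frac{k}{d_m}\langle n_L\rangle\right)\alpha\beta\cdot f_{i-1,L}\\
&= \langle n_{i-1}\rangle\langle n\rangle\alpha\beta\cdot\left[\gamma f_{i-2,M} + (1-\gamma)f_{i-1,L}\right] = \langle n_{i-1}\rangle\langle n\rangle\alpha\beta\cdot f_{i-1,M},
\end{aligned} \tag{B.13}
\]

where $f_{iM}$ is defined in Eq. (B.1).

Finally, Eq. (B.2m) yields

\[
C_{L+2,L+2} = \frac{k\langle n_L\rangle+d_m\langle n\rangle}{2d_m} + \frac{k}{d_m}\,C_{L+1,L+2} = \langle n\rangle + \frac{k}{d_m}\langle n_L\rangle\langle n\rangle\alpha\beta\cdot f_{LM} = \langle n\rangle + \langle n\rangle^2\alpha\beta\cdot f_{MM}, \tag{B.14}
\]

where $f_{MM} = f_{LM}$ is defined in Eq. (B.1).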

Summarising the above results, we conclude that the solution for the symmetric covariance matrix $C$ is given by the system in Eq. (4), where we have that $\mathrm{Cov}(n_i,n_j) = C_{i+1,j+1}$ and $\mathrm{Cov}(n_i,n) = C_{i+1,L+2}$ for $i,j = 0,\dots,L$, and $\mathrm{Var}(n) = C_{L+2,L+2}$. Here, the functions $f_{ij}$ are defined as in Eq. (B.1). Now, the recurrence relation $f_{ij} = (f_{i-1,j}+f_{i,j-1})/2$ in Eq. (B.1) can be solved for $i,j = 1,2,\dots,L$ via the method of generating functions, which gives the following analytical expression:

\[
f_{ij} = f(i,j)+f(j,i), \tag{B.15}
\]

where

\[
f(i,j) = \frac{\alpha^{i+j-1}}{(2\alpha-1)^{i}} + \frac{1}{2^{i+j-1}}\binom{i+j-1}{i}\left[1-\frac{2\alpha-1}{2\alpha}\,{}_2F_1\!\left(1,i+j;j;\frac{1}{2\alpha}\right)\right];
\]

see Lemma B.5 for a detailed derivation. Additionally, we can easily prove that the function $f_{iM}$ in Eq. (B.1) can be rewritten as

\[
f_{iM} = \gamma^{i}f_{0M} + (1-\gamma)\sum_{q=1}^{i}\gamma^{i-q}f_{qL}, \tag{B.16}
\]

as shown in Lemma B.4.

Lemma B.1

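For $j = 4,\dots,L+1$, we have the identity

\[
C_{2j} = \frac{\rho_k\mu}{2}\,C_{1j} + \frac{\mu}{2}\,C_{2,j-1} = \frac{\rho_k\mu}{2}\sum_{q=0}^{j-4}\left(\frac{\mu}{2}\right)^{q}C_{1,j-q} + \left(\frac{\mu}{2}\right)^{j-3}C_{23}, \tag{B.17}
\]

as stated in Eq. (B.8).

Proof

The identity in Eq. (B.17) will be proved by induction: one can easily show that it holds for $j=4$. Now, we assume that Eq. (B.17) is true for some $j \geq 4$; hence, for $j+1$, we have

\[
\begin{aligned}
C_{2,j+1} &= \sum_{q=0}^{j-3}\left(\frac{\mu}{2}\right)^{q}\frac{\rho_k\mu}{2}\,C_{1,j+1-q} + \left(\frac{\mu}{2}\right)^{j-2}C_{23}\\
&= \frac{\rho_k\mu}{2}\,C_{1,j+1} + \sum_{q=1}^{j-3}\left(\frac{\mu}{2}\right)^{q}\frac{\rho_k\mu}{2}\,C_{1,j+1-q} + \left(\frac{\mu}{2}\right)^{j-2}C_{23}\\
&= \frac{\rho_k\mu}{2}\,C_{1,j+1} + \frac{\mu}{2}\left[\sum_{q=1}^{j-3}\left(\frac{\mu}{2}\right)^{q-1}\frac{\rho_k\mu}{2}\,C_{1,j+1-q} + \left(\frac{\mu}{2}\right)^{j-3}C_{23}\right]\\
&= \frac{\rho_k\mu}{2}\,C_{1,j+1} + \frac{\mu}{2}\left[\sum_{q=0}^{j-4}\left(\frac{\mu}{2}\right)^{q}\frac{\rho_k\mu}{2}\,C_{1,j-q} + \left(\frac{\mu}{2}\right)^{j-3}C_{23}\right]
= \frac{\rho_k\mu}{2}\,C_{1,j+1} + \frac{\mu}{2}\,C_{2j},
\end{aligned} \tag{B.18}
\]

as claimed, which implies that the identity in Eq. (B.17) holds for all $j = 4,\dots,L+1$.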

Lemma B.2

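The function $f_{1j}$, which is defined by the recurrence relation $f_{1j} = (f_{0j}+f_{1,j-1})/2$ in Eq. (B.1), satisfies the identity

\[
f_{1j} = \sum_{q=0}^{j-3}\left(\frac{1}{2}\right)^{q+1}f_{0,j-q} + \left(\frac{1}{2}\right)^{j-2}f_{12} \quad\text{for } j = 3,\dots,L, \tag{B.19}
\]

as stated in Eq. (B.9).

Proof

We will again prove Eq. (B.19) by induction. For $j=3$, we have from Eq. (B.19) that $f_{13} = (f_{03}+f_{12})/2$, which is true by the definition of $f_{13}$. We assume that the identity in Eq. (B.19) is correct for some $j \geq 3$; then, for $j+1$, the definition of $f_{1,j+1}$, in combination with our assumption, implies

\[
\begin{aligned}
f_{1,j+1} &= \frac{1}{2}f_{0,j+1} + \frac{1}{2}f_{1j} = \frac{1}{2}f_{0,j+1} + \frac{1}{2}\left[\sum_{q=0}^{j-3}\left(\frac{1}{2}\right)^{q+1}f_{0,j-q} + \left(\frac{1}{2}\right)^{j-2}f_{12}\right]\\
&= \frac{1}{2}f_{0,j+1} + \frac{1}{2}\left[\sum_{q=1}^{j-2}\left(\frac{1}{2}\right)^{q}f_{0,j+1-q} + \left(\frac{1}{2}\right)^{j-2}f_{12}\right] = \sum_{q=0}^{j-2}\left(\frac{1}{2}\right)^{q+1}f_{0,j+1-q} + \left(\frac{1}{2}\right)^{j-1}f_{12},
\end{aligned}
\]

as claimed. Hence, the equality in Eq. (B.19) is true for all $j = 3,\dots,L$.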

Lemma B.3

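The system in Eq. (B.11), which is given by

\[
\begin{aligned}
C_{ii} &= \langle n_{i-1}\rangle + \mu C_{i-1,i} &&\text{for } i = 3,\dots,L,\\
C_{i,i+1} &= \frac{\mu}{2}\left(C_{i-1,i+1}+\mu C_{i-1,i}\right) &&\text{for } i = 3,\dots,L,\\
C_{ij} &= \frac{\mu}{2}\left(C_{i-1,j}+C_{i,j-1}\right) &&\text{for } i = 3,\dots,L+1 \text{ and } j = i+1,\dots,L+1,
\end{aligned} \tag{B.20}
\]

is equivalent to the system

\[
C_{ij} = \delta_{ij}\langle n_{i-1}\rangle + \langle n_{i-1}\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{i-1,j-1} \quad\text{for } i,j = 3,\dots,L+1, \tag{B.21}
\]

as stated in Eq. (B.12). Here, the functions $f_{ij}$ are defined as in Eq. (B.1).

Proof

We again use the method of induction. For $i=3$, we have

\[
\begin{aligned}
C_{33} &= \langle n_2\rangle + \langle n_2\rangle^2\alpha\beta\cdot f_{22} = \langle n_2\rangle + \mu\langle n_1\rangle\langle n_2\rangle\alpha\beta\cdot f_{12} = \langle n_2\rangle + \mu C_{23},\\
C_{34} &= \langle n_2\rangle\langle n_3\rangle\alpha\beta\cdot f_{23} = \langle n_2\rangle\langle n_3\rangle\alpha\beta\,(f_{22}+f_{13})/2 = \left[\langle n_2\rangle\langle n_3\rangle\alpha\beta\cdot f_{22} + \langle n_2\rangle\langle n_3\rangle\alpha\beta\cdot f_{13}\right]/2\\
&= \left[\mu\langle n_1\rangle\langle n_3\rangle\alpha\beta\cdot f_{13} + \mu^2\langle n_1\rangle\langle n_2\rangle\alpha\beta\cdot f_{12}\right]/2 = \frac{\mu}{2}\left[\langle n_1\rangle\langle n_3\rangle\alpha\beta\cdot f_{13} + \mu\langle n_1\rangle\langle n_2\rangle\alpha\beta\cdot f_{12}\right] = \frac{\mu}{2}\left(C_{24}+\mu C_{23}\right),\\
C_{3j} &= \langle n_2\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{2,j-1} = \langle n_2\rangle\langle n_{j-1}\rangle\alpha\beta\,(f_{2,j-2}+f_{1,j-1})/2 = \left[\langle n_2\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{2,j-2} + \langle n_2\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{1,j-1}\right]/2\\
&= \left[\langle n_2\rangle\mu\langle n_{j-2}\rangle\alpha\beta\cdot f_{2,j-2} + \mu\langle n_1\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{1,j-1}\right]/2 = \frac{\mu}{2}\left[\langle n_2\rangle\langle n_{j-2}\rangle\alpha\beta\cdot f_{2,j-2} + \langle n_1\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{1,j-1}\right] = \frac{\mu}{2}\left(C_{3,j-1}+C_{2j}\right).
\end{aligned} \tag{B.22}
\]

Now, we assume that the statement is true for some $i \geq 3$; then, for $i+1$, we have

\[
\begin{aligned}
C_{i+1,i+1} &= \langle n_i\rangle + \langle n_i\rangle^2\alpha\beta\cdot f_{ii} = \langle n_i\rangle + \mu\langle n_{i-1}\rangle\langle n_i\rangle\alpha\beta\cdot f_{i-1,i} = \langle n_i\rangle + \mu C_{i,i+1},\\
C_{i+1,i+2} &= \langle n_i\rangle\langle n_{i+1}\rangle\alpha\beta\cdot f_{i,i+1} = \langle n_i\rangle\langle n_{i+1}\rangle\alpha\beta\,(f_{i-1,i+1}+f_{ii})/2 = \left[\langle n_i\rangle\langle n_{i+1}\rangle\alpha\beta\cdot f_{i-1,i+1} + \langle n_i\rangle\langle n_{i+1}\rangle\alpha\beta\cdot f_{ii}\right]/2\\
&= \left[\mu\langle n_{i-1}\rangle\langle n_{i+1}\rangle\alpha\beta\cdot f_{i-1,i+1} + \mu^2\langle n_{i-1}\rangle\langle n_i\rangle\alpha\beta\cdot f_{i-1,i}\right]/2 = \frac{\mu}{2}\left[\langle n_{i-1}\rangle\langle n_{i+1}\rangle\alpha\beta\cdot f_{i-1,i+1} + \mu\langle n_{i-1}\rangle\langle n_i\rangle\alpha\beta\cdot f_{i-1,i}\right] = \frac{\mu}{2}\left(C_{i,i+2}+\mu C_{i,i+1}\right),\\
C_{i+1,j} &= \langle n_i\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{i,j-1} = \langle n_i\rangle\langle n_{j-1}\rangle\alpha\beta\,(f_{i-1,j-1}+f_{i,j-2})/2 = \left[\langle n_i\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{i-1,j-1} + \langle n_i\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{i,j-2}\right]/2\\
&= \left[\mu\langle n_{i-1}\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{i-1,j-1} + \mu\langle n_i\rangle\langle n_{j-2}\rangle\alpha\beta\cdot f_{i,j-2}\right]/2 = \frac{\mu}{2}\left[\langle n_{i-1}\rangle\langle n_{j-1}\rangle\alpha\beta\cdot f_{i-1,j-1} + \langle n_i\rangle\langle n_{j-2}\rangle\alpha\beta\cdot f_{i,j-2}\right] = \frac{\mu}{2}\left(C_{ij}+C_{i+1,j-1}\right),
\end{aligned} \tag{B.23}
\]

which is also correct. Hence, the statement of the lemma is true for all $i$ and $j$, as stated.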

Lemma B.4

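For $i = 1,\dots,L$, the function $f_{iM}$ defined in Eq. (B.1) can be simplified as in Eq. (B.16); specifically, we have the identity

\[
f_{iM} = \gamma f_{i-1,M} + (1-\gamma)f_{i,L} = \gamma^{i}\cdot f_{0M} + (1-\gamma)\sum_{q=1}^{i}\gamma^{i-q}\cdot f_{qL}. \tag{B.24}
\]

Proof

The proof is by induction: for $i=1$, the identity is obvious. We now suppose that Eq. (B.24) is true for some $i \geq 1$; hence, for $i+1$, we have

\[
f_{i+1,M} = \gamma^{i+1}\cdot f_{0M} + (1-\gamma)\sum_{q=1}^{i+1}\gamma^{i+1-q}\cdot f_{qL} = \gamma\left[\gamma^{i}\cdot f_{0M} + (1-\gamma)\sum_{q=1}^{i}\gamma^{i-q}\cdot f_{qL}\right] + (1-\gamma)f_{i+1,L} = \gamma f_{iM} + (1-\gamma)f_{i+1,L}, \tag{B.25}
\]

which is correct. Hence, Eq. (B.24) is true for all $i$, as stated.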

Lemma B.5

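For $i,j = 1,\dots,L$, the solution of the recurrence relation $f_{ij} = (f_{i,j-1}+f_{i-1,j})/2$ in Eq. (B.1) is given by $f_{ij} = f(i,j)+f(j,i)$, where

\[
f(i,j) = \frac{\alpha^{i+j-1}}{(2\alpha-1)^{i}} + \frac{1}{2^{i+j-1}}\binom{i+j-1}{i}\left[1-\frac{2\alpha-1}{2\alpha}\,{}_2F_1\!\left(1,i+j;j;\frac{1}{2\alpha}\right)\right]. \tag{B.26}
\]

Proof

In order to solve the recurrence relation for the function $f_{ij}$, we take into account the initial conditions $f_{00}=1$ and $f_{0j} = f_{j0} = \alpha^{j-1}$. Then, we define a generating function $g(x,y)$ via

\[
g(x,y) = \sum_{i,j\geq 0}f_{ij}x^{i}y^{j} = f_{00} + \sum_{j\geq 1}f_{0j}y^{j} + \sum_{i\geq 1}f_{i0}x^{i} + \sum_{i,j\geq 1}f_{ij}x^{i}y^{j}, \tag{B.27}
\]

where the last term can be rewritten as

\[
\begin{aligned}
\sum_{i,j\geq 1}f_{ij}x^{i}y^{j} &= \sum_{i,j\geq 1}\frac{1}{2}\left(f_{i-1,j}+f_{i,j-1}\right)x^{i}y^{j} = \frac{1}{2}x\sum_{i,j\geq 1}f_{i-1,j}x^{i-1}y^{j} + \frac{1}{2}y\sum_{i,j\geq 1}f_{i,j-1}x^{i}y^{j-1}\\
&= \frac{1}{2}x\sum_{i\geq 0}\sum_{j\geq 1}f_{ij}x^{i}y^{j} + \frac{1}{2}y\sum_{i\geq 1}\sum_{j\geq 0}f_{ij}x^{i}y^{j}\\
&= \frac{1}{2}x\left(\sum_{i,j\geq 0}f_{ij}x^{i}y^{j} - \sum_{i\geq 0}f_{i0}x^{i}\right) + \frac{1}{2}y\left(\sum_{i,j\geq 0}f_{ij}x^{i}y^{j} - \sum_{j\geq 0}f_{0j}y^{j}\right)\\
&= \frac{1}{2}x\left(g(x,y) - \sum_{i\geq 0}f_{i0}x^{i}\right) + \frac{1}{2}y\left(g(x,y) - \sum_{j\geq 0}f_{0j}y^{j}\right).
\end{aligned} \tag{B.28}
\]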

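Hence, Eq. (B.27) becomes

\[
g(x,y) = f_{00} + \sum_{j\geq 1}f_{0j}y^{j} + \sum_{i\geq 1}f_{i0}x^{i} + \frac{1}{2}x\left(g(x,y) - \sum_{i\geq 0}f_{i0}x^{i}\right) + \frac{1}{2}y\left(g(x,y) - \sum_{j\geq 0}f_{0j}y^{j}\right),
\]

which is equivalent to

\[
g(x,y)\left(1-\frac{1}{2}x-\frac{1}{2}y\right) = f_{00}\left(1-\frac{1}{2}x-\frac{1}{2}y\right) + \left(1-\frac{1}{2}y\right)\sum_{j\geq 1}f_{0j}y^{j} + \left(1-\frac{1}{2}x\right)\sum_{i\geq 1}f_{i0}x^{i}
\]

or

\[
g(x,y) = f_{00} + \left(1-\frac{1}{2}y\right)\frac{1}{1-\frac{1}{2}x-\frac{1}{2}y}\sum_{j\geq 1}f_{0j}y^{j} + \left(1-\frac{1}{2}x\right)\frac{1}{1-\frac{1}{2}x-\frac{1}{2}y}\sum_{i\geq 1}f_{i0}x^{i}. \tag{B.29}
\]

Taking into account the initial conditions, we find that

\[
\sum_{j\geq 1}f_{0j}y^{j} = \sum_{j\geq 1}\alpha^{j-1}y^{j} = \frac{1}{\alpha}\sum_{j\geq 1}(\alpha y)^{j} \quad\text{and}\quad \sum_{i\geq 1}f_{i0}x^{i} = \frac{1}{\alpha}\sum_{i\geq 1}(\alpha x)^{i}, \tag{B.30}
\]

which we substitute into Eq. (B.29) to obtain

\[
g(x,y) = 1 + \left(1-\frac{1}{2}y\right)\frac{1}{1-\frac{1}{2}x-\frac{1}{2}y}\,\frac{1}{\alpha}\sum_{j\geq 1}(\alpha y)^{j} + \left(1-\frac{1}{2}x\right)\frac{1}{1-\frac{1}{2}x-\frac{1}{2}y}\,\frac{1}{\alpha}\sum_{i\geq 1}(\alpha x)^{i}. \tag{B.31}
\]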

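Making use of the well-known symmetric, bivariate generating function of the binomial coefficients

\[
\frac{1}{1-s-t} = \sum_{i,j\geq 0}\binom{i+j}{i}s^{i}t^{j}, \tag{B.32}
\]

we can rewrite Eq. (B.31) as

\[
\begin{aligned}
g(x,y) &= 1 + \left(1-\frac{1}{2}y\right)\frac{1}{\alpha}\sum_{j\geq 1}(\alpha y)^{j}\sum_{i,j\geq 0}\binom{i+j}{i}\frac{x^{i}y^{j}}{2^{i+j}} + \left(1-\frac{1}{2}x\right)\frac{1}{\alpha}\sum_{i\geq 1}(\alpha x)^{i}\sum_{i,j\geq 0}\binom{i+j}{i}\frac{x^{i}y^{j}}{2^{i+j}}\\
&= 1 + \left(1-\frac{1}{2}y\right)\sum_{i,j\geq 0}\sum_{q=0}^{j-1}\binom{i+q}{i}\frac{\alpha^{j-q-1}}{2^{i+q}}x^{i}y^{j} + \left(1-\frac{1}{2}x\right)\sum_{i,j\geq 0}\sum_{q=0}^{i-1}\binom{j+q}{q}\frac{\alpha^{i-q-1}}{2^{j+q}}x^{i}y^{j}.
\end{aligned}
\]

Rearranging sums in the above expression, we find

\[
g(x,y) = 1 + \sum_{i,j\geq 0}\left[\sum_{q=0}^{j-1}\binom{i+q}{i}\frac{\alpha^{j-q-1}}{2^{i+q}} - \sum_{q=0}^{j-2}\binom{i+q}{i}\frac{\alpha^{j-q-2}}{2^{i+q+1}} + \sum_{q=0}^{i-1}\binom{j+q}{q}\frac{\alpha^{i-q-1}}{2^{j+q}} - \sum_{q=0}^{i-2}\binom{j+q}{q}\frac{\alpha^{i-q-2}}{2^{j+q+1}}\right]x^{i}y^{j},
\]

where the leading 1 is the $i=j=0$ term $f_{00}$, for which all sums in the square brackets are empty.

Hence, we obtain the following exact expression for the function $f_{ij}$ with $i,j \geq 1$,

\[
f_{ij} = \sum_{q=0}^{j-1}\binom{i+q}{q}\frac{\alpha^{j-q-1}}{2^{i+q}} - \sum_{q=0}^{j-2}\binom{i+q}{q}\frac{\alpha^{j-q-2}}{2^{i+q+1}} + \sum_{q=0}^{i-1}\binom{j+q}{q}\frac{\alpha^{i-q-1}}{2^{j+q}} - \sum_{q=0}^{i-2}\binom{j+q}{q}\frac{\alpha^{i-q-2}}{2^{j+q+1}}. \tag{B.33}
\]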

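The expression in Eq. (B.33) can be simplified further due to its symmetry with respect to the indices $i$ and $j$: we write $f_{ij} = f(i,j)+f(j,i)$, where $f(i,j)$ is defined as

\[
f(i,j) = \sum_{q=0}^{j-1}\binom{i+q}{q}\frac{\alpha^{j-q-1}}{2^{i+q}} - \sum_{q=0}^{j-2}\binom{i+q}{q}\frac{\alpha^{j-q-2}}{2^{i+q+1}}. \tag{B.34}
\]

The function $f(i,j)$ can be further simplified as

\[
\begin{aligned}
f(i,j) &= \binom{i+j-1}{j-1}\frac{1}{2^{i+j-1}} + 2\alpha\sum_{q=0}^{j-2}\binom{i+q}{q}\frac{\alpha^{j-q-2}}{2^{i+q+1}} - \sum_{q=0}^{j-2}\binom{i+q}{q}\frac{\alpha^{j-q-2}}{2^{i+q+1}}\\
&= \binom{i+j-1}{j-1}\frac{1}{2^{i+j-1}} + (2\alpha-1)\frac{\alpha^{j-2}}{2^{i+1}}\sum_{q=0}^{j-2}\binom{i+q}{q}\left(\frac{1}{2\alpha}\right)^{q};
\end{aligned} \tag{B.35}
\]

next, we use the identity

\[
\sum_{q=0}^{j}\binom{i+q}{i}x^{q} = \frac{1}{(1-x)^{i+1}} - x^{j+1}\binom{j+1+i}{j+1}\,{}_2F_1(1,j+i+2;j+2;x), \tag{B.36}
\]

where ${}_2F_1$ is again the generalised hypergeometric function of the second kind (Digital Library of Mathematical Functions 2020a). Note that the above identity can be used only when $|x|<1$, as the power series defining ${}_2F_1$ diverges for $|x|>1$.

Hence, Eq. (B.35) becomes

\[
\begin{aligned}
f(i,j) &= \binom{i+j-1}{j-1}\frac{1}{2^{i+j-1}} + (2\alpha-1)\frac{\alpha^{j-2}}{2^{i+1}}\times\left[\left(\frac{2\alpha}{2\alpha-1}\right)^{i+1} - \frac{1}{(2\alpha)^{j-1}}\binom{j+i-1}{j-1}\,{}_2F_1\!\left(1,j+i;j;\frac{1}{2\alpha}\right)\right]\\
&= \frac{\alpha^{i+j-1}}{(2\alpha-1)^{i}} + \frac{1}{2^{i+j-1}}\binom{i+j-1}{i}\left[1-\frac{2\alpha-1}{2\alpha}\,{}_2F_1\!\left(1,j+i;j;\frac{1}{2\alpha}\right)\right].
\end{aligned} \tag{B.37}
\]

Given the expression for $f(i,j)$ in Eq. (B.37), one can find the corresponding expression for $f(j,i)$ by exchanging the indices $i \leftrightarrow j$.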
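The three representations of $f_{ij}$ (the recurrence in Eq. (B.1), the finite sums in Eq. (B.34), and the hypergeometric form in Eq. (B.37)) can be compared numerically. The sketch below is illustrative only; the function names are hypothetical, exact rational arithmetic is used for the first two forms, and the hypergeometric series is truncated after a fixed number of terms, which presumes $\alpha > 1/2$ so that $|1/(2\alpha)| < 1$.

```python
from fractions import Fraction
from math import comb


def f_table(L, alpha):
    """f_ij for 0 <= i, j <= L from the recurrence and boundary values in Eq. (B.1)."""
    f = [[None] * (L + 1) for _ in range(L + 1)]
    f[0][0] = Fraction(1)
    for j in range(1, L + 1):
        f[0][j] = f[j][0] = alpha ** (j - 1)
    for i in range(1, L + 1):
        for j in range(1, L + 1):
            f[i][j] = (f[i - 1][j] + f[i][j - 1]) / 2
    return f


def f_sum(i, j, alpha):
    """Finite-sum form of f(i, j), Eq. (B.34)."""
    first = sum(comb(i + q, q) * alpha ** (j - q - 1) / Fraction(2) ** (i + q)
                for q in range(j))
    second = sum(comb(i + q, q) * alpha ** (j - q - 2) / Fraction(2) ** (i + q + 1)
                 for q in range(j - 1))
    return first - second


def hyp2f1_first_arg_one(b, c, x, terms=2000):
    """Truncated series of 2F1(1, b; c; x) = sum_m (b)_m / (c)_m x^m, for |x| < 1."""
    total, term = 1.0, 1.0
    for m in range(terms):
        term *= (b + m) / (c + m) * x
        total += term
    return total


def f_closed(i, j, alpha):
    """Hypergeometric form of f(i, j), Eq. (B.37), in floating point."""
    x = 1.0 / (2.0 * alpha)
    bracket = 1.0 - (2.0 * alpha - 1.0) * x * hyp2f1_first_arg_one(i + j, j, x)
    return alpha ** (i + j - 1) / (2.0 * alpha - 1.0) ** i + comb(i + j - 1, i) / 2.0 ** (i + j - 1) * bracket
```

The recurrence is the cheapest route when all $f_{ij}$ up to a given $L$ are needed, since it costs $O(L^2)$ operations in total, whereas the closed form is convenient for isolated entries.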

C. Variance of Total RNAP Distribution

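In this section, we derive the exact expression for the variance of the total RNAP distribution, as stated in Eq. (10), which is given by the sum over the covariances $\mathrm{Cov}(n_i,n_j)$ ($i,j = 1,\dots,L$), as defined in Eq. (4d). Hence, we have

\[
\begin{aligned}
\mathrm{Var}(n_{tot}) &= \sum_{i,j=1}^{L}\mathrm{Cov}(n_i,n_j) = \sum_{i=1}^{L}\mathrm{Var}(n_i) + \sum_{i\neq j}\mathrm{Cov}(n_i,n_j) = \sum_{i=1}^{L}\left[\langle n_i\rangle+\langle n_i\rangle^2\alpha\beta\cdot f_{ii}\right] + \sum_{i\neq j}\langle n_i\rangle\langle n_j\rangle\alpha\beta\cdot f_{ij}\\
&= \sum_{i=1}^{L}\langle n_i\rangle + \alpha\beta\left(\sum_{i=1}^{L}\langle n_i\rangle^2\cdot f_{ii} + \sum_{i\neq j}\langle n_i\rangle\langle n_j\rangle\cdot f_{ij}\right) = \sum_{i=1}^{L}\langle n_i\rangle + \alpha\beta\sum_{i,j=1}^{L}\langle n_i\rangle\langle n_j\rangle\cdot f_{ij},
\end{aligned} \tag{C.1}
\]

where the function $f_{ij}$ is given in Eq. (10). The first term in Eq. (C.1) equals $\langle n_{tot}\rangle$, the mean of the total RNAP distribution, as stated in Eq. (10); substituting in the expressions for the means $\langle n_i\rangle$ from Eq. (2b), as well, we obtain

\[
\mathrm{Var}(n_{tot}) = \langle n_{tot}\rangle + \alpha\beta(\eta\rho_k)^2\sum_{i,j=1}^{L}\mu^{i+j}\cdot f_{ij}. \tag{C.2}
\]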

Lemma C.1

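In the limit of deterministic elongation, i.e. for $L\to\infty$, the expression for $\mathrm{Var}(n_{tot})$ in Eq. (10) simplifies to

\[
\mathrm{Var}(n_{tot}) = \langle n_{tot}\rangle + \beta(\eta r)^2\,\frac{(s_b+s_u-d)-(s_b+s_u+d)e^{-2dT}+2de^{-(s_b+s_u+d)T}}{d\,(s_b+s_u+d)(s_b+s_u-d)}, \tag{C.3}
\]

which can be further simplified to the expression in Eq. (11).

Proof

In order to find the limit of $L\to\infty$ in Eq. (10) (or Eq. (C.2)), we have to evaluate the term $\sum_{i,j=1}^{L}\mu^{i+j}\cdot f_{ij}$ in that limit. For the following derivation, we consider the function $f_{ij} = f(i,j)+f(j,i)$, where $f(i,j)$ is defined in terms of sums in Eq. (B.34). Hence, we have

\[
\begin{aligned}
\sum_{i,j=1}^{L}\mu^{i+j}\cdot f_{ij} &= \sum_{i,j=1}^{L}\mu^{i+j}f(i,j) + \sum_{i,j=1}^{L}\mu^{i+j}f(j,i) = 2\sum_{i,j=1}^{L}\mu^{i+j}f(i,j)\\
&= 2\left[\sum_{i,j=1}^{L}\sum_{q=0}^{j-1}\mu^{i+j}\binom{i+q}{q}\frac{\alpha^{j-q-1}}{2^{i+q}} - \sum_{i,j=1}^{L}\sum_{q=0}^{j-2}\mu^{i+j}\binom{i+q}{q}\frac{\alpha^{j-q-2}}{2^{i+q+1}}\right]\\
&= \underbrace{2\sum_{i,j=1}^{L}\mu^{i+j}\binom{i+j-1}{i}\frac{1}{2^{i+j-1}}}_{G_1} + \underbrace{2(2\alpha-1)\sum_{i,j=1}^{L}\sum_{q=0}^{j-2}\mu^{i+j}\binom{i+q}{q}\frac{\alpha^{j-q-2}}{2^{i+q+1}}}_{G_2}.
\end{aligned} \tag{C.4}
\]

Substituting $k = L/T-d$ in Eq. (C.4) and taking the limit of $L\to\infty$, we have that the contribution of $G_1$ to the variance, $\alpha\beta(\eta\rho_k)^2G_1$, tends to zero, since $G_1$ grows at most linearly in $L$ while $(\eta\rho_k)^2$ decays as $1/L^2$; hence, $\mathrm{Var}(n_{tot})$ evaluates to

\[
\mathrm{Var}(n_{tot}) = \langle n_{tot}\rangle + \lim_{L\to\infty}\left[\alpha\beta(\eta\rho_k)^2G_2\right] \tag{C.5}
\]

in that limit, which yields the expression in Eq. (C.3), as can easily be verified with the computer algebra package Mathematica. Hence, in the limit of deterministic elongation, the expression for the variance of the RNAP distribution in Eq. (10) reduces to the one in Eq. (11), as claimed.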
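The convergence of the finite-$L$ variance in Eq. (C.2) to the deterministic-elongation expression in Eq. (C.3) can also be checked numerically, without computer algebra. The sketch below is illustrative; the function names and parameter values are hypothetical, the mean in the deterministic limit is taken as $\eta r(1-e^{-dT})/d$ (the $L\to\infty$ limit of the geometric sum of the segment means), and $s_u+s_b \neq d$ is assumed so that Eq. (C.3) is well defined.

```python
import math


def var_total_finite_L(L, T, r, s_u, s_b, d):
    """Mean and variance of total RNAP for L segments, Eq. (C.2), with k = L/T - d."""
    k = L / T - d
    S = s_u + s_b
    eta, beta = s_u / S, s_b / s_u
    mu = k / (k + d)
    alpha = 1.0 / (1.0 + S / (k + d))
    rho_k = r / k
    mean_tot = eta * rho_k * sum(mu ** i for i in range(1, L + 1))
    # Row i = 0 of f_ij; entry 0 is f_00 and is not used by the recurrence below.
    prev = [1.0] + [alpha ** (j - 1) for j in range(1, L + 1)]
    acc = 0.0
    for i in range(1, L + 1):
        row = [alpha ** (i - 1)] + [0.0] * L
        for j in range(1, L + 1):
            row[j] = 0.5 * (prev[j] + row[j - 1])   # f_ij = (f_{i-1,j} + f_{i,j-1}) / 2
            acc += mu ** (i + j) * row[j]
        prev = row
    return mean_tot, mean_tot + alpha * beta * (eta * rho_k) ** 2 * acc


def var_total_deterministic(T, r, s_u, s_b, d):
    """Mean and variance of total RNAP for deterministic elongation, Eq. (C.3)."""
    S = s_u + s_b
    eta, beta = s_u / S, s_b / s_u
    mean_tot = eta * r * (1.0 - math.exp(-d * T)) / d
    num = (S - d) - (S + d) * math.exp(-2 * d * T) + 2 * d * math.exp(-(S + d) * T)
    return mean_tot, mean_tot + beta * (eta * r) ** 2 * num / (d * (S + d) * (S - d))
```

Increasing `L` in `var_total_finite_L` at fixed `T` moves both returned moments towards those of `var_total_deterministic`, which is the content of Lemma C.1.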

D. Moments of Total RNAP and Mature RNA in Bursty and Constitutive Limits

Moments of total RNAP in the bursty limit: In the bursty limit, the expressions for the mean and variance of the total RNAP distribution given in Eq. (10) simplify to

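$$
\langle n_{\mathrm{tot}}\rangle_{b}=\frac{b s_u}{k}\,\mu\,\frac{\mu^{L}-1}{\mu-1},\quad\text{and}\quad \mathrm{Var}(n_{\mathrm{tot}})_{b}=\langle n_{\mathrm{tot}}\rangle_{b}+\frac{b^{2}s_u}{k}\sum_{i,j=1}^{L}\mu^{i+j}\cdot h_{ij}. \tag{D.1}
$$

If, furthermore, we take the limit of deterministic elongation, with $L\to\infty$ at constant $T$, Eq. (D.1) simplifies to

$$
\langle n_{\mathrm{tot}}\rangle_{(b;\infty)}=\frac{b s_u}{d}\big(1-e^{-Td}\big)\quad\text{and}\quad \mathrm{Var}(n_{\mathrm{tot}})_{(b;\infty)}=\langle n_{\mathrm{tot}}\rangle_{(b;\infty)}+\langle n_{\mathrm{tot}}\rangle_{(b;\infty)}^{2}\,\frac{d}{s_u}\,\frac{1+e^{-Td}}{1-e^{-Td}}, \tag{D.2}
$$

where the subscript $(b;\infty)$ denotes the bursty limit with infinite $L$. In the limit of zero RNAP detachment, Eq. (D.2) further simplifies to

$$
\langle n_{\mathrm{tot}}\rangle_{(b;\infty;0)}=b s_u T\quad\text{and}\quad \mathrm{Var}(n_{\mathrm{tot}})_{(b;\infty;0)}=\langle n_{\mathrm{tot}}\rangle_{(b;\infty;0)}(1+2b), \tag{D.3}
$$

where the subscript $(b;\infty;0)$ denotes the bursty limit, with $L\to\infty$ and $d\to0$.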
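As a numerical illustration of how Eq. (D.2) reduces to Eq. (D.3), the following sketch evaluates both sets of expressions for a small detachment rate. The function names and the parameter values are hypothetical and chosen only for the example.

```python
import math


def bursty_moments(b, s_u, d, T):
    """Mean and variance of total RNAP from Eq. (D.2)."""
    one_minus = -math.expm1(-T * d)        # 1 - exp(-T d), stable for small T d
    mean = b * s_u / d * one_minus
    # (1 + exp(-T d)) is rewritten as 2 - (1 - exp(-T d))
    var = mean + mean ** 2 * (d / s_u) * (2.0 - one_minus) / one_minus
    return mean, var


def bursty_moments_no_detachment(b, s_u, T):
    """Mean and variance of total RNAP from Eq. (D.3), i.e. for d -> 0."""
    mean = b * s_u * T
    return mean, mean * (1.0 + 2.0 * b)


mean_d, var_d = bursty_moments(b=3.0, s_u=0.5, d=1e-6, T=2.0)
mean_0, var_0 = bursty_moments_no_detachment(b=3.0, s_u=0.5, T=2.0)
print(mean_d, mean_0)
print(var_d, var_0)
```

In the limit of Eq. (D.3), the Fano factor of total RNAP is $1+2b$, so it depends on the burst size only.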

Moments of total RNAP in the constitutive limit: In the constitutive limit, Eq. (10) simplifies to

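$$
\langle n_{\mathrm{tot}}\rangle_{c}=\frac{r}{k}\,\mu\,\frac{\mu^{L}-1}{\mu-1}=\mathrm{Var}(n_{\mathrm{tot}})_{c}. \tag{D.4}
$$

If, furthermore, we take the limit of deterministic elongation, i.e. $L\to\infty$ at constant $T$, Eq. (D.4) simplifies to

$$
\langle n_{\mathrm{tot}}\rangle_{(c;\infty)}=\frac{r}{d}\big(1-e^{-Td}\big)=\mathrm{Var}(n_{\mathrm{tot}})_{(c;\infty)}; \tag{D.5}
$$

finally, in the limit of zero RNAP detachment, Eq. (D.5) further simplifies to

$$
\langle n_{\mathrm{tot}}\rangle_{(c;\infty;0)}=rT=\mathrm{Var}(n_{\mathrm{tot}})_{(c;\infty;0)}. \tag{D.6}
$$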

Moments of mature RNA distribution in the bursty limit: In that limit, the closed-form expressions in Eq. (8) are given by

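$$
\langle n\rangle_{b}=b\,\upsilon_m\,\mu^{L}\quad\text{and}\quad \mathrm{Var}(n)_{b}=\langle n\rangle_{b}+\langle n\rangle_{b}^{2}\,(\upsilon_k\mu)^{-1}\cdot h_{MM}, \tag{D.7}
$$

which in the limit of deterministic elongation simplify to

$$
\langle n\rangle_{(b;\infty)}=b\,\upsilon_m\,e^{-Td}\quad\text{and}\quad \mathrm{Var}(n)_{(b;\infty)}=\langle n\rangle_{(b;\infty)}+\langle n\rangle_{(b;\infty)}^{2}\,\upsilon_m^{-1}. \tag{D.8}
$$

In the limit of zero RNAP detachment, these expressions further simplify to

$$
\langle n\rangle_{(b;\infty;0)}=b\,\upsilon_m\quad\text{and}\quad \mathrm{Var}(n)_{(b;\infty;0)}=\langle n\rangle_{(b;\infty;0)}+\langle n\rangle_{(b;\infty;0)}^{2}\,\upsilon_m^{-1}. \tag{D.9}
$$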

E. Introduction to Geometric Singular Perturbation Theory (GSPT)

We consider a system of first-order autonomous ordinary differential equations in the general (‘standard’) form

$$ \varepsilon\dot{x}=f(x,y,\varepsilon), \tag{E.1} $$
$$ \dot{y}=g(x,y,\varepsilon), \tag{E.2} $$

where $(x,y)\in\mathbb{R}^{m}\times\mathbb{R}^{l}$, with $m,l\in\mathbb{N}$. Here, $0<\varepsilon\ll1$ is a (real) singular perturbation parameter, and the overdot denotes differentiation with respect to the ‘slow’ time $t$. (Correspondingly, Eq. (E.1) is referred to as the ‘slow’ system.) The variable $x$ is referred to as the ‘fast variable’, while $y$ is the ‘slow variable’. For simplicity, the functions $f:\mathbb{R}^{m}\times\mathbb{R}^{l}\times\mathbb{R}_{+}\to\mathbb{R}^{m}$ and $g:\mathbb{R}^{m}\times\mathbb{R}^{l}\times\mathbb{R}_{+}\to\mathbb{R}^{l}$ are assumed to be $C^{\infty}$-smooth in all their arguments. In the context of our analysis of the characteristic system in Eq. (26), we have the ‘slow system’

$$ \varepsilon\dot{u}_i=u_i-u_{i+1}+(d/k)\,u_i\quad\text{for } i=1,\dots,L-1, \tag{E.3a} $$
$$ \varepsilon\dot{u}_L=u_L-u+(d/k)\,u_L, \tag{E.3b} $$
$$ \dot{u}=u, \tag{E.3c} $$
$$ \dot{F}_0=(s_u/d_m)F_1-(s_b/d_m)F_0+(r/d_m)\,u_1F_0, \tag{E.3d} $$
$$ \dot{F}_1=(s_b/d_m)F_0-(s_u/d_m)F_1. \tag{E.3e} $$

By comparing the system of equations in Eq. (E.3) with the general form in Eq. (E.1), we see that $u_i$ ($i=1,\dots,L$) are the fast variables, while $u$, $F_0$, and $F_1$ are slow. Correspondingly, we have $m=L$ and $l=3$ in the above notation, which implies $f=(f_1,f_2,\dots,f_L)$, with $f_i=f_i(u_i,u_{i+1})=u_i-u_{i+1}+(d/k)\,u_i$ for $i=1,\dots,L-1$, $f_L=f_L(u_L,u)=u_L-u+(d/k)\,u_L$, and $g=(g_1,g_2,g_3)(u_1,u,F_0,F_1)=\big(u,\ (s_u/d_m)F_1-(s_b/d_m)F_0+(r/d_m)\,u_1F_0,\ (s_b/d_m)F_0-(s_u/d_m)F_1\big)$.

Now, we introduce a new ‘fast’ time $\tau=t/\varepsilon$, which we substitute into Eq. (E.1) to find the ‘fast system’

$$ x'=f(x,y,\varepsilon), \tag{E.4a} $$
$$ y'=\varepsilon g(x,y,\varepsilon) \tag{E.4b} $$

corresponding to Eq. (E.1); here, the prime denotes the derivative with respect to $\tau$. Hence, rewriting Eq. (E.3) in the fast formulation, we find

$$ u_i'=u_i-u_{i+1}+(d/k)\,u_i\quad\text{for } i=1,\dots,L-1, \tag{E.5a} $$
$$ u_L'=u_L-u+(d/k)\,u_L, \tag{E.5b} $$
$$ u'=\varepsilon u, \tag{E.5c} $$
$$ F_0'=\varepsilon\big[(s_u/d_m)F_1-(s_b/d_m)F_0+(r/d_m)\,u_1F_0\big], \tag{E.5d} $$
$$ F_1'=\varepsilon\big[(s_b/d_m)F_0-(s_u/d_m)F_1\big]. \tag{E.5e} $$

For positive $\varepsilon$, the systems in Eqs. (E.1) and (E.4) are equivalent, as are, correspondingly, the systems in Eqs. (E.3) and (E.5); however, in the singular limit $\varepsilon\to0$, we obtain two different systems: setting $\varepsilon=0$ in Eq. (E.1), we have the ‘reduced problem’

$$ 0=f(x,y,0), \tag{E.6a} $$
$$ \dot{y}=g(x,y,0), \tag{E.6b} $$

while we obtain the ‘layer problem’

$$ x'=f(x,y,0), \tag{E.7a} $$
$$ y'=0 \tag{E.7b} $$

for $\varepsilon=0$ in Eq. (E.4). The ‘reduced problem’ for the system in Eq. (E.3) implies that the flow of $(u,F_0,F_1)$ is constrained to lie on the $(l=3)$-dimensional ‘critical manifold’ $S_0$ that is defined by $f=0$:

$$ u_i=\mu\cdot u_{i+1}=\mu^{L-i+1}\cdot u\quad\text{for } i=1,\dots,L, \tag{E.8} $$

where $u_{L+1}\equiv u$ and $(F_0,F_1)$ are assumed to vary in an appropriately chosen subset of $\mathbb{R}^{2}$.
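The critical manifold can be made concrete by solving $f=0$ recursively from $u_{L+1}\equiv u$: each step gives $u_i=\mu\,u_{i+1}$ with $\mu=k/(k+d)$, so that $u_i=\mu^{L-i+1}u$ and, in particular, $u_1=\mu^{L}u$. The sketch below checks this with exact fractions; the function names and parameter values are hypothetical.

```python
from fractions import Fraction


def critical_manifold(L, k, d, u):
    """Return [u_1, ..., u_L, u_{L+1}] on S_0, with u_{L+1} = u."""
    mu = k / (k + d)
    return [mu ** (L - i + 1) * u for i in range(1, L + 1)] + [u]


def fast_residuals(us, k, d):
    """Right-hand sides f_i = u_i - u_{i+1} + (d/k) u_i of Eqs. (E.3a)-(E.3b)."""
    return [us[i] - us[i + 1] + (d / k) * us[i] for i in range(len(us) - 1)]


k, d, u, L = Fraction(30), Fraction(1, 10), Fraction(-1, 2), 6
us = critical_manifold(L, k, d, u)
print(all(res == 0 for res in fast_residuals(us, k, d)))
print(us[0] == (k / (k + d)) ** L * u)
```

All $L$ fast right-hand sides vanish on $S_0$, which is what makes every point of $S_0$ an equilibrium of the layer problem.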

From the ‘layer problem’ of the system in Eq. (E.3), we conclude that $y=(u,F_0,F_1)$ is a parameter which parameterises the $(m=L)$-dimensional flow of $u_i'=f_i$ ($i=1,\dots,L$), the equilibria of which are located on $S_0$.

The Jacobian matrix $D_xf(x,y,0)$ of the ‘layer problem’ corresponding to Eq. (E.5) about $S_0$ has the eigenvalues

$$ \lambda_i=k\big(1+(d/k)-u_{i+1}\big)=(k+d)\big(1-\mu^{L-i+1}u\big)\quad\text{for } i=1,\dots,L. \tag{E.9} $$

Since our definition of the generating function $F(z,\tau)$ in Sect. 3.2 assumed $z\in[-1,1]$, we may restrict to $u\in[-2,0]$ which, by Eq. (E.9), implies that $\lambda_i>0$. Hence, the critical manifold $S_0$ is ‘normally hyperbolic’ (and, in fact, normally repelling), with an $(m+l=L+3)$-dimensional unstable manifold $W^{u}(S_0)$.

The geometric singular perturbation theory due to Fenichel (1979) thus implies that $S_0$ will persist, for $\varepsilon$ positive and sufficiently small, as a ‘slow manifold’ $S_\varepsilon$ that is (locally) invariant, smooth, and $O(\varepsilon)$-close to $S_0$. (As the unstable manifold $W^{u}(S_0)$ equals the entire phase space of Eq. (E.3), it trivially persists as the unstable manifold $W^{u}(S_\varepsilon)$ for $S_\varepsilon$.) In particular, as $S_0$ is repelling in forward time, it follows that the inverse characteristic transformation corresponding to Eq. (26) is well defined in backward time; details can be found in Veerman et al. (2018) and Popović et al. (2016).

F. Variance of Fluctuating Total Fluorescent Signal

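By definition, the variance of the total fluorescent signal is given by the sum over all elements $\mathrm{Cov}(r_i,r_j)$ for $i,j=1,\dots,L$, where $r_i=(\nu/L)\,i\,n_i$; the corresponding definitions can be found in Sect. 4 of the main text. Hence, we have that

$$
\begin{aligned}
\mathrm{Var}(r_{\mathrm{tot}})&=\sum_{i,j=1}^{L}\mathrm{Cov}(r_i,r_j)=\sum_{i,j=1}^{L}\mathrm{Cov}\Big(\frac{\nu}{L}\,i\,n_i,\frac{\nu}{L}\,j\,n_j\Big)=\Big(\frac{\nu}{L}\Big)^{2}\sum_{i,j=1}^{L}ij\cdot\mathrm{Cov}(n_i,n_j)\\
&=\Big(\frac{\nu}{L}\Big)^{2}\Big(\sum_{i=1}^{L}i^{2}\,\mathrm{Var}(n_i)+\sum_{i\neq j}ij\cdot\mathrm{Cov}(n_i,n_j)\Big)\\
&=\Big(\frac{\nu}{L}\Big)^{2}\Big(\sum_{i=1}^{L}i^{2}\big[\langle n_i\rangle+\langle n_i\rangle^{2}\alpha\beta\cdot f_{ii}\big]+\sum_{i\neq j}ij\,\langle n_i\rangle\langle n_j\rangle\alpha\beta\cdot f_{ij}\Big)\\
&=\Big(\frac{\nu}{L}\Big)^{2}\sum_{i=1}^{L}i^{2}\langle n_i\rangle+\Big(\frac{\nu}{L}\Big)^{2}\alpha\beta\Big(\sum_{i=1}^{L}i^{2}\langle n_i\rangle^{2}\cdot f_{ii}+\sum_{i\neq j}ij\,\langle n_i\rangle\langle n_j\rangle\cdot f_{ij}\Big)\\
&=\Big(\frac{\nu}{L}\Big)^{2}\sum_{i=1}^{L}i^{2}\langle n_i\rangle+\Big(\frac{\nu}{L}\Big)^{2}\alpha\beta\sum_{i,j=1}^{L}ij\,\langle n_i\rangle\langle n_j\rangle\cdot f_{ij}.
\end{aligned}
\tag{F.1}
$$

Substituting the expressions for the means $\langle n_i\rangle$ from Eq. (2b) into Eq. (F.1), we obtain

$$
\mathrm{Var}(r_{\mathrm{tot}})=\Big(\frac{\nu}{L}\Big)^{2}\eta\rho_k\sum_{i=1}^{L}i^{2}\mu^{i}+\Big(\frac{\nu}{L}\Big)^{2}\alpha\beta(\eta\rho_k)^{2}\sum_{i,j=1}^{L}ij\cdot\mu^{i+j}\cdot f_{ij}, \tag{F.2}
$$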

which is the expression stated in Eq. (35).

G. Moments of Fluctuations in Total Fluorescent Signal in Various Limits

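Deterministic elongation: Substituting $k\mapsto L/T-d$ and taking the long-gene limit $L\to\infty$ in Eq. (35), we obtain the simplified expressions

$$
\begin{aligned}
\langle r_{\mathrm{tot}}\rangle_{\infty}&=\frac{\nu\eta r}{d\,Td}\big[1-(1+Td)e^{-Td}\big],\\
\mathrm{Var}(r_{\mathrm{tot}})_{\infty}&=\langle r_{\mathrm{tot}}\rangle_{\infty}\cdot F_0+\langle r_{\mathrm{tot}}\rangle_{\infty}^{2}\cdot\beta\delta_g\,\frac{F_1+F_2+F_3}{2(\delta_g-1)^{2}(\delta_g+1)^{2}\big[1-(1+Td)e^{-Td}\big]^{2}},
\end{aligned}
\tag{G.1}
$$

where

$$
\begin{aligned}
F_0&=\nu\Big[\frac{2}{Td}-\frac{Td\,e^{-Td}}{1-(1+Td)e^{-Td}}\Big],\\
F_1&=(\delta_g-1)^{2}(2\delta_g+1),\\
F_2&=(\delta_g+1)^{2}\big[2\delta_g(1+Td)(1+Td-T_g)-1\big]e^{-2Td},\\
F_3&=-4\delta_g^{3}(1+Td+T_g)\,e^{-T_g}e^{-Td};
\end{aligned}
\tag{G.2}
$$

the expression for the variance in Eq. (G.1) is found via the same method as is used in Lemma C.1 of ‘Appendix C’. When there is no detachment of RNAP from the gene, i.e. when $d=0$, Eq. (G.1) simplifies to

$$
\begin{aligned}
\langle r_{\mathrm{tot}}\rangle_{(\infty;0)}&=\tfrac{1}{2}\nu\eta rT,\\
\mathrm{Var}(r_{\mathrm{tot}})_{(\infty;0)}&=\langle r_{\mathrm{tot}}\rangle_{(\infty;0)}\cdot\frac{2\nu}{3}+\langle r_{\mathrm{tot}}\rangle_{(\infty;0)}^{2}\cdot8\beta T_g^{-1}\Big[\tfrac{1}{3}-\tfrac{1}{2}T_g^{-1}+T_g^{-3}-T_g^{-3}(1+T_g)e^{-T_g}\Big].
\end{aligned}
\tag{G.3}
$$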

Bursty limit: In the limit when the rates sb and r are large, the expressions for the mean and variance of the total fluorescent signal given in Eq. (35) become

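$$
\begin{aligned}
\langle r_{\mathrm{tot}}\rangle_{b}&=\frac{\nu b s_u}{d}\Big(\frac{k}{d}\,\frac{1-\mu^{L}}{\mu L}-\mu^{L}\Big),\\
\mathrm{Var}(r_{\mathrm{tot}})_{b}&=\Big(\frac{\nu}{L}\Big)^{2}\frac{b s_u}{k}\sum_{i=1}^{L}i^{2}\mu^{i}+\Big(\frac{\nu}{L}\Big)^{2}\frac{b^{2}s_u}{k}\sum_{i,j=1}^{L}ij\cdot\mu^{i+j}\cdot f_{ij}.
\end{aligned}
\tag{G.4}
$$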

Constitutive limit: When the gene spends most of its time in the active state, Eq. (35) simplifies to

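$$
\begin{aligned}
\langle r_{\mathrm{tot}}\rangle_{c}&=\frac{\nu}{L}\,\rho_k\,\mu\,\frac{1+\mu^{L}\big[L(\mu-1)-1\big]}{(\mu-1)^{2}},\\
\mathrm{Var}(r_{\mathrm{tot}})_{c}&=\Big(\frac{\nu}{L}\Big)^{2}\rho_k\,\mu\,\frac{1+\mu-\mu^{L}\big[L^{2}\mu^{2}-(2L^{2}+2L-1)\mu+(1+L)^{2}\big]}{(1-\mu)^{3}}.
\end{aligned}
\tag{G.5}
$$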
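In the constitutive limit the segment occupancies are uncorrelated and Poissonian, so the mean and variance of the signal reduce to $(\nu/L)\rho_k\sum_i i\mu^{i}$ and $(\nu/L)^{2}\rho_k\sum_i i^{2}\mu^{i}$. The sketch below checks the standard closed forms of these two finite sums against brute-force summation, using exact fractions; the function names and the value of $\mu$ are hypothetical.

```python
from fractions import Fraction


def sum_i_mu(L, mu):
    """Closed form of sum_{i=1}^{L} i * mu**i."""
    return mu * (1 + mu ** L * (L * (mu - 1) - 1)) / (mu - 1) ** 2


def sum_i2_mu(L, mu):
    """Closed form of sum_{i=1}^{L} i**2 * mu**i."""
    bracket = (L + 1) ** 2 - (2 * L * L + 2 * L - 1) * mu + L * L * mu ** 2
    return mu * (1 + mu - mu ** L * bracket) / (1 - mu) ** 3


mu = Fraction(7, 9)
for L in range(1, 9):
    brute_1 = sum(i * mu ** i for i in range(1, L + 1))
    brute_2 = sum(i * i * mu ** i for i in range(1, L + 1))
    assert sum_i_mu(L, mu) == brute_1
    assert sum_i2_mu(L, mu) == brute_2
print("closed forms agree with direct summation")
```

Multiplying `sum_i_mu` by $(\nu/L)\rho_k$ and `sum_i2_mu` by $(\nu/L)^{2}\rho_k$ then gives the constitutive mean and variance for any finite $L$.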

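Bursty expression with deterministic elongation: In this case, Eq. (G.4) simplifies to

$$
\begin{aligned}
\langle r_{\mathrm{tot}}\rangle_{(b;\infty)}&=\frac{\nu b s_u}{d\,Td}\big[1-(1+Td)e^{-Td}\big],\\
\mathrm{Var}(r_{\mathrm{tot}})_{(b;\infty)}&=\langle r_{\mathrm{tot}}\rangle_{(b;\infty)}\cdot F_0+\langle r_{\mathrm{tot}}\rangle_{(b;\infty)}^{2}\cdot\frac{d}{2s_u}\,\frac{1-\big(1+2Td+2(Td)^{2}\big)e^{-2Td}}{\big[1-(1+Td)e^{-Td}\big]^{2}},
\end{aligned}
\tag{G.6}
$$

where $F_0$ is given by Eq. (G.2). In the special case of no premature RNAP detachment from the gene ($d\to0$), Eq. (G.6) can be further simplified to

$$
\langle r_{\mathrm{tot}}\rangle_{(b;\infty;0)}=\tfrac{1}{2}\nu b s_u T,\qquad \mathrm{Var}(r_{\mathrm{tot}})_{(b;\infty;0)}=\langle r_{\mathrm{tot}}\rangle_{(b;\infty;0)}\cdot\frac{2\nu}{3}+\langle r_{\mathrm{tot}}\rangle_{(b;\infty;0)}^{2}\cdot\frac{8}{3s_uT}. \tag{G.7}
$$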

Constitutive expression with deterministic elongation: In this case, Eq. (G.5) simplifies to

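$$
\langle r_{\mathrm{tot}}\rangle_{(c;\infty)}=\frac{\nu}{Td}\,\frac{r}{d}\big[1-(1+Td)e^{-Td}\big]\quad\text{and}\quad \mathrm{Var}(r_{\mathrm{tot}})_{(c;\infty)}=\frac{\nu^{2}}{(Td)^{2}}\,\frac{r}{d}\big[2-\big(2+2Td+(Td)^{2}\big)e^{-Td}\big], \tag{G.8}
$$

which reduces to

$$
\langle r_{\mathrm{tot}}\rangle_{(c;\infty;0)}=\tfrac{1}{2}\nu rT\quad\text{and}\quad \mathrm{Var}(r_{\mathrm{tot}})_{(c;\infty;0)}=\tfrac{1}{3}\nu^{2}rT \tag{G.9}
$$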

for the special case of zero RNAP detachment from the gene.
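The reduction from Eq. (G.8) to Eq. (G.9) can be illustrated numerically by evaluating Eq. (G.8) at a small detachment rate. This is a minimal sketch with hypothetical function names and parameter values; for very small $Td$ the bracketed terms in Eq. (G.8) suffer from floating-point cancellation, so the example uses a moderately small value.

```python
import math


def constitutive_signal_moments(nu, r, d, T):
    """Mean and variance of the total signal from Eq. (G.8)."""
    a = T * d
    mean = nu / a * (r / d) * (1.0 - (1.0 + a) * math.exp(-a))
    var = nu ** 2 / a ** 2 * (r / d) * (2.0 - (2.0 + 2.0 * a + a * a) * math.exp(-a))
    return mean, var


def constitutive_signal_moments_no_detachment(nu, r, T):
    """Mean and variance of the total signal from Eq. (G.9), i.e. for d -> 0."""
    return 0.5 * nu * r * T, nu ** 2 * r * T / 3.0


mean_d, var_d = constitutive_signal_moments(nu=2.0, r=5.0, d=1e-3, T=1.0)
mean_0, var_0 = constitutive_signal_moments_no_detachment(nu=2.0, r=5.0, T=1.0)
print(mean_d, mean_0)
print(var_d, var_0)
```

The ratio of variance to mean in Eq. (G.9) equals $2\nu/3$, which is the $d\to0$ value of $F_0$ in Eq. (G.2).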

H. Extended Model with RNAP Pausing

Proof of Proposition 3

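The new pausing model presented in Fig. 9 can be conveniently described by $2L+2$ species interacting via an effective set of $5L+4$ reactions. The vector $m$ of the number of molecules of the respective species is given by $m=(n_0,n_1^{a},\dots,n_L^{a},n_1^{p},\dots,n_L^{p},n)$; in the tables below, we summarise the respective positions of each entry in $m$, as well as the definition of the rate functions $f_j$, for $j=1,\dots,5L+4$.

| Species | Molecule numbers | Position (in $m$) |
|---|---|---|
| $G_{\mathrm{on}}$ | $n_0$ | $1$ |
| $P_i$, $i\in\{1,\dots,L\}$ | $n_i^{a}$ | $i+1$ |
| $\bar{P}_i$, $i\in\{1,\dots,L\}$ | $n_i^{p}$ | $i+L+1$ |
| $M$ | $n$ | $2L+2$ |

| Reaction | Rate function $f_j$ |
|---|---|
| $G_{\mathrm{on}}\xrightarrow{s_b}G_{\mathrm{off}}$ | $f_1=s_bn_0$ |
| $G_{\mathrm{off}}\xrightarrow{s_u}G_{\mathrm{on}}$ | $f_2=s_u(1-n_0)$ |
| $G_{\mathrm{on}}\xrightarrow{r}G_{\mathrm{on}}+P_1$ | $f_3=rn_0$ |
| $P_i\xrightarrow{k}P_{i+1}$, $i\in\{1,\dots,L-1\}$ | $f_{i+3}=kn_i^{a}$ |
| $P_L\xrightarrow{k}M$ | $f_{L+3}=kn_L^{a}$ |
| $P_i\xrightarrow{d_a}\varnothing$, $i\in\{1,\dots,L\}$ | $f_{i+L+3}=d_an_i^{a}$ |
| $P_i\xrightarrow{r_p}\bar{P}_i$, $i\in\{1,\dots,L\}$ | $f_{i+2L+3}=r_pn_i^{a}$ |
| $\bar{P}_i\xrightarrow{r_a}P_i$, $i\in\{1,\dots,L\}$ | $f_{i+3L+3}=r_an_i^{p}$ |
| $\bar{P}_i\xrightarrow{d_p}\varnothing$, $i\in\{1,\dots,L\}$ | $f_{i+4L+3}=d_pn_i^{p}$ |
| $M\xrightarrow{d_m}\varnothing$ | $f_{5L+4}=d_mn$ |

Note that we do not consider $G_{\mathrm{off}}$ as an independent species, as a conservation law implies $G_{\mathrm{off}}=1-n_0$. Given the ordering of species and reactions as described in the above tables, we can define the $(2L+2)\times(5L+4)$-dimensional stoichiometry matrix $S$, with nonzero elements given by

$$
\begin{gathered}
S_{11}=-1,\quad S_{12}=1,\quad S_{i,i+1}=1,\quad S_{i,i+2}=-1,\quad S_{i,i+L+2}=-1,\quad S_{i,i+2L+2}=-1,\quad S_{i,i+3L+2}=1,\\
S_{i+L,i+2L+2}=1,\quad S_{i+L,i+3L+2}=-1,\quad S_{i+L,i+4L+2}=-1,\quad S_{2L+2,L+3}=1,\quad S_{2L+2,5L+4}=-1,
\end{gathered}
\tag{H.1}
$$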

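where $i=2,\dots,L+1$. From the associated CME, it can be shown via the moment equations that the time evolution of the vector $\langle m\rangle$ of mean molecule numbers in a system of reactions with propensities that are linear in the number of molecules is determined by $\mathrm{d}\langle m\rangle/\mathrm{d}t=S\cdot f$. Given the form of the stoichiometric matrix $S$ and of the rate functions $f_j$, it follows that the mean numbers of molecules of active gene, active and paused RNAP, and mature RNA in steady state can be obtained by solving the following system of $2L+2$ algebraic equations:

$$
\begin{aligned}
0&=s_u(1-\langle n_0\rangle)-s_b\langle n_0\rangle,\\
0&=r\langle n_0\rangle-(k+d_a+r_p)\langle n_1^{a}\rangle+r_a\langle n_1^{p}\rangle,\\
0&=k\langle n_{i-1}^{a}\rangle-(k+d_a+r_p)\langle n_i^{a}\rangle+r_a\langle n_i^{p}\rangle\quad\text{for } i=2,\dots,L,\\
0&=r_p\langle n_i^{a}\rangle-(r_a+d_p)\langle n_i^{p}\rangle\quad\text{for } i=1,\dots,L,\\
0&=k\langle n_L^{a}\rangle-d_m\langle n\rangle.
\end{aligned}
\tag{H.2}
$$

Here, we recall the definition of the following parameters from the main text: $\eta=s_u\tau_g$, where $\tau_g=1/(s_u+s_b)$ is the gene switching timescale, $\rho_k=r/k$, and $\rho=r/d_m$. Also, we define several new parameters: $\sigma=r_p/r_a$ as the ratio of the pausing and activation rates; $\pi_{ra}=r_a/(r_a+d_p)$, which is the probability of RNAP switching to the active state; $\pi_{dp}=d_p/(r_a+d_p)$, which is the probability of premature termination from the paused RNAP state; $\tilde{\mu}=k/(k+d_a+r_p\pi_{dp})$; and $\lambda=\sigma\pi_{ra}$. It follows that the solution of Eq. (H.2) can be written as

$$
\langle n_0\rangle=\eta,\quad \langle n_i^{a}\rangle=\eta\rho_k\tilde{\mu}^{i},\quad \langle n_i^{p}\rangle=\langle n_i^{a}\rangle\lambda,\quad\text{and}\quad \langle n\rangle=\eta\rho\tilde{\mu}^{L}. \tag{H.3}
$$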
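That the closed-form means in Eq. (H.3) solve the steady-state system in Eq. (H.2) can be confirmed by direct substitution. The sketch below does this with exact fractions; the function names and the rate values are hypothetical and serve only as an example.

```python
from fractions import Fraction as Fr


def steady_state_means(L, s_u, s_b, r, k, d_a, d_p, r_p, r_a, d_m):
    """Closed-form means of Eq. (H.3): (n0, [n_i^a], [n_i^p], n)."""
    eta = s_u / (s_u + s_b)
    pi_ra = r_a / (r_a + d_p)
    pi_dp = d_p / (r_a + d_p)
    lam = (r_p / r_a) * pi_ra                       # lambda = sigma * pi_ra
    mu_t = k / (k + d_a + r_p * pi_dp)              # mu-tilde
    n_a = [eta * (r / k) * mu_t ** i for i in range(1, L + 1)]
    n_p = [x * lam for x in n_a]
    return eta, n_a, n_p, eta * (r / d_m) * mu_t ** L


def residuals_h2(L, s_u, s_b, r, k, d_a, d_p, r_p, r_a, d_m):
    """Right-hand sides of Eq. (H.2) evaluated at the means of Eq. (H.3)."""
    n0, n_a, n_p, n = steady_state_means(L, s_u, s_b, r, k, d_a, d_p, r_p, r_a, d_m)
    res = [s_u * (1 - n0) - s_b * n0,
           r * n0 - (k + d_a + r_p) * n_a[0] + r_a * n_p[0]]
    res += [k * n_a[i - 1] - (k + d_a + r_p) * n_a[i] + r_a * n_p[i]
            for i in range(1, L)]
    res += [r_p * n_a[i] - (r_a + d_p) * n_p[i] for i in range(L)]
    res.append(k * n_a[-1] - d_m * n)
    return res


params = dict(L=4, s_u=Fr(1, 2), s_b=Fr(3, 2), r=Fr(5), k=Fr(20), d_a=Fr(1, 10),
              d_p=Fr(1, 5), r_p=Fr(2), r_a=Fr(3), d_m=Fr(1))
print(all(x == 0 for x in residuals_h2(**params)))
```

The list returned by `residuals_h2` has $2L+2$ entries, one per equation in Eq. (H.2).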

Proof of Proposition 4

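In order to solve the Lyapunov equation $J\cdot C+C\cdot J^{T}+D=0$ for the symmetric elements $C_{ij}=C_{ji}$ of the $(2L+2)\times(2L+2)$-dimensional covariance matrix $C$, we will follow the same approach as in ‘Appendix B’. First, we define the $(2L+2)\times(2L+2)$-dimensional Jacobian and diffusion matrices for our system. The Jacobian matrix $J$ has the following nonzero elements:

$$
\begin{gathered}
J_{11}=-(s_u+s_b),\quad J_{21}=r,\quad J_{22}=-(k+d_a+r_p),\quad J_{2,2+L}=r_a,\\
J_{i,i-1}=k,\quad J_{ii}=-(k+d_a+r_p),\quad J_{i,i+L}=r_a\quad\text{for } i=3,\dots,L+1,\\
J_{i+L,i}=r_p,\quad J_{i+L,i+L}=-(r_a+d_p)\quad\text{for } i=2,\dots,L+1,\\
J_{2L+2,L+1}=k,\quad J_{2L+2,2L+2}=-d_m,
\end{gathered}
\tag{H.4}
$$

while the nonzero elements of the symmetric diffusion matrix $D$ are given by

$$
\begin{gathered}
D_{11}=s_u(1-\langle n_0\rangle)+s_b\langle n_0\rangle,\quad D_{22}=r\langle n_0\rangle+(k+d_a+r_p)\langle n_1^{a}\rangle+r_a\langle n_1^{p}\rangle,\\
D_{23}=-k\langle n_1^{a}\rangle,\quad D_{2,2+L}=-r_p\langle n_1^{a}\rangle-r_a\langle n_1^{p}\rangle;\\
\text{for } i=3,\dots,L+1:\quad D_{ii}=k\langle n_{i-2}^{a}\rangle+(k+d_a+r_p)\langle n_{i-1}^{a}\rangle+r_a\langle n_{i-1}^{p}\rangle,\quad D_{i,i+1}\,[i\le L]=-k\langle n_{i-1}^{a}\rangle,\\
D_{L+1,2L+2}=-k\langle n_L^{a}\rangle,\quad D_{i,i+L}=-r_p\langle n_{i-1}^{a}\rangle-r_a\langle n_{i-1}^{p}\rangle;\\
\text{for } i=2,\dots,L+1:\quad D_{i+L,i+L}=r_p\langle n_{i-1}^{a}\rangle+(r_a+d_p)\langle n_{i-1}^{p}\rangle,\quad D_{2L+2,2L+2}=k\langle n_L^{a}\rangle+d_m\langle n\rangle.
\end{gathered}
\tag{H.5}
$$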

Next, using the definition of $J$ and $D$ from Eqs. (H.4) and (H.5), respectively, we solve the Lyapunov equation. Here, we note that we are only interested in expressions for the covariances of fluctuations in active and paused RNAP, but not of mature RNA fluctuations; hence, we require closed-form expressions for the elements $C_{ij}$ with $i,j\neq2L+2$, which we derive by following the same procedure as in ‘Appendix B’.

Now, we recall that $\beta=s_b/s_u$ is the ratio of gene deactivation and activation rates, while $\tau_p=1/(k+d_a)$ is the typical time that an actively moving RNAP spends on a gene segment. Additionally, let $\tau_{ra}=1/r_a$ be the timescale of RNAP activation from the paused state, let $\tau_{dp}=1/d_p$ be the timescale of premature termination of paused RNAP, and let $\tau_{pp}=1/(r_a+d_p)$ be the typical time spent in the paused state. Finally, we define the following new parameters: $\lambda_{rp}=\pi_{rp}/(1-\pi_{rp})$, where $\pi_{rp}=r_p/(r_p+k+d_a)$ is the probability of actively moving RNAP switching to the paused state, as well as

$$
\omega_{ra}=\frac{\pi_{ra}\tau_g}{\pi_{ra}\tau_{ra}+\tau_g},\quad \tilde{\alpha}=\frac{\tau_g+\lambda_{rp}\pi_{dp}\tau_g}{\tau_g+\tau_p+\lambda_{rp}\tau_g(1-\omega_{ra})},\quad\text{and}\quad \omega=\frac{\tau_g}{\tau_{pp}+\tau_g}; \tag{H.6}
$$

then, closed-form expressions for the covariances of the active gene with itself and the remaining species are given by

$$
\begin{aligned}
\mathrm{Var}(n_0)&=\eta^{2}\beta\cdot g_{00}^{aa},&&\text{where } g_{00}^{aa}=1,\\
\mathrm{Cov}(n_0,n_j^{a})&=\eta\langle n_j^{a}\rangle\tilde{\alpha}\beta\cdot g_{0j}^{aa},&&\text{where } g_{0j}^{aa}=\tilde{\alpha}^{j-1},\\
\mathrm{Cov}(n_0,n_j^{p})&=\eta\langle n_j^{p}\rangle\tilde{\alpha}\beta\cdot g_{0j}^{ap},&&\text{where } g_{0j}^{ap}=\omega\tilde{\alpha}^{j-1}.
\end{aligned}
\tag{H.7}
$$
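For $j=1$, the expressions in Eq. (H.7) can be cross-checked against a direct solution of the three relevant components of the Lyapunov equation, namely those for $C_{11}$, $C_{12}$, and $C_{1,2+L}$. The sketch below writes these components out by hand from Eqs. (H.4) and (H.5), assuming that the first row of $D$ has no off-diagonal entries, as in Eq. (H.5). Function names and rate values are hypothetical.

```python
from fractions import Fraction as Fr


def lyapunov_first_row(s_u, s_b, r, k, d_a, d_p, r_p, r_a):
    """Solve the (1,1), (1,2) and (1,2+L) Lyapunov components directly."""
    eta = s_u / (s_u + s_b)
    c00 = (s_u * (1 - eta) + s_b * eta) / (2 * (s_u + s_b))   # 2*J11*C11 + D11 = 0
    denom_p = s_u + s_b + r_a + d_p
    # (1,2+L): -(s_u+s_b)*C0p + r_p*C01 - (r_a+d_p)*C0p = 0
    # (1,2):   -(s_u+s_b)*C01 + r*C00 - (k+d_a+r_p)*C01 + r_a*C0p = 0
    c01 = r * c00 / (s_u + s_b + k + d_a + r_p - r_a * r_p / denom_p)
    c0p = r_p * c01 / denom_p
    return c00, c01, c0p


def closed_form_first_row(s_u, s_b, r, k, d_a, d_p, r_p, r_a):
    """Var(n0), Cov(n0, n_1^a), Cov(n0, n_1^p) from Eqs. (H.3), (H.6), (H.7)."""
    tau_g, tau_p = 1 / (s_u + s_b), 1 / (k + d_a)
    tau_ra, tau_pp = 1 / r_a, 1 / (r_a + d_p)
    pi_ra, pi_dp = r_a / (r_a + d_p), d_p / (r_a + d_p)
    pi_rp = r_p / (r_p + k + d_a)
    lam_rp = pi_rp / (1 - pi_rp)
    omega_ra = pi_ra * tau_g / (pi_ra * tau_ra + tau_g)
    alpha_t = (tau_g + lam_rp * pi_dp * tau_g) / (
        tau_g + tau_p + lam_rp * tau_g * (1 - omega_ra))
    omega = tau_g / (tau_pp + tau_g)
    eta, beta = s_u * tau_g, s_b / s_u
    n1a = eta * (r / k) * (k / (k + d_a + r_p * pi_dp))
    n1p = n1a * (r_p / r_a) * pi_ra
    return eta ** 2 * beta, eta * n1a * alpha_t * beta, eta * n1p * alpha_t * beta * omega


rates = dict(s_u=Fr(1, 2), s_b=Fr(3, 2), r=Fr(5), k=Fr(20), d_a=Fr(1, 10),
             d_p=Fr(1, 5), r_p=Fr(2), r_a=Fr(3))
print(lyapunov_first_row(**rates) == closed_form_first_row(**rates))
```

With exact fractions, equality of the two tuples confirms the definitions of $\tilde{\alpha}$ and $\omega$ in Eq. (H.6) for the first gene segment.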

Similarly, closed-form expressions for the covariances between all RNAP species read

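$$
\begin{aligned}
\mathrm{Cov}(n_i^{a},n_j^{a})&=\delta_{ij}\langle n_i^{a}\rangle+\langle n_i^{a}\rangle\langle n_j^{a}\rangle\tilde{\alpha}\beta\cdot g_{ij}^{aa},\\
\mathrm{Cov}(n_i^{a},n_j^{p})&=\langle n_i^{a}\rangle\langle n_j^{p}\rangle\tilde{\alpha}\beta\cdot g_{ij}^{ap},\\
\mathrm{Cov}(n_i^{p},n_j^{a})&=\langle n_i^{p}\rangle\langle n_j^{a}\rangle\tilde{\alpha}\beta\cdot g_{ij}^{pa},\\
\mathrm{Cov}(n_i^{p},n_j^{p})&=\delta_{ij}\langle n_i^{p}\rangle+\langle n_i^{p}\rangle\langle n_j^{p}\rangle\tilde{\alpha}\beta\cdot g_{ij}^{pp},
\end{aligned}
\tag{H.8}
$$

where the functions $g_{ij}^{aa}=g_{ji}^{aa}$, $g_{ij}^{ap}=g_{ji}^{pa}$, and $g_{ij}^{pp}=g_{ji}^{pp}$ satisfy the following recurrence relations:

$$
\begin{aligned}
g_{ij}^{aa}&=\frac{\big[(k+d_a)(r_a+d_p)+r_pd_p\big]\big(g_{i-1,j}^{aa}+g_{i,j-1}^{aa}\big)+r_ar_p\big(g_{ij}^{ap}+g_{ij}^{pa}\big)}{2(k+d_a+r_p)(r_a+d_p)},\\
g_{ij}^{ap}&=\frac{\big[(k+d_a)(r_a+d_p)+r_pd_p\big]g_{i-1,j}^{ap}+(r_a+d_p)^{2}g_{ij}^{aa}+r_ar_p\,g_{ij}^{pp}}{(k+d_a+r_a+r_p+d_p)(r_a+d_p)},\\
g_{ij}^{pp}&=\frac{g_{ij}^{ap}+g_{ij}^{pa}}{2}.
\end{aligned}
\tag{H.9}
$$

Now, we assume that the elongation rate is faster than the rates of RNAP pausing, activation, and premature termination, i.e. that $k\gg r_a,r_p,d_a,d_p$ in Eq. (H.9). Taking the limit $k\to\infty$, we find that the expressions in Eqs. (H.7) and (H.8) remain unchanged, while Eq. (H.9) simplifies to

$$ g_{ij}^{aa}=\big(g_{i-1,j}^{aa}+g_{i,j-1}^{aa}\big)/2, \tag{H.10a} $$
$$ g_{ij}^{ap}=g_{i-1,j}^{ap}, \tag{H.10b} $$
$$ g_{ij}^{pp}=\big(g_{ij}^{ap}+g_{ij}^{pa}\big)/2; \tag{H.10c} $$

in particular, to leading order in $1/k$, the functions $g_{ij}^{aa}$, $g_{ij}^{ap}$, $g_{ij}^{pa}$, and $g_{ij}^{pp}$ hence do not depend on $k$. Eq. (H.10a) defines a recurrence relation for the symmetric function $g_{ij}^{aa}=g_{ji}^{aa}$ with initial conditions $g_{00}^{aa}$ and $g_{0j}^{aa}$ from Eq. (H.7). Using the same mathematical technique as in Lemma B.5, we find that the solution for the function $g_{ij}^{aa}$ is given by $g_{ij}^{aa}=g^{aa}(i,j)+g^{aa}(j,i)$, where

$$
g^{aa}(i,j)=\frac{\tilde{\alpha}^{i+j-1}}{(2\tilde{\alpha}-1)^{i}}+\frac{1}{2^{i+j-1}}\binom{i+j-1}{i}\Big[1-\frac{2\tilde{\alpha}-1}{2\tilde{\alpha}}\,{}_2F_1\Big(1,i+j;j;\frac{1}{2\tilde{\alpha}}\Big)\Big]; \tag{H.11}
$$

Eq. (H.10b) is a recurrence relation for the function $g_{ij}^{ap}$ with initial conditions $g_{0j}^{ap}$ from Eq. (H.7); the corresponding solution is then given by $g_{ij}^{ap}=\omega\tilde{\alpha}^{j-1}$. Finally, the solution of the recurrence relation in Eq. (H.10c) for $g_{ij}^{pp}$ is given by $g_{ij}^{pp}=\omega\big(\tilde{\alpha}^{j-1}+\tilde{\alpha}^{i-1}\big)/2$. In sum, the leading-order asymptotics (in $1/k$) of the covariances between the various RNAP species for $k$ large is hence given by Eq. (H.8), with $g_{ij}^{aa}$, $g_{ij}^{ap}=g_{ji}^{pa}$, and $g_{ij}^{pp}$ as stated above.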
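The hypergeometric closed form of $g^{aa}(i,j)$ in Eq. (H.11) can be compared numerically with a finite-sum representation of the same function. The sketch below assumes that this finite sum has the same structure as the one for $f(i,j)$ in Eq. (C.4), with $\alpha$ replaced by $\tilde{\alpha}$; that assumption, the function names, and the value of $\tilde{\alpha}$ are illustrative only. The series for ${}_2F_1$ is truncated and converges only for $\tilde{\alpha}>1/2$.

```python
from math import comb


def hyp2f1_first_arg_one(b, c, x, terms=400):
    """Truncated series for 2F1(1, b; c; x), valid for |x| < 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (b + n) / (c + n) * x      # (1)_n / n! = 1, so only b, c, x remain
    return total


def g_closed(i, j, alpha_t):
    """Closed form of g^aa(i, j) as in Eq. (H.11)."""
    x = 1.0 / (2.0 * alpha_t)
    bracket = 1.0 - (2.0 * alpha_t - 1.0) / (2.0 * alpha_t) * hyp2f1_first_arg_one(i + j, j, x)
    return (alpha_t ** (i + j - 1) / (2.0 * alpha_t - 1.0) ** i
            + comb(i + j - 1, i) / 2.0 ** (i + j - 1) * bracket)


def g_sum(i, j, alpha_t):
    """Finite-sum form, assumed to mirror f(i, j) in Eq. (C.4)."""
    first = sum(comb(i + q, q) * alpha_t ** (j - q - 1) / 2.0 ** (i + q) for q in range(j))
    second = sum(comb(i + q, q) * alpha_t ** (j - q - 2) / 2.0 ** (i + q + 1)
                 for q in range(j - 1))
    return first - second


alpha_t = 0.8
max_diff = max(abs(g_closed(i, j, alpha_t) - g_sum(i, j, alpha_t))
               for i in range(1, 5) for j in range(1, 5))
print(max_diff < 1e-9)
```

For numerical work, the finite sum is the more robust of the two forms, since it involves no series truncation and no division by $2\tilde{\alpha}-1$.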

Asymptotics of variance of total RNAP distribution: The variance of the total RNAP distribution for the pausing model is given by

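$$
\mathrm{Var}(n_{\mathrm{tot}})=\sum_{i,j=1}^{L}\big(\mathrm{Cov}(n_i^{a},n_j^{a})+\mathrm{Cov}(n_i^{a},n_j^{p})+\mathrm{Cov}(n_i^{p},n_j^{a})+\mathrm{Cov}(n_i^{p},n_j^{p})\big), \tag{H.12}
$$

where the expressions for the corresponding covariances are given in Eq. (39). In order to simplify the above expression, we consider each term on the right-hand side in Eq. (H.12) separately, as follows:

$$
\begin{aligned}
\sum_{i,j=1}^{L}\mathrm{Cov}(n_i^{a},n_j^{a})&=\sum_{i,j=1}^{L}\delta_{ij}\langle n_i^{a}\rangle+(\eta\rho_k)^{2}\tilde{\alpha}\beta\sum_{i,j=1}^{L}g_{ij}^{aa},\\
\sum_{i,j=1}^{L}\mathrm{Cov}(n_i^{a},n_j^{p})&=(\eta\rho_k)^{2}\tilde{\alpha}\beta\lambda\sum_{i,j=1}^{L}g_{ij}^{ap},\\
\sum_{i,j=1}^{L}\mathrm{Cov}(n_i^{p},n_j^{a})&=(\eta\rho_k)^{2}\tilde{\alpha}\beta\lambda\sum_{i,j=1}^{L}g_{ij}^{pa},\\
\sum_{i,j=1}^{L}\mathrm{Cov}(n_i^{p},n_j^{p})&=\sum_{i,j=1}^{L}\delta_{ij}\langle n_i^{p}\rangle+(\eta\rho_k)^{2}\tilde{\alpha}\beta\lambda^{2}\sum_{i,j=1}^{L}g_{ij}^{pp}.
\end{aligned}
\tag{H.13}
$$

Since $\sum_{i,j=1}^{L}\big(\delta_{ij}\langle n_i^{a}\rangle+\delta_{ij}\langle n_i^{p}\rangle\big)=\sum_{i=1}^{L}\langle n_i\rangle=\langle n_{\mathrm{tot}}\rangle$, Eq. (H.12) becomes

$$
\mathrm{Var}(n_{\mathrm{tot}})=\langle n_{\mathrm{tot}}\rangle+(\eta\rho_k)^{2}\tilde{\alpha}\beta\sum_{i,j=1}^{L}\big(g_{ij}^{aa}+\lambda g_{ij}^{ap}+\lambda g_{ij}^{pa}+\lambda^{2}g_{ij}^{pp}\big). \tag{H.14}
$$

Using the expressions for the functions $g_{ij}^{aa}$, $g_{ij}^{ap}$, $g_{ij}^{pa}$, and $g_{ij}^{pp}$ from Eq. (39), we conclude that Eq. (H.14) further simplifies to

$$
\mathrm{Var}(n_{\mathrm{tot}})=\langle n_{\mathrm{tot}}\rangle+(\eta\rho_k)^{2}\tilde{\alpha}\beta\Big[2\sum_{i,j=1}^{L}g^{aa}(i,j)+\lambda(2+\lambda)\,\omega L\,\frac{\tilde{\alpha}^{L}-1}{\tilde{\alpha}-1}\Big]. \tag{H.15}
$$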

I. Approximation of Mature RNA Distribution in Extended Model

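Similarly to Sect. 3.2, we apply geometric singular perturbation theory (GSPT) to formally derive the distribution of mature RNA for the extended pausing model. As was done there, we define $P_j(n;t)$ ($j=0,1$) as the probability of the state $n=(n_1^{a},\dots,n_L^{a},n_1^{p},\dots,n_L^{p},n)$ at time $t$ while the gene is either active ($0$) or inactive ($1$); then, the time evolution of these probabilities can be described by a system of coupled CMEs:

$$
\begin{aligned}
\partial_tP_0={}&s_uP_1-s_bP_0+r\big(E_{n_1^{a}}^{-1}-1\big)P_0+k\sum_{i=1}^{L-1}\big(E_{n_i^{a}}E_{n_{i+1}^{a}}^{-1}-1\big)n_i^{a}P_0+k\big(E_{n_L^{a}}E_{n}^{-1}-1\big)n_L^{a}P_0\\
&+d_a\sum_{i=1}^{L}\big(E_{n_i^{a}}-1\big)n_i^{a}P_0+r_p\sum_{i=1}^{L}\big(E_{n_i^{a}}E_{n_i^{p}}^{-1}-1\big)n_i^{a}P_0+r_a\sum_{i=1}^{L}\big(E_{n_i^{p}}E_{n_i^{a}}^{-1}-1\big)n_i^{p}P_0\\
&+d_p\sum_{i=1}^{L}\big(E_{n_i^{p}}-1\big)n_i^{p}P_0+d_m\big(E_{n}-1\big)nP_0,\\
\partial_tP_1={}&s_bP_0-s_uP_1+k\sum_{i=1}^{L-1}\big(E_{n_i^{a}}E_{n_{i+1}^{a}}^{-1}-1\big)n_i^{a}P_1+k\big(E_{n_L^{a}}E_{n}^{-1}-1\big)n_L^{a}P_1\\
&+d_a\sum_{i=1}^{L}\big(E_{n_i^{a}}-1\big)n_i^{a}P_1+r_p\sum_{i=1}^{L}\big(E_{n_i^{a}}E_{n_i^{p}}^{-1}-1\big)n_i^{a}P_1+r_a\sum_{i=1}^{L}\big(E_{n_i^{p}}E_{n_i^{a}}^{-1}-1\big)n_i^{p}P_1\\
&+d_p\sum_{i=1}^{L}\big(E_{n_i^{p}}-1\big)n_i^{p}P_1+d_m\big(E_{n}-1\big)nP_1.
\end{aligned}
\tag{I.1}
$$

In order to find analytical expressions for the propagator probabilities $P(n;t)$ which satisfy the system of CMEs in Eq. (I.1), we define the probability-generating functions $F_j(z;t)$, where $z=(z_1^{a},\dots,z_L^{a},z_1^{p},\dots,z_L^{p},z)$ is a vector of variables corresponding to the state $n$. Given the equations for $P_j(n;t)$ from Eq. (I.1), we obtain the following system of PDEs for the corresponding generating functions $F_j(z;t)$:

$$
\mathcal{L}[F_0]=s_uF_1-s_bF_0+r(z_1^{a}-1)F_0,\qquad \mathcal{L}[F_1]=s_bF_0-s_uF_1; \tag{I.2}
$$

here,

$$
\begin{aligned}
\mathcal{L}={}&\partial_t+d_m(z-1)\partial_z+k\sum_{i=1}^{L-1}(z_i^{a}-z_{i+1}^{a})\partial_{z_i^{a}}+k(z_L^{a}-z)\partial_{z_L^{a}}+d_a\sum_{i=1}^{L}(z_i^{a}-1)\partial_{z_i^{a}}\\
&+r_p\sum_{i=1}^{L}(z_i^{a}-z_i^{p})\partial_{z_i^{a}}+r_a\sum_{i=1}^{L}(z_i^{p}-z_i^{a})\partial_{z_i^{p}}+d_p\sum_{i=1}^{L}(z_i^{p}-1)\partial_{z_i^{p}}
\end{aligned}
\tag{I.3}
$$

is a differential operator acting on the functions $F_0$ and $F_1$. Eq. (I.2) represents a system of coupled, linear, first-order PDEs. Now, we introduce new variables $u_i^{a}=z_i^{a}-1$, $u_i^{p}=z_i^{p}-1$, and $u=z-1$; we also rescale all rates and the time variable with the degradation rate $d_m$ of mature RNA. Next, we apply the method of characteristics, with $s$ being the characteristic variable. The first characteristic equation gives $d_m(\mathrm{d}t/\mathrm{d}s)=1$, with solution $s\equiv d_mt$; hence, we can use the rescaled time $t'=d_mt$ as the independent characteristic variable and thus convert the system of PDEs in Eq. (I.2) into a characteristic system of ODEs: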

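$$ \dot{u}_i^{a}=(k/d_m)\big[(u_i^{a}-u_{i+1}^{a})+(d_a/k)\,u_i^{a}+(r_p/k)(u_i^{a}-u_i^{p})\big]\quad\text{for } i=1,\dots,L-1, \tag{I.4a} $$
$$ \dot{u}_L^{a}=(k/d_m)\big[(u_L^{a}-u)+(d_a/k)\,u_L^{a}+(r_p/k)(u_L^{a}-u_L^{p})\big], \tag{I.4b} $$
$$ \dot{u}_i^{p}=(r_a/d_m)\big[(u_i^{p}-u_i^{a})+(d_p/r_a)\,u_i^{p}\big]\quad\text{for } i=1,\dots,L, \tag{I.4c} $$
$$ \dot{u}=u, \tag{I.4d} $$
$$ \dot{F}_0=(s_u/d_m)F_1-(s_b/d_m)F_0+(r/d_m)\,u_1^{a}F_0, \tag{I.4e} $$
$$ \dot{F}_1=(s_b/d_m)F_0-(s_u/d_m)F_1, \tag{I.4f} $$

where the overdot denotes differentiation with respect to the rescaled time. Here, we assume that $k/d_m\gg1$ and $r_a/d_m\gg1$; hence, we define $\varepsilon=d_m/k$ as the singular perturbation parameter, and we write $d_m/r_a=\varepsilon\delta$, where $\delta=k/r_a=O(1)$ by assumption. Since $0<\varepsilon\ll1$ is small, we can apply GSPT in order to separate the system in Eq. (I.4) into fast and slow dynamics, which will allow us to find an asymptotic approximation for $F_0$ and $F_1$ in steady state. With the above definitions, the governing equations for $u_i^{a}$ and $u_i^{p}$ in the ‘slow system’ in Eqs. (I.4a) through (I.4c) become

$$ \varepsilon\dot{u}_i^{a}=(u_i^{a}-u_{i+1}^{a})+(d_a/k)\,u_i^{a}+(r_p/k)(u_i^{a}-u_i^{p})\quad\text{for } i=1,\dots,L-1, \tag{I.5a} $$
$$ \varepsilon\dot{u}_L^{a}=(u_L^{a}-u)+(d_a/k)\,u_L^{a}+(r_p/k)(u_L^{a}-u_L^{p}), \tag{I.5b} $$
$$ \varepsilon\dot{u}_i^{p}=\big[(u_i^{p}-u_i^{a})+(d_p/r_a)\,u_i^{p}\big]/\delta\quad\text{for } i=1,\dots,L. \tag{I.5c} $$

It follows that $u_i^{a}$ and $u_i^{p}$ ($i=1,\dots,L$) are the fast variables in our system, while $u$, $F_0$, and $F_1$ are the slow ones; see ‘Appendix E’. Setting $\varepsilon=0$ and solving the system in Eq. (I.5), we find $u_1^{a}=\tilde{\mu}^{L}\cdot u$, where $\tilde{\mu}=k/(k+d_a+r_p\pi_{dp})$ has previously been defined in Proposition 3. Now, given Eq. (I.4d), we apply the chain rule, by which differentiation along a characteristic equals $u\,\mathrm{d}/\mathrm{d}u$, to rewrite Eqs. (I.4e) and (I.4f) as

$$ F_0'\,d_mu=s_uF_1-s_bF_0+r\tilde{\mu}^{L}uF_0, \tag{I.6a} $$
$$ F_1'\,d_mu=s_bF_0-s_uF_1, \tag{I.6b} $$

where the prime now denotes differentiation with respect to $u$. The system in Eq. (I.6) is the same as that in Eq. (28), with the substitution $\mu\mapsto\tilde{\mu}$; hence, following the same derivation as in Sect. 3.2, we conclude that the steady-state analytical expression for the probability distribution of mature RNA is given by

$$
P(n)=\frac{1}{n!}\,\frac{(s_u)_n}{(s_b+s_u)_n}\Big(\frac{r}{d_m}\Big)^{n}\big(\tilde{\mu}^{L}\big)^{n}\,{}_1F_1\Big(\frac{s_u}{d_m}+n;\frac{s_b+s_u}{d_m}+n;-\frac{r}{d_m}\tilde{\mu}^{L}\Big). \tag{I.7}
$$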

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Adelman K, La Porta A, Santangelo TJ, Lis JT, Roberts JW, Wang MD. Single molecule analysis of RNA polymerase elongation reveals uniform kinetic behavior. Proc Nat Acad Sci. 2002;99(21):13538–13543. doi: 10.1073/pnas.212358999.
  2. Ali MZ, Choubey S, Das D, Brewster RC. Probing mechanisms of transcription elongation through cell-to-cell variability of RNA polymerase. Biophys J. 2020;118(7):1769–1781. doi: 10.1016/j.bpj.2020.02.002.
  3. Brouwer I, Lenstra TL. Visualizing transcription: key to understanding gene expression dynamics. Curr Opin Chem Biol. 2019;51:122–129. doi: 10.1016/j.cbpa.2019.05.031.
  4. Cao Z, Grima R. Analytical distributions for detailed models of stochastic gene expression in eukaryotic cells. Proc Nat Acad Sci. 2020;117(9):4682–4692. doi: 10.1073/pnas.1910888117.
  5. Choubey S. Nascent RNA kinetics: transient and steady state behavior of models of transcription. Phys Rev E. 2018;97(2):022402. doi: 10.1103/PhysRevE.97.022402.
  6. Choubey S, Kondev J, Sanchez A. Deciphering transcriptional dynamics in vivo by counting nascent RNA molecules. PLoS Comput Biol. 2015;11(11):e1004345. doi: 10.1371/journal.pcbi.1004345.
  7. Cianci C, Smith S, Grima R. Molecular finite-size effects in stochastic models of equilibrium chemical systems. J Chem Phys. 2016;144(8):084101. doi: 10.1063/1.4941583.
  8. Coulon A, Ferguson ML, de Turris V, Palangat M, Chow CC, Larson DR. Kinetic competition during the transcription cycle results in stochastic RNA processing. eLife. 2014;3:e03939.
  9. Digital Library of Mathematical Functions (2020a) Chapter 15: https://dlmf.nist.gov/15. Accessed 15 May 2020
  10. Digital Library of Mathematical Functions (2020b) Chapter 13: https://dlmf.nist.gov/13. Accessed 15 May 2020
  11. El Hage A, French SL, Beyer AL, Tollervey D. Loss of topoisomerase I leads to R-loop-mediated transcriptional blocks during ribosomal RNA synthesis. Genes Dev. 2010;24(14):1546–1558. doi: 10.1101/gad.573310.
  12. Elf J, Ehrenberg M. Fast evaluation of fluctuations in biochemical networks with the linear noise approximation. Genome Res. 2003;13(11):2475–2484. doi: 10.1101/gr.1196503.
  13. Fenichel N. Geometric singular perturbation theory for ordinary differential equations. J Differ Equ. 1979;31(1):53–98.
  14. Forde NR, Izhaky D, Woodcock GR, Wuite GJL, Bustamante C. Using mechanical force to probe the mechanism of pausing and arrest during continuous elongation by Escherichia coli RNA polymerase. Proc Nat Acad Sci. 2002;99(18):11682–11687. doi: 10.1073/pnas.142417799.
  15. Gillespie DT. Exact stochastic simulation of coupled chemical reactions. J Phys Chem. 1977;81(25):2340–2361.
  16. Gorin G, Wang M, Golding I, Heng X. Stochastic simulation and statistical inference platform for visualization and estimation of transcriptional kinetics. PLoS ONE. 2020;15(3):e0230736. doi: 10.1371/journal.pone.0230736.
  17. Halpern KB, Tanami S, Landen S, Chapal M, Szlak L, Hutzler A, Nizhberg A, Itzkovitz S. Bursty gene expression in the intact mammalian liver. Mol Cell. 2015;58(1):147–156. doi: 10.1016/j.molcel.2015.01.027.
  18. Heng X, Skinner SO, Sokac AM, Golding I. Stochastic kinetics of nascent RNA. Phys Rev Lett. 2016;117(12):128101. doi: 10.1103/PhysRevLett.117.128101.
  19. Jahnke T, Huisinga W. Solving the chemical master equation for monomolecular reaction systems analytically. J Math Biol. 2007;54(1):1–26. doi: 10.1007/s00285-006-0034-x.
  20. Jones CKRT. Geometric singular perturbation theory. In: Dynamical systems. Springer; 1995. pp 44–118.
  21. Klumpp S, Hwa T. Stochasticity and traffic jams in the transcription of ribosomal RNA: intriguing role of termination and antitermination. Proc Nat Acad Sci. 2008;105(47):18159–18164. doi: 10.1073/pnas.0806084105.
  22. Larson DR, Zenklusen D, Bin W, Chao JA, Singer RH. Real-time observation of transcription initiation and elongation on an endogenous yeast gene. Science. 2011;332(6028):475–478. doi: 10.1126/science.1202142.
  23. Lenstra TL, Rodriguez J, Chen H, Larson DR. Transcription dynamics in living cells. Ann Rev Biophys. 2016;45:25–47. doi: 10.1146/annurev-biophys-062215-010838.
  24. Öcal K, Grima R, Sanguinetti G. Parameter estimation for biochemical reaction networks using Wasserstein distances. J Phys A Math Theor. 2019;53(3):034002.
  25. Peccoud J, Ycart B. Markovian modeling of gene-product synthesis. Theor Popul Biol. 1995;48(2):222–234.
  26. Popović N, Marr C, Swain PS. A geometric analysis of fast-slow models for stochastic gene expression. J Math Biol. 2016;72(1–2):87–122. doi: 10.1007/s00285-015-0876-1.
  27. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 2006;4(10):e309. doi: 10.1371/journal.pbio.0040309.
  28. Raj A, Van Den Bogaard P, Rifkin SA, Van Oudenaarden A, Tyagi S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat Methods. 2008;5(10):877–879. doi: 10.1038/nmeth.1253.
  29. Rodriguez J, Ren G, Day CR, Zhao K, Chow CC, Larson DR. Intrinsic dynamics of a human gene reveal the basis of expression heterogeneity. Cell. 2019;176(1–2):213–226. doi: 10.1016/j.cell.2018.11.026.
  30. Sanchez A, Golding I. Genetic determinants and cellular constraints in noisy gene expression. Science. 2013;342(6163):1188–1193. doi: 10.1126/science.1242975.
  31. Shahrezaei V, Swain PS. Analytical distributions for stochastic gene expression. Proc Nat Acad Sci. 2008;105(45):17256–17261. doi: 10.1073/pnas.0803850105.
  32. Skinner SO, Xu H, Nagarkar-Jaiswal S, Freire PR, Zwaka TP, Golding I. Single-cell analysis of transcription kinetics across the cell cycle. eLife. 2016;5:e12175.
  33. Smith S, Cianci C, Grima R. Macromolecular crowding directs the motion of small molecules inside cells. J R Soc Interface. 2017;14(131):20170047. doi: 10.1098/rsif.2017.0047.
  34. Suter DM, Molina N, Gatfield D, Schneider K, Schibler U, Naef F. Mammalian genes are transcribed with widely different bursting kinetics. Science. 2011;332(6028):472–474. doi: 10.1126/science.1198817.
  35. Tiina R, Antti H, Shannon H, Olli Y-H, AndreS R. Effects of transcriptional pausing on gene expression dynamics. PLoS Comput Biol. 2010;6(3):e1000704. doi: 10.1371/journal.pcbi.1000704.
  36. Veerman F, Marr C, Popović N. Time-dependent propagators for stochastic models of gene expression: an analytical method. J Math Biol. 2018;77(2):261–312. doi: 10.1007/s00285-017-1196-4.
  37. Warren PB, Tănase-Nicola S, ten Wolde PR. Exact results for noise power spectra in linear biochemical reaction networks. J Chem Phys. 2006;125(14):144904. doi: 10.1063/1.2356472.
  38. Zechner C, Ruess J, Krenn P, Pelet S, Peter M, Lygeros J, Koeppl H. Moment-based inference predicts bimodality in transient gene expression. Proc Nat Acad Sci. 2012;109(21):8340–8345. doi: 10.1073/pnas.1200161109.
  39. Zenklusen D, Larson DR, Singer RH. Single-RNA counting reveals alternative modes of gene expression in yeast. Nat Struct Mol Biol. 2008;15(12):1263. doi: 10.1038/nsmb.1514.

Articles from Bulletin of Mathematical Biology are provided here courtesy of Springer
