Abstract
Gene expression is inherently stochastic because transcription occurs in bursts. Single-cell snapshot data can be exploited to rigorously infer transcriptional burst kinetics, using mathematical models as blueprints. The classical telegraph model (CTM) has been widely used to explain transcriptional bursting under Markovian assumptions. However, growing evidence suggests that gene-state dwell times are generally non-exponential, as gene-state switching is a multi-step process in organisms. Therefore, interpretable non-Markovian mathematical models and efficient statistical inference methods are urgently required for investigating transcriptional burst kinetics. We develop an interpretable and tractable model, the generalized telegraph model (GTM), to characterize transcriptional bursting; it allows arbitrary dwell-time distributions, rather than only exponential distributions, to be incorporated into the ON and OFF switching process. Based on the GTM, we propose an inference method for transcriptional bursting kinetics using an approximate Bayesian computation framework. This method provides efficient and scalable estimation of burst frequency and burst size on synthetic data. Further, applying the inference to genome-wide data from mouse embryonic fibroblasts reveals that the GTM estimates lower burst frequency and higher burst size than the CTM. In conclusion, the GTM and the corresponding inference method are effective tools to infer dynamic transcriptional bursting from static single-cell snapshot data.
Keywords: inference, transcriptional bursting, single-cell snapshot, non-Markovian, gene expression
1. Introduction
Gene expression is a complex biochemical reaction process with inherent stochasticity, leading to cell-to-cell variability in messenger RNA (mRNA) abundance [1]. Many experimental studies have shown that, in both prokaryotic and eukaryotic cells, the expression of most genes exhibits a stochastic burst pattern over time, characterized by silent intervals interspersed between transcriptional events [2,3]. The burst kinetics, described by burst frequency and burst size [4,5], are closely related to the molecular processes of transcriptional regulation, but the underlying mechanism remains unclear. One crucial question is how to infer interpretable biological mechanisms from extensive experimental data, thus bridging the gap between transcriptional bursts and their underlying molecular processes, which are crucial for understanding cell-fate decisions [6,7].
Addressing these questions requires visualizing transcription and measuring burst kinetics directly. A growing number of single-molecule experiments have dynamically highlighted transcriptional burst events. MS2 and PP7 imaging systems allow directly detecting the in vivo time-resolved RNA fluorescence of different genes within the same cell, revealing real-time dynamic transcriptional bursts in living cells [8–11]. Single-molecule fluorescence in situ hybridization (smFISH) [12,13] quantifies the steady-state distributions of RNA in thousands of fixed single cells, from which burst parameters such as burst frequency and burst size can be inferred. However, studies based on these experimental approaches are limited to a few genes, and the burst kinetics cannot be generalized to a genome-wide perspective. Recently, single-cell RNA sequencing (scRNA-seq) [14,15] has revolutionized our understanding of cell-fate decisions and made it possible to infer the dynamic behaviour of each gene from static expression distributions. To fulfil the promise of these scRNA-seq technologies, it will be crucial that mathematical models and computational methods are available to unambiguously reveal general principles of transcription on a genome-wide scale [4].
In principle, models of gene transcription should satisfy two basic requirements. First, the gene expression model should be interpretable and mechanistic. That is, the model should offer a way to understand the mechanisms behind transcriptional bursts, for example by addressing questions such as ‘how do the silent transcription intervals control transcriptional bursts?’ or ‘how does transcriptional bursting relate to gene regulation?’. Second, the model should be tractable. Tractability means that the model can be analysed mathematically and used to infer transcriptional bursting kinetics for large datasets. The classical telegraph model (CTM) [16], the first rigorous mathematical treatment, connects transcriptional bursting to stochastic gene expression. In this model, the gene switches randomly between active (ON) and inactive (OFF) states, with only the former permitting transcription initiation. The CTM has been applied in the genome-wide inference of burst kinetics from scRNA-seq data [17–22]. For example, such inference provided transcriptome-wide evidence that promoter elements affect burst size and enhancers control burst frequency [17].
Despite the widespread use of the CTM, the model's basic assumption (that the gene switches between active and inactive states at constant rates) does not always hold in biological systems [5,23,24]. Mathematically, the CTM is based on the Markovian assumption that all the biochemical reaction rates involved are constant, which implies that the dwell time in each state follows an exponential distribution [25]. However, most genes have complex control processes, such as chromatin opening, recruitment of transcription factors, pre-initiation complex formation, transcription initiation, as well as promoter pause and release [26]. Such processes could generate non-exponential time intervals between transcription windows in some genes or cell types. In particular, gene-state switching between active and inactive states is not a single-step event but a multi-step process [24,27]. A multi-step process with sufficiently many rate-limiting steps can create a molecular memory between individual biochemical events [28], as confirmed by a growing body of time-resolved experimental data [29,30]. Furthermore, this molecular memory can affect transcriptional burst kinetics [31,32].
Modelling, analysing and inferring the molecular memory in gene-state switching is challenging. One possible way is to introduce multiple intermediate states, i.e. a promoter architecture with multiple OFF and ON states [23,33,34]. Although the inclusion of additional gene states can improve the fit between a model and experimental data [35–37], the difficulty in determining a possibly large number of promoter states and parameters is detrimental to inference from the data [5,38–40]. Even when the topology of the state-switching network is simple, the number of transition rate parameters is of the same order as the number of promoter states. Thus, a multi-state model is difficult to handle in practical applications when the promoter has numerous states. Alternatively, one can adopt a non-Markovian modelling framework by introducing two general dwell-time distributions for the OFF and ON states, respectively. Importantly, the general dwell-time distributions are not limited to the exponential distribution [28,41–45]. This non-Markovian model has two key advantages. First, the model is built in terms of experimentally measurable quantities and interpretable parameters, rather than gene states that are unobserved or inconvenient to measure. Second, the model can use an appropriate distribution with few parameters to characterize the transition dynamics between only two gene states, and therefore overcomes the difficulty in determining the number of promoter states. Despite these good properties of the non-Markovian model, it remains challenging to derive analytical solutions, to develop a practical inference algorithm and, in particular, to infer bursting kinetics from scRNA-seq data. Our previous study described the gene expression process with a non-exponential dwell time of the OFF state and an exponential dwell time of the ON state using a non-Markovian model [46]. However, there is experimental evidence that a non-exponential waiting time for the ON state is also important [47], and genome-wide inference has not been studied for gene expression models in which both OFF and ON dwell times are non-exponential.
In this study, we build a generalized telegraph model (GTM) that extends the traditional exponential dwell-time distributions for the ON and OFF states to arbitrary distributions, and we develop a statistical inference framework based on it to infer transcriptional bursting kinetics from single-cell snapshot data. We solve the model analytically and derive the steady-state binomial moments of arbitrary order for mRNA. Furthermore, we develop a statistical inference method based on approximate Bayesian computation to estimate the burst kinetics of the GTM. We show that the CTM can be misleading when used to infer burst kinetics from data simulated with the GTM. Validation on synthetic data shows that our inference algorithm is accurate and scalable. Finally, we apply this dynamic model and inference method to scRNA-seq data from mouse embryonic fibroblasts and find that the GTM, which accounts for molecular memory, estimates lower burst frequency and higher burst size than the CTM on a genome-wide scale. In conclusion, our study provides a paradigm for inferring transcriptional bursting kinetics from single-cell snapshot data.
2. Model
2.1. Model description
Transcription occurs predominantly in episodic bursts, characterized by burst frequency and burst size (figure 1a). The CTM is the prevailing model for describing the kinetic behaviour of transcriptional bursts (figure 1b). However, the promoter-state switching involves multiple biochemical reaction processes [48], resulting in the number of effective states of most promoters being greater than 2 (i.e. multiple ON states and OFF states) and diverse switching between states [5,23,33,39] (electronic supplementary material, figure S1). Mapping this complex promoter architecture to the ON–OFF non-Markovian model leads to the ON and OFF dwell times being no longer exponentially distributed, as reported in previous studies [24,27,37,49].
Figure 1.
Schematic diagram of burst kinetics and gene expression models. (a) The top panel shows a typical output of transcriptional bursting in mRNA production. The green shadow represents the time window of the ON dwell time in transcription, while the orange shadow represents the OFF dwell time. Correspondingly, the bottom panel represents the burst process as two states switching between each other. The short blue lines represent the transcription events during the ON state, and the purple line represents the cycle time (sum of ON and OFF time in one burst). Burst size (BS) is defined as the mean number (#) of mRNA produced per burst, and burst frequency (BF) is defined as the reciprocal of the mean cycle time. (b) The classical telegraph model (CTM), in which the promoter contains one OFF state and one ON state. The dwell times of the gene in these two states follow exponential distributions. The rates of mRNA synthesis and degradation are the constants rsyn and rdeg. (c) The generalized telegraph model (GTM), in which the promoter contains one OFF state and one ON state. The dwell times of the gene in these two states follow arbitrary distributions foff(t) and fon(t). The rates of mRNA synthesis and degradation are the constants rsyn and rdeg.
To make this idea precise, we consider a more general stochastic transcription model, called the GTM, as illustrated in figure 1c. We assume that the dwell times in the OFF and ON states, two random variables τoff and τon, follow arbitrary probability distributions, denoted by foff(t) and fon(t), rather than being limited to exponential distributions. The mRNA synthesis and degradation processes are assumed to be Markovian, i.e. the transcription waiting time fsyn(t) and the mRNA lifetime fdeg(t) are exponentially distributed. Specifically, $f_{\mathrm{syn}}(t) = r_{\mathrm{syn}}\,e^{-r_{\mathrm{syn}} t}$ and $f_{\mathrm{deg}}(t) = r_{\mathrm{deg}}\,e^{-r_{\mathrm{deg}} t}$, where rsyn and rdeg are the mean rates of mRNA synthesis and degradation, respectively. The reaction scheme is summarized by the reaction diagram:
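$$\mathrm{OFF} \xrightarrow{\,f_{\mathrm{off}}(t)\,} \mathrm{ON}, \qquad \mathrm{ON} \xrightarrow{\,f_{\mathrm{on}}(t)\,} \mathrm{OFF}, \qquad \mathrm{ON} \xrightarrow{\,f_{\mathrm{syn}}(t)\,} \mathrm{ON} + \mathrm{mRNA}, \qquad \mathrm{mRNA} \xrightarrow{\,f_{\mathrm{deg}}(t)\,} \varnothing \qquad (2.1)$$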
2.2. Burst kinetics of the generalized telegraph model
The burst size and burst frequency (or the burst cycle time) are the two most critical parameters for characterizing burst kinetics. First, we derive the probability distributions and their statistics for the burst size and cycle time with the GTM (figure 1a). Burst frequency describes the average number of bursts that occur per unit time, i.e. the reciprocal of the mean cycle time. Because the cycle time is the sum of τoff and τon, its distribution is the convolution of the distributions foff(t) and fon(t), i.e. $f_{\mathrm{cycle}}(t) = \int_0^t f_{\mathrm{off}}(s)\,f_{\mathrm{on}}(t-s)\,\mathrm{d}s$. Consequently, we obtain the expression of the burst frequency, BF, for the GTM (see the electronic supplementary material, note S1.1 for a detailed derivation):
$$\mathrm{BF} = \frac{1}{\langle\tau_{\mathrm{off}}\rangle + \langle\tau_{\mathrm{on}}\rangle} \qquad (2.2)$$
where $\langle\tau_{\mathrm{off}}\rangle = \int_0^\infty t\,f_{\mathrm{off}}(t)\,\mathrm{d}t$ and $\langle\tau_{\mathrm{on}}\rangle = \int_0^\infty t\,f_{\mathrm{on}}(t)\,\mathrm{d}t$ are the mean OFF and ON state dwell times, respectively.
Burst size describes the average number of mRNA molecules produced per burst. We first derive the distribution of the number of transcription events per burst. For the exponential transcription process with rate rsyn, the probability that x transcription events occur, conditioned on a fixed duration t of the ON state, is the time-dependent Poisson distribution $P(x \mid t) = (r_{\mathrm{syn}} t)^x e^{-r_{\mathrm{syn}} t}/x!$. The probability of the number of transcription events per ON-state period, denoted by P(X), then follows from the law of total probability, $P(X = x) = \int_0^\infty P(x \mid t)\,f_{\mathrm{on}}(t)\,\mathrm{d}t$. Therefore, we obtain the burst size, BS, for the GTM after some algebra (see the electronic supplementary material, note S1.2 for a detailed derivation):
$$\mathrm{BS} = r_{\mathrm{syn}}\,\langle\tau_{\mathrm{on}}\rangle \qquad (2.3)$$
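The definitions of burst frequency and burst size can be checked numerically. The following is a minimal sketch, assuming Gamma-distributed dwell times with shape k and rate r (so that the mean dwell time is k/r); the function names `burst_kinetics` and `simulate_burst_size` are illustrative and do not come from the original work.

```python
import random

def burst_kinetics(k_off, r_off, k_on, r_on, r_syn):
    """Return (burst frequency, burst size) for Gamma(shape k, rate r) dwell times."""
    mean_off = k_off / r_off          # mean OFF dwell time
    mean_on = k_on / r_on             # mean ON dwell time
    bf = 1.0 / (mean_off + mean_on)   # reciprocal of the mean cycle time
    bs = r_syn * mean_on              # mean number of transcripts per ON period
    return bf, bs

def simulate_burst_size(k_on, r_on, r_syn, n_bursts=5000, seed=0):
    """Monte Carlo estimate of the mean number of transcripts per burst."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_bursts):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate
        t_on = rng.gammavariate(k_on, 1.0 / r_on)
        # count Poisson events of rate r_syn that fall inside the ON window
        t = rng.expovariate(r_syn)
        while t < t_on:
            total += 1
            t += rng.expovariate(r_syn)
    return total / n_bursts

bf, bs = burst_kinetics(k_off=3, r_off=0.5, k_on=2, r_on=0.5, r_syn=20)
print(bf, bs)  # 0.1 80.0
```

The Monte Carlo estimate from `simulate_burst_size(2, 0.5, 20)` should lie close to the analytical burst size, which illustrates that the burst size depends on the ON dwell-time distribution only through its mean.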
Next, we confirm the necessity of the GTM with the help of the burst kinetics obtained above (equations (2.2) and (2.3)). By the principle of parsimony, one might prefer the CTM, with which the burst size and frequency can also be computed directly [17–19]. A natural and important question is therefore whether the CTM describes the burst kinetics (burst size and burst frequency) with sufficient quantitative accuracy. We answer it by performing inference on simulation data from the GTM. First, we repeatedly generate synthetic scRNA-seq data using the simulation algorithm of the GTM (see the electronic supplementary material, table S1). In the GTM, molecular memories are characterized by Gamma distributions [31], i.e. $f_{\mathrm{off}}(t) = r_{\mathrm{off}}^{k_{\mathrm{off}}}\,t^{k_{\mathrm{off}}-1}e^{-r_{\mathrm{off}} t}/\Gamma(k_{\mathrm{off}})$ and $f_{\mathrm{on}}(t) = r_{\mathrm{on}}^{k_{\mathrm{on}}}\,t^{k_{\mathrm{on}}-1}e^{-r_{\mathrm{on}} t}/\Gamma(k_{\mathrm{on}})$, where Γ( · ) is the Gamma function, roff and ron are the rate parameters of state switching, and koff and kon are the numbers of possible reaction steps from OFF to ON and vice versa. The Gamma distribution, with only two free parameters, is widely used to characterize non-exponential dynamics in biophysical systems [29,50–52]. In addition, the Gamma distribution can be used to discern the existence of complex multi-step events in single-molecule data [53]. Other distributions, such as the Weibull distribution, may also be used. Then, we use the CTM to estimate the burst kinetic parameters of the synthetic data via maximum-likelihood estimation. We find that although the CTM fits the gene expression distributions generated by the GTM well (figure 2a,e), it cannot accurately describe the dwell-time distributions (figure 2b,c,f,g), which leads to erroneous estimates of the burst frequency and burst size defined by the GTM (figure 2d,h). A recent work [54] found that different waiting-time distributions between two mRNA production events can generate indistinguishable mRNA distributions, and pointed out an analytical necessary condition that must be satisfied to reduce a more detailed mechanistic model to the CTM. These results support our study and suggest that, when non-time-resolved single-cell snapshot data are used to infer burst kinetics, different dwell-time distributions may yield identical gene expression distributions, so that burst kinetics inferred from the CTM can be misinterpreted. Therefore, it is necessary to develop a scalable statistical inference approach, based on the GTM, to accurately estimate burst kinetics from genome-wide scRNA-seq data.
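To make the data-generating process concrete, the following is a minimal simulation sketch of a GTM with Gamma dwell times. It is not the algorithm of the supplementary table S1 (which is not reproduced here); the function name `simulate_gtm_snapshot`, the choice of starting every cell in the OFF state with no mRNA, and the thinning trick (each transcript born at time s survives to the snapshot time with probability exp(−rdeg(t_end − s))) are assumptions made for illustration.

```python
import math
import random

def simulate_gtm_snapshot(k_off, r_off, k_on, r_on, r_syn, r_deg, t_end, rng):
    """Return the mRNA copy number of one cell at time t_end."""
    t, count, state_on = 0.0, 0, False   # assumed initial condition: OFF, no mRNA
    while t < t_end:
        if state_on:
            dwell = rng.gammavariate(k_on, 1.0 / r_on)   # (shape, scale)
            stop = min(t + dwell, t_end)
            s = t + rng.expovariate(r_syn)               # first synthesis event
            while s < stop:
                # a transcript born at s is still present at t_end
                # with probability exp(-r_deg * (t_end - s))
                if rng.random() < math.exp(-r_deg * (t_end - s)):
                    count += 1
                s += rng.expovariate(r_syn)
        else:
            dwell = rng.gammavariate(k_off, 1.0 / r_off)
        t += dwell
        state_on = not state_on
    return count

rng = random.Random(1)
cells = [simulate_gtm_snapshot(3, 0.5, 2, 0.5, 20, 1.0, 100.0, rng) for _ in range(200)]
print(sum(cells) / len(cells))  # sample mean of the snapshot distribution
```

With the parameters of figure 2a, the sample mean should approach rsyn〈τon〉/[rdeg(〈τoff〉 + 〈τon〉)] = 8 as the number of simulated cells grows, provided t_end is long compared with the cycle time and the mRNA lifetime.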
Figure 2.
The CTM fits the data generated by the GTM well, but gives misleading estimates of the burst kinetics defined by the GTM. (a) The histogram represents the data generated from the GTM with parameters koff = 3, roff = 0.5, kon = 2, ron = 0.5, rsyn = 20, rdeg = 1. The solid red line represents the mRNA distribution estimated from the CTM. (b) The histogram shows the simulated OFF-state dwell times from the GTM. The solid red line represents the distribution of OFF-state dwell time estimated from the CTM. (c) Same as (b), but for the ON-state dwell time. (d) Scatter plots of transcriptional burst frequency (BF) and burst size (BS) obtained from 1000 repeats of GTM simulations and CTM estimations. The red cross represents the most probable burst parameters estimated by the CTM, obtained with a smooth kernel density, while the orange star represents the true burst parameters calculated from the GTM. (e–h) The results of another example with the parameters koff = 5, roff = 3, kon = 1, ron = 0.8, rsyn = 30, rdeg = 1.
2.3. Model analysis
To perform statistical inference on the burst kinetics from a steady-state gene expression distribution obtained from static scRNA-seq data, we theoretically solve the statistical properties of the transcriptional process described in equation (2.1) using a supplementary variable method [55,56]. Denote by M(t) the number of mRNA molecules at time t and by G(t) the gene state at time t. Eoff(t) (respectively, Eon(t)) is defined as the time elapsed at time t since the gene last switched to the OFF (respectively, ON) state. Then, {M(t), G(t), Eoff(t), Eon(t);t ≥ 0} is a continuous-time Markov process. Let poff(n, τ, t) and pon(n, τ, t) be the probability density functions (PDFs) that, at time t, there are n mRNA molecules and the gene has spent an elapsed time τ in the OFF and ON state, respectively, and we have:
| 2.4 |
According to the relationship between the states of the system at time t and t + Δt, we obtain the following Chapman–Kolmogorov backward equations:
| 2.5 |
where Hoff(τ) = foff(τ)/Soff(τ) is a hazard rate function with the survival function $S_{\mathrm{off}}(\tau) = \int_\tau^\infty f_{\mathrm{off}}(t)\,\mathrm{d}t$. The definitions of Hon(τ) and Son(τ) are similar.
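As a worked illustration of why a non-exponential dwell time encodes memory, consider a two-step Gamma (Erlang) dwell time with rate r; this specific choice is an example rather than a case singled out in the text. The hazard rate then depends on the elapsed time τ, whereas for an exponential dwell time it is constant:

```latex
\begin{aligned}
f(\tau) &= r^{2}\,\tau\,e^{-r\tau}, \qquad
S(\tau) = \int_{\tau}^{\infty} f(t)\,\mathrm{d}t = (1 + r\tau)\,e^{-r\tau}, \\
H(\tau) &= \frac{f(\tau)}{S(\tau)} = \frac{r^{2}\tau}{1 + r\tau}
\;\longrightarrow\; r \quad (\tau \to \infty), \\
\text{exponential case: } f(\tau) &= r\,e^{-r\tau}
\;\Rightarrow\; H(\tau) = r \ \text{for all } \tau .
\end{aligned}
```

Here H(0) = 0, so a gene that has just entered the state cannot leave it immediately, which is the signature of a multi-step switching process.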
Next, we focus on steady-state distributions. If the stationary distributions of poff(n, τ, t) and pon(n, τ, t) exist (the simulations above support this) and are denoted by poff(n, τ) and pon(n, τ), respectively, equation (2.5) reduces to the following stationary chemical master equation in the limit of small Δt and large t:
| 2.6 |
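with the integral boundary conditions $p_{\mathrm{off}}(n, 0) = \int_0^\infty H_{\mathrm{on}}(\tau)\,p_{\mathrm{on}}(n, \tau)\,\mathrm{d}\tau$ and $p_{\mathrm{on}}(n, 0) = \int_0^\infty H_{\mathrm{off}}(\tau)\,p_{\mathrm{off}}(n, \tau)\,\mathrm{d}\tau$.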
Based on equation (2.6) with its boundary conditions, we use the binomial moment method [57] to calculate the mRNA stationary distribution and its statistical characteristics. Binomial moments of the mRNA stationary distribution P(m) are defined as $b_n = \sum_{m \ge n} \binom{m}{n} P(m)$, n = 0, 1, 2, …, where the symbol $\binom{m}{n}$ represents the combinatorial number. Note that binomial moments converge to zero as their orders go to infinity, and can be used to reconstruct the distribution by $P(m) = \sum_{n \ge m} (-1)^{n-m} \binom{n}{m}\, b_n$. After some algebra on equation (2.6), we obtain the nth binomial moment of mRNA in a recursive form (see the electronic supplementary material, note S2 for a detailed derivation):
| 2.7 |
where
| 2.8 |
for n = 1, 2, …. Here, C0 = (〈τoff〉 + 〈τon〉)−1 is equal to the burst frequency. represents the Laplace transform of function f(t). Especially, and . According to the relationship between binomial moments and central moments and equations (2.7) and (2.8), we obtain the mean and noise of mRNA expression:
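$$\langle m \rangle = \frac{r_{\mathrm{syn}}}{r_{\mathrm{deg}}}\,\frac{\langle\tau_{\mathrm{on}}\rangle}{\langle\tau_{\mathrm{off}}\rangle + \langle\tau_{\mathrm{on}}\rangle} \qquad (2.9)$$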
and
| 2.10 |
Note that if we let τoffrdeg and τonrdeg be the rescaled random variables for the OFF-state and ON-state dwell times, and rsyn/rdeg be the rescaled mean synthesis rate, the mean expression equals not only the product of the mean synthesis rate and the stationary probability of the ON state, but also the product of burst size and burst frequency.
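As a worked example of this identity, take the Gamma dwell-time parameters of figure 2a (koff = 3, roff = 0.5, kon = 2, ron = 0.5, rsyn = 20, rdeg = 1), and use the fact that a Gamma distribution with shape k and rate r has mean k/r:

```latex
\begin{aligned}
\langle\tau_{\mathrm{off}}\rangle &= \frac{k_{\mathrm{off}}}{r_{\mathrm{off}}} = 6, \qquad
\langle\tau_{\mathrm{on}}\rangle = \frac{k_{\mathrm{on}}}{r_{\mathrm{on}}} = 4, \\
\mathrm{BF} &= \frac{1}{6 + 4} = 0.1, \qquad
\mathrm{BS} = r_{\mathrm{syn}}\langle\tau_{\mathrm{on}}\rangle = 80, \\
\langle m \rangle &= \frac{\mathrm{BS}\times\mathrm{BF}}{r_{\mathrm{deg}}} = 8
= \frac{r_{\mathrm{syn}}}{r_{\mathrm{deg}}} \times
\frac{\langle\tau_{\mathrm{on}}\rangle}{\langle\tau_{\mathrm{off}}\rangle + \langle\tau_{\mathrm{on}}\rangle}
= 20 \times 0.4 .
\end{aligned}
```

Both routes give the same mean, which also shows why the mean alone cannot separate burst size from burst frequency.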
Finally, we consider two specific cases of the GTM. First, if the ON-state dwell time is exponential, i.e. $f_{\mathrm{on}}(t) = r_{\mathrm{on}}\,e^{-r_{\mathrm{on}} t}$, the GTM and the corresponding results reduce to the previous conclusions [46] (see the electronic supplementary material, note S3.1). Further, if the OFF-state dwell time is also exponential, i.e. $f_{\mathrm{off}}(t) = r_{\mathrm{off}}\,e^{-r_{\mathrm{off}} t}$, the GTM reduces to the CTM. In this model, the mean activation and inactivation rates are $r_{\mathrm{off}} = \langle\tau_{\mathrm{off}}\rangle^{-1}$ and $r_{\mathrm{on}} = \langle\tau_{\mathrm{on}}\rangle^{-1}$, respectively. Notably, the expression for the nth binomial moment of mRNA can be simplified as follows (see the electronic supplementary material, note S3.2):
| 2.11 |
Using the reconstruction formula, we obtain the mRNA stationary distribution for the CTM (see the electronic supplementary material, note S3.2), consistent with previous results [3,16].
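The binomial-moment definition and the reconstruction formula used in this section can be checked on any distribution with finite support. The sketch below uses a Binomial(4, 0.5) distribution purely as a test case (it is not a GTM solution), and the helper names `binomial_moments` and `reconstruct` are illustrative.

```python
import math

def binomial_moments(p, order):
    """b_n = sum_{m >= n} C(m, n) * p[m] for n = 0..order, with p a finite pmf."""
    return [sum(math.comb(m, n) * p[m] for m in range(n, len(p)))
            for n in range(order + 1)]

def reconstruct(b, m):
    """P(m) = sum_{n >= m} (-1)^(n - m) * C(n, m) * b[n], truncated at len(b) - 1."""
    return sum((-1) ** (n - m) * math.comb(n, m) * b[n] for n in range(m, len(b)))

# Binomial(4, 0.5): finite support, so both sums are exact
p = [math.comb(4, m) * 0.5 ** 4 for m in range(5)]
b = binomial_moments(p, 4)
print(b)                                   # [1.0, 2.0, 1.5, 0.5, 0.0625]
print([reconstruct(b, m) for m in range(5)])
```

For distributions with infinite support the reconstruction sum must be truncated, and because its terms alternate in sign the truncation order has to be chosen with some care.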
3. Binomial moment-based inference
In this section, we develop a statistical inference method for the GTM based on binomial moments. First, we assume that the number of cells in the experiment is large enough to yield a steady-state expression distribution of a specific gene. This assumption is reasonable for scRNA-seq data and has been widely applied in genome-wide studies [17,20]. Next, using the steady-state distribution of each gene from scRNA-seq data as a bridge, we develop a computational inference framework that uses an approximate Bayesian computation (ABC) approach [58] to infer reliable parameter sets for the GTM (see the electronic supplementary material, table S2).
3.1. Inference framework
We develop a statistical framework that combines the ABC approach to estimate the Bayesian posterior probability π(θ|yobs) of the model parameter-vector (θ) given the observed scRNA-seq data (yobs). The prior information related to θ is denoted as the prior distribution π(θ), which will be iteratively updated through computing the likelihood function p(yobs|θ) of the GTM. Using Bayes' theorem, the resulting posterior distribution is given by:
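$$\pi(\theta \mid y_{\mathrm{obs}}) = \frac{p(y_{\mathrm{obs}} \mid \theta)\,\pi(\theta)}{\int p(y_{\mathrm{obs}} \mid \theta')\,\pi(\theta')\,\mathrm{d}\theta'} \qquad (3.1)$$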
Ideally, inference would rely on iterative maximization of the likelihood function. However, approximating the mRNA distribution of the GTM is computationally prohibitive, which makes it impractical to evaluate the likelihood function directly.
Alternatively, we resort to a ‘likelihood-free' Bayesian approach by computing only low-order moments instead of the whole probability distribution according to the obtained binomial moments (equation (2.7)). Specifically, we use the sequential Monte Carlo ABC [59] (see the electronic supplementary material, table S2), a variant of ABC, to implement the statistical inference of our model. ABC allows us to accept the parameters that make the simulated data and the observed data sufficiently close in distribution and to estimate the posterior distribution π(θ|yobs) of the parameters through numerous simulations. First, given a dynamic model (figure 3a), we sample a candidate parameter-vector θ from the prior distribution π(θ) (figure 3b), and simulate a dataset ymodel from the GTM. Then, we check whether the distribution of simulated data approximates the observed data yobs (figure 3d) by predefining three extra parameters: (i) summary statistics s(y) for sufficiently representing data; (ii) discrepancy metrics ρ( · , · ) for measuring the distance between summary statistics of observed data sobs and simulated data smodel from the GTM (figure 3e); and (iii) threshold ε for controlling acceptable errors (figure 3f). Note that the low threshold ε of ABC promises a good approximation of π(θ|yobs), but also imposes a huge computational cost and low-rate acceptances. To avoid the difficulties of ABC in terms of computational power and convergence, we use the sequential Monte Carlo ABC.
Figure 3.
Inference procedure. Given a dynamic model (a), the parameter θ is sampled from the prior distribution π(θ) (b), and then the theoretical binomial moment (c) is used to compute the summary statistic smodel (e). The static single-cell snapshot data yobs (d) can be used to calculate the summary statistic sobs (e). The sampled θ is accepted by comparing whether ρ(sobs, smodel) is less than the threshold ε (f). As the output of the inference procedure, the posterior distribution π(θ|yobs) of the parameters is obtained by Bayes' theorem (g).
Sequential Monte Carlo ABC introduces a sequence of threshold values {ε0, ε1, …, εT} satisfying ε0 > · · · > εT ≥ 0, and thus constructs a sequence of intermediate posterior distributions $f_t(\theta \mid \rho(s_{\mathrm{obs}}, s_{\mathrm{model}}) \le \varepsilon_t)$, t = 0, 1, …, T, where f0(θ) = π(θ). When the (t − 1)th iteration is over, we have ft−1(θ|ρ(sobs, smodel) ≤ εt−1). In the tth iteration, we first sample a parameter θt−1 randomly from ft−1(θ|ρ(sobs, smodel) ≤ εt−1) and then sample a θ* from the proposal distribution Kt(θ|θt−1). We accept θ* if $\rho(s_{\mathrm{obs}}, s_{\mathrm{model}}(\theta^*)) \le \varepsilon_t$. After repeating the above procedure in this iteration, we obtain the posterior distribution ft(θ|ρ(sobs, smodel) ≤ εt). In theory, $f_T(\theta \mid \rho(s_{\mathrm{obs}}, s_{\mathrm{model}}) \le \varepsilon_T) \to \pi(\theta \mid y_{\mathrm{obs}})$ when εT → 0. Finally, we obtain the posterior distribution π(θ|yobs) (figure 3g).
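The sequential scheme described above can be sketched on a toy problem. This is a simplified illustration, not the algorithm of the supplementary table S2: it omits the importance weights of a full sequential Monte Carlo ABC sampler, sets each threshold to the median distance of the previous round, and uses a hypothetical one-parameter model whose summary statistics are the mean θ and noise 1/θ of a Poisson distribution. The names `abc_smc`, `summary`, `sample_prior` and `distance` are assumptions.

```python
import math
import random

def abc_smc(s_obs, summary, sample_prior, distance,
            n_particles=200, rounds=5, eps0=1.0, sigma=0.2, seed=0):
    """Simplified (unweighted) sequential ABC with deterministic summaries."""
    rng = random.Random(seed)
    particles = []                       # round 0: rejection sampling from the prior
    while len(particles) < n_particles:
        theta = sample_prior(rng)
        d = distance(s_obs, summary(theta))
        if d <= eps0:
            particles.append((d, theta))
    for _ in range(rounds):
        # adaptive threshold: median distance of the previous round
        eps = sorted(d for d, _ in particles)[len(particles) // 2]
        new = []
        while len(new) < n_particles:
            _, theta_prev = rng.choice(particles)
            # lognormal perturbation centred (in log space) on the previous particle
            theta = rng.lognormvariate(math.log(theta_prev), sigma)
            d = distance(s_obs, summary(theta))
            if d <= eps:
                new.append((d, theta))
        particles = new
    return [theta for _, theta in particles]

# Toy model (assumption): Poisson mRNA distribution, mean theta and noise 1/theta
summary = lambda theta: [theta, 1.0 / theta]
sample_prior = lambda rng: rng.uniform(0.1, 50.0)
distance = lambda a, b: math.sqrt(sum((math.log(x) - math.log(y)) ** 2
                                      for x, y in zip(a, b)))

posterior = abc_smc([8.0, 0.125], summary, sample_prior, distance)
print(sum(posterior) / len(posterior))   # concentrates near the true value 8
```

Because the summaries are computed deterministically, acceptance is a deterministic function of θ, so the particle cloud simply contracts around the parameter values whose theoretical summaries match the observed ones.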
3.2. Inference procedure
3.2.1. Summary statistics
The inference procedure requires summary statistics that reduce the high-dimensional data to low-dimensional features, so that we can compare whether the distribution of the simulated data ymodel from the GTM is close to that of the observed data yobs. Here, we choose as summary statistics six moment statistics commonly used for inference [60,61] that characterize the shape of the gene expression distribution: (i) the mean value μ1 is the most commonly used indicator in statistics, and it represents the average level of mRNA expression in scRNA-seq data; (ii) the noise strength is a measure of the dispersion of the probability distribution, defined as $\mu_2/\mu_1^2$, where μ2 is the variance; (iii) the Fano factor is another statistic that measures the dispersion of a probability distribution relative to a Poisson distribution, defined as μ2/μ1; (iv) the skewness describes the asymmetry of the distribution, and it is defined as $\mu_3/\mu_2^{3/2}$, where μ3 is the third central moment; (v) the kurtosis describes whether the peak of the distribution is sharp or flat, and it is defined as $\mu_4/\mu_2^{2}$, where μ4 is the fourth central moment; and (vi) the bimodality coefficient can describe a bimodal distribution [62], which is usually a critical feature in a dynamical system. The values of the bimodality coefficient range from 0 to 1, and values greater than 5/9 may indicate a bimodal or multimodal distribution.
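For observed data, the six statistics can be computed directly from the mRNA counts of one gene. The sketch below assumes population (uncorrected) central moments and the form (skewness² + 1)/kurtosis for the bimodality coefficient, which equals 5/9 for a uniform distribution; the exact estimator used in the original work is not stated in this section, so both choices are assumptions.

```python
def summary_statistics(counts):
    """Return (mean, noise, Fano factor, skewness, kurtosis, bimodality coefficient)."""
    n = len(counts)
    mu1 = sum(counts) / n

    def central(k):
        # k-th central moment (population form, no small-sample correction)
        return sum((x - mu1) ** k for x in counts) / n

    mu2, mu3, mu4 = central(2), central(3), central(4)
    noise = mu2 / mu1 ** 2
    fano = mu2 / mu1
    skewness = mu3 / mu2 ** 1.5
    kurtosis = mu4 / mu2 ** 2          # non-excess kurtosis (3 for a Gaussian)
    bimodality = (skewness ** 2 + 1) / kurtosis
    return mu1, noise, fano, skewness, kurtosis, bimodality

# Two equally weighted peaks at 0 and 10: the most strongly bimodal case
print(summary_statistics([0, 0, 10, 10]))  # (5.0, 1.0, 5.0, 0.0, 1.0, 1.0)
```

The function requires a non-zero mean and variance, so genes with constant or all-zero counts would need to be excluded beforehand.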
Crucially, we used the binomial moment approach which theoretically provides an efficient method to directly compute the theoretical summary statistics for a given parameter θ of the GTM. Precisely, we can calculate the central moments with the binomial moment:
| 3.2 |
in which with being the Stirling number of the second kind. Therefore, the above summary statistics can be expressed by binomial moments as follows:
| 3.3 |
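For orientation, the low-order relations between binomial moments and ordinary moments can be written out explicitly. These are standard identities that follow from expanding powers of m in falling factorials with Stirling numbers of the second kind S(k, j); they are given here as a sketch and are not necessarily in the same form as equations (3.2) and (3.3):

```latex
\begin{aligned}
\langle m^{k} \rangle &= \sum_{j=1}^{k} S(k, j)\, j!\, b_{j}, \\
\langle m \rangle &= b_{1}, \qquad
\langle m^{2} \rangle = 2b_{2} + b_{1}, \qquad
\langle m^{3} \rangle = 6b_{3} + 6b_{2} + b_{1}, \\
\langle m^{4} \rangle &= 24b_{4} + 36b_{3} + 14b_{2} + b_{1}, \\
\mu_{2} &= 2b_{2} + b_{1} - b_{1}^{2}, \qquad
\mu_{3} = \langle m^{3} \rangle - 3b_{1}\langle m^{2} \rangle + 2b_{1}^{3}.
\end{aligned}
```

As a check, a Poisson distribution with mean λ has b_j = λ^j / j!, which gives μ2 = λ and μ3 = λ, as expected.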
It should be noted that the summary statistics can be extended to higher-order moments, because the binomial moments allow moment statistics of arbitrarily high order to be computed. However, we emphasize that calculating summary statistics directly from theory rather than through simulations, although it works well, is quite unusual in the context of ABC. As a result, the algorithm is not a true Bayesian approach: acceptance or rejection of a parameter set becomes a deterministic process, and the posterior distribution obtained by the algorithm is narrower than the true posterior distribution. We suggest that the mode of the posterior distribution obtained by the algorithm is credible, while the error in its width depends on the sample size (the larger the sample size, the smaller the width error) [58].
To assess the sensitivity of the summary statistics, we investigate the influence of the model parameters (kon, koff, ron and roff) on the six summary statistics through the simulation algorithm of the GTM (electronic supplementary material, table S1) and the theoretical results (equation (3.3)). Note that the parameters of the ON and OFF dwell times show opposite tendencies in their influence on the summary statistics (electronic supplementary material, figures S2 and S3). The results also show that the six summary statistics are sensitive to the dwell-time parameters, which supports this choice of statistics.
3.2.2. Discrepancy metrics
To eliminate the influence of the different absolute scales of the summary statistics, we compare the natural logarithms of the summary statistics instead of computing the Euclidean distance on their raw values [36,63]:
| 3.4 |
where s(i) is the ith component of the summary statistics vector. Note that the logarithm transformations of data do not change the data properties and correlation, but the scale is compressed. In addition, these transformations can make the data more stable and weaken the effects of collinearity and heteroscedasticity on the model.
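One plausible reading of this log-transformed discrepancy is a Euclidean distance between the element-wise logarithms of the two summary-statistic vectors. The sketch below implements that reading as an assumption (the exact form of equation (3.4) is not reproduced here), and the name `log_discrepancy` is illustrative; it requires all components to be positive.

```python
import math

def log_discrepancy(s_obs, s_model):
    """Euclidean distance between log-transformed summary statistics (assumed form)."""
    return math.sqrt(sum((math.log(a) - math.log(b)) ** 2
                         for a, b in zip(s_obs, s_model)))

# A twofold mismatch contributes equally whether the statistic is ~10 or ~100
print(log_discrepancy([10.0, 100.0], [20.0, 200.0]))
```

With this choice the distance depends only on ratios of statistics, so a statistic with a large absolute value, such as the mean, does not dominate one that is of order one, such as the bimodality coefficient.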
3.2.3. Acceptance threshold
The acceptance thresholds {ε0, ε1, …, εT} are usually chosen empirically. However, if a preset threshold is greater than the maximum distance obtained by sampling in the previous iteration, the current iteration wastes computational resources. Therefore, we adopt a strategy of adaptive acceptance thresholds to prevent this situation. For the first iteration, we use a large threshold ε0 to accept 10 000 coarse parameter samples quickly, and then select the 1000 parameter samples with the smallest discrepancy between sobs and smodel as input for the next iteration. For the tth iteration, the acceptance threshold εt is set to the median of the discrepancies obtained in the (t − 1)th iteration. Specifically, we set the hyperparameters to an initial threshold ε0 = 1 and a number of rounds T = 5. With this setting, it takes about 75 s to run an example of synthetic data on an Intel(R) Core(TM) i7–6700 CPU @ 3.40 GHz.
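The adaptive threshold strategy can be sketched in a few lines. The representation of a sample as a (discrepancy, parameters) pair and the function names are assumptions made for illustration.

```python
import statistics

def initial_population(samples, keep=1000):
    """Keep the `keep` samples with the smallest discrepancy.

    `samples` is a list of (discrepancy, theta) pairs accepted under eps_0.
    """
    return sorted(samples, key=lambda s: s[0])[:keep]

def next_threshold(population):
    """Threshold for the next round: median discrepancy of the current population."""
    return statistics.median(d for d, _ in population)

coarse = [(0.9, "a"), (0.1, "b"), (0.5, "c"), (0.3, "d")]
population = initial_population(coarse, keep=3)
print(population)                  # [(0.1, 'b'), (0.3, 'd'), (0.5, 'c')]
print(next_threshold(population))  # 0.3
```

Because the median of accepted discrepancies can never exceed the threshold that produced them, the resulting sequence of thresholds is non-increasing.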
3.2.4. Prior distribution
As required in Bayesian inference, we must set a prior distribution π(θ), from which the initial parameters θ are sampled in the inference procedure. In the GTM, we assume that the mRNA decay rate is rdeg = 1 and that the dwell times in the OFF state and ON state are Gamma distributed, i.e. $\tau_{\mathrm{off}} \sim \mathrm{Gamma}(k_{\mathrm{off}}, r_{\mathrm{off}})$ and $\tau_{\mathrm{on}} \sim \mathrm{Gamma}(k_{\mathrm{on}}, r_{\mathrm{on}})$, respectively. The parameter vector is θ = [koff, roff, kon, ron, rsyn]. The prior distributions of koff and kon are the uniform distribution U[0, 5], and the prior distributions of roff and ron are log-uniform, i.e. log10 roff and log10 ron are uniformly distributed on the interval [ − 1, 1]. In addition, the prior distribution of the transcription rate rsyn is U[0, 50].
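Sampling from these priors is straightforward. In the sketch below, the dictionary layout and the name `sample_prior` are assumptions; the ranges are the ones stated above.

```python
import random

def sample_prior(rng):
    """Draw one parameter vector theta from the stated prior distributions."""
    return {
        "k_off": rng.uniform(0.0, 5.0),         # U[0, 5]
        "r_off": 10.0 ** rng.uniform(-1, 1),    # log10 r_off ~ U[-1, 1]
        "k_on": rng.uniform(0.0, 5.0),          # U[0, 5]
        "r_on": 10.0 ** rng.uniform(-1, 1),     # log10 r_on ~ U[-1, 1]
        "r_syn": rng.uniform(0.0, 50.0),        # U[0, 50]
    }

rng = random.Random(0)
print(sample_prior(rng))
```

A log-uniform prior places equal mass on each decade, so rates between 0.1 and 1 are as likely a priori as rates between 1 and 10.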
3.2.5. Proposal distribution
In the inference procedure, the parameter θ* sampled in the tth iteration is a small perturbation of the parameter θt−1 sampled from the distribution ft−1(θ|ρ(sobs, smodel) ≤ εt−1). Therefore, we use the lognormal distribution LN(θ;θt−1, σ) as the proposal distribution for sampling all parameters in the tth iteration. Specifically, we set the variance hyperparameter of the lognormal distribution to σ = 0.2 for all parameters.
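A lognormal perturbation keeps every parameter positive. The sketch below interprets LN(θ; θt−1, σ) as a lognormal distribution whose median is θt−1 and whose log-scale standard deviation is σ; this parameterization, the dictionary layout and the name `propose` are assumptions, since the text does not spell them out.

```python
import math
import random

def propose(theta_prev, sigma, rng):
    """Perturb each (positive) parameter with a lognormal kernel centred on its previous value."""
    # random.lognormvariate(mu, sigma): log of the draw is Normal(mu, sigma)
    return {name: rng.lognormvariate(math.log(value), sigma)
            for name, value in theta_prev.items()}

rng = random.Random(0)
theta_prev = {"k_off": 3.0, "r_off": 0.5, "k_on": 2.0, "r_on": 0.5, "r_syn": 20.0}
print(propose(theta_prev, 0.2, rng))
```

Because the perturbation is multiplicative, σ = 0.2 corresponds to typical changes of roughly 20% regardless of the magnitude of the parameter.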
4. Results
4.1. Synthetic data
We first use synthetic scRNA-seq data from the GTM to verify whether the inference method can accurately estimate the burst kinetics of gene expression. We apply the simulation algorithm for the GTM with given parameters (see the electronic supplementary material, table S1) to generate an mRNA distribution. Figure 4a shows an example in which the dwell times for the OFF and ON states are non-exponentially distributed and the mRNA distribution is bimodal at steady state, a feature that is significant in biological systems [64] and commonly observed in scRNA-seq data [65]. Having obtained 1000 samples from the inference procedure, we find that the posterior distribution of the transcription rate rsyn accurately recovers the true parameter value (figure 4b). Importantly, the first-order moment (mean) of the dwell times can be estimated accurately (figure 4c). In addition, the burst frequency and burst size can also be estimated accurately (figure 4d), although the individual parameters are unidentifiable in a large part of parameter space. In another example, the mRNA distribution of the GTM is unimodal, and the burst kinetics can also be accurately inferred (figure 4f–h).
Figure 4.
Validation of inferring burst kinetics on synthetic scRNA-seq data. (a) Comparison between synthetic scRNA-seq data for inference with parameters koff = 3, roff = 0.5, kon = 2, ron = 0.5, rsyn = 20 (red solid line, the PDF fitted by a kernel smoothing function) and simulation data of inferred parameters (green histogram). (b) Marginal posterior distribution (blue histogram) of the transcription rate rsyn estimated by the inference method, where the red solid line corresponds to the true parameter and the black dashed line to the mode of the estimated parameter. (c) Scatter plot of the two-dimensional posterior distribution of the dwell times τoff and τon. The colour bar is a normalized probability density. The meanings of the lines are the same as those in (b). (d) Posterior distribution of the burst frequency (BF) and burst size (BS). (e–h) The results of another example with the parameters koff = 5, roff = 3, kon = 1, ron = 0.8, rsyn = 30.
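To make the synthetic-data step concrete, the following sketch generates snapshot mRNA counts from a GTM with Gamma dwell times. It is not the authors' simulation algorithm: the function name `simulate_gtm_cell`, the initial condition (OFF state, no mRNA), the observation time and the Gamma(shape = k, rate = r) parameterization are all assumptions made for illustration.

```python
import random

def simulate_gtm_cell(theta, t_end, rng, rdeg=1.0):
    """Return the mRNA count of one cell at time t_end (cell starts OFF, no mRNA)."""
    t, m, on = 0.0, 0, False
    while True:
        # Dwell time of the current gene state.
        # ASSUMPTION: Gamma(shape=k, rate=r); gammavariate takes a scale, hence 1/r.
        if on:
            dwell = rng.gammavariate(theta["kon"], 1.0 / theta["ron"])
        else:
            dwell = rng.gammavariate(theta["koff"], 1.0 / theta["roff"])
        horizon = min(t + dwell, t_end)
        syn = theta["rsyn"] if on else 0.0
        # Gillespie steps for synthesis and degradation until the state switches.
        # Both reactions are memoryless, so stopping at the horizon is exact.
        while True:
            total = syn + rdeg * m
            if total == 0.0:
                break
            dt = rng.expovariate(total)
            if t + dt >= horizon:
                break
            t += dt
            if rng.random() * total < syn:
                m += 1  # synthesis event
            else:
                m -= 1  # degradation event
        t = horizon
        if t >= t_end:
            return m
        on = not on

# Parameters of the first example in figure 4 (rdeg = 1).
theta = {"koff": 3.0, "roff": 0.5, "kon": 2.0, "ron": 0.5, "rsyn": 20.0}
rng = random.Random(1)
snapshot = [simulate_gtm_cell(theta, t_end=60.0, rng=rng) for _ in range(300)]
mean_count = sum(snapshot) / len(snapshot)
```

Under the assumed parameterization the mean dwell times are 6 (OFF) and 4 (ON), so the gene is ON 40% of the time and the long-run mean mRNA count is 20 × 0.4 = 8; `mean_count` should lie close to that value.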
4.2. Experimental data
Next, we apply the inference procedure to allele-resolved data from mouse embryonic fibroblasts (C57 × CAST) [17] to estimate transcriptional bursting parameters on a genome-wide scale. This scRNA-seq dataset, which has been widely used to investigate genome-wide transcriptional burst kinetics [20,66,67], was sequenced with smart-seq2 technology (unique molecular identifier counts were used) and contains 10 727 genes and 224 individual cells for each allele. For quality control, we filter out genes expressed in fewer than 50 cells and cells expressing fewer than 2000 genes. In addition, we remove genes with mean expression levels below 2 to ensure high expression of the inferred genes. Finally, we combine the expression of the two alleles to form a single-cell matrix consisting of 2162 genes and 413 cells.
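The quality-control filters described above could be implemented along the following lines. This is a hedged sketch: the function name `quality_control`, the genes-by-cells list layout and the order in which the three filters are applied are assumptions, since the text does not specify them.

```python
def quality_control(counts, min_cells=50, min_genes=2000, min_mean=2.0):
    """Filter a genes-by-cells count matrix; counts[g][c] is the count of gene g in cell c.

    Returns the indices of retained genes and retained cells.
    ASSUMPTION: detection filters use the unfiltered matrix, and the mean
    expression filter is applied last, over the retained cells.
    """
    n_genes = len(counts)
    n_cells = len(counts[0]) if counts else 0
    # Genes detected (count > 0) in at least min_cells cells.
    genes = [g for g in range(n_genes)
             if sum(1 for x in counts[g] if x > 0) >= min_cells]
    # Cells in which at least min_genes genes are detected.
    cells = [c for c in range(n_cells)
             if sum(1 for g in range(n_genes) if counts[g][c] > 0) >= min_genes]
    # Among retained genes, keep those whose mean over retained cells is high enough.
    genes = [g for g in genes
             if cells and sum(counts[g][c] for c in cells) / len(cells) >= min_mean]
    return genes, cells

# Toy matrix (3 genes x 4 cells) with thresholds scaled down for illustration.
toy = [
    [5, 3, 0, 4],  # detected in 3 cells, high mean
    [0, 0, 0, 1],  # detected in 1 cell only
    [1, 2, 0, 2],  # detected in 3 cells, but low mean
]
kept_genes, kept_cells = quality_control(toy, min_cells=2, min_genes=2, min_mean=2.0)
```

In the toy example only gene 0 survives all three filters, and cell 2 is dropped because no gene is detected in it.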
Interestingly, we observe that genes with the same average expression level have different combinations of burst frequency and burst size, consistent with previous reports [17,24]. This result implies that gene expression may be regulated by diverse burst kinetic mechanisms (figure 5a). In addition, we perform genome-wide burst kinetics inference for the same data based on the CTM, using a maximum-likelihood estimation method. We find that the burst frequencies and burst sizes estimated by the GTM and the CTM are highly positively correlated (p-value < 2.2 × 10^−16; figure 5b,c). Notably, the transcriptional burst kinetics inferred with the GTM have lower burst frequency (figure 5b) and higher burst size (figure 5c) than those estimated by the CTM on the genome-wide scale, consistent with the results for single genes in figure 2d,h. Moreover, the inference method with an alternative definition of the burst frequency (1/〈τoff〉) gives the same results (electronic supplementary material, figure S4). These results suggest that the GTM, as an extension of the CTM, provides a more flexible way to predict burst kinetics, and that the corresponding inference method is capable of genome-wide studies on scRNA-seq data.
Figure 5.
Genome-wide characteristics of transcriptional burst kinetics inferred from the scRNA-seq data of mouse embryonic fibroblasts. (a) Scatter plot of burst frequency (BF) and burst size (BS) inferred by the inference method, where the colour bar represents the mean expression level of each gene. (b,c) Scatter plots of the burst frequency (b) and burst size (c) estimated by the GTM and the CTM, which are significantly correlated (Pearson correlation test, p-value < 2.2 × 10^−16). The slope of the grey dashed lines equals 1.
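The burst frequency and burst size compared here are functions of the mean dwell times and the transcription rate. The sketch below shows one way to compute them from a parameter vector. The definitions used are assumptions that this section does not restate: BF = 1/(〈τoff〉 + 〈τon〉), the alternative BF = 1/〈τoff〉 mentioned in the text, BS = rsyn〈τon〉, and mean dwell time k/r (Gamma with shape k and rate r). The function name `burst_kinetics` is hypothetical.

```python
def burst_kinetics(theta):
    """Summarize a GTM parameter vector by mean dwell times, burst frequency and burst size."""
    # ASSUMPTION: Gamma(shape=k, rate=r), so the mean dwell time is k / r.
    tau_off = theta["koff"] / theta["roff"]
    tau_on = theta["kon"] / theta["ron"]
    return {
        "tau_off": tau_off,
        "tau_on": tau_on,
        "bf": 1.0 / (tau_off + tau_on),  # assumed main definition
        "bf_alt": 1.0 / tau_off,         # alternative definition, 1/<tau_off>
        "bs": theta["rsyn"] * tau_on,    # assumed definition of burst size
    }

a = burst_kinetics({"koff": 3.0, "roff": 0.5, "kon": 2.0, "ron": 0.5, "rsyn": 20.0})
b = burst_kinetics({"koff": 6.0, "roff": 1.0, "kon": 4.0, "ron": 1.0, "rsyn": 20.0})
```

The two parameter vectors differ in every Gamma parameter but share the same mean dwell times, so `a` and `b` are identical. This shows how derived burst quantities can be well determined even when the individual parameters are not. With rdeg = 1 and these definitions, the product BF × BS equals the mean mRNA level (8 in this example).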
5. Discussion
With the emergence of next-generation sequencing technologies, inferring gene expression burst kinetics on a genome-wide scale from static single-cell snapshot data has become a challenge in computational systems biology [4]. In previous studies [17,18], the gene expression models used for inference and analysis, e.g. the CTM, have relied on Markov assumptions. However, increasing experimental data show that the dwell times of gene states are not simply exponentially distributed [24,27,37,49], so the Markov approximation fails. Effective methods for modelling, analysing and inferring non-Markovian gene expression models are therefore urgently needed, but developing them remains a significant challenge.
In this paper, we have developed a method for inferring burst kinetics from scRNA-seq data based on the GTM, which allows ON–OFF transitions with arbitrary dwell-time distributions. We demonstrated that the CTM cannot estimate non-Markovian burst kinetics, although it can sometimes fit the gene expression distributions well. We theoretically derived the analytical solution for binomial moments of arbitrary order for the GTM, which in turn enables us to calculate the statistics of mRNA. We developed the inference procedure to infer transcriptional burst kinetics using summary statistics calculated from the binomial moments. The results on the synthetic dataset show that our inference method can precisely estimate the burst frequency and burst size of the gene expression system, as well as the average dwell times in the ON and OFF states, based on the GTM. Furthermore, we performed a genome-wide burst kinetics inference on the mouse embryonic fibroblast scRNA-seq data with the inference method. We found that the transcriptional burst kinetics inferred with the GTM have lower burst frequency and higher burst size than those inferred with the CTM.
The GTM and the corresponding inference method are applicable to the study of burst kinetics on a genome-wide scale. First, the GTM is interpretable. The GTM, as an extension of the CTM, is a mechanistic model that allows the dwell times to follow arbitrary distributions, and the model parameters, such as the dwell times of the OFF and ON states, transcription rates and degradation rates, are measurable from experiments [7,24]. Second, the GTM is solvable. The binomial moments of arbitrary order of the mRNA distribution are theoretically derived. In particular, the model extends previous results. For example, the CTM is a special case of the GTM if foff(t) and fon(t) are set to exponential distributions. Third, the inference method for the GTM is scalable. Approximating the full probability distribution of a stochastic model is often computationally prohibitive. With the theoretical results derived, we used a 'likelihood-free' approach that computes only low-order moments instead of the whole probability distribution. The efficiency of the inference method facilitates the extension from individual genes to genome-wide studies. Finally, our inference method can learn gene expression information from static snapshot data without time-resolved data.
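As an illustration of working with low-order moments rather than the full distribution, the sketch below computes empirical binomial moments from snapshot counts and converts them into the mean, variance and Fano factor. It uses the standard definition b_k = E[C(m, k)]. The function name `binomial_moment` and the choice of derived statistics are illustrative assumptions, not necessarily the summary statistics used in the inference procedure.

```python
from math import comb

def binomial_moment(counts, k):
    """Empirical k-th binomial moment, b_k = mean of C(m, k) over the sampled cells."""
    return sum(comb(m, k) for m in counts) / len(counts)

counts = [0, 1, 2, 3]                # toy snapshot of mRNA counts in four cells
b1 = binomial_moment(counts, 1)      # b1 = E[m]
b2 = binomial_moment(counts, 2)      # b2 = E[m(m - 1)] / 2
mean = b1
variance = 2.0 * b2 + b1 - b1 ** 2   # since E[m^2] = 2*b2 + b1
fano = variance / mean
```

For the toy counts this gives b1 = 1.5, b2 = 1.0 and a variance of 1.25, which matches the population variance computed directly from the four values.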
Our work opens up several avenues for further research. From a modelling perspective, the GTM simplifies biological processes in several respects. The GTM only considers generalized dwell-time distributions for the OFF and ON states and does not consider non-exponential waiting times for the transcription and degradation processes. Many experimental data suggest that transcription (such as mRNA elongation, pause and release [68–70]) and degradation (such as mRNA senescence [31,71,72]) are also multi-step processes in cells. In addition, other biological processes that can affect burst kinetics should be considered in future work. For example, a recent study reveals that inferring splicing dynamics can help to further investigate transcriptional burst dynamics [73]. Another study has shown that post-transcriptional noise and different cell cycle phases affect the inference of transcription [74]. From a statistical inference perspective, the use of the GTM to interpret experimental biological data has yet to be fully developed. First, the binomial moments obtained from our analysis are those of the steady-state distribution (t → ∞) and are not functions of time t. Solving for the model's time-dependent solutions would facilitate applying the GTM to time-resolved data. Second, inference based on the steady-state distribution still suffers from the unidentifiability of the parameters, which may arise because different dwell-time distributions can lead to the same static mRNA distribution. Third, a fraction of the variability in scRNA-seq data comes from technical noise [75], and it is a challenge to couple the technical noise into the GTM for inference, owing to the computational complexity of the model. Inspired by recent studies [20,76], we can address this issue in future work by introducing sampling distributions.
Finally, we note that our approach is not limited to scRNA-seq data, but could also be useful for other kinds of single-cell data in which the probability distributions of mRNA can be estimated, such as smFISH data. We expect the GTM and corresponding inference method to facilitate an understanding of gene expression mechanisms from the enormous amount of biological data.
Data accessibility
Relevant code for the algorithm and analysis in this research work is stored on GitHub: https://github.com/cellfate/BurstGTM and has been archived within the Zenodo repository: https://doi.org/10.5281/zenodo.7512499. Experimental data of mouse primary fibroblasts were downloaded from [17]. Additional data are available in the electronic supplementary material [77].
Authors' contributions
S.L.: data curation, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing; Z.Z.: data curation, formal analysis, investigation, methodology, validation, visualization, writing—original draft, writing—review and editing; Z.W.: data curation, formal analysis, methodology, writing—review and editing; X.Y.: data curation, formal analysis, validation, writing—review and editing; X.C.: data curation, formal analysis, validation, writing—review and editing; T.Z.: funding acquisition, supervision, writing—original draft, writing—review and editing; J.Z.: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing—original draft, writing—review and editing.
All authors gave final approval for publication and agreed to be held accountable for the work performed therein.
Conflict of interest declaration
We declare we have no competing interests.
Funding
This work was supported by the National Key R&D Program of China (grant no. 2021YFA1302500); the Natural Science Foundation of P. R. China (grant nos. 12171494, 11931019 and 11775314); the Guangdong Basic and Applied Basic Research Foundation (grant no. 2022A1515011540); the Key-Area Research and Development Program of Guangzhou, P. R. China (grant nos. 2019B110233002 and 202007030004); the Special Projects in Key Fields for Colleges and Universities in Guangdong Province (grant no. 2022ZDZX2045) and the Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University (grant no. 2020B1212060032).
References
- 1. Elowitz MB, Levine AJ, Siggia ED, Swain PS. 2002. Stochastic gene expression in a single cell. Science 297, 1183-1186. (doi:10.1126/science.1070919)
- 2. Yu J, Xiao J, Ren X, Lao K, Xie XS. 2006. Probing gene expression in live cells, one protein molecule at a time. Science 311, 1600-1603. (doi:10.1126/science.1119623)
- 3. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. 2006. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309. (doi:10.1371/journal.pbio.0040309)
- 4. Rodriguez J, Larson DR. 2020. Transcription in living cells: molecular mechanisms of bursting. Annu. Rev. Biochem. 89, 189-212. (doi:10.1146/annurev-biochem-011520-105250)
- 5. Tunnacliffe E, Chubb JR. 2020. What is a transcriptional burst? Trends Genet. 36, 288-297. (doi:10.1016/j.tig.2020.01.003)
- 6. Eldar A, Elowitz MB. 2010. Functional roles for noise in genetic circuits. Nature 467, 167-173. (doi:10.1038/nature09326)
- 7. Lammers NC, Kim YJ, Zhao J, Garcia HG. 2020. A matter of time: using dynamics and theory to uncover mechanisms of transcriptional bursting. Curr. Opin. Cell Biol. 67, 147-157. (doi:10.1016/j.ceb.2020.08.001)
- 8. Chen SY, et al. 2020. Optogenetic control reveals differential promoter interpretation of transcription factor nuclear translocation dynamics. Cell Syst. 11, 336-353.e24. (doi:10.1016/j.cels.2020.08.009)
- 9. Chao JA, Patskovsky Y, Almo SC, Singer RH. 2008. Structural basis for the coevolution of a viral RNA-protein complex. Nat. Struct. Mol. Biol. 15, 103-105. (doi:10.1038/nsmb1327)
- 10. Bertrand E, Chartrand P, Schaefer M, Shenoy SM, Singer RH, Long RM. 1998. Localization of ASH1 mRNA particles in living yeast. Mol. Cell 2, 437-445. (doi:10.1016/S1097-2765(00)80143-4)
- 11. Larson DR, Zenklusen D, Wu B, Chao JA, Singer RH. 2011. Real-time observation of transcription initiation and elongation on an endogenous yeast gene. Science 332, 475-478. (doi:10.1126/science.1202142)
- 12. Femino AM, Fay FS, Fogarty K, Singer RH. 1998. Visualization of single RNA transcripts in situ. Science 280, 585-590. (doi:10.1126/science.280.5363.585)
- 13. Raj A, Van Den Bogaard P, Rifkin SA, Van Oudenaarden A, Tyagi S. 2008. Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877-879. (doi:10.1038/nmeth.1253)
- 14. Zheng GX, et al. 2017. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 1-12. (doi:10.1038/ncomms14049)
- 15. Picelli S, Björklund ÅK, Faridani OR, Sagasser S, Winberg G, Sandberg R. 2013. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096-1098. (doi:10.1038/nmeth.2639)
- 16. Peccoud J, Ycart B. 1995. Markovian modeling of gene-product synthesis. Theor. Popul. Biol. 48, 222-234. (doi:10.1006/tpbi.1995.1027)
- 17. Larsson AJ, et al. 2019. Genomic encoding of transcriptional burst kinetics. Nature 565, 251-254. (doi:10.1038/s41586-018-0836-1)
- 18. Kim JK, Marioni JC. 2013. Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data. Genome Biol. 14, R7. (doi:10.1186/gb-2013-14-1-r7)
- 19. Ochiai H, et al. 2020. Genome-wide kinetic properties of transcriptional bursting in mouse embryonic stem cells. Sci. Adv. 6, eaaz6699. (doi:10.1126/sciadv.aaz6699)
- 20. Luo S, Wang Z, Zhang Z, Zhou T, Zhang J. 2022. Genome-wide inference reveals that feedback regulations constrain promoter-dependent transcriptional burst kinetics. Nucleic Acids Res. 11, 68-83. (doi:10.1093/nar/gkac1204)
- 21. Vu TN, Wills QF, Kalari KR, Niu N, Wang L, Rantalainen M, Pawitan Y. 2016. Beta-Poisson model for single-cell RNA-seq data analyses. Bioinformatics 32, 2128-2135. (doi:10.1093/bioinformatics/btw202)
- 22. Jiang Y, Zhang NR, Li M. 2017. SCALE: modeling allele-specific gene expression by single-cell RNA sequencing. Genome Biol. 18, 74. (doi:10.1186/s13059-017-1200-8)
- 23. Zhang J, Zhou T. 2014. Promoter-mediated transcriptional dynamics. Biophys. J. 106, 479-488. (doi:10.1016/j.bpj.2013.12.011)
- 24. Suter DM, Molina N, Gatfield D, Schneider K, Schibler U, Naef F. 2011. Mammalian genes are transcribed with widely different bursting kinetics. Science 332, 472-474. (doi:10.1126/science.1198817)
- 25. Van Kampen NG. 1992. Stochastic processes in physics and chemistry. Amsterdam, The Netherlands: Elsevier.
- 26. Fuda NJ, Ardehali MB, Lis JT. 2009. Defining mechanisms that regulate RNA polymerase II transcription in vivo. Nature 461, 186-192. (doi:10.1038/nature08449)
- 27. Harper CV, et al. 2011. Dynamic analysis of stochastic transcription cycles. PLoS Biol. 9, e1000607. (doi:10.1371/journal.pbio.1000607)
- 28. Zhang J, Zhou T. 2019. Markovian approaches to modeling intracellular reaction processes with molecular memory. Proc. Natl Acad. Sci. USA 116, 23 542-23 550. (doi:10.1073/pnas.1913926116)
- 29. Stumpf PS, et al. 2017. Stem cell differentiation as a non-Markov stochastic process. Cell Syst. 5, 268-282. (doi:10.1016/j.cels.2017.08.009)
- 30. Voss TC, Hager GL. 2014. Dynamic regulation of transcriptional states by chromatin and transcription factors. Nat. Rev. Genet. 15, 69-81. (doi:10.1038/nrg3623)
- 31. Pedraza JM, Paulsson J. 2008. Effects of molecular memory and bursting on fluctuations in gene expression. Science 319, 339-343. (doi:10.1126/science.1144331)
- 32. Jia T, Kulkarni RV. 2011. Intrinsic noise in stochastic models of gene expression with molecular memory and bursting. Phys. Rev. Lett. 106, 058102. (doi:10.1103/PhysRevLett.106.058102)
- 33. Zhang J, Chen L, Zhou T. 2012. Analytical distribution and tunability of noise in a model of promoter progress. Biophys. J. 102, 1247-1257. (doi:10.1016/j.bpj.2012.02.001)
- 34. Zhou T, Zhang J. 2012. Analytical results for a multistate gene model. SIAM J. Appl. Math. 72, 789-818. (doi:10.1137/110852887)
- 35. Zoller B, Nicolas D, Molina N, Naef F. 2015. Structure of silent transcription intervals and noise characteristics of mammalian genes. Mol. Syst. Biol. 11, 823. (doi:10.15252/msb.20156257)
- 36. Desai RV, et al. 2021. A DNA repair pathway can regulate transcriptional noise to promote cell fate transitions. Science 373, eabc6506. (doi:10.1126/science.abc6506)
- 37. Rodriguez J, Ren G, Day CR, Zhao K, Chow CC, Larson DR. 2019. Intrinsic dynamics of a human gene reveal the basis of expression heterogeneity. Cell 176, 213-226. (doi:10.1016/j.cell.2018.11.026)
- 38. Klindziuk A, Kolomeisky AB. 2018. Theoretical investigation of transcriptional bursting: a multistate approach. J. Phys. Chem. B 122, 11 969-11 977. (doi:10.1021/acs.jpcb.8b09676)
- 39. Sepúlveda LA, Xu H, Zhang J, Wang M, Golding I. 2016. Measurement of gene regulation in individual cells reveals rapid switching between promoter states. Science 351, 1218-1222. (doi:10.1126/science.aad0635)
- 40. Fritzsch C, Baumgärtner S, Kuban M, Steinshorn D, Reid G, Legewie S. 2018. Estrogen-dependent control and cell-to-cell variability of transcriptional bursting. Mol. Syst. Biol. 14, e7678. (doi:10.15252/msb.20177678)
- 41. Daigle BJ Jr, Soltani M, Petzold LR, Singh A. 2015. Inferring single-cell gene expression mechanisms using stochastic simulation. Bioinformatics 31, 1428-1435. (doi:10.1093/bioinformatics/btv007)
- 42. Schwabe A, Rybakova KN, Bruggeman FJ. 2012. Transcription stochasticity of complex gene regulation models. Biophys. J. 103, 1152-1161. (doi:10.1016/j.bpj.2012.07.011)
- 43. Kumar N, Singh A, Kulkarni RV. 2015. Transcriptional bursting in gene expression: analytical results for general stochastic models. PLoS Comput. Biol. 11, e1004292. (doi:10.1371/journal.pcbi.1004292)
- 44. Zhang J, Zhou T. 2019. Stationary moments, distribution conjugation and phenotypic regions in stochastic gene transcription. Math. Biosci. Eng. 16, 6134-6166. (doi:10.3934/mbe.2019307)
- 45. Stinchcombe AR, Peskin CS, Tranchina D. 2012. Population density approach for discrete mRNA distributions in generalized switching models for stochastic gene expression. Phys. Rev. E 85, 061919. (doi:10.1103/PhysRevE.85.061919)
- 46. Shi C, Jiang Y, Zhou T. 2020. Queuing models of gene expression: analytical distributions and beyond. Biophys. J. 119, 1606-1616. (doi:10.1016/j.bpj.2020.09.001)
- 47. Shelansky R, Abrahamsson S, Doody M, Brown CR, Patel HP, Lenstra TL, Larson DR, Boeger H. 2022. A telltale sign of irreversibility in transcriptional regulation. bioRxiv.
- 48. Blake WJ, Balázsi G, Kohanski MA, Isaacs FJ, Murphy KF, Kuang Y, Cantor CR, Walt DR, Collins JJ. 2006. Phenotypic consequences of promoter-mediated transcriptional noise. Mol. Cell 24, 853-865. (doi:10.1016/j.molcel.2006.11.003)
- 49. Dunham LS, et al. 2017. Asymmetry between activation and deactivation during a transcriptional pulse. Cell Syst. 5, 646-653. (doi:10.1016/j.cels.2017.10.013)
- 50. Lagha M, et al. 2013. Paused Pol II coordinates tissue morphogenesis in the Drosophila embryo. Cell 153, 976-987. (doi:10.1016/j.cell.2013.04.045)
- 51. Coulon A, Ferguson ML, de Turris V, Palangat M, Chow CC, Larson DR. 2014. Kinetic competition during the transcription cycle results in stochastic RNA processing. Elife 3, e03939. (doi:10.7554/eLife.03939)
- 52. Coleman RA, et al. 2017. p53 dynamically directs TFIID assembly on target gene promoters. Mol. Cell. Biol. 37, e00085-17. (doi:10.1128/MCB.00085-17)
- 53. Floyd DL, Harrison SC, Van Oijen AM. 2010. Analysis of kinetic intermediates in single-particle dwell-time distributions. Biophys. J. 99, 360-366. (doi:10.1016/j.bpj.2010.04.049)
- 54. Braichenko S, Holehouse J, Grima R. 2021. Distinguishing between models of mammalian gene expression: telegraph-like models versus mechanistic models. J. R. Soc. Interface 18, 20210510. (doi:10.1098/rsif.2021.0510)
- 55. Cox DR. 1955. The analysis of non-Markovian stochastic processes by the inclusion of supplementary variables. Cambridge, UK: Cambridge University Press.
- 56. Alfa AS, Rao TS. 2000. Supplementary variable technique in stochastic models. Probab. Eng. Inf. Sci. 14, 203-218. (doi:10.1017/S0269964800142068)
- 57. Zhang J, Nie Q, Zhou T. 2016. A moment-convergence method for stochastic analysis of biochemical reaction networks. J. Chem. Phys. 144, 194109. (doi:10.1063/1.4950767)
- 58. Franks JJ. 2020. Handbook of approximate Bayesian computation. Boca Raton, FL: CRC Press.
- 59. Sisson SA, Fan Y, Tanaka MM. 2007. Sequential Monte Carlo without likelihoods. Proc. Natl Acad. Sci. USA 104, 1760-1765. (doi:10.1073/pnas.0607208104)
- 60. Zechner C, Ruess J, Krenn P, Pelet S, Peter M, Lygeros J, Koeppl H. 2012. Moment-based inference predicts bimodality in transient gene expression. Proc. Natl Acad. Sci. USA 109, 8340-8345. (doi:10.1073/pnas.1200161109)
- 61. Fröhlich F, Thomas P, Kazeroonian A, Theis FJ, Grima R, Hasenauer J. 2016. Inference for stochastic chemical kinetics using moment equations and system size expansion. PLoS Comput. Biol. 12, e1005030. (doi:10.1371/journal.pcbi.1005030)
- 62. Ellison AM. 1987. Effect of seed dimorphism on the density-dependent dynamics of experimental populations of Atriplex triangularis (Chenopodiaceae). Am. J. Bot. 74, 1280-1288. (doi:10.1002/j.1537-2197.1987.tb08741.x)
- 63. Lenive O, Kirk PDW, Stumpf MPH. 2016. Inferring extrinsic noise from single-cell gene expression data using approximate Bayesian computation. BMC Syst. Biol. 10, 81. (doi:10.1186/s12918-016-0324-x)
- 64. Ochab-Marcinek A, Tabaka M. 2010. Bimodal gene expression in noncooperative regulatory systems. Proc. Natl Acad. Sci. USA 107, 22 096-22 101. (doi:10.1073/pnas.1008965107)
- 65. Sarkar A, Stephens M. 2021. Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis. Nat. Genet. 53, 770-777. (doi:10.1038/s41588-021-00873-4)
- 66. Larsson AJ, Ziegenhain C, Hagemann-Jensen M, Reinius B, Jacob T, Dalessandri T, Hendriks G-J, Kasper M, Sandberg R. 2021. Transcriptional bursts explain autosomal random monoallelic expression and affect allelic imbalance. PLoS Comput. Biol. 17, e1008772. (doi:10.1371/journal.pcbi.1008772)
- 67. Larsson AJ, Coucoravas C, Sandberg R, Reinius B. 2019. X-chromosome upregulation is driven by increased burst frequency. Nat. Struct. Mol. Biol. 26, 963-969. (doi:10.1038/s41594-019-0306-y)
- 68. Tantale K, et al. 2021. Stochastic pausing at latent HIV-1 promoters generates transcriptional bursting. Nat. Commun. 12, 4503. (doi:10.1038/s41467-021-24462-5)
- 69. Engl C, Jovanovic G, Brackston RD, Kotta-Loizou I, Buck M. 2020. The route to transcription initiation determines the mode of transcriptional bursting in E. coli. Nat. Commun. 11, 2422. (doi:10.1038/s41467-020-16367-6)
- 70. Dobrzyński M, Bruggeman FJ. 2009. Elongation dynamics shape bursty transcription and translation. Proc. Natl Acad. Sci. USA 106, 2583-2588. (doi:10.1073/pnas.0803507106)
- 71. Keene J, et al. 2001. As examples accumulate of both ARE-bearing stable mRNAs and labile mRNAs lacking AREs, the ARE dogma has incrementally given way to alternative bona. Nat. Rev. Mol. Cell Biol. 2, 237-246. (doi:10.1016/j.molcel.2008.01.007)
- 72. Decker CJ, Parker R. 1993. A turnover pathway for both stable and unstable mRNAs in yeast: evidence for a requirement for deadenylation. Gene Dev. 7, 1632-1643. (doi:10.1101/gad.7.8.1632)
- 73. Gorin G, Pachter L. 2022. Modeling bursty transcription and splicing with the chemical master equation. Biophys. J. 121, 1056-1069. (doi:10.1016/j.bpj.2022.02.004)
- 74. Fu X, Patel HP, Coppola S, Xu L, Cao Z, Lenstra TL, Grima R. 2022. Quantifying how post-transcriptional noise and gene copy number variation bias transcriptional parameter inference from mRNA distributions. Elife 11, e82493. (doi:10.7554/eLife.82493)
- 75. Kim JK, Kolodziejczyk AA, Illicic T, Teichmann SA, Marioni JC. 2015. Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression. Nat. Commun. 6, 8687. (doi:10.1038/ncomms9687)
- 76. Gorin G, Fang M, Chari T, Pachter L. 2022. RNA velocity unraveled. PLoS Comput. Biol. 18, e1010492. (doi:10.1371/journal.pcbi.1010492)
- 77. Luo S, Zhang Z, Wang Z, Yang X, Chen X, Zhou T, Zhang J. 2023. Inferring transcriptional bursting kinetics from single-cell snapshot data using a generalized telegraph model. Figshare. (doi:10.6084/m9.figshare.c.6486197)