This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Jun 23:2023.06.20.545522. [Version 1] doi: 10.1101/2023.06.20.545522

Bayesian analysis dissects kinetic modulation during non-stationary gene expression

Christian Wildner 1,+, Gunjan D Mehta 2,+, David A Ball 3, Tatiana S Karpova 3, Heinz Koeppl 1,*
PMCID: PMC10370195  PMID: 37503023

Abstract

Labelling of nascent stem loops with fluorescent proteins has fostered the visualization of transcription in living cells. Quantitative analysis of recorded fluorescence traces can shed light on kinetic transcription parameters and regulatory mechanisms. However, existing methods typically focus on steady state dynamics. Here, we combine a stochastic process transcription model with a hierarchical Bayesian method to infer global as well as locally shared parameters for groups of cells and recover unobserved quantities such as initiation times and polymerase loading of the gene. We apply our approach to the cyclic response of the yeast CUP1 locus to heavy metal stress. Within the previously described slow cycle of transcriptional activity on the scale of minutes, we discover fast time-modulated bursting on the scale of seconds. Model comparison suggests that slow oscillations of transcriptional output are regulated by the amplitude of the bursts. Several polymerases may initiate during a burst.

Introduction

Transcription is one of the fundamental processes of cellular life. RNA synthesis consists of the three major steps of initiation, elongation and termination. A closer look reveals that the individual steps are highly regulated and subject to intrinsic as well as extrinsic stochastic effects1–4. The details of transcriptional regulation on the molecular level are still far from understood.

Single-cell measurements of RNA have revealed that transcription is not only heterogeneous between cells or within the genome but can also change for the same gene over time. The frequently observed pattern of high transcriptional activity interspersed with periods of silence is known as transcriptional bursting5. Recent evidence suggests that bursting may occur on multiple superimposed timescales6. It is unclear how these different timescales are regulated. A common question of interest is whether transcriptional output is regulated by burst amplitude, burst frequency, or burst duration7–9. Traditionally, this is investigated using RNA counting data. Continuous transcription with exponentially distributed time intervals between initiation events leads to a Poisson distribution of mature mRNA. Therefore, deviations from the Poisson distribution may indicate bursty transcription. Early work in this direction modeled the promoter as a telegraph process that stochastically switches between transcriptionally active and inactive states. In the active state, transcriptional output follows a Poisson process. For inference, predicted distributions of the model are matched to empirical RNA count histograms5,10,11. Later, the method was extended to multi-state models and to include nascent mRNA by solving the chemical master equation of the underlying system numerically2,9. While successful at confirming bursting, information theory suggests that RNA counting data is fundamentally limited in distinguishing different multi-state promoters12,13. In addition, the employed models oversimplify elongation and termination. Therefore, all variance in the observed data is necessarily attributed to the initiation process, which may bias results towards more complex promoter models.

Novel imaging techniques such as the stem loop approach now enable time-resolved measurements of single transcription sites (TS) in live cells and in real time by fusing a fluorescent marker to a binding protein that attaches to hairpin structures formed by the nascent mRNA14,15. Observed by a fluorescence microscope, the TS appears as a moving diffraction-limited spot with fluctuating intensity. An idealized time trace of a single polymerase consists of an approximately uniform increase in intensity, followed by a plateau phase when all stem loops have been formed and a sharp drop when the transcript detaches from the transcription site16. For actively transcribed genes, the observed intensity is a superposition of several polymerases that have initiated with varying inter-event times. Together with other sources of noise accumulated during image acquisition, the resulting trace may seem highly random. Thus, while time-resolved measurements of single transcription sites are potentially more informative than RNA counting data, they are also much more challenging from an inference perspective. Phenomenologically, bursting has been studied by binarizing the fluorescent traces via thresholding. Modifying suspected regulators then allows one to measure the corresponding effect on burst statistics17. However, for fast switching dynamics, extracting bursts directly from the trace is prone to error18. First, a simple Poisson signal convolved with non-Gaussian observation noise and measured with a detection threshold may give an impression of burstiness. Second, if the time between two bursts is shorter than the production time of the nascent mRNA, a bursty signal may be classified as continuous. A more sophisticated method for trace analysis, which allows one to determine the initiation rate and the expected production time of nascent transcripts, is based on the autocorrelation function of the intensity signal16,19.
From a stochastic kinetic model of transcription, a theoretical autocorrelation function for the system is computed and then fit to match the empirical autocorrelation function of the traces. This provides a computationally efficient approach to extract average mRNA production times. An adaptation for dual-color labelling of the same transcript is also available20. However, by design, fluctuation analysis works best for stationary systems and long observation times. Extensions to non-stationary settings or more complex transcription models involving multi-state promoters or interactions between individual polymerases are challenging and currently rely on phenomenological corrections20,21. An alternative idea is to split the problem into two parts. In a first step, initiation times are reconstructed from a fluorescent trace by a deconvolution algorithm. In a second step, the recovered initiation time distribution is then compared to theoretical predictions of different promoter models22. While this form of analysis proved effective, it requires reliable extraction of initiation time sequences which may not be possible for more irregular signals. Bayesian inference in combination with stochastic process models of transcription provides a principled framework to extract information from individual single-site traces. Due to the high computational demands, studies in this direction have been limited to simplified models where elongation and termination are treated as deterministic processes23. More complex models with stochastic elongation and termination have so far only been used within moment-based or simulation-based inference frameworks24,25.

In this work, we use a kinetic model with stochastic treatment of the main transcription steps and develop a hierarchical Bayesian framework that performs joint inference on a collection of traces. The hierarchical approach allows one to jointly infer cycle-independent parameters shared by all cells and cycle-dependent parameters shared by cells within the same time window. This improves accuracy significantly compared to performing inference on individual traces and then pooling the results. We use this approach to investigate dynamic changes in the kinetics of transcription for the CUP1 promoter in Saccharomyces cerevisiae. Previously, CUP1 has been shown to undergo a slow cycle of transcriptional activity with variable transcriptional output on a timescale of minutes in response to a heavy metal stressor26. Within this slow cycle, fast bursts of transcription on the scale of seconds, regulated by fast interdependent cycling of transcription activator and chromatin remodeler, were inferred from smFISH modeling27. In this work, we investigated the first period of the slow cycle by monitoring transcription sites in live cells using the stem loop approach and Bayesian inference. To account for the non-stationary setting, we split the cycle into short windows with a higher frame rate and used the hierarchical Bayesian inference framework. By employing stochastic variational inference28, the method can handle datasets consisting of several thousand traces. Model comparison of several candidate models reveals fast bursting of CUP1 on the scale of seconds, indicated by quasiperiodic transcription in individual cells through the slow cycle of bursting. This bursting on the order of seconds is comparable in timescale with the previously observed cycling of the transcription activator on CUP1 promoters27. Our discovery supports the hypothesis of fast bursts of transcription activated by fast cycling of the transcription factor (TF).
Regulation of the CUP1 transcriptional output occurs most likely via modulation of the burst amplitude. In addition to parameter posterior distributions, our method can recover unobserved dynamic quantities such as initiation times and polymerase loading via latent state inference based on stochastic filtering. We demonstrate that multiple polymerases may be loaded onto the same promoter during a burst. We also observe that the elongation speed of RNAP II varies, undergoing a slow cycle correlated with the slow cycle of transcription output. Our method reveals a delay between rise-and-fall patterns of observed bursts and the actual intervals of activity. While we demonstrated the method on non-stationary transcription data, our approach is applicable to a wide range of systems that can be modeled as a Markov jump process and can be straightforwardly adapted to other sources of heterogeneity. We provide a corresponding Python toolbox available at XXX.

Results

In continuous presence of Cu2+, CUP1 undergoes bursts of transcription

CUP1 encodes the metallothionein protein Cup1 that protects the cells from heavy metal stress. CUP1 is present in 10 tandem copies per chromosome VIII in Saccharomyces cerevisiae26. In yeast cells activated with Cu2+, quantification of mature mRNA by RT-qPCR reveals oscillations in CUP1 mRNA transcriptional output in the cell population (Fig. 1a). This indicates that, as described for several but not all systems, in continuous presence of an activator, CUP1 transcription occurs not continuously but in oscillations: periods of transcriptional activity are interspersed with periods of transcriptional silence (Fig. 1h). Moreover, transcription output is modulated through the cycle. As shown previously, these oscillations are not dependent on cell cycle26. In living cells, we can monitor nascent mRNA formation at the CUP1 TS via the stem loop approach (Fig. 1b,c)14,15. A single ORF within the array of CUP1 genes was replaced in one chromosome of the diploid yeast by a reporter encoding PP7 stem loops, visualized by the PP7 phage coat protein (PCP) tagged with GFP as a single green spot. The signal-to-noise ratio was optimized in this system by low-level expression of PCP-GFP under a constitutive promoter pSEC61 (see SI Appendix, Sec. S5.1). As the mRNA of the reporter contains only stem loops, it is not translated due to abundant stop codons, and thus, no protein is produced. The reporter is controlled by a natural pCUP1 within the natural CUP1 array; thus, expression of the reporter characterizes initiation from the CUP1 promoters. However, the reporter sequence is different from the natural CUP1 ORF, and the transcript is longer. Thus, the production time for this reporter may not reflect the production of the wild-type CUP1 ORF sequence.

Figure 1. Engineering, visualization, and characterization of the CUP1 transcription site.

a Oscillations in CUP1 mRNA expression level quantified by RT-qPCR and normalized by expression of the housekeeping gene ACT1. Error bars represent standard error of the mean (SEM) from two biological replicates. b Schematic of the 14x PP7 reporter replacing one copy of the CUP1 ORF in chromosome VIII of S. cerevisiae. Two types of the stem loop sequence (red and green stems, purple loops, or bulges) are present in the reporter sequence; each stem is bound by a PP7-GFP dimer. Hence, 28 GFP molecules associate with a single mRNA. The length of the reporter transcript is 862 bp; however, a few transcripts may be longer as there is no terminator immediately after 14xPP7. The expression of KANMX observed by RT-qPCR is constitutive and does not follow the oscillations of 14xPP7 transcripts, which indicates that the great majority of 14xPP7 transcripts stop before the KANMX ORF (data not shown). c Example field view of cells with active TS containing nascent 14x PP7 reporter transcripts. Cells were imaged after 9 min of Cu2+ induction. Z-stack of the entire cell volume is presented as a maximum intensity projection for the GFP channel. Scale: 5 μm. d TS in individual cells display independent spikes of activity. TS dynamics from 10 representative cells are presented for the first 21 min since Cu2+ addition, imaged with 1 min time-lapse. Maximum intensity projections of the entire z-volume of the cells were cropped by keeping the TS in the center of the 13×13 pixels area. e Illustration of the movie collection for datasets imaged with 3 s time-lapse. Movies on the same coverslip are started every 3 min and are recorded for 90 s (green blocks). The remaining time is used to move the microscope to the next position on the coverslip. By collecting several such movie sequences starting either 3 min or 4.5 min after induction, the whole first cycle is covered. f The fraction of cells in the population showing an active TS follows the oscillation pattern of expression of the whole CUP1 locus.
Cells with TS were counted in independent fields imaged sequentially with 3 min time interval. Error bars represent standard error of the percentage (SEP). g TS in individual cells express more transcripts at the peak of the oscillation. In sequential 90 s movies collected after Cu2+ addition with 3 s time-lapse, the spot intensities were measured in the first frame of every movie. The graph shows the population average of these spot intensities, indicating that transcriptional output of individual cells follows the oscillation pattern of the whole system. h Schematics of CUP1 multi-scale bursting. Top – long oscillations between transcription (ON1 state) and no-transcription phases (OFF1 state) on the population level (cf. a and d). Bottom – short transcription spikes (ON2 and OFF2 states) as observed for individual TS (f). Transcriptional amplitude of the spikes is defined by the average number of mRNA produced per spike (striped lines represent different nascent mRNA produced during a spike). The gene switches from an inactive OFF2 state to an active ON2 state with rate kon, and back to the OFF2 state with rate koff. As this work focuses on the first ON1 phase, subscripts are dropped from here on.

By counting the fraction of the cells with active nascent reporter mRNA over time (responder ratio), we confirmed a time-modulated response to constant activation by Cu2+ (first oscillation is quantified in Fig. 1f). Interestingly, in the movies of TS the average brightness of the individual TS also changed depending on the time of activation, indicating that at the peak of oscillation each TS produces more mRNA (Fig. 1g). This implies that the time-varying transcriptional output on the population level observed by RT-qPCR cannot be explained by a change in the number of responding cells alone but that parameters of transcription are modulated over time. Therefore, CUP1 transcription is not in a steady state. Interestingly, individual TS observed through the first oscillation show independent dynamics, and the majority of the cells display several bursts of activity (Fig. 1d, exemplary cells no. 1, 3, 5, 7, 9). This suggests modulation on two scales: long-term oscillations on the order of several minutes governing transcriptional output of the cell population (slow cycle) and fast spikes (bursting) on the order of seconds regulating output of individual TS (Fig. 1h). The long-term oscillations are schematically represented by interspersed ON1 and OFF1 states. The short-term spikes are represented by interspersed ON2 and OFF2 states.

In this paper, we focused on the characterization of transcriptional activity within the first transcription phase of roughly 30 min. Due to photobleaching, such a long period cannot be imaged with our setup under a high frame rate. More importantly, the observed modulation of transcriptional output indicates that averaging over long periods of observation may hide putative oscillatory variability in transcription parameters. We therefore split the cycle into small windows and performed parameter inference on all traces within the same window (cf. Fig. 4a). The first dataset was recorded with 12 s time-lapse for 300 s. We observed that a time-lapse of 12 s was too long for the relatively short gene template, leading to problems separating elongation speed and termination rate. In addition, a duration of 5 min turned out too long to capture dynamic parameter changes adequately. Thus, we collected a second dataset with a duration of 90 s imaged with a time-lapse of 3 s (Fig. 1e), leading to high-frame-rate coverage of the first 30 min after induction. After quality control, we extracted fluorescent traces with a custom tracker based on three-dimensional Gaussian fitting of the transcription site combined with ideas from stochastic filtering (SI Appendix, Sec. S12). Thus, we obtained two datasets, each containing more than 3000 traces, with varying starting times during the first transcription phase. A detailed description of the pre-processing and the collected datasets is provided in SI Appendix, Sec. S6. To quantitatively analyze these non-stationary datasets, we developed a Bayesian inference approach based on a stochastic kinetic model of transcription discussed in the following sections.

Figure 4. Hierarchical Bayesian inference of the first slow cycle.

a Illustration of the hierarchical model for the first slow cycle. Traces are grouped into windows according to the starting time with respect to Cu2+ induction. Parameters are split into local and global parameters. Local parameters are shared between traces of the same window, global parameters are shared between all traces. While for illustration purposes only three windows are shown, the traces of the 3 s dataset are assigned to 19 windows. b A PGM representation of the hierarchical inference problem for a model with local initiation rate and all remaining parameters constant. c Illustration of the predictive score S. Synthetic traces are simulated from the fitted model. The intensity distribution of the simulated data is compared to the intensity distribution of the real data by the Wasserstein metric, a general similarity measure for probability distributions. The score S is obtained by averaging over all windows and the parameter posterior. d Two-fold model selection using the posterior predictive score S, which is based on the Wasserstein distance, and the approximate Bayes factor ΔELBO. For S, smaller values indicate that simulated data is more similar to measured data (smaller is better); for ΔELBO, higher values indicate a larger probability of the model compared to a reference model. The reference model, assuming no switching and only global parameters, is the same for all data points. e Mean intensity of the first frame of traces over the slow cycle. The dashed line indicates experimental results of the 3 s dataset (cf. Fig 1), the shaded region corresponds to a 90 % credible interval of the posterior predictive distribution. f, g Gaussian kernel density representations of the SVI approximate parameter posteriors of the best ranking model. f Local initiation rate per time-window since induction. g Global kinetic parameters.

Stochastic kinetic model of transcription

Our model (Fig. 2) is based on the totally asymmetric exclusion process (TASEP), where the DNA template is partitioned into L sites30. Polymerases initiate at site 1 with rate ki. After initiation, elongation proceeds in discrete steps with rate ke. Termination occurs at site L with rate kt (Fig. 2a). Early termination is not permitted. While typical models of transcription assume independent polymerases, the TASEP model permits at most one polymerase per site allowing possible interactions in actively transcribed genes. Our model adopts this behavior for all but the termination site accounting for the possibility of transcripts residing at the TS after elongation. A typical progression of a single polymerase is shown in Fig. 2c.
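The transition rules described above can be sketched as a small Gillespie simulation. This is a minimal illustration and not the authors' toolbox: the function name `simulate_tasep`, the event bookkeeping, and the assumption that every polymerase waiting at the last site terminates independently with rate kt are choices made here for illustration only.

```python
import random

def simulate_tasep(L, k_i, k_e, k_t, T, seed=0):
    """Gillespie simulation of a TASEP-like lattice with L sites up to time T.

    Sites 0..L-2 hold at most one polymerase. The last site may hold several
    (assumption: each of them terminates independently with rate k_t).
    Returns a list of (time, occupancy tuple), one entry per event.
    """
    if L < 2:
        raise ValueError("need at least two sites")
    rng = random.Random(seed)
    occ = [0] * L
    t = 0.0
    path = [(t, tuple(occ))]
    while True:
        # Collect all currently possible events with their rates.
        events = []
        if occ[0] == 0:
            events.append((k_i, "init", 0))
        for s in range(L - 1):
            target_free = (s + 1 == L - 1) or occ[s + 1] == 0
            if occ[s] == 1 and target_free:
                events.append((k_e, "hop", s))
        if occ[L - 1] > 0:
            events.append((k_t * occ[L - 1], "term", L - 1))
        total = sum(rate for rate, _, _ in events)
        if total <= 0.0:
            break
        # Exponential waiting time, then pick an event proportional to its rate.
        t += rng.expovariate(total)
        if t > T:
            break
        u = rng.random() * total
        acc = 0.0
        for rate, kind, s in events:
            acc += rate
            if u < acc:
                break
        if kind == "init":
            occ[0] = 1
        elif kind == "hop":
            occ[s] -= 1
            occ[s + 1] += 1
        else:
            occ[L - 1] -= 1
        path.append((t, tuple(occ)))
    return path
```

Calling `simulate_tasep(L=8, k_i=0.5, k_e=2.0, k_t=0.3, T=200.0)` returns the piecewise-constant occupancy path; summing each occupancy tuple gives the polymerase loading over time.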

Figure 2. Stochastic kinetic model of transcription.

a Kinetic transcription model based on TASEP. The DNA template is coarse-grained and partitioned into sites of 120 nt corresponding roughly to twice the footprint of a stem loop. Therefore, each of the first sites is associated with two dual GFP. The promoter switches between a transcriptionally active state (green) and an inactive state (red) with rates kon and koff. In the active state, polymerases initiate with rate ki, step along the lattice at rate ke and terminate with rate kt. b Illustration of a two-state promoter model switching between an inactive and an active state with rates kon and koff. These parameters implicitly define other bursting-related quantities such as the burst duration τon, the time between bursts τoff and the burst frequency fb. The burst amplitude Nb is defined as the number of initiation events per burst. c Illustration of a single polymerase initiating and progressing on the lattice. The three kinetic parameters of the TASEP model determine quantities such as termination time τt, elongation time τe and mRNA production time τp. The relation for expected elongation time is only valid when the polymerase density is low; in general, there is no closed-form expression available due to possible polymerase interactions. d Illustration of how to simulate synthetic traces. First, a trajectory from the telegraph-augmented TASEP model is created using the Gillespie algorithm. This typically produces a superposition of several polymerases. This is illustrated in form of a kymograph plot that shows the probability of a site to be occupied over time. From the full occupancy X(t) we can extract unobserved quantities of interest such as the polymerase loading Np(t), or initiation, elongation and production times of mRNAs (panel c) and the time between initiation events Δτi (panel b). To simulate the measured fluorescence intensity, we first extract the number of stem loops NS(t) by summing the contributions of all polymerases.
The spot intensity I(t) is then formed by multiplication with intensity per GFP γ, addition of background levels b0 and b1 and exponential bleaching with rate λ. Finally, the continuous-time intensity is sampled at equidistant time points and convolved with multiplicative noise to obtain the simulated measurements Y(t).
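The derived quantities named in panels b and c can be computed from the rate parameters. The sketch below uses the standard relations for a two-state telegraph promoter and for a single unobstructed polymerase; these formulas are assumptions made here, since the expressions shown in the figure are not reproduced in the text, and the function name `burst_summary` is hypothetical.

```python
def burst_summary(k_on, k_off, k_i, k_e, k_t, L):
    """Derived bursting and timing quantities (assumed standard relations).

    tau_e uses the low-density approximation: a polymerase starting at
    site 1 needs L - 1 hops at rate k_e to reach site L.
    """
    tau_on = 1.0 / k_off           # mean burst duration
    tau_off = 1.0 / k_on           # mean time between bursts
    tau_e = (L - 1) / k_e          # mean elongation time (low density only)
    tau_t = 1.0 / k_t              # mean termination time
    return {
        "tau_on": tau_on,
        "tau_off": tau_off,
        "f_b": 1.0 / (tau_on + tau_off),   # burst frequency
        "N_b": k_i * tau_on,               # mean initiations per burst, ignoring exclusion
        "tau_e": tau_e,
        "tau_t": tau_t,
        "tau_p": tau_e + tau_t,            # mean mRNA production time
    }
```

Under these relations, changing ki alters the burst amplitude Nb without affecting the burst frequency fb, whereas changing kon or koff alters fb.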

The times between transitions are exponentially distributed. Thus, the model is a continuous-time Markov chain (CTMC) describing the stochastic movement of RNAP II on the DNA template by the occupancy vector X(t) = (X1(t), …, XL(t)). In order to model bursting, we introduce an additional promoter site X0(t) that switches between an active and an inactive state with rates kon and koff (Fig. 2b). In the extended model, initiation at site X1(t) is only allowed if the promoter site X0(t) is in the active state. This behavior is akin to the random telegraph model31 often used for the analysis of RNA counting data9,10,17,32. Within this model, transcription dynamics are governed by the vector of parameters θ = (kon, koff, ki, ke, kt). The extended model is still a CTMC, therefore samples can be generated by the Gillespie algorithm. We denote such a full sample path as X[0,T]. The transient probability p(x,t) ≡ Pr(X(t) = x) satisfies a master equation

$$\frac{d}{dt}\, p(x,t) = \sum_{x'} Q(x', x \mid \theta)\, p(x', t) \tag{1}$$

where the sum is over all possible configurations of the lattice and Q is the transition function of the process parametrized by θ. Details on how to construct Q are given in SI Appendix, Sec. S8.1.
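For a very small lattice, the transition function and the master equation can be written out explicitly. The following sketch is not the construction from the SI Appendix: it assumes strict exclusion on every site (including the last one) so that the state space stays finite, stores Q as a dense matrix with Q[a][b] the rate from state a to state b, and integrates with a plain forward Euler step. The names `build_generator` and `integrate` are hypothetical.

```python
from itertools import product

def build_generator(L, k_on, k_off, k_i, k_e, k_t):
    """Dense generator for states (promoter, site_1, ..., site_L) in {0,1}^(L+1)."""
    states = list(product((0, 1), repeat=L + 1))
    index = {s: n for n, s in enumerate(states)}
    n = len(states)
    Q = [[0.0] * n for _ in range(n)]

    def add(src, dst, rate):
        a, b = index[src], index[dst]
        Q[a][b] += rate   # flow into dst
        Q[a][a] -= rate   # matching loss from src, so rows sum to zero

    for s in states:
        prom, occ = s[0], list(s[1:])
        # Promoter switching is independent of the lattice.
        add(s, (1 - prom, *occ), k_on if prom == 0 else k_off)
        # Initiation needs an active promoter and a free first site.
        if prom == 1 and occ[0] == 0:
            new = occ[:]
            new[0] = 1
            add(s, (prom, *new), k_i)
        # Elongation: hop to the next site if it is free.
        for j in range(L - 1):
            if occ[j] == 1 and occ[j + 1] == 0:
                new = occ[:]
                new[j], new[j + 1] = 0, 1
                add(s, (prom, *new), k_e)
        # Termination from the last site.
        if occ[L - 1] == 1:
            new = occ[:]
            new[L - 1] = 0
            add(s, (prom, *new), k_t)
    return states, Q

def integrate(Q, p0, T, dt=0.01):
    """Forward Euler for dp(x)/dt = sum_x' Q[x'][x] p(x')."""
    n = len(p0)
    p = list(p0)
    for _ in range(int(round(T / dt))):
        dp = [sum(Q[a][b] * p[a] for a in range(n)) for b in range(n)]
        p = [p[b] + dt * dp[b] for b in range(n)]
    return p
```

The state space grows as 2^(L+1), which is why this brute-force form is only practical for toy lattices; a sanity check is that the marginal probability of an active promoter relaxes to kon/(kon + koff).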

In order to compare simulated to measured traces, the occupancy X(t) has to be converted to predicted intensity (Fig. 2d). As the positions of stem loops are known, one can compute the number of GFPs attached to every nascent mRNA from the lattice occupancy X(t). To convert the number of GFPs to predicted intensity, one requires a scaling factor γ, the bleaching rate λ, and background variables b0, b1. The predicted intensity is then sampled at measurement times t1, …, tn and passed through a multiplicative noise model to obtain a synthetic trace Y = (Y1, …, Yn). For later use, we combine all parameters related to the observation model into the vector ω. A detailed description of the observation model can be found in SI Appendix, Sec. S8.2.
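The conversion from occupancy to a noisy intensity can be illustrated as follows. The exact functional form is given only in the SI Appendix, so everything here is an assumption: the intensity is taken as (γ·N_GFP + b0)·exp(−λt) + b1, the noise is taken to be log-normal, and the stem loop layout (two loops on each of the first seven sites, two GFPs per loop, giving 28 GFPs per complete transcript) is an illustrative reading of Fig. 1b and Fig. 2a. All function names are hypothetical.

```python
import math
import random

def gfp_per_polymerase(site, loops_per_site, gfp_per_loop=2):
    """GFPs carried by a polymerase at `site` (1-based): loops transcribed so far."""
    return gfp_per_loop * sum(loops_per_site[:site])

def predicted_intensity(occ, t, loops_per_site, gamma, lam, b0, b1):
    """Assumed form: bleached signal plus bleached background b0 plus offset b1."""
    n_gfp = sum(n * gfp_per_polymerase(j + 1, loops_per_site)
                for j, n in enumerate(occ))
    return (gamma * n_gfp + b0) * math.exp(-lam * t) + b1

def noisy_measurement(intensity, sigma, rng):
    """Multiplicative noise, assumed log-normal with log-scale sigma."""
    return intensity * math.exp(rng.gauss(0.0, sigma))
```

For example, with `loops_per_site = [2] * 7 + [0]`, a polymerase on the last site carries 28 GFPs, matching the count stated for a complete reporter mRNA.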

Calibration of the observation model

The parameters of the observation model have a major impact on the simulated measurements. Since this can cause issues with parameter identifiability, we perform independent calibration measurements for the scaling factor γ and the bleaching rate λ. To estimate the scaling factor, we engineered three yeast strains with sub-cellular structures attached to a known number of GFP molecules (see Methods - Yeast strains and plasmids) and measured spot intensities for a number of cells. As shown in Fig. 3a, the intensity distributions of the three constructs roughly follow a linear shape. Next, we combined the observation model for intensity prediction with vague priors and computed a posterior distribution using Monte Carlo. A similar Bayesian calibration was applied to time-lapse data of a static construct to determine the bleaching rate. An in-depth description is given in SI Appendix, Sec. S9. The posterior distributions from the calibration measurements were used as priors for the time trace inference.

Figure 3. Inference from single cell traces.

a Intensity distribution of the three strains used for calibration over the expected number of GFPs. Darker regions indicate a higher density. The red line indicates a linear regression fit to the data. b The autocorrelation of a single TS is generated by shifting the signal in time by a delay τ, multiplying it with the original signal, and integrating. By repeating this process over many values of τ, the ACF is generated for a single trace. The ACF averaged over many traces can reveal temporal characteristics of the system. c Left panel: Average autocorrelation of 1000 simulated trajectories for a time-lapse of 3 s observed over 90 s and 900 s, showing an ideal autocorrelation function that could be analyzed to extract transcription parameters. Right panel: Average autocorrelation of CUP1 transcription sites from 282 cells imaged for 90 s with 3 s time-lapse after 9 min of Cu2+ activation compared to simulated results. d Illustration of Bayesian inference for single cell traces. From a measured trace Y(t), we obtain the posterior distributions of model parameters θ and observation parameters ω. Note that this is an illustration as θ and ω are vectors of multiple parameters. In addition, latent state inference recovers the most likely sample paths of the unobserved lattice process X(t) and the promoter state X0(t). From these traces, other dynamic quantities of interest such as the polymerase loading Np(t) and the number of active stem loops Ns(t) can be extracted. e Probabilistic graphical model representation of the single trace inference problem. Arrows indicate conditional relationships in the data generating process, grey color indicates that the corresponding node is observed29. The process X(t) is sampled at times t1, …, tn and observed via noisy measurements Y = (Y1, …, Yn). As illustrated by the nodes X(ti + h), X(t) is continuous in time.
f Graphical model of the joint inference problem with pooling of m traces to infer shared parameters. A plate indicates multiple conditionally independent variables given the parent29. Every pair (X[0,T](i), Y(i)) is of the form shown in e. g Gaussian kernel density representation of the results of joint Bayesian inference of the initiation rate on an increasing number of pooled simulated traces. As the number of traces increases, the posterior concentrates around the true value used to generate the data.

Autocorrelation analysis is not applicable

A traditional approach to estimate kinetic parameters of live transcription sites is to use the autocorrelation function (ACF) of the intensity traces16,19,21 (Fig. 3b). However, ACF analysis is designed for long traces in a steady state setting. To test the applicability of this analysis to CUP1 transcription, we analyzed traces for a single time window starting 9 min after the addition of Cu2+ (Methods) where TS activity is close to its peak (cf. Fig. 1g). The average ACF of real data, collected for 90 s total, is difficult to interpret and cannot be used for extracting kinetic parameters (Fig. 3c, right panel). As the ACF is still declining even at the longest delays, 90 s appears to be an insufficient amount of time for correlation analysis. In fact, our simulations suggested that a TS has to be observed for at least 15 min (Fig. 3c, left panel). Although it may be possible to obtain an interpretable ACF by collecting over a longer time period, this would also lead to averaging over time-points from different parts of the slow cycle, thereby imposing the steady state assumption.
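To make the data limitation concrete, the sketch below computes a normalized empirical ACF for a discretely sampled trace. It is a generic estimator and not necessarily the one used for Fig. 3; the function names and the choice to normalize by the lag-0 variance are assumptions. For a 90 s movie at 3 s time-lapse (about 30 frames), a lag of 60 s leaves only about 10 frame pairs to average over.

```python
def autocorrelation(trace, max_lag):
    """Normalized autocorrelation of one trace for lags 0..max_lag (in frames)."""
    n = len(trace)
    mean = sum(trace) / n
    dev = [v - mean for v in trace]
    var = sum(d * d for d in dev) / n
    acf = []
    for lag in range(max_lag + 1):
        # Average the lagged products over the n - lag available frame pairs.
        cov = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
        acf.append(cov / var)
    return acf

def mean_autocorrelation(traces, max_lag):
    """Average the per-trace ACFs over a collection of traces."""
    acfs = [autocorrelation(tr, max_lag) for tr in traces]
    return [sum(a[lag] for a in acfs) / len(acfs) for lag in range(max_lag + 1)]
```

Because the number of usable pairs shrinks linearly with the lag, the estimate at long delays is dominated by a handful of products, which is one reason short traces yield an ACF that is hard to interpret.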

Pooled Bayesian inference identifies model parameters

The goal of Bayesian inference is essentially to invert the data generation process depicted in Fig. 2. Given a measured trace y = (y1, …, yn), we want to reconstruct the kinetic parameters θ and the observation parameters ω (parameter inference). In addition, a stochastic process model also allows one to reconstruct the most probable lattice configurations over time x[0,T] given the data (state inference, Fig. 3d). More formally, this corresponds to computing the joint posterior distribution p(θ, ω, x[0,T] | y1, …, yn). A probabilistic graphical model (PGM) representation of the single trace inference problem is given in Fig. 3e.

Sampling from this posterior distribution using Markov chain Monte Carlo (MCMC) involves evaluating the marginal data likelihood p(y1, …, yn | θ, ω), which in turn requires integration of the master equation (1) for many different configurations of model parameters θ and observation parameters ω. We developed an efficient approach to evaluate the marginal likelihood and its gradient in parallel for multiple traces, which makes it possible to apply efficient gradient-based inference algorithms such as Hamiltonian Monte Carlo (HMC) and stochastic variational inference (SVI) (SI Appendix, Sec. S10).

Inference from single traces is often challenging due to issues with parameter identifiability33. To test identifiability for our setup, we simulated a set of synthetic traces following the steps illustrated in Fig. 2 using a fixed parameter configuration (SI Appendix, Table S14). Indeed, Bayesian inference of a single trace essentially reproduces the prior distribution, indicating that a single trace does not contain sufficient information to identify the system (Fig. 3g, left panel). Pooling multiple traces and performing inference jointly (Fig. 3f) can improve the results substantially. Indeed, inference accuracy increases with the number of pooled cells, implying the system is identifiable given sufficient data (Fig. 3g, right panel). We stress that the pooling performed in joint Bayesian inference is a principled approach and more reliable than performing inference on single traces and then comparing the posterior means. An extended plot showing the posterior of more parameters is provided in SI Appendix, Fig. S19.
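The effect of pooling can be illustrated with a deliberately simplified toy problem. Unlike the setting of this paper, the sketch assumes that the waiting times between initiation events are observed directly and are exponentially distributed with rate ki, so that a Gamma prior gives a closed-form Gamma posterior. The function names, the prior parameters, and the simulated data are all hypothetical and serve only to show how the posterior narrows as traces are pooled.

```python
import random

def simulate_waiting_times(rate, n_events, rng):
    """Toy data: exponential waiting times between initiation events."""
    return [rng.expovariate(rate) for _ in range(n_events)]

def pooled_posterior(traces, a0=1.0, b0=1.0):
    """Gamma(a0, b0) prior on the rate, exponential likelihood shared by all traces.

    Pooling simply adds the event counts and total waiting times of all traces.
    Returns the posterior mean and standard deviation of the rate.
    """
    n_events = sum(len(tr) for tr in traces)
    total_time = sum(sum(tr) for tr in traces)
    a, b = a0 + n_events, b0 + total_time
    return a / b, (a ** 0.5) / b
```

In this toy model the joint posterior over all traces is exact, whereas averaging per-trace posterior means would weight the prior once per trace; this mirrors the argument for joint inference made above.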

Hierarchical Bayesian model captures slow cycle of transcription

In order to analyze the dependence of the parameters on the slow cycle, we split the dataset into subgroups, pooling all traces that share the same time window since induction. As individual movies are short (90 s), we can assume constant parameters during individual windows, leading to a joint Bayesian inference problem (cf. Fig. 3f,g) for each window. This straightforward approach has two problems. First, in some of the windows the number of traces is quite small (SI Appendix, Table S13), leading to unreliable inference. Second, not all of the parameters are expected to depend on the slow cycle. To take full advantage of the pooling, we developed a collection of mixed hierarchical models in which some parameters are shared locally between traces in the same window and others are shared globally between all traces (Fig. 4a). A PGM representation of one such model with local initiation rate is shown in Fig. 4b. For each of these models, we also included a version with a constitutive promoter (i.e. X0(t)=1 for all times) to investigate whether the data support the bursting hypothesis. Although it increases inference accuracy, a hierarchical model of 3000 traces was computationally too expensive for MCMC, because every MCMC step requires evaluating the log-likelihood of all traces in the dataset. Inference on the full dataset was therefore done by SVI28.

Allowing local variability of different parameter combinations gives rise to a collection of models. Bayesian model selection based on the marginal likelihood provides a systematic approach to finding the most likely model given the data and automatically penalizes models with too many free parameters34. As the marginal likelihood is costly to compute, we used the evidence lower bound (ELBO) obtained by variational inference as an approximation. To guard against model mismatch that cannot be detected from the marginal likelihood alone, we designed an additional metric based on the posterior predictive35 that relies on the Wasserstein distance between the predicted and the measured cycle (Fig. 4c and Methods, Model selection). The results of the model selection are shown in Fig. 4d. A small Wasserstein distance indicates that data simulated from the learned model (posterior predictive) agree well with the measured data. A higher value of ΔELBO, in turn, suggests that the learned model is more probable than the other models given the data. Consequently, the overall best models are found in the lower right region of the graph. We also observe that most of the investigated models are close to a line in the two-dimensional evaluation space, indicating consistency of the two scores. A full account of all tested models on both datasets is given in SI Appendix, Table S15. We observe that a time-dependent elongation rate alone cannot explain the cycle. Additionally, the graph suggests that a switching promoter is more probable than a constitutive promoter. The best explanation of the slow cycle is a combination of a time-dependent initiation rate with time-independent promoter switching rates. This implies that for CUP1 the cyclic response is likely regulated by burst amplitude rather than burst frequency. The best-ranking models are able to reproduce the intensity pattern over the cycle fairly well (Fig. 4e).
Corresponding parameter posteriors of the best ranking model are shown in Fig. 4f,g. The posterior initiation rate (Fig. 4f) closely follows the cycle pattern known from responder ratio and spot intensity distribution (cf. Fig. 1f, g). Interestingly, the variance of the posterior is larger close to the cycle peak.

Previously, we demonstrated that heterogeneity in the CUP1 transcriptional response is exacerbated by depletion of the chromatin remodeler RSC27. This implies that the heterogeneity is caused by variable accessibility of the binding sites in the promoter to the transcriptional activator Ace1p. Thus, we propose that promoter accessibility changes gradually through the slow cycle of transcription. The current observations are compatible with the following hypothesis: the rate of increase in accessibility is the same for all cells, but the maximal opening of the binding sites at the peak of the cycle may vary, providing heterogeneity in transcriptional output.

Posterior distributions for global model parameters are given in Fig. 4f. The full graph including all observation parameters can be found in SI Appendix, Fig. S20. As distributions of kon and koff are close, the promoter seems to be active approximately half of the time with an active time of ≈25 s.
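The statement about promoter occupancy follows from standard properties of a two-state (telegraph) promoter: the stationary active fraction is kon/(kon+koff) and the mean active time is 1/koff. The short sketch below works through this arithmetic. The numerical rates are assumptions chosen only to match the ≈25 s active time quoted above with equal switching rates; they are not the fitted posterior values.

```python
def telegraph_summary(k_on, k_off):
    """Stationary summary of a two-state promoter with rates in 1/s."""
    fraction_active = k_on / (k_on + k_off)
    mean_active_time = 1.0 / k_off    # mean dwell time in the active state
    mean_inactive_time = 1.0 / k_on   # mean dwell time in the inactive state
    return fraction_active, mean_active_time, mean_inactive_time

# Assumed rates: an active time of about 25 s implies k_off of about 0.04 per
# second; k_on is set equal to k_off because the two posteriors are close.
k_off = 1.0 / 25.0
k_on = k_off

fraction_active, mean_active_time, mean_inactive_time = telegraph_summary(k_on, k_off)
print(f"active fraction: {fraction_active:.2f}")
print(f"mean active time: {mean_active_time:.1f} s")
print(f"mean inactive time: {mean_inactive_time:.1f} s")
```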

Stochastic filtering reveals time dependence of unobserved quantities

The true power of a continuous-time stochastic process is the possibility for latent state inference (cf. Fig. 3d). Given a measured trace, we can reconstruct the trajectories of the latent stochastic process that have most likely produced the observations using the backward filtering forward sampling approach (Methods, Bayesian inference). Results for three exemplary traces are shown in Fig. 5a. These traces were selected to reflect typical cases present in the dataset: continuous expression, a single burst and a double burst. The posterior predictive plots in the top panel demonstrate that our model is capable of explaining qualitatively different traces of the dataset. The reconstructed distributions of the number of polymerases and stem loops over time closely follow the shape of the observations (Fig. 5a, second and third row). The polymerase distribution indicates that CUP1 is a highly transcribed gene that binds multiple polymerases simultaneously. This observation agrees with earlier predictions from smFISH modeling27. From the kymograph plot (fourth row), we can observe the movements of individual polymerases along the lattice. The bottom row demonstrates that the model is capable of reconstructing the promoter activity. Importantly, the active intervals are shifted in time with respect to the rise-and-fall patterns of the measurements, and the inactive phase is identified even though the intensity does not drop to base level. This is an advantage of a model-based method compared to approaches that identify bursts directly from the traces. As shown in the kymograph, several polymerases may initiate during a burst. From the posterior path distribution, arbitrary path statistics such as the number of initiation events in a given time interval or the distribution of times between initiation events can be computed. By comparing such statistics of posterior paths over the cycle, we can investigate the time dependence of unobserved quantities.
We stress that this is different from analyzing sample paths simulated from the fitted model. As the model is a Markov process, forward simulations will always exhibit exponential inter-event time distributions between fundamental events. In contrast, the posterior process is non-homogeneous and can recover non-exponential inter-event times if the data provide evidence accordingly. As one path statistic of interest, we investigated local elongation times, by which we mean the average dwell time of the polymerases on individual sites as they progress along the lattice (cf. Fig. 2c). A graphical representation of the elongation times over the lattice and over the cycle is shown in Fig. 5b. Interestingly, polymerases tend to progress more slowly during the peak of the cycle. As the number of transcribing polymerases is also higher during the peak of the cycle, this indicates that a higher polymerase density is associated with a lower average speed per polymerase. This could be caused, e.g., by steric hindrance from tightly spaced polymerases or by competition for elongation factors.

Figure 5. State inference.

a Posterior path distributions for three characteristic traces from the 3 s dataset. The top panel shows the measured data Y(t) (gray dots) compared to the corresponding posterior predictive mean (red line) with the 90% credible interval (red shaded). Lower panels show the posterior distribution over time of the polymerase loading Np(t) including the termination site, the number of active stem loops Ns(t), the kymograph of actively elongating polymerases and the promoter activity X0(t). Here, darker colors indicate higher probability. The three selected traces represent typical cases in the dataset: a spot that is already active when imaging starts and stays active during the interval (a), a single rise followed by a decay (b), and a more dynamic site with multiple separated phases of activity. b Average local elongation speeds per site and over the slow cycle. Elongation speeds were computed by simulating traces from the state posterior as shown in panel a. For these traces, we extracted the times polymerases remained at each individual site. Results were collected for all traces of a window and averaged over the posterior paths. c-h Initiation-related summary statistics over the first slow cycle. Statistics were computed by simulating a posterior sample (one smoothing sample for every trace in the dataset) and then pooling over windows. The results were then averaged over the parameter posterior. Red dots indicate the posterior mean, error bars correspond to the 90% credible interval. c Average number of initiation events per trace. d Average fraction of time spent in the active promoter state. e Average number of bursts per trace. f Average duration of a burst. g Average number of initiation events per burst (burst amplitude). h Coefficient of variation of the distribution of times between initiation events.

We investigated a number of selected path statistics related to initiation dynamics (Fig. 5c-h). The number of initiation events per trace closely follows the cycle. This is expected, as the number of initiation events is directly related to the initiation rate ki (Fig. 5c). In contrast, the mean activity of the promoter shows an inverse relation, which seems counterintuitive (Fig. 5d). A possible explanation is the small number of initiation events away from the peak. When there are few events, it is not possible to reliably distinguish bursty from non-bursty behavior. Therefore, inference favors the simpler explanation of constitutive expression with a small rate. An alternative explanation would be a multi-state promoter that is in a leaky baseline state away from the peak and switches to a more active regime during the cycle peak. The average burst duration (Fig. 5f) points in a similar direction: in the later part of the cycle, burst durations are longer, with a smaller number of initiation events per burst. The number of bursts per trace does not show a clear dependence on time since induction (Fig. 5e). The burst amplitude follows the cycle in a slightly shifted form (Fig. 5g). Interestingly, the peak of the cycle shows shorter burst times combined with higher burst amplitude, meaning that the time between initiation events is much shorter during the cycle peak. This suggests that the efficiency of the initiation machinery is one target of regulation during the slow cycle. Finally, we studied the distribution of times between initiation events (Fig. 5h) by means of the coefficient of variation (CV). A CV of one corresponds to an exponential distribution and suggests constitutive expression. A CV larger than one indicates a heavy-tailed distribution, which suggests burstiness.
Indeed, at the beginning and at the end of the cycle, the coefficient of variation of the inter-event time distribution is close to one, suggesting exponential behavior, while close to the peak we observe larger values, indicating burstiness. Note that for some quantities the uncertainty is significantly larger away from the peak. This is explained by the smaller number of selected traces in the corresponding windows (cf. SI Appendix, Table S13).
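The link between the CV of inter-event times and burstiness can be checked with a small forward simulation. The sketch below compares a constitutive promoter (a plain Poisson process) with a two-state telegraph promoter that initiates only while active. All rates are hypothetical, and the simulation ignores elongation and observation noise, so it only illustrates the statistic and is not the posterior path analysis used for Fig. 5.

```python
import random
import statistics

def coefficient_of_variation(samples):
    """Standard deviation divided by mean."""
    return statistics.pstdev(samples) / statistics.mean(samples)

def constitutive_intervals(k_init, n, rng):
    """Waiting times between initiation events of a Poisson process."""
    return [rng.expovariate(k_init) for _ in range(n)]

def bursty_intervals(k_on, k_off, k_init, n, rng):
    """Waiting times between initiation events of a two-state promoter."""
    intervals = []
    active = True
    elapsed = 0.0
    while len(intervals) < n:
        if active:
            # Two competing exponential clocks: initiate or switch off.
            total = k_init + k_off
            elapsed += rng.expovariate(total)
            if rng.random() < k_init / total:
                intervals.append(elapsed)
                elapsed = 0.0
            else:
                active = False
        else:
            # Wait for the promoter to switch back on.
            elapsed += rng.expovariate(k_on)
            active = True
    return intervals

rng = random.Random(0)  # fixed seed for reproducibility
cv_constitutive = coefficient_of_variation(constitutive_intervals(0.5, 20000, rng))
cv_bursty = coefficient_of_variation(bursty_intervals(0.04, 0.04, 1.0, 20000, rng))
print(f"CV constitutive: {cv_constitutive:.2f}")
print(f"CV bursty:       {cv_bursty:.2f}")
```

In the bursty case the inter-event times mix short gaps within a burst with long gaps that span an inactive period, which pushes the CV above one.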

Methods

Yeast strains and plasmids

We utilized haploid strains of Saccharomyces cerevisiae (BY4742 and BY4741) for live transcript analysis. The strains were engineered to include 14X PP7 binding sites and a MET3 integrative vector for expressing PP7-NLS-GFP. For photobleaching correction and GFP calibration, we used strains YTK541, YTK1231, and YTK1268, each containing a known number of GFP molecules per locus. Further details are provided in the SI Appendix, Sec. S5.1.

Media and growth conditions

YTK1799 cells were grown in CSM-URA media under specific conditions for live transcript analysis (SI Appendix, Sec. S5.2). The growth protocol involved a series of inoculations, refrigeration, and daily inoculations to maintain consistent results. Cells were harvested and placed under a CSM-URA agarose pad for imaging. Strains YTK541, YTK1231, and YTK1268 were grown under similar conditions for photobleaching correction and GFP calibration.

Quantitative RT-PCR (RT-qPCR)

RNA was extracted from samples at specified time points post-Cu induction. The extracted RNA was used to prepare cDNA, which was then used for quantitative real-time PCR (qPCR). The expression of the housekeeping gene ACT1 was used for normalization. The process was repeated at least twice, with qPCR performed in duplicates for each experiment.

Microscope settings and imaging conditions

Live cells were imaged using a DeltaVision Elite Microscope under specific conditions (SI Appendix, Sec. S5.4). 3D time-lapse movies were acquired at room temperature using a specific imaging regime to cover the entire slow cycle. The same conditions were used for imaging strain YTK1231 for photobleaching correction.

Trace extraction

We developed a custom 3D method based on sequential filtering to track fluorescence spots and quantify fluorescence levels. The method involves a state estimation using an approximate recursive filter based on the Laplace approximation and includes a binary state to take account of vanishing spots. Details are provided in SI Appendix, Sec. S6.2 and Sec. S12.

Calibration measurements

We performed calibration measurements using strains YTK541, YTK1231, and YTK1268 containing a known number of GFP per locus. Spot intensities were measured as for the live cell experiments but only for a single time point. Assuming a linear relationship between number of GFP and intensity, we extracted estimates of the scaling factor by Bayesian log-linear regression. Similarly, an independent estimate for the bleaching rate was obtained by recording videos of strain YTK1231 and assuming an exponential decay of spot brightness. A detailed description is given in SI Appendix, Sec. S9.
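The bleaching-rate estimate can be sketched in a few lines. The example below assumes a noise-free exponential decay and fits it by ordinary least squares on the log intensities. This is a simpler stand-in for the Bayesian regression described in the SI Appendix, and the frame interval, initial intensity and rate are hypothetical values chosen for illustration.

```python
import math

def fit_exponential_decay(times, intensities):
    """Least-squares line through (t, log I); returns (I0, bleaching rate)."""
    logs = [math.log(v) for v in intensities]
    n = len(times)
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    slope = sxy / sxx
    intercept = l_mean - slope * t_mean
    return math.exp(intercept), -slope

# Hypothetical synthetic data: 30 frames taken every 3 s.
true_i0, true_rate = 1000.0, 0.01
times = [3.0 * i for i in range(30)]
intensities = [true_i0 * math.exp(-true_rate * t) for t in times]

i0_hat, rate_hat = fit_exponential_decay(times, intensities)
print(f"estimated I0 = {i0_hat:.1f}, bleaching rate = {rate_hat:.4f} per second")
```

Working on the log scale turns the exponential decay into a straight line, which is also what makes a log-linear regression natural for the GFP scaling factor.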

Stochastic modeling

The model is an instance of a Markov jump process that satisfies the master equation (1). The system's state represents the occupation of lattice sites by RNAP II, and the transition matrix is defined by initiation, elongation and termination along with the exclusion principle (SI Appendix, Sec. S8.1). By splitting the transition matrix into contributions corresponding to individual parameters, the master equation can be solved efficiently for a given parameter vector, even for fairly large state spaces, using the Krylov subspace approximation for matrix exponentials41,42. The lattice state is converted to fluorescence intensity by assuming an affine-linear dependence on the number of formed stem loops and a multiplicative noise model with a correction for small intensities. Details are provided in SI Appendix, Sec. S8.2.
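To make the structure of the transition matrix concrete, the following sketch builds the generator of a very small exclusion lattice and propagates a probability vector through the master equation. It is a toy under stated assumptions: the lattice has only three sites, promoter switching is omitted, the rates are hypothetical, and explicit Euler steps replace the Krylov subspace method used in this work. The function names are illustrative and do not refer to the published code.

```python
from itertools import product

def build_generator(n_sites, k_init, k_elong, k_term):
    """Return (states, Q) for an exclusion lattice with n_sites sites.

    Q[i][j] is the jump rate from state i to state j. Diagonal entries hold
    minus the total exit rate, so every row sums to zero.
    """
    states = list(product((0, 1), repeat=n_sites))
    index = {state: i for i, state in enumerate(states)}
    size = len(states)
    Q = [[0.0] * size for _ in range(size)]

    def add(src, dst, rate):
        Q[index[src]][index[dst]] += rate
        Q[index[src]][index[src]] -= rate

    for s in states:
        if s[0] == 0:  # initiation onto an empty first site
            add(s, (1,) + s[1:], k_init)
        for i in range(n_sites - 1):
            if s[i] == 1 and s[i + 1] == 0:  # hop only if the next site is free
                t = list(s)
                t[i], t[i + 1] = 0, 1
                add(s, tuple(t), k_elong)
        if s[-1] == 1:  # termination from the last site
            add(s, s[:-1] + (0,), k_term)
    return states, Q

def propagate(p, Q, t, n_steps=2000):
    """Integrate dp/dt = p Q with explicit Euler steps."""
    dt = t / n_steps
    size = len(p)
    for _ in range(n_steps):
        dp = [sum(p[i] * Q[i][j] for i in range(size)) for j in range(size)]
        p = [p[j] + dt * dp[j] for j in range(size)]
    return p

states, Q = build_generator(n_sites=3, k_init=0.5, k_elong=1.0, k_term=1.0)
p0 = [1.0] + [0.0] * (len(states) - 1)  # start with an empty lattice
p_t = propagate(p0, Q, t=10.0)
for state, prob in zip(states, p_t):
    print(state, round(prob, 4))
```

The state space grows as 2^n with the number of sites, which is why a sparse representation and Krylov-type methods become necessary for realistic lattice sizes.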

Autocorrelation analysis

The average autocorrelation functions (ACF) were calculated as described previously21 using the intensities of all TS tracks over time. The ACF was calculated for 3 s intervals and 12 s intervals after correcting the individual traces for photobleaching.

Bayesian inference

The joint posterior $p(\theta, \omega, x_{[0,T]} \mid y_1, \dots, y_n)$ of model parameters θ, observation parameters ω and latent lattice trajectory $x_{[0,T]}$ was split into the marginal parameter posterior, which is proportional to $p(\theta)\,p(\omega)\,p(y \mid \theta, \omega)$, and the conditional state posterior $p(x_{[0,T]} \mid \theta, \omega, y_1, \dots, y_n)$. Sampling from the marginal parameter posterior was realized by HMC and SVI using the probabilistic programming language Pyro45. To integrate the stochastic process model with Pyro, we developed a procedure to evaluate the marginal data log-likelihood $\log p(y \mid \theta, \omega)$ by stochastic filtering. Combined with a modified backward filter to compute the gradients, we designed a differentiable inference procedure applicable to general Markov jump process models. To recover the full posterior, we used a backward filtering forward sampling approach to generate posterior paths from the conditional state posterior $p(x_{[0,T]} \mid \theta, \omega, y_1, \dots, y_n)$. For a more comprehensive description, we refer to SI Appendix, Sec. S10. The corresponding code is available as a Python package at XXX.48
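The idea of evaluating the marginal likelihood by filtering can be illustrated on a much smaller model. The sketch below runs a forward filter for a two-state promoter that is observed at regular intervals with Gaussian noise. These are strong simplifications: the actual model has a large lattice state space, a different observation model and gradient computations, none of which are shown here. The trace, the rates, the state means and the noise level are hypothetical.

```python
import math

def transition_matrix(k_on, k_off, dt):
    """Exact transition probabilities of a two-state chain over an interval dt.

    Rows index the starting state (0 = off, 1 = on), columns the end state.
    """
    s = k_on + k_off
    pi_on, pi_off = k_on / s, k_off / s
    decay = math.exp(-s * dt)
    return [
        [pi_off + pi_on * decay, pi_on * (1.0 - decay)],
        [pi_off * (1.0 - decay), pi_on + pi_off * decay],
    ]

def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def log_marginal_likelihood(trace, k_on, k_off, dt, means, sd):
    """Forward filter: sums the log of the per-observation normalizers."""
    P = transition_matrix(k_on, k_off, dt)
    s = k_on + k_off
    belief = [k_off / s, k_on / s]  # stationary initial distribution
    log_lik = 0.0
    for step, y in enumerate(trace):
        if step > 0:  # prediction step between observations
            belief = [belief[0] * P[0][j] + belief[1] * P[1][j] for j in range(2)]
        weighted = [belief[j] * normal_pdf(y, means[j], sd) for j in range(2)]
        norm = weighted[0] + weighted[1]
        log_lik += math.log(norm)
        belief = [w / norm for w in weighted]  # update step
    return log_lik

# Hypothetical trace sampled every 3 s.
trace = [0.1, 0.0, 1.1, 0.9, 1.0, 0.2, -0.1]
ll_plausible = log_marginal_likelihood(trace, 0.04, 0.04, 3.0, (0.0, 1.0), 0.3)
ll_implausible = log_marginal_likelihood(trace, 0.04, 0.04, 3.0, (2.0, 3.0), 0.3)
print(f"log-likelihood, plausible means:   {ll_plausible:.2f}")
print(f"log-likelihood, implausible means: {ll_implausible:.2f}")
```

The latent path never has to be enumerated: the filter sums over all hidden trajectories recursively, and the returned value is the quantity that a sampler or a variational objective would evaluate for each parameter proposal.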

Model selection

We used a two-fold approach to Bayesian model selection. The evidence lower bound obtained from variational inference was used to approximate the marginal likelihood of different models and corresponding Bayes factors. In order to check for model mismatch, we used an additional metric based on the Wasserstein distance of the posterior predictive distribution and the empirical data distribution. The details are given in SI Appendix, Sec. S11.
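As a minimal illustration of the second metric, the sketch below computes the one-dimensional Wasserstein-1 distance between two empirical samples of equal size, which reduces to the mean absolute difference of the sorted values. How the distance is applied to the predicted and measured cycle is specified in SI Appendix, Sec. S11 and is not reproduced here. The sample values and the function name are hypothetical.

```python
def wasserstein_1d(sample_a, sample_b):
    """Wasserstein-1 distance between two equal-size one-dimensional samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch requires samples of equal size")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)

# Hypothetical spot intensities: measured data and two candidate models.
measured = [1.0, 2.0, 3.0, 4.0]
predicted_close = [1.5, 2.5, 3.5, 4.5]
predicted_far = [4.0, 5.0, 6.0, 7.0]

d_close = wasserstein_1d(measured, predicted_close)
d_far = wasserstein_1d(measured, predicted_far)
print(f"distance to close model: {d_close}")
print(f"distance to far model:   {d_far}")
```

A smaller distance means that the simulated and the measured distributions overlap more closely, which complements the ELBO because it tests the fit in data space.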

Supplementary Material

Supplement 1

Acknowledgements

C.W. and H.K. acknowledge support by the European Research Council (ERC) within the CONSYN project, grant agreement number 773196. T.S.K. and G.D.M. were supported by the Intramural Research Program of the National Institutes of Health (NCI, CCR). The authors gratefully acknowledge the computing time provided to them on the high-performance computer Lichtenberg at the NHR Centers NHR4CES at TU Darmstadt. We would like to thank Dr. Tineke Lenstra for the plasmid pTL031, and Dr. Daniel Larson for the discussion of autocorrelation analysis.

Footnotes

Additional information

The authors declare no competing interests.

References

  • 1. Sanchez A. & Golding I. Genetic Determinants and Cellular Constraints in Noisy Gene Expression. Science 342, 1188–1193, DOI: 10.1126/science.1242975 (2013).
  • 2. Xu H., Skinner S. O., Sokac A. M. & Golding I. Stochastic kinetics of nascent RNA. Phys. Rev. Lett. 117, 128101, DOI: 10.1103/PhysRevLett.117.128101 (2016).
  • 3. Shandilya J. & Roberts S. G. The transcription cycle in eukaryotes: From productive initiation to RNA polymerase II recycling. Biochimica et Biophys. Acta (BBA) - Gene Regul. Mech. 1819, 391–400, DOI: 10.1016/j.bbagrm.2012.01.010 (2012).
  • 4. Lenstra T. L., Rodriguez J., Chen H. & Larson D. R. Transcription Dynamics in Living Cells. Annu. Rev. Biophys. 45, 25–47, DOI: 10.1146/annurev-biophys-062215-010838 (2016).
  • 5. Golding I., Paulsson J., Zawilski S. M. & Cox E. C. Real-Time Kinetics of Gene Activity in Individual Bacteria. Cell 123, 1025–1036, DOI: 10.1016/j.cell.2005.09.031 (2005).
  • 6. Tantale K. et al. A single-molecule view of transcription reveals convoys of RNA polymerases and multi-scale bursting. Nat. Commun. 7, 12248, DOI: 10.1038/ncomms12248 (2016).
  • 7. Dar R. D. et al. Transcriptional burst frequency and burst size are equally modulated across the human genome. Proc. Natl. Acad. Sci. 109, 17454–17459, DOI: 10.1073/pnas.1213530109 (2012).
  • 8. Molina N. et al. Stimulus-induced modulation of transcriptional bursting in a single mammalian gene. Proc. Natl. Acad. Sci. 110, 20563–20568, DOI: 10.1073/pnas.1312310110 (2013).
  • 9. Senecal A. et al. Transcription Factors Modulate c-Fos Transcriptional Bursts. Cell Reports 8, 75–83, DOI: 10.1016/j.celrep.2014.05.053 (2014).
  • 10. Raj A., Peskin C. S., Tranchina D., Vargas D. Y. & Tyagi S. Stochastic mRNA Synthesis in Mammalian Cells. PLOS Biol. 4, e309, DOI: 10.1371/journal.pbio.0040309 (2006).
  • 11. Zenklusen D., Larson D. R. & Singer R. H. Single-RNA counting reveals alternative modes of gene expression in yeast. Nat. Struct. & Mol. Biol. 15, 1263–1271, DOI: 10.1038/nsmb.1514 (2008).
  • 12. Zhang J. & Zhou T. Promoter-mediated Transcriptional Dynamics. Biophys. J. 106, 479–488, DOI: 10.1016/j.bpj.2013.12.011 (2014).
  • 13. Rieckh G. & Tkačik G. Noise and Information Transmission in Promoters with Multiple Internal States. Biophys. J. 106, 1194–1204, DOI: 10.1016/j.bpj.2014.01.014 (2014).
  • 14. Bertrand E. et al. Localization of ASH1 mRNA Particles in Living Yeast. Mol. Cell 2, 437–445, DOI: 10.1016/S1097-2765(00)80143-4 (1998).
  • 15. Chao J. A., Patskovsky Y., Almo S. C. & Singer R. H. Structural basis for the coevolution of a viral RNA–protein complex. Nat. Struct. & Mol. Biol. 15, 103–105, DOI: 10.1038/nsmb1327 (2008).
  • 16. Larson D. R., Zenklusen D., Wu B., Chao J. A. & Singer R. H. Real-Time Observation of Transcription Initiation and Elongation on an Endogenous Yeast Gene. Science 332, 475, DOI: 10.1126/science.1202142 (2011).
  • 17. Lenstra T. L., Coulon A., Chow C. C. & Larson D. R. Single-Molecule Imaging Reveals a Switch between Spurious and Functional ncRNA Transcription. Mol. Cell 60, 597–610, DOI: 10.1016/j.molcel.2015.09.028 (2015).
  • 18. Ferraro T. et al. New methods to image transcription in living fly embryos: The insights so far, and the prospects. WIREs Dev. Biol. 5, 296–310, DOI: 10.1002/wdev.221 (2016).
  • 19. Ferguson M. L. & Larson D. R. Measuring Transcription Dynamics in Living Cells Using Fluctuation Analysis. In Shav-Tal Y. (ed.) Imaging Gene Expression: Methods and Protocols, 47–60, DOI: 10.1007/978-1-62703-526-2_4 (Humana Press, Totowa, NJ, 2013).
  • 20. Coulon A. et al. Kinetic competition during the transcription cycle results in stochastic RNA processing. eLife 3, e03939, DOI: 10.7554/eLife.03939 (2014).
  • 21. Coulon A. & Larson D. Chapter Seven - Fluctuation Analysis: Dissecting Transcriptional Kinetics with Signal Theory. In Filonov G. S. & Jaffrey S. R. (eds.) Methods in Enzymology, vol. 572, 159–191, DOI: 10.1016/bs.mie.2016.03.017 (Academic Press, 2016).
  • 22. Tantale K. et al. Stochastic pausing at latent HIV-1 promoters generates transcriptional bursting. Nat. Commun. 12, 4503, DOI: 10.1038/s41467-021-24462-5 (2021).
  • 23. Liu J. et al. Real-time single-cell characterization of the eukaryotic transcription cycle reveals correlations between RNA initiation, elongation, and cleavage. PLOS Comput. Biol. 17, e1008999, DOI: 10.1371/journal.pcbi.1008999 (2021).
  • 24. Filatova T., Popovic N. & Grima R. Statistics of Nascent and Mature RNA Fluctuations in a Stochastic Model of Transcriptional Initiation, Elongation, Pausing, and Termination. Bull. Math. Biol. 83, 3, DOI: 10.1007/s11538-020-00827-7 (2020).
  • 25. Gorin G., Wang M., Golding I. & Xu H. Stochastic simulation and statistical inference platform for visualization and estimation of transcriptional kinetics. PLOS ONE 15, e0230736, DOI: 10.1371/journal.pone.0230736 (2020).
  • 26. Karpova T. S. et al. Concurrent Fast and Slow Cycling of a Transcriptional Activator at an Endogenous Promoter. Science 319, 466–469, DOI: 10.1126/science.1150559 (2008).
  • 27. Mehta G. D. et al. Single-Molecule Analysis Reveals Linked Cycles of RSC Chromatin Remodeling and Ace1p Transcription Factor Binding in Yeast. Mol. Cell 72, 875–887.e9, DOI: 10.1016/j.molcel.2018.09.009 (2018).
  • 28. Blei D. M., Kucukelbir A. & McAuliffe J. D. Variational Inference: A Review for Statisticians. J. Am. Stat. Assoc. 112, 859–877, DOI: 10.1080/01621459.2017.1285773 (2017).
  • 29. Koller D. & Friedman N. Probabilistic Graphical Models: Principles and Techniques. Adaptive Computation and Machine Learning (MIT Press, 2009).
  • 30. MacDonald C. T., Gibbs J. H. & Pipkin A. C. Kinetics of biopolymerization on nucleic acid templates. Biopolymers 6, 1–25, DOI: 10.1002/bip.1968.360060102 (1968).
  • 31. Peccoud J. & Ycart B. Markovian Modeling of Gene-Product Synthesis. Theor. Popul. Biol. 48, 222–234, DOI: 10.1006/tpbi.1995.1027 (1995).
  • 32. Zoller B., Little S. C. & Gregor T. Diverse Spatial Expression Patterns Emerge from Unified Kinetics of Transcriptional Bursting. Cell 175, 835–847.e25, DOI: 10.1016/j.cell.2018.09.056 (2018).
  • 33. Wieland F.-G., Hauber A. L., Rosenblatt M., Tönsing C. & Timmer J. On structural and practical identifiability. Curr. Opin. Syst. Biol. 25, 60–69, DOI: 10.1016/j.coisb.2021.03.005 (2021).
  • 34. Wasserman L. Bayesian Model Selection and Model Averaging. J. Math. Psychol. 44, 92–107, DOI: 10.1006/jmps.1999.1278 (2000).
  • 35. Gelman A. et al. Bayesian Data Analysis, Third Edition. Chapman & Hall/CRC Texts in Statistical Science (Taylor & Francis, 2013).
  • 36. Lenstra T. L. & Larson D. R. Single-Molecule mRNA Detection in Live Yeast. Curr. Protoc. Mol. Biol. 113, 14.24.1–14.24.15, DOI: 10.1002/0471142727.mb1424s113 (2016).
  • 37. Janke C. et al. A versatile toolbox for PCR-based tagging of yeast genes: New fluorescent proteins, more markers and promoter substitution cassettes. Yeast 21, 947–962, DOI: 10.1002/yea.1142 (2004).
  • 38. Belmont A. S. & Straight A. F. In vivo visualization of chromosomes using lac operator-repressor binding. Trends Cell Biol. 8, 121–124, DOI: 10.1016/S0962-8924(97)01211-7 (1998).
  • 39. Bullitt E., Rout M. P., Kilmartin J. V. & Akey C. W. The Yeast Spindle Pole Body Is Assembled around a Central Crystal of Spc42p. Cell 89, 1077–1086, DOI: 10.1016/S0092-8674(00)80295-0 (1997).
  • 40. Drennan A. C. et al. Structure and function of Spc42 coiled-coils in yeast centrosome assembly and duplication. Mol. Biol. Cell 30, 1505–1522, DOI: 10.1091/mbc.E19-03-0167 (2019).
  • 41. Moler C. & Van Loan C. Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later. SIAM Rev. 45, 3–49, DOI: 10.1137/S00361445024180 (2003).
  • 42. Saad Y. Analysis of Some Krylov Subspace Approximations to the Matrix Exponential Operator. SIAM J. on Numer. Analysis 29, 209–228, DOI: 10.1137/0729014 (1992).
  • 43. Särkkä S. Bayesian Filtering and Smoothing, vol. 3 of Institute of Mathematical Statistics Textbooks (Cambridge, 2013).
  • 44. Anderson D. F. A modified next reaction method for simulating chemical systems with time dependent propensities and delays. The J. Chem. Phys. 127, 214107, DOI: 10.1063/1.2799998 (2007).
  • 45. Bingham E. et al. Pyro: Deep universal probabilistic programming. J. Mach. Learn. Res. 20, 1–6 (2019).
  • 46. Huang L. et al. Reconstructing Dynamic Molecular States from Single-Cell Time Series. J. The Royal Soc. Interface 13, DOI: 10.1098/rsif.2016.0533 (2016).
  • 47. Neal R. MCMC using Hamiltonian dynamics. In Brooks S., Gelman A., Jones G. L. & Meng X.-L. (eds.) Handbook of Markov Chain Monte Carlo (2012).
  • 48. Ghahramani Z. Probabilistic machine learning and artificial intelligence. Nature 521, 452–459, DOI: 10.1038/nature14541 (2015).
  • 49. Llorente F., Martino L., Delgado D. & López-Santiago J. Marginal Likelihood Computation for Model Selection and Hypothesis Testing: An Extensive Review. SIAM Rev. 65, 3–58, DOI: 10.1137/20M1310849 (2023).
  • 50. Kolouri S., Park S., Thorpe M., Slepčev D. & Rohde G. K. Optimal Mass Transport: Signal processing and machine-learning applications. IEEE signal processing magazine 34, 43–59, DOI: 10.1109/MSP.2017.2695801 (2017).

Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
