Abstract
Transcriptional pausing is highly regulated by the template DNA and nascent transcript sequences. Here, we propose a thermodynamic model of transcriptional pausing, based on the thermal energy of transcription bubbles and nascent RNA structures, to describe the kinetics of the reaction pathways between active translocation, elemental, backtracked, and hairpin-stabilized pauses. The model readily predicts experimentally detected pauses in high-resolution optical tweezers measurements of transcription. Unlike other models, it also predicts the effect of tension and the GreB transcription factor on pausing.
INTRODUCTION
During bacterial transcription, there are frequent pauses in the forward translocation of RNA polymerase. Pauses observed in vivo and in vitro vary in duration from milliseconds to minutes [1, 2]. Short pauses, which typically last less than one second and are referred to as elemental pauses, are proposed to be intermediate precursors of long pauses [3]. Long pauses, which may last tens of seconds, are classified as Class I ‘hairpin-stabilized’ and Class II ‘backtracked’ signals and have been structurally characterized and mechanistically explored [4, 5]. They are thought to be regulated by the sequence of the DNA template, the structure of the nascent transcript, and the availability of transcription factors [6–8].
Previous models of the kinetics of backtracked pauses predict some types of experimentally detected pauses [9–12] but fail to predict other types of pausing and pause duration, and do not treat external tension or transcription factors. Here, we propose a model based on our current biochemical understanding of transcription pausing mechanisms and optimize the parameters of the model with high-resolution transcription data. This purely thermodynamic model provides a mechanistic explanation of the effect of external tension and transcription factors, and after refinement accurately simulates experimentally observed pause sites and durations. Furthermore, the model accurately predicts transcription dynamics on unfamiliar DNA sequences not used for refinement and is readily extendable to incorporate the initiation and termination stages.
MODEL DESCRIPTION
Ternary Transcription Elongation Complex (TEC) Configuration And State Transition
A TEC is described by a transcription position (m) and state (n). The position along the template (m) indicates the length of the RNA transcript. TEC can be in one of two translocation states: active (n=0) or backtracked (n<0), or in a conformationally distinct hairpin-stabilized state (hsp). The interconnection among these states is shown in Figure 1a. From an active state at position m (m,0), a transcription complex can translocate to the next active state (m+1,0), or branch into backtracked (m, −1) or hairpin-stabilized states (m, hsp).
FIG. 1.

State transitions in the model and the statistical approach to the transcription bubble configuration. (a) A diagram of transcriptional states considered in the model shows their interconnections. (b) An illustration of the statistical approach to characterize transcription bubble configurations including the forward translocation step. Dashed arrows indicate fast equilibrium and solid arrows indicate the allowed state transitions.
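The energy of the TEC is estimated as the sum of four contributions: the free energy of the (i) transcription bubble, (ii) DNA-nascent RNA hybrid, (iii) nascent RNA, and (iv) RNAP-DNA interactions:
$$\Delta G_{\mathrm{TEC}} = \Delta G_{\mathrm{bubble}} + \Delta G_{\mathrm{hybrid}} + \Delta G_{\mathrm{RNA}} + \Delta G_{\mathrm{RNAP}} \tag{1}$$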
In this estimate, the first two terms are clearly sequence-dependent, as is the secondary structure of nascent RNA (the third term). The fourth term represents interactions between the nucleic acids and RNAP subunits and is effectively constant and sequence-independent as argued previously [9–11].
To determine the configuration of a transcription bubble and the details of the energy profile of a TEC, we used an approach based on statistical mechanics, the basis of which was described by Tadigotla et al. [10]. A transcription complex (m, n) is in a rapid equilibrium among many microstates, each defined by the parameter (b), which depends on the number of unpaired DNA bases upstream (u) and downstream (d) of the DNA-RNA hybrid inside the RNAP enzyme, the length of the hybrid (h) and the number of single-stranded RNA bases protected by RNAP (r) (Figure 1b).
Equilibrium among microstates (dashed arrows in Figure 1b) is reached rapidly compared to the time required for state transitions. Thus, for each transcription complex (m, n), the probability of a particular microstate b is given by the Boltzmann distribution:
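$$P_b = \frac{e^{-\Delta G_b / k_B T}}{Z} \tag{2}$$
$$Z = \sum_{b} e^{-\Delta G_b / k_B T} \tag{3}$$
where ∆Gb is the free energy of microstate b and Z is the partition function, summed over all microstates of the complex (m, n).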
The overall forward translocation rate is calculated as
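$$k_f = \sum_{b} P_b \, k_{f,b} \tag{4}$$
where Pb is the Boltzmann probability of microstate b and kf,b is the forward translocation rate of that microstate.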
Figure 1b shows the forward translocation step as an example of statistical treatment in the model. All state transitions in the model are determined according to Equation (4), as the summation of the products of the probability and the forward translocation rate of individual microstates.
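As a minimal sketch of the microstate averaging described above, the following Python enumerates microstates over the (u, d, h, r) ranges listed in Table II, assigns Boltzmann probabilities, and averages a per-microstate rate. The functions `toy_energy` and `toy_rate` are hypothetical placeholders, not the sequence-dependent energies or rates used in the model, and all energies are expressed in units of kBT.

```python
import itertools
import math

def enumerate_microstates():
    """All (u, d, h, r) combinations within the fixed ranges of Table II."""
    return list(itertools.product(
        range(1, 4),   # u: unpaired DNA bases upstream of the hybrid
        range(1, 4),   # d: unpaired DNA bases downstream of the hybrid
        range(7, 10),  # h: RNA-DNA hybrid length
        range(1, 4),   # r: single-stranded RNA bases protected by RNAP
    ))

def boltzmann_probabilities(energies_kbt):
    """Normalized Boltzmann weights for energies given in units of kBT."""
    g_min = min(energies_kbt)  # shift by the minimum for numerical stability
    weights = [math.exp(-(g - g_min)) for g in energies_kbt]
    z = sum(weights)           # partition function (up to the constant shift)
    return [w / z for w in weights]

def ensemble_rate(probabilities, rates):
    """Probability-weighted sum of per-microstate rates."""
    return sum(p * k for p, k in zip(probabilities, rates))

def toy_energy(u, d, h, r):
    # Hypothetical energy function, for illustration only.
    return 1.5 * (u + d) - 1.0 * h + 0.2 * r

def toy_rate(u, d, h, r):
    # Hypothetical per-microstate forward rate (s^-1), for illustration only.
    return 20.0 / d

states = enumerate_microstates()
probs = boltzmann_probabilities([toy_energy(*s) for s in states])
rates = [toy_rate(*s) for s in states]
k_forward = ensemble_rate(probs, rates)
```

Because the weights are normalized, the ensemble rate always lies between the slowest and fastest microstate rates.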
Forward Translocation
The forward (active) translocation of RNAP is modeled by the Michaelis-Menten (M-M) equation
| (5) |
where kmax is the rate of NTP hydrolysis, Kd is the NTP dissociation constant, and Ki is the equilibrium constant between two adjacent translocation states determined by their base pairing energy. The equation is derived from the Brownian-ratchet model [13], in which forward translocation occurs in three steps: (i) a fast equilibrium between position m and position m+1, (ii) recruitment of NTP at active site, (iii) catalysis and release of pyrophosphate (Figure 2a). Fitting kmax and Kd of equation (5) to experimental data identifies slow translocation sites that precede the long-lived pauses. These slow translocation events are interpreted as pre-translocated, elemental pauses on the pathway of translocation. Further discussion of the on-pathway and off-pathway characteristics of short pauses is given in the Discussion section.
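The three-step scheme can be turned into a Michaelis-Menten-type rate by a short derivation, sketched below. It rests on two assumptions that are not stated explicitly in the text: that Ki is defined as the ratio of pre- to post-translocated populations, and that both the translocation and NTP-binding equilibria are fast compared with catalysis. If Ki is instead defined as post over pre, Ki should be replaced by 1/Ki throughout.

```latex
% Assumed scheme: pre <-> post (K_i), post + NTP <-> post.NTP (K_d), catalysis at k_max
\begin{align*}
K_i &= \frac{[\mathrm{pre}]}{[\mathrm{post}]}, \qquad
K_d = \frac{[\mathrm{post}]\,[\mathrm{NTP}]}{[\mathrm{post{\cdot}NTP}]} \\
k_{\mathrm{forward}}
  &= k_{\max}\,
     \frac{[\mathrm{post{\cdot}NTP}]}
          {[\mathrm{pre}] + [\mathrm{post}] + [\mathrm{post{\cdot}NTP}]} \\
  &= \frac{k_{\max}}
          {\dfrac{K_i K_d}{[\mathrm{NTP}]} + \dfrac{K_d}{[\mathrm{NTP}]} + 1}
   = \frac{k_{\max}\,[\mathrm{NTP}]}{K_d\,(1 + K_i) + [\mathrm{NTP}]}
\end{align*}
```

Under these assumptions, a large Ki (a pre-translocated state that is strongly favored) raises the apparent dissociation constant and slows translocation, which is how sequence-dependent base pairing energy can produce slow sites.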
FIG. 2.

Model construction. (a) An illustration of RNAP forward translocation using the Michaelis-Menten equation. (b) The free energy landscape for the backtracking pathways. Note that the first backtracking step has a different energy barrier than the deeper backtracking steps. (c) The proposed kinetic mechanism for the hairpin-stabilized pause.
Backtracking
Backtracking has been previously modeled using the Arrhenius Equation (6) with an activation barrier of 40–50 kBT for each step of backward translocation [9]. This value seems unreasonably high given that the magnitude of the free energy of base pairing in a transcription bubble is typically less than 20 kBT [10].
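$$k = k_0 \, e^{-\Delta G^{\ddagger} / k_B T} \tag{6}$$
where k0 is the prefactor and ∆G‡ is the activation barrier.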
We take the same Arrhenius approach but treat the first step of backtracking differently from the subsequent ones (Figure 2b), based on the assumption that initially the 3’ end of the nascent transcript blocks the active site and subsequently invades the secondary channel of RNAP [14], while additional backtracking stabilizes the interaction of RNA within the secondary channel.
We assume the energy barrier for an active TEC to enter the backtracked state to be:
| (7) |
where ∆Gbt is a fixed activation energy specific for entering a backtracked state. We can assume that ∆Gbt will be limited to the energy available from complete collapse of the bubble, which is estimated to be in the range −(10 ~ 20)kBT. ∆G0 is the energy of a TEC at an active site. The rate constant to enter the backtracked state from the active state (0) would be
| (8) |
where k1 is the prefactor of backtracking.
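To give a sense of scale, the sketch below evaluates an Arrhenius rate for two barrier heights, using the prefactor k1 = 1000 s−1 from Table II. It treats the barrier as a single number in units of kBT, which is a simplification: the barrier of Equation (7) also depends on the TEC energy ∆G0, so the outputs are order-of-magnitude illustrations rather than model predictions.

```python
import math

def arrhenius_rate(prefactor_per_s, barrier_kbt):
    """Arrhenius rate for a barrier expressed in units of kBT."""
    return prefactor_per_s * math.exp(-barrier_kbt)

K1 = 1000.0  # s^-1, prefactor of backtracking (Table II)

# Barrier comparable to the fitted first-step value of about 9.8 kBT.
rate_fitted_barrier = arrhenius_rate(K1, 9.8)

# Barrier of 40 kBT, the lower end of the previously used range.
rate_high_barrier = arrhenius_rate(K1, 40.0)
```

With this prefactor, a barrier near 10 kBT gives a rate of roughly 0.06 s−1, whereas a 40 kBT barrier gives a rate on the order of 10^-15 s−1, far too slow for backtracking to occur on experimental timescales.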
For any further backward translocation of RNAP, the energy barrier should relate to the energy difference between two adjacent translocation states and the backtracked distance. Thus, for n < 0,
| (9) |
and
| (10) |
where ∆Gbt_increment represents the backtracking energy barrier due to increase in the length of the transcript inserted into the secondary channel.
The model considers that the recovery from a backtracked state (kbtr) can be achieved by two pathways: a diffusive and a cleavage pathway. The former occurs through RNAP diffusion, which also follows the Arrhenius equation with the energy barriers described above.
| (11) |
and
| (12) |
The cleavage pathway occurs by cleaving nascent RNA inserted into the secondary channel to register the 3’ end of nascent RNA in the active site. This process is likely to be sequence-independent, and was assumed to occur at a constant rate.
Hypertranslocation, which refers to the forward translocation of RNAP without concurrent RNA elongation at the active site, is a pausing event translocationally similar to backtracking. However, we do not include hypertranslocation in the model for two reasons. First, hypertranslocation may not be a general phenomenon during transcription [15], and it cannot be distinguished from backtracking in force spectroscopy assays. Second, hypertranslocation is never energetically favored, because the extent of base-pairing is reduced with respect to the active state.
Hairpin-stabilized Pausing
To model a hairpin-stabilized pause, we take an allosteric view, in which an RNA hairpin contacts a short α helix at the tip of the RNAP flap domain that covers the RNA exit channel to induce the pause [4, 16]. The pathway is modeled as a fast equilibrium between two configurational states, a state free of hairpin and a state with a hairpin positioned close to the RNAP flap domain. The equilibrium is followed by a rate-limiting catalytic step (Figure 2c). The equilibrium is considered rapid compared to the formation of chemical bonds that stabilize the inactive state.
We use Equation (13) to model the entry rate to the hairpin-stabilized pause,
| (13) |
where kon is the catalytic rate of the interaction between the RNA hairpin loop and the RNAP flap, and Ki,h is the fraction of hairpin formation. Equation (14) gives the expression for Ki,h, which represents the equilibrium among all possible RNA secondary structures. The secondary structure of the RNA transcript rapidly transitions among many microstates, and the simulation of transitions among these microstates is computationally expensive. We bypass this difficulty by simplifying the equilibrium to a two-state system of the lowest energy state and the hairpin-included state
| (14) |
In the absence of RNase A, which digests the nascent RNA transcript, the lowest energy state is determined by allowing all or at most a 100-nucleotide-long stretch of RNA outside of the exit channel to fold freely. A state including a hairpin is determined by first searching from the 3’ end of the RNA for possible hairpin structures near the exit channel (up to 30 nt) before allowing up to 100 of the remaining ribonucleotides of the transcript to fold freely (Figure 5). The equilibrium between the lowest energy state and the hairpin state can be used to estimate the fraction of hairpin formation. In the presence of RNase A, the length of freely folded RNA is shortened to 15 nt, which may eliminate or generate pause-stabilizing hairpins (see below “Comparison of the model with experimental data”).
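The two-state simplification can be illustrated with a short sketch. It assumes that the fraction of hairpin formation is the Boltzmann population of the hairpin-included state in a two-state system; this functional form is an assumption made for illustration and may differ from Equation (14). The function name and the example energies are hypothetical, and energies are in units of kBT.

```python
import math

def hairpin_fraction(g_hairpin_kbt, g_lowest_kbt):
    """Two-state Boltzmann population of the hairpin-included state.

    Assumes the hairpin-included state and the lowest-energy state are
    two distinct structures; if they coincide, the hairpin is simply the
    lowest-energy fold and this two-state picture does not apply.
    """
    delta = g_hairpin_kbt - g_lowest_kbt  # penalty for forcing the hairpin (>= 0)
    return 1.0 / (1.0 + math.exp(delta))

# Hypothetical energies: a hairpin 3 kBT above the lowest-energy fold.
fraction_unfavorable = hairpin_fraction(-12.0, -15.0)

# Hypothetical energies: a hairpin only 0.5 kBT above the lowest-energy fold.
fraction_favorable = hairpin_fraction(-14.5, -15.0)
```

Under this assumption, shortening the freely folding RNA (as with RNase A) changes both energies, so the same proximal hairpin can become more or less populated.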
FIG. 5.

Comparison of the lowest energy conformation with one including a proximal (3’) hairpin at position 44 (‘P2’). Hairpin formation is unfavorable at this position without RNase. In the presence of RNase, the length of freely folded RNA is limited to 15 nt, so hairpin formation is favored.
A chemical bond between the hairpin loop and the RNAP flap is required to stabilize the hairpin-flap interaction. The catalytic rate relates to the length of stem and loop, and the fraction of G and C in the loop as shown below,
| (15) |
where k2 is the prefactor, Dstem and Dloop are the deviation from optimal lengths of stem (3 – 8 bases) and loop (4–20 bases), respectively, FGC is the fraction of G and C nucleotides within the loop, and ∆Gstem, ∆Gloop, and ∆GGC are the energy changes due to Dstem, Dloop, and FGC.
The exit rate from a hairpin-stabilized paused state (khspr) must be much slower than the entry rate, and is determined by the rate of RNA hairpin denaturation. For simplicity, the rate is taken to be a constant.
The Effect of Tension and Transcriptional Factors
Tension and transcription factors (TFs) have been reported as critical components that can affect and even determine the transcription products by adjusting the energy profile of the transcription complex and/or interacting with the transcription machinery [1, 4]. The effects of external tension and TFs on the thermodynamics of the TEC were considered in our model. For the forward translocation and backtracking pathways, we employed the idea that the equilibrium constant of the forward translocation step, Ki, and the energy barrier of the backtracking step, ∆Gn→n−1, are modulated by the work produced by tension [17] and by the presence of GreB,
| (16) |
and
| (17) |
where Gpre and Gpost are the energy of TEC in pre- and post-translocation state, respectively, Lforward and Lbt are the effective lengths over which external tension acts in the forward translocation step and in the backtracking step, respectively, and ∆GGreB is the energy barrier change due to GreB factor.
The hairpin-stabilized pause was assumed to be unaffected by any applied tension, since it does not involve RNAP translocation, but the length of freely folded RNA transcript can be limited by the presence of RNase A as stated in previous sections. Since the experimental data we used to validate the model were acquired under tension of magnitude ranging from −7pN to 25pN and in presence/absence of GreB and RNase A, we quantitatively determined the effect of tension and TFs by fitting the model with data acquired under different experimental conditions.
Model Training
Transcription involves only very small numbers of reactants, so the rates cannot be determined from the chemical law of mass action. Instead, we apply two stochastic methods: (i) the continuous-time Markov chain and (ii) stochastic simulation. The continuous-time Markov chain allows us to solve analytically for the expected time spent in each state at a given position. The stochastic simulation reveals how individual pausing events develop. The details are given in the Methods section.
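A stochastic simulation of the state transitions at a single template position can be sketched as follows. The sketch uses a Gillespie-type algorithm, which is an assumption about the kind of stochastic simulation meant here, and a reduced state space with a single backtracked state. All rates are hypothetical except the hairpin exit rate of 3.4 s−1, which is taken from Table II.

```python
import random

# Transition rates in s^-1 at one template position (mostly hypothetical).
RATES = {
    "active": [("next", 20.0), ("backtracked", 2.0), ("hairpin", 5.0)],
    "backtracked": [("active", 1.0)],
    "hairpin": [("active", 3.4)],  # k_hspr from Table II
}

def simulate_dwell(rng, rates=RATES):
    """Return the time taken to leave this position for the next one."""
    state, elapsed = "active", 0.0
    while state != "next":
        options = rates[state]
        total = sum(k for _, k in options)
        # Waiting time in the current state is exponential with the total rate.
        elapsed += rng.expovariate(total)
        # Choose the destination with probability proportional to its rate.
        pick = rng.random() * total
        for target, k in options:
            pick -= k
            if pick < 0:
                state = target
                break
        else:
            state = options[-1][0]  # guard against floating-point round-off
    return elapsed

rng = random.Random(0)
dwell_times = [simulate_dwell(rng) for _ in range(20000)]
mean_dwell = sum(dwell_times) / len(dwell_times)
```

For these rates the expected dwell time is 1/20 + 2/20 + 5/(20 × 3.4) ≈ 0.22 s, so the simulated mean should fall close to that value; the individual dwell times show how rare excursions into the paused states lengthen the tail of the distribution.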
The model is encapsulated in a MATLAB class object, which can generate a predicted residence time histogram with the input of a template sequence and a guess of unknown parameters. Thus, the model can be trained with the data from real-time single molecule experiments. We used the time series obtained in high-resolution optical tweezers transcription experiments by Gabizon et al. [18] with or without GreB and RNase A. The transcription experiments were performed on a DNA template (8XHis) containing the T7A1 promoter followed by eight tandem repeats of a 239 bp sequence containing the his-leader pause site and four other known sequence-dependent pause sites [1]. The temporal resolution is high enough to detect pausing events longer than 100 ms. For transcription rates of 10–20 bp/s this is sufficient to resolve pauses with one base-pair resolution using optical tweezers. Alignment of the traces under different forces and with different transcription factors generates the residence time histograms (Figure 3) as described previously.
FIG. 3.

Model fitting and prediction. (a) Stacked histogram produced by the model for the condition of 10 pN in presence of RNase. The residence time due to different pausing mechanisms is represented by different colors. The experimental result is shown by the black line. Goodness of fitting is 0.948 for the major pause sites except for ‘c’ and 0.884 for the overall histogram; (b) Stacked histogram produced by the model for the condition of 10 pN in absence of RNase. Goodness of fitting is 0.959 for the major pause sites except for ‘c’ and 0.904 for the overall histogram; (c) Predicted histogram by the model on an unfamiliar sequence. Goodness of fitting is 0.871 for the overall histogram.
Comparison of the model with experimental data
Experimental data under different conditions with various accessory factors helped to expose the mechanisms of the pauses. Also, the analysis of the backtracking dynamics helped differentiate backtracked pauses from others. Table I summarizes the position and duration of pauses as well as their response to GreB or RNase (factors). Pauses at position ‘a’ are likely pre-translocated pauses, since their duration is barely affected by the addition of GreB or RNase. Pauses at position ‘b’ are likely due to both backtracking and hairpin-stabilization, as their duration responds to the presence of GreB and RNase, and they are preceded by backward RNAP translocation, as previous analysis suggests [18]. ‘P1’, ‘d’ and ‘his’ are hairpin-stabilized pauses that almost disappear in the presence of RNase. Pause ‘P2’ is also hairpin-related, but unlike the hairpin-stabilized pauses ‘P1’, ‘d’ and ‘his’, it only appears in the presence of RNase.
TABLE I.
Summary of experimental pause positions, durations and mechanisms at 10 pN under different transcription factor conditions.
| Pause | Position of peak (bp) | Averaged duration (s), WT | Averaged duration (s), +GreB | Averaged duration (s), +RNase | Associated state(s) |
|---|---|---|---|---|---|
| ‘a’ | 9 | 0.66 | 0.58 | 0.64 | Pre-translocated |
| ‘b’ | 34 | 0.94 | 1.27 | 0.59 | Backtracked + Hairpin-stabilized |
| ‘c’ | 66 | 0.42 | 0.41 | 0.38 | Unknown |
| ‘d’ | 94 | 0.74 | 0.96 | 0.33 | Hairpin-stabilized |
| ‘his’ | 161 | 0.68 | 0.95 | 0.25 | Hairpin-stabilized |
| ‘P1’ | 16 | 0.41 | 0.40 | 0.25 | Hairpin-stabilized |
| ‘P2’ | 44 | 0.16 | 0.17 | 0.34 | Hairpin-stabilized (with RNase) |
We optimized the values of the model parameters (Table II) to produce a dwell time histogram that resembled the experimental data (Figure 3a and b). The model clearly reproduces the positions and lifetimes of pauses observed experimentally except for pause ‘c’. We propose possible reasons why the model fails at pause ‘c’ in the Discussion section.
TABLE II.
Values (95% confidence interval from 100 bootstrapped values) of the optimized parameters under 10 pN assisting force and WT conditions.
| Pathway | Parameter and description | Symbol and value | Note |
|---|---|---|---|
| Forward translocation | Rate of NTP catalysis for AUCG | kmax = [85(9), 77(5), 82(9), 41(3)] s−1 | Fitted |
| Forward translocation | Equilibrium constant for AUCG | Kd = [34(3), 96(9), 15(2), 26(4)] µM | Fitted |
| Forward translocation | Effective length for forward translocation | Lforward = 0.56(0.07) bp | Fitted |
| Backtracking | Prefactor of backtracking | k1 = 1000 s−1 | Fixed |
| Backtracking | Energy barrier height of first base-pair backtracking | Gbt = 9.8(0.8) kBT | Fitted with fixed kmax and Kd |
| Backtracking | Energy barrier height of deeper backtracking | Gbt_incre = 1.8(0.1) kBT | Fitted with fixed kmax and Kd |
| Backtracking | Effective length for backtracking | Lbt = 0.06(0.01) bp | Fitted with fixed kmax and Kd |
| Hairpin-stabilized pause | Energy change due to unlikely stem length | ∆Gstem = Inf | Fixed values |
| Hairpin-stabilized pause | Energy change due to unlikely loop size | ∆Gloop = Inf | Fixed values |
| Hairpin-stabilized pause | Energy change due to GC fraction | ∆GGC = 8.8(1.1) kBT | Fitted with fixed kmax and Kd and backtrack related parameters |
| Hairpin-stabilized pause | Hairpin-flap interaction rate | kon = 807(71) s−1 | Fitted with fixed kmax and Kd and backtrack related parameters |
| Hairpin-stabilized pause | Hairpin denaturation rate | khspr = 3.4(0.3) s−1 | Fitted with fixed kmax and Kd and backtrack related parameters |
| TEC structure | Allowed RNA-DNA hybrid length | h = 7 ∼ 9 bp | Fixed range |
| TEC structure | Allowed upstream spacer length | u = 1 ∼ 3 bp | Fixed range |
| TEC structure | Allowed downstream spacer length | d = 1 ∼ 3 bp | Fixed range |
| TEC structure | Allowed number of single-stranded RNA protected by RNAP | r = 1 ∼ 3 bp | Fixed range |
The model also successfully predicts the mechanisms of the pauses suggested by the experimental results (Figure 4a). The experimental data suggest that the presence of GreB extends the dwell time at pause ‘b’ [18]. The model achieves this effect by adjusting the energy barrier of backtracking. The presence of RNase significantly decreases the dwell time at sites ‘P1’, ‘d’ and ‘his’, increases the dwell time at site ‘P2’, and has little effect on the duration of pauses at other sites. Constraining the model to operate on shorter nascent RNA reproduces the observed changes in pause times by slowing or destabilizing hairpin formation at pause sites ‘P1’, ‘d’ and ‘his’ while favoring hairpin formation at pause site ‘P2’ (Figure 5).
FIG. 4.

Averaged dwell times from experiments (blue) and model (red) at pause sites. (a) With various transcriptional factor conditions under 10pN assisting tension and (b) WT condition under different tensions. Error bars are the 25th and 75th percentile of 100 bootstrapped values.
The effect of tension is modeled by introducing two different effective lengths, Lforward and Lbt, for forward and backtracking translocation, respectively (Figure 4b). Note that the effective length for the forward translocation pathway is shorter than 1 base, while the external force acts on an effective length shorter than 0.1 base during backtracking (Table II). The fitted values of effective length agree with those from previous work [13, 19]. These results indicate that opposing tension extends pauses by decreasing the transcription rate and accentuating the entry into backtracked pausing. They also support the idea that the entry into long-lived pauses, such as backtracked pauses, follows short-lived pauses.
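The size of the tension effect can be estimated with a short calculation of the mechanical work, force times effective length, in units of kBT. The sketch below uses the fitted effective lengths from Table II and a force of 10 pN. The rise per base pair (0.34 nm) and the thermal energy (4.11 pN·nm, near 25 °C) are assumptions introduced for the unit conversion, and expressing the effect as a simple exponential factor is an illustration rather than the exact form of Equations (16) and (17).

```python
import math

KBT_PN_NM = 4.11       # assumption: thermal energy near 25 °C, in pN·nm
RISE_NM_PER_BP = 0.34  # assumption: B-DNA rise per base pair, in nm

def tension_work_kbt(force_pn, effective_length_bp):
    """Mechanical work done by the tension, in units of kBT."""
    return force_pn * effective_length_bp * RISE_NM_PER_BP / KBT_PN_NM

FORCE_PN = 10.0  # assisting force used for the fits in Table II

work_forward = tension_work_kbt(FORCE_PN, 0.56)    # L_forward from Table II
work_backtrack = tension_work_kbt(FORCE_PN, 0.06)  # L_bt from Table II

# Boltzmann factors by which such work would scale an equilibrium constant or rate.
factor_forward = math.exp(work_forward)
factor_backtrack = math.exp(work_backtrack)
```

With these assumptions the work is about 0.46 kBT for forward translocation and about 0.05 kBT for backtracking, which corresponds to a change of roughly 60% versus 5% in the associated Boltzmann factor.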
The predictive power of the model is demonstrated by the fact that it accurately predicts major pauses in the transcription of an unfamiliar 200 base sequence. This sequence preceding the repeat region of the 8XHis template was not included in the data used to optimize the model parameters. Figure 3c shows that the model successfully predicts the main pauses near bases 15, 45, 140 and 180 found experimentally by aligning transcription records and histogramming the dwell times.
To further test the validity of the model, we used Monte Carlo simulations to generate a large number of transcription traces, and we compared the dynamics of backtracking in experimental and simulated traces. The pauses at site ‘b’ in simulated traces were analyzed for backtrack depth and backtrack duration (Figure 8). The clear agreement between simulated and experimental results lends further support to the model.
FIG. 8.

Comparison between the backtracking dynamics in experimental and simulated data. (a) Examples of traces generated by Monte Carlo simulation. The simulated traces show similar pauses at sites ‘a’, ‘b’, ‘d’ and ‘his’ and generate comparable transcription rates to experimental data. (b) Distributions of backtrack depth observed experimentally and predicted by the model. (c) Distributions of backtrack duration observed in the experiments and predicted by the model.
DISCUSSION
Strengths and limitations of the model
Note that, in the model, short, pre-translocated pauses (also referred to as ubiquitous or elemental pauses elsewhere) are identified as positions of slow forward translocation, which is on-pathway. In other reports, short pauses were identified as off-pathway events that branch off from the active translocation pathway. The dwell time data from Gabizon et al. [18] follow a power-law distribution up to 4–5 seconds, as shown in Figure 9, with no indication of decay from an off-pathway elemental pause state. This led us to assume an on-pathway elemental pause state. Fitting Kd and kmax in Equation (5) gives good agreement at elemental pause ‘a’ and predicts the slow translocation rates at other long-lived pause sites. Nevertheless, the current model, like others, cannot definitively place elemental pauses on- or off-pathway, because differentiating slow translocation from actual pausing is difficult. In fact, the model shows greater agreement with the experimental data at major, long-lived pause sites than elsewhere (see Figure 3).
FIG. 9.

The probability density distribution of dwell times in the transcription records. Dwell times ranged between 0.1 and 10 seconds with a power law distribution representing a single, on-pathway state between 0.5 and 4 seconds and superposition of dwell times from other states between 4 and 10 seconds.
The previously reported values of Kd and kmax generate different pause sites from those observed in the experimental data examined here (Figure 10) [12]. This may reflect the fact that those values originated from modeling transcription data without localizing pauses. Alternatively, this difference might indicate that an on-pathway state does not fully describe elemental pauses. However, the goodness of fitting and accuracy of prediction indicate that an on-pathway system patterned on the M-M equation has sufficient complexity to effectively model the elemental pauses. Given that the off-pathway events may involve unknown rearrangement of the active site of RNAP, fitting a system of suitable complexity is a good approach to bypass the difficulty in modeling off-pathway elemental pauses [20]. With our fitted values, but not the previously reported values of the parameters, the M-M equation predicts a slow translocation rate of 3.4 bp/s at a consensus elemental pause site identified using NET-seq [7], which lends further support for fitting Kd and kmax to produce correct pause sites.
FIG. 10.

Comparison between the experimental histogram and the dwell time histogram generated by fixing Kd and kmax using previously reported values [12]. Exceptionally long pauses are predicted at sites ‘a’ (9 bp) and ‘b’ (34 bp) and at 90 and 110 bp where there are no significant, experimentally observed pauses.
The model identifies transcriptional pausing sites and correctly characterizes the mechanism of pausing. Our results support a previous theoretical analysis of transcriptional pauses which suggests that long-lived pauses develop from short-lived, more ubiquitous pauses [3]. For example, at pause site ‘b’, backtracking is favored over forward translocation because of the low forward translocation rate. Indeed, the energetic parameters of the model would predict comparable backtracking rates at the 35 bp site (pause ‘b’) and at the 190 bp site, but the fast forward translocation rate at the 190 bp site diminishes backtracking (Figure 6). Using the canonical Michaelis-Menten expression, we determined that the forward translocation rate along the template varies from less than 3 nt/s to 70 nt/s. This implies that a slowly transcribing complex may enter into a long-lived pause at one site, even if the backtracking energy barrier at this position is higher than the barrier height at a position where transcription is faster.
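The kinetic competition described above can be made concrete with a simple branching ratio. The sketch assumes that, at each visit to the active state, the complex either steps forward or enters the backtracked state, and it ignores the hairpin pathway. The backtracking entry rate of 0.5 s−1 is hypothetical; the forward rates of 3 and 70 nt/s are the extremes quoted in the text.

```python
def backtrack_entry_probability(k_backtrack, k_forward):
    """Probability of backtracking before the next forward step (two-way competition)."""
    return k_backtrack / (k_backtrack + k_forward)

K_BACKTRACK = 0.5  # s^-1, hypothetical and identical at both sites

p_slow_site = backtrack_entry_probability(K_BACKTRACK, 3.0)   # slow forward rate
p_fast_site = backtrack_entry_probability(K_BACKTRACK, 70.0)  # fast forward rate
```

With identical backtracking rates, the slow site is about twenty times more likely to enter a backtracked pause per visit than the fast site, which illustrates how the forward rate alone can decide whether a backtracking-prone position produces an observable pause.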
FIG. 6.

Backtracking probability is highly dependent on the forward rate. The purple bands indicate two backtracking-favored energy profiles. The one at 35 bp has a slow forward rate and causes pause ‘b’, while the one at 190 bp has a fast forward rate and shows no backtracking pause in either the experiment or the model fit.
The model predicts that the effective length over which tension acts is about one half of a base pair for the forward translocation pathway, but less than 0.1 base for the backtracking pathway. This result suggests that forces of the magnitude that affects forward translocation have only a minor effect on the backtracking rate. During backtracking, RNAP must ratchet backwards on the DNA and disrupt the RNA-DNA hybrid near the active site. We hypothesize that the rate is determined in large measure by the denaturation of the last formed base pair. Thus, external forces alter this process much less than they bias the equilibrium constant in the forward translocation pathway.
The hairpin-stabilized pause requires the interaction between a transcript hairpin and the RNAP flap domain. Previous models simulated the folding of nascent transcripts using the lowest-energy method [10, 13, 21]. However, that method may not locate the correct positions of hairpins, since RNA folds co-transcriptionally and may not readily reach the lowest-energy configuration for RNAP at the pause site. In addition, simulation of co-transcriptional RNA folding requires enormous computational resources, so we devised a new method, which considers the stability difference between a structure including a hairpin and the lowest-energy structure, to estimate the likelihood of hairpin formation. In this case, hairpins at positions 101 and 178, although they are stable structures, are less likely to interact with RNAP than the less stable hairpins at positions 94 and 161, which correspond to pauses at sites ‘d’ and ‘his’, respectively (Figure 7). Our method also readily reproduced the pause ‘P2’, which is significantly lengthened in the presence of RNase as illustrated in Figure 5.
FIG. 7.

Comparison of energy and Ki,h at different positions. Hairpin formation is unfavorable at positions 101 and 178, although the hairpin structures at these positions are fairly stable. At positions 94 and 161, by contrast, the hairpin structures can readily form and induce the hairpin-stabilized pauses.
Some paused states are likely overlooked in the current model. For example, the current model cannot characterize the pauses observed at site ‘c’ and at other less significant sites. The duration of the pause at site ‘c’ is largely unaffected by the addition of either GreB or RNase, suggesting a mechanism distinct from backtracking or hairpin-stabilized pausing that is not captured in the current model. In a recent work by Janissen et al., three interconnected paused states were extracted from long transcription assays, including an elemental paused state, a backtracked paused state and a backtrack-stabilized state [22]. The backtrack-stabilized paused state is not included in our model, since the data for the tandemly repeated, 239 bp DNA sequence do not contain extremely long pauses (~100 s) that are classified as backtrack-stabilized states by Janissen et al.
Conclusion and outlook
This purely thermodynamic consideration of the transcription complex accurately reproduces transcription kinetics. By incorporating both Class I and Class II pauses, the model refines our current understanding of active pathway and branched pathways in transcription and can be used to predict the occurrence of Class I and II pauses that regulate transcription.
The model described herein significantly extends earlier efforts to model the kinetics of transcription. Bai et al., Tadigotla et al. and others independently proposed models in which the kinetics of transcription is treated as a competition between the active transcription pathway and a branched pathway [9, 10, 12, 13, 21]. Although their models yield results in statistical agreement with experimental results, the predicted pauses differ from those observed in single-molecule measurements, and the effects of tension and transcription factors were neglected. By fitting specific kinetic parameters under specific experimental conditions, our model not only achieves statistical agreement with experimental results, but also reveals quantitative detail regarding the effects of DNA sequences, applied tension, and transcription factors.
Further improvements in our biochemical understanding of transcriptional pauses, in the quality of experimental data, and in the model itself, could improve the predictive power. For example, the model might predict the pause at site ‘c’ if the mechanism underlying this pause is determined and incorporated. Longer spans of high resolution transcription data would also improve optimization of the model and the accuracy of predictions by providing more sequence variations.
METHODS
Simulation of Dwell Time Histogram using Continuous-Time Markov Chain
With the transition rates between states, we can write the rate matrix of the Markov chain:
| (18) |
The element in row n and column m represents the transition rate from state n to state m. The rows (columns) represent sequentially the active state, the backtracked states from 1 to 10 backtracking depth, the hairpin-stabilized state, and the next translocation state. The matrix is assumed to be diagonalizable, as required by the eigendecomposition used here. Given the initial state [1, 0, 0, …, 0, 0], in which the complex starts in the active state, the occupancy of each state over time can be expressed as a matrix exponential
$$\alpha \, V e^{Dt} V^{-1} \tag{19}$$
where α is the initial distribution of states, V consists of the eigenvectors of the rate matrix, and D is a diagonal matrix of the eigenvalues of the rate matrix, ordered like the eigenvectors in V. Thus, the expected time spent in each state is
$$\alpha \, V \lambda V^{-1} \tag{20}$$
where λ is the diagonal matrix holding the negative inverses of the diagonal elements of D after replacing zero eigenvalues with 1.
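The eigendecomposition route described above can be illustrated with a minimal sketch. It assumes NumPy and uses a hypothetical three-state chain (active, one paused state, next translocation state) with made-up rates, not the 13-state matrix or the fitted parameters of the model. The function name `expected_state_times` is likewise illustrative.

```python
import numpy as np

def expected_state_times(rate_matrix, alpha):
    """Expected time spent in each state, starting from distribution alpha.

    rate_matrix[n][m] is the transition rate from state n to state m;
    each diagonal element is minus the sum of the other entries in its row.
    """
    Q = np.asarray(rate_matrix, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    eigvals, V = np.linalg.eig(Q)
    # Replace zero eigenvalues with 1 before inverting, as in the text.
    is_zero = np.isclose(eigvals, 0.0, atol=1e-9)
    lam = -1.0 / np.where(is_zero, 1.0, eigvals)
    times = alpha @ V @ np.diag(lam) @ np.linalg.inv(V)
    return np.real(times)

# Hypothetical toy chain (rates in 1/s, illustrative only):
# state 0 = active, state 1 = paused, state 2 = next translocation state.
Q_toy = [
    [-12.0,  2.0, 10.0],  # active -> paused (2), active -> next (10)
    [  1.0, -1.0,  0.0],  # paused -> active (1)
    [  0.0,  0.0,  0.0],  # next state is absorbing
]
times = expected_state_times(Q_toy, [1.0, 0.0, 0.0])
dwell = times[0] + times[1]  # expected dwell time at this position
```

In this sketch only the entries for the non-absorbing states are meaningful. The entry for the absorbing state is an artifact of the eigenvalue replacement and is ignored when summing the dwell time.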
Optimizing the model with experimental data
To optimize the model parameters, we first considered only the forward translocation pathway and fitted the equilibrium parameters Kd, the kinetic parameters kmax and an effective length Lforward over which the external force acts during the forward translocation of RNAP. These parameters were optimized to generate a histogram that maximizes the goodness of fit (R), which is evaluated by the following equation
| (21) |
where Oi and Xi are the i-th elements of the fitted and experimental histograms, respectively. In this step, parameters related to forward translocation are fitted to produce slow translocation rates at all experimentally detected pause sites. The result suggests that the pause at position ‘a’ is a pre-translocated pause, which is consistent with the experimental data showing insensitivity to GreB. However, pauses at other sites were characteristically longer than the dwell time produced by forward translocation only.
In the next step, we included the backtracked pathway and fitted the energy barriers ∆Gbt and ∆Gbt_increment and an effective length Lbt for backtracking, keeping the parameters of the forward translocation pathway at the values set in the previous step (Table II). To reduce the complexity of the model, we fixed the prefactor k1 at 1000 s−1. The result suggests that the backtracked pause is a large component of pauses at position ‘b’ but not at other positions, in agreement with the analysis of backtracking dynamics.
Lastly, we included the hairpin-stabilized pause pathway with the rest of the parameters in the model fixed at the values identified in the preceding two steps (Table II). For any hairpin with a stem length outside 3–8 bases or a loop length outside 4–20 bases, ∆Gstem and/or ∆Gloop were set to infinity, as such hairpins were unlikely to stabilize pauses. The result agrees well with the pauses at the other sites.
We repeated the procedure above to fit the experimental data with GreB and RNase. The goodness of fit is evaluated separately for the major pause sites (dwell time > 0.05 s) and for the overall histogram using Equation (21). Overall, the model faithfully reproduces pause times at all pause sites under all conditions with the exception of pause ‘c’. Pauses at ‘c’ might originate from a different mechanism. Table II lists the values of the fitted model parameters.
We applied the tuned model from the previous steps to a new sequence. Since this new sequence precedes the repeat region in the transcription experiment, we have fewer experimental data on this sequence and the aligned histogram shows more minor peaks than the histogram of the well-aligned repeat region. Nonetheless, the tuned model successfully reproduced the major pauses, as shown in Figure 3c, and the goodness of fit on this unfamiliar sequence indicates that the tuned model is not overfitted.
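The hairpin size cutoff described above can be sketched as a small filter. This is an illustration only: it assumes the stated ranges mean that stems of 3–8 bases and loops of 4–20 bases are allowed, and the function name, argument names and example energies are hypothetical.

```python
import math

STEM_RANGE = (3, 8)    # allowed stem lengths in bases (assumed reading of the text)
LOOP_RANGE = (4, 20)   # allowed loop lengths in bases (assumed reading of the text)

def hairpin_energy_terms(stem_len, loop_len, dg_stem, dg_loop):
    """Return (dg_stem, dg_loop), with out-of-range terms set to infinity.

    An infinite energy term removes the hairpin from the
    hairpin-stabilized pause pathway.
    """
    if not STEM_RANGE[0] <= stem_len <= STEM_RANGE[1]:
        dg_stem = math.inf
    if not LOOP_RANGE[0] <= loop_len <= LOOP_RANGE[1]:
        dg_loop = math.inf
    return dg_stem, dg_loop

# Illustrative energies (arbitrary units, not fitted values).
in_range = hairpin_energy_terms(5, 6, -4.0, 2.0)
short_stem = hairpin_energy_terms(2, 6, -4.0, 2.0)
long_loop = hairpin_energy_terms(5, 25, -4.0, 2.0)
```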
Monte Carlo simulation and analysis on backtracking dynamics
To further validate the model, we generated transcription data using Monte Carlo simulation and compared the dynamics of backtracking in the experimental and simulated traces. Figure 8a shows example traces generated using the optimized parameters shown in Table II under 10 pN of assisting tension. The simulation was performed by calculating the probability of state transitions from the transition rates every 0.001 s. We collected the backtrack depth and duration from the simulated traces at pause site ‘b’ and compared them to the experimental results. Figures 8b and 8c show that the simulated backtracking depth and duration were similar to those observed experimentally.
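A fixed-time-step Monte Carlo scheme of the kind described above can be sketched as follows. The sketch uses only the Python standard library and a hypothetical three-state chain with made-up rates, not the optimized parameters of Table II. At every 0.001 s step, the probability of each transition is taken as rate × time step, which assumes that the summed probabilities stay well below 1.

```python
import random

def simulate_dwell(rates, absorbing, dt=0.001, rng=None, start=0):
    """Simulate one trajectory and return the time until `absorbing` is reached.

    rates maps each state to a list of (target_state, rate in 1/s).
    Each step of length dt, a transition fires with probability rate * dt.
    """
    rng = rng or random.Random()
    state, t = start, 0.0
    while state != absorbing:
        u = rng.random()
        cumulative = 0.0
        for target, rate in rates[state]:
            cumulative += rate * dt  # assumes sum(rate * dt) <= 1
            if u < cumulative:
                state = target
                break
        t += dt
    return t

# Hypothetical toy chain (rates in 1/s, illustrative only):
# 0 = active, 1 = paused, 2 = next translocation state.
toy_rates = {
    0: [(1, 2.0), (2, 10.0)],
    1: [(0, 1.0)],
}
rng = random.Random(0)
dwells = [simulate_dwell(toy_rates, absorbing=2, rng=rng) for _ in range(5000)]
mean_dwell = sum(dwells) / len(dwells)
```

For this toy chain, the mean simulated dwell time should approach the analytical expectation of 0.3 s (1.2 visits to the active state of 1/12 s each plus 0.2 visits to the paused state of 1 s each), which gives a simple consistency check between the simulation and the rate matrix.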
ACKNOWLEDGMENTS
This work was supported by the National Institutes of Health (NIH) grant R01 GM084070 to LF. We are grateful to Carlos Bustamante and Alex Tong for generously providing high-resolution transcription data.
References
- [1] Herbert KM, Porta A, Wong BJ, Mooney RA, Neuman KC, Landick R, and Block SM, Sequence-Resolved Detection of Pausing by Single RNA Polymerase Molecules, Cell 125, 1083 (2006).
- [2] Abbondanzieri EA, Greenleaf WJ, Shaevitz JW, Landick R, and Block SM, Direct observation of base-pair stepping by RNA polymerase, Nature 438, 460 (2005).
- [3] Artsimovitch I and Landick R, Pausing by bacterial RNA polymerase is mediated by mechanistically distinct classes of signals, Proceedings of the National Academy of Sciences 97, 7090 (2000).
- [4] Toulokhonov I, Artsimovitch I, and Landick R, Allosteric Control of RNA Polymerase by a Site That Contacts Nascent RNA Hairpins, Science 292, 730 (2001).
- [5] Kang JY, Mishanina TV, Bellecourt MJ, Mooney RA, Darst SA, and Landick R, RNA Polymerase Accommodates a Pause RNA Hairpin by Global Conformational Rearrangements that Prolong Pausing, Molecular Cell 69, 802 (2018).
- [6] Abdelkareem MM, Saint-André C, Takacs M, Papai G, Crucifix C, Guo X, Ortiz J, and Weixlbaumer A, Structural Basis of Transcription: RNA Polymerase Backtracking and Its Reactivation, Molecular Cell 75, 298 (2019).
- [7] Larson MH, Mooney RA, Peters JM, Windgassen T, Nayak D, Gross CA, Block SM, Greenleaf WJ, Landick R, and Weissman JS, A pause sequence enriched at translation start sites drives transcription dynamics in vivo, Science 344, 1042 (2014).
- [8] Herbert KM, Zhou J, Mooney RA, Porta AL, Landick R, and Block SM, E. coli NusG Inhibits Backtracking and Accelerates Pause-Free Transcription by Promoting Forward Translocation of RNA Polymerase, Journal of Molecular Biology 399, 17 (2010).
- [9] Bai L, Shundrovsky A, and Wang MD, Sequence-dependent Kinetic Model for Transcription Elongation by RNA Polymerase, Journal of Molecular Biology 344, 335 (2004).
- [10] Tadigotla VR, Maoiléidigh DO, Sengupta AM, Epshtein V, Ebright RH, Nudler E, and Ruckenstein AE, Thermodynamic and kinetic modeling of transcriptional pausing, Proceedings of the National Academy of Sciences 103, 4439 (2006).
- [11] Yager TD and von Hippel PH, A thermodynamic analysis of RNA transcript elongation and termination in Escherichia coli, Biochemistry 30, 1097 (1991).
- [12] Bai L, Fulbright RM, and Wang MD, Mechanochemical Kinetics of Transcription Elongation, Phys. Rev. Lett. 98, 68103 (2007).
- [13] Maoiléidigh DO, Tadigotla VR, Nudler E, and Ruckenstein AE, A Unified Model of Transcription Elongation: What Have We Learned from Single-Molecule Experiments?, Biophysical Journal 100, 1157 (2011).
- [14] Nickels BE and Hochschild A, Regulation of RNA Polymerase through the Secondary Channel, Cell 118, 281 (2004).
- [15] Larson MH, Greenleaf WJ, Landick R, and Block SM, Applied Force Reveals Mechanistic and Energetic Details of Transcription Termination, Cell 132, 971 (2008).
- [16] Chauvier A, Nadon JF, Grondin JP, Lamontagne AM, and Lafontaine DA, Role of a hairpin-stabilized pause in the Escherichia coli thiC riboswitch function, RNA Biology 16, 1066 (2019).
- [17] Tinoco I and Bustamante C, The effect of force on thermodynamics and kinetics of single molecule reactions, Biophysical Chemistry, 513 (2002).
- [18] Gabizon R, Lee A, Vahedian-Movahed H, Ebright RH, and Bustamante CJ, Pause sequences facilitate entry into long-lived paused states by reducing RNA polymerase transcription rates, Nature Communications 9, 2930 (2018).
- [19] Neuman KC, Abbondanzieri EA, Landick R, Gelles J, and Block SM, Ubiquitous Transcriptional Pausing Is Independent of RNA Polymerase Backtracking, Cell 115, 437 (2003).
- [20] Toulokhonov I, Zhang J, Palangat M, and Landick R, A central role of the RNA polymerase trigger loop in active-site rearrangement during transcriptional pausing, Molecular Cell 27, 406 (2007).
- [21] Bochkareva A, Yuzenkova Y, Tadigotla VR, and Zenkin N, Factor-independent transcription pausing caused by recognition of the RNA-DNA hybrid sequence, The EMBO Journal 31, 630 (2012).
- [22] Janissen R, Eslami-Mossallam B, Artsimovitch I, Depken M, and Dekker NH, High-throughput single-molecule experiments reveal heterogeneity, state switching, and three interconnected pause states in transcription, Cell Reports 39 (2022).
