Author manuscript; available in PMC: 2022 Mar 9.
Published in final edited form as: J Chem Theory Comput. 2022 Feb 8;18(3):2002–2015. doi: 10.1021/acs.jctc.1c01233

Landscape-zooming towards the prediction of RNA cotranscriptional folding

Xiaojun Xu 1,‡,*, Lei Jin 2,, Liangxu Xie 1, Shi-Jie Chen 2,*
PMCID: PMC8904291  NIHMSID: NIHMS1781136  PMID: 35133833

Abstract

RNA molecules fold as they are transcribed. Cotranscriptional folding of RNA plays a critical role in RNA functions in vivo. Present computational strategies focus on simulations, in which large structural changes may not be completely sampled. Here we describe an alternative approach to predicting cotranscriptional RNA folding by zooming in and out of the RNA folding energy landscape. By classifying the RNA structural ensemble into “partitions” based on long, stable helices, we zoom out of the landscape and predict the overall slow folding kinetics from the inter-partition kinetic network; for each inter-partition transition, we zoom in on the landscape to simulate the kinetics. Applications of the model to the 117-nucleotide E. coli SRP RNA and the 59-nucleotide HIV-1 TAR RNA agree with the experimental data and provide new structural and kinetic insights into biologically significant conformational switches and pathways for these important systems. By zooming in and out of the RNA folding landscape at different resolutions, this approach might allow us to treat large RNAs in vivo with transcriptional pause, transcription speed, and other in vivo effects.


Introduction

RNA molecules fold as they are transcribed in the cell [1-3]. RNA folding in transcription is a kinetically controlled process because the nascent RNA chain may not have sufficient time to reach equilibrium before a new nucleotide emerges from the RNA polymerase. As a result, an RNA often folds into a multitude of kinetic intermediates during transcription. During the early stage of transcription, the short nascent RNA chain can likely relax quickly to thermal equilibrium before new nucleotides emerge. In the late stage of transcription, however, the chain is long and has folded into complex nascent structures, so relaxation to equilibrium can be slow. Because the kinetic effects accumulate continuously until the full-length RNA chain is transcribed, the cotranscriptional folding product of an RNA can be very different from that of the refolding of the full-length RNA sequence. Several factors, such as ions, metabolites, transcription rate (speed of elongation), and site-specific pausing, can influence the coupling of transcription and RNA folding. Therefore, accurate prediction of RNA cotranscriptional folding is important for understanding RNA function. For example, riboswitches regulate downstream gene expression through the formation of ligand-dependent structures during transcription, in which the presence or absence of ligand molecules can alter the cotranscriptional folding pathways [4-7]. An accurate prediction and thorough analysis of cotranscriptional RNA folding is therefore essential for understanding riboswitch functions and riboswitch-targeted design of antibiotics.

Parallel to the experimental investigation using various techniques such as single-molecule FRET, SHAPE-seq, and NMR spectroscopy [4-18] is the development of promising computational approaches to the modeling of RNA cotranscriptional folding [19-43]. Molecular dynamics (MD) simulations allow detailed investigation of the structural dynamics of biomolecules with high temporal and spatial resolution [21]. As recently reviewed by Sponer et al. [44], the development of atomistic and coarse-grained force fields and recent advances in a broad spectrum of enhanced sampling methods have provided more direct, high-resolution predictions of structure-dynamics-function relationships for many RNA systems, ranging from the smallest RNA oligonucleotides to RNA motifs, riboswitches, ribosomes, and RNA/protein complexes. However, accurate prediction of long-timescale dynamics with large structural changes, such as the folding transitions during RNA transcription, remains a challenge. In the present study, as a complement to the molecular dynamics [21] and other simulation [36] methods for RNA cotranscriptional folding, we develop a novel multi-scale coarse-graining approach to modeling the RNA folding energy landscape and cotranscriptional folding kinetics.

To account for the huge, dynamic conformational ensemble of the RNA in the chain elongation process, low-resolution RNA structure models are used in kinetic modeling. For example, using the formation and dissociation (or switching) of an individual RNA base pair/stack as the elementary kinetic step, Kinfold models RNA folding as a time series of structural transitions at single base pair resolution [25]. The single base pair-based kinetic move set provides a useful approximation for the calculation of transition rates from the free energies of the secondary structures, and the model allows a detailed description of folding pathways at the single base pair level. However, a conformational description at the single base pair level can result in an excessively large conformational space for kinetic sampling; as a result, modeling the folding kinetics of large RNAs can be limited by long computational time.

The need to reduce the conformational sampling space demands models that go beyond the single base pair model by using a more coarse-grained treatment of RNA conformations. For example, Ndifon developed a complex adaptive systems approach to model RNA folding kinetics by allowing several uncorrelated base pairs to change in a single time step [27]. Kinwalker computes RNA cotranscriptional folding heuristically using a stepwise combination of thermodynamically optimal secondary structural elements [28]. By defining pre-equilibrated conformational macrostates from the complete (base pair-based) conformational ensemble, the kinetic cluster model enables deterministic predictions of the population kinetics, transition rates between macrostates, and pathways for RNA folding [39]. A fundamental structural motif in an RNA structure is the helix, which is formed by consecutive base pairs and stabilized primarily by base stacking interactions. Using helices as building blocks and the formation and disruption of a helix as a kinetic move, a multitude of different approaches have been developed that enable prediction of the folding kinetics of large RNAs [31-43]. These models, however, have two major limitations. First, the number of possible helices increases dramatically with the elongation of the RNA sequence, making a complete sampling of RNA structures and folding trajectories computationally infeasible for long RNAs. Second, the formation (disruption) of a helix involves the closing (opening) of multiple base pairs, where the pathway and rate constants can depend on the structural context. For example, in a pseudoknotted structure, the formation of the loop-closing helix in a hairpin structure can have very different kinetics from the folding of the second helix. Therefore, modeling of the transition rate and rate constant for a kinetic move should account for the structural context, and the traditional calculation based only on an isolated helix can lead to inaccuracy.

Here we develop a new helix-based multiscale approach to the modeling of RNA folding kinetics at the 2D structural level. The fact that the folding and unfolding of a long, stable helix is slower than that of a short, less stable helix suggests a two-scale approach: (a) to compute the (slow) rates for the creation/deletion of long, stable helices from an ensemble of kinetic trajectories that involve (fast) folding/unfolding of short, less stable helices, and (b) to determine the overall folding kinetics of the system from the different combinations of creation/deletion/switch of long, stable helices. The model employs the Kinetic Monte Carlo (KMC) simulation method to evaluate the transition rates of adding/deleting a (long, stable) helix to/from a structure. The KMC approach allows us to estimate the rates without a priori defined specific transition states for different helices. According to their constituent long, stable helices, the large structural ensembles can be grouped into pre-equilibrated partitions such that intra-partition structural conversions are much faster than inter-partition transitions [45]. From the network of inter-partition transitions, the model predicts the overall folding kinetics. Mimicking the zooming in/out features of a telescope, this new model can give the RNA folding kinetics at different structural resolutions, and thus provides a feasible and efficient solution for the prediction of large RNA folding, in particular, as shown in the present study, cotranscriptional folding of an RNA such as the E. coli SRP RNA and HIV 5’UTR. Application to the cotranscriptional folding of HIV-1 TAR provides detailed insights into the folding pathways of TAR and the impact of the folding pathway on transcriptional pause. The theory-experiment agreement on the TAR system further supports the reliability of the model.

Methods

Helix pools and structure partitioning

For a given RNA sequence, we first build a helix pool consisting of all the possible helices formed by canonical base pairs (A-U, G-C, and G-U). Each helix in the helix pool contains at least one base stack formed by two consecutive base pairs. Only saturated helices, which cannot be further extended on either side through canonical base pairs, are included in the helix pool. The free energy of each helix is evaluated by the base stacking energies from the Turner parameters [46]. In general, the size of the helix pool grows rapidly with RNA sequence length.
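The construction of the helix pool can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0-based `(i, j, length)` helix encoding, and the minimum hairpin loop size `MIN_LOOP = 3` are assumptions made for illustration, and the Turner stacking energies are not evaluated here.

```python
from typing import List, Tuple

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired nucleotides enclosed by a pair

def can_pair(seq: str, i: int, j: int) -> bool:
    """True if 0-based positions i < j lie in the sequence and form a canonical pair."""
    return (0 <= i and j < len(seq) and j - i - 1 >= MIN_LOOP
            and (seq[i], seq[j]) in CANONICAL)

def saturated_helices(seq: str) -> List[Tuple[int, int, int]]:
    """List helices (i, j, length) made of pairs (i, j), (i+1, j-1), ...

    A helix is kept only if it cannot be extended on either side and
    contains at least one base stack (two consecutive base pairs).
    """
    helices = []
    n = len(seq)
    for i in range(n):
        for j in range(i + MIN_LOOP + 1, n):
            if not can_pair(seq, i, j):
                continue
            if can_pair(seq, i - 1, j + 1):
                continue  # (i, j) is not the outermost pair of its helix
            length = 1
            while can_pair(seq, i + length, j - length):
                length += 1  # extend inward as far as possible
            if length >= 2:
                helices.append((i, j, length))
    return helices

pool = saturated_helices("GGGAAAACCC")
```

For this toy 10-nt sequence the pool contains the fully stacked 3-bp helix together with two shifted 2-bp alternatives, which illustrates why the pool grows quickly with chain length.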

We rank all the helices in the pool according to their free energies and select a given number of top-ranked lowest-free energy helices. We use the selected lowest-free energy helices to divide the RNA conformational space into different partitions. For example, the helix pool shown in Fig. 1(A) contains five (saturated) helices. Turning on/off the different helices leads to a total of $2^5 = 32$ structures, each of which defines a partition. For example, the partition (+h1, −h2, +h3, −h4, −h5) in Fig. 1(A) contains helices h1 and h3 and excludes helices h2, h4, and h5. Here, structures containing overlapping nucleotides (nucleotides belonging to two or more helices) or cross-linked base pairs, such as base pairs in a pseudoknot, are removed. We calculate the free energy of each partition using the Vfold2D model [47], which implements the dynamic programming algorithm to exhaustively sample the 2D structures with the helix constraints and applies the Turner parameters [46] and the Vfold-derived RNA loop parameters to evaluate the free energy of each sampled structure, where all the compatible stable and less-stable helices (helices not selected for the construction of partitions) are sampled. Specifically, the total partition function of each partition $i$ is given by $Z_i = \sum_s e^{-\Delta G_i^s / k_B T}$, where $\Delta G_i^s$ is the free energy of Vfold2D-sampled structure $s$ in partition $i$, $k_B$ is the Boltzmann constant, and $T$ is the temperature. The free energy of partition $i$ is calculated as $G_i = -k_B T \ln Z_i$.
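The partitioning step and the partition free energy can be sketched as follows. This is an illustrative sketch, not the Vfold2D algorithm: the structure free energies are supplied as a plain list rather than sampled by dynamic programming, the helix encoding `(i, j, length)` and all function names are hypothetical, and the compatibility test is strict (no shared nucleotides and no crossing pairs).

```python
import math
from itertools import product

KB = 0.0019872  # Boltzmann constant in kcal/(mol K)

def compatible(a, b):
    """True if two helices (i, j, length) share no nucleotide and do not cross."""
    (i1, j1, l1), (i2, j2, l2) = a, b
    side_by_side = j1 < i2 or j2 < i1
    b_inside_a = i1 + l1 - 1 < i2 and j2 < j1 - l1 + 1
    a_inside_b = i2 + l2 - 1 < i1 and j1 < j2 - l2 + 1
    return side_by_side or b_inside_a or a_inside_b

def enumerate_partitions(helices):
    """Return the on/off flag tuples whose 'on' helices are pairwise compatible."""
    partitions = []
    for flags in product((False, True), repeat=len(helices)):
        chosen = [h for h, on in zip(helices, flags) if on]
        if all(compatible(a, b) for k, a in enumerate(chosen) for b in chosen[k + 1:]):
            partitions.append(flags)
    return partitions

def partition_free_energy(structure_energies, temperature=310.0):
    """G_i = -kB T ln Z_i with Z_i = sum_s exp(-dG_i^s / kB T), in kcal/mol."""
    kt = KB * temperature
    g_min = min(structure_energies)  # shift energies for numerical stability
    z_shifted = sum(math.exp(-(g - g_min) / kt) for g in structure_energies)
    return g_min - kt * math.log(z_shifted)

# Three mutually exclusive helices plus one helix compatible with all of them.
helices = [(0, 9, 3), (0, 8, 2), (1, 9, 2), (12, 20, 2)]
partitions = enumerate_partitions(helices)
```

With four helices there are $2^4 = 16$ on/off combinations, of which only the compatible ones survive as partitions in this example.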

Figure 1:


(A) Helix-based RNA cotranscriptional folding model. During sequence elongation, with n nucleotides transcribed, the model first lists a helix pool with all the possible saturated helices. Next, the model divides the whole energy landscape of the n-nt nascent RNA chain into discrete partitions by the combination of selected helices. Here, the symbols “+” and “−” represent the inclusion and exclusion of the given helices, respectively. To calculate the transition rates for the helix-based kinetic moves, the model builds a network of structures containing partially formed helices (partial base pairs of the helix). The populational kinetics for the n-nt chain is solved by the master equation for the network of partitions, with the initial populations inherited from the kinetics of the (n-1)-nt nascent chain. (B) The relationship between the selected helix pools of the (n-1)- and n-nt RNA chains. I+II and II+III are the selected helix pools for the (n-1)-nt and the n-nt sequences, respectively. (C) Experimentally proposed cotranscriptional folding pathway of the 117-nt SRP RNA, which involves the three most stable helices, H1, H2, and H3.

Due to the dynamic nature of RNA conformations in solution, helix terminal base pairs are usually less stable and can undergo opening and closing fluctuations. Therefore, we allow overlapping and cross-linked base pairs to occur for the terminal nucleotides of a helix. This allows helices to overlap by a single nucleotide, a scenario that often happens in junction motifs, where a nucleotide could swap between base pairs in different helices.

Transition rates for helix-based kinetic moves

In general, the formation and disruption of stable helices usually involve a high kinetic barrier. The model uses stable helices to define partitions, assuming that structure transitions within each partition are faster than those between partitions. Therefore, each partition can be treated as a quasi-equilibrium macrostate, and the overall folding kinetics is mainly determined by the kinetic network of inter-partition transitions.

From the definition of partitions, different partitions differ by at least one helix, and an inter-partition transition can be described as the deletion and addition of a helix (“helix-based transitions”). To compute the inter-partition kinetic rates, we consider all the possible kinetic pathways for the deletion/addition of a helix, such as helix h2 in Fig. 1(A). On each pathway, structures are connected by the addition/deletion of a single base pair (shown as red arrows in Fig. 1) and the intermediate structures contain a partially formed helix. We apply the conventional Metropolis rule [48] to calculate the rate constant $k_{AB}$ for the transition from structure A to its neighboring structure B (through the addition/deletion of a base pair):

$$k_{AB} = k_0 \min\left(1,\ e^{-\Delta G_{AB}/k_B T}\right) \tag{1}$$

where $\Delta G_{AB} = G_B - G_A$ is the free energy difference between the two states, $k_B$ is the Boltzmann constant, $T$ is the folding temperature, and $k_0$ is the solution-dependent attempt frequency, to be determined from comparisons with experiment. Here we use the Vfold2D model to calculate the free energy of a structure. From the above rate constant $k_{AB}$ for a single base pair, we apply KMC simulations [30, 49] to calculate the rate for the whole helix from the pathways. For example, for the inter-partition transition in Fig. 1(A3), the KMC simulation starts from the initial partition (before helix h2 is formed) and the folding trajectory terminates once the final partition (with helix h2 formed) is visited. We sample 5000 such trajectories, each of which gives a first passage time (FPT) from the initial helix-open partition to the final helix-closed partition. The transition rate for helix formation (addition) is calculated from the mean first passage time $T_{\mathrm{FPT}}$, averaged over all 5000 trajectories, as $k_{\mathrm{helix}}^{+} = 1/T_{\mathrm{FPT}}$. The transition rate for helix disruption is calculated from the detailed balance condition as $k_{\mathrm{helix}}^{-} = e^{-(G_{\mathrm{helix}}^{-} - G_{\mathrm{helix}}^{+})/k_B T}\, k_{\mathrm{helix}}^{+}$, where $G_{\mathrm{helix}}^{-}$ and $G_{\mathrm{helix}}^{+}$ are the free energies of the helix-open and the helix-closed partitions, respectively.
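The rate calculation described above can be sketched with a toy kinetic Monte Carlo (Gillespie-type) simulation. The sketch makes strong simplifying assumptions that are not part of the model: the pathway network is reduced to a single zipper chain in which state s has s base pairs formed, the free energies are user-supplied toy values instead of Vfold2D energies, and the end states of the chain stand in for the helix-open and helix-closed partitions. All function names are hypothetical.

```python
import math
import random

KB = 0.0019872  # Boltzmann constant in kcal/(mol K)

def metropolis_rate(g_from, g_to, k0=1.0, temperature=310.0):
    """Eq. 1: k_AB = k0 * min(1, exp(-(G_B - G_A) / kB T))."""
    delta = g_to - g_from
    if delta <= 0.0:
        return k0  # downhill moves proceed at the attempt frequency
    return k0 * math.exp(-delta / (KB * temperature))

def first_passage_time(energies, rng, k0=1.0, temperature=310.0):
    """One trajectory from state 0 (helix open) to the last state (helix closed)."""
    state, elapsed, closed = 0, 0.0, len(energies) - 1
    while state != closed:
        moves = [(state + 1, metropolis_rate(energies[state], energies[state + 1],
                                             k0, temperature))]
        if state > 0:
            moves.append((state - 1, metropolis_rate(energies[state], energies[state - 1],
                                                     k0, temperature)))
        total = sum(rate for _, rate in moves)
        elapsed += rng.expovariate(total)   # exponentially distributed waiting time
        threshold = rng.random() * total    # choose a move with probability rate / total
        state = moves[0][0] if threshold < moves[0][1] else moves[-1][0]
    return elapsed

def helix_rates(energies, n_traj=5000, seed=0, k0=1.0, temperature=310.0):
    """Return (k_plus, k_minus): k_plus = 1 / mean FPT, k_minus from detailed balance."""
    rng = random.Random(seed)
    mean_fpt = sum(first_passage_time(energies, rng, k0, temperature)
                   for _ in range(n_traj)) / n_traj
    k_plus = 1.0 / mean_fpt
    g_open, g_closed = energies[0], energies[-1]
    k_minus = k_plus * math.exp(-(g_open - g_closed) / (KB * temperature))
    return k_plus, k_minus

# Toy zipper: an unfavorable first pair followed by stabilizing stacks (kcal/mol).
k_plus, k_minus = helix_rates([0.0, 1.5, -0.5, -2.5, -4.5])
```

Because the closed helix is more stable than the open state in this toy landscape, detailed balance gives a disruption rate far smaller than the formation rate.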

Master equation method for kinetics

From the inter-partition rate constants, we predict the detailed populational kinetics for each partition from the master equation. Specifically, the fractional population $p_i(t)$ of partition $i$ ($i = 1, 2, \cdots, \Omega$, where $\Omega$ is the total number of partitions) evolves with time $t$ according to the following master equation:

$$\frac{dp_i(t)}{dt} = \sum_j \left[k_{ji}\, p_j(t) - k_{ij}\, p_i(t)\right], \tag{2}$$

where $k_{ij}$ and $k_{ji}$ are the rates for the transitions from partition $i$ to $j$ and from $j$ to $i$, respectively. The rate constants constitute an $\Omega \times \Omega$ rate matrix $\mathbf{M}$, such that Eq. 2 has the equivalent matrix form $d\mathbf{p}(t)/dt = \mathbf{M}\,\mathbf{p}(t)$, with $M_{ij} = k_{ji}$ for $i \neq j$, $M_{ii} = -\sum_{j \neq i} k_{ij}$, and $\mathbf{p}(t)$ the fractional population vector $[p_1(t), p_2(t), \cdots, p_\Omega(t)]$. Mathematically, by solving for the eigenvalues $\lambda_m$ and eigenvectors $\mathbf{n}_m$ ($m = 1, 2, \cdots, \Omega$) of the rate matrix $\mathbf{M}$ [50], the population kinetics can be obtained analytically as

$$\mathbf{p}(t) = \sum_m C_m\, \mathbf{n}_m\, e^{\lambda_m t} \tag{3}$$

for the population of all the partitions at any time $t$, where $C_m$ is a coefficient determined by the initial condition.
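A minimal numerical sketch of Eqs. 2 and 3 is given below, assuming NumPy is available. The function names and the two-partition toy rates are illustrative assumptions. With the convention $d\mathbf{p}/dt = \mathbf{M}\mathbf{p}$, the eigenvalues are non-positive, so each mode decays as $e^{\lambda_m t}$ and the zero eigenvalue corresponds to equilibrium.

```python
import numpy as np

def rate_matrix(k):
    """Build M from k[i][j], the rate from partition i to j.

    M[i, j] = k[j, i] for i != j and M[i, i] = -sum_{j != i} k[i, j],
    so every column of M sums to zero and total population is conserved.
    """
    m = np.asarray(k, dtype=float).T.copy()
    np.fill_diagonal(m, 0.0)
    np.fill_diagonal(m, -m.sum(axis=0))
    return m

def populations(m, p0, t):
    """Eq. 3: p(t) = sum_m C_m n_m exp(lambda_m t), with C fixed by p(0) = p0."""
    lam, vecs = np.linalg.eig(m)
    coeffs = np.linalg.solve(vecs, p0)  # expand the initial condition in eigenvectors
    return (vecs @ (coeffs * np.exp(lam * t))).real

def slowest_rate(m, tol=1e-12):
    """Smallest non-zero |lambda|, which sets the overall relaxation time scale."""
    mags = np.abs(np.linalg.eigvals(m))
    return mags[mags > tol].min()

# Two partitions with k_01 = 2 and k_10 = 1 (arbitrary units of k0).
M = rate_matrix([[0.0, 2.0], [1.0, 0.0]])
p = populations(M, np.array([1.0, 0.0]), t=0.5)
```

For this two-partition example the analytic solution is $p_0(t) = 1/3 + (2/3)e^{-3t}$, which the eigen-decomposition reproduces.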

Cotranscriptional folding on a variable landscape

Compared with the folding of a full-length sequence, RNA cotranscriptional folding involves the elongation of the RNA chain, resulting in a time-varying folding landscape during transcription. Therefore, the modeling of cotranscriptional folding requires iterative, stepwise simulations, as explained below. Before the (n+1)-th nucleotide (nt) emerges, the RNA chain of n nucleotides navigates the network of the n-nt partitions, with the initial population distribution inherited from the folding of the (n−1)-nt chain. As the (n+1)-th nucleotide is synthesized, the folding landscape may undergo a significant change. As shown in Fig. 1(A), when a new nucleotide emerges, the helix pool and the resultant partitions are updated. As a result, the nascent chain explores a new ensemble of partitions and navigates a new energy landscape. Accordingly, the stability of each partition, the inter-partition transition rates, the eigenvalues and eigenvectors of the rate matrix $\mathbf{M}$, and the initial condition-dependent coefficients $C_m$ in Eq. 3 all need to be updated.

In this study, the whole cotranscriptional folding simulation is divided into a series of time windows. In the n-th time window, with the n-th nucleotide newly transcribed (n = 1, 2, ⋯, N, where N is the total length of the RNA), the folding of the (n−1)-nt chain $L_{n-1}$ is terminated and the folding of the nascent n-nt chain $L_n$ is started. We update the helix pool for $L_n$, from which we update the partitions. To investigate the folding of $L_n$, we need to determine the initial population of each $L_n$ partition. We transfer the population distribution of $L_{n-1}$ to that of $L_n$ based on the relationship between the two helix pools, as illustrated in Fig. 1(B). The transcription of the nascent n-th nucleotide can lead to three types of structural change (see Fig. S1(A)): (1) no structural change except that the 3’ dangling end is elongated by one nucleotide; (2) a new lone base stack (two consecutive base pairs) is formed by three originally unpaired nucleotides and the newly added nucleotide; (3) a new base pair is added to an existing helix. Because these three types of structural change are fast compared with RNA transcription, we assume that local equilibrium between the (sub)structures of a partition is quickly established, which allows us to use the Boltzmann distribution to reassign the fractional population of each $L_{n-1}$ partition to individual $L_n$ structures. We use Fig. S1(B) as an example, where the helix pools (h1, h2, h3, h4) of $L_{n-1}$ and (h3, h4, h5, h6) of $L_n$ share the common helices (h3, h4). The population of the 6th partition (−h1, −h2, +h3, −h4) of $L_{n-1}$ is distributed among the 5th, 6th, 7th, and 8th partitions of $L_n$, all of which contain (+h3, −h4). The populational contribution from the 6th partition of $L_{n-1}$ to the $i$-th partition of $L_n$ is $f_i\, p_6^{(n-1)}$, where $p_6^{(n-1)}$ is the population of the 6th partition of $L_{n-1}$ and $f_i = e^{-\Delta G_i^{(n)}/k_B T} / Z_{5,6,7,8}^{(n)}$ ($i = 5, 6, 7, 8$), with the partition function $Z_{5,6,7,8}^{(n)} = \sum_{j=5}^{8} e^{-\Delta G_j^{(n)}/k_B T}$ for this example. Here $f_i$ is the normalized Boltzmann weight, which accounts for the expansion of the conformational space of each individual $L_{n-1}$ partition.
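The population transfer between consecutive time windows can be sketched as below. This is an illustrative reading of the Fig. S1(B) example, not the authors' code: partitions are represented as dictionaries mapping hypothetical helix names to on/off flags, the free energies are toy values, and an old partition is matched to new partitions solely by the states of the shared helices.

```python
import math

KB = 0.0019872  # Boltzmann constant in kcal/(mol K)

def transfer_populations(old, new, shared, temperature=310.0):
    """Redistribute populations of L_{n-1} partitions over L_n partitions.

    old:    list of (helix_state_dict, population) for L_{n-1}
    new:    list of (helix_state_dict, free_energy) for L_n
    shared: helix names present in both selected helix pools
    Each old population is split over the matching new partitions with
    normalized Boltzmann weights f_i = exp(-G_i / kB T) / Z.
    """
    kt = KB * temperature
    new_pops = [0.0] * len(new)
    for old_state, pop in old:
        key = tuple(old_state[h] for h in shared)
        targets = [k for k, (state, _) in enumerate(new)
                   if tuple(state[h] for h in shared) == key]
        if not targets:
            raise ValueError("no new partition matches the shared-helix pattern")
        g_min = min(new[k][1] for k in targets)  # shift for numerical stability
        weights = [math.exp(-(new[k][1] - g_min) / kt) for k in targets]
        z = sum(weights)
        for k, w in zip(targets, weights):
            new_pops[k] += pop * w / z
    return new_pops

# One old partition (-h1, -h2, +h3, -h4) carrying all of the population.
old = [({"h1": False, "h2": False, "h3": True, "h4": False}, 1.0)]
new = [({"h3": True, "h4": False, "h5": a, "h6": b}, 0.0)
       for a in (False, True) for b in (False, True)]
new.append(({"h3": False, "h4": False, "h5": False, "h6": False}, -3.0))
pops = transfer_populations(old, new, shared=["h3", "h4"])
```

Here the four new partitions containing (+h3, −h4) have equal toy free energies and therefore share the inherited population equally, while the non-matching partition receives none regardless of its stability.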

Results and discussion

We apply the model to the 117-nt E. coli SRP RNA, whose cotranscriptional kinetics has been experimentally studied [10]. The experiment probes cotranscriptional folding using a novel SHAPE-seq method, which determines the structural information of a nascent RNA chain from the SHAPE reactivities of the nucleotides. High reactivities correspond to flexible (unpaired) nucleotides, and low reactivities indicate structured nucleotides that could be involved in base-pairing, stacking, or ligand-binding interactions. The variation in the reactivity profile of the RNA during transcription reveals the structural-rearrangement events at single-nucleotide resolution. The proposed folding pathway, as shown in Fig. 1(C), involves three stable helices (H1 with 6 bps, H2 with 7 bps, and H3 with 16 bps) and a structural rearrangement in which helix H3 forms and helix H1 is disrupted. During early transcription, a hairpin structure with helix H1 is formed, followed by the folding of helix H2, such that a structure with two stem-loops connected in series emerges as the nascent chain extends to 85 nt. When the transcription length reaches about 100 nucleotides, the long-range base pairing between the 5’- and 3’-end nucleotides (helix H3, which is incompatible with H1) begins to form, resulting in the formation of the native stem-loop structure with H2 and H3. In the theoretical calculation, we assume that the transcription rate is constant, with a duration $\Delta T$ for each time window. $\Delta T$ can be treated as the average time for a new nucleotide to be transcribed and added to the nascent chain (see the section “Cotranscriptional folding on a variable landscape”). The time duration $\Delta T$ corresponds to a transcription speed of $(\Delta T)^{-1}$ nucleotides per unit time. All the calculations are carried out at a temperature of 310 K with $k_0 = 1$. Therefore, the simulation time scale is in units of $k_0^{-1}$, where $k_0$ is the prefactor in Eq. 1.

In general, the transcription speed ranges from 200 nt/s in phages to 20-80 nt/s in bacteria and 10-20 nt/s for human polymerase [51]. To convert the simulation timescale into a physical timescale, the solution-dependent attempt frequency $k_0$ is determined by comparison with (case-by-case) experimental data. For a typical value of $10^6$ for $k_0$ [23, 24], the time duration $\Delta T \approx 10^5$ matches the transcription speed of human polymerase.
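The conversion between the simulation time unit and a physical transcription speed is simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes that $k_0$ is expressed in $\mathrm{s}^{-1}$, which is an inference from the nt/s speeds quoted above rather than a unit stated in the text.

```python
def transcription_speed_nt_per_s(delta_t_sim: float, k0_per_s: float) -> float:
    """Convert a time window given in units of 1/k0 into a speed in nt/s.

    One nucleotide is added per window, and one window lasts
    delta_t_sim / k0_per_s seconds (assuming k0 is given in 1/s).
    """
    seconds_per_nt = delta_t_sim / k0_per_s
    return 1.0 / seconds_per_nt

speed = transcription_speed_nt_per_s(delta_t_sim=1e5, k0_per_s=1e6)
```

Under this assumption a window of $10^5$ simulation units lasts 0.1 s, that is, 10 nt/s, at the lower end of the range quoted for human polymerase.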

Partitioning with helix selection

Different RNA sequences can have different helix pools. As shown in Fig. 2(A), the number of saturated helices in the pool for the SRP RNA grows during transcription. In the early stage of transcription, all the saturated helices are used to build the ensemble of partitions. During late transcription, the number of helix combinations increases dramatically; therefore, a selected set of helices is used to assemble the partitions. Specifically, we select helices from the most stable to the least stable such that the number of partitions $N_p$ reaches a predefined maximum value $N_p^{\max}$. For a fixed value of $N_p^{\max}$, as shown in Fig. 2(A), the number of partitions $N_p$ grows with the size of the helix pool during early transcription. Due to the constraint $N_p \leq N_p^{\max}$, $N_p$ stays nearly unchanged ($\approx N_p^{\max}$) after the transcription exceeds a certain site, as marked by a, b, c, d, e, and f for $N_p^{\max}$ = 10, 100, 1000, 5000, 10000, and 40000, respectively.
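One possible reading of this selection rule is sketched below: helices are added in order of stability until adding the next one would push the number of partitions above the cap. The stopping criterion, the function names, the `(i, j, length)` helix encoding, and the strict compatibility test are assumptions for illustration; the text does not specify these details.

```python
def compatible(a, b):
    """True if two helices (i, j, length) share no nucleotide and do not cross."""
    (i1, j1, l1), (i2, j2, l2) = a, b
    side_by_side = j1 < i2 or j2 < i1
    b_inside_a = i1 + l1 - 1 < i2 and j2 < j1 - l1 + 1
    a_inside_b = i2 + l2 - 1 < i1 and j1 < j2 - l2 + 1
    return side_by_side or b_inside_a or a_inside_b

def count_partitions(helices):
    """Count subsets of pairwise-compatible helices (the empty 'coil' set included)."""
    def extend(start, chosen):
        total = 1  # count the subset 'chosen' itself
        for k in range(start, len(helices)):
            if all(compatible(helices[k], c) for c in chosen):
                total += extend(k + 1, chosen + [helices[k]])
        return total
    return extend(0, [])

def select_helices(pool, energies, n_p_max):
    """Add helices from most to least stable while the partition count stays <= n_p_max."""
    selected = []
    for _, helix in sorted(zip(energies, pool)):  # lowest free energy first
        if count_partitions(selected + [helix]) > n_p_max:
            break
        selected.append(helix)
    return selected

# Four mutually compatible toy helices with toy free energies (kcal/mol).
pool = [(0, 9, 3), (12, 20, 2), (22, 30, 2), (32, 40, 2)]
energies = [-5.0, -3.0, -2.0, -1.0]
chosen = select_helices(pool, energies, n_p_max=5)
```

In this toy case two helices give four partitions, and a third would give eight, so the selection stops at two helices for a cap of five.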

Figure 2:


(A) Sizes of the helix pools (the number of helices) for all the saturated helices (in gray) and for the selected helices with the allowed maximum number of partitions $N_p^{\max}$ = 10 (brown), 100 (purple), 1000 (orange), 5000 (blue), 10000 (red), and 40000 (black), respectively, during transcription of the SRP RNA. Inset: the number of partitions for different $N_p^{\max}$, shown with the same color code. (B) Discrete folding landscape denoted by the stabilities of the partitions with $N_p^{\max}$ = 10 (B1), 100 (B2), 1000 (B3), 5000 (B4), 10000 (B5), and 40000 (B6), respectively. Partitions with a small $N_p^{\max}$ may be further divided into sub-partitions, as indicated by the cases of the red and gray points. (C) The eigenvalues of the slowest kinetic modes for different $N_p^{\max}$ values during the transcription of the SRP RNA.

In the late stage of transcription, the selected set of saturated helices becomes a small fraction of the total helix pool. As shown in supplementary Fig. S2, the helix-length distribution for different nascent SRP RNA chains indicates that most saturated helices have fewer than 4 bps. The most stable helices (those in the selected helix pool) contain multiple canonical base pairs, and their formation and disruption can lead to large structural changes at the secondary structural level. Mimicking the zooming feature of telescopes, we can use different values of $N_p^{\max}$ to fine-tune the coarse-graining (“resolution”) of the energy landscape, RNA structural changes, and the overall folding kinetics.

Fig. 2(B) gives the coarse-grained landscapes defined by the discrete partitions for the 117-nt (full-size) SRP RNA with different values of $N_p^{\max}$. For the case of $N_p^{\max}$ = 10 in Fig. 2(B1), the four most stable helices (out of 624 total saturated helices), denoted h1, h2, h3, and h4, are selected to divide the whole energy landscape into ten partitions. The first partition (−h1,−h2,−h3,−h4), marked in gray, contains none of the helices in the selected helix pool. The 7th partition (+h1,+h2,−h3,−h4), marked in red, is the most stable one (i.e., the global minimum). Enlarging the selected helix pool with a larger $N_p^{\max}$ further divides the previously defined partitions into sub-partitions with much more detailed structural constraints for the inclusion and exclusion of helices. For example, there are 90 partitions for $N_p^{\max}$ = 100 with 15 helices selected, as shown in Fig. 2(B2). Due to the enlargement of the selected helix pool from 4 to 15 helices (11 newly added), the first partition of $N_p^{\max}$ = 10 is divided into 39 sub-partitions. Since there is only one newly added helix compatible with h1 and h2, the global minimum (+h1,+h2) of $N_p^{\max}$ = 10 is divided into only two sub-partitions, as shown in Fig. 2(B2). As shown in Fig. 2(B) (and in supplementary Fig. S3 for chains during transcription), $N_p^{\max}$ serves as a measure to reveal the hierarchical structure of the conformational network and the folding landscape. Furthermore, as shown in Fig. S4, by increasing the value of $N_p^{\max}$, more detailed structural features are captured by the ($N_p^{\max}$-dependent) sub-partitions. Although the number of sub-partitions is partition-specific, the overall shape of the discrete folding landscape is gradually conserved during the landscape zooming (tuned by $N_p^{\max}$). For example, the landscape of the 117-nt SRP RNA with $N_p^{\max}$ = 40000 shown in Fig. 2(B6) is similar to that with $N_p^{\max}$ = 10000 shown in Fig. 2(B5).

Landscape zooming for folding of helices

Mathematically, the eigenvalues of the rate matrix $\mathbf{M}$ provide the time scales of the different kinetic modes. The overall folding kinetics is determined by the slow kinetic modes, especially the slowest one with the minimum non-zero $|\lambda|$. Fig. 2(C) gives the eigenvalue of the slowest kinetic mode, $|\lambda_{\mathrm{slowest}}|$, with different $N_p^{\max}$ during transcription of the SRP RNA. Despite fluctuations, $|\lambda_{\mathrm{slowest}}|$ decreases with sequence elongation, especially for large $N_p^{\max}$, resulting in the expected increase of the overall folding time. Since the model uses all the saturated helices during early transcription, $|\lambda_{\mathrm{slowest}}|$ is independent of $N_p^{\max}$ for short nascent chains. After transcription passes the $N_p^{\max}$-dependent site (see Fig. 2(A)), only selected stable helices are used to assemble partitions, so $|\lambda_{\mathrm{slowest}}|$ differs between different $N_p^{\max}$ values. However, the dependence of $|\lambda_{\mathrm{slowest}}|$ on the length of the nascent chain during transcription converges for large $N_p^{\max}$ (see the cases of $N_p^{\max}$ = 5000 (blue line), 10000 (red line), and 40000 (black line) in Fig. 2(C)).

Generally, different levels of coarse-graining may alter the predicted folding kinetics. The trend of $|\lambda_{\mathrm{slowest}}|$ during transcription shown in Fig. 2(C) indicates that increasing the value of $N_p^{\max}$, such that more helices of lower stability are selected, can improve the prediction accuracy of RNA folding kinetics. We note that the computation of the eigenvalues and eigenvectors of the rate matrix $\mathbf{M}$ is largely limited by the available computer memory. Currently, we only treat rate matrices $\mathbf{M}$ smaller than 40000 × 40000 and perform the master equation calculation with $N_p^{\max} \leq 10000$ to avoid an extremely large computational cost.

Since each partition is characterized by the inclusion and exclusion of selected helices, we trace the populations of the selected helices during transcription. As shown in Figs. 3 and S4, by increasing the value of $N_p^{\max}$ (i.e., zooming in on the folding landscape), we uncover more details about the cotranscriptional kinetics. For example, as shown in Fig. 3(A), fewer than ten stable helices are populated significantly (> 5%) for the case of $N_p^{\max}$ = 10. For a larger $N_p^{\max}$, less stable helices are included in the partitions, so structural transitions involving less stable helices can be uncovered. Therefore, more helices (> 70 for $N_p^{\max}$ = 10000, as shown in Fig. 3(E)) emerge during transcription and the model can predict the folding kinetics with more structural detail. The population of an individual helix is determined by the stability of the structures that contain the helix and the kinetics of structural changes. For the transcription window of 90-110 nt, the nascent SRP chain may fold into thermodynamically stable structures without helix H2, leading to a decrease in the population of H2 (nearly unpopulated for larger $N_p^{\max}$ values) for transcription with $\Delta T = 10^6$ (Fig. 3). As shown in Fig. S5, the population of H2 (as well as of other helices) is transcription rate-dependent. However, the experimental results show that the population of H2, once saturated, remains stable towards the end of transcription [10]. The difference between the experimental and the theoretical results suggests that further improvements in the free energy calculations for complex tertiary folds may be needed.

Figure 3:


Populational kinetics of the selected helices predicted with the allowed maximum number of partitions $N_p^{\max}$ = 10 (A), 100 (B), 1000 (C), 5000 (D), and 10000 (E), respectively, during the transcription of the SRP RNA with $\Delta T = 10^6$. The populations of the three stable helices H1, H2, and H3 (shown in Fig. 1(C)) are highlighted in orange, blue, and red, respectively, with the others in gray for clarity.

As shown in Fig. 3 for the cases of H1, H2, and H3, the predicted fractional populations of the helices may change with the value of $N_p^{\max}$. To better illustrate the impact of $N_p^{\max}$ and the transcription rate (the duration $\Delta T$ of each time window) on the predicted folding kinetics, in Fig. 4 and supplementary Figs. S6-S10 we track the fractional populations of individual helices at different transcription rates. We find that different values of $N_p^{\max}$ may not give identical predicted folding kinetics, but the overall kinetic features, such as the timing and the rates of folding/unfolding of helices, are well conserved, especially for large $N_p^{\max}$ values. For example, for helix H1 shown in Fig. 4(A), there is no significant accumulation of population (independent of the choice of $N_p^{\max}$) during transcription, since the nascent chains do not have sufficient time to fold into H1 in the fast transcription process with $\Delta T = 10^{-5}$. When the sequence elongation is slowed down, the population of helix H1 emerges at the transcription site a. As suggested by experiments, there is a structural change during late transcription, involving the disruption of H1 and the formation of H3. As shown in Fig. 4(E, F, G, and H) and supplementary Fig. S7, the predicted fractional populations confirm the proposed structural transition at the transcription site b. Because of the slow rate of the long-range base pairing in the most stable helix H3, we do not observe such structural changes during fast transcription.

Figure 4:


Populational kinetics of helix H1 predicted with different Npmax values at various transcription rates with ΔT = 10⁻⁵ (A), 10² (B), 10³ (C), 10⁴ (D), 10⁵ (E), 10⁶ (F), 10⁷ (G), and 10¹⁵ (H), respectively, during the transcription of the SRP RNA. The folding (at the transcription site a) and unfolding (at the transcription site b) of helix H1 are transcription rate-dependent, as indicated by the arrows.

Furthermore, the detailed folding kinetics, such as the time scale, depends on the transcription rate. For the fast transcriptions with ΔT = 10² in Fig. 4(B) and 10³ in Fig. 4(C), the population of H1 gradually increases, except for Npmax=10 (the brown lines). As indicated by the arrows in Fig. 4(D, E, and F) for the medium transcription rates of ΔT = 10⁴, 10⁵, and 10⁶, respectively, the predicted population of H1 reaches a steady value (≈ 1.0) before the end of transcription. As shown in Fig. 4(G and H) for ΔT = 10⁷ and 10¹⁵, respectively, the consistent population jump for H1 during the time window of (20, 25)-nt suggests that the nascent RNA chains may reach local equilibrium during the early transcription. We note that, as indicated by the arrows in Fig. 4(G and H), the H3-induced unfolding of H1 during the late transcription requires a much longer time than the folding of H1.

Energy Landscape zooming for folding pathways

Tracing the cotranscriptional folding pathways is challenging because the helix pool varies on the fly as transcription proceeds. For simplicity, we classify partitions according to the helices they include. For example, the different “coil” partitions (−H1, −H2, ⋯, −Hn) are treated as the same partition since no helices in the (varying) helix pools are included. In general, the overall cotranscriptional folding pathways depend strongly on the transcription rate. With a fast transcription rate, stable helices with multiple base pairs may not have sufficient time to fold, leading to the domination of coil-like structures throughout the transcription process. During a slow transcription, however, the partitions with stable helices may dominate the pathways over the coil-like structures. The (slow) unfolding of a stable helix, and hence the structural rearrangement of a partition, can occur in a slow transcription process.
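The rate dependence described above can be illustrated with a minimal two-state sketch (coil ⇌ helix). This is not the partition-based master equation of the model: the function name `helix_population` and the rate constants `K_FOLD` and `K_UNFOLD` are hypothetical, chosen only to show how the length of a time window decides whether a slowly folding helix has time to form. For a two-state system the folded population relaxes exponentially toward its equilibrium value with rate `k_fold + k_unfold`.

```python
import math

def helix_population(p0, k_fold, k_unfold, dt):
    """Folded population after a time window dt in a two-state model.

    p0 is the folded population at the start of the window; k_fold and
    k_unfold are hypothetical rate constants in units of 1/dt.
    """
    p_eq = k_fold / (k_fold + k_unfold)          # equilibrium folded fraction
    decay = math.exp(-(k_fold + k_unfold) * dt)  # relaxation factor over the window
    return p_eq + (p0 - p_eq) * decay

# Hypothetical rates: a stable helix that folds slowly and unfolds even more slowly.
K_FOLD, K_UNFOLD = 1e-3, 1e-5

for dt in (1e-5, 1e2, 1e4, 1e6):
    pop = helix_population(0.0, K_FOLD, K_UNFOLD, dt)
    print(f"dt = {dt:g}: folded population = {pop:.4f}")
```

With a very short window the helix stays essentially unpopulated, while a window much longer than `1 / (k_fold + k_unfold)` lets the population reach its equilibrium value.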

As shown in Fig. 5 and supplementary Figs. S11-S15, we trace the populational kinetics of all the possible partitions predicted with different Npmax at various transcription rates for the SRP RNA. During the fast transcription, as shown for ΔT = 10⁻⁵ in supplementary Fig. S11(A), the RNA chain remains in the “coil” partition without the formation of stable helices. As the transcription rate decreases, as shown for ΔT = 10⁻² in Fig. 5(A) and ΔT = 10⁻¹ in supplementary Fig. S11(C), partitions with helices of local base pairing (i.e., hairpins with small loops) emerge during the late transcription. However, the hairpin loop closed by helix H1 has 11 unpaired nucleotides, and its folding requires a relatively long time. As a result, partitions containing H1 emerge with a notable population (the orange and blue lines in Fig. 5 and supplementary Figs. S11-S14) only for a long time duration ΔT ≥ 10⁰.

Figure 5:


Populational kinetics of partitions predicted with different Npmax at various transcription rates of ΔT = 10⁻² (A), 10⁴ (B), and 10¹⁵ (C), respectively, during the transcription of SRP RNA. Here the partitions (+H1, −H2, −H3), (+H1, +H2, −H3), and (−H1, +H2, +H3), the three structures on the proposed pathway shown in Fig. 1(C), are shown in orange, blue, and red, respectively. The “coil” partition without the inclusion of any selected helices is shown in black. Other partitions are shown in gray for clarity. ① - ⑧ denote the partitions that emerge during the transcription. The eight partitions are chosen for illustrative purposes to show the zooming feature of the model.

Moreover, as proposed from experiment, helix H3 forms with the disruption of helix H1 during the late transcription. Such a structural change requires a much longer time, since H1 and H3 are incompatible with each other and both helices are quite stable. As shown in Fig. 5(B and C) and supplementary Figs. S13-S15, the folding of H3 happens for ΔT > 10³. At the end of the transcription, the partition (−H1, +H2, +H3) becomes dominant over the others for the slow transcriptions with ΔT > 10⁷. Consistent with the experiment, the predicted cotranscriptional folding pathway of the SRP RNA during the slow transcriptions, as shown in Fig. 6, involves the three most stable helices H1, H2, and H3. After the folding of two hairpins during the early transcription, marked with ① and ② in Fig. 5(B) and Fig. 6, helix H1 emerges. Because of the high stability of H1, the newly transcribed nucleotides in the 3’-end tail of H1 form base pairs within the tail without disrupting H1. Therefore, during the mid-stage of the transcription, partitions containing H1 and H2 dominate the population. In the late stage of the transcription, after the nascent chain extends to about 100 nucleotides, H3 begins to form with the disruption of H1.

Figure 6:


The cotranscriptional folding pathway of SRP RNA, which involves three stable helices: H1 (in orange), H2 (in blue), and H3 (in red). The partitions ① - ⑧ play important roles, as shown in Fig. 5, during the transcription. Here, helices are depicted by the terminal nucleotide pairs. For example, helix H1 is denoted as (3-25, 8-20). The eight partitions are chosen for illustrative purposes to show the zooming feature of the model.

As indicated by the blue dashed arrows shown in Fig. 6 and the corresponding fractional populations shown in Fig. 5, the present model can predict the folding kinetics at different resolutions. For example, with Npmax=10 and ΔT = 10⁴, the predicted cotranscriptional folding kinetics for the SRP RNA in Fig. 5(B1) shows a population jump of the partition ③ within the time window of (65, 70)-nt, indicating that the helix (36-67, 41-62) may play an important role in the folding. For Npmax=100 in Fig. 5(B2), the time window during which partition ③ is dominant extends to (65, 90)-nt. For the late-stage transcription beyond nucleotide 70, helix (36-67, 41-62) is not in the helix pool with Npmax=10, so the explicit kinetic role of this helix is missing from that prediction. Still larger Npmax values reveal that helix (30-74, 32-72) in partition ④ has significant effects on the folding kinetics of the SRP RNA in the time window of (65, 90)-nt. Furthermore, the native partition ⑤ shown in Fig. 5(C) and Fig. 6 contains only H2 and H3 for Npmax=10. With the increase of Npmax, the kinetic roles of the additional helices during the late transcription, i.e., helix (21-94, 25-90) in ⑥, (28-85, 31-82) in ⑦, and (46-69, 48-67) in ⑧, are revealed.
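Because partitions are defined by which helices are included and excluded, it can help to check whether two helices written in the terminal-pair notation, such as (3-25, 8-20), can coexist at all. The sketch below is an illustration rather than the model's own procedure: the helper names are hypothetical, and the compatibility rule (no shared nucleotides and no crossing, i.e., no pseudoknot) is an assumed simplification.

```python
def parse_helix(label):
    """Parse '(3-25, 8-20)' into (i_outer, j_outer, i_inner, j_inner)."""
    outer, inner = label.strip("() ").split(",")
    i_out, j_out = (int(x) for x in outer.split("-"))
    i_in, j_in = (int(x) for x in inner.split("-"))
    return i_out, j_out, i_in, j_in

def paired_nucleotides(helix):
    """All nucleotides on the 5' strand and the 3' strand of the helix."""
    i_out, j_out, i_in, j_in = helix
    return set(range(i_out, i_in + 1)) | set(range(j_in, j_out + 1))

def compatible(h1, h2):
    """Assumed rule: helices may coexist if they share no nucleotide
    and their outermost base pairs do not cross (pseudoknot-free)."""
    if paired_nucleotides(h1) & paired_nucleotides(h2):
        return False
    i, j = h1[0], h1[1]
    k, l = h2[0], h2[1]
    crossing = (i < k < j < l) or (k < i < l < j)
    return not crossing

H1 = parse_helix("(3-25, 8-20)")
print(compatible(H1, parse_helix("(36-67, 41-62)")))  # disjoint, side by side
print(compatible(H1, parse_helix("(21-94, 25-90)")))  # overlaps nucleotides 21-25 of H1
```

Under this rule, a helix that reuses the 3' strand of H1 can only appear in partitions that exclude H1.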

The free energy-based RNA 2D folding model depends strongly on the accuracy of the free energy calculations. To examine the robustness of the model, we have also used RNAstructure,52 a different free energy-based model for RNA folding, to evaluate the free energies of the partitions and the rate matrices for the master equation. We downloaded the software package from the RNAstructure website and ran the calculations on local machines with the default parameters. Specifically, using the “Fold” algorithm of RNAstructure with a constraint file (.con) to specify the included and excluded helices, we predict the lowest free energy structure and a set of low free energy structures for a partition. For short nascent RNA chains, RNAstructure may not output predictions. In that case, we manually set the free energy to 0.0 and 2.0 kcal/mol for partitions with and without the “coil” structure, respectively. As shown in Figs. S16-S19, the RNAstructure-based cotranscriptional folding kinetics of the SRP RNA is quite similar to the Vfold2D-based prediction. Therefore, for the tested case, the use of Vfold2D may not significantly bias the presented results. In general, the predictions of the folding kinetics depend on the free energy model. We note that the most important free energy parameters in the Vfold2D folding model, namely, the thermodynamic parameters for the base pairing and stacking interactions, are derived from experimental measurements and are thus reliable. The reliability of the model has also been supported by extensive tests against experimental data for the folding thermodynamics of various systems.53-57 As a caveat, however, we note that the thermodynamic parameters depend on solution conditions such as the temperature and salt concentration. Therefore, the validity of the free energy model for different experimental conditions requires further systematic investigation.58-61 Molecular dynamics simulations, as a powerful approach to uncovering folding dynamics at atomistic resolution, can provide reliable predictions of the detailed kinetics and rate constants for base pairing and stacking interactions for many RNA systems.44, 49, 62-64

Cotranscriptional folding pathway of HIV-1 TAR

As a further application, we use the model to investigate the cotranscriptional folding of the HIV-1 transactivation response (TAR) element in the 5’ untranslated region (5’UTR). The HIV-1 5’UTR contains about 360 nucleotides with several functional folded domains such as the TAR, Poly-A, PBS, and DIS.65 The first 5’-proximal stem-loop, the TAR element, has been found to be the most important transcriptional regulator at the early stage of HIV viral RNA transcription.66-70 It has been reported that with Tat protein binding, the functional domain of TAR can regulate the genome transcription71, 72 by recruiting RNA polymerase II (RNAP II) subunit Cyclin-dependent kinase 9 (CDK9) to increase the transcription activity through trans-activation.73-77 In vivo and in vitro experiments have shown the possible functional roles of cotranscriptional RNA folding in viral replication and gene expression.2, 3, 67, 78, 79 An in vitro experiment80 on the transcription of the HIV-1 5’UTR by RNAP II at a slow elongation rate (4 nt/sec) led to three major conclusions. (a) About 70% of the population entered a pausing state at site +62U with the formation of the nonnative hairpin (P3 and P4 in Fig. 7), and about 30% of the population entered the read-through (no pausing) transcription with the formation of the native TAR hairpin (N3 and N4 in Fig. 7). (b) The transcriptional pausing can last about 22 seconds. (c) In the absence of Tat protein binding, for the (70%) pausing population, transcription is resumed with the formation of the native TAR hairpin, suggesting a structural transition from the nonnative hairpin to the native TAR hairpin. Here, we apply our cotranscriptional folding model to predict and analyze the cotranscriptional folding pathway and the nascent structures of the TAR sequence in the 5’UTR of the HIV-1 genome.

Figure 7:


(A) The cotranscriptional folding pathways of HIV-1 TAR from nucleotides 1 to 44. The upper pathway (nonnative pathway) from P1 to P4 and the lower pathway (native pathway) from N1 to N4 involve the formation of nonnative hairpin and the native TAR structure, respectively. Structure I serves as the intermediate between N1 and P2 and between N1 and P3. The black solid arrows show the folding population flow between the structures. The red solid arrow shows the net flow and the dotted red arrows indicate the pathway for the transition from P4 to N3. The blue solid arrow shows the experimentally suggested overall transition pathway from nonnative hairpin to native-like TAR hairpin. The number on each flow arrow shows the value of the corresponding population flow (flux) (Eq. 4). (B) The population kinetics during the cotranscriptional folding. (C) Free energy profile along the native pathway and the nonnative pathway, respectively.

We rescale the transcription speed to 4 nt/sec according to the experimental condition80 and apply the highest resolution of the model (Npmax=10000). Since a transcriptional pause at +62U of the RNA elongation has been detected in vitro,80 we run the cotranscriptional folding simulation up to an elongation length of 62-nt, consistent with the experiments. Because RNAP II protects about 18 nucleotides upstream of the pause site, we allow the first 44 (= 62-18) nts of the nascent RNA to fold while the pause suspends the chain elongation.80

From the transition rates between the different states, we can compute the population flow Pij of the transition from state i to j:

$$P_{ij} = \int \left[ k_{ji}\, p_j(t) - k_{ij}\, p_i(t) \right] dt, \tag{4}$$

where kij and kji are the rates for the transitions from state i to j and from j to i, respectively, and pi(t) and pj(t) are the populations of states i and j at time t, respectively. The transition pathway can be obtained from the net transition flows. As shown in Fig. 7(A), the model predicts two major cotranscriptional folding pathways. The folding pathway from N1 to N4 (the native pathway) shows the cotranscriptional formation of the native-like, partially folded TAR hairpin (N4). In this (native) pathway, structures N3 and N4 contain the -ucu- bulge loop, which is the Tat binding site. The pathway from P1 to P4 (the nonnative pathway) involves the formation of a 39-nt nonnative hairpin (P4 in Fig. 7(A)). The predicted structure P4 is consistent with the “nonnative hairpin” structure proposed in the experimental study.80
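As a rough numerical illustration of the population flow in Eq. 4, the sketch below integrates the integrand k_ji p_j(t) − k_ij p_i(t) with the trapezoidal rule for a toy two-state system. The function name `net_flow`, the rate values, and the time grid are hypothetical and are not taken from the model. The sign is kept in the order in which the terms appear in Eq. 4, so a negative result means that population has moved, on balance, from state i to state j.

```python
import math

def net_flow(k_ij, k_ji, p_i, p_j, times):
    """Trapezoidal estimate of the integral of [k_ji*p_j(t) - k_ij*p_i(t)] dt.

    p_i and p_j are population samples at the time points in `times`;
    k_ij is the rate from state i to j, k_ji the rate from j to i.
    """
    integrand = [k_ji * pj - k_ij * pi for pi, pj in zip(p_i, p_j)]
    total = 0.0
    for n in range(1, len(times)):
        total += 0.5 * (integrand[n] + integrand[n - 1]) * (times[n] - times[n - 1])
    return total

# Toy two-state system with hypothetical rates; all population starts in state i.
K_IJ, K_JI = 2.0, 1.0
p_eq_i = K_JI / (K_IJ + K_JI)
times = [0.001 * n for n in range(5001)]  # t from 0 to 5
p_i = [p_eq_i + (1.0 - p_eq_i) * math.exp(-(K_IJ + K_JI) * t) for t in times]
p_j = [1.0 - p for p in p_i]

flow = net_flow(K_IJ, K_JI, p_i, p_j, times)
print(f"integrated flow = {flow:.4f}, change in p_i = {p_i[-1] - p_i[0]:.4f}")
```

For a two-state system the integrand equals dp_i/dt, so the integrated flow should match the change in p_i over the interval, which gives a simple consistency check. In a network with many states, each pair (i, j) would have its own flow.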

By investigating the two transcriptional folding pathways, we identify a stem-loop structure (labeled as I in Fig. 7(A)), which connects structures P2 and N1 and structures P3 and N1, as an intermediate for the cross transition between the two main pathways. As shown in Fig. 7(B), the population flow indicates that the two folding pathways coexist at an early stage of the transcription. At the beginning of the transcription (1-nt ~ 19-nt), the cotranscriptional folding mainly follows the native pathway (> 90%). As the transcription proceeds, the flow in the nonnative pathway increases and the nonnative pathway becomes the major pathway (20-nt ~ 41-nt). Before the RNA chain reaches 39-nt (i.e., before N2 can be formed), the kinetic process is dominated by the transition from the (native-like) structure N1 in the native pathway to the (nonnative) structures P2 and P3 in the nonnative pathway. After the transcriptional elongation reaches 42-nt, the transition from the nonnative hairpin structure P4 to the native-like TAR hairpin N3 occurs. The transition causes the disruption of the pause-inducing nonnative hairpin and the subsequent de-pausing of the transcription; see also the experimentally suggested “escape” pathway80 in Fig. 7(A).

At the point of the transcriptional pause, the 44-nt native-like structure N4 is formed in the native pathway. Even though the transition from P4 to N3 still exists at this point, P4 accounts for about 80% of the folded structures (see Fig. 7(A) and (B)). We note that the transcription experiment for the HIV-1 5’UTR shows that about 70% of the population enters and is stuck in the pause state with a nonnative hairpin structure folded at the pause site.80 The model predicts exactly the same nonnative hairpin P4 as the one found in experiment.80 The small theory-experiment difference may be attributed to the different solution conditions between the model and the experiment.

In terms of the free energy profile, as shown in Fig. 7(C), before elongation reaches 42-nt, the free energy of the 39-nt nonnative hairpin P4 is much lower than that of the 39-nt native-like TAR hairpin N2 (−15.5 kcal/mol vs. −11.6 kcal/mol). As the chain elongates from 39-nt to 44-nt, the free energy of the nonnative hairpin P4 remains nearly unchanged because no additional base pairs can form, while the free energies of the native-like TAR hairpins (N3 to N4) continue to decrease as the helix extends.

The equilibrium Boltzmann distribution, calculated from the free energies shown in Fig. 7(C) for the RNA at the pause site +62U, gives about 70% for the nonnative hairpin P4 and 30% for the native-like TAR hairpin N4. The equilibrium distribution is not exactly equal to the cotranscriptional distribution (about 80% for the nonnative hairpin P4 and 20% for the native-like TAR hairpin N4) even for the slow transcription at 4 nt/sec.80 The nonequilibrium distribution of the cotranscriptional folding suggests that the folding of TAR may be kinetically controlled. However, as a caveat, we note that the free energy calculations assume a 1 M NaCl solution condition, which is different from the experimental condition (33 mM K+, 8 mM Mg2+, and 0.2 mM Na+). The different solution conditions can affect the free energy results and hence the predicted equilibrium and kinetic distributions. Therefore, for the slow transcription at 4 nt/sec, the computational results here cannot exclude the possibility that the folding of the TAR region might be thermodynamically controlled. Nevertheless, for in vivo transcription, which proceeds at a much faster rate (over 20 nt/sec), the kinetic effect would be much more pronounced, and the cotranscriptional folding of TAR would very likely be kinetically controlled.
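The relation between free energies and equilibrium populations used in this comparison can be sketched as follows. This is a two-state illustration only: it ignores every structure other than the two hairpins, assumes a temperature of 37 °C (310.15 K), which is not stated in this section, and uses hypothetical function names. The 39-nt free energies are the values quoted above for P4 and N2.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_populations(free_energies, temperature_k=310.15):
    """Equilibrium populations from free energies given in kcal/mol."""
    rt = R_KCAL * temperature_k
    g_min = min(free_energies.values())  # shift by the minimum for numerical stability
    weights = {s: math.exp(-(g - g_min) / rt) for s, g in free_energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

def free_energy_gap(p_a, p_b, temperature_k=310.15):
    """Free energy difference G_b - G_a implied by two-state populations p_a and p_b."""
    return R_KCAL * temperature_k * math.log(p_a / p_b)

# 39-nt chain: nonnative hairpin P4 versus native-like hairpin N2 (values from the text).
pops_39nt = boltzmann_populations({"P4": -15.5, "N2": -11.6})
print({state: round(p, 4) for state, p in pops_39nt.items()})

# Free energy gap implied by a 70%/30% two-state split.
print(f"{free_energy_gap(0.7, 0.3):.2f} kcal/mol")
```

Under these assumptions, a 3.9 kcal/mol gap leaves the higher-energy hairpin almost unpopulated at equilibrium, whereas a 70%/30% split corresponds to a gap of only about 0.5 kcal/mol, which is why small shifts in the free energies caused by solution conditions can noticeably change the predicted distribution.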

As the transcription pause occurs, the non-equilibrium system will continue to evolve and the nonnative structures can slowly switch to the native-like TAR hairpin. Once the native-like TAR hairpin is formed, the Tat protein can bind to the UCU bulge loop and promote de-pausing.

Conclusions

Based on a helix-based structure network, we have developed a new RNA folding kinetics model. Specifically, the model uses stable helices to assemble partitions (local minima of the folding energy landscape) and less-stable helices to characterize the fluctuations around the partitions (minima). The network of structures that consist of partially folded and less-stable helices forms the kinetic pathways between partitions. Therefore, the model enables predictions of RNA folding kinetics from the full folding energy landscape including both the stable and unstable helices. As shown in Figs. 3, 4, and 5, from the master equation-derived fractional populations of partitions, the model reveals the kinetic roles of selected helices, detailed folding pathways, folding time scales of helix formation/disruption and structural rearrangements, as well as possible intermediate and/or trapped states (partitions) during transcription. Such kinetic features may help us understand the functions of RNAs during transcription.

Furthermore, as illustrated by the case of E. coli SRP RNA, the model can fine-tune the resolution of the predicted folding kinetics. Since the model selects helices from the most to the least stable saturated helices to build the selected helix pool, the parameter Npmax (which controls the number of selected helices) reflects the resolution of the structure description and the degree of coarse-graining of the folding energy landscape. The smaller the value of Npmax, the coarser the structure description and the larger the structural change for a kinetic move. For short RNAs, we can use large values of Npmax to predict the folding kinetics at high resolution. For long RNAs, however, lowering the structural resolution by using a smaller Npmax may be necessary in order to increase the computational efficiency. In particular, for a long RNA sequence, we can use a small number of long (stable) helices to effectively divide the whole folding energy landscape into a desired (computationally feasible) number of partitions and focus on the overall folding kinetics with a low-resolution description of the structure.

The model predictions not only agree with the experimental results but also provide new insights into the cotranscriptional folding of the TAR domain of the HIV-1 5’UTR. We predict two parallel folding pathways: one leads to the native TAR hairpin and the other leads to a misfolded hairpin that can induce a transcription pause. The predicted populations of the misfolded and the native-like hairpins are close to the experimental result. Moreover, the populational distribution is non-equilibrium, suggesting a cotranscriptional kinetic control of the folding. Our simulation further shows a slow transition from the misfolded structure to the native TAR structure. We infer that the transcriptional pause may promote the formation of the native TAR structure with or without the help of Tat.

The current version of the model omits pseudoknots for reasons of computational efficiency. Sampling all the possible pseudoknots during the cotranscriptional folding and calculating their transition rates using KMC simulation can be exceedingly demanding computationally. To test the impact of pseudoknots on cotranscriptional folding kinetics, we investigate the B. cereus fluoride riboswitch, for which experiments suggest the formation of a pseudoknot.10, 81 As shown in supplementary Figs. S20 and S21, including the pseudoknot in the kinetic ensemble can indeed significantly improve the accuracy of the prediction. Therefore, future development of the model should include pseudoknots in the conformational ensemble.

Moreover, transcription in vivo can be more complicated than the process in the simplified model. The speed of nucleotide elongation can be site-specific82 and many genes contain function-related specific pausing sites where the sequence elongation is temporarily stalled.83-85 The new model developed here may serve as a framework for considering such in vivo effects. In addition, metal ions and ligand molecules can alter RNA folding kinetics.86-92 If the appropriate energy and rate parameters are available, the current model may be further extended to treat the kinetic coupling between RNA folding and ligand and/or ion binding.


Acknowledgments

We thank Dr. Xiao Heng for many useful discussions. This work was supported by the National Science Foundation of China 12074151 (to X. Xu) and 22003020 (to L. Xie), and the National Institutes of Health R35-GM134919 (to S.-J. Chen).

Footnotes

The authors declare no competing financial interest.

Supplementary Information

Technical details about the populational inheritance, Size distributions of the saturated helices of the SRP RNA, Discrete folding landscapes of the SRP RNA during transcription with different values of Npmax, Structural features of the full-length SRP RNA with different values of Npmax, Fractional populational kinetics of the selected helices predicted with different values of Npmax and ΔT, Fractional populational kinetics of folding partitions predicted with different values of Npmax and ΔT, The cotranscriptional folding kinetics of SRP RNA, predicted with RNAstructure for the free energy calculations. (PDF)

This information is available free of charge via the Internet at http://pubs.acs.org.

References

  • [1].Boyle J; Robillard GT; Kim SH Sequential folding of transfer RNA: A nuclear magnetic resonance study of successively longer tRNA fragments with a common 5 end. J. Mol. Biol 1980, 139, 601–625. [DOI] [PubMed] [Google Scholar]
  • [2].Kramer FR; Mills DR Secondary structure formation during RNA synthesis. Nucleic Acids Res. 1981, 9, 5109–5124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Pan T; Sosnick T RNA folding during transcription. Annu. Rev. Biophys. Biomol. Struct 2006, 35, 161–175. [DOI] [PubMed] [Google Scholar]
  • [4].Frieda KL; Block SM Direct observation of cotranscriptional folding in an adenine riboswitch. Science 2012, 338, 397–400. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Helmling C; Wacker A; Wolfinger MT; Hofacker IL; Hengesbach M; Furtig B; Schwalbe H NMR structural profiling of transcriptional intermediates reveals riboswitch regulation by metastable RNA conformations. J. Am. Chem. Soc 2017, 139, 2647–2656. [DOI] [PubMed] [Google Scholar]
  • [6].Uhm H; Kang W; Ha KS; Kang C; Hohng S Single-molecule FRET studies on the cotranscriptional folding of a thiamine pyrophosphate riboswitch. Proc. Natl. Acad. Sci. USA 2018, 115, 331–336. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Strobel EJ; Cheng L; Berman KE; Carlson PD; Lucks JB A ligand-gated strand displacement mechanism for ZTP riboswitch transcription control. Nat. Chem. Biol 2019, 15, 1067–1076. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Zamft B; Bintu L; Ishibashi T; Bustamante C Nascent RNA structure modulates the transcriptional dynamics of RNA polymerases. Proc. Natl. Acad. Sci. USA 2012, 109, 8948–8953. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Geary C; Rothemund PW; Andersen ES A single-stranded architecture for cotranscriptional folding of RNA nanostructures. Science 2014, 345, 799–804. [DOI] [PubMed] [Google Scholar]
  • [10].Watters KE; Strobel EJ; Angela MY; Lis JT; Lucks JB Cotranscriptional folding of a riboswitch at nucleotide resolution. Nat. Struct. Mol. Biol 2016, 23, 1124–1131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Incarnato D; Morandi E; Anselmi F; Simon LM; Basile G; Oliviero S In vivo probing of nascent RNA structures reveals principles of cotranscriptional folding. Nucleic Acids Res. 2017, 45, 9716–9725. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Alkallas R; Fish L; Goodarzi H; Najafabadi HS Inference of RNA decay rate from transcriptional profiling highlights the regulatory programs of Alzheimers disease. Nat. Commun 2017, 8, 909. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Larsson AJ, Johnsson P, Hagemann-Jensen M, Hartmanis L, Faridani OR, Reinius B, Segerstolpe A, Rivera CM, Ren B and Sandberg R Genomic encoding of transcriptional burst kinetics. Nature 2019, 565, 251–254. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Tadigotla VR; Maoileidigh DO; Sengupta AM; Epshtein V; Ebright RH; Nudler E; Ruckenstein AE Thermodynamic and kinetic modeling of transcriptional pausing. Proc. Natl. Acad. Sci. USA 2006, 103, 4439–4444. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Wong TN; Sosnick TR; Pan T Folding of noncoding RNAs during transcription facilitated by pausing-induced nonnative structures. Proc. Natl. Acad. Sci. USA 2007, 104, 17995–18000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Chauvier A; Picard-Jean F; Berger-Dancause JC; Bastet L; Naghdi MR; Dube A; Turcotte P; Perrreault J; Lafontaine DA Transcriptional pausing at the translation start site operates as a critical checkpoint for riboswitch regulation. Nat. Commun 2017, 8, 13892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Steinert H; Sochor F; Wacker A; Buck J; Helmling C; Hiller F; Keyhani S; Noeske J; Grimm S; Rudolph MM; Keller H; Mooney RA; Landick R; Suess B; Furtig B; Wohnert J; Schwalbe H Pausing guides RNA folding to populate transiently stable RNA structures for riboswitch-based transcription regulation. eLife 2017, 6, e21297. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [18].Widom JR; Nedialkov YA; Rai V; Hayes RL; Brooks CL III; Artsimovitch I; Walter NG Ligand modulates cross-coupling between riboswitch folding and transcriptional pausing. Mol. Cell 2018, 72, 541–552. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Aitken S; Alexander RD; Beggs JD Modelling reveals kinetic advantages of co-transcriptional splicing. PLoS Comput. Biol 2011, 7, e1002215. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Honkela A; Peltonen J; Topa H; Charapitsa I; Matarese F; Grote K; Stunnenberg HG; Reid G; Lawrence ND; Rattray M Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays. Proc. Natl. Acad. Sci. USA 2015, 112, 13115–13120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Yu AM; Gasper PM; Cheng L; Lai LB; Kaur S; Gopalan V; Chen AA; Lucks JB Computationally reconstructing cotranscriptional RNA folding from experimental data reveals rearrangement of non-native folding intermediates. Mol. Cell 2021, 81, 870–883. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Sauerwine B; Widom M Kinetic Monte Carlo method applied to nucleic acid hairpin folding. Phys. Rev. E 2011, 84, 061912. [DOI] [PubMed] [Google Scholar]
  • [23].Faber M; Klumpp S Kinetic Monte Carlo approach to RNA folding dynamics using structure-based models. Phys. Rev. E 2013, 88, 052701. [DOI] [PubMed] [Google Scholar]
  • [24].Lutz B; Faber M; Verma A; Klumpp S; Schug A Differences between cotranscriptional and free riboswitch folding. Nucleic Acids Res. 2014, 42, 2687–2696. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Flamm C; Fontana W; Hofacker IL; Schuster P RNA folding at elementary step resolution. RNA 2000, 6, 325–338. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [26].Zhang W; Chen S-J RNA hairpin-folding kinetics. Proc. Natl. Acad. Sci. USA 2002, 99, 1931–1936. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Ndifon W A complex adaptive systems approach to the kinetic folding of RNA. Biosystems 2005, 82, 257–265. [DOI] [PubMed] [Google Scholar]
  • [28].Geis M; Flamm C; Wolfinger MT; Tanzer A; Hofacker IL; Middendorf M; Mandl C; Stadler PF; Thurner C Folding kinetics of large RNAs. J. Mol. Biol 2008, 379, 160–173. [DOI] [PubMed] [Google Scholar]
  • [29].Hofacker IL; Flamm C; Heine C; Wolfinger MT; Scheuermann G; Stadler PF BarMap: RNA folding on dynamic energy landscapes. RNA 2010, 16, 1308–1316. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Xu X; Chen S -J. Kinetic mechanism of conformational switch between bistable RNA hairpins. J. Am. Chem. Soc 2012, 134, 12499–12507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [31].Mironov AA; Dyakonova LP; Kister AE A kinetic approach to the prediction of RNA secondary structures. J. Biomol. Struct. Dyn 1985, 2, 953–962. [DOI] [PubMed] [Google Scholar]
  • [32].Mironov AA; Kister AE RNA secondary structure formation during transcription. J. Biomol. Struct. Dyn 1986, 4, 1–9. [DOI] [PubMed] [Google Scholar]
  • [33].Mironov AA; Lebedev VF A kinetic model of RNA folding. Biosystems 1993, 30, 49–56. [DOI] [PubMed] [Google Scholar]
  • [34].Gultyaev AP; Batenburg FHD; Pleij CWA The computer simulation of RNA folding pathways using a genetic algorithm. J. Mol. Biol 1995, 250, 37–51. [DOI] [PubMed] [Google Scholar]
  • [35].Shapiro BA; Bengali D; Kasprzak W; Wu JC RNA folding pathway functional intermediates: the prediction and analysis. J. Mol. Biol 2001, 312, 27–44. [DOI] [PubMed] [Google Scholar]
  • [36].Isambert H; Siggia ED Modeling RNA folding paths with pseudoknots: application to hepatitis delta virus ribozyme. Proc. Natl. Acad. Sci. USA 2000, 97, 6515–6520. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [37].Xayaphoummine A; Bucher T; Isambert H Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Res. 2005, 33, W605–W610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [38].Danilova LV; Pervouchine DD; Favorov AV; Mironov AA RNAkinetics: a web server that models secondary structure kinetics of an elongating RNA. J. Bioinf. Comput. Biol 2006, 4, 589–596. [DOI] [PubMed] [Google Scholar]
  • [39].Zhang W; Chen S -J. Exploring the complex folding kinetics of RNA hairpin: I. General folding kinetics analysis. Biophys J. 2006, 90, 765–777. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [40].Zhao P; Zhang W; Chen S-J Cotranscriptional folding kinetics of ribonucleic acid secondary structures. J. Chem. Phys 2011, 135, 245101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [41].Gong S; Wang Y; Wang Z; Zhang W Co-transcriptional folding and regulation mechanisms of riboswitches. Molecules 2017, 22, 1169. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [42].Wang Y; Wang Z; Liu T; Gong S; Zhang W Effects of flanking regions on HDV cotranscriptional folding kinetics. RNA 2018, 24, 1229–1240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [43].Sun T; Zhao C; Chen S-J Predicting cotranscriptional folding kinetics for riboswitch. J. Phys. Chem. B 2018, 122, 7484–7496. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [44].Sponer J; Bussi G; Krepl M; Banas P; Bottaro S; Cunha RA; Gil-Ley A; Pinamonti G; Poblete S; Jurecka P; Walter NG; Otyepka M RNA structural dynamics as captured by molecular simulations: A comprehensive overview. Chem. Rev 2018, 118, 4177–4338. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [45].Wang F; Sun L-Z; Sun T; Chang S; Xu X Helix-based RNA landscape partition and alternative secondary structure determination. ACS Omega 2019, 4, 15407–15413. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [46].Turner DH; Mathews DH NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res. 2010, 38, D280–D282. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [47].Xu X; Zhao P; Chen S-J Vfold: a web server for RNA structure and folding thermodynamics prediction. PLoS ONE 2014, 9, e107504. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [48].Metropolis N; Rosenbluth AW; Rosenbluth MN; Teller AH; Teller E Equation of state calculations by fast computing machines. J. Chem. Phys 1953, 21, 1087–1092. [Google Scholar]
  • [49].Xu X; Yu T; Chen S-J Understanding the kinetic mechanism of RNA single base pair formation. Proc. Natl. Acad. Sci. USA 2016, 113, 116–121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [50].Hall CK; Helfand E Conformational state relaxation in polymers: Time-correlation functions. J. Chem. Phys 1982, 77, 3275–3282. [Google Scholar]
  • [51].Lai D; Proctor JR; Meyer IM On the importance of cotranscriptional RNA structure formation. RNA 2013, 19, 1461–1473. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [52].Bellaousov S; Reuter JS; Seetin MG; Mathews DH RNAstructure: web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res. 2013, 41, W471–W474. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [53].Cheng Y; Zhang S; Xu X; Chen S-J Vfold2D-MC: A physics-based hybrid model for predicting RNA secondary structure folding. J. Phys. Chem. B 2021, 125, 10108–10118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [54].Cao S; Xu X; Chen S-J Predicting structure and stability for RNA complexes with intermolecular loop-loop base-pairing. RNA 2014, 20, 835–845. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [55].Cao S; Chen S-J Structure and stability of RNA/RNA kissing complex: with application to HIV dimerization initiation signal. RNA 2011, 17, 2130–2143. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [56].Cao S; Chen S-J Free energy landscapes of RNA/RNA complexes - with applications to snRNA complexes in spliceosomes. J. Mol. Biol 2006, 356, 292–312. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [57].Cao S; Chen S-J Predicting RNA pseudoknot folding thermodynamics. Nucleic Acids Res. 2006, 34, 2634–2652. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [58].Tan Z-J; Chen S-J Importance of Diffuse Metal Ion Binding to RNA. In Structural and Catalytic Roles of Metal Ions in RNA, Sigel A, Sigel H, Sigel RKO, Eds.; Royal Society of Chemistry Publishing: London, United Kingdom, 2011; pp 101–124. [PMC free article] [PubMed] [Google Scholar]
  • [59].Tan Z-J; Chen S-J Predicting electrostatic forces in RNA folding. Methods Enzymol 2009, 469, 465–487. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [60].Sun L-Z; Chen S-J Monte Carlo Tightly Bound Ion Model: Predicting Ion-Binding Properties of RNA with Ion Correlations and Fluctuations. J. Chem. Theory Comput 2016, 12, 3370–3381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [61].He Z-J; Chen S-J Predicting ion-nucleic acid interactions by energy landscape-guided sampling. J. Chem. Theory Comput 2012, 8, 2095–2102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [62].Wang F; Sun L-Z; Cai P; Chen S-J; Xu X Kinetic mechanism of RNA helix-terminal base pairing - a kinetic minima network analysis. Biophys. J 2019, 117, 1674–1683. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [63].Colizzi F; Bussi G RNA unwinding from reweighted pulling simulations. J. Am. Chem. Soc 2012, 134, 5173–5179. [DOI] [PubMed] [Google Scholar]
  • [64].Hagan MF; Dinner AR; Chandler D; Chakraborty AK Atomistic understanding of kinetic pathways for single base-pair binding and unbinding in DNA. Proc. Natl. Acad. Sci. USA 2003, 100, 13922–13927. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [65].Brigham BS; Kitzrow JP; Reyes JPC; Musier-Forsyth K; Munro JB Intrinsic conformational dynamics of the HIV-1 genomic RNA 5’ UTR. Proc. Natl. Acad. Sci. USA 2019, 116, 10372–10381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [66].Cullen BR Regulation of HIV-1 gene expression. FASEB J. 1991, 5, 2361–2368. [DOI] [PubMed] [Google Scholar]
  • [67].Karn J; Stoltzfus CM Transcriptional and posttranscriptional regulation of HIV-1 gene expression. Cold Spring Harb Perspect. Med 2012, 2, a006916. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [68].Ouellet DL; Vigneault-Edwards J; Letourneau K; Gobeil LA; Plante I; Burnett JC; Rossi J; Provost P Regulation of host gene expression by HIV-1 TAR microRNAs. Retrovirology 2013, 10, 86. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [69].Roebuck KA; Saifuddin M Regulation of HIV-1 transcription. Gene Expr. 1999, 8, 67–84. [PMC free article] [PubMed] [Google Scholar]
  • [70].Bannwarth S; Gatignol A HIV-1 TAR RNA: the target of molecular interactions between the virus and its host. Curr. HIV Res 2005, 3, 61–71. [DOI] [PubMed] [Google Scholar]
  • [71].Schulze-Gahmen U; Hurley JH Structural mechanism for HIV-1 TAR loop recognition by Tat and the super elongation complex. Proc. Natl. Acad. Sci. USA 2018, 115, 12973–12978. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [72].Roy S; Delling U; Chen CH; Rosen CA; Sonenberg N A bulge structure in HIV-1 TAR RNA is required for Tat binding and Tat-mediated trans-activation. Genes Dev. 1990, 4, 1365–1373. [DOI] [PubMed] [Google Scholar]
  • [73].Zhang J; Tamilarasu N; Hwang S; Garber ME; Huq I; Jones KA; Rana TM HIV-1 TAR RNA enhances the interaction between Tat and cyclin T1. J. Biol. Chem 2000, 275, 34314–34319. [DOI] [PubMed] [Google Scholar]
  • [74].Unwalla HJ; Li MJ; Kim JD; Li HT; Ehsani A; Alluin J; Rossi JJ Negative feedback inhibition of HIV-1 by TAT-inducible expression of siRNA. Nat. Biotechnol 2004, 22, 1573–1578. [DOI] [PubMed] [Google Scholar]
  • [75].Davidson A; Leeper TC; Athanassiou Z; Patora-Komisarska K; Karn J; Robinson JA; Varani G Simultaneous recognition of HIV-1 TAR RNA bulge and loop sequences by cyclic peptide mimics of Tat protein. Proc. Natl. Acad. Sci. USA 2009, 106, 11931–11936. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [76].Cullen BR Trans-activation of human immunodeficiency virus occurs via a bimodal mechanism. Cell 1986, 46, 973–982. [DOI] [PubMed] [Google Scholar]
  • [77].Jones KA Taking a new TAK on tat transactivation. Genes Dev. 1997, 11, 2593–2599. [DOI] [PubMed] [Google Scholar]
  • [78].Hidalgo L; Swanson CM Regulation of human immunodeficiency virus type 1 (HIV-1) mRNA translation. Biochem. Soc. Trans 2017, 45, 353–364. [DOI] [PubMed] [Google Scholar]
  • [79].Richter SN; Belanger F; Zheng P; Rana TM Dynamics of nascent mRNA folding and RNA-protein interactions: an alternative TAR RNA structure is involved in the control of HIV-1 mRNA transcription. Nucleic Acids Res. 2006, 34, 4278–4292. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [80].Palangat M; Meier TI; Keene RG; Landick R Transcriptional pausing at +62 of the HIV-1 nascent RNA modulates formation of the TAR RNA structure. Mol. Cell 1998, 1, 1033–1042. [DOI] [PubMed] [Google Scholar]
  • [81].Baker JL; Sudarsan N; Weinberg Z; Roth A; Stockbridge RB; Breaker RR Widespread genetic switches and toxicity resistance proteins for fluoride. Science 2012, 335, 233–235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [82].Buratowski S The basics of basal transcription by RNA polymerase II. Cell 1994, 77, 1–3. [DOI] [PubMed] [Google Scholar]
  • [83].Artsimovitch I; Landick R Interaction of a nascent RNA structure with RNA polymerase is required for hairpin-dependent transcriptional pausing but not for transcript release. Genes Dev. 1998, 12, 3110–3122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [84].Landick R RNA polymerase slides home: pause and termination site recognition. Cell 1997, 88, 741–744. [DOI] [PubMed] [Google Scholar]
  • [85].Landick R The regulatory roles and mechanism of transcriptional pausing. Biochem. Soc. Trans 2006, 34, 1062–1066. [DOI] [PubMed] [Google Scholar]
  • [86].Denesyuk NA; Thirumalai D How do metal ions direct ribozyme folding? Nat. Chem 2015, 7, 793–801. [DOI] [PubMed] [Google Scholar]
  • [87].Hori N; Denesyuk NA; Thirumalai D Salt effects on the thermodynamics of a frameshifting RNA pseudoknot under tension. J. Mol. Biol 2016, 428, 2847–2859. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [88].Shi Y-Z; Jin L; Feng C-J; Tan Y-L; Tan Z-J Predicting 3D structure and stability of RNA pseudoknots in monovalent and divalent ion solutions. PLoS Comput. Biol 2018, 14, e1006222. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [89].Jin L; Tan Y-L; Wu Y; Wang X; Shi Y-Z; Tan Z-J Structure folding of RNA kissing complexes in salt solutions: predicting 3D structure, stability, and folding pathway. RNA 2019, 25, 1532–1548. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [90].Kuhnl F; Stadler PF; Will S Tractable RNA-ligand interaction kinetics. BMC Bioinformatics 2017, 18, 424. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [91].Gong S; Wang Y; Wang Z; Wang Y; Zhang W Genetic regulation mechanism of the yjdF riboswitch. J. Theor. Biol 2018, 439, 152–159. [DOI] [PubMed] [Google Scholar]
  • [92].Wolfinger MT; Flamm C; Hofacker IL Efficient computation of co-transcriptional RNA-ligand interaction dynamics. Methods 2018, 143, 70–76. [DOI] [PubMed] [Google Scholar]
