Biophysical Journal. 2022 Aug 24;121(18):3345–3357. doi: 10.1016/j.bpj.2022.08.026

Topology and kinetics of R-loop formation

Boris P Belotserkovskii, Philip C Hanawalt
PMCID: PMC9515371  PMID: 36004778

Abstract

R-loops are structures containing an RNA-DNA duplex and an unpaired DNA strand. They can be formed upon “invasion” of an RNA strand into a DNA duplex, during which the RNA displaces the homologous DNA strand and binds the complementary strand. R-loops have many significant beneficial or deleterious biological effects, so it is important to understand the mechanisms for their generation and processing. We propose a model for co-transcriptional R-loop formation, in which their generation requires passage of the nascent RNA “tail” through the gap between the separated DNA strands. This passage becomes increasingly difficult with lengthening of the RNA tail. The length of the tail increases upon increasing distance between the transcription start site and the site of R-loop initiation. This causes reduced yields of R-loops with greater distance from the transcription start site. However, alternative pathways for R-loop formation are possible, involving either transient disruption of the transcription complex or the hypothetical formation of a triple-stranded structure, as a “collapsed R-loop.” These alternative pathways could account for the fact that in many systems R-loops are observed very far from the transcription start site. Our model is consistent with experimental data and makes general predictions about the kinetics of R-loop formation.

Significance

RNA copies are made from sections of the DNA genome as an essential step in the transfer of genetic information for synthesis of proteins and control of cellular processes. During transcription, the nascent RNA is normally separated from the DNA template to become available for further transactions; however, in some cases it remains bound to the transcribed DNA strand to form what is termed an “R-loop.” Since R-loops may have both regulatory and deleterious biological effects, it is important to learn details of their frequency and persistence. We have developed a mechanistic model for R-loop formation that is consistent with experimental results and may contribute to predictions of R-loop occurrence and an understanding of their biological functions.

Introduction

During transcription the nascent RNA is usually separated from the DNA template before leaving the transcription complex, and it remains separated as the transcript is completed. However, in some cases the nascent RNA rehybridizes with the template DNA strand to form an RNA-DNA duplex. The resulting structure containing the RNA-DNA duplex and the displaced non-template DNA strand is termed an R-loop (reviewed in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)). In addition to co-transcriptional formation, R-loops can also be generated by RNA invasion following complete separation of the RNA product from its DNA template (17, 18, 19). In that case, the RNA could have been transcribed from a region that is different from one that it invades (invasion in trans). This mechanism of R-loop formation usually requires special proteins to promote RNA invasion (17, 18, 19). In the following discussion we will consider only co-transcriptional R-loop formation.

R-loop formation is facilitated by any factors that render the RNA-DNA duplex more energetically favorable in comparison with the DNA-DNA duplex. These factors include superior intrinsic stability of the RNA-DNA duplex over the DNA-DNA duplex for certain “R-loop-prone” sequences (in particular, the sequences in which the non-template DNA strand is enriched in guanines), negative supercoiling that destabilizes the DNA-DNA duplex, and various anomalies within the non-template strand (e.g., unusual DNA structures, defects, or bound ligands) that interfere with its ability to bind the template DNA strand (reviewed in (5)).

Importantly, even when R-loop formation is energetically favorable, the probability of its formation in one round of transcription could be low because RNA polymerases (RNAPs) are “designed” to disfavor R-loop formation: first, by a special protein moiety within the RNAP that peels the RNA from the DNA template after the synthesis of a short (around 8 bp) RNA-DNA duplex localized inside the transcription complex, and then by an exit channel that guides the nascent RNA away from the DNA (20, 21, 22, 23, 24, 25).

Because of numerous biological effects of R-loops, it is of importance to build a physical model for their formation that would enable prediction of their distribution within genomes based on their locations, sequence, DNA supercoiling, and other factors.

In this regard, a detailed physical description of R-loop formation based on a statistical mechanical equilibrium analysis was proposed (26). In this approach, the probability of formation of an R-loop that occupies a certain DNA sequence within the transcribed region was calculated, as if this sequence was in thermodynamic equilibrium between the states that contain an R-loop and the state in which the DNA duplex and the RNA are separated. Within the framework of this approach the probability of R-loop formation depends only on the difference in the free energies between the R-loop-containing states and the state in which R-loops are absent, regardless of the mechanism of R-loop formation. Predictions for sequence dependence of R-loop formation based on this approach are in good agreement with the experimental results (26). However, from the equilibrium point of view the probability of R-loop formation would not depend on the distance between the R-loop-prone sequence and the transcription start site, because the free energy of R-loop formation does not depend on this distance. In contrast, much more efficient R-loop formation in the vicinity of the transcription start site was observed in many experimental systems (27, 28, 29, 30, 31). Importantly, for the same R-loop-prone sequence the yields of R-loops decreased sharply upon increased distance from the transcription start site, while the stability of these R-loops against dissociation did not depend on the distance (31). This indicates that, at least in this system, the yields of R-loops are defined by the on-rate of R-loop formation (which likely depends on the initiation event within a small portion of R-loop-forming sequences) rather than the total energy of interactions within the entire R-loop-forming sequence (which would define the rate of R-loop dissociation). 
Strong dependence of the efficiency of R-loop formation on the presence of short very R-loop-prone sequences (G-clusters) also indicates the key role of the initiation step for R-loop formation (32).

Thus, certain important aspects of R-loop formation cannot be explained within the equilibrium approach that is based solely on the free energy of an R-loop; rather, this explanation would require non-equilibrium kinetic considerations based on the mechanism(s) of R-loop formation and, in particular, its initiation step.

In (33), we suggested several possible mechanisms (pathways) for R-loop formation. Here we analyze these pathways in more detail, and build upon them a kinetic model that is in agreement with the available experimental data.

Results

Pathways for R-loop formation

Tail-passage pathway

The tail-passage pathway for R-loop formation is shown in Fig. 1.

Figure 1.


R-loop formation via the tail-passage pathway. DNA is shown in gray (the template strand is darker); RNA is shown in black; base pairing is shown by thin straight lines. RNA polymerase (RNAP) is shown as a gray oval. The state S corresponds to normal transcription in elongation mode. In the state S*, a short DNA region (marked by an olive-green asterisk) becomes transiently opened due to thermal fluctuations (DNA “breathing”). In the state I, the initial base pairing is formed between the nascent RNA and the transiently opened template DNA strand. This base pairing grows up to a certain length, producing the paranemic duplex (state P). In the state P, a loop formed by the nascent RNA tail (shown in orange) protrudes into the gap between the DNA strands. In the course of the thermal motion of the RNA tail the size of the loop could change in a random walk fashion, forming a family of states collectively designated as P. Eventually the entire RNA tail would protrude through the gap. The completion of this protrusion (the tail passage) leads to the state R that contains the stable plectonemic RNA-DNA duplex. To see this figure in color, go online.

We designate as S the starting state that contains the normal transcription complex and an undisturbed DNA duplex. Within the DNA duplex a transient disruption of DNA base pairing (opening) due to thermal fluctuations could occur to create an unstable short-lived state S*.

If a homologous region of the nascent RNA occurs in the vicinity of this transiently opened DNA region, it can base pair with the complementary DNA strand within that region. This initial base-paired state, containing a minimal possible number of base pairs sufficient for hybridization (perhaps only one base pair) is designated as I. This hybridization can initiate the RNA “invasion” into the DNA, during which the DNA-DNA duplex is replaced by the RNA-DNA duplex. For R-loop-prone sequences this displacement is energetically favorable. However, unless the R-loop-prone sequence is localized at the very end of the transcript, this displacement encounters a topological problem that arises because, in order to form an R-loop, the nascent RNA must wind around the template DNA strand while rendering the non-template DNA strand unwound. Within a normal transcription complex RNAP is bound to both the DNA strands and to the nascent RNA strand, and, consequently, the positions of the strands relative to each other are fixed, thus restricting their rearrangements. In this case, the nascent RNA winding around the template DNA strand could be achieved only by the RNA tail passage through the gap between the DNA strands. Until this passage is completed the duplex between the nascent RNA and the DNA remains topologically unwound (further referred to as “paranemic,” in contrast to a topologically intertwined “plectonemic” duplex).

The paranemic RNA-DNA duplex (designated as P) is structurally distorted in comparison with the normal plectonemic duplex and, consequently, is less stable. However, these distortions (and the respective destabilization) would become strongly pronounced only when the duplex approaches the length at which the RNA and the DNA strands normally would be intertwined (i.e., about one helical turn). For shorter duplexes the distortion would not be strongly pronounced. Consequently, within an R-loop-prone sequence (i.e., the sequence for which the “normal” plectonemic RNA-DNA duplex is more stable than the respective DNA-DNA duplex), the paranemic RNA-DNA duplex could displace the normal DNA-DNA duplex up to some critical length that is less than one helical turn (i.e., less than 10–11 bp). Thereafter, the paranemic RNA-DNA duplex becomes less stable than the normal DNA-DNA duplex, so the displacement of the latter by the former would no longer be energetically favorable, and consequently the paranemic duplex would stop growing. Thus, we suggest that the paranemic RNA-DNA duplex could grow only up to a certain length, less than one helical turn. The “lifetime” of such a short duplex before its dissociation would be relatively short. However, during this “lifetime,” a loop formed by the nascent RNA strand (shown in orange in Fig. 1) could protrude into the gap between the paranemic duplex and the non-template DNA strand. (The loop-containing states are designated as P.) This loop could either spontaneously grow or “shrink” due to thermal diffusion, and eventually either it would return to the initial “non-looped” state or the entire RNA tail would pass through the gap. If the nascent RNA tail were to complete the passage before the paranemic duplex dissociation, it would form a stable plectonemic RNA-DNA duplex (designated as R). If this happened, an “attempt” to form an R-loop would be successful; otherwise, the “attempt” would fail, and the entire process would start over.

Note that, upon R-loop propagation, the tail passage must be repeated roughly each time the RNA-DNA duplex grows by one helical turn (i.e., about 10 bp). However, after the first tail passage the R-loop becomes kinetically stable. Thus, this first tail passage is likely to be a limiting step for R-loop formation.

Alternative pathways for R-loop formation

The tail-passage mechanism predicts that the farther away from the transcription start site the R-loop formation occurs, the longer the RNA tail; consequently, it becomes increasingly difficult for this tail to pass through the gap between the DNA strands. In accordance with this, multiple observations show that R-loop formation is much more efficient if the R-loop-prone sequence is localized closer to the transcription start site (27, 28, 29, 30, 31). For example, in an in vitro system with mitochondrial RNAP the yield of R-loops decreased sharply, about 50-fold, as the distance from the transcription start site increased from 4 bp to about 100 bp (31).

However, in some systems R-loops were observed hundreds of nucleotides away from the transcription start site (e.g., (26,34,35), reviewed in (36)). Thus, R-loops could form far away from the transcription start site, in which case the RNA-tail-passage pathway (Fig. 1) is likely to be strongly suppressed because the tail is very long; in addition, the RNA tail is likely to be bound to various RNA-binding proteins in vivo that might completely block the tail passage.

This suggests the existence of alternative “tail-independent” pathways for R-loop formation. The contribution of these alternative pathways to R-loop formation is small for short tails but becomes increasingly pronounced as the growing tail length suppresses the tail passage.

Since the requirement for the tail passage stems from the simultaneous binding of the RNAP to all nucleic acid strands within the transcription complex, the alternative pathways for R-loop formation could involve partial disruption of interactions between the RNAP and the nucleic acid strands within the transcription complex. For example, the RNAP could (transiently) detach from the non-template DNA strand so that the transcription complex can rotate around the template DNA strand, thus “spooling” the RNA onto the template DNA strand (Fig. 2).

Figure 2.


Conversion of the paranemic duplex into the plectonemic duplex via RNAP rotation. Within the transcription complex containing the paranemic duplex (P) the non-template DNA strand transiently unbinds from the RNAP, forming intermediate complexes P# within which the transcription complex is capable of rotating using the single-stranded flanking regions of the template DNA strand as swivels (the rotational axis between the swivels is shown as a red dash-dotted line). The states that correspond to the rotation angles 0° and 180° are shown. Rotation through 360° leads to formation of the plectonemic RNA-DNA duplex R#, in which the non-template DNA strand is still unbound. The non-template strand then rebinds to the RNAP, leading to formation of the final state R. To see this figure in color, go online.

This pathway would require overcoming the substantial energy barrier caused by disruption of RNAP interactions with the non-template DNA strand and by additional DNA unwinding at the junctions of the transcription bubble to alleviate steric impediments to the rotation of the bulky RNAP.

A similar pathway could be mediated by complete dissociation of RNAP from the transcription complex (Fig. 3, left).

Figure 3.


R-loop formation mediated by complete RNAP dissociation. At the left, dissociation occurs at the stage of the paranemic duplex formation; at the right, dissociation occurs from the normal transcription complex. To form the plectonemic duplex, the short RNA-DNA duplex that survived RNAP dissociation rotates in a similar way as the transcription complex in Fig. 2. To see this figure in color, go online.

As mentioned above, the normal transcription complex contains a short (about 8 bp) RNA-DNA duplex. Such a short duplex by itself is likely to be short-lived; however, it could survive after the RNAP dissociation for a sufficiently long period of time to initiate further RNA invasion into the DNA duplex. For the R-loop-prone sequence and/or negative supercoiling this invasion would occur spontaneously via strand exchange, driven by the superior stability of the RNA-DNA duplex versus the DNA-DNA duplex. During this invasion, the RNA-DNA duplex would rotate around the template DNA strand, similar to the RNAP rotation pathway (Fig. 2). However, the energy barrier for this rotation would be much lower for the duplex alone than for the entire transcription complex, because the duplex is much smaller.

In Fig. 3 (left), we show the RNAP dissociation pathway preceded by paranemic duplex formation. In this case, formation of the RNA-DNA duplex behind the transcribing RNAP could destabilize the transcription complex and facilitate RNAP dissociation (reviewed in (5)).

In principle, this pathway could be also initiated by RNAP dissociation from the normal transcription complex (Fig. 3, right). However, since normal transcription elongation complexes are usually very stable, their dissociation might require some external factors (e.g., deproteinization). Induction of R-loop formation by deproteinization was directly demonstrated in an in vitro system with Escherichia coli RNAP (37).

Yet another solution to the topological problem of R-loop formation is to form a triple-helical structure (often referred to as a “collapsed R-loop” (38,39)), in which the non-template DNA strand is wound around the RNA-DNA duplex with the same helicity as the RNA (Fig. 4).

Figure 4.


Formation and propagation of “collapsed” R-loop. Displaced region of the non-template DNA strand wrapped around the RNA-DNA duplex is shown in gold. At the very bottom: after RNAP dissociation the displaced non-template DNA strand could unwind from the RNA-DNA duplex, thus converting the collapsed R-loop into the normal R-loop. To see this figure in color, go online.

Such tight winding implies localization of the displaced DNA strand within one of the grooves on the surface of the RNA-DNA duplex. In Fig. 4 the non-template DNA strand is positioned in the minor groove of the RNA-DNA duplex according to the model suggested in (40). In this structure the windings of the RNA and the non-template DNA strand are the same, so formation of the collapsed R-loop would not require either tail passage or RNAP detachment from the non-template DNA strand (5) (see Fig. S1 for more detailed analysis based on “inversed helical representation”). Although there is experimental evidence supporting the existence of collapsed R-loops (38,40), their existence is not yet firmly established. Possibly, collapsed R-loops exist only as long as the RNAP is holding all three strands together, after which they are converted to canonical R-loops upon RNAP dissociation.

Another possible pathway for R-loop formation is via RNAP backtracking (reviewed in (10,41)). During backtracking RNAP slides back, and the nascent RNA is extruded from the front part of RNAP through the nucleotide entering pore (reviewed in (41)). According to this model, R-loop formation is initiated by this extruded RNA in front of the RNAP rather than behind the RNAP, as in conventional models. Since the front-extruded RNA “tail” could be short regardless of the length of the transcript, backtracking could be a mechanism for R-loop formation far away from the transcription start site. There is compelling experimental evidence in favor of this mechanism for R-loop formation in certain systems (e.g., (42)). However, there are questions about the generality and details of this mechanism. For example, how are experimentally observed very long R-loops, containing hundreds or even thousands of base pairs (e.g., (43)), formed by this mechanism: does the RNAP backtrack the entire length, or does it dissociate after formation of a short R-loop, after which the rest of the RNA invades the DNA via strand exchange? Also, the backtracking mechanism suggests that transcription is terminated following R-loop formation, while there is experimental evidence that, at least in some systems, transcription continues after R-loop formation (35). In addition, the T7 RNAP, which efficiently forms R-loops (e.g., (44)), is probably incapable of backtracking (45). Since backtracking-mediated R-loop formation does not depend on the length of the tail, in the “first approximation” R-loop formation by this mechanism should not depend on the distance from the transcription start site. However, one could also argue that within a longer RNA tail, stable RNA secondary structures that would suppress backtracking are more likely to appear, which might account for a dependence on the distance from the transcription start site.

Kinetics of R-loop formation

To evaluate the effective rate constant ($k_{\mathrm{eff}}$) of R-loop formation we will first consider only the tail-passage pathway, and we will apply the steady-state approximation to the kinetic scheme shown in Fig. 1. Note that in our kinetic analysis, instead of considering each loop-containing state separately, we introduce an apparent rate constant $k_{\mathrm{tp}}$ that describes the transition from the paranemic complex P to the plectonemic complex R via the loop-containing states.

Within the steady-state approximation, the rate of formation of each intermediate is equal to the rate of its decay.

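Thus, from the steady-state condition for S*, I, and P, we obtain

$$k_1^{+} S + k_2^{-} I = (k_1^{-} + k_2^{+})\, S^{*}, \tag{1}$$
$$k_2^{+} S^{*} + k_3^{-} P = (k_2^{-} + k_3^{+})\, I, \tag{2}$$
$$k_3^{+} I = (k_3^{-} + k_{\mathrm{tp}})\, P. \tag{3}$$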

The rate of accumulation of the final product R from the paranemic complex is

$$\frac{dR}{dt} = k_{\mathrm{tp}} P. \tag{4}$$

The effective rate constant $k_{\mathrm{eff}}$ for the conversion of the starting state S into the final state R is defined by the equation

$$\frac{dR}{dt} = k_{\mathrm{eff}} S. \tag{5}$$

By expressing P via S from Eqs. 1, 2, and 3, substituting the result into Eq. 4, and comparing the obtained equation with Eq. 5, we obtain

$$k_{\mathrm{eff}} = \frac{k_1^{+} k_2^{+} k_3^{+} k_{\mathrm{tp}}}{k_{\mathrm{tp}}\,(k_1^{-} k_2^{-} + k_1^{-} k_3^{+} + k_2^{+} k_3^{+}) + k_1^{-} k_2^{-} k_3^{-}}. \tag{6}$$

This expression could be simplified by taking into account the fact that the rate constants for the collapse of the unstable states S* and I, $k_1^{-}$ and $k_2^{-}$, are much larger than the other rate constants; thus, in the denominator of Eq. 6 only the terms that contain the product of these rate constants need to be retained, while the rest could be omitted.

In that case, Eq. 6 converts to

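$$k_{\mathrm{eff}} = \frac{k_1^{+} k_2^{+}}{k_1^{-} k_2^{-}}\,\frac{k_3^{+} k_{\mathrm{tp}}}{k_{\mathrm{tp}} + k_3^{-}} = K_I\,\frac{k_3^{+} k_{\mathrm{tp}}}{k_{\mathrm{tp}} + k_3^{-}}, \tag{7}$$

in which

$$K_I = \frac{k_1^{+} k_2^{+}}{k_1^{-} k_2^{-}} \tag{8}$$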

is the equilibrium constant for formation of the initiation complex I from the initial state S.

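If $k_{\mathrm{tp}} \gg k_3^{-}$, i.e., the tail passage is much faster than the dissociation of the paranemic complex, then Eq. 7 is converted to

$$k_{\mathrm{eff}} \approx K_I k_3^{+}. \tag{9}$$

In the opposite case, in which $k_{\mathrm{tp}} \ll k_3^{-}$,

$$k_{\mathrm{eff}} \approx K_I\,\frac{k_3^{+}}{k_3^{-}}\,k_{\mathrm{tp}} = K_P k_{\mathrm{tp}}, \tag{10}$$

where

$$K_P = \frac{k_1^{+} k_2^{+} k_3^{+}}{k_1^{-} k_2^{-} k_3^{-}} \tag{11}$$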

is the equilibrium constant for formation of the paranemic complex P from the initial state S.
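The algebra behind Eqs. 6, 7, 9, and 10 can be checked numerically. The following Python sketch solves the steady-state conditions (Eqs. 1–3) as a linear system for the three intermediates and compares the result with the closed-form expressions. The function names and all rate-constant values are illustrative assumptions, chosen only so that the collapse rates of S* and I are much larger than the other constants; they are not measured values.

```python
import numpy as np

def k_eff_exact(k1p, k1m, k2p, k2m, k3p, k3m, ktp):
    """Closed-form effective rate constant (Eq. 6)."""
    num = k1p * k2p * k3p * ktp
    den = ktp * (k1m * k2m + k1m * k3p + k2p * k3p) + k1m * k2m * k3m
    return num / den

def k_eff_numeric(k1p, k1m, k2p, k2m, k3p, k3m, ktp, S=1.0):
    """Solve Eqs. 1-3 for (S*, I, P) at fixed S, then apply Eqs. 4 and 5."""
    # Each row reads: (decay of intermediate) - (formation from neighbors) = source.
    A = np.array([
        [k1m + k2p, -k2m,       0.0],        # steady state for S*
        [-k2p,      k2m + k3p,  -k3m],       # steady state for I
        [0.0,       -k3p,       k3m + ktp],  # steady state for P
    ])
    b = np.array([k1p * S, 0.0, 0.0])
    _, _, P = np.linalg.solve(A, b)
    return ktp * P / S

def k_eff_approx(k1p, k1m, k2p, k2m, k3p, k3m, ktp):
    """Simplified expression (Eq. 7), valid when k1m and k2m dominate."""
    K_I = (k1p * k2p) / (k1m * k2m)
    return K_I * k3p * ktp / (ktp + k3m)

if __name__ == "__main__":
    # Hypothetical rate constants (arbitrary units), for illustration only.
    rates = dict(k1p=1e-3, k1m=1e6, k2p=1e2, k2m=1e6, k3p=10.0, k3m=1.0)
    for ktp in (1e-3, 0.1, 1e3):
        print(ktp, k_eff_exact(ktp=ktp, **rates), k_eff_approx(ktp=ktp, **rates))
```

With these assumed values, sweeping `ktp` from well below to well above `k3m` moves the result from the regime of Eq. 10 to that of Eq. 9.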

Note that in principle, it is possible that behind the transcription machinery there is a permanently open region sufficient for the initial hybridization (in which case the state S* would be the initial state rather than an unstable intermediate). In this case it could be shown that the expression for the effective rate constant would remain the same as Eq. 7, except that the constants $k_1^{\pm}$ would be omitted.

Dependence of the rates of R-loop formation on the distance between the transcription start site and the R-loop-prone sequence

To obtain the distance dependence defined by Eq. 7, we consider the distance dependence for the rate and equilibrium constants included in this equation.

The tail-passage rate constant $k_{\mathrm{tp}}$ should strongly depend on the length of the tail N, which, as we explained above, is equivalent to the distance from the transcription start site. In derivation A of the supporting material we show that this dependence could be approximated by

$$k_{\mathrm{tp}}(N) \propto N^{-\alpha}, \tag{12}$$

where the parameter $\alpha \approx 4$.

We assume that the other rate constants do not depend on the length of the tail. Within this approximation, for the short tails for which Eq. 9 is applicable, the effective rate constant for R-loop formation $k_{\mathrm{eff}}$ is practically independent of the length of the tail. In contrast, for the long tails for which Eq. 10 is applicable, $k_{\mathrm{eff}} \propto N^{-\alpha}$.

Switching between these two regimes occurs at some characteristic length $N_P$, at which the tail-passage rate constant $k_{\mathrm{tp}}$ is equal to the rate constant for the paranemic complex dissociation $k_3^{-}$.

Thus,

$$k_3^{-} \propto (N_P)^{-\alpha}, \tag{13}$$

where the proportionality coefficient is the same as that in Eq. 12.

Substituting Eqs. 12 and 13 into Eq. 7, we obtain

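$$k_{\mathrm{eff}}(N) \propto \frac{1}{1 + \left(\dfrac{N}{N_P}\right)^{\alpha}}. \tag{14}$$

(Here “$\propto$” means proportionality with a coefficient that does not depend on N.)

From Eq. 14 it is seen that for $N \ll N_P$ the rate constant $k_{\mathrm{eff}}$ is practically independent of N, while for $N \gg N_P$ it decreases as $N^{-\alpha}$.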
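The two regimes of Eq. 14 can be made concrete by evaluating the local slope on a double-logarithmic scale. The sketch below is only an illustration: the function names are hypothetical, the N-independent prefactor is set to 1, and the defaults (α = 4, and N_P = 30 in the usage example) are the values quoted elsewhere in the text rather than new estimates.

```python
import math

def k_eff_relative(N, N_P, alpha=4.0):
    """Eq. 14 up to an N-independent prefactor: 1 / (1 + (N / N_P)**alpha)."""
    return 1.0 / (1.0 + (N / N_P) ** alpha)

def loglog_slope(f, N, rel_step=1e-4):
    """Central-difference estimate of d ln f / d ln N at tail length N."""
    hi, lo = N * (1.0 + rel_step), N * (1.0 - rel_step)
    return (math.log(f(hi)) - math.log(f(lo))) / (math.log(hi) - math.log(lo))

if __name__ == "__main__":
    f = lambda N: k_eff_relative(N, N_P=30.0)
    for N in (0.3, 3.0, 30.0, 300.0, 3000.0):
        print(N, f(N), loglog_slope(f, N))
```

Analytically the slope is −α x/(1 + x) with x = (N/N_P)^α, so it runs from 0 for short tails through −α/2 at N = N_P to −α for long tails.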

Next, we extend our model to include alternative pathways of R-loop formation, which do not require the tail passage (Figs. 2, 3, and 4). If we assume that, similar to the tail-passage pathway, these alternative pathways also begin with the paranemic duplex formation, then these pathways could be taken into account simply by adding their rate constant $k_{\mathrm{alt}}$ to the rate constant of the tail passage $k_{\mathrm{tp}}$. Consequently, Eq. 7 is converted to

$$k_{\mathrm{eff}} = K_I k_3^{+}\,\frac{k_{\mathrm{tp}} + k_{\mathrm{alt}}}{k_{\mathrm{tp}} + k_{\mathrm{alt}} + k_3^{-}}. \tag{15}$$

For very long tails $k_{\mathrm{tp}}$ approaches zero and $k_{\mathrm{eff}}$ approaches some small but finite value

$$k_{\mathrm{eff}} = K_I k_3^{+}\,\frac{k_{\mathrm{alt}}}{k_{\mathrm{alt}} + k_3^{-}} \approx K_I\,\frac{k_3^{+}}{k_3^{-}}\,k_{\mathrm{alt}} = K_P k_{\mathrm{alt}}. \tag{16}$$

Similar to the length $N_P$ at which the rate constant for the tail passage is equal to the rate constant of the paranemic complex decay, it is convenient to introduce the length $N_{\mathrm{alt}}$ at which the rate constant for alternative pathways is equal to the rate constant for the tail passage. We expect that $N_{\mathrm{alt}} > N_P$; thus, our model predicts three regimes for R-loop formation. For $N < N_P$ the tail passage is faster than the paranemic complex dissociation, and the tail-length dependence for R-loop formation is weak; for $N_P < N < N_{\mathrm{alt}}$ the rate of R-loop formation is mostly defined by the tail passage, so the tail dependence is maximal in this regime and is described by the power law (Eq. 12); for $N > N_{\mathrm{alt}}$ the alternative pathway(s) become faster than the tail passage (so they mostly define the rate of R-loop formation) and, consequently, in this regime the tail dependence decreases and gradually disappears with growing N.

Within the “tail-dependent” term in Eq. 15 we can make a rearrangement,

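$$\frac{k_{\mathrm{tp}} + k_{\mathrm{alt}}}{k_{\mathrm{tp}} + k_{\mathrm{alt}} + k_3^{-}} = \frac{k_3^{-}}{k_{\mathrm{alt}} + k_3^{-}} \left( \frac{k_{\mathrm{alt}}}{k_3^{-}} + \frac{1}{\dfrac{k_{\mathrm{alt}} + k_3^{-}}{k_{\mathrm{tp}}} + 1} \right). \tag{17}$$

Replacing the rate constants in the above equation by the respective lengths, and noting that the term outside the brackets does not depend on N, we find that in the presence of alternative pathways for R-loop formation, Eq. 14 is replaced by

$$k_{\mathrm{eff}}(N) \propto \frac{1}{1 + \left(\dfrac{N}{N_{\mathrm{ch}}}\right)^{\alpha}} + \left(\frac{N_P}{N_{\mathrm{alt}}}\right)^{\alpha}, \tag{18}$$

where $N_{\mathrm{ch}}$ is defined by the equation

$$\left(\frac{1}{N_{\mathrm{ch}}}\right)^{\alpha} = \left(\frac{1}{N_P}\right)^{\alpha} + \left(\frac{1}{N_{\mathrm{alt}}}\right)^{\alpha}. \tag{19}$$

It is apparent that Eq. 14 could be obtained as a special case of Eq. 18 if $N_{\mathrm{alt}} \to \infty$.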

In Fig. S2 we plot dependences that correspond to Eq. 18 in double-logarithmic coordinates to illustrate various regimes for the “tail dependence.”
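The rearrangement in Eq. 17 and the resulting length form (Eqs. 18 and 19) can be verified with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, all three rate constants are written as c·L^(−α) with a common coefficient c (as in Eqs. 12 and 13), and the lengths 30 and 79 in the usage example are simply the fit values quoted for Fig. 5.

```python
import math

def n_ch(N_P, N_alt, alpha=4.0):
    """Characteristic length from Eq. 19."""
    return (N_P ** -alpha + N_alt ** -alpha) ** (-1.0 / alpha)

def k_eff_relative_alt(N, N_P, N_alt, alpha=4.0):
    """Eq. 18 up to an N-independent prefactor."""
    return (1.0 / (1.0 + (N / n_ch(N_P, N_alt, alpha)) ** alpha)
            + (N_P / N_alt) ** alpha)

def tail_term_direct(N, N_P, N_alt, alpha=4.0, c=1.0):
    """Left-hand side of Eq. 17, with k_tp, k3m, k_alt all of the form c * L**-alpha."""
    ktp, k3m, kalt = (c * L ** -alpha for L in (N, N_P, N_alt))
    return (ktp + kalt) / (ktp + kalt + k3m)

if __name__ == "__main__":
    N_P, N_alt = 30.0, 79.0
    # N-independent factor standing outside the brackets in Eq. 17.
    prefactor = N_P ** -4.0 / (N_P ** -4.0 + N_alt ** -4.0)
    for N in (5.0, 30.0, 79.0, 500.0):
        print(N, tail_term_direct(N, N_P, N_alt),
              prefactor * k_eff_relative_alt(N, N_P, N_alt))
```

Passing `math.inf` as `N_alt` makes the second term vanish and `n_ch` reduce to `N_P`, which recovers Eq. 14.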

In our estimate for the rate of the tail passage (derivation A in supporting material), we did not take into account the RNA secondary structures that might form within the RNA tail. If these structures are not very stable and transiently open and fold back due to thermal fluctuations, they could unfold before the tail penetration into the gap and then re-fold afterward; consequently, they would slow down the tail diffusion in both directions. Thus, the general dependence on the tail length (Eq. 12) could still hold in the presence of such structures, although the tail passage would be slower. With increasing tail length a stable RNA secondary structure could eventually appear that would strongly suppress the tail passage; however, this is likely to occur at tail lengths at which the tail passage is suppressed anyway and alternative tail-independent pathways predominate. The effect of the RNA secondary structures on the tail passage could be elucidated by placing various sequence motifs (either prone or not prone to form secondary structures) upstream from the R-loop-prone sequence.

Application to experimental results

The effective rate constant $k_{\mathrm{eff}}(N)$ corresponds to the rate of initiation of R-loop formation at a given distance N from the transcription start site. During transcription, the RNAP travels along the entire DNA template; thus, the rate of R-loop formation observed in experiments depends on R-loop initiation events that occur over the whole transcribed DNA region.

In supporting material (derivation B), we establish a connection between the “local” rate constant $k_{\mathrm{eff}}$ and the experimentally measurable parameters: the fraction of transcripts that form R-loops and the apparent “global” rate constant of R-loop formation that could be observed in experiments.

Below, we compare predictions of our model with the experimental results from (31). In this work the authors positioned a strongly R-loop-prone sequence 17 bp long at various distances from the transcription start site, and measured the molar fraction (which we designate as φ) of RNA transcripts containing regions that were protected from single-strand-specific ribonuclease cleavage (which indicates that these regions were in the form of RNA-DNA hybrids within R-loops) as a function of the distance (which we designate as $N_0$) between the transcription start site and the upstream edge of the R-loop-prone sequence.

In Fig. 5, we plot experimental data from (31) (black dots) in the form y(x), where y and x are expressed via the experimentally measured $\varphi(N_0)$ and the distance $N_0$ as

$$y = \ln \frac{\ln\left(1 - \varphi(N_0)\right)}{\ln\left(1 - \varphi(N_0^{(\min)})\right)} \tag{20}$$

and

$$x = \ln \frac{N_0}{N_0^{(\min)}}, \tag{21}$$

respectively.

Figure 5.


Comparison of the predicted dependences for the rate of R-loop formation with the experimental data. The red, blue, and green theoretical curves correspond to $N_{\mathrm{alt}} = \infty$ and to the values $N_P = 4e^{2} \approx 30$, $N_P = 2e^{2} \approx 15$, and $N_P = 8e^{2} \approx 60$, respectively. The orange curve takes into account alternative pathways for R-loop formation, and corresponds to the same value of $N_P$ as for the red curve and to $N_{\mathrm{alt}} = 79$. Experimental data from (31) are shown as black dots. To see this figure in color, go online.

Here, $N_0^{(\min)} = 4$ is the minimal distance used in the experiments. (In (31) the authors used distances 4, 17, 38, 72, 93, 151, 261, and 369. We did not use the last two data points because they are too close to background.)
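The change of variables in Eqs. 20 and 21 is easy to apply to any table of measured fractions. The following Python sketch assumes that Eq. 20 reads y = ln[ln(1 − φ(N0)) / ln(1 − φ(N0(min)))]; the function name is hypothetical, and the numbers in the usage example are synthetic values generated from a pure power law, not the data of (31).

```python
import math

def transform_point(phi, N0, phi_min, N0_min=4.0):
    """Map a fraction phi measured at distance N0 to (x, y) of Eqs. 21 and 20.

    phi_min is the fraction measured at the minimal distance N0_min.
    """
    if not (0.0 < phi < 1.0 and 0.0 < phi_min < 1.0):
        raise ValueError("fractions must lie strictly between 0 and 1")
    # log1p(-phi) evaluates ln(1 - phi) accurately for small phi.
    y = math.log(math.log1p(-phi) / math.log1p(-phi_min))
    x = math.log(N0 / N0_min)
    return x, y

if __name__ == "__main__":
    # Synthetic check: if -ln(1 - phi) falls off as N0**-4, the points lie on y = -4 x.
    c = 100.0  # arbitrary illustrative constant
    phi_at = lambda N0: -math.expm1(-c * N0 ** -4.0)
    for N0 in (4.0, 8.0, 16.0):
        print(N0, transform_point(phi_at(N0), N0, phi_at(4.0)))
```

In these coordinates the reference point at the minimal distance maps to the origin, and a power-law decrease of −ln(1 − φ) with distance appears as a straight line whose slope is the exponent.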

We compare these results with our theoretical predictions (Eq. B20 in supporting material). In Fig. 5 we plot several curves defined by Eq. B20 at α=4.

The blue, red, and green curves do not take into account the alternative pathway for R-loop formation (formally corresponding to $N_{\mathrm{alt}} = \infty$), and the parameter $N_P$ for these curves was 15, 30, and 60, respectively. It is observed that the red curve is in reasonable agreement with experimental data; however, upon increasing distance the slope of the theoretical curves approaches a constant value, while for experimental data the magnitude of this slope appears to decrease (i.e., with increasing tail length it appears that its inhibitory effect on R-loop formation approaches saturation). This is most likely due to alternative tail-independent pathways for R-loop formation. The orange curve that takes into account alternative pathways, with $N_P = 30$ and $N_{\mathrm{alt}} = 79$, provides a better fit for the experimental results.

Note that the length dependences used in Fig. 5 are obtained via integration of the local rate constant of R-loop formation (Eq. 18) over the R-loop-prone sequence, and that they are described by a more complicated equation (supporting material, Eq. B14 in derivation B) than for the local rate constant. However, on the double-logarithmic plot in Fig. 5 they exhibit features similar to those of the local rate constant (Fig. S2): first, a relatively slow decrease that corresponds to the fast tail passage; then a linear dependence that is defined by the power law (Eq. 12), with the slope equivalent to α (for graphs in Fig. 5, α=4); and then saturation due to alternative pathways.

Dependence of the rate of R-loop formation on negative supercoiling

Since R-loop formation is associated with DNA unwinding, it is facilitated by negative supercoiling (12,26,27,37,46). Here, we use our kinetic model to evaluate the effect of negative supercoiling on the rate of R-loop formation. We wish to find the ratio of the apparent rate constants for R-loop formation within two DNA substrates of the same size and sequence but with different superhelical densities, which we designate σ and σref.

According to our model, R-loop formation proceeds with a short-lived intermediate that contains an open DNA region. Since this intermediate is in “quasi-equilibrium” with the unperturbed initial state, an apparent energy barrier for R-loop formation would be equivalent to the free energy of this intermediate relative to the unperturbed initial state.

Thus, the ratio of the apparent rate constants is

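$$\frac{k_{\mathrm{eff}}(\sigma)}{k_{\mathrm{eff}}(\sigma_{\mathrm{ref}})}=\exp\left(-\frac{\Delta G_{sc}(\sigma)-\Delta G_{sc}(\sigma_{\mathrm{ref}})}{k_BT}\right), \qquad (22)$$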

where ΔGsc is the change in superhelical energy upon DNA opening.

In derivation C of supporting material, we discerned that unwinding of a short DNA region containing m base pairs within a much longer DNA sequence with superhelical density σ causes a change in superhelical energy that could be approximated as

$$\Delta G_{sc}\approx 20RTm\sigma(1-5\sigma), \qquad (23)$$

where σ is the sum of the initial supercoiling σ0 and the transcription-induced dynamic supercoiling (47, 48, 49). In the derivation C we argue that Eq. 23 is applicable for the topologically closed circular DNA, provided that the ratio of the length of the open region to the length of the entire DNA is substantially smaller in magnitude than the superhelical density (which is the case for opening a few base pairs within a plasmid several kilobases long at superhelical density of about −0.05).

We argue that this equation is also applicable for dynamic transcription-induced supercoiling.

This equation is exact for the cases in which supercoiling within the DNA remains the same before and after the structural transition (e.g., unwinding). This would be the case for unconstrained linear or nicked DNA.

The σ-dependent term in parentheses appears to be due to additional relaxation of superhelical tension by intertwining of the single-stranded DNAs within the open region (50). According to Eq. 23, at native superhelical density (about −0.05) the relative contribution of this intertwining to superhelical energy is about 25%.
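The 25% figure can be checked numerically with a short sketch of Eq. 23, written here in units of RT. The function names and the value m = 3 are illustrative assumptions; the intertwining contribution is expressed relative to the linear term 20mσ.

```python
def delta_g_sc_over_rt(m, sigma):
    """Eq. 23 in units of RT: 20*m*sigma*(1 - 5*sigma)."""
    return 20.0 * m * sigma * (1.0 - 5.0 * sigma)

def intertwining_fraction(sigma):
    """Magnitude of the nonlinear (intertwining) term relative to the linear term."""
    return abs(5.0 * sigma)

sigma_native = -0.05  # native superhelical density quoted in the text
m = 3                 # hypothetical number of unwound base pairs

linear_term = 20.0 * m * sigma_native              # linear part only, in RT
full_energy = delta_g_sc_over_rt(m, sigma_native)  # with intertwining, in RT
fraction = intertwining_fraction(sigma_native)     # about 0.25
```

For negative σ both terms are negative, so the intertwining makes the opening more favorable than the linear estimate alone would suggest.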

The magnitude of dynamic supercoiling is difficult to estimate theoretically with high precision. Rough estimates suggest that, in the absence of transcript tethering to membranes or bulky proteins, the transcript would have to be on the order of 10 000 nucleotides long to create dynamic supercoiling close in magnitude to the native supercoiling (51). In the system described in (26), from which we used data for our analysis, the length of the transcript in the region of R-loop formation was several hundred nucleotides; thus, it is possible that there is no substantial dynamic supercoiling in this system. In any case, it is reasonable to assume that the dynamic supercoiling would be the same for substrates with different initial superhelical densities, provided that their sizes and sequences are the same. Thus, since the supercoiling energy (Eq. 23) for the conditions of interest is roughly (with about 25% error) linear in σ, and the ratio of the rate constants (Eq. 22) depends on the difference in energies, the contribution of the dynamic supercoiling would cancel out in this difference, and the ratio of the rate constants would depend on the difference in initial superhelical densities:

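$$\frac{k_{\mathrm{eff}}(\sigma_0)}{k_{\mathrm{eff}}(\sigma_{0\mathrm{ref}})}\approx\exp\left(-\frac{\Delta G_{sc}(\sigma_0)-\Delta G_{sc}(\sigma_{0\mathrm{ref}})}{k_BT}\right)\approx\exp\left(-20m(\sigma_0-\sigma_{0\mathrm{ref}})\right). \qquad (24)$$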

(In the far-right part of Eq. 24 we have omitted the contribution from the single-stranded DNA intertwining, because the term that corresponds to this contribution is non-linear in σ; consequently, the unknown contribution of dynamic supercoiling would not cancel upon subtraction of these terms. Omitting this contribution for superhelical densities that are smaller in magnitude than the native one (about −0.05) creates an error of less than 25%.)
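The cancellation argument can be written out explicitly. In the following sketch, the symbol σ_dyn for the dynamic supercoiling is our own notation (an assumption for illustration), so that σ = σ_0 + σ_dyn for one substrate and σ_ref = σ_0ref + σ_dyn for the other, with the same σ_dyn in both.

```latex
% Linear part of Eq. 23: the dynamic contribution cancels in the difference.
20RTm\left[(\sigma_0+\sigma_{\mathrm{dyn}})-(\sigma_{0\mathrm{ref}}+\sigma_{\mathrm{dyn}})\right]
  = 20RTm\,(\sigma_0-\sigma_{0\mathrm{ref}})

% Quadratic (intertwining) part of Eq. 23: the dynamic contribution survives.
-100RTm\left[(\sigma_0+\sigma_{\mathrm{dyn}})^2-(\sigma_{0\mathrm{ref}}+\sigma_{\mathrm{dyn}})^2\right]
  = -100RTm\,(\sigma_0-\sigma_{0\mathrm{ref}})\,(\sigma_0+\sigma_{0\mathrm{ref}}+2\sigma_{\mathrm{dyn}})
```

The factor containing 2σ_dyn in the second line shows why the quadratic term cannot be evaluated without knowing the dynamic supercoiling, whereas the linear term depends only on the initial superhelical densities.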

According to our model, the value of the parameter m depends on the regime of R-loop formation: In the regime of fast tail passage ktp>k3 (or, in terms of length, N<NP) the dependence on supercoiling is defined by the equilibrium constant KI for the initiation complex (I) (see Eq. 9), for which m in principle might be as small as only one base pair; for the intermediate regime kalt<ktp<k3 (or, in terms of lengths, NP<N<Nalt), the dependence on supercoiling would be primarily defined by the equilibrium constant KP for the paranemic RNA-DNA duplex (P) (see Eq. 10), for which m would be larger than for the initiation complex (although less than about 10 bp that corresponds to one helical turn), and some additional dependence on supercoiling might arise from the tail-passage rate constant (see derivation A in supporting material); for long tails ktp<kalt (or, in terms of lengths N>Nalt), the rate is described by Eq. 16, and m would include the number of DNA base pairs unwound upon paranemic duplex formation, likely with some additional unwinding to “make room” for RNAP rotation (Fig. 2). Thus, our model predicts that the parameter m and, consequently, the dependence on supercoiling, increases upon increasing the distance between the promoter and the site of R-loop formation.

In (26), the experimentally obtained ratio of the rates of R-loop formation for the native superhelical and the linear DNA templates (for the latter, “static” supercoiling is zero) was about 10–20. Assuming that the native supercoiling is about −0.05, from Eq. 24 for a ratio of the rate constants of about 10–20 at σ0≈−0.05 and σ0ref=0 we obtain an m value of about 2–3 nt. This value is consistent with the general prediction of the model that the rate of R-loop formation is defined by the opening of only a few base pairs. However, an m value of about 2–3 nt appears more consistent with the fast tail-passage regime (where m corresponds to the initiation complex) than with the slow tail-passage regime (where m corresponds to the paranemic duplex, for which m is likely to be larger). That might be surprising because in this system the R-loop-prone sequence was several hundred base pairs away from the transcription start site. One possibility is that the DNA immediately behind the transcription complex is already partially unwound, so the number of extra DNA base pairs that need to be unwound to form the paranemic duplex is smaller than the full length of the paranemic duplex. More definitive conclusions would require further experiments within a narrow distribution of supercoiling densities and at various distances from the transcription start site.
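The estimate of m can be reproduced by inverting the far-right part of Eq. 24. The sketch below is a minimal illustration; the function names are our own, and the inputs are the ratio of 10–20 and the superhelical densities quoted above.

```python
import math

def rate_ratio(m, sigma0, sigma0_ref=0.0):
    """Far-right part of Eq. 24: exp(-20*m*(sigma0 - sigma0_ref))."""
    return math.exp(-20.0 * m * (sigma0 - sigma0_ref))

def unwound_bp_from_ratio(ratio, sigma0, sigma0_ref=0.0):
    """Solve Eq. 24 for m, given the observed ratio of rate constants."""
    return -math.log(ratio) / (20.0 * (sigma0 - sigma0_ref))

sigma_native = -0.05  # native superhelical density
sigma_linear = 0.0    # linear template: no static supercoiling

m_low = unwound_bp_from_ratio(10.0, sigma_native, sigma_linear)   # ln(10), about 2.3
m_high = unwound_bp_from_ratio(20.0, sigma_native, sigma_linear)  # ln(20), about 3.0
```

Because 20 × 0.05 = 1, m is simply the natural logarithm of the observed ratio in this case, which gives the range of about 2–3 stated in the text.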

Dependence of the rate of R-loop formation on the sequence

For the sequence dependence, the line of arguments is the same as for the supercoiling dependence in the previous subsection except that the difference in superhelical energies is substituted by the difference in energies between the RNA-DNA and the DNA-DNA duplexes:

$$\Delta G_{seq}=\left(\Delta G_{\mathrm{RNA\text{-}DNA}}-\Delta G_{\mathrm{DNA\text{-}DNA}}\right)m. \qquad (25)$$

As mentioned in the previous subsection, m is only a few base pairs; thus, within a long heterogeneous R-loop-prone sequence, one or more short clusters of “super-R-loop-prone sequences” (i.e., those with the most negative difference ΔG_RNA-DNA − ΔG_DNA-DNA) could strongly increase the efficiency of R-loop formation, especially if they are localized in the promoter-proximal part of the R-loop-prone sequence.

This effect of strong stimulation of R-loop formation by short clusters of very R-loop-prone sequences was previously observed and explained in (32).

As in the case of dependence on supercoiling, our model predicts that the sequence dependence for the rate of R-loop formation would become stronger as the R-loop-prone sequence is moved away from the transcription start site, i.e., the ratio of the rates of R-loop formation for the more R-loop-prone and for the less R-loop-prone sequences would increase upon increasing the distance from the transcription start site.
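This prediction can be illustrated with a short sketch that applies the same exponential form as Eq. 22 to the sequence energy of Eq. 25. The function name and the per-base-pair energy differences (in units of kBT) are hypothetical values chosen only for illustration, not measured quantities.

```python
import math

def sequence_rate_ratio(ddg_a, ddg_b, m):
    """Ratio of rates for sequences A and B, by analogy with Eq. 22.

    ddg_a, ddg_b: per-base-pair values of (dG_RNA-DNA - dG_DNA-DNA) in kBT
    units (hypothetical inputs); Eq. 25 gives dG_seq = ddg * m.
    """
    return math.exp(-(ddg_a - ddg_b) * m)

ddg_prone = -0.5   # hypothetical: more R-loop-prone sequence
ddg_neutral = 0.0  # hypothetical: less R-loop-prone sequence

# m is expected to grow with distance from the promoter (see text)
ratio_small_m = sequence_rate_ratio(ddg_prone, ddg_neutral, 2)  # exp(1)
ratio_large_m = sequence_rate_ratio(ddg_prone, ddg_neutral, 6)  # exp(3)
```

With these assumed numbers, the preference for the more R-loop-prone sequence grows from roughly 3-fold to roughly 20-fold as m increases from 2 to 6.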

“Fate” of an R-loop after formation of the first plectonemic link

The formation of the first plectonemic link within the RNA-DNA hybrid, considered in the previous subsections, is most likely the rate-limiting step for R-loop formation: the following steps, whether they are tail passages or RNAP rotations, would occur much faster because the stable plectonemic RNA-DNA hybrid prevents the system from returning to the initial state S (Fig. 1), from which the entire process must start over again. Besides, in the case of the tail-passage pathway, the longer the RNA-DNA hybrid the larger the gap for the tail passage and, consequently, the easier the tail passage. The RNAP rotation pathway is also likely to be easier if there is a larger gap. Also, since at least some RNAPs are capable of transcribing single-stranded DNA templates (52), they probably could continue transcription without reattachment to the non-template DNA strand, and, consequently, the RNAP would be free to rotate. Thus, our consideration of formation of the first plectonemic link within an R-loop is likely to be sufficient to evaluate rates and probabilities of R-loop formation. However, to evaluate probabilities for a particular sequence to be within an R-loop, one must also determine 1) how far transcription could proceed after R-loop formation and 2) how efficient the strand exchange between the RNA and the DNA is. The simplest situation arises if transcription proceeds up to the end of the DNA template after R-loop formation and, following RNAP dissociation, there is a rapid strand exchange between the RNA tail and the DNA duplex. In this case, the R-loop would be distributed over the DNA template in accordance with the relative stabilities of the RNA-DNA and the DNA-DNA duplexes, regardless of the position at which the R-loop formation was initiated; and, consequently, the relative probabilities for given sequences to be within an R-loop would be similar to the ones predicted by the equilibrium model (26).
However, experimental evidence suggests that in the R-loop mode transcription is prone to premature termination ((53,54), reviewed in (5)); furthermore, preformed R-loops could (partially) block further rounds of transcription (30,55) and, consequently, could interfere with R-loop formation within other R-loop-prone sequences localized downstream from them (26). Strand exchange could also be slowed down by secondary structures within the RNA. All of these factors would affect R-loop distribution. Of course, in the presence of a sequence, which is much more R-loop-prone than the surrounding sequences, R-loops would be mostly localized within this sequence. However, more studies on the processivity of transcription in the “R-loop mode” and the kinetics of strand exchange would be required for prediction of the R-loop distribution with greater precision.

Discussion

Based on topological considerations, we propose a model that explains the decrease in R-loop formation upon moving away from the transcription start site, as observed in many experimental systems (27, 28, 29, 30, 31).

According to this model, as long as the transcription complex is intact (i.e., within the complex RNAP is bound to both the DNA and RNA strands), the only way for the nascent RNA “tail” to form a normal intertwined duplex with the template DNA strand is to pass through the gap between the DNA strands within a short unwound DNA region (Fig. 1). The farther from the transcription start site R-loop formation occurs, the longer the nascent RNA tail and, consequently, the greater the difficulty of tail passage. This can explain why the yield of R-loops decreases when the R-loop-prone sequence is placed farther away from the transcription start site. However, in many systems R-loops have been detected very far from the transcription start site (e.g., (26,34,35), reviewed in (36)), which suggests the existence of alternative pathways for R-loop formation.

It is important to note that topological constraints that interfere with R-loop formation appear only if the DNA strands are intact in the region of R-loop formation. If the non-template DNA strand contains a break (nick) near the site of R-loop initiation, the nascent RNA can freely wind around the template DNA strand without need for the tail passage through the gap between the DNA strands (Fig. S3). Accordingly, the presence of a nick strongly facilitates R-loop formation and initiates efficient R-loop formation even substantial distances away from the transcription start site (27,54). Other factors that facilitate R-loop formation and that probably could make it efficient even far from the transcription start site are unusual DNA structures and DNA-binding ligands that could induce long single-stranded regions within the template DNA strand to facilitate nascent RNA hybridization (38,56, 57, 58, 59, 60, 61, 62). Finally, in some cases the nascent RNA tail might be enzymatically cleaved (e.g., (63)), to remove topological problems for R-loop formation caused by a long tail. However, in some in vitro systems (e.g., (26)), R-loop formation has been observed several hundreds of base pairs away from the transcription start site in the absence of DNA breaks, nascent RNA cleavage, or unusual DNA structures. This supports the existence of “tail-independent” pathways for R-loop formation that involve intact DNA and RNA and do not rely on additional factors.

One possible tail-independent pathway (the RNAP rotation pathway, Fig. 2) involves the RNAP detachment (either transient or permanent) from the non-template DNA strand, making possible RNAP rotation (and consequent nascent RNA winding) around the template DNA strand. The possibility of such a pathway is supported by observations that at least some RNAPs can perform transcription in the absence of a non-template DNA strand (52), and that during transcription the non-template DNA strand can hybridize with complementary probes, suggesting its transient detachment from the RNAP (64).

Yet another possible tail-independent pathway is complete RNAP dissociation, during which the short RNA-DNA hybrid within the transcription complex “survives” RNAP dissociation and initiates the nascent RNA invasion into the DNA duplex via non-enzymatic strand exchange (Fig. 3). R-loop formation mediated by RNAP dissociation was directly demonstrated in an in vitro system with E. coli RNAP (37). Interestingly, upon studying R-loop formation in living cells it was found that R-loops localized far away from the transcription start site are much more pronounced in ex vivo assays, in which cells are lysed under protein-denaturing conditions before R-loop monitoring, than in in vivo assays, in which R-loops are monitored inside the cell (36). This might suggest that removal of RNAP prior to detection contributes to the increased yield of promoter-distal R-loops in ex vivo assays (33). However, artificial removal of RNAP cannot be a universal mechanism for R-loop formation far from the transcription start site because for certain in vitro systems it was directly demonstrated that deproteinization is not responsible for R-loop formation (65). Also, the increase in the yield of R-loops with the time of transcription observed for T3 RNAP in an in vitro system (26) suggests that in this system R-loops continuously form in the course of transcription rather than at the moment of artificial deproteinization.

An interesting possibility for resolving topological problems for R-loop formation is winding of the displaced non-template DNA strand around the RNA-DNA duplex to generate triple-helical structures (Fig. 4). These structures (sometimes referred to as “collapsed R-loops”) were initially implicated as a possible explanation for extremely stable RNA-DNA hybrids formed upon in vitro transcription with certain DNA sequences (38). These structures were also implicated in transcription-mediated replication blockage in vivo (39). Although later theoretical estimates (62) have shown that the superior stability of the Watson-Crick RNA-DNA duplex for the sequence used in (38) is sufficient to explain the experimental data without implication of additional triple-stranded structures, these hypothetical triple-stranded structures could provide a solution for topological problems for R-loop formation, as explained in detail in Fig. S1 and its legend. Note that within the collapsed R-loops, two homologous nucleic acid strands (the RNA and the non-template DNA strands) run in parallel to each other, in contrast to well-established triple-helical structures in which similar strands are antiparallel to each other (reviewed in (66)). The existence and stability of such “parallel” triple-helical structures are insufficiently understood. It is possible that interactions are weak between the RNA-DNA duplex and the displaced DNA strand within the collapsed R-loops, and, consequently, the collapsed R-loops convert to canonical R-loops as soon as the topological constraints created by RNAP binding to the DNA and the RNA strands disappear, due to either RNAP detachment from the non-template DNA strand or complete RNAP dissociation.

In summary, there are a number of possible mechanisms for R-loop formation far away from the transcription start site; however, for most of the experimental systems it remains to be established which mechanism is operating.

Our model provides a general description of the dependence of the rates of R-loop formation on the distance from the transcription start site in a manner that can accommodate all of the aforementioned mechanisms.

Author contributions

B.P.B. designed and performed research; B.P.B. and P.C.H. analyzed data and wrote the paper.

Acknowledgments

We thank both reviewers for their insightful comments and suggestions that have enhanced the presentation of our model.

Declaration of interests

The authors declare no competing interests.

Editor: Smita Patel.

Footnotes

Supporting material can be found online at https://doi.org/10.1016/j.bpj.2022.08.026.

Contributor Information

Boris P. Belotserkovskii, Email: borpavbel@netscape.net.

Philip C. Hanawalt, Email: hanawalt@stanford.edu.

Supporting material

Document S1. Figures S1–S3 and supplemental derivations
Document S2. Article plus supporting material

References

  • 1. Aguilera A., García-Muse T. R loops: from transcription byproducts to threats to genome stability. Mol. Cell. 2012;46:115–124. doi: 10.1016/j.molcel.2012.04.009.
  • 2. Santos-Pereira J.M., Aguilera A. R loops: new modulators of genome dynamics and function. Nat. Rev. Genet. 2015;16:583–597. doi: 10.1038/nrg3961.
  • 3. Chédin F. Nascent connections: R-loops and chromatin patterning. Trends Genet. 2016;32:828–838. doi: 10.1016/j.tig.2016.10.002.
  • 4. Sollier J., Cimprich K.A. Breaking bad: R-loops and genome integrity. Trends Cell Biol. 2015;25:514–522. doi: 10.1016/j.tcb.2015.05.003.
  • 5. Belotserkovskii B.P., Tornaletti S., et al. Hanawalt P.C. R-loop generation during transcription: formation, processing and cellular outcomes. DNA Repair. 2018;71:69–81. doi: 10.1016/j.dnarep.2018.08.009.
  • 6. Wahba L., Koshland D. The Rs of biology: R-loops and the regulation of regulators. Mol. Cell. 2013;50:611–612. doi: 10.1016/j.molcel.2013.05.024.
  • 7. Groh M., Gromak N. Out of balance: R-loops in human disease. PLoS Genet. 2014;10:e1004630. doi: 10.1371/journal.pgen.1004630.
  • 8. Skourti-Stathaki K., Proudfoot N.J. A double-edged sword: R loops as threats to genome integrity and powerful regulators of gene expression. Genes Dev. 2014;28:1384–1396. doi: 10.1101/gad.242990.114.
  • 9. Costantino L., Koshland D. The yin and yang of R-loop biology. Curr. Opin. Cell Biol. 2015;34:39–45. doi: 10.1016/j.ceb.2015.04.008.
  • 10. Gowrishankar J., Leela J.K., Anupama K. R-loops in bacterial transcription: their causes and consequences. Transcription. 2013;4:153–157. doi: 10.4161/trns.25101.
  • 11. Crossley M.P., Bocek M., Cimprich K.A. R-loops as cellular regulators and genomic threats. Mol. Cell. 2019;73:398–411. doi: 10.1016/j.molcel.2019.01.024.
  • 12. Chedin F., Benham C.J. Emerging roles for R-loop structures in the management of topological stress. J. Biol. Chem. 2020;295:4684–4695. doi: 10.1074/jbc.REV119.006364.
  • 13. Hamperl S., Cimprich K.A. The contribution of co-transcriptional RNA:DNA hybrid structures to DNA damage and genome instability. DNA Repair. 2014;19:84–94. doi: 10.1016/j.dnarep.2014.03.023.
  • 14. Camps M., Loeb L.A. Critical role of R-loops in processing replication blocks. Front. Biosci. 2005;10:689–698. doi: 10.2741/1564.
  • 15. Freudenreich C.H. R-loops: Targets for nuclease cleavage and repeat instability. Curr. Genet. 2018;64:789–794. doi: 10.1007/s00294-018-0806-z.
  • 16. Niehrs C., Luke B. Regulatory R-loops as facilitators of gene expression and genome stability. Nat. Rev. Mol. Cell Biol. 2020;21:167–178. doi: 10.1038/s41580-019-0206-3.
  • 17. Wahba L., Gore S.K., Koshland D. The homologous recombination machinery modulates the formation of RNA-DNA hybrids and associated chromosome instability. Elife. 2013;2:e00505. doi: 10.7554/eLife.00505.
  • 18. Jiang F., Doudna J.A. CRISPR-Cas9 structures and mechanisms. Annu. Rev. Biophys. 2017;46:505–529. doi: 10.1146/annurev-biophys-062215-010822.
  • 19. Ariel F., Lucero L., et al. Crespi M. R-loop mediated trans action of the APOLO long noncoding RNA. Mol. Cell. 2020;77:1055–1065.e4. doi: 10.1016/j.molcel.2019.12.015.
  • 20. Daube S.S., von Hippel P.H. RNA displacement pathways during transcription from synthetic RNA-DNA bubble duplexes. Biochemistry. 1994;33:340–347. doi: 10.1021/bi00167a044.
  • 21. Yin Y.W., Steitz T.A. The structural mechanism of translocation and helicase activity in T7 RNA polymerase. Cell. 2004;116:393–404. doi: 10.1016/s0092-8674(04)00120-5.
  • 22. Jiang M., Ma N., et al. McAllister W.T. RNA displacement and resolution of the transcription bubble during transcription by T7 RNA polymerase. Mol. Cell. 2004;15:777–788. doi: 10.1016/j.molcel.2004.07.019.
  • 23. Liu X., Bushnell D.A., Kornberg R.D. RNA polymerase II transcription: structure and mechanism. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms. 2013;1829:2–8. doi: 10.1016/j.bbagrm.2012.09.003.
  • 24. Korzheva N., Mustaev A. Transcription elongation complex: structure and function. Curr. Opin. Microbiol. 2001;4:119–125. doi: 10.1016/s1369-5274(00)00176-4.
  • 25. Martinez-Rucobo F.W., Cramer P. Structural basis of transcription elongation. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms. 2013;1829:9–19. doi: 10.1016/j.bbagrm.2012.09.002.
  • 26. Stolz R., Sulthana S., et al. Chedin F. Interplay between DNA sequence and negative superhelicity drives R-loop structures. Proc. Natl. Acad. Sci. USA. 2019;116:6260–6269. doi: 10.1073/pnas.1819476116.
  • 27. Roy D., Zhang Z., et al. Lieber M.R. Competition between the RNA transcript and the nontemplate DNA strand during R-loop formation in vitro: a nick can serve as a strong R-loop initiation site. Mol. Cell Biol. 2010;30:146–159. doi: 10.1128/MCB.00897-09.
  • 28. Chen L., Chen J.Y., et al. Fu X.D. R-ChIP using inactive RNase H reveals dynamic coupling of R-loops with transcriptional pausing at gene promoters. Mol. Cell. 2017;68:745–757.e5. doi: 10.1016/j.molcel.2017.10.008.
  • 29. Dumelie J.G., Jaffrey S.R. Defining the location of promoter-associated R-loops at near-nucleotide resolution using bisDRIP-seq. Elife. 2017;6:e28306. doi: 10.7554/eLife.28306.
  • 30. Belotserkovskii B.P., Soo Shin J.H., Hanawalt P.C. Strong transcription blockage mediated by R-loop formation within a G-rich homopurine-homopyrimidine sequence localized in the vicinity of the promoter. Nucleic Acids Res. 2017;45:6589–6599. doi: 10.1093/nar/gkx403.
  • 31. Xu B., Clayton D.A. A persistent RNA-DNA hybrid is formed during transcription at a phylogenetically conserved mitochondrial DNA sequence. Mol. Cell Biol. 1995;15:580–589. doi: 10.1128/mcb.15.1.580.
  • 32. Roy D., Lieber M.R. G clustering is important for the initiation of transcription-induced R-loops in vitro, whereas high G density without clustering is sufficient thereafter. Mol. Cell Biol. 2009;29:3124–3133. doi: 10.1128/MCB.00139-09.
  • 33. Belotserkovskii B.P., Hanawalt P.C. Mechanism for R-loop formation remote from the transcription start site: topological issues and possible facilitation by dissociation of RNA polymerase. DNA Repair. 2022;110:103275. doi: 10.1016/j.dnarep.2022.103275.
  • 34. Sanz L.A., Hartono S.R., et al. Chédin F. Prevalent, dynamic, and conserved R-loop structures associate with specific epigenomic signatures in mammals. Mol. Cell. 2016;63:167–178. doi: 10.1016/j.molcel.2016.05.032.
  • 35. Masukata H., Tomizawa J. A mechanism of formation of a persistent hybrid between elongating RNA and template DNA. Cell. 1990;62:331–338. doi: 10.1016/0092-8674(90)90370-t.
  • 36. Castillo-Guzman D., Chédin F. Defining R-loop classes and their contributions to genome instability. DNA Repair. 2021;106:103182. doi: 10.1016/j.dnarep.2021.103182.
  • 37. Richardson J.P. Attachment of nascent RNA molecules to superhelical DNA. J. Mol. Biol. 1975;98:565–579. doi: 10.1016/s0022-2836(75)80087-8.
  • 38. Reaban M.E., Lebowitz J., Griffin J.A. Transcription induces the formation of a stable RNA.DNA hybrid in the immunoglobulin alpha switch region. J. Biol. Chem. 1994;269:21850–21857.
  • 39. Krasilnikova M.M., Samadashwily G.M., et al. Mirkin S.M. Transcription through a simple DNA repeat blocks replication elongation. EMBO J. 1998;17:5095–5102. doi: 10.1093/emboj/17.17.5095.
  • 40. Karamychev V.N., Panyutin I.G., et al. Zhurkin V.B. DNA and RNA folds in transcription complex as evidenced by iodine-125 radioprobing. J. Biomol. Struct. Dyn. 2000;17:155–167. doi: 10.1080/07391102.2000.10506616.
  • 41. Nudler E. RNA polymerase backtracking in gene regulation and genome instability. Cell. 2012;149:1438–1445. doi: 10.1016/j.cell.2012.06.003.
  • 42. Zatreanu D., Han Z., et al. Svejstrup J.Q. Elongation factor TFIIS prevents transcription stress and R-loop accumulation to maintain genome stability. Mol. Cell. 2019;76:57–69.e9. doi: 10.1016/j.molcel.2019.07.037.
  • 43. Yu K., Chedin F., et al. Lieber M.R. R-loops at immunoglobulin class switch regions in the chromosomes of stimulated B cells. Nat. Immunol. 2003;4:442–451. doi: 10.1038/ni919.
  • 44. Roy D., Yu K., Lieber M.R. Mechanism of R-loop formation at immunoglobulin class switch sequences. Mol. Cell Biol. 2008;28:50–60. doi: 10.1128/MCB.01251-07.
  • 45. Da L.T., E C., et al. Yu J. T7 RNA polymerase translocation is facilitated by a helix opening on the fingers domain that may also prevent backtracking. Nucleic Acids Res. 2017;45:7909–7921. doi: 10.1093/nar/gkx495.
  • 46. Massé E., Drolet M. Escherichia coli DNA topoisomerase I inhibits R-loop formation by relaxing transcription-induced negative supercoiling. J. Biol. Chem. 1999;274:16659–16664. doi: 10.1074/jbc.274.23.16659.
  • 47. Liu L.F., Wang J.C. Supercoiling of the DNA template during transcription. Proc. Natl. Acad. Sci. USA. 1987;84:7024–7027. doi: 10.1073/pnas.84.20.7024.
  • 48. Tsao Y.P., Wu H.Y., Liu L.F. Transcription-driven supercoiling of DNA: direct biochemical evidence from in vitro studies. Cell. 1989;56:111–118. doi: 10.1016/0092-8674(89)90989-6.
  • 49. Nelson P. Transport of torsional stress in DNA. Proc. Natl. Acad. Sci. USA. 1999;96:14342–14347. doi: 10.1073/pnas.96.25.14342.
  • 50. Benham C.J. Theoretical analysis of heteropolymeric transitions in superhelical DNA molecules of specified sequence. J. Chem. Phys. 1990;92:6294–6305.
  • 51. Belotserkovskii B.P. Relationships between the winding angle, the characteristic radius, and the torque for a long polymer chain wound around a cylinder: implications for RNA winding around DNA during transcription. Phys. Rev. E - Stat. Nonlinear Soft Matter Phys. 2014;89:022709. doi: 10.1103/PhysRevE.89.022709.
  • 52. Milligan J.F., Groebe D.R., et al. Uhlenbeck O.C. Oligoribonucleotide synthesis using T7 RNA polymerase and synthetic DNA templates. Nucleic Acids Res. 1987;15:8783–8798. doi: 10.1093/nar/15.21.8783.
  • 53. Belotserkovskii B.P., Liu R., et al. Hanawalt P.C. Mechanisms and implications of transcription blockage by guanine-rich DNA sequences. Proc. Natl. Acad. Sci. USA. 2010;107:12816–12821. doi: 10.1073/pnas.1007580107.
  • 54. Belotserkovskii B.P., Neil A.J., et al. Hanawalt P.C. Transcription blockage by homopurine DNA sequences: role of sequence composition and single-strand breaks. Nucleic Acids Res. 2013;41:1817–1828. doi: 10.1093/nar/gks1333.
  • 55. Tous C., Aguilera A. Impairment of transcription elongation by R-loops in vitro. Biochem. Biophys. Res. Commun. 2007;360:428–432. doi: 10.1016/j.bbrc.2007.06.098.
  • 56. Grabczyk E., Fishman M.C. A long purine-pyrimidine homopolymer acts as a transcriptional diode. J. Biol. Chem. 1995;270:1791–1797. doi: 10.1074/jbc.270.4.1791.
  • 57. Grabczyk E., Usdin K. The GAA∗TTC triplet repeat expanded in Friedreich’s ataxia impedes transcription elongation by T7 RNA polymerase in a length and supercoil dependent manner. Nucleic Acids Res. 2000;28:2815–2822. doi: 10.1093/nar/28.14.2815.
  • 58. Duquette M.L., Handa P., et al. Maizels N. Intracellular transcription of G-rich DNAs induces formation of G-loops, novel structures containing G4 DNA. Genes Dev. 2004;18:1618–1629. doi: 10.1101/gad.1200804.
  • 59. Neil A.J., Liang M.U., et al. Mirkin S.M. RNA-DNA hybrids promote the expansion of Friedreich's ataxia (GAA)n repeats via break-induced replication. Nucleic Acids Res. 2018;46:3487–3497. doi: 10.1093/nar/gky099.
  • 60. Belotserkovskii B.P., Hanawalt P.C. PNA binding to the non-template DNA strand interferes with transcription, suggesting a blockage mechanism mediated by R-loop formation. Mol. Carcinog. 2015;54:1508–1512. doi: 10.1002/mc.22209.
  • 61. Lim G., Hohng S. Single-molecule fluorescence studies on cotranscriptional G-quadruplex formation coupled with R-loop formation. Nucleic Acids Res. 2020;48:9195–9203. doi: 10.1093/nar/gkaa695.
  • 62. Belotserkovskii B.P., Mirkin S.M., Hanawalt P.C. DNA sequences that interfere with transcription: implications for genome function and stability. Chem. Rev. 2013;113:8620–8637. doi: 10.1021/cr400078y.
  • 63. Skourti-Stathaki K., Proudfoot N.J., Gromak N. Human senataxin resolves RNA/DNA hybrids formed at transcriptional pause sites to promote Xrn2-dependent termination. Mol. Cell. 2011;42:794–805. doi: 10.1016/j.molcel.2011.04.026.
  • 64. Larsen H.J., Nielsen P.E. Transcription-mediated binding of peptide nucleic acid (PNA) to double-stranded DNA: sequence-specific suicide transcription. Nucleic Acids Res. 1996;24:458–463. doi: 10.1093/nar/24.3.458.
  • 65.Malig M., Hartono S.R., et al. Chedin F. Ultra-deep coverage single-molecule R-loop footprinting reveals principles of R-loop formation. J. Mol. Biol. 2020;432:2271–2288. doi: 10.1016/j.jmb.2020.02.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Frank-Kamenetskii M.D., Mirkin S.M. Triplex DNA structures. Annu. Rev. Biochem. 1995;64:65–95. doi: 10.1146/annurev.bi.64.070195.000433. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Document S1. Figures S1–S3 and supplemental derivations
mmc1.pdf (266.6KB, pdf)
Document S2. Article plus supporting material
mmc2.pdf (1.4MB, pdf)

Articles from Biophysical Journal are provided here courtesy of The Biophysical Society