Abstract
CRISPR-Cas effector complexes recognise nucleic acid targets by base pairing with their crRNA, which enables easy re-programming of the target specificity in rapidly emerging genome engineering applications. However, undesired recognition of off-targets that are only partially complementary to the crRNA occurs frequently and represents a severe limitation of the technique. A comprehensive quantitative understanding and prediction of off-targeting is still lacking. Here, we present a detailed analysis of the target recognition dynamics by the Cascade surveillance complex on a set of mismatched DNA targets using single-molecule supercoiling experiments. We demonstrate that the observed dynamics can be quantitatively modelled as a random walk over the length of the crRNA-DNA hybrid using a minimal set of parameters. The model accurately describes the recognition of targets with single and double mutations, providing an important basis for quantitative off-target predictions. Importantly, the model intrinsically accounts for the observed bias regarding the position and the proximity between mutations and reveals that the seed length for the initiation of target recognition is controlled by DNA supercoiling rather than the Cascade structure.
Subject terms: Single-molecule biophysics, Enzyme mechanisms
Based on thorough single-molecule analysis of off-target binding by the CRISPR-Cas surveillance complex Cascade, a mechanism-based model is presented that quantitatively describes the recognition dynamics of mutated targets by this protein.
Introduction
CRISPR (clustered regularly interspaced short palindromic repeats)–Cas (CRISPR-associated) systems constitute adaptive RNA-guided defense systems in prokaryotes against foreign nucleic acids1. Cas protein effector complexes guided by the crRNA recognize and trigger subsequent cleavage of invading nucleic acids2. Due to their programmable cleavage specificity, effector complexes such as Cas93, Cas124 and most recently Cascade5 were repurposed as genome editing tools in different model organisms ranging from bacteria to human cells. While the effector complexes can be addressed to target practically any unique sequence in a genome3, they often exhibit significant promiscuity in target recognition that leads to the binding and cleavage of only partially matching DNA sequences6,7. Such off-targeting has been detected using high throughput techniques such as genome-wide in vivo DNA binding and cleavage studies8–10, large on-purpose libraries for reporting DNA binding and cleavage in vivo11,12 as well as in vitro13,14. Off-targeting can result in highly undesired and unpredictable genetic rearrangements which is particularly problematic for therapeutic applications6.
The recent development of engineered effector variants15–17 can reduce but not abolish off-targeting16,18. A frequently used complementary approach to prevent off-targeting is in silico off-target prediction, which promises to identify crRNAs with the least promiscuity19–21. Prediction tools typically use heuristic scoring functions that try to reproduce sequence and mismatch position patterns from high-throughput studies. Though many strong off-target sites are correctly predicted, a considerable fraction of weaker off-targets remains undiscovered by the algorithms9,22, such that off-targeting remains a challenging problem of CRISPR-Cas technologies. Furthermore, these algorithms do not provide quantitative, measurable parameters and cannot predict how off-targeting changes with altered conditions, such as the local genomic supercoiling or the enzyme concentration.
Along with extensive characterization of off-targeting, considerable mechanistic insight into the target recognition process by CRISPR-Cas effector complexes has been obtained from biochemical, structural and single-molecule studies. A converging theme emerged (Fig. 1a) in which an effector complex first scans duplex DNA for a short complex-specific protospacer adjacent motif (PAM). Upon PAM recognition, it initiates base pairing between the crRNA and the PAM-adjacent bases of the DNA target strand. The RNA-DNA heteroduplex can then reversibly expand, expelling the non-target DNA strand and forming a triple-stranded R-loop structure. Upon full-length R-loop formation up to the PAM-distal end, a conformational change occurs that licenses DNA degradation23–25. For Cascade the latter comprises a global sliding of the Cse1–Cse2 filament26,27 that locks the R-loop in a highly stable conformation25,28 and allows the recruitment of the Cas3 nuclease (step iv in Fig. 1a). The actual target recognition is, however, a strand displacement reaction between the involved nucleic acid strands (steps ii and iii in Fig. 1a), in which the effector complex acts only as a sensor for the R-loop progression. Mismatches can be considered as energy barriers during the reversible R-loop expansion that promote its collapse28. The impact of a mismatch has been shown to depend on its position: PAM-proximal mismatches within the so-called seed region inhibit R-loop formation more strongly than distal mismatches8,29. In general, the reversible nature of R-loop expansion and collapse in competition with irreversible locking or cleavage imposes a kinetic rather than a typical “sticky”, i.e., affinity-based, target recognition mechanism28,30,31.
Despite an increasingly detailed mechanistic understanding of target recognition by CRISPR–Cas complexes, this wealth of mechanistic knowledge has until recently32 neither been exploited for off-target predictions nor, although widely suggested, been applied to quantitatively understand the targeting dynamics in mechanistic studies30,31,33.
To establish such a link, we here use single-molecule DNA twist measurements to comprehensively quantify the dynamics of R-loop formation by the Cascade complex from Streptococcus thermophilus. Importantly, we resolve the transient R-loop sub-states on single- and double-mismatched DNA targets, which remained hidden to other methods including high-throughput off-targeting measurements13,14. We show that the observed dynamics can be quantitatively modeled by describing R-loop formation as a random walk in a simplified one-dimensional free energy landscape. The model was adapted from previous descriptions of protein-free strand displacement reactions in dynamic DNA nanotechnology34,35, which have recently been introduced to the CRISPR–Cas field31,32. Importantly, our modeling (i) provides direct evidence that R-loop expansion down to local sub-states follows a random-walk process, (ii) shows that the single base-pair stepping of R-loop expansion occurs on a sub-millisecond time scale, (iii) returns absolute free energy penalties imposed by different mismatches, (iv) quantitatively predicts the non-trivial dependence of R-loop formation on the proximity between multiple mismatches and (v) reveals that the length of the seed region in Cascade is a function of the applied supercoiling rather than a structural property.
Overall, our findings establish an important mechanism-based approach for the prediction of off-targeting by Cascade and potentially other genome engineering tools. We furthermore provide quantitative insight into how off-targeting depends on DNA supercoiling, which in eukaryotes has been found to be highly locus- and gene specific36,37.
Results
Modeling the R-loop dynamics of Cascade
Previous investigations of the target recognition by Cascade14,28,31,38 and other effectors13,24,32 strongly suggested that following PAM binding, a dynamic R-loop structure gets nucleated which stochastically grows and shrinks with single base-pair steps31,39. When the R-loop expands until the PAM-distal end, a locking transition and/or cleavage is triggered (Fig. 1a). Recently, models were developed that describe protein-free DNA strand displacement reactions and R-loop formation as a random walk in a one-dimensional free energy landscape32,34,35,40,41. These models differ mainly in their system-specific energy landscapes that allow local rate variations. To develop and test a quantitative model for Cascade, we constructed a suitable one-dimensional free energy landscape for R-loop formation with the R-loop length as reaction coordinate (Fig. 1b). Cascade starts initially in the unbound state () corresponding to an R-loop of zero length. Upon PAM binding an R-loop of 1 bp length shall be nucleated for which a free energy penalty is considered. In a first iteration we assumed that the free energy for increasing R-loop lengths is constant since for each additional base-pair of the heteroduplex a base pair of the DNA-duplex needs to be disrupted (Fig. 1b, top). After full R-loop formation of 32 bp for Cascade, the R-loop enters the locked state (state ) associated with a decrease in free energy by . Negative supercoiling has been shown to assist R-loop formation25,28. This provides a constant negative bias for increasing R-loop lengths (Fig. 1b, middle). Also, protein contacts to the R-loop may contribute an additional favorable or unfavorable bias that we assume to be constant throughout the R-loop. For each mismatch that acts as a barrier for R-loop expansion, a local mismatch penalty was introduced that shifts all steps behind the mismatch equally upwards (Fig. 1b, bottom). 
A mismatch thus introduces a dynamic intermediate R-loop state , which extends over a few base pairs before the mismatch due to the random-walk nature of the R-loop (Fig. 1b, bottom). In the absence of a free energy bias, the kinetic barriers between all transitions were assumed to be identical and described by the unbiased single base-pair stepping rate . In the presence of a bias, as given by the free energy landscape, rate alterations were described using Arrhenius’ law (see Methods for details). With these few assumptions, a fully determined linear rate model was obtained. Mean transition times between R-loop states could be calculated by solving the first-passage problem for this model (see Supplementary Note 1). This allowed us to obtain not only the mean R-loop formation time but also the rates for transitions between different R-loop intermediates using , , and in case of mismatches one or several different values for as the only free parameters.
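The first-passage calculation outlined above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, all parameter values (free energies in units of k_BT, rates in s⁻¹), the symmetric splitting of the bias between forward and backward rates, and the treatment of the full-length R-loop as an absorbing state are assumptions made purely for illustration.

```python
import numpy as np

def step_rates(n_bp=32, k_step=1000.0, dg_ini=5.0, dg_bias=-0.5, mismatches=None):
    """Stepping rates for R-loop lengths 0..n_bp (all values hypothetical).

    k_fwd[n] is the rate n -> n+1, k_bwd[n] the rate n+1 -> n.
    """
    mismatches = mismatches or {}
    # Assumption: the constant bias is split symmetrically between both directions.
    k_fwd = np.full(n_bp, k_step * np.exp(-dg_bias / 2))
    k_bwd = np.full(n_bp, k_step * np.exp(dg_bias / 2))
    # Initiation: the penalty slows nucleation; collapse of the 1-bp R-loop is unpenalised.
    k_fwd[0] = k_step * np.exp(-dg_ini)
    k_bwd[0] = k_step
    # Mismatch at position p: formed at the normal rate, removed faster (detailed balance).
    for p, penalty in mismatches.items():
        k_bwd[p - 1] *= np.exp(penalty)
    return k_fwd, k_bwd

def free_energy(k_fwd, k_bwd):
    """Free energy landscape (k_BT) implied by the rates via detailed balance."""
    return np.concatenate(([0.0], np.cumsum(np.log(k_bwd / k_fwd))))

def mean_first_passage_time(k_fwd, k_bwd):
    """Mean time from state 0 (reflecting) to the last state (absorbing)."""
    n = len(k_fwd)  # number of transient states
    a = np.zeros((n, n))
    for i in range(n):
        a[i, i] = k_fwd[i] + (k_bwd[i - 1] if i > 0 else 0.0)
        if i + 1 < n:
            a[i, i + 1] = -k_fwd[i]
        if i > 0:
            a[i, i - 1] = -k_bwd[i - 1]
    # Solves sum_j a[i, j] * tau[j] = 1 for the mean absorption times tau.
    return np.linalg.solve(a, np.ones(n))[0]

t_wt = mean_first_passage_time(*step_rates())
t_mm = mean_first_passage_time(*step_rates(mismatches={17: 5.0}))
print(f"matching target:   {t_wt:.3f} s")
print(f"mismatch at bp 17: {t_mm:.3f} s")
```

Solving the linear system for the mean absorption times is equivalent to the closed-form double sum for one-dimensional chains and is used here only because it is compact.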
Random walk describes dynamics of intermediate R-loops
To test the applicability of the random walk model, we set out to comprehensively quantify the R-loop dynamics of Cascade using single-molecule DNA twisting experiments25,28,38. Surface-grafted DNA molecules tethering a magnetic bead at their free end were stretched vertically in a magnetic tweezers apparatus42,43. Negative supercoiling introduced by rotating the tweezers magnets caused a DNA length reduction due to the formation of writhe28,44,45 (Fig. 1c). DNA unwinding due to R-loop formation by Cascade absorbs part of the introduced supercoiling and causes a DNA length increase that is proportional to the number of unwound base pairs43 (Fig. 1c, d). This allows full, locked R-loop formation as well as dynamic R-loop intermediates to be resolved24,28,33 (Fig. 1d, e).
We first quantified the dwells of intermediate R-loops of different lengths. We used DNA targets containing a limited number of matching base pairs adjacent to the PAM (from 8 to 22 bp) with the remaining base pairs being mismatched. This way only transitions between the and the states were observed (Fig. 2a and Supplementary Fig. 3a). We furthermore applied different negative supercoiling levels, given as mechanical torque , that were controlled by the applied stretching forces (see Methods). Natural superhelical densities in E. coli cells are in the range of and corresponding to torques between and 46,47 (Methods). Qualitative inspection of the obtained trajectories revealed that the dwell in the state increased with increasing negative supercoiling (Fig. 2b) and increasing length of the R-loop intermediate (Fig. 2c). This intuitively agrees with the free energy landscape of the random-walk model, since increased bias and length lower the free energy of the state and thus increase the energy barrier for a diffusive return to the state (Fig. 2f) leading to increased occupancies when modeling the state (Fig. 2f). Quantitative analysis of the dwells in both states provided the R-loop formation rates (Fig. 2d) and collapse rates of the state (Fig. 2e). was rather independent of the R-loop length and the applied supercoiling. In contrast, the R-loop collapse rate was strongly torque- and R-loop length-dependent and varied over three orders of magnitude. A global fit of the random-walk model to all collapse rates correctly described both the large spread of the rates between the different R-loop lengths as well as the torque dependence for a given R-loop length. This provides direct support that R-loop expansion and retraction follow a random-walk mechanism. Remarkably, the fitting used the unbiased single base-pair stepping rate as the only free parameter. Single base-pair steps during R-loop formation thus occur on the sub-millisecond time scale. 
Beyond the rates, also the occupancies of the and states were correctly described including their torque and length dependencies (see histograms in Fig. 2b, c).
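The strong length and torque dependence of the collapse rate follows directly from a biased random walk, as the following sketch illustrates. It is a simplification under stated assumptions rather than the fitting procedure used here: the intermediate is confined between the unbound state (absorbing) and a reflecting wall at the last matching base pair, the walk starts at that wall, and the bias values (in k_BT per bp) and the stepping rate are hypothetical.

```python
import numpy as np

def mean_collapse_time(n_match, dg_bias, k_step=1000.0):
    """Mean time to reach length 0 starting from the longest R-loop (n_match bp)."""
    k_fwd = k_step * np.exp(-dg_bias / 2)  # n -> n+1 (assumed symmetric bias split)
    k_bwd = k_step * np.exp(dg_bias / 2)   # n -> n-1
    a = np.zeros((n_match, n_match))       # row i describes R-loop length i+1
    for i in range(n_match):
        at_wall = i == n_match - 1         # no expansion past the last matching bp
        a[i, i] = k_bwd + (0.0 if at_wall else k_fwd)
        if not at_wall:
            a[i, i + 1] = -k_fwd
        if i > 0:
            a[i, i - 1] = -k_bwd
    return np.linalg.solve(a, np.ones(n_match))[-1]

for n_match in (8, 15, 22):
    for dg_bias in (-0.2, -0.5):
        rate = 1.0 / mean_collapse_time(n_match, dg_bias)
        print(f"{n_match:2d} bp, bias {dg_bias:+.1f} k_BT/bp: collapse rate {rate:.3g} 1/s")
```

With a single stepping rate as the only kinetic parameter, longer intermediates and stronger negative bias both lengthen the uphill path back to the unbound state, which is why the collapse rates spread over orders of magnitude.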
R-loop dynamics at single mismatches provides mismatch penalties
We next studied the R-loop dynamics on targets containing a single mutation at about half the target length and six consecutive mismatches at the PAM-distal end to prevent R-loop locking25,28. On such a target, the R-loop fluctuated between three states – the , the and a dynamic state of a maximum length of 26 bp (Fig. 3a). Testing the three possible mismatches C:C, C:T, and C:A at position 17 (counting from the PAM) revealed that the mismatch type strongly influenced the transition rates and the occupancies of the three states (Fig. 3b). Furthermore, these parameters were influenced by increasing negative torque, where the state became increasingly populated at the expense of the state (Supplementary Fig. 4a). We quantitatively analyzed the four rates that describe the transitions between adjacent states (Methods, Supplementary Fig. 5). This revealed that the rates and their torque dependences were largely independent of the mismatch type except that described the mismatch passage from the to the state (Figs. 3d, e). Generally, rates describing R-loop expansion () increased with increasing negative torque, while rates describing R-loop retraction () were found to decrease. For a given mismatch, we applied a global fit to the torque dependence of all four rates (solid lines in Figs. 3d and 3e) yielding good agreement with the data. Consistently, expected occupancies of the , , and calculated from the best fit parameters were also in agreement with the measurements (Fig. 3b and Supplementary Fig. 4a right panels). The obtained values for (Supplementary Fig. 4b) and (Supplementary Fig. 4c) were mismatch-independent while the mismatch penalty (Fig. 3c left panel, Supplementary Table 1) was strongly mismatch-dependent. This is consistent with the intuitive expectation that the mismatch type influences only the corresponding penalty but not the other parameters. 
Applying different Cascade concentrations revealed a linear dependence of the initial R-loop intermediate formation rate on the concentration while leaving the other rates unchanged (Supplementary Fig. 4d–g) in agreement with concentration-independent values for the standard free energy of R-loop initiation and the other model parameters (Supplementary Fig. 4f). Altogether, the three-state dynamics over a single mismatch was well described by the random walk model.
Next, we investigated how depended on the mismatch position. When keeping the same mismatch type including the same nearest-neighbor base pairs, should be largely position independent if nucleic acid thermodynamics would dominate. To test this, we produced different Cascade complexes with a CCC stretch at different positions in the crRNA (Supplementary Tables 3 and 4) allowing a corresponding introduction of a C:C mismatch with two adjacent G:C base pairs (see sketch in Fig. 3c, right panel). Recording and fitting the torque dependence of the different transition rates for these complexes revealed that was within error invariant for mismatch positions between 11 and 17 bp that could be experimentally accessed (Fig. 3c right panel, Supplementary Table 1). We note that mismatch barriers were not observable at positions 6, 12, 18, 24, and 308,28 due to the disrupted base pairing in the crRNA–DNA hybrid at these positions48.
Seed length is dependent on supercoiling
A mismatch penalty that is rather independent of the mismatch position seems to contradict the larger impact of PAM-proximal mismatches in the seed region compared to PAM-distal mismatches observed in vivo8,29 and in vitro14,28. To resolve this apparent contradiction, we expanded the range of mismatch positions from 5 to 21 bp. Since for many of these targets intermediates were too short-lived to be observable, we measured only the time for full R-loop formation (state ) using a full-length target that supported locking (Fig. 4a and Supplementary Fig. 3b–e). In agreement with previous reports28,29, R-loop formation was slower for targets with PAM-proximal mismatches compared to PAM-distal mismatches and the WT target (Supplementary Fig. 6a, b). We determined the mean R-loop formation time as a function of torque for the WT target and the single-mismatch targets (Fig. 4b). R-loop formation for the WT target depended only weakly on torque in the applied range. The R-loop formation times for single-mismatch targets, however, decreased strongly and non-linearly with increasing negative torque and finally plateaued at the WT level (Fig. 4b and Supplementary Fig. 6d). The torque required for reaching the WT level changed monotonically with mismatch position. We applied a global fit of the random walk model to the data using and obtained agreement with the experimental results (Fig. 4b and Supplementary Fig. 6d). The consideration of single-mismatch targets alongside the WT target in this data set as well as the highly non-linear torque dependence furthermore allowed us to probe a potential intrinsic bias of the free energy landscape in the absence of torque. Global fitting with a free non-zero bias provided as best fit parameters , as well as . The positive value of reveals that in the absence of torque the free energy landscape has a small upward bias (Supplementary Fig. 6g) corresponding to an apparent torque of ~1 pN nm, which is much smaller than base pairing energies and mismatch penalties. Inclusion of the determined into fits of the previous data was mainly compensated by changes of while the obtained free energy values became only slightly reduced (see Supplementary Table 2).
To explore the impact of seed mismatches in more detail, we plotted the measured R-loop formation times for selected torque values as a function of the mismatch position (Fig. 4c). Using a 10-fold increased R-loop formation time compared to the WT target as a hypothetical threshold for a seed mismatch, we observed that the length of the seed region decreased with increasing negative torque from ~15 bp at −3.4 pN nm to ~6 bp at −6.7 pN nm. Predicting the R-loop formation time in the absence of an external torque suggested that in this case the seed region would cover almost the entire target sequence (see black line in Fig. 4c). To verify the prediction, we set up a fluorescence bulk solution assay based on a donor-quencher pair at the PAM-distal end (Supplementary Fig. 6e) and measured the R-loop formation kinetics in the absence of supercoiling for different mismatch positions (Supplementary Fig. 6f). The extracted mean times of R-loop formation confirmed the theoretical prediction (Fig. 4c), demonstrating that the seed region in Cascade is mainly a product of the applied supercoiling, i.e., the bias of the free energy landscape. This can be intuitively understood by considering the different occupancies of the intermediate R-loop state for PAM-proximal and PAM-distal mismatches, which determine the full R-loop formation rate. In the absence of a bias, all R-loop lengths up to the mismatch are energetically equal and thus equally populated, which supports similar R-loop formation times (Fig. 4d upper panel). In the presence of a negative bias, the state of a PAM-proximal mismatch is energetically higher with respect to the state and thus less populated compared to a PAM-distal mismatch (Fig. 4d lower panel). This results in a comparably slower transition to the state for the PAM-proximal mismatch.
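The bias dependence of the seed length can be reproduced qualitatively with a small random-walk sketch. The code below is illustrative only and not the fitting code of this work: the parameter values (in k_BT), the symmetric bias splitting, the absorbing full-length state, and the helper names `step_rates`, `formation_time` and `seed_length` are assumptions. It counts the mismatch positions between 5 and 21 bp whose formation time exceeds the matching-target time by more than the 10-fold threshold used in the text.

```python
import numpy as np

def step_rates(n_bp, k_step, dg_ini, dg_bias, mismatches):
    """k_fwd[n]: rate n -> n+1; k_bwd[n]: rate n+1 -> n (hypothetical parameters)."""
    k_fwd = np.full(n_bp, k_step * np.exp(-dg_bias / 2))
    k_bwd = np.full(n_bp, k_step * np.exp(dg_bias / 2))
    k_fwd[0], k_bwd[0] = k_step * np.exp(-dg_ini), k_step
    for p, penalty in mismatches.items():
        k_bwd[p - 1] *= np.exp(penalty)  # faster retraction removes the mismatch
    return k_fwd, k_bwd

def formation_time(dg_bias, mismatches=None, n_bp=32, k_step=1000.0, dg_ini=5.0):
    """Mean first-passage time from the unbound state to the full R-loop."""
    k_fwd, k_bwd = step_rates(n_bp, k_step, dg_ini, dg_bias, mismatches or {})
    a = np.zeros((n_bp, n_bp))
    for i in range(n_bp):
        a[i, i] = k_fwd[i] + (k_bwd[i - 1] if i > 0 else 0.0)
        if i + 1 < n_bp:
            a[i, i + 1] = -k_fwd[i]
        if i > 0:
            a[i, i - 1] = -k_bwd[i - 1]
    return np.linalg.solve(a, np.ones(n_bp))[0]

def seed_length(dg_bias, penalty=5.0, threshold=10.0, positions=range(5, 22)):
    """Number of tested positions where a mismatch slows formation > threshold-fold."""
    t_wt = formation_time(dg_bias)
    ratios = [formation_time(dg_bias, {p: penalty}) / t_wt for p in positions]
    return sum(r > threshold for r in ratios)

for dg_bias in (0.0, -0.25, -0.5):
    print(f"bias {dg_bias:+.2f} k_BT/bp: {seed_length(dg_bias)} of 17 positions act as seed")
```

In this toy parametrisation every tested position counts as seed when the landscape is flat, while a negative bias confines the strongly inhibiting positions to the PAM-proximal end.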
Intermediate R-loop dynamics in the presence of two mismatches
After verifying the random walk model for targets with single mismatches, we next tested whether it can be directly applied to describe the R-loop dynamics in the presence of two mismatches. We produced double point mutants with the first mismatch located at position 11, 13 or 14 and the second at position 17 within the locking-deficient target (PAM-distal mismatches at positions 27–32). In this case R-loops could fluctuate between four possible states: , the intermediate states before each mismatch and as well as (Fig. 5a). For the 11–17 double mismatch target, the and the states could be distinguished, but the transitions between the states were too fast to be resolved for the closer mismatch spacings (Fig. 5b). State occupancies and transition dynamics were again torque-dependent (Supplementary Fig. 7c). Using the best-fit parameters from the single-mismatch experiments, the random walk model predicted the measured state occupancies remarkably well (Fig. 5b and Supplementary Fig. 7c right panels). A 4-state approximation of the recorded trajectories for the 11–17 double mismatch substrate (Supplementary Fig. 7c, bottom panel) allowed us to extract the six transition rates between subsequent states. Consistently, the rates extracted from the measurements agreed with those from Brownian dynamics simulations based on the model predictions (Supplementary Fig. 7a, b, d). Of note, the state was visited less frequently the closer the two mismatches were positioned (Fig. 5b), revealing that the mismatch proximity influences the formation of the full R-loop.
Proximity between double mismatches strongly influences R-loop formation
To investigate the influence of the proximity between two mismatches in detail, we studied locked R-loop formation on double C:C mismatch targets without terminal mismatches (Fig. 6a). Predictions by the random walk model showed that a strong inhibition is obtained when combining two PAM-proximal mismatches (brown areas in Fig. 6b, lower left corner) while a weak inhibition is obtained when combining two PAM-distal mismatches (blue areas in Fig. 6b). However, a considerable inhibition is also obtained when combining two PAM-distal mismatches that are in close proximity (green tail along the diagonal in Fig. 6b). This influence of the proximity between mismatches was only obtained for the random walk model but not for a simple addition of apparent free energy penalties (Fig. 6b, upper right, Supplementary Fig. 8c, d) as applied in heuristic scoring schemes11.
We next determined R-loop formation times as a function of torque for a second mismatch at the fixed position 17 and distances to the first mismatch ranging from 2 to 6 bp (Supplementary Fig. 8a, b). The mean R-loop formation time at a given torque increased strongly with decreasing distance between the two mismatches (Fig. 6c). This dependence on the mismatch proximity was quantitatively described by the random-walk model using the parameters from single-mismatch experiments (Fig. 6c, d). Intuitively, the proximity dependence can be understood by considering that for close mismatches the state is energetically higher than for distant mismatches, such that the occupancy of the is lowered in favor of the state (Fig. 5c). This in turn results in less frequent transitions to the state for proximal compared to distal mismatches.
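The difference between a random-walk description and a simple addition of apparent penalties can be made explicit in a short sketch. It is a toy calculation under assumptions, not the analysis performed here: parameter values are hypothetical (free energies in k_BT), the bias is split symmetrically between forward and backward rates, full-length R-loops are absorbing, and the "additive" estimate multiplies the slow-down factors of the two single mismatches.

```python
import numpy as np

def formation_time(mismatches=None, n_bp=32, k_step=1000.0, dg_ini=5.0, dg_bias=-0.5):
    """Mean first-passage time (s) from the unbound state to the full R-loop."""
    k_fwd = np.full(n_bp, k_step * np.exp(-dg_bias / 2))   # n -> n+1
    k_bwd = np.full(n_bp, k_step * np.exp(dg_bias / 2))    # n+1 -> n
    k_fwd[0], k_bwd[0] = k_step * np.exp(-dg_ini), k_step  # initiation step
    for p, penalty in (mismatches or {}).items():
        k_bwd[p - 1] *= np.exp(penalty)                     # mismatch removal is faster
    a = np.zeros((n_bp, n_bp))
    for i in range(n_bp):
        a[i, i] = k_fwd[i] + (k_bwd[i - 1] if i > 0 else 0.0)
        if i + 1 < n_bp:
            a[i, i + 1] = -k_fwd[i]
        if i > 0:
            a[i, i - 1] = -k_bwd[i - 1]
    return np.linalg.solve(a, np.ones(n_bp))[0]

PENALTY = 5.0        # hypothetical mismatch penalty in k_BT
SECOND = 17          # fixed position of the second mismatch
t_wt = formation_time()
random_walk, additive = {}, {}
for distance in range(2, 7):
    first = SECOND - distance
    random_walk[distance] = formation_time({first: PENALTY, SECOND: PENALTY})
    # Additive scheme: multiply the slow-down factors of the two single mismatches.
    additive[distance] = (formation_time({first: PENALTY})
                          * formation_time({SECOND: PENALTY}) / t_wt)
    print(f"distance {distance} bp: random walk {random_walk[distance]:.2f} s, "
          f"additive {additive[distance]:.2f} s")
```

In this parametrisation only the random-walk calculation yields formation times that rise as the two mismatches move closer together; the additive estimate is insensitive to their proximity by construction.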
Discussion
In this study, we presented a highly comprehensive investigation of the R-loop dynamics within a CRISPR-Cas type I Cascade effector complex. Using direct measurements of the DNA untwisting during R-loop formation, we could uniquely resolve multiple R-loop intermediates on mutated DNA targets and carefully quantify their behavior as a function of the positions and types of the formed mismatches.
Importantly, the diverse data set could be consistently described by modeling the R-loop formation process as a random walk in a simplified one-dimensional free energy landscape. Previous work modeled the impact of the mismatch position on the global strand-displacement or R-loop formation reaction32,35 and reproduced generally observed dependencies. Here, we demonstrate in great detail that the entire R-loop formation dynamics, including the substates, follows a random walk mechanism. Thus, the forming R-loop samples the continuum of available R-loop lengths, and unlocked R-loop intermediates dynamically extend over a range of different lengths (Fig. 2f). The time scale of the single base-pair R-loop steps was found to be in the sub-millisecond range (Supplementary Tables 1 and 2). This allows for a reversible, highly dynamic sampling of the sequence space during target search in which mismatches stall the R-loop formation process and promote its collapse (Fig. 4a). Combined with the irreversible locking step, this gives rise to a kinetic target recognition process in contrast to typical binding-affinity-based mechanisms30. In principle, given sufficient time, any target can become recognized. This sets important constraints when discussing specificities of effector complex variants15,16.
Overall, our data is well described by a rather simple form of the free energy landscape which in absence of mismatches exhibits just a constant supercoiling-dominated bias (Fig. 1b). This strongly suggests that a global and rather uniform bias will be the dominating feature of the free energy landscape of R-loop formation by Cascade. Some deviations between data and model do occur, however (see e.g., Fig. 2e). The relative rate deviations appear small (typically smaller than a factor of two) compared to the orders of magnitude covered and are not too surprising given the simplicity of the applied landscape. In addition to experimental errors, we mainly attribute the deviations to sequence- and/or structure-dependent local alterations of the free energy landscape from the simple model. As noted before, the crRNA-DNA hybrid displays disrupted base pairing every 6 bp48. A reduced stabilization of the unpaired DNA base and the absence of base stacking interactions may make these positions energetically less favorable, such that the global downhill bias of the free energy landscape will likely exhibit a periodic modulation. This may further be enhanced by the 6-bp periodicity of the Cas7 backbone, e.g., due to periodically recurring interaction sites with the DNA duplex during R-loop expansion26. This modulation does not appear to have a major impact on the observed trends of the data (see e.g., the R-loop length dependence in Fig. 2e) but may cause smaller alterations. Nonetheless, the modulation of the free energy landscape may give rise to short-lived kinetic R-loop intermediates every 6-bp which would be in agreement with a kinetically metastable partial R-loop structure of Cascade found at low temperature26.
An important finding of our data and modeling is that the seed sequence observed for all R-loop forming CRISPR-Cas effector complexes49 is at least partially a biophysical consequence of the external DNA supercoiling. In the absence of supercoiling, the free energy landscape of Cascade was found to be only weakly biased. This resulted in a rather position-independent impact of the mismatches, which can be interpreted as an ‘extended seed’ for Cascade targeting (Fig. 4c). In contrast, at sufficiently negative bias of the free energy landscape induced by supercoiling, the target recognition was only little affected by PAM-distal mismatches, which strongly limited the seed extension. The observed torque-dependence of the seed length (Fig. 4b) can explain observations of a short well-defined seed region in the cellular environment8,29 in contrast to more relaxed seed conditions in vitro in the absence of supercoiling14,28. The absence of a well-defined seed range is in agreement with structures of Type I-E Cascade complexes26,48,50 where specialized motifs in the PAM-proximal region of the R-loop were not observed. Generally, seed regions for RNA-guided nucleic acid recognition can be structurally determined, e.g., by a specific pre-ordering of the RNA guide in the PAM-proximal region as observed for Cas951 and Cas12a52. In contrast, the guide RNA of Cascade appears to be ordered throughout its length53, thus enabling an ‘extended seed’.
The supercoil- and position-dependence of the mismatch impact directly affects the specificity of the target recognition process. CRISPR-Cas effector complexes have predominantly evolved for activity on negatively supercoiled DNA as typically found in prokaryotic cells54. These conditions would provide less stringent specificities but accelerated target recognition kinetics. In genome engineering applications of eukaryotic cells, however, only moderate supercoiling levels are present, which would make the target recognition more specific but also slower. Noticeably, the supercoiling in eukaryotic cells is highly locus specific36,37 which should be considered when developing improved locus-specific off-target predictors.
From the R-loop dynamics on targets with single mismatches we could directly obtain values for individual mismatch penalties. The position independence of the penalties suggested that the random-walk model accounted correctly for the position-dependent bias observed previously14. The obtained free energy penalties for the three different mismatches were about 4 kBT lower than mismatch penalties within DNA duplexes in absence of proteins55 but had similar relative differences (Supplementary Table 1). Furthermore, their average magnitude, as well as relative order, were comparable to apparent penalties determined in high-throughput in vitro binding experiments of a thermophilic Cascade complex (Supplementary Table 1)14. Given the position-independence of the mismatch penalties in our experiments, we expect that they are dominated by nucleic-acid thermodynamics but somewhat lowered due to the hybrid nature and the enforced distorted A-form of the crRNA-target strand duplex.
Most importantly, we could directly apply the single-mismatch penalties to quantitatively predict the R-loop formation dynamics on targets with two mismatches. In particular, the random-walk model could correctly describe the proximity dependence, i.e., an increased inhibition of R-loop formation with decreasing mismatch distance (Fig. 6c). Given its applicability to double mismatches during R-loop formation by Cascade as well as in protein-free strand displacement reactions35, we expect that this type of model can be easily extended to larger numbers and any types of mismatches. For such mechanism-based off-target predictors the presented work provides a thorough validation.
Generally, there are two major challenges for the establishment of mechanism-based off-target predictors: (i) the parametrization of all possible mismatch penalties corresponding to at least 48 parameters due to 12 different mismatch types and 4 different nearest neighbor base pairs onto which a mismatch can stack55 and (ii) the determination of more detailed free-energy landscapes particularly for effector complexes with structurally determined seed motifs that will introduce localized supercoil-independent bias. In a recent analysis of high-throughput data from in vitro target libraries13,14,32 a random walk model was employed to infer a local free energy landscape for R-loop formation by Cas9 revealing significant local bias. Despite the usage of only a single value for all mismatch penalties, an improved off-target classifier was obtained. This strongly supports the potential of mechanism-based off-target predictors. For their further improvement, we think that a combination of single-molecule measurements as presented here and of high-throughput data will be crucial. Biophysical measurements with improved temporal resolution56 should allow to directly infer the free energy landscapes of R-loop formation for different effector complexes. In turn, high-throughput data could subsequently be used to independently parametrize mismatch penalties using a known free energy landscape. Together, this should enable better mechanism-based prediction and thus a more rational selection of target sequences for precise programming of DNA editing tools with low off-target probabilities.
In addition to the prediction of target recognition of existing effectors, the established modeling approach can also be used to devise theoretical ‘optimum free energy landscapes’ of R-loop formation that support maximized targeting specificity and efficiency at the same time. This can help to evaluate novel effector complexes and rationally guide further engineering approaches of high-fidelity variants15,16,57.
Methods
One-dimensional random-walk model for R-loop formation
The R-loop dynamics were modeled as a random walk on a one-dimensional lattice with 1 bp spacing within a simplified free energy landscape based on the four energy parameters described in the main text (see also Fig. 1b). The rate model of the random walk was parameterized based on the principle of detailed balance, which relates the ratio of the forward (indicated by ‘+’) and the backward (indicated by ‘−’) rate constants between subsequent positions $n$ and $n+1$ to the free-energy difference $\Delta G_{n,n+1} = G(n+1) - G(n)$ between these positions:

$$\frac{k_{+}(n)}{k_{-}(n+1)} = \exp\left(-\frac{\Delta G_{n,n+1}}{k_{\mathrm{B}}T}\right) \qquad (1)$$
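For the unbiased free energy landscape with zero bias, $\Delta G_{\mathrm{bias}} = 0$ (Fig. 1b, top), we assume an equal rate $k_{\mathrm{step}}$ for all transitions between R-loop states (i.e., equal kinetic barriers) except for transitions from the (and ) states. This excludes any sequence dependence of the stepping rates. PAM binding is included in the formation step of the first R-loop base pair, without a distinct kinetic barrier for dissociation from the PAM. Based on these considerations, the rate for R-loop initiation is given as

$$k_{\mathrm{ini}} = k_{\mathrm{step}}\,\exp\left(-\frac{G_{\mathrm{ini}}}{k_{\mathrm{B}}T}\right) \qquad (2)$$

where $G_{\mathrm{ini}}$ denotes the free-energy penalty for R-loop initiation.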
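The dependence of R-loop initiation on the Cascade concentration $c$ was included by considering the contribution of the chemical potential of the Cascade complexes to the initiation penalty $G_{\mathrm{ini}}$:

$$G_{\mathrm{ini}}(c) = G_{\mathrm{ini}}^{0} - k_{\mathrm{B}}T\,\ln\left(\frac{c}{c_{0}}\right) \qquad (3)$$

where $G_{\mathrm{ini}}^{0}$ is the standard initiation penalty at a reference concentration $c_{0}$.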
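As a rough illustration of this concentration correction, the following sketch computes the shift of the initiation penalty between the two Cascade concentrations used in this work (170 pM and 0.5 nM). It assumes the ideal-solution form in which the penalty changes by $-k_{\mathrm{B}}T\ln(c/c_0)$; the function name and the choice of 170 pM as reference are illustrative assumptions, not part of the original analysis.

```python
import math


def initiation_penalty_shift(c, c_ref):
    """Shift of the initiation penalty in units of k_B T when going from c_ref to c.

    Assumption: ideal-solution chemical potential, i.e. a shift of -ln(c / c_ref).
    Both concentrations must be given in the same units.
    """
    if c <= 0 or c_ref <= 0:
        raise ValueError("concentrations must be positive")
    return -math.log(c / c_ref)


# 0.5 nM relative to 170 pM (both expressed in pM)
shift = initiation_penalty_shift(500.0, 170.0)
print(f"shift of initiation penalty: {shift:.2f} k_B T")
```

Under these assumptions, a roughly threefold higher concentration lowers the initiation penalty by about one $k_{\mathrm{B}}T$.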
For a mismatch between the crRNA and the DNA target strand at position $m$, we assume that the rate-limiting step for mismatch establishment is the disruption of the DNA base pair, such that it occurs at the normal R-loop extension rate $k_{\mathrm{step}}$. Detailed balance provides in this case an increased rate $k_{-}(m)$ for the R-loop retraction that eliminates the mismatch

$$k_{-}(m) = k_{\mathrm{step}}\,\exp\left(\frac{\Delta G_{\mathrm{mm}}}{k_{\mathrm{B}}T}\right) \qquad (4)$$

where $\Delta G_{\mathrm{mm}} > 0$ denotes the free-energy penalty of the mismatch. This is in agreement with the rate-limiting step being facilitated by destabilized base-pairing in the heteroduplex, as also indicated by the lowered kinetic barrier at the mismatch position in Fig. 1b.
The applied negative supercoiling causes a constant bias $\Delta G_{\mathrm{bias}}$ of the free energy landscape per bp in the regime where the DNA length decreases linearly with the applied turns. The torque $\tau$, which quantifies how the applied supercoiling stresses the DNA helix, is set by the applied force in the magnetic tweezers experiments. It was estimated as described before58. The bias per bp from the torque equals the work done against the torque

$$\Delta G_{\mathrm{bias}} = \tau\,\theta \qquad (5)$$

where $\theta$ is the angle by which the DNA becomes untwisted per R-loop expansion by 1 bp25. Here, $\tau$ is taken as negative for negative supercoiling, such that $\Delta G_{\mathrm{bias}} < 0$ favors R-loop expansion. Assuming that the transition barrier is centered between two subsequent R-loop positions, R-loop expansion and retraction would both be affected by half of the bias, providing the following corrections of all forward and backward rates for the acting torque

$$k_{+}^{\tau}(n) = k_{+}(n)\,\exp\left(-\frac{\Delta G_{\mathrm{bias}}}{2k_{\mathrm{B}}T}\right) \qquad (6)$$

$$k_{-}^{\tau}(n) = k_{-}(n)\,\exp\left(+\frac{\Delta G_{\mathrm{bias}}}{2k_{\mathrm{B}}T}\right) \qquad (7)$$

with $n$ being any valid position of the free energy landscape. These definitions provide a full parameterization of the rate model that describes the random walk. Mean transition times between any starting state and any end state were calculated by solving the first passage problem for this model (see Supplementary Note 1). To this end, transmissive boundaries were placed at the positions of end states and a single particle was added to the system. Upon arrival of the particle at a transmissive boundary, it was instantaneously set to the start state. The mean transition time was then calculated from the steady-state flux of the single particle inside the rate landscape (see Supplementary Note 1). For intermediate R-loop states that are dynamic and extend over several base pairs, the position with the lowest free energy was taken as the state position. For DNA targets with a continuous stretch of mismatches at the PAM-distal end, the free energy landscape was cut off at the first mismatch position, corresponding to an infinite free energy at this position.
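The first-passage calculation can be illustrated with a minimal sketch. Instead of the steady-state flux construction of Supplementary Note 1, it solves the standard backward equation for the mean first-passage time of a one-dimensional chain, which is expected to give the same mean time for a single start state. All names (`build_rates`, `mean_first_passage_time`, `dg_bias`, `dg_mm`, `g_ini`) and the sign conventions are assumptions made for illustration; energies are in units of $k_{\mathrm{B}}T$ and a negative `dg_bias` favors R-loop extension.

```python
import numpy as np


def build_rates(n_bp, k_step, dg_bias=0.0, g_ini=0.0, mismatch_pos=None, dg_mm=0.0):
    """Rates for R-loop lengths 0..n_bp (state n_bp is the absorbing end state).

    Returns k_plus[n] (n -> n+1) and k_minus[n] (n -> n-1) for n = 0..n_bp-1.
    Hypothetical parameterization: half of the bias acts on each direction,
    the initiation penalty slows the first forward step, and a mismatch
    incorporated at length mismatch_pos accelerates retraction from that state.
    """
    k_plus = np.full(n_bp, k_step * np.exp(-dg_bias / 2.0))
    k_minus = np.full(n_bp, k_step * np.exp(+dg_bias / 2.0))
    k_plus[0] *= np.exp(-g_ini)      # R-loop initiation from the empty state
    k_minus[0] = 0.0                 # reflecting boundary at n = 0
    if mismatch_pos is not None:
        if not 1 <= mismatch_pos < n_bp:
            raise ValueError("mismatch_pos must lie in 1..n_bp-1")
        k_minus[mismatch_pos] *= np.exp(dg_mm)
    return k_plus, k_minus


def mean_first_passage_time(k_plus, k_minus, start=0):
    """Mean time to first reach the absorbing state N = len(k_plus)."""
    k_plus = np.asarray(k_plus, dtype=float)
    k_minus = np.asarray(k_minus, dtype=float)
    n_states = len(k_plus)
    Q = np.zeros((n_states, n_states))   # generator restricted to transient states
    for n in range(n_states):
        out = k_plus[n]
        if n > 0:
            Q[n, n - 1] = k_minus[n]
            out += k_minus[n]
        if n + 1 < n_states:
            Q[n, n + 1] = k_plus[n]
        Q[n, n] = -out
    # Backward equation: sum_j Q[i, j] * T[j] = -1 for every transient state i
    T = np.linalg.solve(Q, -np.ones(n_states))
    return T[start]


kp, km = build_rates(3, 1.0)
t_matched = mean_first_passage_time(kp, km)
kp_mm, km_mm = build_rates(3, 1.0, mismatch_pos=2, dg_mm=2.0)
t_mismatch = mean_first_passage_time(kp_mm, km_mm)
print(t_matched, t_mismatch)
```

In this toy landscape, the mismatch lengthens the mean passage time, and a more negative bias shortens it, which mirrors the qualitative behavior described above.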
For kinetic random-walk simulations of R-loop length fluctuations (see Supplementary Fig. 5), we constructed the free energy landscape for a given target and calculated for each position $n$ of the lattice the forward and backward stepping rate constants $k_{+}(n)$ and $k_{-}(n)$. Per simulation time step $\Delta t$, a single-bp forward or backward step was taken with probability $k_{+}(n)\,\Delta t$ or $k_{-}(n)\,\Delta t$, respectively. $\Delta t$ was chosen sufficiently small that the stepping probabilities were much smaller than one.
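The fixed-time-step scheme described above can be sketched as follows. This is a minimal illustration rather than the code used in this study; the function name, the array layout (one rate pair per lattice position) and the threshold of 0.1 for the stepping probabilities are assumptions.

```python
import numpy as np


def simulate_random_walk(k_plus, k_minus, n_steps, dt, start=0, seed=0):
    """Simulate R-loop length fluctuations on a lattice with positions 0..N.

    k_plus[n]  : rate for n -> n+1 (must be 0 at the last position)
    k_minus[n] : rate for n -> n-1 (must be 0 at position 0)
    Per time step dt, a forward or backward step is taken with probability
    k_plus[n] * dt or k_minus[n] * dt, respectively.
    """
    k_plus = np.asarray(k_plus, dtype=float)
    k_minus = np.asarray(k_minus, dtype=float)
    if k_minus[0] != 0.0 or k_plus[-1] != 0.0:
        raise ValueError("rates leaving the lattice must be zero")
    if (k_plus + k_minus).max() * dt > 0.1:
        raise ValueError("dt too large: stepping probabilities must be << 1")
    rng = np.random.default_rng(seed)
    traj = np.empty(n_steps + 1, dtype=int)
    n = start
    traj[0] = n
    for i in range(n_steps):
        u = rng.random()
        p_fwd = k_plus[n] * dt
        p_bwd = k_minus[n] * dt
        if u < p_fwd:
            n += 1
        elif u < p_fwd + p_bwd:
            n -= 1
        traj[i + 1] = n
    return traj


# Toy landscape with four positions and equal stepping rates
kp = np.array([1.0, 1.0, 1.0, 0.0])
km = np.array([0.0, 1.0, 1.0, 1.0])
traj = simulate_random_walk(kp, km, n_steps=10000, dt=0.01)
```

The returned trajectory gives the R-loop length at every time step and can be passed on to a model of the bead response.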
DNA and proteins
The 2.1 kbp DNA constructs with additional biotin- and digoxigenin-modified attachment handles at either end used in the magnetic tweezers experiments were prepared as previously described25,59. For each DNA target presented in this study, a 73 bp blunt-ended oligonucleotide duplex carrying the 35 bp long target sequence was cloned into the SmaI site of a pUC19 plasmid. From the plasmids, 2.1 kbp DNA fragments containing the targets as well as SpeI and NotI restriction enzyme sites at either end were produced by PCR (primers 1 and 2 in Supplementary Table 3). Biotinylated and digoxigenin-modified ~1.2 kbp DNA fragments were produced by PCR from pBluescript II SK+ with its multiple cloning site located approximately in the center of the fragments (primers 3 and 4 in Supplementary Table 3). The biotinylated and digoxigenin-modified fragments were digested with SpeI and NotI, respectively, to yield modified ~600 bp attachment handles. Following digestion of the 2.1 kbp target fragment with both restriction enzymes, it was ligated with the handles to form the final 3.3 kbp DNA construct used in the magnetic tweezers experiments. For the production of complexes with different crRNAs, all spacer variants were introduced into the produced vector pACYCminCR-Eco31I/SapI through the SapI and Eco31I sites using synthetic oligonucleotides with corresponding single-stranded overhangs (Supplementary Table 4)17. Cascade complexes with different crRNAs were expressed in E. coli BL21 (DE3) cells and purified as described60 using the pACYC-minCR derivatives (CmR) instead of the pACYC plasmid with homogeneous CRISPR region pCRh.
Magnetic tweezers experiments
R-loop formation experiments were performed in 20 mM Tris-HCl (pH 8.0), 150 mM NaCl and 0.1 mg ml−1 BSA at 170 pM (for experiments described in Figs. 2 and 3) or 0.5 nM (for experiments described in Figs. 4–6) Cascade using a custom-built magnetic tweezers setup61 at room temperature (25 °C). DNA molecules were bound to 0.5 µm streptavidin-coated magnetic beads (MasterBeads; Ademtech) and added into the antidigoxigenin-coated flow cell to form tethers at the bottom surface59,62. Single supercoilable molecules were selected43. The DNA length was determined at 120 Hz by videomicroscopy and real-time GPU-accelerated three-dimensional particle tracking63 from the position of the magnetic bead with respect to a surface-bound non-magnetic reference bead (Dynospheres; Invitrogen). Forces were calibrated from the lateral fluctuations of the DNA-tethered magnetic beads64. Torque values were calculated based on previous theoretical work58,65. For experiments in which dynamic sampling of R-loop intermediates in the absence of locking was investigated (Figs. 2, 3, and 5), Cascade was added to the flow cell and the DNA molecule was negatively supercoiled (see Supplementary Fig. 3a). The number of negative turns depended on the applied force and on the change of supercoiling following the formation of the R-loop. In general, we aimed for the DNA length transitions to occur at around half the relaxed molecule length at the given force and thus in the linear regime of the supercoiling curve28,43. In this way, the torque on the DNA stayed approximately constant. For experiments where R-loops became locked (Figs. 4 and 6), R-loop formation was induced as described before. To remove Cascade complexes with locked R-loops, the DNA molecule was positively supercoiled (+10–14 turns, depending on the force and on the length of the individual DNA molecule) and the force was increased to ~2–3 pN to provide a high positive torque that would ‘wring out’ the R-loop. R-loop dissociation was seen as a sudden DNA length increase (see Supplementary Fig. 3b–e for the full procedure). R-loop formation–dissociation cycles were repeated continuously to obtain ≥25 individual events per applied condition.
Fluorescence bulk solution experiments
All oligonucleotides for the zero torque measurements are shown in Supplementary Table 5 and were purchased HPLC-purified from Sigma-Aldrich. Shipping concentrations of 100 μM were evaluated by measuring the absorbance at 260 nm using a P-330 NanoPhotometer (Implen). Complementary strands were then annealed at a concentration of 1 μM in buffer containing 10 mM Tris-HCl (pH 8.0), 50 mM NaCl, and 1 mM EDTA by slow cooling from 95 °C to the storage temperature of 4 °C at 1 K/min.
All measurements were performed in a temperature-controlled Cary Eclipse at 25 °C in 1500 μL cuvettes. Before each measurement, cuvettes were rinsed 5 times with ethanol, 5 times with Milli-Q water, incubated overnight in 2% Hellmanex 3 solution, and again rinsed 5 times with ethanol and 5 times with Milli-Q water.
In the beginning, a 1350 μL solution containing the double-stranded DNA was measured for 600 s to obtain the ground level (9/10 of mean signal amplitude). Afterwards, the reaction was started by quickly adding 150 μL of solution containing Cascade. Reaction conditions were 10 nM of dsDNA and 2 nM Cascade in a buffer containing 20 mM Tris-HCl (pH 8.0), 150 mM NaCl and 0.1 μg/μL BSA.
The negative control (no protein added) was then subtracted from the measured trajectories. The fluorescence signal was then fitted to a sum of three exponentials of the form
8 |
As quantity of interest the mean time to overcome all three steps was taken:
9 |
Data analysis
The time resolution of the magnetic tweezers measurements can be approximated using , where is the signal-to-noise ratio and the characteristic DNA-length changes that need to be resolved for which we assumed and . The spring constant of the supercoiled DNA and the drag coefficient of the magnetic bead for axial displacements were determined from DNA length trajectories yielding and 64. This provided , i.e., a detection bandwidth of . DNA length trajectories recorded at 120 Hz were therefore smoothed with a sliding average to 7.5 Hz. Transitions between the different R-loop states to generate 2-, 3-, and 4-state approximations of the R-loop trajectories were obtained by hidden Markov modeling using the vbFRET software package66. For the vbFRET software package, default parameters were used except for the number of expected states, which was fixed to 2, 3, or 4 for targets containing no (Fig. 2), one (Fig. 3) or two (Fig. 5) internal mismatches, respectively. From the discrete-state trajectories, dwell time distributions and transition rates were extracted using MATLAB (MathWorks) and LabVIEW (National Instruments). This included the generation of cumulative dwell time distributions for individual states, which were fitted to single exponential functions to obtain mean dwell times and the corresponding transition rates. For the latter, the transition probabilities to neighboring states were correspondingly considered. For experiments in which dynamic sampling of R-loop intermediates was investigated (Figs. 2, 3, and 5), trajectories of at least 3000 s were recorded for each condition, typically including ~1000 transitions. For experiments where R-loops became locked (Figs. 4 and 6), ≥25 locking events were obtained for each condition to determine mean R-loop formation times. All rate and time error bars represent the standard error of the mean. In particular, the error of mean dwell times was calculated by dividing the mean time by the square root of the number of events. Errors for all fit parameters are given as single confidence intervals.
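Two of the steps above lend themselves to a short sketch: the sliding-average smoothing and the error estimate for mean dwell times. The code below is illustrative only. It assumes that smoothing from 120 Hz to 7.5 Hz corresponds to a 16-point window (120/7.5), and it uses the sample mean of the dwell times in place of the single-exponential fit to the cumulative distribution used in this study; function names are hypothetical.

```python
import numpy as np


def sliding_average(x, window):
    """Sliding (boxcar) average; returns only fully overlapping windows."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")


def mean_dwell_time(dwells):
    """Mean dwell time, its standard error (mean / sqrt(N)) and the rate 1 / mean.

    The mean / sqrt(N) error is the one stated in the text and is appropriate
    for exponentially distributed dwell times.
    """
    dwells = np.asarray(dwells, dtype=float)
    mean = dwells.mean()
    sem = mean / np.sqrt(dwells.size)
    return mean, sem, 1.0 / mean


# Assumed window: 120 Hz / 7.5 Hz = 16 points
window = int(round(120 / 7.5))
smoothed = sliding_average(np.ones(32), window)

mean, sem, rate = mean_dwell_time([1.0, 2.0, 3.0, 2.0])
print(window, mean, sem, rate)
```

For the four example dwell times, the mean is 2 s with a standard error of 1 s, corresponding to a rate of 0.5 s−1.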
To verify that the temporal resolution of our nanomechanical system (bead on supercoiled DNA, see above) was sufficient to resolve the extracted R-loop transitions, we employed Brownian dynamics simulations to simulate the magnetic tweezers measurement of dynamic R-loop sampling. We first employed kinetic random-walk simulations to simulate trajectories of the R-loop dynamics (R-loop length over time) on a one-dimensional 1 bp lattice within the corresponding free energy landscape (see model description above) using the experimentally obtained parameters (Supplementary Table 1). Using the slope of the supercoiling curve at the corresponding force, we converted the R-loop length into an expected equilibrium extension of the DNA. We furthermore modeled the diffusive fluctuations of the magnetic bead and its response to DNA extension changes by one-dimensional Brownian dynamics simulations64. In brief, a deviation $\Delta z$ of the fluctuating bead from the equilibrium extension caused a back-driving drift force $F_{\mathrm{drift}} = -\kappa\,\Delta z$ due to the stretch elasticity of supercoiled DNA with spring constant $\kappa$ (comprising components of the DNA twist elasticity and the entropic elasticity of stretched DNA). Per time increment of the simulation $\Delta t$, $F_{\mathrm{drift}}$ caused a displacement of $\Delta z_{\mathrm{drift}} = v_{\mathrm{drift}}\,\Delta t$, with $v_{\mathrm{drift}} = F_{\mathrm{drift}}/\gamma$ being the steady-state drift velocity of the bead inside the viscous medium for low Reynolds numbers. The viscous drag coefficient $\gamma$ of a spherical particle with radius $R$ inside a medium with viscosity $\eta$ was given by the Stokes formula $\gamma = 6\pi\eta R$. Within $\Delta t$, we furthermore considered random diffusive displacements $\Delta z_{\mathrm{diff}}$ that were drawn from a Gaussian distribution with zero mean and a variance of $2D\,\Delta t$, with $D = k_{\mathrm{B}}T/\gamma$ being the diffusion coefficient of the particle. By successively updating the bead position by a total displacement of $\Delta z_{\mathrm{drift}} + \Delta z_{\mathrm{diff}}$ per time increment, we obtained the magnetic bead fluctuations in response to the R-loop length fluctuations (see light red trajectory in Supplementary Fig. 5a–d and Supplementary Fig. 7). The spring constant describing the stretch elasticity of supercoiled DNA as well as the effective hydrodynamic bead radius were obtained from power spectral density analysis of corresponding experimental trajectories of the length fluctuations of supercoiled DNA64.
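The update rule of such a one-dimensional Brownian dynamics simulation can be sketched as follows. This is a minimal illustration under stated assumptions and not the code used in this study: the function names, the thermal energy of 4.11 pN nm (25 °C), the viscosity of water and all numerical values in the example are assumptions, and the bead is treated as an overdamped particle in a harmonic potential around the equilibrium extension.

```python
import numpy as np

KBT = 4.11  # thermal energy in pN nm at about 25 degrees C (assumed value)


def stokes_drag(radius_nm, viscosity=1e-9):
    """Stokes drag coefficient 6*pi*eta*R in pN s / nm.

    Default viscosity: water, 1e-3 Pa s = 1e-9 pN s / nm^2 (assumed).
    """
    return 6.0 * np.pi * viscosity * radius_nm


def simulate_bead(z_eq, kappa, gamma, dt, seed=0):
    """Bead position (nm) following a time series of equilibrium extensions z_eq.

    kappa: spring constant (pN / nm), gamma: drag coefficient (pN s / nm),
    dt: time increment (s). Each step adds a drift displacement
    -kappa * (z - z_eq) / gamma * dt and a Gaussian displacement with
    variance 2 * D * dt, where D = k_B T / gamma.
    """
    z_eq = np.asarray(z_eq, dtype=float)
    if kappa * dt / gamma > 0.1:
        raise ValueError("dt too large compared with the relaxation time gamma / kappa")
    rng = np.random.default_rng(seed)
    diffusion = KBT / gamma
    noise = rng.normal(0.0, np.sqrt(2.0 * diffusion * dt), size=z_eq.size)
    z = np.empty(z_eq.size)
    z_curr = z_eq[0]
    for i in range(z_eq.size):
        drift = -kappa * (z_curr - z_eq[i]) / gamma * dt
        z_curr = z_curr + drift + noise[i]
        z[i] = z_curr
    return z


# Example with illustrative (not measured) parameters
gamma = stokes_drag(250.0)          # bead radius of 250 nm
kappa = 1e-3                        # pN / nm
z = simulate_bead(np.full(100000, 100.0), kappa, gamma, dt=1e-4)
print(z.var(), KBT / kappa)         # variance should approach k_B T / kappa
```

In practice, `z_eq` would be the R-loop length trajectory converted to DNA extension via the slope of the supercoiling curve, as described above.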
We next identified for the simulated magnetic tweezers trajectories transitions between different R-loop intermediates using vbFRET and compared them with the actual transitions of the R-loop length simulations. We could correctly identify transitions for simple R-loop intermediates (as in Fig. 2, main text) as well as for the 3-state transitions observed for sufficiently strong single mismatches (C:C and C:T, see Fig. 3, main text, and Supplementary Fig. 5a, b). For targets containing a single C:A mismatch we observed a considerably slower collapse of the full R-loop state (rate ) compared to the C:T and C:C mismatches (Fig. 3b and Supplementary Fig. 5e) despite the anticipated independence of the R-loop collapse on the mismatch strength. We therefore hypothesized for this weak mismatch that transitions over the mismatch between the to the states (rates ) were too fast to be reliably detected given the temporal resolution of our setup. To correct for the undetected transitions between and states we increased the R-loop collapse rate to the level measured for C:C and C:T mismatches (), which proportionally also increases the R-loop formation rate and used the adjusted rates for characterizing the mismatch penalty (see Supplementary Fig. 5e). We additionally carried out simulations using the adjusted rates. Simulations of the R-loop length fluctuations provided the expected fast transitions (see Supplementary Fig. 5e, green data points). The simulated magnetic tweezers trajectories provided significantly lower transition rates that agreed with the experimental rates, supporting a correct adjustment of the rates for this weak mismatch. The correction procedure was also applied for extracting transition rates for the dynamic sampling of R-loop intermediates in case of double mismatches (positions 11, 17, Fig. 5e and Supplementary Fig. 7). 
While transitions between and as well as between and were correctly reproduced by the simulations, part of the fast transitions between the and states were not detected (Supplementary Fig. 7). Transition rate comparisons obtained from each trajectory are represented in Fig. 5e.
Estimation of the DNA torque in E. coli cells
Typical superhelical densities (i.e., the number of added superhelical turns per helical turns of the relaxed DNA) found in E. coli cells range from to 46,47. For plasmid DNA added superhelical turns are partitioned between writhe and twist at a ratio of 0.8 to 0.267. Thus, the superhelical density contributing to the DNA twist is
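$$\sigma_{\mathrm{tw}} = 0.2\,\sigma \qquad (10)$$

where $\sigma$ denotes the total superhelical density.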
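The torque $\tau$ in a twisted semiflexible polymer of length $L$ can be calculated from

$$\tau = \frac{C\,k_{\mathrm{B}}T}{L}\,\theta \qquad (11)$$

where $C$ is the torsional persistence length and $\theta$ the twist angle. The number of helical turns within $L$ is given by $N = L/h$, where $h$ is the helical pitch of B-form DNA. With $\sigma_{\mathrm{tw}}$ being the superhelical density contributing to the DNA twist, the twist angle is then given by $\theta = 2\pi\,\sigma_{\mathrm{tw}}\,N$. Inserting these relationships in the torque equation gives

$$\tau = \frac{2\pi\,C\,k_{\mathrm{B}}T\,\sigma_{\mathrm{tw}}}{h} \qquad (12)$$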
Using 68–70, we get for the typical superhelical densities in E. coli cells torques between and .
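The order of magnitude of this estimate can be checked with a short calculation. The sketch below evaluates the torque as $2\pi C k_{\mathrm{B}}T \sigma_{\mathrm{tw}}/h$ with a twist fraction of 0.2. All numerical inputs are assumptions chosen for illustration and are not necessarily the values used in this study: a torsional persistence length of 100 nm, a helical pitch of 3.57 nm (10.5 bp at 0.34 nm per bp), a thermal energy of 4.11 pN nm and an example superhelical density of −0.05.

```python
import math

KBT = 4.11  # thermal energy in pN nm at about 25 degrees C (assumed value)


def twist_torque(sigma, twist_fraction=0.2, c_nm=100.0, pitch_nm=3.57):
    """Torque (pN nm) from the twist part of the superhelical density.

    sigma          : total superhelical density (negative for underwound DNA)
    twist_fraction : fraction of sigma partitioned into twist
    c_nm           : torsional persistence length in nm (assumed value)
    pitch_nm       : helical pitch of B-form DNA in nm (assumed value)
    """
    sigma_tw = twist_fraction * sigma
    return 2.0 * math.pi * c_nm * KBT * sigma_tw / pitch_nm


tau = twist_torque(-0.05)
print(f"torque for sigma = -0.05: {tau:.2f} pN nm")
```

With these assumed inputs, the torque is on the order of several pN nm and scales linearly with the superhelical density.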
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We would like to thank Xenia Auerbach for the assistance in the development of the code used in this study. We would also like to thank Andrey Krivoy, Dominik J. Kauert, Julene Madariaga-Marcos and Kristina Kasaciunaite for valuable discussions during the preparation of this manuscript.
Author contributions
M.R. performed single molecule studies and analyzed the data. I.S. and T.S. purified proteins used in this study. P.I. performed fluorescence measurements presented in this study and analyzed the data. F.E.K. developed single molecule data analysis software. V.S. and R.S. designed the study. All authors interpreted the results and provided comments to the manuscript. All authors contributed to the preparation of the paper.
Peer review
Peer review information
Nature Communications thanks Chirlmin Joo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Funding
This work was supported by a consolidator grant of the European Research Council (GA 724863) and by the Deutsche Forschungsgemeinschaft (DFG, grant SE 1646/9-1 within priority program 2141) to R.S. Open Access funding enabled and organized by Projekt DEAL.
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author(s) upon request. Source data are provided with this paper. The minimal raw dataset required to reproduce the data published in this paper is available at the Zenodo database.
Code availability
The custom-made LabVIEW code for the analysis of magnetic tweezers data is available at Zenodo: 10.5281/zenodo.7328018.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Virginijus Siksnys, Email: siksnys@ibt.lt.
Ralf Seidel, Email: ralf.seidel@physik.uni-leipzig.de.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-35116-5.
References
- 1.Barrangou R, et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007;315:1709–1712. doi: 10.1126/science.1138140. [DOI] [PubMed] [Google Scholar]
- 2.Makarova KS, et al. Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants. Nat. Rev. Microbiol. 2020;18:67–83. doi: 10.1038/s41579-019-0299-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Cong L, et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013;339:819–823. doi: 10.1126/science.1231143. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Zetsche B, et al. Multiplex gene editing by CRISPR-Cpf1 using a single crRNA array. Nat. Biotechnol. 2017;35:178. doi: 10.1038/nbt0217-178b. [DOI] [PubMed] [Google Scholar]
- 5.Cameron P, et al. Harnessing type I CRISPR–Cas systems for genome engineering in human cells. Nat. Biotechnol. 2019;37:1471–1477. doi: 10.1038/s41587-019-0310-0. [DOI] [PubMed] [Google Scholar]
- 6.Fu Y, et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat. Biotechnol. 2013;31:822–826. doi: 10.1038/nbt.2623. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Pattanayak V, et al. High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nat. Biotechnol. 2013;31:839–843. doi: 10.1038/nbt.2673. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Fineran PC, et al. Degenerate target sites mediate rapid primed CRISPR adaptation. Proc. Natl Acad. Sci. USA. 2014;111:675. doi: 10.1073/pnas.1400071111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Kim D, et al. Digenome-seq: Genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nat. Methods. 2015;12:237–243. doi: 10.1038/nmeth.3284. [DOI] [PubMed] [Google Scholar]
- 10.Wienert B, et al. Unbiased detection of CRISPR off-targets in vivo using DISCOVER-Seq. Science. 2019;364:286–289. doi: 10.1126/science.aav9023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Doench JG, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 2016;34:184–191. doi: 10.1038/nbt.3437. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Listgarten J, et al. Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs. Nat. Biomed. Eng. 2018;2:38–47. doi: 10.1038/s41551-017-0178-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Boyle EA, et al. High-throughput biochemical profiling reveals sequence determinants of dCas9 off-target binding and unbinding. Proc. Natl Acad. Sci. USA. 2017;114:5461–5466. doi: 10.1073/pnas.1700557114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Jung C, et al. Massively parallel biophysical analysis of CRISPR-Cas complexes on next generation sequencing chips. Cell. 2017;170:35–47.e13. doi: 10.1016/j.cell.2017.05.044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Kleinstiver BP, et al. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature. 2016;529:490–495. doi: 10.1038/nature16526. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Slaymaker IM, et al. Rationally engineered Cas9 nucleases with improved specificity. Science. 2016;351:84–88. doi: 10.1126/science.aad5227. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Songailiene I, et al. Decision-making in cascade complexes harboring crRNAs of altered length. Cell Rep. 2019;28:3157–3166.e4. doi: 10.1016/j.celrep.2019.08.033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Frock RL, et al. Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat. Biotechnol. 2015;33:179–186. doi: 10.1038/nbt.3101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Cho SW, et al. Analysis of off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases. Genome Res. 2014;24:132–141. doi: 10.1101/gr.162339.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Haeussler M, et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol. 2016;17:707. doi: 10.1186/s13059-016-1012-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Lin J, Wong KC. Off-target predictions in CRISPR-Cas9 gene editing using deep learning. Bioinformatics. 2018;34:i656–i663. doi: 10.1093/bioinformatics/bty554. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Tsai SQ, et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 2015;33:187–197. doi: 10.1038/nbt.3117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Sternberg SH, Lafrance B, Kaplan M, Doudna JA. Conformational control of DNA target cleavage by CRISPR-Cas9. Nature. 2015;527:110–113. doi: 10.1038/nature15544. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C. & Doudna, J. A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature10.1038/nature13011 (2014). [DOI] [PMC free article] [PubMed]
- 25.Szczelkun MD, et al. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes. Proc. Natl Acad. Sci. USA. 2014;111:9798–9803. doi: 10.1073/pnas.1402597111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Xiao Y, et al. Structure basis for directional R-loop formation and substrate handover mechanisms in type I CRISPR-Cas system. Cell. 2017;170:48–60. doi: 10.1016/j.cell.2017.06.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Wiedenheft B, et al. Structures of the RNA-guided surveillance complex from a bacterial immune system. Nature. 2011;477:486–489. doi: 10.1038/nature10402. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Rutkauskas M, et al. Directional R-loop formation by the CRISPR-cas surveillance complex cascade provides efficient off-target site rejection. Cell Rep. 2015;10:1534–1543. doi: 10.1016/j.celrep.2015.01.067. [DOI] [PubMed] [Google Scholar]
- 29.Semenova E, et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc. Natl Acad. Sci. USA. 2011;108:10098–10103. doi: 10.1073/pnas.1104144108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Bisaria N, Jarmoskaite I, Herschlag D. Lessons from enzyme kinetics reveal specificity principles for RNA-guided nucleases in RNA interference and CRISPR-based genome editing. Cell Syst. 2017;4:21–29. doi: 10.1016/j.cels.2016.12.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Klein M, Eslami-Mossallam B, Arroyo DG, Depken M. Hybridization kinetics explains CRISPR-Cas off-targeting rules. Cell Rep. 2018;22:1413–1423. doi: 10.1016/j.celrep.2018.01.045. [DOI] [PubMed] [Google Scholar]
- 32.Eslami-Mossallam B, et al. A kinetic model predicts SpCas9 activity, improves off-target classification, and reveals the physical basis of targeting fidelity. Nat. Commun. 2022;13:347. doi: 10.1038/s41467-022-28994-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Singh D, et al. Mechanisms of improved specificity of engineered Cas9s revealed by single-molecule FRET analysis. Nat. Struct. Mol. Biol. 2018;25:347–354. doi: 10.1038/s41594-018-0051-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Srinivas N, et al. On the biophysics and kinetics of toehold-mediated DNA strand displacement. Nucleic Acids Res. 2013;41:10641–10658. doi: 10.1093/nar/gkt801. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Irmisch P, Ouldridge TE, Seidel R. Modeling DNA-strand displacement reactions in the presence of base-pair mismatches. J. Am. Chem. Soc. 2020;142:11451–11463. doi: 10.1021/jacs.0c03105. [DOI] [PubMed] [Google Scholar]
- 36.Kouzine F, et al. Transcription-dependent dynamic supercoiling is a short-range genomic force. Nat. Struct. Mol. Biol. 2013;20:396–403. doi: 10.1038/nsmb.2517. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Naughton C, et al. Transcription forms and remodels supercoiling domains unfolding large-scale chromatin structures. Nat. Struct. Mol. Biol. 2013;20:387–395. doi: 10.1038/nsmb.2509. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Krivoy A, et al. Primed CRISPR adaptation in Escherichia coli cells does not depend on conformational changes in the Cascade effector complex detected in Vitro. Nucleic Acids Res. 2018;46:4087–4098. doi: 10.1093/nar/gky219. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Josephs EA, et al. Structure and specificity of the RNA-guided endonuclease Cas9 during DNA interrogation, target binding and cleavage. Nucleic Acids Res. 2015;43:8924–8941. doi: 10.1093/nar/gkv892. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Machinek RRF, Ouldridge TE, Haley NEC, Bath J, Turberfield AJ. Programmable energy landscapes for kinetic control of DNA strand displacement. Nat. Commun. 2014;5:5324. doi: 10.1038/ncomms6324. [DOI] [PubMed] [Google Scholar]
- 41.Liu H, et al. Kinetics of RNA and RNA:DNA hybrid strand displacement. ACS Synth. Biol. 2021;10:3066–3073. doi: 10.1021/acssynbio.1c00336. [DOI] [PubMed] [Google Scholar]
- 42.Kemmerich FE, Kasaciunaite K, Seidel R. Modular magnetic tweezers for single-molecule characterizations of helicases. Methods. 2016;108:4–13. doi: 10.1016/j.ymeth.2016.07.004. [DOI] [PubMed] [Google Scholar]
- 43.Rutkauskas M, Krivoy A, Szczelkun MD, Rouillon C, Seidel R. Single-molecule insight into target recognition by CRISPR–Cas complexes. Methods Enzymol. 2017;582:239–273. doi: 10.1016/bs.mie.2016.10.001. [DOI] [PubMed] [Google Scholar]
- 44.Strick TR, Allemand JF, Bensimon D, Bensimon A, Croquette V. The elasticity of a single supercoiled DNA molecule. Science. 1996;271:1835–1837. doi: 10.1126/science.271.5257.1835. [DOI] [PubMed] [Google Scholar]
- 45.Brutzer H, Luzzietti N, Klaue D, Seidel R. Energetics at the DNA supercoiling transition. Biophys. J. 2010;98:1267–1276. doi: 10.1016/j.bpj.2009.12.4292. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Brouns T, et al. Free energy landscape and dynamics of supercoiled DNA by high-speed atomic force microscopy. ACS Nano. 2018;12:11907–11916. doi: 10.1021/acsnano.8b06994. [DOI] [PubMed] [Google Scholar]
- 47.Higgins, N. P. & Vologodskii, A. V. Topological behavior of plasmid DNA. Microbiol. Spectr. 3, 1–25 (2015). [DOI] [PMC free article] [PubMed]
- 48. Mulepati S, Héroux A, Bailey S. Structural biology. Crystal structure of a CRISPR RNA-guided surveillance complex bound to a ssDNA target. Science. 2014;345:1479–1484. doi: 10.1126/science.1256996.
- 49. Künne T, Swarts DC, Brouns SJJ. Planting the seed: target recognition of short guide RNAs. Trends Microbiol. 2014;22:74–83. doi: 10.1016/j.tim.2013.12.003.
- 50. Hayes RP, et al. Structural basis for promiscuous PAM recognition in type I–E Cascade from E. coli. Nature. 2016;1–16. doi: 10.1038/nature16995.
- 51. Jiang F, et al. Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage. Science. 2016;351:867–871. doi: 10.1126/science.aad8282.
- 52. Swarts DC, van der Oost J, Jinek M. Structural basis for guide RNA processing and seed-dependent DNA targeting by CRISPR-Cas12a. Mol. Cell. 2017;66:221–233.e4. doi: 10.1016/j.molcel.2017.03.016.
- 53. Jackson RN, et al. Structural biology. Crystal structure of the CRISPR RNA-guided surveillance complex from Escherichia coli. Science. 2014;345:1473–1479. doi: 10.1126/science.1256328.
- 54. Dorman CJ. DNA supercoiling and transcription in bacteria: a two-way street. BMC Mol. Cell Biol. 2019;20:209. doi: 10.1186/s12860-019-0211-6.
- 55. SantaLucia J, Hicks D. The thermodynamics of DNA structural motifs. Annu. Rev. Biophys. Biomol. Struct. 2004;33:415–440. doi: 10.1146/annurev.biophys.32.110601.141800.
- 56. Ivanov IE, et al. Cas9 interrogates DNA in discrete steps modulated by mismatches and supercoiling. Proc. Natl Acad. Sci. USA. 2020;117:5853–5860. doi: 10.1073/pnas.1913445117.
- 57. Chen JS, et al. Enhanced proofreading governs CRISPR-Cas9 targeting accuracy. Nature. 2017;550:407–410. doi: 10.1038/nature24268.
- 58. Maffeo C, et al. DNA-DNA interactions in tight supercoils are described by a small effective charge density. Phys. Rev. Lett. 2010;105.
- 59. Luzzietti N, Knappe S, Richter I, Seidel R. Nicking enzyme-based internal labeling of DNA at multiple loci. Nat. Protoc. 2012;7:643–653. doi: 10.1038/nprot.2012.008.
- 60. Sinkunas T, et al. In vitro reconstitution of Cascade-mediated CRISPR immunity in Streptococcus thermophilus. EMBO J. 2013;32:385–394. doi: 10.1038/emboj.2012.352.
- 61. Klaue D, Seidel R. Torsional stiffness of single superparamagnetic microspheres in an external magnetic field. Phys. Rev. Lett. 2009;102:028302. doi: 10.1103/PhysRevLett.102.028302.
- 62. Schwarz FW, et al. The helicase-like domains of type III restriction enzymes trigger long-range diffusion along DNA. Science. 2013;340:353–356. doi: 10.1126/science.1231122.
- 63. Huhle A, et al. Camera-based three-dimensional real-time particle tracking at kHz rates and Ångström accuracy. Nat. Commun. 2015;6:5885. doi: 10.1038/ncomms6885.
- 64. Daldrop P, Brutzer H, Huhle A, Kauert DJ, Seidel R. Extending the range for force calibration in magnetic tweezers. Biophys. J. 2015;108:2550–2561. doi: 10.1016/j.bpj.2015.04.011.
- 65. Schöpflin R, Brutzer H, Müller O, Seidel R, Wedemann G. Probing the elasticity of DNA on short length scales by modeling supercoiling under tension. Biophys. J. 2012;103:323–330. doi: 10.1016/j.bpj.2012.05.050.
- 66. Bronson JE, Fei J, Hofman JM, Gonzalez RL, Wiggins CH. Learning rates and states from biophysical time series: a Bayesian approach to model selection and single-molecule FRET data. Biophys. J. 2009;97:3196–3205. doi: 10.1016/j.bpj.2009.09.031.
- 67. Ubbink J, Odijk T. Electrostatic-undulatory theory of plectonemically supercoiled DNA. Biophys. J. 1999;76:2502–2519. doi: 10.1016/S0006-3495(99)77405-9.
- 68. Kauert DJ, Kurth T, Liedl T, Seidel R. Direct mechanical measurements reveal the material properties of three-dimensional DNA origami. Nano Lett. 2011;11:5558–5563. doi: 10.1021/nl203503s.
- 69. Lipfert J, Kerssemakers JWJ, Jager T, Dekker NH. Magnetic torque tweezers: measuring torsional stiffness in DNA and RecA-DNA filaments. Nat. Methods. 2010;7:977–980. doi: 10.1038/nmeth.1520.
- 70. Bouchiat C, et al. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys. J. 1999;76:409–413. doi: 10.1016/s0006-3495(99)77207-3.
Data Availability Statement
The datasets generated and/or analyzed during the current study are available from the corresponding author(s) upon request. The minimal raw dataset required to reproduce the data published in this paper is available in the Zenodo database. Source data are provided with this paper.
The custom-made LabVIEW code for the analysis of magnetic tweezers data is available at Zenodo: 10.5281/zenodo.7328018.