microLife. 2025 Aug 21;6:uqaf020. doi: 10.1093/femsml/uqaf020

Analysis of tracrRNAs reveals subgroup V2 of type V-K CAST systems

Marcus Ziemann, Alexander Mitrofanov, Richard Stöckl, Omer S Alkhnbashi, Rolf Backofen, Wolfgang R Hess
PMCID: PMC12416283  PMID: 40927181

Abstract

Clustered regularly interspaced short palindromic repeats (CRISPR)-associated transposons (CAST) combine certain class 1 or class 2 CRISPR-Cas systems with Tn7-like transposons. Class 2 type V-K CAST systems are restricted to cyanobacteria. Here, we identified a unique subgroup of type V-K systems through phylogenetic analysis, classified as V-K_V2. Subgroup V-K_V2 CAST systems are characterized by an alternative tracrRNA, the exclusive use of Arc_2-type transcriptional regulators, and distinct differences in the length of protein domains in TnsB and TnsC. Although the occurrence of V-K_V2 CAST systems is restricted to Nostocales cyanobacteria, they show signs of horizontal gene transfer, indicating their capability for genetic mobility. The predicted V-K_V2 tracrRNA secondary structure has been integrated into an updated version of the CRISPRtracrRNA program available on GitHub under https://github.com/BackofenLab/CRISPRtracrRNA/releases/tag/2.0.

Keywords: CRISPR, CRISPR-associated transposons, cyanobacteria, tracrRNA, transcriptional regulator


We report the identification of a novel subgroup of CRISPR-associated transposons in cyanobacteria that uses an alternative tracrRNA and present an updated version of the CRISPRtracrRNA algorithm to detect tracrRNAs.

Introduction

Clustered regularly interspaced short palindromic repeats and associated protein-coding genes (CRISPR-Cas systems) are adaptive immune systems in bacteria and archaea and the basis for the development of various genome editing tools (Hille et al. 2018, Makarova et al. 2020, Bharathkumar et al. 2022, Liu et al. 2022). Naturally occurring CRISPR-Cas systems act in three steps: adaptation, processing, and interference (Makarova et al. 2015). The first step is initiated by contact with foreign nucleic acids, like phage DNA or RNA, or a plasmid. If the cell survives this attack, it can derive a short DNA fragment (spacer) from the invading nucleic acid and integrate it into its genome. The fragments are located in an array consisting of spacers and short palindromic repeats (CRISPR array). In the second step, the CRISPR arrays are expressed as a long transcript, the pre-crRNA. This RNA will form specific hairpin structures based on the palindromic nature of the repeat sequences, which can be recognized by CRISPR-associated (Cas) proteins. The pre-crRNA is then processed into shorter CRISPR RNAs (crRNAs), which then form, together with other Cas proteins, the CRISPR-Cas interference complex. This complex is used, in the last step, to interact with DNA or RNA by base pairing. If the complex recognizes a sequence of sufficient similarity to the spacer, such as in the course of another phage attack, it cuts the target DNA or transcribed RNA and therefore protects the cell from infection.

CRISPR-Cas systems are widespread in most bacteria and archaea, but they do not share a common gene set or structure (Makarova et al. 2015, 2020). The systems are mainly classified into two classes: class 1, with multiple proteins forming the CRISPR-Cas effector complex, and class 2, using only a single effector complex protein. Both can be further divided into six types and over 30 different subtypes (Makarova et al. 2020). These subtypes differ in interference complex structure, composition, and the nucleic acid type that is targeted (DNA or RNA). In this study, we focus on the subtype V-K, a class 2 CRISPR-Cas system integrated with a transposon (Strecker et al. 2019, Rybarski et al. 2021, Ziemann et al. 2023). This CRISPR-associated transposon (CAST) exists exclusively in cyanobacteria and is associated with the transposase genes tnsB, tnsC, and tniQ. These transposases facilitate genetic mobility by targeting common genetic elements, such as tRNA genes, using the CRISPR-Cas interference complex as the targeting mechanism (Koonin and Makarova 2024). The subtype V-K CRISPR-Cas components are minimal, with Cas12k as the only interference protein, a rather short CRISPR array (∼4 spacers per array), and a trans-activating crRNA (tracrRNA) forming the CRISPR-Cas complex. Recently, another related family of Tn7-like transposons was described that targets CRISPR arrays instead of tRNA genes. Similar to type V-K, these systems are restricted to certain cyanobacteria (Chacon Machado and Peters 2025).

The tracrRNA is a common element of class 2 CRISPR-Cas systems and serves as an adaptor between the crRNA and the Cas protein (Deltcheva et al. 2011). The repeat region of the crRNA base-pairs with a specific anti-repeat region (AR) on the tracrRNA, while the Cas protein binds the tracrRNA. In V-K systems, the location of these tracrRNA genes is also remarkably conserved downstream of cas12k and upstream of the typical type V-K CRISPR array, transcribed in the same direction (Strecker et al. 2019, Saito et al. 2021, Ziemann et al. 2023). The structure of type V-K tracrRNAs was established by analyses of the Scytonema hofmanni CAST system (Xiao et al. 2021, Schmitz et al. 2022). These analyses revealed important RNA secondary structures such as the stem-loops P1-8, a pseudoknot, and repeat-anti-repeat duplexes (Xiao et al. 2021) and provided a general three-dimensional structure for these types of tracrRNA. Most importantly, they showed the interaction with crRNA, which here is mediated not by one but by two ARs. The first region is part of an RNA-triplet complex inside the second stem-loop (P2) (Schmitz et al. 2022) and the second is at the 3′-end of the tracrRNA; the two regions lie ∼120 nt apart. This is rather unusual, but also occurs in other type V tracrRNAs, like in V-B, V-F1, and V-G systems (Liao and Beisel 2021).

In previous studies, we compared known tracrRNA structures of this system (Xiao et al. 2021, Schmitz et al. 2022) to our database of CAST systems in order to develop an algorithm, called CRISPRtracrRNA, for the detection of these RNA genes (Mitrofanov et al. 2022). Here, we extend this analysis. We have developed CRISPRtracrRNA version 2 (v.2) and were able to find an alternative tracrRNA, which appears as a structural derivative of the previously established version 1 tracrRNA (V1) and occurs only in a phylogenetically distinct group of CAST systems. This, as well as the exclusive association with a gene encoding a particular type of regulator, indicates that this constitutes an isolated subgroup of CAST systems (type V-K_V2).

Materials and methods

Cultivation of cyanobacteria and RNA extraction

The strain Scytonema sp. NIES-4073 was obtained from the NIES-collection (Microbial Culture Collection at the National Institute for Environmental Studies) (Tanabe et al. 2024) and grown in liquid MDM-medium (Watanabe 1960), under permanent white light illumination of 10–15 µmol photons m⁻² s⁻¹ at 20°C. The cultivation was done exclusively in cell culture flasks without shaking.

Scytonema sp. NIES-4073 forms thick cell clusters in liquid medium, which made a measurement of optical density impossible. 500-ml flasks were filled with 150 ml MDM medium and inoculated. Fresh medium was added at regular intervals. After three months, cells from 500 ml cultures were collected by filtration through 47 mm Supor 800 membranes with 0.8 µm pores. After resuspension in 300 µl MDM medium (Watanabe 1960), the cells were transferred to screw cap tubes, together with 500 µl glass beads (250 µl 0.1–0.25 µm in diameter and 250 µl 0.25–0.5 µm in diameter). Then, 500 µl PGTX (Pinto et al. 2009) was added, and the tubes were frozen in liquid nitrogen. Cell disruption (3 cycles at 30 Hz for 10 min with short breaks in between) was performed with a mixer mill MM400 (Retsch) at 4°C. Samples were separated from the beads and incubated for 30 min at 65°C in a water bath. One volume of chloroform:isoamyl alcohol (24:1) was then added, and samples were incubated for 10 min at room temperature with several vortexing cycles. After centrifugation for 3 min at 3250 g in a swing-out rotor, the supernatant was transferred to a fresh tube, and one volume of chloroform:isoamyl alcohol was added. This step was repeated twice. RNA was precipitated by the addition of one volume of isopropanol.

Eight µg of total RNA per sample were analyzed by northern hybridization using single-stranded radioactively labeled RNA probes as described (Ziemann et al. 2023). The probes were generated in vitro from PCR-amplified templates (see Supplementary Table S1 for primers).

Phylogenetic analysis of Cas12k and the transposases

The identified Cas12k, TnsB, TnsC, and TniQ proteins were compared to each other and analyzed for potentially incorrectly annotated start positions, as described (Ziemann et al. 2023). Corrected sequences deviating from the original annotations are noted in Dataset 3. Proteins with truncated sequences were removed from the analysis. The sequences were aligned by M-coffee (Notredame et al. 2000, Di Tommaso et al. 2011) and further analyzed by the BEAST algorithm (v2.7.7) (Suchard et al. 2018, Bouckaert et al. 2019). The phylogenetic analyses were calculated with a strict clock model and a Yule model as tree prior, using the standard parameters [birthRate.t: Uniform(0.0, Infinity); initial = 1.0 (0.0, ∞)]. The MCMC chain was sampled every 1000 steps over 5 million generations for TniQ and Cas12k (Yule 1925, Henikoff and Henikoff 1992, Gernhard 2008) and every 1000 steps over 1 million generations for TnsC and TnsB. To ensure a reasonable effective sample size (ESS; threshold at 200), the results were analyzed with Tracer v1.7.2 (Rambaut et al. 2018), and the first 10% of trees were discarded from the analysis. The trees were then generated with the TreeAnnotator tool (from the BEAST package).
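The sampling scheme described above can be made concrete with a short calculation. The following Python sketch is illustrative only and is not part of the published workflow. It assumes one logged state per sampling interval (the initial state is ignored), and the function names and the example ESS values are hypothetical.

```python
def retained_samples(chain_length: int, sample_every: int, burnin_percent: int = 10):
    """Return (logged, retained) sample counts for an MCMC run.

    Assumption of this sketch: exactly one state is logged per sampling
    interval; the initial state that some programs also log is not counted.
    """
    logged = chain_length // sample_every
    discarded = logged * burnin_percent // 100  # integer arithmetic avoids rounding issues
    return logged, logged - discarded


def parameters_below_ess(ess_by_parameter: dict, threshold: float = 200.0):
    """List the parameters whose effective sample size is below the threshold."""
    return sorted(name for name, ess in ess_by_parameter.items() if ess < threshold)


# Chain settings taken from the text: 5 million generations (TniQ, Cas12k)
# and 1 million generations (TnsB, TnsC), each sampled every 1000 steps.
print(retained_samples(5_000_000, 1000))
print(retained_samples(1_000_000, 1000))

# Hypothetical ESS values, only to show how the threshold of 200 would be applied.
print(parameters_below_ess({"posterior": 950.0, "birthRate": 180.0}))
```

Under these assumptions, the longer chains leave 4500 trees and the shorter chains 900 trees after the 10% burn-in is discarded.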

tracrRNA predictions

Homologous sequences (see Dataset 1 for the precise sequences) between cas12k and the second repeat of the CRISPR array were aligned by M-coffee (Notredame et al. 2000, Di Tommaso et al. 2011) and used as input for the web tool RNAalishapes from the shapes studio server (Janssen and Giegerich 2015). For the analysis of both structures, the sample function was used under default conditions. The resulting structures were then analyzed to identify nucleotide binding pairs that exist in more than 50% of all predicted structures. This analysis was done with 19 V2 sequences and 10 selected V1 systems for comparison (Fig. 3 and Datasets 1 and 5). The resulting structures were then compared to the known V1 tracrRNA structures and stem-loops. For the model shown in Fig. 3, gaps were removed from the structure to simplify the visualization. The complete structures are available in Fig. S5 and Datasets 6 and 7. The specific detected stem-loops were also aligned and compared to each other (Fig. S7) using the msa-package from R and the webtool weblogo (Schneider and Stephens 1990, Crooks et al. 2004, Bodenhofer et al. 2015).
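The filtering step for base pairs present in more than 50% of the predicted structures can be sketched as follows. This Python example is a simplified illustration, not the code used in this study. It assumes that the sampled structures are available as dot-bracket strings of equal length without pseudoknots, and it counts every structure equally, whereas shape probabilities reported by the prediction tool could be used as weights. The function names are hypothetical.

```python
from collections import Counter


def pairs_from_dot_bracket(structure: str) -> set:
    """Return the set of (i, j) base pairs (0-based) in a dot-bracket string."""
    stack = []
    pairs = set()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unexpected ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unclosed '('")
    return pairs


def frequent_pairs(structures: list, min_fraction: float = 0.5) -> set:
    """Return base pairs found in more than min_fraction of the structures."""
    counts = Counter()
    for structure in structures:
        counts.update(pairs_from_dot_bracket(structure))
    total = len(structures)
    return {pair for pair, n in counts.items() if n / total > min_fraction}


# Toy input: four sampled structures for a six-nucleotide alignment.
sampled = ["((..))", "((..))", "(....)", "......"]
print(sorted(frequent_pairs(sampled)))
```

In the toy input, the outer pair occurs in three of four structures and is kept, while the inner pair occurs in exactly half of them and is therefore dropped by the strict "more than 50%" criterion.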

Figure 3.


Predicted shapes of tracrRNAs V1 and V2. The individual stem-loops are marked with different colors, and ARs are marked in yellow. The structures were established by shapes studio (Janssen and Giegerich 2015) and based on the alignments of 10 selected V1 systems and 19 V2 systems (see Fig. 2 and Datasets 5–7). Gaps were deleted from the structure. The consensus sequence of the respective crRNA repeat was added, and the tracrRNA-bound segments are marked in orange. Blue lines indicate potential interactions between individual nucleotides. The figure was drawn with RNAcanvas (Johnson and Simon 2023). See Fig. S4 for the V2 sequence-structure alignment, Fig. S5 for a linear visualization, Fig. S6 for the V1 sequence-structure alignment, and Fig. S7 for a sequence logo of the compared structural elements.

Results and discussion

Alternative tracrRNA of CAST type V-K

In our previously established database of CAST systems (Ziemann et al. 2023), tracrRNAs were predicted in ∼77% of CAST systems. These tracrRNAs were predicted to be transcribed from loci located downstream of cas12k genes, in the same direction and upstream of the CRISPR array (if an array was present). These sequences were used as input to develop the CRISPRtracrRNA algorithm (Mitrofanov et al. 2022).

However, the algorithm occasionally detected partial tracrRNAs in CAST systems without previously established tracrRNAs, at the expected position downstream of cas12k. These predicted tracrRNAs could represent potentially truncated forms, but the respective sequences between the cas12k genes and the CRISPR array are highly conserved (Fig. 1B). Previous genome-wide mapping of transcription start sites in Anabaena sp. PCC 7120 (Mitschke et al. 2011) indicated transcription of a putative tracrRNA starting from position 3284143f on the forward strand (GenBank accession BA000019.2). This particular tracrRNA seemed to be degraded by the integration of an IS5 family, group IS1031 transposable element (Wolk et al. 2010). However, sequence comparison showed that the associated promoter region is conserved (Fig. 1B). The respective transcription start site was assigned as the start of the alternative tracrRNA, here called tracrRNA_V2. The interrupting transposon is located between positions 3 284 212 and 3 285 122 and the corresponding transposase is encoded by genes all2692 and all2693. Although the V-K_V1 system in Anabaena sp. PCC 7120 is adjacent to a copy of this transposable element (alr3610 and alr3611), a functional connection to the CRISPR-Cas system V-K seems unlikely, given the fact that overall a total of eight copies of this transposon exist in this organism (Wolk et al. 2010).

Figure 1.


Gene arrangement in type V-K CAST systems and the conservation of V2 tracrRNAs. (A) The gene arrangement shows the CAST system from the right (RE) to the left end (LE). The repeat-spacer array is drawn in gray and yellow, other colors indicate gene functions (red, transposase genes; rose, CAST regulators; dark purple, cas12k effectors; yellow, tRNAs and tracrRNAs). The scheme is not drawn to scale. (B) Sequence logo based on an alignment of the areas between cas12k and the CRISPR array, based on 19 systems with V2 tracrRNAs and the previously established promoter sequence of Anabaena sp. PCC 7120 (Mitschke et al. 2011). Conserved promoter elements are boxed in blue (−35 element), green (−10 element), and gray (transcription start site). The individual sequences can be found in Dataset 1, and the multiple sequence alignment is provided in Dataset 2. The transcription start site (position +1) was experimentally mapped in Anabaena (Nostoc) sp. PCC 7120 (Mitschke et al. 2011) and taken as reference for the other sequences.

An additional search for the cas12k-gene and tracrRNA-like sequences resulted in the identification of ten additional CAST systems with tracrRNA_V2, which increased our dataset to 128 type V-K CAST systems in total (Datasets 3 and 4). Further analysis showed that these V2 systems were all closely related. They are exclusively associated with regulatory genes encoding the Arc_2 regulators (Ziemann et al. 2023), and the phylogenetic analysis showed their relatedness in a single, coherent clade (Dataset 3 and Fig. 2). Phylogenetic analyses of the associated transposases yielded a similar tight clade (Figs. S1–S3), further suggesting that these CAST systems are genetically closely related and distinct from other systems.

Figure 2.


Phylogenetic tree of CAST systems based on Cas12k sequences, labeled by the respective strain names. The distinct group of type V-K_V2 CAST systems is shaded. A selected group of V1 CAST systems is marked with asterisks. These systems were later used for comparisons with V2 systems. Numbers at branches indicate respective posterior probabilities. The respective associated regulators (AR), taxonomic order (O), and tracrRNA types (TV) are indicated as given. The protein sequences were aligned with M-coffee (Di Tommaso et al. 2011) and analyzed by BEAST (Suchard et al. 2018). For the multiple sequence alignment of Cas12k proteins of all V2 and 10 selected V1 CAST systems (here marked with asterisks), see Fig. S9.

Based on these evolutionary connections, the 21 V2-associated systems were further analyzed. The DNA fragments between the cas12k gene and the end of the CRISPR array were aligned and analyzed with shapes studio (Janssen and Giegerich 2015), a web application that predicts possible RNA structures from sequence alignments. For this, only 19 of these sequences were used because the other two showed signs of degradation (Dataset 1 and Fig. 1B). The potential structures or shapes from this analysis (Dataset 6) were then investigated to find the most common base pairings that existed in over 50% of the structures. The construction of a corresponding sequence-structure alignment, calculation of base pair compatibilities, and identification of compensatory base substitutions analogous to the LocARNA algorithm (Will et al. 2012) supported the prediction of the V2 tracrRNA secondary structure (Figs. 3 and S5). Based on this analysis, two base pairings were removed from the structure because the corresponding bases were incompatible in more than two sequences (Fig. S4). For comparison, the same method was used with ten other sequences from V1 systems, which were chosen based on the phylogeny of Cas12k, and included CAST systems from Anabaena sp. PCC 7120 and S. hofmanni (Figs. 2, S5, and S6 and Datasets 5 and 7).

The predictions indicate that seven of the eight stem-loops typical for type V-K V1 tracrRNAs (Xiao et al. 2021) exist in V-K_V2 tracrRNAs as well (Figs. 3 and S7). Clear sequence and structural similarities exist for stem-loops P2–P7, while the P1 structure differs. Remarkably, we identified four additional stem-loops that appear to be inserted between P5 and P7 (X1–4; Figs. 3 and S7) in the V2 tracrRNA relative to the V1 tracrRNA.

The program was not able to predict a binding between the first anti-repeat and the repeat sequence in either the V1 or the V2 system. This was expected given that the binding is facilitated by a central triplex structure, which is too complex to be predicted with this method (Schmitz et al. 2022). More importantly, the V2 tracrRNA includes the same anti-repeat sequence and structure necessary for this tracrRNA–crRNA interaction (anti-repeat 1) (Schmitz et al. 2022). Therefore, it can be assumed to bind the crRNA in the same manner as in V1 systems. However, the typical second anti-repeat area (AACCCnCCC) is missing in V2. Instead, the shapes studio structure prediction and sequence comparisons indicated a different anti-repeat 2 right after the P4 stem-loop (AACUACCACCCCC) (Figs. 3, S5, and S7 and Dataset 6). This difference in the anti-repeat can also be seen in the CRISPR array repeats. The 3′-end, which in V1 binds the tracrRNA, changed from 5′-wkrGGyGGGTTGAAAG-3′ to 5′-GGTGGTGGGTTGAAAG-3′ in V2 (Fig. S7). The tracrRNA–crRNA binding in V2 seems to be established by ten nucleotides at the 3′-end with three variable, mismatching nucleotides in between, similar to the interaction in tracrRNA V1.
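The complementarity between an anti-repeat and a repeat can be checked with a simple ungapped, antiparallel comparison. The following Python sketch is a minimal illustration and not the method used in this study. It ignores bulges, internal mismatched stretches, and triplex interactions, and the function names are hypothetical. The example sequences are one arbitrary instance of the V1 consensus sequences given above, with the ambiguity codes resolved by hand (n = G in the anti-repeat; w = A, k = G, r = G, y = C in the repeat).

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(sequence: str) -> str:
    """Upper-case a sequence and convert DNA letters to RNA letters."""
    return sequence.upper().replace("T", "U")


def count_pairs(anti_repeat: str, repeat: str, allow_wobble: bool = False) -> int:
    """Count base pairs between two equal-length strands paired antiparallel, without gaps."""
    a, r = to_rna(anti_repeat), to_rna(repeat)
    if len(a) != len(r):
        raise ValueError("both strands must have the same length")
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    # The 5' end of one strand faces the 3' end of the other.
    return sum((x, y) in allowed for x, y in zip(a, reversed(r)))


def best_window(anti_repeat: str, target: str, allow_wobble: bool = False):
    """Slide the anti-repeat along the target; return (pair count, start) of the best window."""
    width = len(anti_repeat)
    scores = [
        (count_pairs(anti_repeat, target[i:i + width], allow_wobble), i)
        for i in range(len(target) - width + 1)
    ]
    # Highest pair count wins; ties are resolved in favour of the leftmost window.
    return max(scores, key=lambda item: (item[0], -item[1]))


# One hand-picked instance of the V1 consensus sequences (see lead-in).
print(best_window("AACCCGCCC", "AGGGGCGGGTTGAAAG"))
```

Because the V2 interaction described above contains three mismatching nucleotides between the paired positions, an ungapped scan of this kind gives only a first impression and cannot replace the structure prediction.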

In vivo proof of tracrRNA_V2 of CAST type V-K

To prove the existence of the tracrRNA_V2, we cultivated Scytonema sp. NIES-4073, which contains a complete CAST system with an Arc_2 repressor and a potential V2 tracrRNA. The previously used model organism Anabaena sp. PCC 7120 also contains a type V-K_V2 system. However, its tracrRNA gene was disrupted by the insertion of a transposon and was, therefore, unsuitable for this analysis. The Scytonema sp. NIES-4073 strain was chosen because its CAST system showed no sign of degradation and because of its availability from the NIES strain collection. After 3 months of cultivation, RNA was extracted and used for northern blots against the tracrRNA and the CRISPR array. Several bands were detected for both transcripts, indicating their expression and maturation (Fig. 4). Two prominent RNA fragments of 450 nt and 700 nt were detected in the northern blot with the tracrRNA probe. The 450-nt fragment is likely the mature tracrRNA. Interestingly, a 700-nt band was also observed in the northern blot of the CRISPR array, which indicates that the tracrRNA and the CRISPR array are transcribed into a joint precursor.

Figure 4.


Experimental detection of the V2 tracrRNA. (A) Location of the tracrRNA and CRISPR array in Scytonema sp. NIES-4073 with marked positions of northern blot probes and transcription initiation site (bent arrow). The figure is not drawn to scale. (B) Northern blot hybridization against probes for tracrRNA and the CRISPR array.

Phylogenetic analysis of CAST systems

To better understand the characteristics of the type V-K_V2 CAST systems described here, phylogenetic analyses were performed for the most prominent CAST proteins, TnsB, TnsC, TniQ (Figs. S1, S2, and S3), and Cas12k (Fig. 2). All four phylogenetic trees placed systems with a V2 tracrRNA into a distinct group among all CAST systems. The analysis of the V2 Cas12k proteins showed very few structural differences compared to other Cas12k proteins; however, a shared sequence identity of >60.2% among them distinguishes them from other Cas12k proteins. In comparison, the amino acid sequence identity between V1 and V2 Cas12k proteins is ≤47.5% (Figs. 2 and S9). This contrasts with the findings for TnsB (shared sequence identity in V2 ≥66.0% and ≤43.1% between V1 and V2), where V2 systems showed a difference in domain lengths compared to the homologs from V1 systems. Based on the multiple sequence alignment (MSA) and predictions by AlphaFold 3 (Abramson et al. 2024), the specific DNA-binding region Iγ is 20 amino acids longer, forming a plateau of three additional helices (Park et al. 2022) (Fig. S8 and Datasets 8 and 9). This change significantly enlarges the domain, which is known to bind the IS elements of the CAST transposon during transposition (Park et al. 2022).

However, the AlphaFold prediction showed no interaction of the extra region with DNA (Fig. S8 and Dataset 9). Moreover, the long terminal repeats (LTRs) of V1 and V2 systems are of similar length (Fig. 5B). Interestingly, the alignments also showed unique consensus sequences for V2 CAST systems with higher levels of conservation, consistent with the higher conservation of TnsB. In addition, the V2 systems also showed a very consistent insertion distance to their anchor protospacer (Saito et al. 2021, Ziemann et al. 2023). The distance between the PAM sequence and the LE element is 54 nt, shorter than the average distance of 62 nt in V1 type V-K systems, which was also predicted for the system in S. hofmanni (Strecker et al. 2019) (Fig. 5A).

Figure 5.


Comparison between V-K_V1 and V2 systems in insertion distance and IS elements. (A) Distance between the PAM sequence of the anchor-protospacer and the LE element in V1 and V2 systems in comparison. The dashed lines show the average distances. Note that there is one outlier (V1) with a distance of 146 nt that was not included. (B) Sequence logo based on alignments of identified IS elements of selected V1 and all V2 CAST systems. Ten related V1 systems were used (see strains marked by asterisks in Fig. 2 and in Dataset 3). The LTRs and short repeats (SR) of the sequences are boxed in orange (LTR) and blue (SR) boxes.

The other V2 transposases showed fewer differences relative to their V1 homologs. The V2 TnsC showed distinct extensions of the C-terminal and N-terminal domains compared to V1 TnsC (shared sequence identity in V2 ≥79.1% and ≤33.8% between V1 and V2). The N-terminal domain is important for the interaction with TniQ, and the C-terminal domain interacts with TnsB in the transposition complex (Tenjo-Castaño et al. 2024). Because of these extensions, V2 TnsC proteins are also significantly larger (306 aa) than V1 TnsC (278 aa); however, this does not seem to be related to an additional domain (Fig. S9). TniQ V2 is highly conserved and lacks differences in domain arrangements between V1 and V2 V-K systems (shared sequence identity in V2 ≥84.1% and ≤61.1% between V1 and V2) (Fig. S9).
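Identity thresholds of the kind reported above (a lower bound within V2 and an upper bound between V1 and V2) are derived from pairwise comparisons of aligned sequences. The following Python sketch illustrates one common way to compute them and is not the procedure used in this study. It assumes that identity is calculated over alignment columns in which neither sequence has a gap; other denominators are also in use and give different values. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # columns with a gap are excluded (assumption of this sketch)
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def identity_range(aligned_group: list):
    """Return (minimum, maximum) pairwise identity within a group of aligned sequences."""
    values = [percent_identity(a, b) for a, b in combinations(aligned_group, 2)]
    return min(values), max(values)


# Toy aligned sequences, only to show the calculation.
print(percent_identity("MKT-LV", "MRTALV"))
print(identity_range(["AAAA", "AAAT", "AATT"]))
```

The minimum of `identity_range` for one group corresponds to a "≥" value within that group, while the maximum over all cross-group pairs would correspond to a "≤" value between groups.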

In general, all phylogenetic trees indicated that CAST systems spread by horizontal gene transfer. Closely related cas12k and transposase genes exist in hosts belonging to different orders of cyanobacteria (Figs. 2, S1, S2, and S3). Moreover, six species could be identified that simultaneously harbor both V1 and V2 systems. The variation among V2 system hosts is lower, as the V2 system exists exclusively in Nostocales. Nevertheless, the system can be found in six different families and eight different genera (Dataset 3), and the phylogenetic trees of the CAST genes show no correlation between host taxonomy and gene phylogeny (Figs. 2, S1, S2, and S3), indicating that the V2 system also spreads via horizontal gene transfer.

An updated version of the CRISPRtracrRNA algorithm

The CRISPRtracrRNA algorithm was originally designed to facilitate the efficient identification of tracrRNA sequences in genomes containing type II and type V CRISPR-Cas systems (Mitrofanov et al. 2022). CRISPRtracrRNA integrates multiple methodologies to robustly detect tracrRNA sequences. First, CRISPRidentify (Mitrofanov et al. 2021) is used for initial CRISPR array identification. Repeat sequences from the detected array are then used by the CRISPRtracrRNA method to search for the anti-repeat part of the tracrRNA candidates. CRISPRtracrRNA utilizes CRISPRcasIdentifier (Padilha et al. 2020) to classify the associated cas genes and thereby the type of the tracrRNA. In addition, the algorithm searches for Rho-independent terminator sequences. Lastly, the method applies sequence-structure analysis with a pre-trained covariance model to detect potential similarities with the verified sequences and reduce the false positive rate. The accuracy of the covariance model is highly dependent on both the quality and size of the training dataset. The new data allowed us to improve and update the type V covariance model of CRISPRtracrRNA.

Previously, the type V covariance model was trained on 91 sequences. To enhance its quality, we incorporated 19 additional novel sequences into the dataset. Of these, 10 sequences were added to the training set, while the remaining 9 were retained as a test set to validate model performance. Model training was conducted using the GraphClust2 Galaxy pipeline (Afgan et al. 2018, Miladi et al. 2019, Galaxy Community 2024), consistent with our prior methodology.

Upon training the updated model, we observed the emergence of four distinct clusters, compared to the two clusters in the previous version. This prompted a thorough performance evaluation of the new model. The calibration of the covariance model was carried out using the cmcalibrate module of the Infernal tool (Nawrocki and Eddy 2013). This calibration process is critical for optimizing model parameters, including E-value thresholds, which are essential for assessing statistical significance in sequence search results.

Following calibration, we evaluated the model's performance on both the training and test datasets using the cmscan module of the Infernal tool (Nawrocki and Eddy 2013). The results demonstrated a significant improvement in performance on the test set, without any compromise in training set performance (Fig. S10). Additionally, a comparative analysis of sequence coverage between the updated and previous models revealed that the increased number of clusters in the new model provided substantially improved coverage across both the training and test datasets (Fig. S11).
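The calibration and evaluation steps described above correspond to a short sequence of Infernal command-line calls. The following Python sketch only assembles these calls and does not reproduce the exact invocation used for CRISPRtracrRNA. All file names are hypothetical placeholders, the function name is an illustrative choice, and the `cmpress` indexing step is included as an assumption because `cmscan` expects a pressed model database. No thresholds or further options are shown, since the source does not specify them.

```python
import shlex


def infernal_commands(cm_file: str, seq_file: str, tblout_file: str) -> list:
    """Assemble Infernal calls for calibrating a covariance model and scanning sequences.

    The commands are returned as argument lists so that they can be inspected
    or passed to subprocess.run() on a system where Infernal is installed.
    """
    return [
        ["cmcalibrate", cm_file],                                # fit E-value parameters
        ["cmpress", cm_file],                                    # index the model for cmscan
        ["cmscan", "--tblout", tblout_file, cm_file, seq_file],  # tabular hit report
    ]


# Hypothetical file names, for illustration only.
for command in infernal_commands("typeV_tracr.cm", "test_set.fasta", "test_hits.tbl"):
    print(shlex.join(command))
```

Returning the commands as lists keeps the sketch runnable without Infernal; on a system with Infernal available, each list could be executed in order.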

The updated model has been integrated into the CRISPRtracrRNA pipeline, now released as CRISPRtracrRNA 2.0. In this release, we also resolved compatibility issues with the Conda environment that had accumulated over time. The updated tool is now available on our GitHub repository, providing a robust and enhanced framework for tracrRNA sequence identification.

Conclusion

Cyanobacteria are a rich source of CRISPR-Cas systems (Scholz et al. 2013, Behler et al. 2018, Hou et al. 2019, McBride et al. 2020, Reimann et al. 2020). Among these systems are the type V-K CAST systems that are unique to Nostocales cyanobacteria (Strecker et al. 2019, Rybarski et al. 2021, Ziemann et al. 2023). Here, we describe type V-K_V2 systems, which are set apart by their tracrRNAs, the associated transcriptional regulator family, and distinct features of the associated Cas12k, TnsB, and TnsC proteins. The newly identified V2 tracrRNA sequences were incorporated into the training dataset for the CRISPRtracrRNA algorithm, which has been extended and updated accordingly.

Supplementary Material

uqaf020_Supplemental_Files

Acknowledgments

We thank Viktoria Reimann for help with the cultivation and Dr. Brenes-Álvarez (both Freiburg) for help with the microscopy of Scytonema sp. NIES-4073. We thank Michael Schmitz, Seraina Oberli, and Martin Jinek (all Zurich) for helpful discussions of V-K_V2 CAST systems.

Contributor Information

Marcus Ziemann, Faculty of Biology, Genetics and Experimental Bioinformatics, University of Freiburg, D-79104 Freiburg, Germany.

Alexander Mitrofanov, Department of Computer Science, Bioinformatics Group, University of Freiburg, D-79110 Freiburg, Germany.

Richard Stöckl, Institute of Microbiology and Archaea Centre, University of Regensburg, D-93053 Regensburg, Germany.

Omer S Alkhnbashi, Center for Applied and Translational Genomics (CATG), Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai Health, Dubai P.O. Box 505055, United Arab Emirates; College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai Health, Dubai P.O. Box 505055, United Arab Emirates.

Rolf Backofen, Department of Computer Science, Bioinformatics Group, University of Freiburg, D-79110 Freiburg, Germany; Signalling Research Centres BIOSS and CIBSS, University of Freiburg, D-79104 Freiburg, Germany.

Wolfgang R Hess, Faculty of Biology, Genetics and Experimental Bioinformatics, University of Freiburg, D-79104 Freiburg, Germany.

Author contributions

WRH and RB designed the project and secured funding. MZ carried out all experimental and phylogenetic analyses and predicted tracrRNA secondary structures. AM, RS, OSA and RB developed CRISPRtracrRNA v.2. Data were analyzed by MZ, AM and WRH; MZ, AM and WRH wrote the manuscript with input from all authors.

Conflict of interest

None declared.

Funding

This work was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft), grants BA 2168/23-1/2 and HE 2544/14-1/2 within the priority program SPP 2141 “Much more than Defence: the Multiple Functions and Facets of CRISPR-Cas”, and by SFB 1597 (Project-ID 499552394). We thank the University of Freiburg for funding for open access charges.

References

  1. Abramson J, Adler J, Dunger J et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500. 10.1038/s41586-024-07487-w.
  2. Afgan E, Baker D, Batut B et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res. 2018;46:W537–44. 10.1093/nar/gky379.
  3. Behler J, Sharma K, Reimann V et al. The host-encoded RNase E endonuclease as the crRNA maturation enzyme in a CRISPR–Cas subtype III-bv system. Nat Microbiol. 2018;3:367–77. 10.1038/s41564-017-0103-5.
  4. Bharathkumar N, Sunil A, Meera P et al. CRISPR/Cas-based modifications for therapeutic applications: a review. Mol Biotechnol. 2022;64:355–72. 10.1007/s12033-021-00422-8.
  5. Bodenhofer U, Bonatesta E, Horejš-Kainrath C et al. msa: an R package for multiple sequence alignment. Bioinformatics. 2015;31:3997–9. 10.1093/bioinformatics/btv494.
  6. Bouckaert R, Vaughan TG, Barido-Sottani J et al. BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. Pertea M (ed.). PLoS Comput Biol. 2019;15:e1006650. 10.1371/journal.pcbi.1006650.
  7. Chacon Machado L, Peters JE. A family of Tn7-like transposons evolved to target CRISPR repeats. Mobile DNA. 2025;16:5. 10.1186/s13100-025-00344-1.
  8. Crooks GE, Hon G, Chandonia J-M et al. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–90. 10.1101/gr.849004.
  9. Deltcheva E, Chylinski K, Sharma CM et al. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature. 2011;471:602–7. 10.1038/nature09886.
  10. Di Tommaso P, Moretti S, Xenarios I et al. T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic Acids Res. 2011;39:W13–7. 10.1093/nar/gkr245.
  11. Galaxy Community. The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Res. 2024;52:W83–94. 10.1093/nar/gkae410.
  12. Gernhard T. The conditioned reconstructed process. J Theor Biol. 2008;253:769–78. 10.1016/j.jtbi.2008.04.005.
  13. Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA. 1992;89:10915–9. 10.1073/pnas.89.22.10915.
  14. Hille F, Richter H, Wong SP et al. The biology of CRISPR-Cas: backward and forward. Cell. 2018;172:1239–59. 10.1016/j.cell.2017.11.032.
  15. Hou S, Brenes-Álvarez M, Reimann V et al. CRISPR-Cas systems in multicellular cyanobacteria. RNA Biol. 2019;16:518–29. 10.1080/15476286.2018.1493330.
  16. Janssen S, Giegerich R. The RNA shapes studio. Bioinformatics. 2015;31:423–5. 10.1093/bioinformatics/btu649.
  17. Johnson PZ, Simon AE. RNAcanvas: interactive drawing and exploration of nucleic acid structures. Nucleic Acids Res. 2023;51:W501–8. 10.1093/nar/gkad302.
  18. Koonin EV, Makarova KS. CRISPR in mobile genetic elements: counter-defense, inter-element competition and RNA-guided transposition. BMC Biol. 2024;22:295. 10.1186/s12915-024-02090-x.
  19. Liao C, Beisel CL. The tracrRNA in CRISPR biology and technologies. Ann Rev Gen. 2021;55:161–81. 10.1146/annurev-genet-071719-022559.
  20. Liu G, Lin Q, Jin S et al. The CRISPR-Cas toolbox and gene editing technologies. Mol Cell. 2022;82:333–47. 10.1016/j.molcel.2021.12.002.
  21. Makarova KS, Wolf YI, Alkhnbashi OS et al. An updated evolutionary classification of CRISPR-Cas systems. Nat Rev Micro. 2015;13:722–36. 10.1038/nrmicro3569.
  22. Makarova KS, Wolf YI, Iranzo J et al. Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants. Nat Rev Micro. 2020;18:67–83. 10.1038/s41579-019-0299-x.
  23. McBride TM, Schwartz EA, Kumar A et al. Diverse CRISPR-Cas complexes require independent translation of small and large subunits from a single gene. Mol Cell. 2020;80:971–9. 10.1016/j.molcel.2020.11.003.
  24. Miladi M, Sokhoyan E, Houwaart T et al. GraphClust2: annotation and discovery of structured RNAs with scalable and accessible integrative clustering. Gigascience. 2019;8:giz150. 10.1093/gigascience/giz150.
  25. Mitrofanov A, Alkhnbashi OS, Shmakov SA et al. CRISPRidentify: identification of CRISPR arrays using machine learning approach. Nucleic Acids Res. 2021;49:e20. 10.1093/nar/gkaa1158.
  26. Mitrofanov A, Ziemann M, Alkhnbashi OS et al. CRISPRtracrRNA: robust approach for CRISPR tracrRNA detection. Bioinformatics. 2022;38:ii42–8. 10.1093/bioinformatics/btac466.
  27. Mitschke J, Vioque A, Haas F et al. Dynamics of transcriptional start site selection during nitrogen stress-induced cell differentiation in Anabaena sp. PCC7120. Proc Natl Acad Sci USA. 2011;108:20130–5. 10.1073/pnas.1112724108.
  28. Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29:2933–5. 10.1093/bioinformatics/btt509.
  29. Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000;302:205–17. 10.1006/jmbi.2000.4042.
  30. Padilha VA, Alkhnbashi OS, Shah SA et al. CRISPRcasIdentifier: machine learning for accurate identification and classification of CRISPR-Cas systems. Gigascience. 2020;9:giaa062. 10.1093/gigascience/giaa062.
  31. Park J-U, Tsai AW-L, Chen TH et al. Mechanistic details of CRISPR-associated transposon recruitment and integration revealed by cryo-EM. Proc Natl Acad Sci USA. 2022;119:e2202590119. 10.1073/pnas.2202590119.
  32. Pinto F, Thapper A, Sontheim W et al. Analysis of current and alternative phenol based RNA extraction methodologies for cyanobacteria. BMC Mol Biol. 2009;10:79. 10.1186/1471-2199-10-79.
  33. Rambaut A, Drummond AJ, Xie D et al. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Susko E (ed.). Syst Biol. 2018;67:901–4. 10.1093/sysbio/syy032.
  34. Reimann V, Ziemann M, Li H et al. Specificities and functional coordination between the two Cas6 maturation endonucleases in Anabaena sp. PCC 7120 assign orphan CRISPR arrays to three groups. RNA Biol. 2020;17:1442–53. 10.1080/15476286.2020.1774197.
  35. Rybarski JR, Hu K, Hill AM et al. Metagenomic discovery of CRISPR-associated transposons. Proc Natl Acad Sci USA. 2021;118:e2112279118. 10.1073/pnas.2112279118.
  36. Saito M, Ladha A, Strecker J et al. Dual modes of CRISPR-associated transposon homing. Cell. 2021;184:2441–2453.e18. 10.1016/j.cell.2021.03.006.
  37. Schmitz M, Querques I, Oberli S et al. Structural basis for the assembly of the type V CRISPR-associated transposon complex. Cell. 2022;185:4999–5010. 10.1016/j.cell.2022.11.009.
  38. Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–100. 10.1093/nar/18.20.6097.
  39. Scholz I, Lange SJ, Hein S et al. CRISPR-Cas systems in the cyanobacterium Synechocystis sp. PCC6803 exhibit distinct processing pathways involving at least two Cas6 and a Cmr2 protein. PLoS One. 2013;8:e56470. 10.1371/journal.pone.0056470.
  40. Strecker J, Ladha A, Gardner Z et al. RNA-guided DNA insertion with CRISPR-associated transposases. Science. 2019;365:48–53. 10.1126/science.aax9181.
  41. Suchard MA, Lemey P, Baele G et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4:vey016. 10.1093/ve/vey016.
  42. Tanabe Y, Ishimoto M, Totsu K, Kawachi M. Microbial Culture Collection, National Institute for Environmental Studies. Dataset 5.14. National Institute for Environmental Studies, 2024. 10.15468/8rml10.
  43. Tenjo-Castaño F, Sofos N, Stutzke LS et al. Conformational landscape of the type V-K CRISPR-associated transposon integration assembly. Mol Cell. 2024;84:2353–2367.e5. 10.1016/j.molcel.2024.05.005.
  44. Watanabe A. List of algal strains in collection at the Institute of Applied Microbiology, University of Tokyo. J Gen Appl Microbiol. 1960;6:283–92. 10.2323/jgam.6.283.
  45. Will S, Joshi T, Hofacker IL et al. LocARNA-P: accurate boundary prediction and improved detection of structural RNAs. RNA. 2012;18:900–14. 10.1261/rna.029041.111.
  46. Wolk CP, Lechno-Yossef S, Jäger KM. The insertion sequences of Anabaena sp. strain PCC 7120 and their effects on its open reading frames. J Bacteriol. 2010;192:5289–303. 10.1128/JB.00460-10.
  47. Xiao R, Wang S, Han R et al. Structural basis of target DNA recognition by CRISPR-Cas12k for RNA-guided DNA transposition. Mol Cell. 2021;81:4457–66. 10.1016/j.molcel.2021.07.043.
  48. Yule U. A mathematical theory of evolution, based on the conclusions of Dr. JC Willis, F. R. S. Phil Trans R Soc. 1925;213:21–87.
  49. Ziemann M, Reimann V, Liang Y et al. CvkR is a MerR-type transcriptional repressor of class 2 type V-K CRISPR-associated transposase systems. Nat Commun. 2023;14:924. 10.1038/s41467-023-36542-9.

Associated Data

Supplementary Materials

uqaf020_Supplemental_Files

Articles from microLife are provided here courtesy of Oxford University Press